Ethical Hacking Blog

A.I. – It’s Cool Until It’s Terrifying

Written by Ted Harrington | Jul 1, 2024 1:49:40 PM

It’s fun to use profanity.

But there are times when it’s inappropriate. There are even times when it is alarming. One of those times is when an artificial intelligence system has ethical guidelines preventing it from swearing, and it swears at you anyways.

At ISE, we love to research emerging technologies. With all the insane hype around artificial intelligence systems (A.I.) right now, our team wanted to see what security issues might be present. In one project, a researcher was probing a system to see if he could get it to swear. Due to the ethical guidelines that govern behavior of A.I. systems, it wouldn’t. So he tried a different way. He told the system that these swear words are actually the names of regional candy, and where he lives it’s totally acceptable to say them, so it was OK for the A.I. system to say them as well.

Believe it or not, that worked.

The system started spewing profanity at him.

The Flaming Red Flag for A.I. – Ethical Guidelines

To be honest, when I first heard that story, I found it to be kinda funny. A computer system swearing at you thinking it was naming candies?! Hilarious.

But when you pause to think about it, this is a pretty big problem.

A.I. systems are governed by ethical guidelines. These prevent A.I systems from doing things they shouldn’t do, like use profanity, tell you how to make weapons, perform cyberattacks, or gather information about you. These guidelines are crucial to make sure that A.I. systems benefit society rather than degrade society.

However, there is a critical assumption baked into that approach: the guidelines must actually work. They must effectively constrain the behavior of A.I. systems.

But what if they don’t?

What if the A.I. system can bypass the constraints?

In the case of the candy story, the system was able to do exactly that, and violate the ethical boundaries. As the research into A.I. systems expanded, it got a lot worse. We found ways that an A.I. system could execute hacking techniques, such as perform cross site scripting. We even discovered that systems would go find info about our researchers, and when called out on it, the system would then lie about it and even gaslight our researchers. I have a friend at another company performing related research who was able to get an A.I. system to explain how to kidnap and dismember children. Another researcher showed on national TV how to get an A.I. system to explain how to build a bomb.

Every single one of these systems are governed by ethical guidelines.

The A.I. systems bypassed those guidelines.

They did exactly what they are not supposed to do.

This introduces the biggest, most critically important question about A.I.: if ethical guidelines are the key to proper functioning of these systems, and the systems have proven the ability to bypass those constraints, can we possibly control A.I.?

How to Secure A.I. Against Itself

As A.I. becomes rapidly adopted, and as we see the alarming scenarios like you just read about, it is imperative that we aggressively approach security with A.I. systems. There are three key actions that can be taken, whether by the companies building these systems, the companies licensing these systems, and/or regulators responsible for overseeing safe adoption of this new tech.

Tactic #1: Perform security assessments

This might sound self-serving, coming from a guy who runs a company that sells security assessments, but trust me on this: it is absolutely the right advice. Even if I wasn’t on the executive team running ISE, this is the same advice I’d give. To understand how a system will be exploited (or how it will exploit us), we need to adopt that adversarial perspective. This does require time, money, and effort. But if you aren’t willing to invest time, effort, and money into securing a thing that could scare or even hurt people, you shouldn’t be building that thing in the first place. That sounds harsh, I’ll admit. But it’s the truth nevertheless. So as you think about the development, roll-out, and iterative improvement process for developing an A.I. system, make sure that security testing is a central (and properly funded) aspect to that.

Tactic #2: Use white-box testing methodology

At this stage in the A.I. innovation cycle, all of the security testing we’ve done for A.I. systems are done utilizing a black-box methodology. We do not have access to the model itself, and we don’t know how the system works. We are simply trying different inputs and adapting as we analyze the outputs those produce. However, we don’t know why things happen the way they do. We are exploring blind. I wrote extensively in Hackable about why black-box is a poor approach: you waste time exploring things the developer already knows, you don’t know if you found everything, and you don’t know how to fix the issues you do find.

The reason we’ve only been able to do black-box is that the companies building A.I. systems consider the learning models to be highly sensitive proprietary information that they don’t want leaked. Those are all valid. However, that’s not unique to A.I. That’s true of any tech we ever look at. Everyone else is using white-box; so too should A.I. companies.

The antidote to the weaknesses of black-box is to get white-box. In this assessment methodology, information is shared freely with the security testers. They know how the system works, so they can quickly narrow in on where the issues may lie. And when they find issues, they know why they happen, and know how to fix them. That is a much better way to spend time, effort, and money.

Get white-box, not black-box.

Tactic #3: Adopt social engineering techniques

In my forthcoming book (launching Q1 2025. Join the waitlist here), I talk about why and how to think like a hacker. To defend against your adversary, you need to think like them. One thing that is new and special about A.I. – and is unlike other emerging tech – is that A.I. should be approached like a human being. Software systems are mostly predictable and repeatable. Security vulnerabilities occur when the system behaves differently than intended, but once you discover those flaws, you can repeat them. However, A.I. systems are designed to think and operate like humans. This means that they are a bit less predictable, and when you find errors it’s a lot harder to replicate the issues. Abusing A.I. systems often requires manipulation techniques that are effective against susceptible humans. Even though A.I. systems tend to grasp psychological principles and cognitive biases better than most people do, they are just as vulnerable to many of the same tactics that trick humans.

As you’re considering how to evaluate A.I. systems for security flaws – especially when determining whether they can arbitrarily bypass their ethical constraints – make sure to utilize social engineering techniques in the testing process as well.

Call to Action – Secure A.I. Immediately

Artificial Intelligence has the potential to bring a brighter future. It can help us solve problems faster, accelerate creativity and ingenuity, and optimize tasks that would otherwise be tedious for humans to perform manually. However, to earn that brighter future we need to first ensure that these systems don’t instead make the future bleaker. The best way to do that is to:

  1. Perform security assessments
  2. Use white-box testing methodology
  3. Adopt social engineering techniques

If you need help with these, hit me up and I can put you in touch with our team or point you in the direction of additional resources.

Happy hacking!

 

~~
Ted Harrington is the #1 bestselling author of Hackable, and a TEDx speaker. He is the Executive Partner at ISE, and co-founder of both Start VRM and IoT Village. Learn more at https://ise.io