AI and cybersecurity: penetration tester reveals key dangers

Podcast episode

Garreth Hanley:
This is INTHEBLACK, a leadership strategy and business podcast brought to you by CPA Australia.

Garreth Hanley:
Welcome to INTHEBLACK. I'm Gareth Hanley. In today's show, we're talking with Miranda about the world of artificial intelligence and its implications for cybersecurity.

Miranda is an AI vulnerability researcher and a trainer with Maleva, and she's the offensive security team manager at Malware Security. At Malware Security, conducts penetration testing for various sectors, including government and private industry. Miranda has also worked on the CHIPS team within the ASD's ACSC.

Welcome to INTHEBLACK Miranda.

Miranda:
Thank you. Thank you so much for having me.

Garreth Hanley:
Look, we’ve got some questions lined up, but before we start, I've just rattled off a few acronyms there. The CHIPS team and the ASD and the ACSC. Can you maybe explain for our listeners who have no idea what I'm talking about, what those are?

Miranda:
Yeah, absolutely. So ASD stands for the Australian Signals Directorate, and they're an organisation who work with foreign signals intelligence, cybersecurity and offensive cyber operations.

So within the ASD, there's the ACSC, which is the Australian Cyber Security Centre. And specifically within the ACSC, there's the CHIPS team, which is the Cyber Hygiene Improvement Programs team. And this team is fairly public so what they do is allowed to be known. They're in charge of performing enumeration and scanning of government and critical infrastructure's attack surface. And then they provide quarterly reports to these agencies on where their cybersecurity posture is lacking.

They have a really, really important role actually in making sure that government's attack surface is reduced as much as possible from the internet facing aspects.

Garreth Hanley:
So no potato chips?

Miranda:
No, no. Although they love the CHIPS acronym and they have a section called HOT CHIPS, which is High Priority Operational Tasking. And this is a section where whenever a critical vulnerability is notified, they do immediate scanning of government and critical infrastructure to then notify people who are exposed to the CVEs, the critical vulnerabilities.

Garreth Hanley:
Going back to the hot topic, you're involved in what's known as adversarial machine learning. Does that mean that you hack AI systems? And how does that compare with traditional cybersecurity like firewalls and penetration testing?

Miranda:
Being an AI hacker is a really cool way to put it. I would say I'm more of a vulnerability researcher, though I love performing AI hacks and learning about them too. So let's talk about adversarial machine learning quickly, and then I'll compare AI systems to IT systems so we of get a gist of what the difference in my work is there.

So adversarial machine learning and I'm just going to call it AML from now on because the whole acronym is hard to say on and on. So, it’s the study of attacks on machine learning algorithms designed to disrupt models, preventing them from doing what they're meant to, or deceive models into performing tasks they're not meant to, or making models disclose information that they aren't meant to.

So at Maleva we call these the three D's: disrupt, disclose and deceive. And we've made them as a sort of AI equivalent to the CIA triad, which might be familiar to listeners. It's a framework that is used to evaluate the impacts of vulnerabilities through confidentiality, integrity or availability, that's the CIA.

Garreth Hanley:
And that's of people's data on computer systems.

Miranda:
Yeah. So that's used to measure the impact of vulnerabilities on information security systems. So, the triple D, disclose, disrupt and deceive, are a way to measure the impact of AI or adversarial machine learning attacks and vulnerabilities.

And in terms of how AI and AI systems are different to IT systems, a few things that make it different and which make it necessary to differentiate AI security from the field of cyber security. And why risk mitigation is really different for both of them as well.

So, for example, IT systems, they're deterministic and rule-based; they follow really strict, predefined and explicit logic or code. And if an error occurs in one of these types of systems, it can typically be traced back to a specific line of code.

And for vulnerability management, that means that you can often directly find where cybersecurity vulnerability occurred and you can fix it with a one-to-one direct patch, at the source of the problem. And that might be just through updating the code, configuring settings or applying some other sort of fix.

But AI is quite different from that. The AI systems are inherently probabilistic. And this comes down to the underlying architecture that is built off of mathematical and statistical models. And that's a whole talk for another time.

But because of that nature, there's rarely a one-to-one direct cause because AI systems don't follow rigid rules or hard coded instructions. They generate outputs based on these statistical likelihoods. And, that uncertainty is what makes AI so powerful and so good at what it does. Because of this uncertainty, it can adapt and it can infer and it can make generalisations and really work with diverse data.

But it's also it makes it really vulnerable because the uncertainty also leads it being prone to errors and also being prone to being biased, unpredictable and manipulatableable.

So, yeah, AI vulnerability management is really, really difficult because unlike cyber security and traditional software where you can patch it, with AI, you can try and optimise the architecture as much as possible. You can try and fine-tune models, which means align them and train them closer to the purpose in which you want them to perform. And you can add in all these layers of internal and external defenses.

But because of this likelihood in its output, there's always a level left over where you just have to accept that the model might be erroneous and produce mistakes, that's one aspect, and that was a lot.

The second, which is I guess more simple to understand, is that IT systems don't take undefined inputs. They're really structured, they're programmed to accept one kind of input and output one type of input. It might only intake database queries or it might only intake language when you're putting in your name in an input field on a website, right? And if you get it wrong, it will send you an error and errors around this are usually due to people not having the right protections in the back-end code of what's happening to that input. But AI—and people who have used ChatGPT, for example will know that you can give it almost anything.

You can give it files. You can give it code. You can give it, mathematical questions. You can give it language-based questions. And other systems also take in things like sensory data from IoT devices and things. And that just means that it's so hard to secure that input because now all of a sudden you have this multimodal input and this huge attack surface. It's really difficult to secure.

Garreth Hanley:
And what were those three D's again?

Miranda:
Disrupt, deceive and disclose. Disrupt is denial of service, preventing it from doing what it's meant to. Deceive is about tricking the model into doing something that it's not usually allowed to do. For example, you might have seen a lot of things called jailbreaking or prompt injecting or prompt engineering related to ChatGPT.

Garreth Hanley:
Is that where you might get to talk about a topic that it's not supposed to talk about?

Miranda:
Yeah, 100%. So that's deception, you’re deceiving the model into doing that. And disclosure. So that would be about getting the model to release sensitive information, for example, about other users.

Garreth Hanley:
Is that because if I'm using an AI system, what I'm typing into the system is held somewhere in memory? And so somebody else might be able to extract that from the memory?

Miranda:
So it could either be disclosing sensitive user data if there's some sort of problem where the AI can access data about multiple users. And then someone might be able to pull your data across into their session. Or it could be disclosure of proprietary information from whoever has deployed the AI.

Garreth Hanley:
What the model has been trained on.

Miranda:
Yeah, absolutely. Or things like the system prompt as well. So these are, this is a set of instructions that is, I guess, a very fundamental piece of how the AI knows how to perform its task. And if you disclose that, that again is a bit of a PI loss for the company.

Garreth Hanley:
So are all AI systems the same? There's a few popular ones that are out there that people will know of. Are they all the same?

Miranda:
So all AI systems aren't the same in terms of their purpose or capability or even in their architecture, but the processes that underpin them are the same. So by this, I mean where they're different could be in that the models can undergo a variety of training types, such as supervised, unsupervised or reinforcement learning. Not worth getting into those unless you're actually wanting to design an AI system.

But those learning techniques can lead to vastly different performance outcomes. So people choose one that is most optimal for their scenario. Then models can also be fine-tuned, which means, like I talked about earlier, aligning them to make them particularly adept and good at doing one specific thing.

Or they could have entirely different infrastructures. So, you know, one that you will know of and probably use day to day is a large language model or LLM, for example, like ChatGPT or DeepSeek or Claude, Bard, some of the other ones, and these reconstruct text from human language or other inputs.

And another type could be, for example, a convolutional neural network or CNN. And this provides computers with vision-like abilities. So it's referred to as computer vision, and it allows them to be able to see differences in images as a human would. So you would find these types in facial recognition systems.

But even though there are all those differences, what is the same is the underlying process, which adversarial machine learning exploits or AML exploits use to target.

So machine learning models, whether they're an LLM or CNN or something else, they follow the same lifecycle of starting with data gathering, data pre-processing, model training, and then finally deployment of the model and inference, which is where it makes its outputs.

And all of these systems can most definitely be exploited to access sensitive data, throughout any of the stages in that life cycle.

Garreth Hanley:
What type of things have you encountered, if you ‘ve got real world examples, without identifying anyone of course?

Miranda:
In my own experience, there aren't many I can share. Because of disclosure processes that are ongoing, etc. But one that I can is a prompt injection. And this is a pretty accessible attack it's also relatively easy to perform so there's lots of news about this.

So it involves targeting that deployment and inference stage that I talked about, where the model is making its decisions. And prompt injection involves crafting a malicious prompt or a malicious input that then elicits a dangerous response from the model, bypassing their security guardrails.

So through these types of attacks, people can confuse the model into sharing data that shouldn't be included, either because it's malicious or because it is sensitive information. So it's either that deceive or disclose or a mixture of both.

The one that I can share is I performed a prompt injection on a website to find some proprietary technologies that an organisation had in use, which would have been important PI for them. So they had a chatbot on their website, which had too much access to information about its own programming. And after a few hours of me trying various prompt injection techniques, I could find out A, its system's instructions or the system prompts, which I mentioned before are important in important PI for the company as it is, it is like the basis for their chatbot, how it acts and how it performs, as well as being able to find information about the model's architecture, which is, yeah, it was pretty huge. So unfortunately, it's very easy to achieve with most language models.

Garreth Hanley:
Chatbots are probably an open facing tool that a lot of businesses might think are useful for AI.

Miranda:
Yeah exactly. And they're often, yeah, we'll talk about this later in the pitfalls, but everyone wants one and no one really thinks of the consequences. But there are some really fun examples that I've come across in my research of very, very interesting attacks if you want to hear about them.

Garreth Hanley:
I'm sure our listeners would love to hear that too.

Miranda:
Yeah, awesome. So this is a personal favourite of mine. And it's about, the model deception stage. So, between 2020 and 2021, this guy called Eric Jaklitsch, I'm not sure how to say his name, he successfully bypassed an AI-driven identity verification system and it allowed him to file fraudulent unemployment claims in the State of California.

So basically, this AI-powered facial recognition and document verification system, it was used to validate identities in government benefit applications. So what it did was, it matched the image of someone's face in a selfie that they took with their face on their driver's license.

But it missed like a really crucial step where it didn't correspond with any other sort of database at all. So all it was doing was matching that the driver's license matched the selfie that was sent in, but not any government records of what that person actually looked like.

So this bloke, he went and he took a bunch of stolen identities, like stolen names, databases, sorry, dates of birth and social security numbers. And he went and forged driver's licenses with all these people, but then replaced the real individual's photos with his own, wearing a wig or some other sort of disguise.

And then he went and created accounts on this system and then uploaded the ID photo with the photo of himself wearing a wig. And then when he needed to do the confirmation of identity, he put the wig on again and he took a selfie.

And the AI powered system incorrectly was like, yeah, that's the guy. That's, or the girl, I don't know, that's Sarah. Because it didn't check any other sort of database. And with that identity verification complete, he filed fraudulent unemployment claims, directed the payments to his account, and he just went to an ATM and took them out.

Garreth Hanley:
And I'm sure he got himself in a lot of trouble doing this.

Miranda:
Just a little bit.

Garreth Hanley:
So, that's an error in the testing phase?

Miranda:
Yeah. So, I definitely think there are a few takeaways from that. A, that in any sort of critical decision-making system or any system that has financial repercussions, et cetera, humans should be involved in the process of validating the AI outputs on mass.

I think AI is a really good use case there, but, A, you need to make sure that it's actually checking against some other value that isn't based on a user's input because that's where all problems occur in every system, AI or IT is user input, right?

And having some sort of human verifying that process where they're just like tabbing through all of the decisions that the AI made on mass or picking a subset it's important. And, yeah, of course, that system could have benefited from testing as well just because knowing my own team of pen testers, like, that's one of the first scenarios we would have tested. It would have been so fun.

Garreth Hanley:
Are there any other examples that might have some really good takeaways?

Miranda:
So there was this one called the Morris 2 worm. So the Morris worm was the first internet worm that spread without user interaction, right? So last year, researchers developed Morris 2, which is a zero-click worm, meaning it's a type of malware that spreads automatically without requiring user interaction. But this worm targeted generative AI.

So it used this technique called adversarial self-replicating prompt injection. So that prompt injection that I talked about before, but it was self-perpetuating. And what they did was they demonstrated a proof-of-concept of this by attacking an AI-powered email assistant.

It can send auto replies to people, it can interpret emails that are coming in, it can summarise to you what's happening, all of that, but it has access to your emails, which, always has security issues.

So what they did was they had attacker send an email to users with using an AI, who used an AI-powered assistant. And the incoming email would automatically be processed and stored by that AI assistant in its memory. And then the AI assistant would use it in the reference with all the other emails, like within the context of all the other emails to build its responses.

But this adversarial email that they sent, first it included malicious instructions for data leakage, so that the AI assistant would respond to the original email leaking sensitive information from the target systems emails.

And then it would also include this self-replication aspect, right? Where it's telling that AI assistant to reinsert this malicious prompt in future emails to all of the users. So then in the case where any other person you're emailing uses an AI email assistant, they would receive that malicious prompt. It would again be stored in their AI assistant's system, cause leakage in their replies, like data stealing and their replies. And then it would self-replicate in their new emails. And then it would just spread from there. It's a pretty scary prospect.

This Morris worm demonstration was really good to see, like again, how susceptible language models are to surprise, surprise language with all that prompt injection. B, how AI, how different AI systems can chain together to perpetuate attacks. So, that attack just got carried on by AI to AI to AI, it involved no human interaction, they did it themselves.

And lastly, how AI systems that store context and memory, they introduce really bad persistent risks because attackers can manipulate the memory to achieve long-term effects. Because now that malicious email prompt is stored in that person's email, that email assistant's memory, it will continue to be inserted into their emails until they realise it's there.

Garreth Hanley:
So what you're talking about is these email assistants that might help you rewrite your emails and reply to people, or it might also be a system where a business has an automatic reply system that's using AI to reply to incoming emails and inbox. Is that right?

Miranda:
Yeah, absolutely. it could target a system that has any sort of AI-powered automation. Yeah, it's a dangerous thought.

Garreth Hanley:
So you mentioned that AI is being used for phishing or phishing, I think some people say as well. Can you maybe explain what it's being used for in that context? Because I think maybe that's something that might be ending up in a lot of inboxes.

Miranda:
Yeah, absolutely. So to start with phishing, how AI is being leveraged there. Traditionally, phishing emails were sort of obvious if you knew what to look for. You know, they were really emotive in their language, trying to get you to click on a link or download something and respond to the email immediately, interact with it in some way. And also, there were really often, I guess, language barrier differences. So spelling errors or bad grammar. But with LLMs, all of that is reduced. People can get these emails automatically written in perfect English so they don't look too strange.

And they're also automating what we call staged phishing campaigns. So instead of sending one email to you with a link and being like, please click now, they send you a perfectly formatted email with no dangerous link or attachment. They're just trying to seek your engagement. And then once you start talking to them as if it's a normal conversation with a human being, they automate replies from an LLM to build rapport with you. And then finally, down the line, maybe in your fifth correspondence, they'll send the actual phishing attack. Right?

And by this point, you think you're talking with a legitimate client or customer or someone from another organisation. And it's all been powered by an LLM in the background, it's much harder to detect than what people are used to looking for in phishing emails. And then there's also voice phishing, which is called vishing for short. And people, you know, extract some level of people's voices online. And, you know, maybe your voice online from this podcast.

And then, they create a model that can mimic your voice and get it to say whatever they want it to say. And it might be a CEO of an organisation, for example. And they then call an employee playing back to them the CEO's voice saying, hey, I need you to make a transfer of this amount to this bank account. And the employee is like, yeah, that's Ben my CEO, no worries.

Garreth Hanley:
And oftentimes it would be people in the financial position that are being targeted

Miranda:
Absolutely.

Garreth Hanley:
So what about in the hiring process? If I'm hiring somebody, is there a chance that I could be duped by some of these AI deep fakes? Is there any examples where people have been duped by this?

Miranda:
Yeah, 100%. So an example last year was that actually a security company hired a North Korean because as they were interviewing, they used an AI powered face changer. And they also used a deep-fake generator for all of their other photos on their resume and things like that.

So when they were going through the interview process in all stages of application, they seemed like an American citizen and they were successful in getting the role because no one ever knew their real identity.

Garreth Hanley:
Going back to those pitfalls that you mentioned, what are the pitfalls in AI security that you're seeing at the moment and what should people who are listening to this podcast think about if they want to mitigate the risk of tools that they might be planning on using?

Miranda:
Yeah, I guess the most common one that has been around since AI became a buzzword was, around the AI hype and the business use case often outruns the security considerations because everyone wants to capitalise on this AI hype, right?

So they rush to pushing some sort of AI system for their customers to production, but they don't consider the security implications of that, or they don't have the right security engineers on the team. They just have ML and AI engineers who are wonderful at what they do, but they might not necessarily be specialised in AI or ML security, or cybersecurity in which there are a lot of effects on AI systems as well.

So I think anyone looking to implement that, either internal use in their organisation or a chatbot on their website, et cetera, you need to do the due diligence that you would with any other system, you need to get it tested and assessed, you need to do risk profiling.

You need to make sure that the design is secure from the outset, the coding of it, and that you're practicing DevSecOps development security operations in your processes. That's the core irk that we have, when we just see people being like, oh, they pushed some AI model.

The second is relying blindly on the output of AI models. So not validating their outcomes in the outputs and decision-making contexts because, you know, like we talked about earlier, they're statistical models and they're prone to errors and bias.

And what this could look like and is very commonly happening is in a coding scenario, lots of junior developers are asking ChatGPT to write code for them. And they're just copy-pasting the output into applications without performing due diligence or understanding what the code is saying.

And in the case of say, the training data of that model being poisoned up in the supply chain, if the poisoning included malicious malware to be put in the code generation output, then that developer might have just copy-pasted malware into their organisation's application and let it execute on a sensitive system.

That's an attack that occurs in that data gathering and pre-processing and training phase.

Garreth Hanley:
That that's already hard baked into the type of AI you’ve decided to use?

Miranda:
Absolutely, yeah. It's also really hard to identify, right? Because if you think about how much training data goes into these models, you only really need to affect a small amount of that data to have huge implications.

So what we're expecting in general is that there has been a lot of training data poisoning in models that are online today. But we just haven't seen the effects of them yet, these attacks might be waiting dormantly, it's a bit of a concern.

But, it could be, like I said, in that coding scenario, they just copy-paste some bad code and they don't look at it. And now the whole system is compromised. Or it could be things like relying blindly on the output of that identity verification system in that California case we talked about.

Where you don't check it, you just assume that the AI is making the right decision and you go from there. People just think that AI systems are, perfect and they do what humans can't do and don't make mistakes. So they take what they say for granted. As well, people often use AI systems to help them with things that they're not sure about themselves, right? They use it for searching.

So if you're not sure, you often can't validate that what it's saying to you is correct. You'll just take it for granted, it's very dangerous to do that.

I guess the last one I'd warn about as well is sharing sensitive information on publicly hosted models. So, if you know your organisation's running an internal-only software, then there's a bit of a lessened risk there, a less risk, because that information isn't going to the cloud and it's not potentially publicly accessible in some sort of data breach.

But in terms of just using ChatGPT online, et cetera, like you should definitely be watching what you put in there in terms of sensitive information.

Garreth Hanley:
So would you say that the first two things would be policies for businesses and then maybe professional advice if any businesses are thinking about implementing this type of technology?

Miranda:
Yeah. It never hurts to get some AI subject matter expert advice on the case. And organisational policies for the use of AI are really important, but they're only as effective as, you know, how well people understand them. So getting training for your organisation and your employees on understanding AI risks is very important.

Garreth Hanley:
As these technologies evolve, what emerging trends in AI security should businesses and professionals keep on their radar?

Miranda:
Yeah. So in terms of vulnerabilities in AI systems, which is what we've been talking about today, I guess staying ahead of the trends and understanding, you know, you don't need to get technically deep into things. You just keep up to date with the news in terms of what's happening in AI systems. Where they're at risk, particularly laws and regulations that are coming out around AI use and deployment, because that will have intense, you know, policy and governance implications for organisations.

And without doing a sales pitch, part of my work at Maleva is producing a fortnightly newsletter, which goes out to whoever wants to subscribe. There's a TLDR that's easy to understand and a more technical explanation for those who are really interested in the most recent AI security news, vulnerabilities and research, with implications. So that's a good thing.

We also do a monthly industry briefing where security professionals or their executives of organisations they can come in for like an hour and we'll just talk about the takeaways of the month. So, yeah, I think staying up to date is your best tool there.

Garreth Hanley:
You also mentioned regulatory risk there.

Miranda:
Yeah, so currently things like the GDPR in Europe, that's what's being used to govern the use of AI and coming into use this year as well as the EU AI Act. And they're going to start fining people for misuse and deployment that doesn't have the privacy and information security aspects that they're expecting of organisations.

So whilst Australia, for example, hasn't implemented something like that, they might seek to. So it's important to stay up to date on that. What they have said, though, is, you know, that no one can use DeepSeek or organisations and government, et cetera, can't use DeepSeek. So knowing what is coming into play at what times will very much help organisations move through that space. But I think, people should really be aware of how AI is being used by adversaries potentially against them as well.

So we've talked a lot about vulnerabilities in AI systems, but it's important to know and keep up to date with how AI might be used to target you. And by that, I mean, like phishing campaigns or voice phishing campaigns or deep fake against CEOs and public figures in your organisation. So staying up to date is your best tool at the moment.

Garreth Hanley:
Thanks for your time and insights today. It's been incredibly valuable. And I'm sure that our listeners know a lot more about AI now than they did at the beginning of our chat. So, thanks for joining us, Miranda. It's been great having you on the show.

Miranda:
Thank you for the opportunity. It's been great speaking with you.

Garreth Hanley:
And thank you for listening to INTHEBLACK. Don't forget to check our show notes for links and resources from CPA Australia, as well as other material from Miranda and her teams at Maleva and Malware Security. If you've enjoyed this show, please share it with your friends and colleagues and hit the subscribe button so you don't miss future episodes. Until next time, thanks for listening.

Garreth Hanley:
If you've enjoyed this episode, help others discover INTHEBLACK by leaving us a review and sharing this episode with colleagues, clients, or anyone else interested in leadership strategy and business. To find out more about our other podcasts, check out the show notes for this episode. And we hope you can join us again next time for another episode of INTHEBLACK.

About the episode

As organisations big and small integrate artificial intelligence into their operations, understanding the vulnerabilities that come with AI systems is essential.

In this episode, we'll explore the crucial intersection between AI and cybersecurity.

You’ll gain insights on AI systems, common pitfalls in AI security and specialist tips for businesses to navigate this dynamic landscape.

This episode covers areas such as:

Adversarial machine learning (AML)
The fundamental difference between AI and IT security
AI model vulnerabilities
Expanded attack surface via unstructured inputs
Key pitfalls in AI adoption
Risk mitigation

Tune in now for specialist advice from a leading expert in the field.

Host: Garreth Hanley, podcast producer, CPA Australia.
Guest: Miranda R, an offensive security team manager at Malware Security, and an AI vulnerability researcher and a trainer with Mileva, where she conducts penetration testing for various sectors, including government and private industry.

Want to learn more? Head online to Malsec and Mileva.

And you can read an insightful post by Miranda R on her LinkedIn, as well as a news story about an ID system failure in the US involving a fraudster and how a North Korean hacker duped a cybersecurity firm.

Would you like to listen to more INTHEBLACK episodes? Head to CPA Australia’s YouTube channel.

And you can find a CPA at our custom portal on the CPA Australia website.

CPA Australia publishes four podcasts, providing commentary and thought leadership across business, finance, and accounting:

Search for them in your podcast platform.

You can email the podcast team at [email protected]