AI systems have become widely used by companies worldwide, ranging from chatbots and their underlying Large Language Models (LLMs) to self-driving vehicles. Adversaries deliberately target the functionality of these AI systems, for example by “poisoning” them, and probe their defenses. In their recent work, Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (NIST.AI.100-2), scientists from the National Institute of Standards and Technology (NIST) catalog the types of attacks that companies can expect to encounter and how to mitigate them.
NIST set out to provide an “overview of attack techniques and methodologies” that encompasses all types of AI systems. Based on the attacker’s goals and capabilities, the report identifies the four most common types of attacks: evasion, poisoning, privacy attacks (which affect both Predictive and Generative AI models), and abuse attacks (which affect only Generative AI models).
Evasion Attacks
So what are evasion attacks? In the context of artificial intelligence, they are a class of attacks that aim to deceive ML models by manipulating test-time inputs in subtle ways, often imperceptible to humans, that nonetheless lead the model to produce incorrect outputs. These attacks are typically designed to exploit vulnerabilities in the model’s decision-making process, allowing an attacker to craft inputs that cause it to behave unexpectedly. NIST classifies evasion attacks into two categories: white-box and black-box attacks.
In white-box attacks, the attacker has in-depth knowledge of the internal workings, parameters, and architecture of the targeted AI model. Black-box attacks, on the other hand, assume a more realistic adversary who has no first-hand knowledge of the ML model’s architecture or the training data used.
Evasion attacks rely on small perturbations, ranging from subtle details added to images to physically realizable modifications. An attacker might, for instance, add glasses or other facial details to alter a person’s apparent appearance and fool a recognition system, or alter road signs to perturb or evade object detectors.
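To make the white-box case concrete, here is a minimal sketch of the well-known Fast Gradient Sign Method (FGSM), a gradient-based evasion attack, written in PyTorch. The toy model, input, and epsilon value are assumptions made for illustration and do not come from the NIST report.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 0.03) -> torch.Tensor:
    """Craft a white-box evasion example with the Fast Gradient Sign Method.

    The attacker needs the gradient of the loss w.r.t. the input, i.e. full
    (white-box) access to the model.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that *increases* the loss, bounded by epsilon.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Illustrative usage with a toy classifier (an assumption, not from the report):
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)          # a fake "image"
y = torch.tensor([3])                 # its true label
x_adv = fgsm_perturb(model, x, y)
print((x_adv - x).abs().max())        # perturbation stays within epsilon
```

A black-box attacker, lacking access to these gradients, would instead have to estimate them indirectly, for example by repeatedly querying the model and observing its outputs.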
Of the many defenses against evasion attacks, NIST narrows them down to three main classes based on resilience and mitigation efficiency: adversarial training, randomized smoothing, and formal verification. These mitigations come with trade-offs, however, both between robustness and accuracy and in the increased computational cost of training ML models to resist evasion attacks.
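The first of these defenses, adversarial training, can be sketched by reusing the `fgsm_perturb` helper from the example above to augment each training batch with adversarial examples; the optimizer, batch, and epsilon are placeholder assumptions.

```python
import torch.nn as nn

# fgsm_perturb is the helper defined in the evasion-attack sketch above.

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One adversarial-training step: train on both clean and perturbed inputs."""
    model.train()
    x_adv = fgsm_perturb(model, x, y, epsilon)   # craft adversarial versions of the batch
    optimizer.zero_grad()                        # discard gradients from the crafting pass
    loss = (nn.functional.cross_entropy(model(x), y)
            + nn.functional.cross_entropy(model(x_adv), y)) / 2
    loss.backward()
    optimizer.step()
    return loss.item()
```

The extra forward and backward passes needed to craft the adversarial batch are one source of the added computational cost mentioned above.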
Poisoning Attacks
In poisoning attacks, adversaries attempt, and sometimes succeed, to manipulate the training data of machine learning models to compromise their performance. They inject malicious data during the model’s initial training phase, which leads to skewed predictions in real-world applications.
Poisoning attacks can cause serious harm to ML models in the form of either an availability violation or an integrity violation. Availability poisoning indiscriminately degrades the model’s performance across all samples, whereas targeted and backdoor poisoning attacks are much stealthier, affecting only the small set of samples the attackers are targeting. Poisoning attacks also span an extensive range of adversarial capabilities, such as data poisoning, model poisoning, label control, source code control, and test data control, which give rise to several further subcategories of attacks.
One common type is data poisoning: by injecting misleading or biased data points during the model’s training phase, attackers can alter the model’s decision boundaries, causing it to misclassify inputs or exhibit unexpected behavior during inference. Other attackers prefer to exploit vulnerabilities in the model’s architecture, fooling it into making incorrect predictions. Each category of poisoning attack has its own mitigation methods, with varying degrees of success, and these remain an active subject of research; more details can be found in the NIST report.
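As a minimal, hedged illustration of data poisoning through label control, the sketch below flips a fraction of training labels in a toy scikit-learn dataset and compares the result with a cleanly trained model; the dataset, classifier, and 30% flip rate are assumptions chosen purely for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A toy dataset standing in for a victim's training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(y, rate, rng):
    """Simulate label-control poisoning by flipping a fraction of binary labels."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

rng = np.random.default_rng(0)
clean_acc = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)
poisoned_acc = LogisticRegression(max_iter=1000).fit(
    X_train, flip_labels(y_train, rate=0.3, rng=rng)).score(X_test, y_test)
print(f"clean accuracy: {clean_acc:.2f}, poisoned accuracy: {poisoned_acc:.2f}")
```

On a run like this, the poisoned model's test accuracy typically drops noticeably across the board, the kind of indiscriminate, availability-style degradation described above.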
These attacks can have severe consequences, especially in critical applications such as healthcare, finance and autonomous vehicles. If successful, attackers can manipulate the model’s behavior, leading to incorrect predictions and potentially causing harm.
Privacy Attacks
So what exactly are privacy attacks in AI systems? Simply put, they are malicious techniques used to exploit vulnerabilities in AI systems with the end goal of compromising users' data. These attacks include data reconstruction, membership inference, model extraction, and property inference.
- Data reconstruction attacks reverse engineer private information about an individual user record, or sensitive critical infrastructure data, from access to aggregate information.
- Membership inference attacks determine whether a particular record was included in the dataset used to train a model, thereby compromising user information (a minimal example follows this list).
- In model extraction attacks, adversaries extract information about the ML model itself, such as its architecture or parameters, which can in turn expose sensitive information about the individuals whose data it was trained on.
- Finally, in property inference attacks, attackers attempt to learn global information about the distribution of the training data by interacting with an ML model. For example, an attacker could ascertain the fraction of the training set that has a certain sensitive attribute, such as demographic information. This could reveal confidential information about the training set that was never meant to be released.
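To make membership inference concrete, here is a minimal sketch of the classic confidence-thresholding heuristic: models often assign noticeably higher confidence to records they were trained on than to unseen ones. The victim model, dataset, and 0.9 threshold are illustrative assumptions, not part of the NIST report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Victim model trained on "private" data (all values here are toy assumptions).
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_member, X_nonmember, y_member, y_nonmember = train_test_split(X, y, random_state=1)
victim = RandomForestClassifier(random_state=1).fit(X_member, y_member)

def guess_membership(model, x, threshold=0.9):
    """Guess 'member' when the model is unusually confident about a record."""
    return model.predict_proba(x.reshape(1, -1)).max() >= threshold

member_hits = np.mean([guess_membership(victim, x) for x in X_member])
nonmember_hits = np.mean([guess_membership(victim, x) for x in X_nonmember])
print(f"flagged as members: train={member_hits:.2f}, holdout={nonmember_hits:.2f}")
```

More sophisticated attacks calibrate this decision with shadow models rather than a fixed threshold, but the underlying intuition is the same.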
All these attacks pose significant threats to individuals' privacy by exploiting AI’s ability to process vast amounts of data and make decisions based on it.
To mitigate these attacks, NIST recommends using differential privacy mechanisms, limiting user queries to the model, improving the detection of queries that fall outside the training distribution, and building more robust architectures. A different approach can also be employed: machine unlearning, which lets a user request the removal of certain data and information from a trained ML model. To ensure the security of their data and privacy, companies should look to reduce or eliminate these vulnerabilities to avoid loss of information, decreased productivity, fines and penalties, damaged reputation and loss of income.
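As a hedged illustration of the first recommendation, the snippet below applies the Laplace mechanism, a standard building block of differential privacy, to a single aggregate counting query. The query, data, and epsilon value are assumptions made for the example; production systems would more likely rely on established DP frameworks than hand-rolled noise.

```python
import numpy as np

def dp_count(values, predicate, epsilon=1.0, rng=None):
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative query: how many users in a (fake) dataset are over 40?
ages = [23, 45, 67, 34, 52, 41, 29, 38]
print(dp_count(ages, lambda age: age > 40, epsilon=0.5))
```

Smaller epsilon values add more noise and therefore stronger privacy at the cost of accuracy, a trade-off similar to the robustness-versus-accuracy tension noted above for evasion defenses.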
Abuse Attacks
Abuse violations occur when attackers repurpose a GenAI system away from its intended use to fit their own agenda. For example, attackers can use the capabilities of GenAI models to promote hate speech or discrimination, generate media that incites violence against specific groups, or create and share images, text, or malicious code that can lead to a cyberattack.
This type of attack poses significant risks across various domains, most notably cybersecurity, finance, transportation, customer service and healthcare. For example, AI-driven email phishing attacks can lead to unauthorized access, data breaches, and financial losses. The personalized and targeted nature of these attacks makes them particularly dangerous, as they can bypass traditional email filters and fool even vigilant users.
To mitigate these threats, organizations can use methods such as reinforcement learning from human feedback, filtering of retrieved inputs, LLM moderators, and interpretability-based solutions. All of these specific measures should be applied within a larger context in which data scientists and engineers implement robust security practices, such as rigorous testing, adversarial training, and continuous monitoring of AI systems for signs of exploitation or manipulation.
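One of those measures, an LLM moderator, can be sketched as a thin wrapper that checks both the prompt and the generated output before anything is returned. The `generate` and `moderate` callables below are hypothetical stand-ins for whatever model and moderation service an organization actually uses; the blocklist check is a deliberately simplistic placeholder.

```python
from typing import Callable

def moderated_generate(prompt: str,
                       generate: Callable[[str], str],
                       moderate: Callable[[str], bool]) -> str:
    """Wrap a GenAI call with pre- and post-generation moderation checks.

    `moderate` is assumed to return True when the text violates policy
    (e.g. hate speech, incitement to violence, malicious code).
    """
    if moderate(prompt):
        return "Request refused: the prompt violates the usage policy."
    response = generate(prompt)
    if moderate(response):
        return "Response withheld: the generated content violates the usage policy."
    return response

# Toy usage with stub implementations (assumptions for the example):
blocklist = ("build a bomb", "phishing email")
print(moderated_generate(
    "Write a friendly greeting.",
    generate=lambda p: "Hello! How can I help you today?",
    moderate=lambda text: any(term in text.lower() for term in blocklist),
))
```

A real moderator would typically be classifier- or LLM-based rather than keyword matching, but the control flow, refusing unsafe prompts and withholding unsafe outputs, stays the same.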
Conclusion
The ever-growing landscape of artificial intelligence requires robust mitigation strategies to safeguard against evolving cybersecurity threats. As AI models become increasingly integrated into critical systems, organizations need to adopt a multi-faceted approach that encompasses both proactive and reactive measures. Implementing rigorous authentication protocols, encryption mechanisms, and continuous monitoring can increase the resilience of AI models. Additionally, fostering a cybersecurity culture that prioritizes awareness and education among stakeholders is crucial.
Industry, academia, and regulatory bodies must collaborate to establish standardized best practices and frameworks, ensuring a collective defense against emerging threats. As the cyber threat landscape continues to evolve, a dynamic and proactive stance in developing and implementing mitigation strategies will be pivotal in preserving the integrity and security of AI models.
Lumenova AI can help your organization identify and mitigate AI-related risks, and can support you in navigating the complexities of AI deployment while keeping data, systems and networks secure. You can request a demo or contact our AI experts for more details.