AI Exchange breaks AI security down to its essentials.
Years of work resulted in this picture: the essentials of AI security. It looks so simple, because it is. I've been on stages around the world, worked with the best brains, and built OpenCRE, OWASP AI Exchange, and MOSAIC standards - all to make AI security easier to understand. It took a while, but this finally IS the big picture of threats. It is part of a larger release that we have developed with SANS Institute - to be announced soon. It's also central in my training on June 24. For details and more content, see the links in the visual.

It comes down to this:

1️⃣ AI introduces new assets that face conventional threats.
AI systems are still IT systems, so they remain exposed to familiar security threats such as SQL injection and password theft. The difference is that this now also applies to AI-specific assets like training data, models, augmentation data, model input, and model output - the blue containers. Augmentation data is everything that is added to the model input, e.g. retrieved data, context, and system prompts. These assets can leak and can be manipulated. Manipulation often changes model behaviour, which is referred to as poisoning.
👉 Call to action: Add the assets to your ISMS and understand the AI-specific points of attention (see the AI Exchange). The rest is standard security.

2️⃣ AI introduces new suppliers.
Organizations may rely on suppliers for training data and ready-made models, either of which may have been manipulated. In addition, when a model is hosted externally, sensitive data may travel to the hosting provider. The provider must therefore safeguard confidentiality and integrity, and take care of logging, retention, and operational security.
👉 Call to action: Include the new suppliers in your existing supply chain management (e.g., provenance, lineage, integrity, contracts, audits).

3️⃣ The big one: input threats.
Input threats represent a new attack surface: attackers use the AI system through its normal interface, sending input and receiving output.
They can:

🔓 Mislead the model
Evasion attacks make a model misclassify input (e.g. to circumvent spam detection). Prompt injection makes a generative model follow malicious instructions, including instructions hidden in retrieved or user-provided data.

🔓 Exhaust resources
Attackers can use input to consume excessive compute, tokens, or other resources, causing service disruption or unnecessary cost.

🔓 Extract data or models
Input attacks can also exfiltrate the model, reconstruct sensitive information (model inversion/membership inference), or use prompt injection to disclose input, training, or augmentation data.

👉 Call to action: build or buy models with some resilience, use detection techniques, and above all apply zero model trust. That means investing in monitoring, incident response, and minimizing/obfuscating data, and most importantly limiting model behaviour through agent privilege control, human in the loop, and automated oversight. Blast radius control is key!

That's it. Easy does it 🙂
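
P.S. For the builders: the idea of zero model trust through agent privilege control and human in the loop can be sketched in a few lines of Python. This is only an illustration under assumptions, not a reference implementation. The tool names, the `ToolCall` type, and the `authorize` function are all hypothetical. The point is that the decision is made outside the model, so a successful prompt injection cannot talk its way past it.

```python
from dataclasses import dataclass, field

# Hypothetical tool names, chosen for illustration only.
ALLOWED_TOOLS = {"search_docs", "read_ticket", "send_email"}
# Actions with a larger blast radius require a human to sign off.
NEEDS_HUMAN_APPROVAL = {"send_email"}


@dataclass
class ToolCall:
    """An action requested by the model (hypothetical structure)."""
    name: str
    args: dict = field(default_factory=dict)


def authorize(call: ToolCall, approved_by_human: bool = False) -> str:
    """Decide on a model-requested action without trusting the model.

    Returns "deny", "needs_approval", or "allow".
    """
    # Privilege control: anything not explicitly allowed is denied.
    if call.name not in ALLOWED_TOOLS:
        return "deny"
    # Human in the loop: risky actions wait for explicit approval.
    if call.name in NEEDS_HUMAN_APPROVAL and not approved_by_human:
        return "needs_approval"
    return "allow"


if __name__ == "__main__":
    for requested in ("search_docs", "send_email", "delete_database"):
        print(requested, "->", authorize(ToolCall(requested)))
```

Deny by default keeps the blast radius small: even if the model is misled, it can only request actions on the allowlist, and the risky ones still wait for a person.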