● 53 entries
AI & ML Security
- Adaptive Attack: An attack on a machine-learning system that is specifically designed to evade or break a known defence, instead of using a generic, defence-agnostic technique.
- Adversarial Example: An input deliberately perturbed, often imperceptibly to humans, so that a machine-learning model produces a wrong or attacker-chosen prediction.
- Agentic AI Security: The discipline of securing autonomous LLM agents that plan, call tools, and act on real-world systems, where prompt injection turns into remote code execution and excessive agency into actual blast radius.
- AI Alignment: The research and engineering effort to ensure AI systems pursue goals, follow instructions, and behave in ways that match the intentions of their developers and users.
- AI Bill of Materials (AIBOM): A machine-readable inventory of every component that goes into an AI system (datasets, base models, fine-tuning data, libraries, prompts, and evaluation artifacts), used for security, compliance, and accountability.
- AI Content Detection: Tools and techniques that estimate whether a piece of text, image, audio, or video was produced by an AI model rather than a human.
- AI Governance: The policies, processes, roles, and controls organisations and regulators use to ensure AI systems are developed, deployed, and operated responsibly and lawfully.
- AI Hallucination: A failure mode in which a generative AI system outputs content that is fluent and confident but factually wrong, fabricated, or unsupported by its sources.
- AI Incident Response: The set of processes, roles, and playbooks an organisation uses to detect, contain, investigate, communicate, and recover from incidents involving AI systems.
- AI Jailbreak: A technique that causes an aligned AI model to bypass its safety policies and produce content or behaviour the operator intended to forbid.
- AI Model Card: A standardised document, introduced by Margaret Mitchell and colleagues in 2018, that describes a machine-learning model's intended use, training data, performance, limitations, and ethical considerations.
- AI Red Team: A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.
- AI Safety: The discipline that aims to prevent AI systems from causing unintended harm to users, operators, and society, covering technical, operational, and societal dimensions.
- AI Supply Chain Risk: The set of threats arising from the third-party datasets, base models, libraries, plug-ins, and infrastructure that organisations combine to build and deploy AI systems.
- AI Watermarking: Techniques that embed a detectable signal into AI-generated content so its provenance, model of origin, or training-set membership can be verified later.
- AI-Generated Disinformation: False or misleading content produced or amplified by generative AI to deceive audiences, manipulate opinion, or influence elections, markets, or conflicts.
- AI-Generated Malware: Malicious code written, mutated, or assisted by large language models, lowering the skill bar for attackers and accelerating variant production.
- Backdoor Attack (ML): A training-time attack that implants a hidden behaviour in a model so it acts normally on clean inputs but produces an attacker-chosen output whenever a secret trigger appears.
- C2PA: Coalition for Content Provenance and Authenticity: the coalition behind, and the common shorthand for, an open standard for cryptographically signed metadata that records how digital media was created and edited.
- Data Poisoning: An attack on a machine-learning system in which adversaries inject, alter, or relabel training data so the resulting model behaves incorrectly or contains hidden backdoors.
- Deepfake: Synthetic audio, image, or video media generated by AI to convincingly depict a real person saying or doing something they did not.
- Embedding Attacks: A class of attacks against AI embedding vectors that recover, alter, or abuse the original input or its semantics, including embedding inversion and similarity-based poisoning.
- Evasion Attack (ML): An inference-time attack in which an adversary crafts inputs that bypass a deployed machine-learning model's intended decision, such as evading a malware classifier or content filter.
- Excessive Agency: OWASP LLM06 (2025 list; LLM08 in the 2023 list). Granting an LLM-driven system more functionality, permissions, or autonomy than it actually needs, so that a successful prompt injection or model error translates into outsized real-world impact.
- Indirect Prompt Injection: A prompt-injection variant where malicious instructions are hidden inside third-party content (web pages, documents, emails) that an LLM later ingests through retrieval, browsing, or tool use.
- Insecure Output Handling: OWASP LLM02 (2023 list; renamed Improper Output Handling, LLM05, in the 2025 list). Passing LLM-generated output directly into downstream systems (browsers, shells, SQL, code execution) without validation, turning a hallucination or prompt injection into XSS, RCE, or SSRF.
- LLM Firewall: A security control that sits between users and a large language model to inspect prompts, retrieved context, and outputs in real time, blocking or rewriting traffic that violates policy.
- LLM Guardrails: Mechanisms that constrain what an LLM-based application can input or output, enforcing safety, security, and business rules around the underlying model.
- LLM System Prompt Leak: An attack that extracts the hidden system prompt or instructions of a deployed large language model application, exposing logic, secrets, and tools.
- LLMjacking: An attack in which adversaries use stolen cloud credentials to access and abuse hosted large language model services, running up large inference bills for the victim or reselling the access.
- MCP Attacks: Attacks that exploit the Model Context Protocol (MCP) to inject prompts, abuse tools, or pivot through servers an AI assistant trusts.
- Membership Inference Attack: A privacy attack that determines whether a specific data record was part of a machine-learning model's training set by analysing the model's behaviour on that record.
- MLSecOps: The discipline of integrating security and risk controls across the entire machine-learning lifecycle, from data sourcing through training, deployment, monitoring, and retirement.
- Model Context Protocol (MCP): An open protocol introduced by Anthropic in late 2024 that standardizes how LLM clients connect to external tools, data sources, and prompts via servers, making MCP servers a primary security boundary for agentic AI.
- Model Denial of Service: OWASP LLM04 (2023 list; broadened to Unbounded Consumption, LLM10, in the 2025 list). Driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill.
- Model Extraction: An attack that reconstructs a confidential machine-learning model's parameters, behaviour, or training data by systematically querying its public API.
- Model Inversion: A privacy attack that reconstructs sensitive features of a model's training data, such as faces or text, by exploiting the model's outputs or gradients.
- Nightshade Attack: A data-poisoning technique developed by the University of Chicago's Glaze team that adds imperceptible perturbations to images so that text-to-image models trained on them learn deeply distorted concepts.
- OWASP LLM Top 10: An OWASP-maintained list of the ten most critical security risks affecting applications that build on large language models.
- Prompt Injection: An attack that overrides an LLM's original instructions by smuggling adversarial text into the prompt, causing the model to ignore safeguards or execute attacker-chosen actions.
- RAG: Retrieval-Augmented Generation, an LLM pattern that fetches relevant documents from a knowledge store at query time and injects them into the prompt to ground responses.
- RAG Security: The discipline of securing retrieval-augmented generation pipelines so that the documents, vector stores, and retrieval steps that feed an LLM cannot be poisoned, abused, or used to exfiltrate data.
- Shadow AI: The use of AI tools, models, or services by employees without the knowledge or approval of an organisation's security, privacy, or governance functions.
- Slopsquatting: A 2024-coined supply-chain attack where adversaries register package names that LLM code assistants frequently hallucinate, so developers who copy-paste the suggested install command end up pulling malicious code.
- Synthetic Media: Any audio, image, video, or text content produced or substantially modified by generative AI rather than captured directly from the physical world.
- System Prompt Extraction: Attacks that coax a deployed LLM into revealing its hidden system prompt, exposing internal instructions, tool definitions, persona constraints, and any confidential data the operator embedded there.
- Token Smuggling: A class of jailbreak technique that hides harmful instructions for an LLM inside encodings, languages, or token sequences the safety filter does not recognise as dangerous.
- Tool-Use Injection: Attacks that manipulate an LLM agent's tool-calling layer by forging tool arguments, smuggling instructions through tool outputs, or coaxing the model into calling unsanctioned tools.
- Training Data Extraction: Attacks that recover verbatim training examples from a deployed model by exploiting memorization, exposing copyrighted text, PII, or proprietary content the model was trained on.
- Transferable Adversarial Attack: An attack in which adversarial examples crafted against one machine-learning model also fool other, unseen models, enabling black-box attacks without access to the target.
- Vector Database Security: The set of controls that protect vector databases used by AI systems from data leakage, poisoning, tenant cross-talk, and supply-chain or operational compromise.
- Video Deepfake Attack: An attack that uses AI-generated synthetic video of a real person, often in a live meeting, to authorise fraudulent transactions or spread disinformation.
- Voice Cloning Attack: An attack that uses AI-generated speech mimicking a real person to bypass voice authentication or trick victims into authorising payments or actions.
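
To make the Adversarial Example and Evasion Attack entries concrete, here is a minimal sketch of a fast-gradient-sign style perturbation against a toy linear classifier. The weights, bias, input, and step size are made-up illustrative values, and the helper names are hypothetical; real attacks target trained models and usually compute gradients with an ML framework.

```python
def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear decision score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x) -> int:
    """Class 1 if the score is positive, otherwise class 0."""
    return 1 if score(w, b, x) > 0 else 0

def fgsm_linear(w, x, y, eps):
    """Shift every feature by eps in the direction that increases the
    logistic loss for true label y. For a linear model the input
    gradient of that loss is (p - y) * w, so its sign is sign(w) when
    y == 0 and -sign(w) when y == 1."""
    direction = 1 if y == 0 else -1
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]

# Illustrative values only (assumptions, not taken from any real model).
w, b = [2.0, -1.0, 0.5], -0.2
x = [0.3, 0.1, 0.2]
x_adv = fgsm_linear(w, x, y=1, eps=0.2)
```

With these values the clean input is classified as 1 and the perturbed input as 0, although no feature moved by more than 0.2.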
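
The Insecure Output Handling entry can be illustrated with a short sketch that treats model output as untrusted input. It assumes a web page that renders the reply and a SQLite lookup keyed by a model-supplied name; `render_reply`, `find_user`, and the `users` table are hypothetical names chosen for the example.

```python
import html
import sqlite3

def render_reply(llm_output: str) -> str:
    """Escape model output before embedding it in an HTML page,
    so injected markup is displayed as text instead of executed."""
    return "<p>" + html.escape(llm_output) + "</p>"

def find_user(conn: sqlite3.Connection, llm_supplied_name: str):
    """Look up a user with a parameterised query, so the model's text
    is bound as data and never spliced into the SQL string."""
    cur = conn.execute(
        "SELECT id FROM users WHERE name = ?", (llm_supplied_name,)
    )
    return cur.fetchall()

# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
```

Escaping and parameter binding cover only the browser and SQL sinks named in the entry; shell and code-execution sinks need their own validation.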
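
For the Excessive Agency and Tool-Use Injection entries, one common mitigation is to check every model-requested tool call against an explicit allowlist before running it. The sketch below assumes tool calls arrive as a name plus a dictionary of arguments; the tool names and `authorize_tool_call` are hypothetical and not part of any particular agent framework.

```python
# Hypothetical allowlist: tool name -> argument names the agent may supply.
ALLOWED_TOOLS = {
    "search_docs": {"query"},
    "get_weather": {"city"},
}

def authorize_tool_call(name: str, args: dict) -> bool:
    """Raise PermissionError unless the tool is allowlisted and the
    call uses only arguments declared for that tool."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not permitted: {name}")
    unexpected = set(args) - ALLOWED_TOOLS[name]
    if unexpected:
        raise PermissionError(f"unexpected arguments: {sorted(unexpected)}")
    return True
```

A check like this limits what a successful injection can reach, but it does not validate argument values, which would need separate per-tool checks.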