● 53 entries

AI & ML Security

Adaptive Attack: An attack on a machine-learning system that is specifically designed to evade or break a known defence, instead of using a generic, defence-agnostic technique.
Adversarial Example: An input deliberately perturbed, often imperceptibly to humans, so that a machine-learning model produces a wrong or attacker-chosen prediction.
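
One common way to build such a perturbation is the fast gradient sign method (FGSM). The sketch below applies a single FGSM step to a toy logistic-regression model; the weights, input, and `eps` value are hypothetical and chosen only so the effect is visible, not taken from any real system.

```python
import math

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: shift each feature by eps in the direction that raises the loss."""
    p = predict_proba(w, b, x)
    # For cross-entropy loss, the gradient with respect to the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input, for illustration only.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.2, 0.1, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.25)
print("clean score:      ", predict_proba(w, b, x))
print("adversarial score:", predict_proba(w, b, x_adv))
```

With these numbers the clean input scores above 0.5 and the perturbed one below it, even though no feature moved by more than `eps`.
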
Agentic AI Security: The discipline of securing autonomous LLM agents that plan, call tools, and act on real-world systems, where prompt injection can escalate to remote code execution and excessive agency widens the real blast radius.
AI Alignment: The research and engineering effort to ensure AI systems pursue goals, follow instructions, and behave in ways that match the intentions of their developers and users.
AI Bill of Materials (AIBOM): A machine-readable inventory of every component that goes into an AI system (datasets, base models, fine-tuning data, libraries, prompts, and evaluation artifacts), used for security, compliance, and accountability.
AI Content Detection: Tools and techniques that estimate whether a piece of text, image, audio, or video was produced by an AI model rather than a human.
AI Governance: The policies, processes, roles, and controls organisations and regulators use to ensure AI systems are developed, deployed, and operated responsibly and lawfully.
AI Hallucination: A failure mode in which a generative AI system outputs content that is fluent and confident but factually wrong, fabricated, or unsupported by its sources.
AI Incident Response: The set of processes, roles, and playbooks an organisation uses to detect, contain, investigate, communicate, and recover from incidents involving AI systems.
AI Jailbreak: A technique that causes an aligned AI model to bypass its safety policies and produce content or behaviour the operator intended to forbid.
AI Model Card: A standardised document, introduced by Margaret Mitchell and colleagues in 2018, that describes a machine-learning model's intended use, training data, performance, limitations, and ethical considerations.
AI Red Team: A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.
AI Safety: The discipline that aims to prevent AI systems from causing unintended harm to users, operators, and society, covering technical, operational, and societal dimensions.
AI Supply Chain Risk: The set of threats arising from the third-party datasets, base models, libraries, plug-ins, and infrastructure that organisations combine to build and deploy AI systems.
AI Watermarking: Techniques that embed a detectable signal into AI-generated content so its provenance, model of origin, or training-set membership can be verified later.
AI-Generated Disinformation: False or misleading content produced or amplified by generative AI to deceive audiences, manipulate opinion, or influence elections, markets, or conflicts.
AI-Generated Malware: Malicious code written, mutated, or assisted by large language models, lowering the skill bar for attackers and accelerating variant production.
Backdoor Attack (ML): A training-time attack that implants a hidden behaviour in a model so it acts normally on clean inputs but produces an attacker-chosen output whenever a secret trigger appears.
C2PA: Coalition for Content Provenance and Authenticity, the coalition behind an open standard for cryptographically signed metadata that records how digital media was created and edited.
Data Poisoning: An attack on a machine-learning system in which adversaries inject, alter, or relabel training data so the resulting model behaves incorrectly or contains hidden backdoors.
Deepfake: Synthetic audio, image, or video media generated by AI to convincingly depict a real person saying or doing something they did not.
Embedding Attacks: A class of attacks against AI embedding vectors that recover, alter, or abuse the original input or its semantics, including embedding inversion and similarity-based poisoning.
Evasion Attack (ML): An inference-time attack in which an adversary crafts inputs that bypass a deployed machine-learning model's intended decision, such as evading a malware classifier or content filter.
Excessive Agency: Granting an LLM-driven system more functionality, permissions, or autonomy than it actually needs, so that a successful prompt injection or model error translates into outsized real-world impact. Listed as OWASP LLM06 in the 2025 Top 10 (LLM08 in the 2023 list).
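
A least-privilege tool allowlist is one way to limit that impact. The sketch below is a minimal illustration; the tool functions, `SUPPORT_AGENT_ALLOWLIST`, and `dispatch` are hypothetical names invented for this example, not part of any agent framework.

```python
from typing import Callable, Dict, Set

def read_ticket(ticket_id: str) -> str:
    """Hypothetical read-only tool."""
    return f"ticket {ticket_id}: printer offline"

def delete_user(user_id: str) -> str:
    """Hypothetical destructive tool the support agent should never reach."""
    return f"deleted {user_id}"

# Every tool the platform knows about.
ALL_TOOLS: Dict[str, Callable[[str], str]] = {
    "read_ticket": read_ticket,
    "delete_user": delete_user,
}

# Least privilege: this agent is granted only what its task requires.
SUPPORT_AGENT_ALLOWLIST: Set[str] = {"read_ticket"}

def dispatch(tool_name: str, argument: str, allowlist: Set[str]) -> str:
    """Run a model-requested tool call only if the agent is allowed to use it."""
    if tool_name not in allowlist:
        raise PermissionError(f"tool {tool_name!r} is not granted to this agent")
    return ALL_TOOLS[tool_name](argument)

print(dispatch("read_ticket", "42", SUPPORT_AGENT_ALLOWLIST))
try:
    dispatch("delete_user", "u1", SUPPORT_AGENT_ALLOWLIST)
except PermissionError as err:
    print("blocked:", err)
```

Enforcing the check in the dispatcher rather than in the prompt means an injected instruction cannot talk the model past it.
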
Indirect Prompt Injection: A prompt-injection variant where malicious instructions are hidden inside third-party content (web pages, documents, emails) that an LLM later ingests through retrieval, browsing, or tool use.
Insecure Output Handling: Passing LLM-generated output directly into downstream systems (browsers, shells, SQL, code execution) without validation, turning a hallucination or prompt injection into XSS, RCE, or SSRF. Listed as OWASP LLM02 in the 2023 Top 10 (renamed Improper Output Handling, LLM05, in the 2025 list).
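
The usual mitigation is to treat model output like any other untrusted input. The sketch below uses only the Python standard library (`html.escape` and `sqlite3` parameter binding); the `render_reply` and `find_user` helpers and the `users` table are hypothetical, for illustration only.

```python
import html
import sqlite3

def render_reply(llm_output: str) -> str:
    """Escape model output before placing it in an HTML page."""
    return f"<p>{html.escape(llm_output)}</p>"

def find_user(conn: sqlite3.Connection, name_from_llm: str) -> list:
    """Bind model output as a parameter instead of concatenating it into SQL."""
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name_from_llm,)).fetchall()

# Hypothetical in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

print(render_reply("<script>alert(1)</script>"))
print(find_user(conn, "alice"))
print(find_user(conn, "alice' OR '1'='1"))
```

The script tag is rendered as inert text, and the injection string matches no rows because it is compared as a literal value.
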
LLM Firewall: A security control that sits between users and a large language model to inspect prompts, retrieved context, and outputs in real time, blocking or rewriting traffic that violates policy.
LLM Guardrails: Mechanisms that constrain what an LLM-based application can input or output, enforcing safety, security, and business rules around the underlying model.
LLM System Prompt Leak: An attack that extracts the hidden system prompt or instructions of a deployed large language model application, exposing logic, secrets, and tools.
LLMjacking: An attack in which adversaries use stolen cloud credentials to access and abuse hosted large language model services, running up large inference bills for the victim or reselling the access.
MCP Attacks: Attacks that exploit the Model Context Protocol (MCP) to inject prompts, abuse tools, or pivot through servers an AI assistant trusts.
Membership Inference Attack: A privacy attack that determines whether a specific data record was part of a machine-learning model's training set by analysing the model's behaviour on that record.
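
One simple form of this attack compares the model's loss on a record against a threshold, on the assumption that models tend to fit training records more tightly than unseen ones. The sketch below is illustrative only: the confidences and the threshold are made-up values, and a real attack would calibrate the threshold, for example with shadow models.

```python
import math

def cross_entropy(prob_true_label: float) -> float:
    """Loss of a single record, given the model's probability for its true label."""
    return -math.log(max(prob_true_label, 1e-12))

def infer_membership(prob_true_label: float, threshold: float) -> bool:
    """Guess 'member' when the loss on the record falls below the threshold."""
    return cross_entropy(prob_true_label) < threshold

# Hypothetical model confidences on two records' true labels.
confidences = {"record_a": 0.99, "record_b": 0.55}
THRESHOLD = 0.1  # assumed value, chosen for the example

guesses = {name: infer_membership(p, THRESHOLD) for name, p in confidences.items()}
print(guesses)
```
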
MLSecOps: The discipline of integrating security and risk controls across the entire machine-learning lifecycle, from data sourcing through training, deployment, monitoring, and retirement.
Model Context Protocol (MCP): An open protocol introduced by Anthropic in late 2024 that standardises how LLM clients connect to external tools, data sources, and prompts via servers, making MCP servers a primary security boundary for agentic AI.
Model Denial of Service: Driving an LLM application into runaway resource consumption (long contexts, infinite loops, expensive tool fan-out) so it slows, becomes unavailable, or generates a ruinous cloud bill. Listed as OWASP LLM04 in the 2023 Top 10 (broadened to Unbounded Consumption, LLM10, in the 2025 list).
Model Extraction: An attack that reconstructs a confidential machine-learning model's parameters, behaviour, or training data by systematically querying its public API.
Model Inversion: A privacy attack that reconstructs sensitive features of a model's training data, such as faces or text, by exploiting the model's outputs or gradients.
Nightshade Attack: A data-poisoning technique developed by the University of Chicago's Glaze team that adds imperceptible perturbations to images so that text-to-image models trained on them learn deeply distorted concepts.
OWASP LLM Top 10: An OWASP-maintained list of the ten most critical security risks affecting applications that build on large language models.
Prompt Injection: An attack that overrides an LLM's original instructions by smuggling adversarial text into the prompt, causing the model to ignore safeguards or execute attacker-chosen actions.
RAG: Retrieval-Augmented Generation, an LLM pattern that fetches relevant documents from a knowledge store at query time and injects them into the prompt to ground responses.
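
The retrieve-then-inject pattern can be sketched with a toy retriever. This example swaps real embeddings for bag-of-words cosine similarity so it stays dependency-free; `DOCS`, `retrieve`, and `build_prompt` are hypothetical names, and no model call is made.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list) -> str:
    """Inject the retrieved documents into the prompt as grounding context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge store.
DOCS = [
    "password resets require manager approval",
    "the cafeteria opens at eight",
    "vpn access needs a hardware token",
]

prompt = build_prompt("how do password resets work", DOCS)
print(prompt)
```

Because whatever `retrieve` returns is pasted into the prompt, anything an attacker can plant in the knowledge store reaches the model as trusted context.
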
RAG Security: The discipline of securing retrieval-augmented generation pipelines so that the documents, vector stores, and retrieval steps that feed an LLM cannot be poisoned, abused, or used to exfiltrate data.
Shadow AI: The use of AI tools, models, or services by employees without the knowledge or approval of an organisation's security, privacy, or governance functions.
Slopsquatting: A 2024-coined supply-chain attack where adversaries register package names that LLM code assistants frequently hallucinate, so developers who copy-paste the suggested install command end up pulling malicious code.
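
A simple guard is to check a suggested install command against a reviewed allowlist before running it. The sketch below is a rough illustration: `APPROVED` and the package name `fastjsonx` are made up, and the parser handles only plain `pip install name[==version]` arguments, not options that take values or requirements files.

```python
import re

# Hypothetical allowlist of packages the team has already reviewed.
APPROVED = {"requests", "numpy", "flask"}

def packages_in(command: str) -> list:
    """Extract package names from a simple 'pip install ...' command."""
    tokens = command.split()
    if tokens[:2] != ["pip", "install"]:
        return []
    names = []
    for token in tokens[2:]:
        if token.startswith("-"):
            continue  # skip flags such as --upgrade
        # Drop version specifiers and extras: 'pkg==1.2' or 'pkg[extra]' -> 'pkg'.
        names.append(re.split(r"[=<>!~\[]", token, maxsplit=1)[0].lower())
    return names

def unapproved(command: str) -> list:
    """Return packages in the command that are not on the allowlist."""
    return [name for name in packages_in(command) if name not in APPROVED]

suggested = "pip install requests fastjsonx==1.2"
print(unapproved(suggested))
```

Anything the check flags deserves a manual look at the registry entry (publisher, age, download history) before installation.
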
Synthetic Media: Any audio, image, video, or text content produced or substantially modified by generative AI rather than captured directly from the physical world.
System Prompt Extraction: Attacks that coax a deployed LLM into revealing its hidden system prompt, exposing internal instructions, tool definitions, persona constraints, and any confidential data the operator embedded there.
Token Smuggling: A class of jailbreak technique that hides harmful instructions for an LLM inside encodings, languages, or token sequences the safety filter does not recognise as dangerous.
Tool-Use Injection: Attacks that manipulate an LLM agent's tool-calling layer by forging tool arguments, smuggling instructions through tool outputs, or coaxing the model into calling unsanctioned tools.
Training Data Extraction: Attacks that recover verbatim training examples from a deployed model by exploiting memorisation, exposing copyrighted text, PII, or proprietary content the model was trained on.
Transferable Adversarial Attack: An attack in which adversarial examples crafted against one machine-learning model also fool other, unseen models, enabling black-box attacks without access to the target.
Vector Database Security: The set of controls that protect vector databases used by AI systems from data leakage, poisoning, tenant cross-talk, and supply-chain or operational compromise.
Video Deepfake Attack: An attack that uses AI-generated synthetic video of a real person, often in a live meeting, to authorise fraudulent transactions or spread disinformation.
Voice Cloning Attack: An attack that uses AI-generated speech mimicking a real person to bypass voice authentication or trick victims into authorising payments or actions.