● Category
AI & ML Security
43 entries
- ai-security№ 866
Prompt Injection
An attack that overrides an LLM's original instructions by smuggling adversarial text into the prompt, causing the model to ignore safeguards or execute attacker-chosen actions.
- ai-security№ 528
Indirect Prompt Injection
A prompt-injection variant where malicious instructions are hidden inside third-party content (web pages, documents, emails) that an LLM later ingests through retrieval, browsing, or tool use.
- ai-security№ 030
AI Jailbreak
A technique that causes an aligned AI model to bypass its safety policies and produce content or behaviour the operator intended to forbid.
- ai-security№ 281
Data Poisoning
An attack on a machine-learning system in which adversaries inject, alter, or relabel training data so the resulting model behaves incorrectly or contains hidden backdoors.
- ai-security№ 703
Model Extraction
An attack that reconstructs a confidential machine-learning model's parameters, architecture, or decision behaviour by systematically querying its public API.
- ai-security№ 704
Model Inversion
A privacy attack that reconstructs sensitive features of a model's training data — such as faces or text — by exploiting the model's outputs or gradients.
- ai-security№ 018
Adversarial Example
An input deliberately perturbed — often imperceptibly to humans — so that a machine-learning model produces a wrong or attacker-chosen prediction.
- ai-security№ 393
Evasion Attack (ML)
An inference-time attack in which an adversary crafts inputs that cause a deployed machine-learning model to make the wrong decision, such as slipping past a malware classifier or content filter.
- ai-security№ 081
Backdoor Attack (ML)
A training-time attack that implants a hidden behaviour in a model so it acts normally on clean inputs but produces an attacker-chosen output whenever a secret trigger appears.
- ai-security№ 666
Membership Inference Attack
A privacy attack that determines whether a specific data record was part of a machine-learning model's training set by analysing the model's behaviour on that record.
- ai-security№ 032
AI Red Team
A specialised team that simulates adversaries against AI systems to uncover safety, security, and misuse risks before real attackers do.
- ai-security№ 691
MLSecOps
The discipline of integrating security and risk controls across the entire machine-learning lifecycle, from data sourcing through training, deployment, monitoring, and retirement.
- ai-security№ 777
OWASP LLM Top 10
An OWASP-maintained list of the ten most critical security risks affecting applications that build on large language models.
- ai-security№ 028
AI Hallucination
A failure mode in which a generative AI system outputs content that is fluent and confident but factually wrong, fabricated, or unsupported by its sources.
- ai-security№ 024
AI Alignment
The research and engineering effort to ensure AI systems pursue goals, follow instructions, and behave in ways that match the intentions of their developers and users.
- ai-security№ 033
AI Safety
The discipline that aims to prevent AI systems from causing unintended harm to users, operators, and society — covering technical, operational, and societal dimensions.
- ai-security№ 027
AI Governance
The policies, processes, roles, and controls organisations and regulators use to ensure AI systems are developed, deployed, and operated responsibly and lawfully.
- ai-security№ 297
Deepfake
Synthetic audio, image, or video media generated by AI to convincingly depict a real person saying or doing something they did not.
- ai-security№ 1123
Synthetic Media
Any audio, image, video, or text content produced or substantially modified by generative AI rather than captured directly from the physical world.
- ai-security№ 035
AI Watermarking
Techniques that embed a detectable signal into AI-generated content so its provenance, model of origin, or training-set membership can be verified later.
- ai-security№ 1026
Shadow AI
The use of AI tools, models, or services by employees without the knowledge or approval of an organisation's security, privacy, or governance functions.
- ai-security№ 025
AI Bill of Materials (AIBOM)
A machine-readable inventory of every component that goes into an AI system — datasets, base models, fine-tuning data, libraries, prompts, and evaluation artifacts — used for security, compliance, and accountability.
- ai-security№ 898
RAG Security
The discipline of securing retrieval-augmented generation pipelines so that the documents, vector stores, and retrieval steps that feed an LLM cannot be poisoned, abused, or used to exfiltrate data.
- ai-security№ 1163
Token Smuggling
A class of jailbreak techniques that hide harmful instructions for an LLM inside encodings, languages, or token sequences the safety filter does not recognise as dangerous.
- ai-security№ 729
Nightshade Attack
A data-poisoning technique developed by the University of Chicago's Glaze team that adds imperceptible perturbations to images so that text-to-image models trained on them learn deeply distorted concepts.
- ai-security№ 034
AI Supply Chain Risk
The set of threats arising from the third-party datasets, base models, libraries, plug-ins, and infrastructure that organisations combine to build and deploy AI systems.
- ai-security№ 026
AI Content Detection
Tools and techniques that estimate whether a piece of text, image, audio, or video was produced by an AI model rather than a human.
- ai-security№ 029
AI Incident Response
The set of processes, roles, and playbooks an organisation uses to detect, contain, investigate, communicate, and recover from incidents involving AI systems.
- ai-security№ 617
LLM Firewall
A security control that sits between users and a large language model to inspect prompts, retrieved context, and outputs in real time, blocking or rewriting traffic that violates policy.
- ai-security№ 618
LLM Guardrails
Mechanisms that constrain what an LLM-based application can accept as input or produce as output, enforcing safety, security, and business rules around the underlying model.
- ai-security№ 657
MCP Attacks
Attacks that exploit the Model Context Protocol (MCP) to inject prompts, abuse tools, or pivot through servers an AI assistant trusts.
- ai-security№ 1208
Voice Cloning Attack
An attack that uses AI-generated speech mimicking a real person to bypass voice authentication or trick victims into authorising payments or actions.
- ai-security№ 1203
Video Deepfake Attack
An attack that uses AI-generated synthetic video of a real person, often in a live meeting, to authorise fraudulent transactions or spread disinformation.
- ai-security№ 036
AI-Generated Disinformation
False or misleading content produced or amplified by generative AI to deceive audiences, manipulate opinion, or influence elections, markets, or conflicts.
- ai-security№ 037
AI-Generated Malware
Malicious code written, mutated, or assisted by large language models, lowering the skill bar for attackers and accelerating variant production.
- ai-security№ 1168
Transferable Adversarial Attack
An attack in which adversarial examples crafted against one machine-learning model also fool other, unseen models, enabling black-box attacks without access to the target's parameters or gradients.
- ai-security№ 014
Adaptive Attack
An attack on a machine-learning system that is specifically designed to evade or break a known defence, instead of using a generic, defence-agnostic technique.
- ai-security№ 619
LLM System Prompt Leak
An attack that extracts the hidden system prompt or instructions of a deployed large language model application, exposing logic, secrets, and tools.
- ai-security№ 137
C2PA
Coalition for Content Provenance and Authenticity: an industry coalition, and the open standard it publishes, for cryptographically signed metadata that records how digital media was created and edited.
- ai-security№ 897
RAG
Retrieval-Augmented Generation: an LLM pattern that fetches relevant documents from a knowledge store at query time and injects them into the prompt to ground responses.
- ai-security№ 376
Embedding Attacks
A class of attacks on the embedding vectors used by AI systems that recover the original input, alter its semantics, or abuse similarity search, including embedding inversion and similarity-based poisoning.
- ai-security№ 1198
Vector Database Security
The set of controls that protect vector databases used by AI systems from data leakage, poisoning, tenant cross-talk, and supply-chain or operational compromise.
- ai-security№ 031
AI Model Card
A standardised document, introduced by Margaret Mitchell and colleagues in 2018, that describes a machine-learning model's intended use, training data, performance, limitations, and ethical considerations.
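To make the Adversarial Example and Evasion Attack (ML) entries above concrete, here is a minimal sketch of a sign-gradient perturbation, in the style of the fast gradient sign method, applied to a toy linear classifier. The weights, the input, and the budget `EPSILON` are made-up illustrative values, and the helper names (`score`, `predict`, `perturb`) are hypothetical rather than taken from any library.

```python
# Toy illustration only: a hand-written linear classifier, not a real model.
WEIGHTS = [1.0, -2.0, 0.5]   # assumed weights, chosen for the example
BIAS = 0.0
EPSILON = 0.2                # assumed per-feature perturbation budget


def score(x):
    """Linear decision score w.x + b."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    """Class 1 if the score is positive, otherwise class 0."""
    return 1 if score(x) > 0 else 0


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb(x, epsilon=EPSILON):
    """Shift every feature by at most epsilon toward the opposite class.

    For a linear model the gradient of the score with respect to the input
    is simply WEIGHTS, so the step direction is the sign of each weight.
    """
    direction = -1 if predict(x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


clean = [0.5, 0.1, 0.2]
adversarial = perturb(clean)

print(predict(clean), predict(adversarial))  # prints "1 0"
```

No feature moves by more than 0.2, yet the predicted class flips from 1 to 0. Real attacks apply the same idea to deep models, where the gradient is obtained by backpropagation and the perturbation budget is typically small enough to be imperceptible to humans.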