Adversarial Machine Learning
AI models see the world differently than humans do. Adversarial ML is the study of the optical illusions and logical traps that cause those models to fail.
The 3 Categories of Attacks
Adversarial attacks generally fall into three main categories, depending on the attacker's goal and the stage of the pipeline they target.
1. Evasion Attacks (Input)
Modifying the input data to cause a misclassification.
Example: Adding imperceptible noise to a "Stop" sign image so a self-driving car classifies it as "Speed Limit 45". In LLMs, the analogous attack is Prompt Injection.
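A common way to craft that noise is the Fast Gradient Sign Method (FGSM): nudge every input feature a tiny amount in the direction that most increases the model's loss. The sketch below applies FGSM to a toy logistic-regression "classifier" with random weights; the model, input, and epsilon budget are illustrative placeholders, not a real vision model.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=784)           # hypothetical trained weights (placeholder)
b = 0.0
x = rng.uniform(0, 1, size=784)    # hypothetical flattened input image
y = 1.0                            # true label: "stop sign"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    # Probability the toy model assigns to the "stop sign" class.
    return sigmoid(w @ x + b)

# FGSM: step each pixel by eps in the direction of the sign of dLoss/dx.
# For sigmoid + binary cross-entropy, dLoss/dx = (p - y) * w.
eps = 0.05                         # perturbation budget (illustrative)
p = predict(x)
grad_x = (p - y) * w
x_adv = np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

print(f"clean prediction:       {predict(x):.3f}")
print(f"adversarial prediction: {predict(x_adv):.3f}")
print(f"max per-pixel change:   {np.abs(x_adv - x).max():.3f}")
```

No pixel moves by more than eps, yet the model's confidence collapses, which is exactly the gap between what the model "sees" and what a human sees.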
2. Poisoning Attacks (Training)
Corrupting the training data to compromise the model's integrity.
Example: Injecting malicious samples into a spam filter's feedback loop so it starts marking legitimate emails as spam.
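A minimal sketch of that feedback-loop poisoning, assuming a toy Naive Bayes spam filter built with scikit-learn; the messages, labels, and the attacker's poisoned reports are all invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical feedback-loop data: (text, label) pairs, 1 = spam, 0 = legitimate.
clean_data = [
    ("win a free prize now", 1),
    ("claim your free reward", 1),
    ("meeting moved to 3pm", 0),
    ("quarterly report attached", 0),
]

# Poisoned feedback: the attacker repeatedly reports ordinary business phrases
# as spam, nudging the retrained filter toward blocking legitimate mail.
poison = [
    ("quarterly report attached", 1),
    ("invoice for quarterly report", 1),
    ("quarterly report review notes", 1),
]

def train(samples):
    texts, labels = zip(*samples)
    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    clf = MultinomialNB().fit(X, labels)
    return vec, clf

for name, dataset in [("clean", clean_data), ("poisoned", clean_data + poison)]:
    vec, clf = train(dataset)
    target = vec.transform(["quarterly report attached"])
    print(name, "model flags 'quarterly report attached' as spam:",
          bool(clf.predict(target)[0]))
```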
3. Extraction Attacks (Model)
Stealing the model's parameters or functionality.
Example: Querying a paid API thousands of times to train a cheap "knock-off" model (Model Stealing).
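The sketch below simulates this end to end: a locally trained decision tree stands in for the paid API, and the "attacker" trains a knock-off purely from query responses. The models, query budget, and data are illustrative assumptions, not a real extraction pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# "Victim" model standing in for the paid API. In a real attack this would be
# remote, rate-limited, and billed per call.
X_private = rng.normal(size=(500, 4))
y_private = (X_private[:, 0] + X_private[:, 1] ** 2 > 0.5).astype(int)
victim = DecisionTreeClassifier(max_depth=5).fit(X_private, y_private)

def paid_api(x):
    """Simulates querying the victim model's prediction endpoint."""
    return victim.predict(x)

# Extraction: the attacker never sees the private training data,
# only the labels returned for inputs they chose themselves.
X_queries = rng.normal(size=(2000, 4))   # attacker-chosen queries
y_stolen = paid_api(X_queries)           # labels harvested from the API

knockoff = LogisticRegression(max_iter=1000).fit(X_queries, y_stolen)

# How often the knock-off agrees with the victim on fresh inputs.
X_test = rng.normal(size=(1000, 4))
agreement = (knockoff.predict(X_test) == victim.predict(X_test)).mean()
print(f"knock-off agrees with victim on {agreement:.0%} of test queries")
```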
Why Does This Happen?
Deep neural networks are highly sensitive to small, carefully chosen perturbations of their inputs. They learn statistical correlations, not causal structure, so features that merely co-occur with a class can be nudged against it without changing what a human sees.
An adversarial example exploits these "blind spots" in the model's high-dimensional decision boundary: tiny changes to thousands of individual features accumulate into a large shift in the model's output.
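One common explanation is the "linearity" argument: each feature shifts only imperceptibly, but across thousands of dimensions those shifts add up. The toy calculation below, a sketch assuming a simple linear score w·x with arbitrary random weights, shows how the worst-case shift grows with input dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# With a linear score, a perturbation eta = eps * sign(w) changes the score by
# w @ eta = eps * ||w||_1, which grows with the number of input dimensions.
for dim in (10, 1_000, 100_000):
    w = rng.normal(size=dim)     # hypothetical learned weights
    eps = 0.01                   # imperceptibly small per-feature change
    eta = eps * np.sign(w)       # worst-case perturbation within that budget
    shift = w @ eta              # resulting change in the model's score
    print(f"dim={dim:>7}  per-feature change={eps}  score shift={shift:9.1f}")
```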
Defense Strategies
- Adversarial Training: Including adversarial examples in the training set so the model learns to resist them (see the sketch after this list).
- Input Sanitization: Pre-processing inputs to remove noise or malicious patterns (Railguard's core function).
- Rate Limiting: Throttling queries so attackers cannot harvest enough input/output pairs to perform extraction.
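Below is a minimal sketch of the first of these, adversarial training, on a toy logistic-regression model: each step crafts FGSM perturbations against the current weights and trains on a mix of clean and perturbed examples. The synthetic data, learning rate, and epsilon budget are illustrative assumptions, not a recommended recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary classification data (illustrative).
X = rng.normal(size=(400, 20))
y = (X[:, 0] > 0).astype(float)

w = np.zeros(20)
b = 0.0
lr, eps = 0.1, 0.1   # learning rate and FGSM budget (assumed values)

for step in range(200):
    # 1. Craft adversarial versions of the data against the current weights:
    #    move each input in the direction that most increases its loss.
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)

    # 2. Take a gradient step on the mix of clean and adversarial examples.
    X_mix = np.vstack([X, X_adv])
    y_mix = np.concatenate([y, y])
    p_mix = sigmoid(X_mix @ w + b)
    w -= lr * (X_mix.T @ (p_mix - y_mix)) / len(y_mix)
    b -= lr * (p_mix - y_mix).mean()

# Robust accuracy: how often a fresh FGSM attack fails to flip the prediction.
p = sigmoid(X @ w + b)
X_adv = X + eps * np.sign((p - y)[:, None] * w[None, :])
acc_adv = ((sigmoid(X_adv @ w + b) > 0.5).astype(float) == y).mean()
print(f"accuracy on adversarial inputs after adversarial training: {acc_adv:.0%}")
```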
Secure Your AI Pipeline
Railguard sits in front of your model to detect and block evasion attacks in real time.