Adversarial Attack Tutorial

Learn how to use LLM-driven evolution to discover effective adversarial attack algorithms.

Academic Citation

The adversarial attack task is based on L-AutoDA research. If you use this feature in academic work, please cite:

@inproceedings{10.1145/3638530.3664121,
    author = {Guo, Ping and Liu, Fei and Lin, Xi and Zhao, Qingchuan and Zhang, Qingfu},
    title = {L-AutoDA: Large Language Models for Automatically Evolving Decision-based Adversarial Attacks},
    year = {2024},
    isbn = {9798400704956},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3638530.3664121},
    doi = {10.1145/3638530.3664121},
    pages = {1846–1854},
    numpages = {9},
    keywords = {large language models, adversarial attacks, automated algorithm design, evolutionary algorithms},
    location = {Melbourne, VIC, Australia},
    series = {GECCO '24 Companion}
}

Complete Example Code

This tutorial is accompanied by complete, runnable example scripts in the examples/adversarial_attack directory.

Run locally:

cd examples/adversarial_attack
python basic_example.py


Overview

This tutorial demonstrates:

  • Creating adversarial attack tasks
  • Using LLM-driven evolution to discover attack algorithms
  • Understanding the draw_proposals function
  • Evaluating attacks on neural networks
  • Evolving effective black-box attacks automatically

Installation

GPU Recommended

For best performance, install PyTorch with CUDA support before installing EvoToolkit. We recommend CUDA 12.9 (the latest stable release at the time of writing).

Step 1: Install PyTorch with GPU Support

# CUDA 12.9 (recommended)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu129

# For other versions, visit: https://pytorch.org/get-started/locally/
# CUDA 12.1
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# CPU only (not recommended, slower performance)
# pip install torch torchvision

Step 2: Install EvoToolkit

pip install evotoolkit[adversarial_attack]

This installs:

  • timm - PyTorch Image Models (provides CIFAR-10 pretrained models from Hugging Face)
  • foolbox - Adversarial attacks library
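
After installation, you can quickly verify that the optional dependencies are importable:

import timm
import foolbox  # both should import cleanly after installation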

Prerequisites:

  • Python >= 3.11
  • PyTorch >= 2.0 (with CUDA support recommended)
  • LLM API access (OpenAI, Claude, or other compatible providers)
  • Basic understanding of adversarial machine learning

Understanding Adversarial Attack Tasks

What is an Adversarial Attack Task?

An adversarial attack task evolves proposal generation algorithms to create adversarial examples that fool neural networks with minimal distortion.

Aspect          Scientific Regression        Adversarial Attack
Solution type   Mathematical equation        Proposal algorithm
Function name   equation                     draw_proposals
Inputs          Data + params                Images + noise + hyperparams
Evaluation      MSE on predictions           L2 distance of adversarials
Goal            Minimize prediction error    Minimize distortion

Task Components

An adversarial attack task requires:

  • Target model: Neural network to attack
  • Test data: Images to generate adversarial examples for
  • Attack budget: Number of iterations/queries
  • Evaluation metric: L2 distance between original and adversarial images

Creating Your First Adversarial Attack Task

Step 1: Load Target Model and Data

import torch
import torch.nn as nn
import timm
from torchvision import datasets, transforms

# Load CIFAR-10 pretrained ResNet18 model (from Hugging Face Hub)
# This model achieves 94.98% accuracy on CIFAR-10
# CIFAR-10 ResNet18 uses modified architecture (3x3 conv1, removed maxpool)
base_model = timm.create_model("resnet18", num_classes=10, pretrained=False)
base_model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
base_model.maxpool = nn.Identity()

# Load pretrained weights
base_model.load_state_dict(
    torch.hub.load_state_dict_from_url(
        "https://huggingface.co/edadaltocg/resnet18_cifar10/resolve/main/pytorch_model.bin",
        map_location="cpu",
        file_name="resnet18_cifar10.pth"
    )
)
base_model.eval()

# Create model wrapper with normalization
# Important: Foolbox expects inputs in [0, 1], so normalization must be inside the model
class NormalizedModel(nn.Module):
    def __init__(self, model, mean, std):
        super().__init__()
        self.model = model
        self.register_buffer('mean', torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):
        # x is in [0, 1], normalize it
        x_normalized = (x - self.mean) / self.std
        return self.model(x_normalized)

model = NormalizedModel(base_model,
                        mean=[0.4914, 0.4822, 0.4465],
                        std=[0.2471, 0.2435, 0.2616])
model.eval()

if torch.cuda.is_available():
    model.cuda()

# Load CIFAR-10 test set (only ToTensor, no Normalize in transform)
transform = transforms.Compose([
    transforms.ToTensor(),  # Converts to [0, 1] range
])
test_set = datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
test_loader = torch.utils.data.DataLoader(
    test_set,
    batch_size=32,
    shuffle=False
)
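
Before creating the task, it is worth confirming that the weights loaded correctly and that the normalization wrapper behaves as expected; a quick optional sanity check:

# Check clean accuracy on one test batch
images, labels = next(iter(test_loader))
if torch.cuda.is_available():
    images, labels = images.cuda(), labels.cuda()
with torch.no_grad():
    preds = model(images).argmax(dim=1)
print(f"Clean batch accuracy: {(preds == labels).float().mean().item():.2%}")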

Step 2: Create Task and Test Initial Solution

from evotoolkit.task.python_task import AdversarialAttackTask

# Create task
task = AdversarialAttackTask(
    model=model,
    test_loader=test_loader,
    attack_steps=1000,
    n_test_samples=10,
    use_mock=False
)

# Get initial solution
init_sol = task.make_init_sol_wo_other_info()

print(f"Initial algorithm:")
print(init_sol.sol_string)
print(f"\nScore: {init_sol.evaluation_res.score:.2f}")
print(f"Avg L2 distance: {init_sol.evaluation_res.additional_info['avg_distance']:.2f}")

Output:

Initial algorithm:
import numpy as np

def draw_proposals(org_img, best_adv_img, std_normal_noise, hyperparams):
    """Baseline proposal generation..."""
    ...

Score: -2.34
Avg L2 distance: 2.34

Step 3: Test Custom Algorithm

custom_code = '''import numpy as np

def draw_proposals(org_img, best_adv_img, std_normal_noise, hyperparams):
    """Simple algorithm: move toward original with noise."""
    org = org_img.flatten()
    best = best_adv_img.flatten()
    noise = std_normal_noise.flatten()

    # Move toward original with random perturbation
    direction = org - best
    step = hyperparams[0] * 0.1
    candidate = best + step * direction + step * noise * 0.5

    return candidate.reshape(org_img.shape)
'''

result = task.evaluate_code(custom_code)
print(f"Score: {result.score:.2f}")
print(f"Avg L2 distance: {result.additional_info['avg_distance']:.2f}")

Understanding the draw_proposals Function

Function Signature

The evolved function must have this exact signature:

def draw_proposals(
    org_img: np.ndarray,         # Original clean image
    best_adv_img: np.ndarray,    # Current best adversarial
    std_normal_noise: np.ndarray,# Random noise for exploration
    hyperparams: np.ndarray      # Adaptive step size
) -> np.ndarray:                 # New candidate adversarial
    """Generate new candidate adversarial example."""
    ...

Input Details

org_img (Original Image):

  • Shape: (3, H, W) for RGB images (e.g., (3, 32, 32) for CIFAR-10)
  • Values: pixel values normalized to [0, 1]
  • Purpose: the clean image being attacked

best_adv_img (Best Adversarial):

  • Shape: (3, H, W), same as org_img
  • Values: [0, 1]
  • Purpose: the current best adversarial example (fools the model, closest to the original)

std_normal_noise (Random Noise):

  • Shape: (3, H, W), same as org_img
  • Values: sampled from the standard normal distribution N(0, 1)
  • Purpose: provides randomness for exploration

hyperparams (Adaptive Parameters):

  • Shape: (1,), a single scalar value
  • Values: typically in the range [0.5, 1.5]
  • Purpose: an adaptive step size that increases when the attack keeps finding adversarials

Return Value

Must return a NumPy array with:

  • Shape: (3, H, W), same as org_img
  • Values: any (clipped to [0, 1] automatically)
  • Purpose: the new candidate adversarial example
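
Given a draw_proposals implementation (such as the complete example below), you can sanity-check it against this contract with random CIFAR-10-shaped arrays; a minimal sketch:

import numpy as np

org = np.random.rand(3, 32, 32).astype(np.float32)
adv = np.clip(org + 0.1 * np.random.randn(3, 32, 32), 0.0, 1.0).astype(np.float32)
noise = np.random.randn(3, 32, 32).astype(np.float32)
hyperparams = np.array([1.0])

candidate = draw_proposals(org, adv, noise, hyperparams)
assert candidate.shape == org.shape  # must match the input shape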

Algorithm Design Principles

1. Exploitation (Refinement)

Move the current adversarial back toward org_img, reducing distortion while staying near the decision boundary:

direction = org_img - best_adv_img
candidate = best_adv_img + step_size * direction

2. Exploration (Discovery)

Add random noise to discover new regions:

candidate = best_adv_img + noise_component

3. Adaptive Step Size

Use hyperparams to balance exploration/exploitation:

# hyperparams increases when finding adversarials
step = hyperparams[0] * base_step_size

4. Complete Example

import numpy as np

def draw_proposals(org_img, best_adv_img, std_normal_noise, hyperparams):
    """Combine parallel and perpendicular components."""
    # Flatten to vectors
    org = org_img.flatten()
    best = best_adv_img.flatten()
    noise = std_normal_noise.flatten()

    # Compute direction
    direction = org - best
    direction_norm = np.linalg.norm(direction)

    # Parallel component (toward original)
    noise_norm = np.linalg.norm(noise)
    step_size = (noise_norm * hyperparams[0]) ** 2
    d_parallel = step_size * direction

    # Perpendicular component (exploration)
    if direction_norm > 1e-8:
        dot_product = np.dot(direction, noise)
        projection = (dot_product / direction_norm) * direction
        d_perpendicular = (projection / direction_norm - direction_norm * noise) * hyperparams[0]
    else:
        d_perpendicular = noise * hyperparams[0]

    # Combine
    candidate = best + d_parallel + d_perpendicular

    return candidate.reshape(org_img.shape)

Running Evolution to Discover Attacks

Step 1: Create Interface

import evotoolkit
from evotoolkit.task.python_task import EvoEngineerPythonInterface
from evotoolkit.tools.llm import HttpsApi

# Create interface
interface = EvoEngineerPythonInterface(task)

Step 2: Configure LLM

llm_api = HttpsApi(
    api_url="api.openai.com",  # Your API URL
    key="your-api-key-here",   # Your API key
    model="gpt-4o"
)

Step 3: Run Evolution

result = evotoolkit.solve(
    interface=interface,
    output_path='./attack_results',
    running_llm=llm_api,
    max_generations=10,
    pop_size=5,
    max_sample_nums=20
)

print(f"Best algorithm found:")
print(result.sol_string)
print(f"\nAvg L2 distance: {-result.evaluation_res.score:.2f}")

Try Different Algorithms

EvoToolkit supports multiple evolutionary algorithms for adversarial attacks:

# Using EoH
from evotoolkit.task.python_task import EoHPythonInterface
interface = EoHPythonInterface(task)

# Using FunSearch
from evotoolkit.task.python_task import FunSearchPythonInterface
interface = FunSearchPythonInterface(task)

# Using EvoEngineer (default)
from evotoolkit.task.python_task import EvoEngineerPythonInterface
interface = EvoEngineerPythonInterface(task)

Then use the same evotoolkit.solve() call to run evolution. Different interfaces may discover different attack strategies.


Attack Evolution Example

During evolution, the LLM discovers increasingly effective algorithms:

Generation 1: Simple baseline

def draw_proposals(org_img, best_adv_img, std_normal_noise, hyperparams):
    return best_adv_img + 0.01 * std_normal_noise
# Avg L2: 3.5

Generation 3: Direction-based

def draw_proposals(org_img, best_adv_img, std_normal_noise, hyperparams):
    direction = org_img - best_adv_img
    return best_adv_img + hyperparams[0] * 0.1 * direction
# Avg L2: 2.1

Generation 7: Sophisticated combination

def draw_proposals(org_img, best_adv_img, std_normal_noise, hyperparams):
    # Complex algorithm combining multiple components
    ...
# Avg L2: 0.8


Customizing Evolution Behavior

The quality of evolved attacks is controlled by the evolution method and its internal prompt design. To improve results:

  • Adjust prompts: Inherit existing Interface classes and customize LLM prompts
  • Develop new algorithms: Create entirely new evolutionary strategies

Learn More

These are general techniques that apply to all tasks and are covered in detail in the dedicated customization tutorials.

Quick Example - Custom Prompts for Adversarial Attacks:

from evotoolkit.task.python_task import EvoEngineerPythonInterface

class EvoEngineerCustomAttackInterface(EvoEngineerPythonInterface):
    """Interface optimized for adversarial attack evolution."""

    def get_operator_prompt(self, operator_name, selected_individuals,
                           current_best_sol, random_thoughts, **kwargs):
        """Customize mutation prompt to emphasize attack effectiveness."""

        if operator_name == "mutation":
            task_description = self.task.get_base_task_description()
            individual = selected_individuals[0]

            prompt = f"""# Adversarial Attack Algorithm Evolution

{task_description}

## Current Best Algorithm
**Avg L2 Distance:** {-current_best_sol.evaluation_res.score:.2f}
**Algorithm:** {current_best_sol.sol_string}

## Algorithm to Mutate
**Avg L2 Distance:** {-individual.evaluation_res.score:.2f}
**Algorithm:** {individual.sol_string}

## Optimization Guidelines
Focus on improving the algorithm by:
- Better balancing exploitation (refinement) and exploration (discovery)
- More effective use of the adaptive hyperparams
- Clever combination of direction vectors and noise
- Numerical stability and efficiency

Generate an improved draw_proposals function that achieves lower L2 distances.

## Response Format:
name: [descriptive_name]
code:
[Your improved draw_proposals function]
thought: [reasoning for changes]
"""
            return [{"role": "user", "content": prompt}]

        # Use default for other operators
        return super().get_operator_prompt(operator_name, selected_individuals,
                                          current_best_sol, random_thoughts, **kwargs)

# Use custom interface
interface = EvoEngineerCustomAttackInterface(task)
result = evotoolkit.solve(
    interface=interface,
    output_path='./custom_results',
    running_llm=llm_api,
    max_generations=10
)

Understanding Evaluation

Scoring Mechanism

  1. Attack Execution: Run evolved algorithm on test samples
  2. Adversarial Generation: Create adversarial examples using draw_proposals
  3. Distance Measurement: Compute L2 distance from original images
  4. Fitness Calculation: Score = -(average L2 distance)

Lower L2 distance = better attack = higher score (less negative)
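
As a concrete sketch of this calculation (the per-sample distances here are made up):

import numpy as np

# Hypothetical per-sample L2 distances between originals and adversarials
distances = np.array([2.1, 1.8, 2.6])
score = -distances.mean()  # -2.17; less negative means a stronger attack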

Evaluation Output

result = task.evaluate_code(algorithm_code)

if result.valid:
    print(f"Score: {result.score:.2f}")  # Negative L2 distance
    print(f"Avg L2: {result.additional_info['avg_distance']:.2f}")
    print(f"Attack steps: {result.additional_info['attack_steps']}")
else:
    print(f"Error: {result.additional_info['error']}")

Use Cases and Applications

1. Black-Box Attack Discovery

Evolve algorithms for black-box scenarios where gradients are unavailable:

task = AdversarialAttackTask(
    model=black_box_model,
    test_loader=test_loader,
    attack_steps=5000,  # More iterations for black-box
    n_test_samples=50
)

2. Robustness Evaluation

Test model defenses by evolving strong attacks:

# Load a more robust model (e.g., adversarially trained)
# Note: you need to train or obtain robust models yourself
from torchvision import models
model = models.resnet50(weights="IMAGENET1K_V1")  # Placeholder; substitute your robust model
model.eval()

task = AdversarialAttackTask(
    model=model,
    test_loader=test_loader,
    attack_steps=10000,  # Thorough evaluation
    n_test_samples=100
)

About Robust Models

EvoToolkit no longer depends on the robustbench library. If you need to test robust models:

  • Use your own adversarially trained models (see the sketch below)
  • Load pretrained robust models from other sources
  • Or use standard models for basic testing
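
For example, a minimal sketch of loading your own adversarially trained CIFAR-10 checkpoint (the file name robust_resnet18_cifar10.pth and the architecture choice are placeholders, not files shipped with EvoToolkit):

import torch
import torch.nn as nn
import timm

robust_model = timm.create_model("resnet18", num_classes=10, pretrained=False)
robust_model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
robust_model.maxpool = nn.Identity()
robust_model.load_state_dict(torch.load("robust_resnet18_cifar10.pth", map_location="cpu"))
robust_model.eval()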

3. Transfer Attack Development

Evolve attacks that transfer across models:

# Train on surrogate model
from torchvision import models
surrogate_model = models.resnet18(weights="IMAGENET1K_V1")
surrogate_model.eval()

task = AdversarialAttackTask(
    model=surrogate_model,
    test_loader=test_loader,
    attack_steps=5000,
    n_test_samples=50
)

# Evolve attack
result = evotoolkit.solve(interface, ...)

# Test on target model
target_model = models.resnet50(weights="IMAGENET1K_V1")  # Different architecture
target_model.eval()
# Evaluate evolved algorithm on target_model
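
One way to run that evaluation (a sketch reusing the AdversarialAttackTask API and the result object from above) is to wrap the target model in a second task and re-score the evolved code:

target_task = AdversarialAttackTask(
    model=target_model,
    test_loader=test_loader,
    attack_steps=5000,
    n_test_samples=50
)
transfer_result = target_task.evaluate_code(result.sol_string)
if transfer_result.valid:
    print(f"Transfer avg L2: {transfer_result.additional_info['avg_distance']:.2f}")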

4. Query-Efficient Attacks

Optimize for minimal queries to the target model:

task = AdversarialAttackTask(
    model=model,
    test_loader=test_loader,
    attack_steps=100,  # Limited queries
    n_test_samples=20
)

Complete Example

Here's a full working example:

import torch
import torch.nn as nn
import timm
import evotoolkit
from torchvision import datasets, transforms
from evotoolkit.task.python_task import (
    AdversarialAttackTask,
    EvoEngineerPythonInterface
)
from evotoolkit.tools.llm import HttpsApi

# 1. Load CIFAR-10 pretrained ResNet18 model
base_model = timm.create_model("resnet18", num_classes=10, pretrained=False)
base_model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
base_model.maxpool = nn.Identity()

# Load pretrained weights
base_model.load_state_dict(
    torch.hub.load_state_dict_from_url(
        "https://huggingface.co/edadaltocg/resnet18_cifar10/resolve/main/pytorch_model.bin",
        map_location="cpu",
        file_name="resnet18_cifar10.pth"
    )
)
base_model.eval()

# Wrap with normalization so inputs stay in [0, 1], as Foolbox expects
class NormalizedModel(nn.Module):
    def __init__(self, model, mean, std):
        super().__init__()
        self.model = model
        self.register_buffer('mean', torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):
        return self.model((x - self.mean) / self.std)

model = NormalizedModel(base_model,
                        mean=[0.4914, 0.4822, 0.4465],
                        std=[0.2471, 0.2435, 0.2616])
model.eval()

if torch.cuda.is_available():
    model.cuda()

# 2. Prepare test data (ToTensor only; normalization happens inside the model)
transform = transforms.Compose([
    transforms.ToTensor(),
])
test_set = datasets.CIFAR10(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
test_loader = torch.utils.data.DataLoader(
    test_set, batch_size=32, shuffle=False
)

# 3. Create task
task = AdversarialAttackTask(
    model=model,
    test_loader=test_loader,
    attack_steps=1000,
    n_test_samples=10,
    use_mock=False
)

# 4. Create LLM API
llm_api = HttpsApi(
    api_url="api.openai.com",  # Your API URL
    key="your-api-key-here",   # Your API key
    model="gpt-4o"
)

# 5. Create interface
interface = EvoEngineerPythonInterface(task)

# 6. Run evolution
result = evotoolkit.solve(
    interface=interface,
    output_path='./attack_results',
    running_llm=llm_api,
    max_generations=10,
    pop_size=5,
    max_sample_nums=20
)

# 7. Show results
print(f"Best attack algorithm found:")
print(result.sol_string)
print(f"\nAvg L2 distance: {-result.evaluation_res.score:.2f}")
print(f"Attack steps: {result.evaluation_res.additional_info['attack_steps']}")

Next Steps

Explore Different Attack Scenarios

  • Try different target models (standard vs robust)
  • Experiment with different datasets (CIFAR-10, ImageNet)
  • Compare different evolutionary algorithms
  • Test evolved attacks on multiple models

Customize and Improve Evolution

  • Examine prompt designs in existing Interface classes
  • Inherit and override Interfaces to customize prompts
  • Design specialized prompts for different attack types
  • Develop new evolutionary algorithms if needed

References

  • L-AutoDA: Large Language Models for Automatically Evolving Decision-based Adversarial Attacks (GECCO 2024)
  • Foolbox: A Python toolbox to create adversarial examples
  • PyTorch Models: Pretrained computer vision models (https://pytorch.org/vision/stable/models.html)