Prompt Engineering Tutorial¶
Learn how to use LLM-driven evolution to optimize prompt templates for better downstream task performance.
Academic Citation
If you use EvoToolkit in your research, please cite:
Complete Example Code
This tutorial provides complete, runnable examples (click to view/download):
- basic_example.py - Basic usage with mock LLM
- README.md - Examples documentation and usage guide
Run locally: download the files above, then run `python basic_example.py`.
Overview¶
This tutorial demonstrates:
- Creating prompt optimization tasks
- Using LLM-driven evolution to improve prompt templates
- Testing prompts on specific downstream tasks
- Evolving high-quality prompts automatically
Installation¶
Install EvoToolkit, typically via `pip install evotoolkit` (the package name is assumed to match the import name).
Prerequisites:
- Python >= 3.11
- LLM API access (OpenAI, Claude, or other compatible providers)
- Basic understanding of prompt engineering
Understanding Prompt Optimization Tasks¶
What is a Prompt Optimization Task?¶
A prompt optimization task evolves string templates to maximize performance on downstream tasks. Unlike Python tasks that evolve code, prompt tasks evolve prompt text directly.
| Aspect | Python Task | Prompt Task |
|---|---|---|
| Solution type | Python code | String template |
| Evolution target | Function/algorithm | Prompt text |
| Evaluation | Execute code | Test template with LLM |
| Example | `def func(x): return x**2` | `"Solve: {question}\nAnswer:"` |
Task Components¶
A prompt optimization task requires:
- Test cases: Question-answer pairs for evaluation
- Template syntax: String containing a `{question}` placeholder
- LLM API: For testing prompt templates (or use mock mode)
- Evaluation metric: Accuracy on test cases
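For intuition, a template is just an ordinary Python format string. A minimal, stdlib-only sketch of how one test case is rendered into the prompt actually sent to the LLM:

```python
# A template is a plain format string with a {question} slot.
template = "Solve: {question}\nAnswer:"

test_case = {"question": "What is 2+2?", "expected": "4"}

# Filling the placeholder yields the prompt sent to the LLM.
prompt = template.format(question=test_case["question"])
print(prompt)
# Solve: What is 2+2?
# Answer:
```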
Creating Your First Prompt Task¶
Step 1: Define Test Cases¶
Create test cases with questions and expected answers:
```python
test_cases = [
    {"question": "What is 2+2?", "expected": "4"},
    {"question": "What is 5*3?", "expected": "15"},
    {"question": "What is 10-7?", "expected": "3"},
    {"question": "What is 12/4?", "expected": "3"},
    {"question": "What is 7+8?", "expected": "15"},
]
```
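For larger suites it may be more convenient to keep test cases in a file. A small sketch, assuming a JSON file (the file name is illustrative) containing an array of objects with the same `question`/`expected` keys:

```python
import json

# Hypothetical file: a JSON array of {"question": ..., "expected": ...} objects.
with open("test_cases.json", encoding="utf-8") as f:
    test_cases = json.load(f)

# Basic sanity check before handing the cases to the task.
assert all("question" in tc and "expected" in tc for tc in test_cases)
```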
Step 2: Create the Task¶
```python
from evotoolkit.task import PromptOptimizationTask
from evotoolkit.tools.llm import HttpsApi

# Configure LLM API
llm_api = HttpsApi(
    api_url="your_api_url",  # e.g., "ai.api.example.com"
    key="your_api_key",      # Your API key
    model="gpt-4o"
)

task = PromptOptimizationTask(
    test_cases=test_cases,
    llm_api=llm_api,
    use_mock=False
)
```
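To avoid hardcoding credentials, the API key can be read from the environment. A minimal sketch; the variable name `LLM_API_KEY` is an assumption, not something the library requires:

```python
import os

from evotoolkit.tools.llm import HttpsApi

# Hypothetical environment variable holding your API key.
api_key = os.environ.get("LLM_API_KEY")
if api_key is None:
    raise RuntimeError("Set LLM_API_KEY before running this script")

llm_api = HttpsApi(
    api_url="your_api_url",
    key=api_key,
    model="gpt-4o"
)
```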
Step 3: Test Initial Template¶
```python
# Get initial solution
init_sol = task.make_init_sol_wo_other_info()
print(f"Initial template: {init_sol.sol_string}")
print(f"Accuracy: {init_sol.evaluation_res.score:.2%}")
print(f"Correct: {init_sol.evaluation_res.additional_info['correct']}/{init_sol.evaluation_res.additional_info['total']}")
```
The output shows the initial template, its accuracy on the test cases, and the number of correct answers out of the total.
Step 4: Test Custom Templates¶
```python
# Test your own template
custom_template = "Solve this math problem and give only the number: {question}"
result = task.evaluate_code(custom_template)

print(f"Custom template: {custom_template}")
print(f"Accuracy: {result.score:.2%}")
print(f"Correct: {result.additional_info['correct']}/{result.additional_info['total']}")
Running Evolution to Optimize Prompts¶
Step 1: Create Interface¶
```python
import evotoolkit
from evotoolkit.task import EvoEngineerStringInterface

# Create interface
interface = EvoEngineerStringInterface(task)
```
Step 2: Run Evolution¶
```python
# Run evolution with LLM
result = evotoolkit.solve(
    interface=interface,
    output_path='./prompt_results',
    running_llm=llm_api,
    max_generations=10,
    pop_size=5,
    max_sample_nums=20
)

print(f"Best template found: {result.sol_string}")
print(f"Accuracy: {result.evaluation_res.score:.2%}")
```
Try Different Algorithms
EvoToolkit supports multiple evolutionary algorithms for prompt optimization:
```python
# Using EoH
from evotoolkit.task import EoHStringInterface
interface = EoHStringInterface(task)

# Using FunSearch
from evotoolkit.task import FunSearchStringInterface
interface = FunSearchStringInterface(task)

# Using EvoEngineer (default)
from evotoolkit.task import EvoEngineerStringInterface
interface = EvoEngineerStringInterface(task)
```

Then use the same `evotoolkit.solve()` call to run evolution. Different interfaces may perform better on different tasks.
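To compare the algorithms head to head on the same task, a simple loop over the interface classes works. A sketch using the same `solve()` call as above; the output paths are illustrative:

```python
import evotoolkit
from evotoolkit.task import (
    EoHStringInterface,
    EvoEngineerStringInterface,
    FunSearchStringInterface,
)

# Run each algorithm on the same task and record the best score.
for interface_cls in (EvoEngineerStringInterface, EoHStringInterface,
                      FunSearchStringInterface):
    result = evotoolkit.solve(
        interface=interface_cls(task),
        output_path=f"./compare/{interface_cls.__name__}",  # illustrative path
        running_llm=llm_api,
        max_generations=10,
    )
    print(f"{interface_cls.__name__}: {result.evaluation_res.score:.2%}")
```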
Understanding Template Format¶
Valid Templates¶
Prompt templates must include the `{question}` placeholder:
```python
# ✅ Good templates
"Answer this question: {question}"
"Solve this math problem: {question}\nGive only the number."
"Question: {question}\nThink step by step and provide only the final answer."
"Let's solve: {question}\nFirst, analyze the problem..."

# ❌ Bad templates (missing placeholder)
"Solve this problem"  # No {question} placeholder
"Answer: 42"          # No {question} placeholder
```
Template Evolution Example¶
During evolution, the LLM generates improved templates:
```python
# Generation 1
"Answer: {question}"
# Accuracy: 60%

# Generation 3
"Solve this math problem: {question}\nProvide only the numerical answer."
# Accuracy: 85%

# Generation 7
"Calculate: {question}\nShow only the final number, no explanation."
# Accuracy: 100%
```
Use Cases and Applications¶
1. Math Problem Solving¶
```python
test_cases = [
    {"question": "What is 15 * 7?", "expected": "105"},
    {"question": "What is 144 / 12?", "expected": "12"},
    # ...
]
task = PromptOptimizationTask(test_cases=test_cases, llm_api=llm_api)
```
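Arithmetic test cases can also be generated programmatically, which makes it easy to grow the suite. A minimal stdlib-only sketch:

```python
import random

# Generate simple multiplication questions with known answers.
random.seed(0)  # reproducible suite
test_cases = []
for _ in range(20):
    a, b = random.randint(2, 20), random.randint(2, 20)
    test_cases.append({
        "question": f"What is {a} * {b}?",
        "expected": str(a * b),
    })
```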
2. Text Classification¶
```python
test_cases = [
    {"question": "This movie is amazing!", "expected": "positive"},
    {"question": "This movie is terrible!", "expected": "negative"},
    {"question": "I loved this film!", "expected": "positive"},
    # ...
]
task = PromptOptimizationTask(test_cases=test_cases, llm_api=llm_api)
```
3. Information Extraction¶
```python
test_cases = [
    {"question": "Extract the date: The meeting is on 2024-03-15", "expected": "2024-03-15"},
    {"question": "Extract the date: We'll meet on March 20th, 2024", "expected": "2024-03-20"},
    # ...
]
task = PromptOptimizationTask(test_cases=test_cases, llm_api=llm_api)
```
4. Translation Tasks¶
```python
test_cases = [
    {"question": "Translate to French: Hello", "expected": "Bonjour"},
    {"question": "Translate to French: Thank you", "expected": "Merci"},
    # ...
]
task = PromptOptimizationTask(test_cases=test_cases, llm_api=llm_api)
```
Customizing Evolution Behavior¶
The quality of evolved prompts is controlled by the evolution method and its internal prompt design. To improve results:
- Adjust prompts: Inherit existing Interface classes and customize LLM prompts
- Develop new algorithms: Create entirely new evolutionary strategies
Learn More
These are general techniques applicable to all tasks. For detailed tutorials, see:
- Customizing Evolution Methods - How to modify prompts and develop new algorithms
- Advanced Usage - More advanced configuration options
Quick Example - Custom Prompts for Prompt Optimization:
```python
import evotoolkit
from evotoolkit.task import EvoEngineerStringInterface

class CustomPromptInterface(EvoEngineerStringInterface):
    """Interface optimized for prompt template evolution."""

    def get_operator_prompt(self, operator_name, selected_individuals,
                            current_best_sol, random_thoughts, **kwargs):
        """Customize the mutation prompt to emphasize clarity and structure."""
        if operator_name == "mutation":
            task_description = self.task.get_base_task_description()
            individual = selected_individuals[0]
            prompt = f"""# Prompt Template Optimization

{task_description}

## Current Best Template
**Accuracy:** {current_best_sol.evaluation_res.score:.2%}
**Template:** {current_best_sol.sol_string}

## Template to Mutate
**Accuracy:** {individual.evaluation_res.score:.2%}
**Template:** {individual.sol_string}

## Optimization Guidelines
Focus on improving the template by:
- Adding clear instructions
- Specifying output format explicitly
- Including relevant context or examples
- Using appropriate tone and style
- Ensuring the {{question}} placeholder is preserved

Generate an improved template that increases accuracy.

## Response Format:
name: [descriptive_name]
code:
[Your improved template with {{question}} placeholder]
thought: [reasoning for changes]
"""
            return [{"role": "user", "content": prompt}]
        # Use default for other operators
        return super().get_operator_prompt(operator_name, selected_individuals,
                                           current_best_sol, random_thoughts, **kwargs)

# Use custom interface
interface = CustomPromptInterface(task)
result = evotoolkit.solve(
    interface=interface,
    output_path='./custom_results',
    running_llm=llm_api,
    max_generations=10
)
```
Understanding Evaluation¶
Scoring Mechanism¶
- Template Testing: Each template is tested on all test cases
- LLM Response: The LLM generates answers using the template
- Answer Checking: Responses are compared to expected answers
- Accuracy Calculation: Score = (correct answers) / (total test cases)
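In pseudocode, the scoring loop looks roughly like this. The `ask_llm` callable is a stand-in for the configured API (an assumption, not the library's internal name), and the library's actual answer matching may differ; see the custom evaluation section below:

```python
from typing import Callable

def score_template(template: str, test_cases: list[dict],
                   ask_llm: Callable[[str], str]) -> float:
    """Rough sketch of accuracy scoring: fraction of correct answers."""
    correct = 0
    for case in test_cases:
        prompt = template.format(question=case["question"])
        response = ask_llm(prompt)  # one LLM call per test case
        if response.strip() == case["expected"].strip():
            correct += 1
    return correct / len(test_cases)
```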
Evaluation Output¶
```python
result = task.evaluate_code(template)

if result.valid:
    print(f"Accuracy: {result.score:.2%}")
    print(f"Correct: {result.additional_info['correct']}/{result.additional_info['total']}")
    print(f"Details: {result.additional_info['details']}")
else:
    print(f"Error: {result.additional_info['error_msg']}")
```
Mock Mode for Testing¶
Use mock mode to test without LLM API costs:
```python
# Mock mode always returns correct answers for testing
task = PromptOptimizationTask(
    test_cases=test_cases,
    use_mock=True  # No actual LLM calls
)

# Good for:
# - Testing task setup
# - Debugging template format
# - Understanding the workflow
# - Developing custom interfaces
```
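A mock task makes a cheap end-to-end smoke test possible before spending API credits. A sketch using only the documented `use_mock` flag and `evaluate_code()` call:

```python
# Smoke-test the whole pipeline with no API calls.
mock_task = PromptOptimizationTask(test_cases=test_cases, use_mock=True)
result = mock_task.evaluate_code("Answer: {question}")
assert result.valid, "task setup is broken"
print(f"Mock accuracy: {result.score:.2%}")  # mock mode returns correct answers
```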
Custom Evaluation Logic¶
For specialized tasks, you can customize answer checking:
```python
from evotoolkit.task import PromptOptimizationTask

class CustomPromptTask(PromptOptimizationTask):
    """Custom task with specialized answer checking."""

    def _check_answer(self, response: str, expected: str) -> bool:
        """Custom evaluation logic."""
        # Example: Case-insensitive comparison
        return response.strip().lower() == expected.strip().lower()

        # Example: Fuzzy matching
        # from difflib import SequenceMatcher
        # similarity = SequenceMatcher(None, response, expected).ratio()
        # return similarity > 0.8

        # Example: Regex matching
        # import re
        # return bool(re.search(expected, response))

# Use custom task
test_cases = [
    {"question": "Capital of France?", "expected": "paris"},
    # "Paris", "PARIS", "paris" all accepted
]
task = CustomPromptTask(test_cases=test_cases, llm_api=llm_api)
```
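For the arithmetic tasks earlier in this tutorial, exact string matching would reject answers like "15.0". A hedged sketch of a numeric-tolerance variant using the same `_check_answer` hook:

```python
class NumericPromptTask(PromptOptimizationTask):
    """Accepts numerically equal answers regardless of formatting."""

    def _check_answer(self, response: str, expected: str) -> bool:
        try:
            # "15", "15.0", and " 15 " all compare equal numerically.
            return abs(float(response.strip()) - float(expected.strip())) < 1e-9
        except ValueError:
            # Fall back to exact matching when the response is not a number.
            return response.strip() == expected.strip()
```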
Complete Example¶
Here's a full working example:
```python
import evotoolkit
from evotoolkit.task import PromptOptimizationTask, EvoEngineerStringInterface
from evotoolkit.tools.llm import HttpsApi

# 1. Define test cases
test_cases = [
    {"question": "What is 2+2?", "expected": "4"},
    {"question": "What is 5*3?", "expected": "15"},
    {"question": "What is 10-7?", "expected": "3"},
    {"question": "What is 12/4?", "expected": "3"},
    {"question": "What is 7+8?", "expected": "15"},
]

# 2. Configure LLM API
llm_api = HttpsApi(
    api_url="your_api_url",  # e.g., "ai.api.example.com"
    key="your_api_key",      # Your API key
    model="gpt-4o"
)

# 3. Create task
task = PromptOptimizationTask(
    test_cases=test_cases,
    llm_api=llm_api,
    use_mock=False
)

# 4. Create interface
interface = EvoEngineerStringInterface(task)

# 5. Run evolution
result = evotoolkit.solve(
    interface=interface,
    output_path='./prompt_optimization_results',
    running_llm=llm_api,
    max_generations=10,
    pop_size=5,
    max_sample_nums=20
)

# 6. Show results
print("Best template found:")
print(f"  {result.sol_string}")
print(f"Accuracy: {result.evaluation_res.score:.2%}")
print(f"Correct: {result.evaluation_res.additional_info['correct']}/{result.evaluation_res.additional_info['total']}")
```
Next Steps¶
Explore Different Optimization Strategies¶
- Try different evolutionary algorithms (EvoEngineer variants, EoH, FunSearch)
- Compare results across different interfaces
- Experiment with different test case sets
- Test on various downstream tasks
Customize and Improve Evolution¶
- Examine prompt designs in existing Interface classes
- Inherit and override Interfaces to customize prompts
- Design specialized prompts for different task types
- Develop new evolutionary algorithms if needed
Learn More¶
- Customizing Evolution Methods - Deep dive into prompt customization and algorithm development
- Advanced Usage - Advanced configuration and techniques
- API Reference - Complete API documentation
- Development Docs - Contribute new methods and features