Control Box2D (Lunar Lander) Tutorial¶
Learn how to use LLM-driven evolution to discover interpretable control policies for the Gymnasium LunarLander-v3 environment.
Complete Example Code
This tutorial provides complete, runnable examples (click to view/download):
- basic_example.py - Basic usage example
- README.md - Examples documentation and usage guide
Run locally:
Overview¶
This tutorial demonstrates:
- Creating a LunarLander control task
- Evolving interpretable Python control policies using LLM-driven evolution
- Understanding the
policy(state) -> actionfunction interface - Evaluating policies in the Gymnasium environment
- Running evolution to discover effective landing strategies
Unlike neural network controllers, EvoToolkit evolves human-readable Python code, enabling inspection, understanding, and formal verification of the resulting policies.
Installation¶
This installs:
gymnasium[box2d]- Gymnasium environment with Box2D physicsbox2d-py- Box2D physics engine
Prerequisites:
- Python >= 3.10
- LLM API access (OpenAI, Claude, or other compatible providers)
Understanding the LunarLander Task¶
What Does the Task Evolve?¶
The task evolves a policy function that maps an 8-dimensional state observation to one of 4 discrete actions:
| State Index | Meaning |
|---|---|
| 0 | X position |
| 1 | Y position |
| 2 | X velocity |
| 3 | Y velocity |
| 4 | Angle |
| 5 | Angular velocity |
| 6 | Left leg contact (bool) |
| 7 | Right leg contact (bool) |
| Action | Meaning |
|---|---|
| 0 | Do nothing |
| 1 | Fire left engine |
| 2 | Fire main engine |
| 3 | Fire right engine |
Evaluation¶
Each policy is evaluated across multiple episodes. The score is the average reward per episode:
- Perfect landing: ~200 points
- Crash: -100 points
- Fuel efficiency: -0.3 per frame of engine use
Quick Start¶
Step 1: Create the Task¶
from evotoolkit.task.python_task.control_box2d import LunarLanderTask
task = LunarLanderTask(
num_episodes=5, # Episodes per evaluation
max_steps=1000, # Max steps per episode
render_mode=None, # Set to "human" to watch
seed=42, # For reproducibility
timeout_seconds=60.0,
)
print(f"Environment: {task.task_info['env_name']}")
print(f"State dimensions: {task.task_info['state_dim']}")
print(f"Action dimensions: {task.task_info['action_dim']}")
Step 2: Test the Baseline Policy¶
# Get the built-in baseline policy
init_sol = task.make_init_sol_wo_other_info()
result = task.evaluate_code(init_sol.sol_string)
print(f"Baseline score: {result.score:.2f}")
if result.valid:
print(f"Avg reward: {result.additional_info['avg_reward']:.2f}")
print(f"Success rate: {result.additional_info['success_rate']:.1%}")
Step 3: Run Evolution¶
import evotoolkit
from evotoolkit.task.python_task.control_box2d import EvoEngineerControlInterface
from evotoolkit.tools.llm import HttpsApi
# Create control-specific interface
interface = EvoEngineerControlInterface(task)
# Configure LLM
llm_api = HttpsApi(
api_url="api.openai.com",
key="your-api-key-here",
model="gpt-4o"
)
# Run evolution
result = evotoolkit.solve(
interface=interface,
output_path='./lunar_lander_results',
running_llm=llm_api,
max_generations=10,
pop_size=5,
max_sample_nums=20
)
print(f"Best policy found:")
print(result.sol_string)
print(f"Score: {result.evaluation_res.score:.2f}")
Policy Function Interface¶
The evolved function must have this exact signature:
def policy(state: list) -> int:
"""
Control policy for LunarLander-v3.
Args:
state: 8-dimensional observation:
[x_pos, y_pos, x_vel, y_vel, angle, angular_vel,
left_contact, right_contact]
Returns:
action: Integer in {0, 1, 2, 3}
0 = do nothing
1 = fire left engine
2 = fire main engine (upward thrust)
3 = fire right engine
"""
x, y, vx, vy, angle, angular_vel, left_contact, right_contact = state
# Your control logic here
return 0
Example: Simple Heuristic Policy¶
def policy(state):
x, y, vx, vy, angle, angular_vel, left_contact, right_contact = state
# Fire main engine if falling fast
if vy < -1.0:
return 2
# Correct angle
if angle > 0.2:
return 3 # Fire right to tilt left
elif angle < -0.2:
return 1 # Fire left to tilt right
# Hover and descend slowly
if vy < -0.5:
return 2
return 0
Available Interfaces¶
| Interface | Algorithm | Description |
|---|---|---|
EvoEngineerControlInterface |
EvoEngineer | Recommended — uses control-specific prompts |
EvoEngineerPythonInterface |
EvoEngineer | Generic Python interface |
EoHPythonInterface |
EoH | Heuristic evolution |
FunSearchPythonInterface |
FunSearch | Function search |
# Import options
from evotoolkit.task.python_task.control_box2d import EvoEngineerControlInterface
from evotoolkit.task.python_task import EvoEngineerPythonInterface, EoHPythonInterface
LunarLanderTask API¶
class LunarLanderTask(PythonTask):
def __init__(
self,
num_episodes: int = 10, # Episodes per evaluation
max_steps: int = 1000, # Max steps per episode
render_mode: str | None = None, # "human" to visualize
use_mock: bool = False, # Return random score (for testing)
seed: int | None = None, # Random seed
timeout_seconds: float = 60.0, # Execution timeout
)
Key Methods:
| Method | Description |
|---|---|
evaluate_code(code_str) |
Evaluate a policy string, returns EvaluationResult |
make_init_sol_wo_other_info() |
Get the built-in baseline policy as a Solution |
EvaluationResult.additional_info keys:
| Key | Description |
|---|---|
avg_reward |
Average reward over episodes |
std_reward |
Standard deviation of rewards |
success_rate |
Fraction of episodes with reward > 200 |
min_reward |
Minimum episode reward |
max_reward |
Maximum episode reward |
Tips for Better Results¶
- Use more episodes for stable evaluation:
num_episodes=10or more - Set a seed for reproducibility during development
- Use
EvoEngineerControlInterface— it includes control-specific domain knowledge in prompts - Increase
max_generationsfor harder tasks
Next Steps¶
- Customizing Evolution Methods — Modify prompts for better control policies
- Advanced Usage — Low-level API and configuration
- API Reference — Complete API documentation