CANN Init (Ascend NPU Operator) Tutorial¶

Learn how to use LLM-driven evolution to generate and optimize Ascend C operator kernel code for Huawei Ascend NPUs.

Hardware Requirement

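Evaluating kernels requires Huawei Ascend NPU hardware with the CANN toolkit installed; it cannot run in standard CPU/GPU environments. Without the hardware, only fake_mode=True (which skips evaluation) is available for testing the pipeline.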

Experimental Adjacent Workflow

CANN Init is kept in EvoToolkit as an experimental adjacent workflow. It is documented for completeness, but it is not part of the primary reviewed surface for the MLOSS submission.

Complete Example Code

See the example directory for scripts:

examples/cann_init/ - Agent and evaluator scripts
README.md - Usage guide

Overview¶

This tutorial demonstrates:

Creating a CANN Init task for Ascend C operator generation
Using LLM-driven evolution to generate optimized Ascend C kernel code
Understanding the operator signature and template system
Evaluating operator correctness and performance on Ascend NPU hardware

EvoToolkit treats Ascend C operator generation as an optimization problem: given a Python reference implementation, evolve Ascend C kernel code that is both correct and performant.

Prerequisites¶

Hardware¶

Huawei Ascend NPU (tested on Ascend910B2)

Software¶

# Install CANN toolkit from Huawei (see official documentation)
# https://www.hiascend.com/software/cann

# Install EvoToolkit with CANN support
# Quotes keep shells such as zsh from interpreting the square brackets
pip install "evotoolkit[cann_init]"

This installs:

pybind11 - For Python/C++ binding generation
Other CANN-related dependencies

Understanding the CANN Init Task¶

What Does the Task Generate?¶

The task evolves Ascend C kernel code (C++ for the Ascend NPU). It takes two inputs:

An operator name (e.g., "relu", "layer_norm")
A Python reference implementation (correct but not optimized)

The LLM generates Ascend C kernel code that implements the same operation using Ascend C APIs (Data Copy, Compute, Tiling, etc.).

Template System¶

EvoToolkit automatically generates the surrounding code (host code, tiling configuration, Python bindings) from templates. The LLM only needs to provide the kernel implementation.
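
To make the split between generated and templated code concrete, here is a minimal sketch of template-based wrapping with Python's `string.Template`. The template text, the placeholder names (`op_name`, `kernel_body`), and the function `render_operator_source` are hypothetical illustrations; EvoToolkit's actual templates are not shown in this tutorial and will differ.

```python
from string import Template

# Hypothetical wrapper template (an assumption, not EvoToolkit's real template).
KERNEL_FILE_TEMPLATE = Template(
    '#include "kernel_operator.h"\n'
    "\n"
    "// ---- LLM-provided kernel for operator: $op_name ----\n"
    "$kernel_body\n"
    "// ---- end of LLM-provided kernel ----\n"
)


def render_operator_source(op_name: str, kernel_body: str) -> str:
    """Embed the LLM-provided kernel into the fixed surrounding source text."""
    return KERNEL_FILE_TEMPLATE.substitute(
        op_name=op_name,
        kernel_body=kernel_body.strip(),
    )


source = render_operator_source("relu", "class KernelRelu { /* ... */ };")
print(source)
```

Only the kernel body varies between candidates here, which mirrors the idea that the LLM supplies the kernel while everything else stays fixed.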

Evaluation¶

Each generated kernel is:

Compiled using the CANN toolkit
Tested for correctness against the Python reference
Benchmarked for performance (throughput, latency)
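
The correctness and benchmarking stages can be pictured with a small CPU-only sketch. It is an illustration under assumptions, not EvoToolkit's evaluator: the compile stage is omitted because it depends on the CANN toolchain, the candidate is an ordinary Python callable rather than a compiled kernel, and the tolerances and the names `outputs_match` and `evaluate_candidate` are hypothetical.

```python
import math
import time
from typing import Callable, Sequence

Vector = Sequence[float]


def outputs_match(actual: Vector, expected: Vector,
                  rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> bool:
    """Element-wise comparison with tolerances (tolerance values are assumptions)."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


def evaluate_candidate(candidate: Callable[[Vector], Vector],
                       reference: Callable[[Vector], Vector],
                       inputs: Vector, repeats: int = 5) -> dict:
    """Check correctness first; only time candidates that are correct."""
    if not outputs_match(candidate(inputs), reference(inputs)):
        return {"valid": False, "error": "output mismatch"}
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        candidate(inputs)
        timings.append(time.perf_counter() - start)
    return {"valid": True, "latency_s": min(timings)}


def reference_relu(xs: Vector) -> list:
    return [max(0.0, x) for x in xs]


def candidate_relu(xs: Vector) -> list:
    return [x if x > 0 else 0.0 for x in xs]


def broken_relu(xs: Vector) -> list:
    return [abs(x) for x in xs]  # wrong for negative inputs


inputs = [-2.0, -0.5, 0.0, 1.5]
good = evaluate_candidate(candidate_relu, reference_relu, inputs)
bad = evaluate_candidate(broken_relu, reference_relu, inputs)
print(good["valid"], bad["valid"])
```

Rejecting incorrect candidates before timing them keeps a fast but wrong kernel from ever receiving a performance score.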

Quick Start¶

Step 1: Define the Python Reference¶

PYTHON_REFERENCE = '''
def relu(x):
    """ReLU activation: max(0, x)"""
    import numpy as np
    return np.maximum(0, x)
'''
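
Because the reference is passed as a string, a typo in it may only surface later in the pipeline. The sketch below checks up front that the string parses and defines a top-level function with the expected name. It assumes, without confirmation from this tutorial, that the function name should match `op_name`; `defines_function` is a hypothetical helper, not part of EvoToolkit.

```python
import ast

PYTHON_REFERENCE = '''
def relu(x):
    """ReLU activation: max(0, x)"""
    import numpy as np
    return np.maximum(0, x)
'''


def defines_function(source: str, name: str) -> bool:
    """Return True if `source` parses and defines a top-level function `name`."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in tree.body
    )


print(defines_function(PYTHON_REFERENCE, "relu"))
```

The check only parses the source and never executes it, so it does not require NumPy to be installed.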

Step 2: Create the Task¶

from evotoolkit.task.cann_init import CANNInitTask

task = CANNInitTask(
    data={
        "op_name": "relu",
        "python_reference": PYTHON_REFERENCE,
        "npu_type": "Ascend910B2",   # Your NPU model
        "cann_version": "8.0",        # Your CANN version
    },
    project_path="/tmp/cann_projects",  # Directory for compiled artifacts
)

print(f"Operator: {task.task_info['op_name']}")
print(f"NPU type: {task.task_info['npu_type']}")

Step 3: Evaluate a Kernel¶

kernel_code = '''
// Ascend C ReLU kernel implementation
class KernelRelu {
public:
    __aicore__ inline void Init(GM_ADDR x, GM_ADDR y, uint32_t totalLength) {
        // ... initialization code ...
    }

    __aicore__ inline void Process() {
        // ... computation code ...
    }
};
'''

result = task.evaluate_code(kernel_code)

if result.valid:
    print(f"Score: {result.score:.4f}")
    print(f"Correctness: {result.additional_info.get('correctness')}")
    print(f"Performance: {result.additional_info.get('performance')}")
else:
    print(f"Error: {result.additional_info.get('error')}")
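
When many kernels are evaluated, it can help to reduce each result to a single log line. The following sketch relies only on the three attributes used above (`valid`, `score`, `additional_info`). The helper `summarize_result` is hypothetical, and the `SimpleNamespace` objects are stand-ins for illustration rather than real `EvaluationResult` instances.

```python
from types import SimpleNamespace


def summarize_result(result) -> str:
    """Format any object exposing `valid`, `score`, and `additional_info`."""
    info = result.additional_info or {}
    if not result.valid:
        return f"FAILED: {info.get('error', 'unknown error')}"
    return (
        f"OK: score={result.score:.4f}, "
        f"correctness={info.get('correctness')}, "
        f"performance={info.get('performance')}"
    )


# Stand-in objects; real results would come from task.evaluate_code().
ok = SimpleNamespace(
    valid=True,
    score=1.25,
    additional_info={"correctness": True, "performance": 0.8},
)
bad = SimpleNamespace(
    valid=False,
    score=None,
    additional_info={"error": "compile failed"},
)

print(summarize_result(ok))
print(summarize_result(bad))
```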

Step 4: Run Evolution¶

import evotoolkit
from evotoolkit.task.cann_init.method_interface import CANNIniterInterface
from evotoolkit.tools.llm import HttpsApi

# Create interface
interface = CANNIniterInterface(task)

# Configure LLM
llm_api = HttpsApi(
    api_url="api.openai.com",
    key="your-api-key-here",
    model="gpt-4o"
)

# Run evolution
result = evotoolkit.solve(
    interface=interface,
    output_path='./cann_results',
    running_llm=llm_api,
    max_generations=5,
    pop_size=3,
)

print("Best kernel found:")
print(result.sol_string)
print(f"Score: {result.evaluation_res.score:.4f}")
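
For intuition about what `max_generations` and `pop_size` control, here is a generic elitist evolution loop on a toy problem. It is not EvoToolkit's algorithm: the function `evolve` is hypothetical, `propose` stands in for an LLM call that rewrites a parent candidate, and `score` stands in for kernel evaluation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def evolve(seed: List[T],
           propose: Callable[[T], T],
           score: Callable[[T], float],
           max_generations: int,
           pop_size: int) -> T:
    """Keep the `pop_size` best candidates across `max_generations` rounds."""
    population = sorted(seed, key=score, reverse=True)[:pop_size]
    for _ in range(max_generations):
        # One child per surviving parent (an LLM would generate these).
        children = [propose(parent) for parent in population]
        # Parents and children compete; only the best survive.
        population = sorted(population + children, key=score, reverse=True)[:pop_size]
    return population[0]


def toy_score(x: int) -> float:
    """Toy objective: candidates closer to 10 score higher."""
    return -abs(x - 10)


seed = [0, 3]
best = evolve(seed, propose=lambda x: x + 1, score=toy_score,
              max_generations=5, pop_size=3)
print(best, toy_score(best))
```

In this toy setup, more generations let the search move further from the seed, while a larger population keeps more alternatives alive between rounds.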

`CANNInitTask` API¶

class CANNInitTask(BaseTask):
    def __init__(
        self,
        data: dict,                       # Task configuration (see below)
        project_path: str | None = None,  # Default directory for compiled artifacts
        fake_mode: bool = False,          # Skip evaluation (for testing)
    ) -> None: ...

data dictionary keys:

| Key | Required | Description |
| --- | --- | --- |
| `op_name` | Yes | Operator name (e.g., `"relu"`, `"layer_norm"`) |
| `python_reference` | Yes | Python reference implementation (string) |
| `npu_type` | No | NPU model (default: `"Ascend910B2"`) |
| `cann_version` | No | CANN version (default: `"8.0"`) |
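
The table above can be mirrored in a small pre-flight check that fails early on missing keys and fills in the documented defaults. This is a sketch of the table's rules only; `normalize_task_data` is a hypothetical helper, and EvoToolkit's own validation logic may behave differently.

```python
REQUIRED_KEYS = ("op_name", "python_reference")
# Defaults as documented in the table above.
DEFAULTS = {"npu_type": "Ascend910B2", "cann_version": "8.0"}


def normalize_task_data(data: dict) -> dict:
    """Return a copy of `data` with defaults applied; raise if required keys are missing."""
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    # User-supplied values override the defaults.
    return {**DEFAULTS, **data}


normalized = normalize_task_data(
    {"op_name": "relu", "python_reference": "def relu(x): ..."}
)
print(normalized["npu_type"], normalized["cann_version"])
```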

Key Methods:

| Method | Description |
| --- | --- |
| `evaluate_code(kernel_src)` | Evaluates a kernel code string and returns an `EvaluationResult` |
| `evaluate_solution(solution)` | Richer interface that accepts `other_info` for advanced options |

Advanced evaluate_solution options via other_info:

from evotoolkit.core import Solution

# Compile-only mode (for parallel workflows)
solution = Solution(
    sol_string=kernel_src,
    other_info={
        "project_path": "/compile/sol_001",
        "compile_only": True,
        "save_compile_to": "/compile/sol_001",
    }
)
compile_result = task.evaluate_solution(solution)

# Load pre-compiled artifact for testing
solution = Solution(
    sol_string="",
    other_info={
        "load_from": "/compile/sol_001",
    }
)
test_result = task.evaluate_solution(solution)
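
One way to use compile-only mode is to compile several candidates concurrently, each in its own directory. The sketch below builds one `other_info` dictionary per kernel using the keys shown above and dispatches the jobs through a thread pool. `compile_jobs`, `run_parallel`, and the stand-in `fake_evaluate` are hypothetical; in real use the callable would wrap `task.evaluate_solution(Solution(...))`, and whether concurrent compilation is safe on your setup is an assumption to verify.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath
from typing import Callable, List


def compile_jobs(kernels: List[str], root: str) -> List[dict]:
    """Build one compile-only job per kernel, each with its own directory."""
    jobs = []
    for index, kernel_src in enumerate(kernels, start=1):
        path = str(PurePosixPath(root) / f"sol_{index:03d}")
        jobs.append({
            "sol_string": kernel_src,
            "other_info": {
                "project_path": path,
                "compile_only": True,
                "save_compile_to": path,
            },
        })
    return jobs


def run_parallel(jobs: List[dict], evaluate: Callable[[dict], object],
                 max_workers: int = 4) -> list:
    """Apply `evaluate` to every job concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, jobs))


def fake_evaluate(job: dict) -> str:
    """Stand-in for a real compile call; returns where artifacts would be saved."""
    return job["other_info"]["save_compile_to"]


jobs = compile_jobs(["// kernel A", "// kernel B"], "/compile")
results = run_parallel(jobs, fake_evaluate)
print(results)
```

The saved directories can then be passed one at a time as `load_from` for the testing step.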

Supported Operators¶

The CANN Init task can, in principle, be applied to any operator that has a Python reference implementation:

| Category | Examples |
| --- | --- |
| Element-wise | ReLU, Sigmoid, GELU, Add, Multiply |
| Reduction | Softmax, LayerNorm, Sum, Mean |
| Matmul | GEMM, Attention (SDPA) |
| Custom | Any operator with a Python reference |

Tips for Better Results¶

Provide a clear Python reference — The LLM uses it to understand the operator semantics
Start with simple operators (element-wise) before complex ones (matmul)
Use fake_mode=True during development to test the pipeline without hardware
Check CANN documentation for available Ascend C APIs and tiling patterns

Troubleshooting¶

| Issue | Solution |
| --- | --- |
| Compilation error | Check the CANN environment variables and toolkit installation |
| Correctness failure | Review the Python reference for edge cases |
| Performance below baseline | The LLM may need domain knowledge about Ascend C tiling; consider adding it to the prompts |
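
For the compilation-error case, a quick check of the environment can rule out a missing toolkit setup. The variable names below (`ASCEND_HOME_PATH`, `ASCEND_OPP_PATH`) are assumptions based on what the CANN toolkit's environment setup script typically exports; confirm them against your own installation. `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumed variable names; verify against your CANN installation.
EXPECTED_ENV_VARS = ("ASCEND_HOME_PATH", "ASCEND_OPP_PATH")


def missing_env_vars(names: Sequence[str] = EXPECTED_ENV_VARS,
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the expected variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Unset CANN variables:", ", ".join(missing))
else:
    print("CANN environment variables look set.")
```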

Next Steps¶

Customizing Evolution Methods — Add domain knowledge to prompts
Advanced Usage — Parallel compilation and advanced workflows
API Reference — Complete API documentation