# PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations

**Authors:** M. U. Tariq, A. Jangda, A. Moreira, M. Musuvathi, T. Sorensen  
**Venue:** ArXiv, 2025  
**PDF:** [peak2025.pdf](../peak2025.pdf) | **Full Markdown:** [peak2025.md](../markdown/peak2025.md)

This paper introduces Peak, an AI-powered framework for optimizing GPU kernels using natural language transformation specifications executed by LLMs.

## Key Contributions

- **Natural transformations**: Optimization strategies expressed in natural language, ranging from the general ("unroll a loop") to the specific ("tile the inner loop over the K dimension"), and easily specialized to particular kernels and hardware.
- **Modular infrastructure**: Kernel contexts, correctness validators, and performance evaluators provide rigorous checking of LLM-generated transformations.
- **Cross-backend support**: Instantiated for three backends — CUDA, HIP, and HLSL — with 16 natural transformations for matrix multiplication.
- **Competitive results**: Implementations match vendor libraries where one exists, and for HLSL, which lacks a vendor library, they match the hardware's documented FLOPS.
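To make the "natural transformation" idea concrete, here is a minimal sketch of what applying "tile the inner loop over the K dimension" to a matrix multiplication might look like. This is an illustration only, not code from the paper: it is a plain C, CPU-side analogue of the GPU transformation, and the names (`matmul_naive`, `matmul_tiled_k`, `TILE`) and the tile size are hypothetical.

```c
#include <string.h>

#define N 64
#define TILE 16  /* hypothetical tile size; Peak would let the LLM pick/tune this */

/* Baseline matmul: C = A * B */
static void matmul_naive(const float A[N][N], const float B[N][N], float C[N][N]) {
    memset(C, 0, sizeof(float) * N * N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* After applying the natural-language transformation
   "tile the inner loop over the K dimension": the k loop is split into
   an outer loop over tiles (kk) and an inner loop within each tile,
   improving reuse of the working set touched per tile. */
static void matmul_tiled_k(const float A[N][N], const float B[N][N], float C[N][N]) {
    memset(C, 0, sizeof(float) * N * N);
    for (int kk = 0; kk < N; kk += TILE)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = kk; k < kk + TILE; k++)
                    C[i][j] += A[i][k] * B[k][j];
}
```

In Peak's workflow, the LLM would emit the transformed kernel and the framework's correctness validator would check it against the baseline, roughly as the test below does by comparing both versions element-wise.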

## Summary

Peak captures the workflow of expert performance engineers by expressing iterative code optimizations as natural language specifications that LLMs execute. Unlike prior all-or-nothing automation approaches, Peak supports human-AI collaboration, interpretable iterative refinement, and extensible modular interfaces. It can be used interactively by performance engineers or driven autonomously by AI agents.
