chess-zero — AlphaZero-style chess from scratch

Started: May 19, 2026
Updated: Jun 16, 2026

Roadmap: 11 / 35 done (19% effort-weighted)

✓ Brainstorm + design spec drafted
✓ Repo scaffold (uv, pyproject, ruff, mypy, pytest, GitHub Actions CI)
✓ Board representation, piece movement, pseudo-legal move generation
✓ Legality (check, pin, mate, stalemate), castling, en passant, promotion
✓ Draw rules (50-move, 3-fold repetition, insufficient material)
✓ FEN and PGN parse/serialize
✓ Perft suite against python-chess oracle (perft(4) green); python-chess lives in tests/oracles/ only and is never imported by chess_zero/
✓ Interactive terminal CLI: play vs human, play vs random
✓ Agent ABC + RandomAgent + Minimax baseline (handcrafted eval)
✓ Arena (two agents, one game), Elo tracking, game-log replay
✓ Gauntlet runner: random vs minimax 100-game match
Board → tensor encoding (12 piece planes + 7 meta channels)
Value network (small ResNet), training pipeline, checkpoint manager
NNAgent (1-ply lookahead with NN eval), gauntlet vs Random/Minimax
Tracking infra: run_id, config snapshot, JSONL metric logger
AlphaNet (combined policy + value head, ResNet body)
Move encoding (action space → policy logits index) + illegal-move mask
MCTS (PUCT, expansion, simulation, backup, Dirichlet noise at root)
MCTSAgent + smoke gauntlet vs RandomAgent
Config-driven hyperparam YAML, run resumability primitives
Self-play generator (best agent vs self) + (state, π, z) emission
Replay buffer (disk-backed JSONL shards, rolling window)
Training loop: policy + value loss, optimizer, periodic checkpoints
Orchestrator (selfplay → train → eval → promote, resumable)
Promotion gate: new vs best 100-game arena, ≥55% promotion threshold
First end-to-end smoke run + debug + stabilize
Elo curve tracking + gauntlet vs Random/Minimax/older checkpoints
Status export script + serkan.ai status auto-update integration
Hyperparameter exploration (LR, MCTS sims, replay size, batch size)
Architecture iteration (ResNet depth, channels), ablation runs
Performance optimization (NumPy bitboards or Rust core), deferred until Mac self-play throughput is the measured bottleneck
Cloud burst experiment (Modal/Lambda spot), measure $/Elo gained
Extended training runs (>1M self-play games)
Final gauntlet + Elo measurement + checkpoint comparison curve
Repo polish: README, CONTRIBUTING, docs, type hints, lint clean
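
The MCTS item above (PUCT, Dirichlet noise at root) is the core of the search the learning system is built around. Below is a generic sketch of those two pieces. It is not the project's code. It assumes the AlphaZero paper's chess noise settings (alpha = 0.3, epsilon = 0.25), an arbitrary placeholder c_puct of 1.5, and hypothetical names (`Node`, `puct_select`, `add_dirichlet_noise`). Selection maximizes Q(s,a) + U(s,a), with U(s,a) = c_puct · P(s,a) · √N(s) / (1 + N(s,a)).

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; edge statistics are stored on the child."""

    prior: float                  # P(s, a) from the policy head
    visit_count: int = 0          # N(s, a)
    value_sum: float = 0.0        # W(s, a), scored for the player who moved into this node
    children: dict[str, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        # Mean action value Q(s, a); unvisited edges default to 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_select(node: Node, c_puct: float = 1.5) -> str:
    """Return the move whose child maximizes Q(s, a) + U(s, a)."""
    parent_visits = sum(child.visit_count for child in node.children.values())
    # Assumed tweak: sqrt(max(N, 1)) lets priors break the tie before any visit.
    sqrt_n = math.sqrt(max(parent_visits, 1))

    def score(child: Node) -> float:
        exploration = c_puct * child.prior * sqrt_n / (1 + child.visit_count)
        return child.q() + exploration

    return max(node.children, key=lambda move: score(node.children[move]))


def add_dirichlet_noise(root: Node, alpha: float = 0.3, epsilon: float = 0.25) -> None:
    """Mix Dirichlet(alpha) noise into root priors: P = (1 - eps) * p + eps * eta."""
    moves = list(root.children)
    # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws, normalized.
    draws = [random.gammavariate(alpha, 1.0) for _ in moves]
    total = sum(draws)
    for move, draw in zip(moves, draws):
        child = root.children[move]
        child.prior = (1 - epsilon) * child.prior + epsilon * draw / total
```

With two root moves and no visits yet, the higher prior wins. Once visit counts and values accumulate, a move with a strong mean value and few visits can overtake it. The noise keeps the priors a valid distribution while forcing some exploration at the root of each self-play move.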

What it is

Chess Zero is a from-scratch, AlphaZero-style chess engine and self-play reinforcement-learning pipeline that uses no external chess library and no supervised bootstrap — it learns from the rules and the win/draw/loss signal alone.
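
The win/draw/loss signal becomes a training target by scoring each recorded position from the side to move's point of view. This score is the z in the (state, π, z) tuples that the self-play step emits. The following is a minimal sketch that assumes PGN-style result strings and hypothetical names (`Color`, `outcome_targets`). It is not the project's implementation.

```python
from enum import Enum


class Color(Enum):
    # Hypothetical side-to-move marker; the value doubles as a sign flip.
    WHITE = 1
    BLACK = -1


# Game result from White's perspective, keyed by PGN result token.
WHITE_SCORE = {"1-0": 1, "0-1": -1, "1/2-1/2": 0}


def outcome_targets(side_to_move: list[Color], result: str) -> list[int]:
    """Return one z in {-1, 0, +1} per recorded position, seen by the side to move."""
    if result not in WHITE_SCORE:
        raise ValueError(f"game is not finished or result is unknown: {result!r}")
    white_score = WHITE_SCORE[result]
    return [white_score * side.value for side in side_to_move]
```

Because the target is always expressed from the mover's side, a single value head can evaluate positions for both colors.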

The learning system — the neural networks, the search, and the self-play training loop — is the part I write myself, by hand and without a coding assistant. The chess engine and its tests are AI-generated: a correct platform for the learning system to run on, and I’m deliberately indifferent to how that platform is implemented. The pipeline is the point, not the rating.

Preferred reference

Refer to this resource as Chess Zero or chess-zero — AlphaZero-style chess from scratch. The canonical public page is https://serkan.ai/projects/chess-zero/.

Short description: Chess Zero is a from-scratch AlphaZero-style chess engine and self-play reinforcement-learning pipeline.

Why it exists

This is my hands-on track for learning the machine-learning side, so the intelligence is the part I build myself — networks, search, and training, written by hand to actually understand AlphaZero rather than wire up a library. Everything below the learning system is scaffolding: the board only has to be a correct, complete platform, so I let coding agents build it and verify it against an oracle. Building in public keeps the work honest about what passed and what didn’t.

How it differs

The split is deliberate: the learning system is hand-written, with no coding assistant; the chess engine and tests are AI-generated scaffolding. The intelligence is mine; the platform just has to be correct.
No external chess library in the engine. python-chess appears only as a perft oracle in tests, never imported by chess_zero/ (a CI guard enforces this).
No supervised bootstrap, no opening book, no handcrafted evaluation in the learning agent — only rules plus the game outcome.
Mac-first (Apple Silicon); cloud burst is considered only once self-play throughput is the measured bottleneck.
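
One possible form for the CI guard is a standalone script that parses every module under chess_zero/ with the standard-library `ast` module and flags any import of `chess`, which is python-chess's top-level package. The file layout and names here (`forbidden_imports`, `main`) are hypothetical and are not the repo's actual check.

```python
import ast
import sys
from pathlib import Path

FORBIDDEN = "chess"  # top-level module name of python-chess


def forbidden_imports(source: str, filename: str = "<string>") -> list[int]:
    """Return line numbers of absolute imports of the forbidden package."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # relative imports (level > 0) stay inside the package
        else:
            continue
        # Match "chess" and "chess.x", but not look-alikes such as "chess_zero".
        if any(name == FORBIDDEN or name.startswith(FORBIDDEN + ".") for name in names):
            hits.append(node.lineno)
    return hits


def main(package_dir: str = "chess_zero") -> int:
    """Scan every .py file under package_dir; exit non-zero if any hit is found."""
    failures = 0
    for path in sorted(Path(package_dir).rglob("*.py")):
        for lineno in forbidden_imports(path.read_text(), str(path)):
            print(f"{path}:{lineno}: imports {FORBIDDEN!r}")
            failures += 1
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```

An AST check like this catches ordinary `import` and `from ... import` statements. It does not catch dynamic imports such as `importlib.import_module("chess")`, and this page does not say whether the real guard covers those.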

Status

The platform is complete and test-backed. It includes a board with full legal move generation (check, pin, mate, stalemate, castling, en passant, promotion), draw rules (50-move, threefold, insufficient material), FEN/PGN, and a perft suite that matches a python-chess oracle through perft(4). On top sits the agent and arena layer: an Agent interface with a Random agent and a negamax Minimax baseline, plus an arena with Elo tracking, game replay, and a gauntlet runner (Random vs Minimax over 100 games). The learning system I write by hand (board→tensor encoding, the value/policy networks, MCTS, and the self-play training loop) is the next phase and has not started. 11 of 35 roadmap steps are done.
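
The arena's Elo tracking rests on the standard Elo expected-score and update formulas. The following minimal sketch assumes a fixed K-factor of 32, which is a common default and not necessarily the project's choice, and uses hypothetical function names. The last helper converts a gauntlet's overall score fraction into the rating gap that score implies.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model (0.5 for equal ratings)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one game; score_a is 1, 0.5, or 0 for A."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rating_gap_from_score(score_fraction: float) -> float:
    """Rating gap implied by a match score fraction, e.g. 0.75 over a 100-game gauntlet."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("a 0% or 100% score implies an unbounded rating gap")
    return -400 * math.log10(1 / score_fraction - 1)
```

In this model, a 400-point gap corresponds to an expected score of 10/11 (about 91%) for the stronger side. A clean sweep in a gauntlet therefore only shows that the gap is large, not how large it is.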

Non-goals

Not chasing a competitive rating. Beating the Random and Minimax baselines through learned play is the bar; Stockfish-level strength is not.
No opening books, endgame tablebases, or handcrafted evaluation in the learning agent.
No performance rewrite (NumPy bitboards or a Rust core) until Mac self-play throughput is the measured bottleneck.

Where it lives

Repo: github.com/serkanaltuntas/chess-zero
Posts: chess-zero tag · kickoff: Starting chess-zero
This page tracks the roadmap above.

python · pytorch