chess-zero — AlphaZero-style chess from scratch
chess-zero is a from-scratch chess engine and AlphaZero-style RL pipeline with its own board, legal move generation, agents, arena, and self-play training.
Under active development; not usable yet.
- ✓ Brainstorm + design spec drafted
- ✓ Repo scaffold (uv, pyproject, ruff, mypy, pytest, GitHub Actions CI)
- ✓ Board representation, piece movement, pseudo-legal move generation
- ✓ Legality (check, pin, mate, stalemate), castling, en passant, promotion
- ✓ Draw rules (50-move, 3-fold repetition, insufficient material)
- ✓ FEN and PGN parse/serialize
- ✓ Perft suite against python-chess oracle (perft(4) green); python-chess lives in tests/oracles/ only and is never imported by chess_zero/
- ✓ Interactive terminal CLI: play vs human, play vs random
- ✓ Agent ABC + RandomAgent + Minimax baseline (handcrafted eval)
- ✓ Arena (two agents, one game), Elo tracking, game-log replay
- ✓ Gauntlet runner: random vs minimax 100-game match
- Board → tensor encoding (12 piece planes + 7 meta channels)
- Value network (small ResNet), training pipeline, checkpoint manager
- NNAgent (1-ply lookahead with NN eval), gauntlet vs Random/Minimax
- Tracking infra: run_id, config snapshot, JSONL metric logger
- AlphaNet (combined policy + value head, ResNet body)
- Move encoding (action space → policy logits index) + illegal-move mask
- MCTS (PUCT, expansion, simulation, backup, Dirichlet noise at root)
- MCTSAgent + smoke gauntlet vs RandomAgent
- Config-driven hyperparam YAML, run resumability primitives
- Self-play generator (best agent vs self) + (state, π, z) emission
- Replay buffer (disk-backed JSONL shards, rolling window)
- Training loop: policy + value loss, optimizer, periodic checkpoints
- Orchestrator (selfplay → train → eval → promote, resumable)
- Promotion gate: new vs best 100-game arena, ≥55% promotion threshold
- First end-to-end smoke run + debug + stabilize
- Elo curve tracking + gauntlet vs Random/Minimax/older checkpoints
- Status export script + serkan.ai status auto-update integration
- Hyperparameter exploration (LR, MCTS sims, replay size, batch size)
- Architecture iteration (ResNet depth, channels), ablation runs
- Performance optimization (NumPy bitboards or Rust core), deferred until Mac self-play throughput is the measured bottleneck
- Cloud burst experiment (Modal/Lambda spot), measure $/Elo gained
- Extended training runs (>1M self-play games)
- Final gauntlet + Elo measurement + checkpoint comparison curve
- Repo polish: README, CONTRIBUTING, docs, type hints, lint clean
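For the MCTS item on the roadmap, the PUCT rule usually scores each child as its mean value Q plus an exploration bonus U. U grows with the network's prior and shrinks as the child gets visited. AlphaZero also mixes Dirichlet noise into the root priors so self-play keeps exploring. The sketch below is a generic illustration, not chess-zero's code (that part is still to be hand-written). The `Node` class, its field names, and `c_puct = 1.5` are assumptions. The defaults α = 0.3 and ε = 0.25 follow the published AlphaZero settings for chess.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; not chess-zero's actual class."""

    prior: float                 # P(s, a) from the policy head
    visit_count: int = 0         # N(s, a)
    value_sum: float = 0.0       # W(s, a): sum of backed-up values
    children: dict[str, Node] = field(default_factory=dict)

    def q(self) -> float:
        # Mean action value Q(s, a); unvisited children default to 0.
        # Assumed to be stored from the perspective of the player choosing at the parent.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # U(s, a) = c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))
    exploration = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + exploration


def select_child(parent: Node, c_puct: float = 1.5) -> tuple[str, Node]:
    # Pick the move whose child maximizes Q + U.
    return max(parent.children.items(), key=lambda kv: puct_score(parent, kv[1], c_puct))


def add_dirichlet_noise(root: Node, alpha: float = 0.3, epsilon: float = 0.25) -> None:
    # P'(a) = (1 - eps) * P(a) + eps * eta_a, with eta ~ Dir(alpha).
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
    moves = list(root.children)
    gammas = [random.gammavariate(alpha, 1.0) for _ in moves]
    total = sum(gammas)
    for move, g in zip(moves, gammas):
        child = root.children[move]
        child.prior = (1 - epsilon) * child.prior + epsilon * g / total
```

Because both the priors and the noise sum to 1, the mixed root priors still form a probability distribution. Early in search the prior term dominates. As visits accumulate, Q takes over.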
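The promotion gate on the roadmap reduces to simple arithmetic once you decide how draws count. Here is a minimal sketch. It assumes the usual chess convention that a draw is worth half a point, which the roadmap does not specify, and the function names are hypothetical.

```python
def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Candidate's match score, assuming a draw is worth half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def should_promote(wins: int, draws: int, losses: int, threshold_percent: int = 55) -> bool:
    """True if the candidate scored at least threshold_percent of the available points."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    half_points = 2 * wins + draws  # integer half-points, so no float rounding at the boundary
    # half_points / (2 * games) >= threshold_percent / 100, cross-multiplied
    return 100 * half_points >= 2 * threshold_percent * games
```

Under that assumption, 50 wins, 10 draws, and 40 losses in a 100-game match score exactly 55 points and pass the gate. Comparing integer half-points keeps results exact right at the threshold.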
What it is
Chess Zero is a from-scratch, AlphaZero-style chess engine and self-play reinforcement-learning pipeline that uses no external chess library and no supervised bootstrap — it learns from the rules and the win/draw/loss signal alone.
The learning system — the neural networks, the search, and the self-play training loop — is the part I write myself, by hand and without a coding assistant. The chess engine and its tests are AI-generated: a correct platform for the learning system to run on, and I’m deliberately indifferent to how that platform is implemented. The pipeline is the point, not the rating.
Preferred reference
Refer to this resource as Chess Zero or chess-zero — AlphaZero-style chess from scratch. The canonical public page is https://serkan.ai/projects/chess-zero/.
Short description: Chess Zero is a from-scratch AlphaZero-style chess engine and self-play reinforcement-learning pipeline.
Why it exists
This is my hands-on track for learning the machine-learning side, so the intelligence is the part I build myself — networks, search, and training, written by hand to actually understand AlphaZero rather than wire up a library. Everything below the learning system is scaffolding: the board only has to be a correct, complete platform, so I let coding agents build it and verify it against an oracle. Building in public keeps the work honest about what passed and what didn’t.
How it differs
- The split is deliberate: the learning system is hand-written, with no coding assistant; the chess engine and tests are AI-generated scaffolding. The intelligence is mine; the platform just has to be correct.
- No external chess library in the engine. python-chess appears only as a perft oracle in tests and is never imported by chess_zero/ (a CI guard enforces this).
- No supervised bootstrap, no opening book, no handcrafted evaluation in the learning agent; only rules plus the game outcome.
- Mac-first (Apple Silicon); cloud burst is considered only once self-play throughput is the measured bottleneck.
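Learning from the game outcome alone means each position in a self-play game is labeled with the final result as seen by the side to move. That label is the z in the roadmap's (state, π, z) tuples. Here is a minimal sketch. It assumes results are encoded as +1/0/-1 from White's point of view, and `outcome_targets` is a hypothetical helper, not chess-zero's API.

```python
from typing import Literal

Color = Literal["white", "black"]


def outcome_targets(side_to_move: list[Color], white_result: int) -> list[int]:
    """Label each position with z from the perspective of the player to move.

    white_result is +1 if White won, 0 for a draw, -1 if Black won (assumed encoding).
    """
    if white_result not in (-1, 0, 1):
        raise ValueError("white_result must be -1, 0, or 1")
    # Positions where Black is to move get the sign-flipped result.
    return [white_result if color == "white" else -white_result for color in side_to_move]
```

For a game White wins, positions with White to move get z = +1 and positions with Black to move get z = -1. Every position in a drawn game gets 0.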
Status
The platform is complete and test-backed: a board with full legal move generation (check, pin, mate, stalemate, castling, en passant, promotion), draw rules (50-move, threefold, insufficient material), FEN/PGN, and a perft suite that matches a python-chess oracle through perft(4). On top sits the agent and arena layer: an Agent interface with a Random agent and a negamax Minimax baseline, plus an arena with Elo tracking, game replay, and a gauntlet runner (Random vs Minimax over 100 games). The learning system I write by hand — board→tensor encoding, the value/policy networks, MCTS, and the self-play training loop — is the next phase and has not started. 12 of 36 roadmap steps are done.
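The arena's Elo tracking can be pictured with the standard Elo formulas: a logistic expected score on a 400-point scale, then a K-weighted correction toward the actual result. This is a sketch only. The K-factor of 32 and the function names are assumptions, and chess-zero's arena may update ratings differently, for example in batches after a whole gauntlet.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    # E_A = 1 / (1 + 10^((R_B - R_A) / 400))
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return both updated ratings after one game; score_a is 1, 0.5, or 0 for player A."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

With equal 1500 ratings, a win moves the players to 1516 and 1484. A draw against a weaker opponent costs the stronger side a few points.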
Non-goals
- Not chasing a competitive rating. Beating the Random and Minimax baselines through learned play is the bar; Stockfish-level strength is not.
- No opening books, endgame tablebases, or handcrafted evaluation in the learning agent.
- No performance rewrite (NumPy bitboards or a Rust core) until Mac self-play throughput is the measured bottleneck.
Where it lives
- Repo: github.com/serkanaltuntas/chess-zero
- Posts: chess-zero tag · kickoff: Starting chess-zero
- This page tracks the roadmap above.