Building Fresh
chess-zero — AlphaZero-style chess from scratch
A chess engine and AlphaZero-style self-play RL pipeline built from scratch — own board, no supervised bootstrap, only rules and the win/draw/loss signal. The pipeline is the point, not the rating.
Actively developing. Not usable yet.
Roadmap
2 / 35 · 1% · effort-weighted
- ✓ Brainstorm + design spec drafted
- ✓ Repo scaffold (uv, pyproject, ruff, mypy, pytest, GitHub Actions CI)
- Board representation, piece movement, pseudo-legal move generation
- Legality (check, pin, mate, stalemate), castling, en passant, promotion
- Draw rules (50-move, 3-fold repetition, insufficient material)
- FEN and PGN parse/serialize
- Perft suite against python-chess oracle (perft(4) green) — python-chess lives in tests/oracles/ only; never imported by chess_zero/
- Interactive terminal CLI: play vs human, play vs random
- Agent ABC + RandomAgent + Minimax baseline (handcrafted eval)
- Arena (two agents, one game), Elo tracking, game-log replay
- Gauntlet runner: random vs minimax 100-game match
- Board → tensor encoding (12 piece planes + 7 meta channels)
- Value network (small ResNet), training pipeline, checkpoint manager
- NNAgent (1-ply lookahead with NN eval), gauntlet vs Random/Minimax
- Tracking infra: run_id, config snapshot, JSONL metric logger
- AlphaNet (combined policy + value head, ResNet body)
- Move encoding (action space → policy logits index) + illegal-move mask
- MCTS (PUCT, expansion, simulation, backup, Dirichlet noise at root)
- MCTSAgent + smoke gauntlet vs RandomAgent
- Config-driven hyperparam YAML, run resumability primitives
- Self-play generator (best agent vs self) + (state, π, z) emission
- Replay buffer (disk-backed JSONL shards, rolling window)
- Training loop: policy + value loss, optimizer, periodic checkpoints
- Orchestrator (selfplay → train → eval → promote, resumable)
- Promotion gate: new vs best 100-game arena, ≥55% promotion threshold
- First end-to-end smoke run + debug + stabilize
- Elo curve tracking + gauntlet vs Random/Minimax/older checkpoints
- Status export script + serkan.ai status auto-update integration
- Hyperparameter exploration (LR, MCTS sims, replay size, batch size)
- Architecture iteration (ResNet depth, channels), ablation runs
- Performance optimization (NumPy bitboards or Rust core) — decided at the point Mac self-play throughput is the bottleneck, not before
- Cloud burst experiment (Modal/Lambda spot), measure $/Elo gained
- Extended training runs (>1M self-play games)
- Final gauntlet + Elo measurement + checkpoint comparison curve
- Repo polish: README, CONTRIBUTING, docs, type hints, lint clean
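As an illustration of the promotion gate item above (new vs best over a 100-game arena, promote at ≥55%), here is a minimal sketch. The names `MatchResult` and `should_promote` are hypothetical and not taken from the repo, and counting a draw as half a point is an assumption; the roadmap only states the game count and the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    """Arena match outcome from the candidate's point of view (hypothetical type)."""

    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def score(self) -> float:
        # Assumption: a draw is worth half a point, as in standard chess scoring.
        if self.games == 0:
            raise ValueError("no games played")
        return (self.wins + 0.5 * self.draws) / self.games


def should_promote(result: MatchResult, threshold: float = 0.55) -> bool:
    """Return True when the candidate's score fraction meets the threshold."""
    return result.score >= threshold


if __name__ == "__main__":
    # 50 wins + 10 draws over 100 games is a score of 0.55, which meets the gate.
    print(should_promote(MatchResult(wins=50, draws=10, losses=40)))
```

Under this scoring, a candidate that draws every game scores 0.50 and is not promoted. A stricter gate could count wins only, which would be a different design choice from the one sketched here.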
What this is
chess-zero is a chess-playing machine learning engine in the AlphaZero style. I am building it in public as part of my ML learning track. I will write my own board implementation instead of relying on an existing chess library, and I will use AI coding tools for the non-ML scaffolding of the project.
Why this shape
Where it lives
- Repo: github.com/serkanaltuntas/chess-zero
- Posts: chess-zero tag
- This page tracks the roadmap.
Non-goals
Status
python · pytorch