Open Model Lab

Post-Training and Agentic Coding Behavior

How do base, SFT, and DPO variants behave inside a simple tool-using coding-agent harness?

Status

Status: planned
Month/theme: October 2026: Agent Harness / Tool Use

This page is a report scaffold. It does not contain model scores, charts, or completed run results.

Research question

How do base, SFT, and DPO variants behave inside a simple tool-using coding-agent harness?

Planned setup

Build a small tool registry and coding-agent harness.
Run base, SFT, and DPO variants on the same coding tasks.
Save replayable traces for every task attempt.
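
As a rough illustration of the setup above, the following is a minimal sketch of a tool registry plus a trace recorder. It is not the planned agents module; every name here (ToolRegistry, Trace, run_scripted, the "add" tool, the task id) is hypothetical, and the model is replaced by a scripted list of tool calls so the example runs on its own.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


class ToolRegistry:
    """Maps tool names to plain Python callables (hypothetical design)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        # Return a structured result instead of raising, so that
        # tool-use failures end up in the trace rather than crashing the run.
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "output": self._tools[name](**kwargs)}
        except Exception as exc:
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


@dataclass
class Trace:
    """One task attempt by one model variant (assumed record shape)."""

    task_id: str
    variant: str  # e.g. "base", "sft", or "dpo"
    steps: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, tool: str, args: Dict[str, Any], result: Dict[str, Any]) -> None:
        self.steps.append({"tool": tool, "args": args, "result": result})

    def to_jsonl(self) -> str:
        # One JSON object per step, with sorted keys so files diff cleanly.
        header = {"task_id": self.task_id, "variant": self.variant}
        return "\n".join(
            json.dumps({**header, "step": i, **step}, sort_keys=True)
            for i, step in enumerate(self.steps)
        )


def run_scripted(
    registry: ToolRegistry,
    trace: Trace,
    calls: List[Tuple[str, Dict[str, Any]]],
) -> Trace:
    """Stand-in for the agent loop: executes a fixed list of tool calls."""
    for tool, args in calls:
        trace.record(tool, args, registry.call(tool, **args))
    return trace


registry = ToolRegistry()
registry.register("add", lambda a, b: a + b)
trace = run_scripted(
    registry,
    Trace(task_id="task-001", variant="sft"),
    [("add", {"a": 1, "b": 2}), ("missing_tool", {})],
)
print(trace.to_jsonl())
```

Returning errors as data rather than raising is one possible design choice: it keeps every attempt replayable from the trace alone, including attempts where the model requested a tool that does not exist.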

Planned measurements

Task score, where the grader supports one.
Latency and cost, where the run infrastructure can measure them.
Output-quality notes and failure-mode labels.
Known caveats and reproducibility requirements.
Tool-use failures, retry behavior, and trace-level diagnosis.
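
One way the tool-use and retry measurements could be derived from a trace is sketched below. The step shape (a dict with "tool" and a "result" containing "ok") and the definition of a retry (a call to the same tool immediately after a failed call) are assumptions made for this example, not decisions the report has made.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_steps(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count tool failures and retries in one task attempt.

    Assumed definition: a retry is a call to the same tool that
    immediately follows a failed call; it is "recovered" if it succeeds.
    """
    failures: Counter = Counter()
    retries = 0
    recovered = 0
    prev = None
    for step in steps:
        ok = step["result"]["ok"]
        if not ok:
            failures[step["tool"]] += 1
        if prev is not None and not prev["result"]["ok"] and prev["tool"] == step["tool"]:
            retries += 1
            if ok:
                recovered += 1
        prev = step
    return {
        "tool_failures": dict(failures),
        "retries": retries,
        "recovered_retries": recovered,
        "total_steps": len(steps),
    }


# Hypothetical attempt: two failed reads, a successful retry, then a test run.
example_steps = [
    {"tool": "read_file", "result": {"ok": False}},
    {"tool": "read_file", "result": {"ok": False}},
    {"tool": "read_file", "result": {"ok": True}},
    {"tool": "run_tests", "result": {"ok": True}},
]
print(summarize_steps(example_steps))
```

Counts like these only support trace-level diagnosis; failure-mode labels would still need to be assigned by reading the traces.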

Planned sections

Research question and claim boundary
Setup, model variants, data versions, and config hashes
Eval suite or task design
Measurements and failure modes
Limitations, caveats, and next decision
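
For the config hashes mentioned in the setup section, a minimal sketch is shown here. It assumes the run config is a JSON-serializable dict and uses SHA-256 over a canonical JSON encoding; the function name and the example config keys are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config: Dict[str, Any]) -> str:
    """Hash a JSON-serializable config so key order does not change the result."""
    canonical = json.dumps(
        config, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical run config; none of these keys come from the report.
run_config = {"variant": "dpo", "max_steps": 8, "temperature": 0.0}
print(config_hash(run_config))
```

Sorting keys before hashing means two configs with the same content but different key order map to the same hash, which is what makes the hash usable as a run identifier.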

Expected artifacts

agents module.
Small coding-agent benchmark set.
Replayable traces.

Claim boundary

This report will not claim general agent capability.

Related links

Reports index
Related month page
Runs