Open Model Lab

Post-Training and Agentic Coding Behavior

How do base, SFT, and DPO variants behave inside a simple tool-using coding-agent harness?

Status

Status: Planned
Month/theme: October 2026: Agent Harness / Tool Use

This page is a report scaffold. It does not contain model scores, charts, or completed run results.

Research question

How do base, SFT, and DPO variants behave inside a simple tool-using coding-agent harness?

Planned setup

  • Build a small tool registry and coding-agent harness.
  • Run base, SFT, and DPO variants on the same coding tasks.
  • Save replayable traces for every task attempt.
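The setup above could be sketched roughly as follows. This is a minimal illustration, not the planned agents module: the names ToolRegistry, Trace, and replay, the trace field layout, and the one-JSON-object-per-attempt format are all assumptions made for this example, and replay only makes sense for deterministic tools.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry mapping tool names to plain Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._tools:
                raise ValueError(f"tool already registered: {name}")
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name: str, **kwargs: Any) -> Dict[str, Any]:
        # Failures are returned as data so they can be stored in the trace.
        if name not in self._tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": self._tools[name](**kwargs)}
        except Exception as exc:
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


@dataclass
class Trace:
    """One task attempt by one model variant (assumed field layout)."""

    task_id: str
    variant: str  # e.g. "base", "sft", or "dpo"
    steps: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, tool: str, args: Dict[str, Any], outcome: Dict[str, Any]) -> None:
        self.steps.append({"tool": tool, "args": args, "outcome": outcome})

    def to_jsonl_line(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def replay(trace_line: str, registry: ToolRegistry) -> List[bool]:
    """Re-run every recorded tool call; True where the outcome matches."""
    data = json.loads(trace_line)
    return [
        registry.call(step["tool"], **step["args"]) == step["outcome"]
        for step in data["steps"]
    ]
```

In this sketch the harness would call registry.call for each tool request a model emits, record the outcome on the Trace, and append to_jsonl_line() to a file once the attempt ends. Tool arguments and results need to be JSON-serializable for the round trip to work.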

Planned measurements

  • Task score, where the grader supports one.
  • Latency and cost where the run infrastructure can measure them.
  • Output-quality notes and failure-mode labels.
  • Known caveats and reproducibility requirements.
  • Tool-use failures, retry behavior, and trace-level diagnosis.
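As one way to make the tool-use failure and retry measurements concrete, the sketch below counts them from a recorded list of steps. The step shape (a "tool" name plus an "outcome" dict with an "ok" flag) and the definition of a retry (a call to the same tool immediately after a failed call) are assumptions for illustration, not the report's final definitions.

```python
from typing import Any, Dict, List


def summarize_steps(steps: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count tool failures and retries in one task attempt.

    Assumed definition: a retry is a call to the same tool that
    directly follows a failed call to that tool.
    """
    failures = 0
    retries = 0
    recovered = 0
    prev = None
    for step in steps:
        ok = step["outcome"]["ok"]
        is_retry = (
            prev is not None
            and not prev["outcome"]["ok"]
            and prev["tool"] == step["tool"]
        )
        if is_retry:
            retries += 1
            if ok:
                recovered += 1  # the retry fixed the earlier failure
        if not ok:
            failures += 1
        prev = step
    return {
        "tool_calls": len(steps),
        "tool_failures": failures,
        "retries": retries,
        "recovered_retries": recovered,
    }
```

Counts like these only support trace-level diagnosis. Failure-mode labels would still have to come from reading the traces themselves.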

Planned sections

  • Research question and claim boundary
  • Setup, model variants, data versions, and config hashes
  • Eval suite or task design
  • Measurements and failure modes
  • Limitations, caveats, and next decision
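The config hashes mentioned in the list above could be computed along these lines. This is a sketch under assumptions: the function name config_hash, the example config keys, and the choice of SHA-256 over canonical JSON are illustrative and not a decided format.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config: Dict[str, Any]) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the config.

    Sorting keys and fixing separators makes the digest independent of
    dict insertion order and whitespace.
    """
    canonical = json.dumps(
        config, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical run config; every key and value here is a placeholder.
example_config = {
    "variant": "dpo",
    "task_set": "coding-agent-small",
    "max_tool_calls": 20,
    "temperature": 0.0,
}
example_digest = config_hash(example_config)
```

Storing the digest next to each trace would let a reader confirm that two runs used the same settings without comparing the configs field by field.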

Expected artifacts

  • agents module.
  • Small coding-agent benchmark set.
  • Replayable traces.

Claim boundary

This report will not claim general agent capability.