Code evals
Deterministic scoring with custom Python: validate that your task-specialized models meet production requirements.
Code evals let you define a scoring function in Python that returns either a pass/fail result or a numeric score, and use that function as the optimization objective. They gate every optimization run: a candidate model replaces your current frontier baseline only if it passes.
Use code evals when you need deterministic checks (schema validation, exact match, unit tests, custom business rules) or when you already have a reliable evaluation function for your domain-specific task.
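As a rough illustration of the two scoring styles, here is a minimal sketch of a pass/fail schema check and a numeric exact-match score. The function names, their signatures, and the `REQUIRED_FIELDS` schema are assumptions made for this example; they are not the platform's actual interface, so consult the hosted docs for how a scoring function is expected to be registered.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"category": str, "tags": list}


def score_schema(output: str) -> bool:
    """Pass/fail check: output must be a JSON object with the required typed fields."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    if not isinstance(data, dict):
        return False  # valid JSON, but not an object
    return all(
        isinstance(data.get(name), expected_type)
        for name, expected_type in REQUIRED_FIELDS.items()
    )


def _normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so trivial differences are ignored."""
    return " ".join(text.split()).lower()


def score_exact_match(output: str, expected: str) -> float:
    """Numeric score: 1.0 if the normalized strings are equal, otherwise 0.0."""
    return 1.0 if _normalize(output) == _normalize(expected) else 0.0
```

Both functions are deterministic: the same model output always yields the same score, which is what makes them usable as a gate.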
For implementation details and examples, see the hosted docs at
https://docs.maniac.ai/evaluations/creating-evaluations.