@maniac-ai/agents · the TypeScript agent SDK

The agent SDK built for the models you actually run

Most agent frameworks assume a frontier model that never makes a mistake. @maniac-ai/agents assumes the opposite — and engineers for it. It's built for small, open, and local models: the 7Bs and 14Bs on Mach, vLLM, Ollama, llama.cpp and MLX.

$ npm install @maniac-ai/agents
quickstart.ts
import { OpenAICompatibleModel, runAgent } from "@maniac-ai/agents";

const model = new OpenAICompatibleModel({ slug: "qwen2.5-7b-instruct",
  baseUrl: "http://localhost:8000/v1", requireApiKey: false });

const result = await runAgent(
  { id: "assistant", instructions: "Be concise.", model },
  "Summarize the release checks."
);

localhost

Point at any OpenAI-compatible server — vLLM, Ollama, llama.cpp, SGLang, MLX.

top-10

Only the most relevant tools enter the prompt each turn, not your whole catalog.

depth-bounded

Recursive subagents keep messy multi-step work in isolated context.

auto-repair

Malformed tool arguments and JSON are caught and retried, never silently dropped.

Point it at localhost. That's the whole setup.

Any server that speaks the OpenAI Chat Completions API is one constructor away — vLLM, Ollama, llama.cpp, MLX. Just change the base URL. Or implement the one-method Model interface for any other runtime.

agent.ts
import { OpenAICompatibleModel, runAgent } from "@maniac-ai/agents";

const model = new OpenAICompatibleModel({
  slug: "qwen2.5-7b-instruct",
  baseUrl: "http://localhost:8000/v1", // vLLM · Ollama · llama.cpp
  requireApiKey: false, // local servers don't check
});

const prompt = "Summarize the release checks.";
await runAgent({ id: "assistant", instructions: "Be concise.", model }, prompt);
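
For orientation, here is roughly what "speaks the OpenAI Chat Completions API" means on the wire. This is a hand-rolled sketch that does not use the SDK; `buildChatRequest` and its parameters are illustrative names, not part of @maniac-ai/agents.

```typescript
// Hypothetical helper (not an SDK export): builds an OpenAI-style
// Chat Completions request aimed at a local server.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Only send a bearer token when a key was actually supplied.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    // Tolerate a trailing slash on the base URL.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}

const request = buildChatRequest("http://localhost:8000/v1", "qwen2.5-7b-instruct", [
  { role: "system", content: "Be concise." },
  { role: "user", content: "Summarize the release checks." },
]);
// To send it: await fetch(request.url, request.init)
```

Swapping servers only changes the first argument; the body shape stays the same.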

Hundreds of tools, without drowning the context window

Every unused tool schema is context a small model pays for and gets distracted by. Register large tool sets as toolsets, and each turn the SDK surfaces only the handful most relevant to the query — a catalog of 200 for the cost of ~10.
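
To make the idea concrete, here is a minimal sketch of per-turn tool selection using naive term overlap. It stands in for whatever ranking the SDK actually uses; `ToolSpec`, `tokenize`, and `selectTopK` are names invented for this example.

```typescript
// Illustrative only: a naive lexical ranker, not the SDK's search.
interface ToolSpec {
  name: string;
  description: string;
}

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Score each tool by how many distinct query terms appear in its name or
// description, drop zero scores, and keep the k best.
function selectTopK(query: string, catalog: ToolSpec[], k: number): ToolSpec[] {
  const terms = tokenize(query).filter((term, i, all) => all.indexOf(term) === i);
  return catalog
    .map((tool) => {
      const words = tokenize(`${tool.name} ${tool.description}`);
      const score = terms.filter((term) => words.indexOf(term) !== -1).length;
      return { tool, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.tool);
}

const catalog: ToolSpec[] = [
  { name: "create_invoice", description: "Create a new invoice for a customer" },
  { name: "send_email", description: "Send an email message" },
  { name: "list_invoices", description: "List invoices for a customer account" },
  { name: "get_weather", description: "Look up the weather forecast" },
];

// Only these schemas would be placed in the prompt for this turn.
const visible = selectTopK("create an invoice for this customer", catalog, 2);
```

The catalog can grow without the prompt growing: the model only ever sees `k` schemas per turn.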

Recursive subagents that keep context where it belongs

A whole toolset can collapse into one { prompt } tool that spins up a nested subagent — its own fresh context, the messy work done in isolation, just the answer handed back. Recursive and bounded by max_depth, so the main transcript stays clean.
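
A rough sketch of the depth bound, with a stub planner in place of a real model. `Step`, `Planner`, and `runSubagent` are hypothetical names for illustration and say nothing about how the SDK implements `max_depth`.

```typescript
// Hypothetical sketch: none of these names come from @maniac-ai/agents.
// A step either answers directly or delegates sub-prompts to child agents.
type Step =
  | { kind: "answer"; text: string }
  | { kind: "delegate"; prompts: string[] };

type Planner = (prompt: string) => Step;

function runSubagent(prompt: string, plan: Planner, maxDepth: number, depth = 0): string {
  const step = plan(prompt);
  if (step.kind === "answer") return step.text;

  // Depth bound: refuse to spawn children once the limit is reached.
  if (depth >= maxDepth) return `[max_depth ${maxDepth} reached: "${prompt}"]`;

  // Each child runs in its own call (its own "context"); the parent only
  // ever sees the string the child returns.
  const childAnswers = step.prompts.map((sub) => runSubagent(sub, plan, maxDepth, depth + 1));
  return childAnswers.join("; ");
}

// Stub planner standing in for a model: split compound tasks, answer simple ones.
const plan: Planner = (prompt) =>
  prompt.indexOf(" and ") !== -1
    ? { kind: "delegate", prompts: prompt.split(" and ") }
    : { kind: "answer", text: `done(${prompt})` };

const answer = runSubagent("build and test", plan, 2);
```

With `maxDepth` set to 0 the same call returns the bound-reached marker instead of recursing.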

Small models break JSON. We expect it.

Malformed tool arguments surface as a structured error instead of running with empty args. Structured output is schema-validated with an automatic repair loop, and parsing tolerates quirks like markdown-fenced JSON.

repair loop
// model returns near-JSON, wrapped in a fence
```json
{ "total": 42, } // trailing comma
```

// → fence stripped, validated against output_model
rejected: "Unexpected token } in JSON"
// → model gets its own error, asked to resubmit
output_repair_attempts: 2 remaining

{ "total": 42 } // ✓ valid on retry
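
The loop above can be sketched in a few lines. This is an approximation under stated assumptions: `stripFence`, `parseWithRepair`, the type-guard validator, and the `resubmit` callback are all invented for this example, and a real setup would validate against a proper schema.

```typescript
// Illustrative sketch, not the SDK's implementation.
type Validator<T> = (value: unknown) => value is T;
type Resubmit = (previous: string, error: string) => Promise<string>;

// Three backticks, written as escapes so this snippet can sit inside a fenced block.
const FENCE = "\u0060\u0060\u0060";

// Remove a surrounding markdown fence (with optional language tag) if present.
function stripFence(raw: string): string {
  const text = raw.trim();
  const fenced =
    text.length >= 2 * FENCE.length && text.slice(0, 3) === FENCE && text.slice(-3) === FENCE;
  const firstNewline = text.indexOf("\n");
  if (!fenced || firstNewline === -1) return text;
  return text.slice(firstNewline + 1, text.length - FENCE.length).trim();
}

// Parse and validate; on failure, hand the model its own error and retry.
async function parseWithRepair<T>(
  raw: string,
  isValid: Validator<T>,
  resubmit: Resubmit,
  maxAttempts = 2,
): Promise<T> {
  let current = raw;
  for (let attempt = 0; ; attempt += 1) {
    let error = "output did not match the expected schema";
    try {
      const value: unknown = JSON.parse(stripFence(current));
      if (isValid(value)) return value;
    } catch (e) {
      error = e instanceof Error ? e.message : String(e);
    }
    if (attempt >= maxAttempts) throw new Error(`repair failed: ${error}`);
    current = await resubmit(current, error);
  }
}

interface Total {
  total: number;
}

const isTotal: Validator<Total> = (v): v is Total =>
  typeof v === "object" && v !== null && typeof (v as { total?: unknown }).total === "number";
```

Passing the parser's own error message back to the model is the design choice that matters here: a small model often fixes a trailing comma once it is told about it.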

Start small, escalate only when you must

Small models are fast and cheap until they're stuck. RetryingModel absorbs flaky servers, FallbackModel cascades 7B → 14B → hosted, and a hook can route one hard turn to a bigger model — then drop back down.

local-7B: primary · cheap + fast
  on failure (after retries)
local-14B: fallback
  on failure (after retries)
hosted frontier: last resort
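
A sketch of the cascade as plain function wrappers. `Generate`, `withRetries`, and `withFallback` are invented names that mimic the idea; they are not the SDK's `RetryingModel` or `FallbackModel`, whose real signatures are not shown on this page.

```typescript
// Hypothetical sketch of retry-then-fall-back; stub models stand in for real ones.
type Generate = (prompt: string) => Promise<string>;

// Retry one model a fixed number of extra times before giving up on it.
function withRetries(generate: Generate, retries: number): Generate {
  return async (prompt) => {
    let lastError: unknown;
    for (let attempt = 0; attempt <= retries; attempt += 1) {
      try {
        return await generate(prompt);
      } catch (error) {
        lastError = error;
      }
    }
    throw lastError;
  };
}

// Try each tier in order, moving on only when the previous tier has failed.
function withFallback(tiers: Generate[]): Generate {
  return async (prompt) => {
    let lastError: unknown = new Error("no models configured");
    for (const generate of tiers) {
      try {
        return await generate(prompt);
      } catch (error) {
        lastError = error;
      }
    }
    throw lastError;
  };
}

// Stubs: the 7B always times out, the 14B and hosted models answer.
const calls = { small: 0, medium: 0, hosted: 0 };
const local7b: Generate = async () => {
  calls.small += 1;
  throw new Error("7B server timed out");
};
const local14b: Generate = async (prompt) => {
  calls.medium += 1;
  return `14B: ${prompt}`;
};
const hosted: Generate = async (prompt) => {
  calls.hosted += 1;
  return `hosted: ${prompt}`;
};

const cascade = withFallback([withRetries(local7b, 2), withRetries(local14b, 2), hosted]);
```

In this setup the hosted tier is never called unless both local tiers exhaust their retries.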

Make a small context window go far

A compactor folds older turns into a summary, a relevance filter drops what's irrelevant, and working memory keeps facts out of context — so the model carries less each turn. All of it fails open.

Before · long transcript

LMSummarizingCompactor · middle folded, tool pairs preserved

After · fits the window

System prompt and recent turns kept verbatim; the middle becomes a summary. Fails open — an error continues the run untouched.
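
A simplified sketch of that folding step. It is not `LMSummarizingCompactor`: the summarizer here is a plain synchronous function instead of a model call, tool-call pairing is ignored, and `Turn`, `Summarize`, and `compact` are names made up for the example.

```typescript
// Illustrative sketch: keep the system prompt and the most recent turns
// verbatim, fold everything in between into one summary turn.
interface Turn {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

type Summarize = (turns: Turn[]) => string;

function compact(transcript: Turn[], keepRecent: number, summarize: Summarize): Turn[] {
  const head = transcript.filter((t) => t.role === "system");
  const rest = transcript.filter((t) => t.role !== "system");
  if (rest.length <= keepRecent) return transcript; // nothing worth folding

  const middle = rest.slice(0, rest.length - keepRecent);
  const recent = rest.slice(rest.length - keepRecent);
  try {
    // Storing the summary as an assistant turn is an arbitrary choice here.
    const summary: Turn = { role: "assistant", content: `Summary: ${summarize(middle)}` };
    return [...head, summary, ...recent];
  } catch {
    // Fail open: if summarizing throws, continue with the transcript untouched.
    return transcript;
  }
}

const transcript: Turn[] = [
  { role: "system", content: "Be concise." },
  { role: "user", content: "q1" },
  { role: "assistant", content: "a1" },
  { role: "user", content: "q2" },
  { role: "assistant", content: "a2" },
  { role: "user", content: "q3" },
];

const compacted = compact(transcript, 2, (turns) => `${turns.length} earlier turns`);
```

If the summarizer throws, `compact` returns the original array, so a broken compactor costs context but never the run.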

Hard limits and a black box, for models that surprise you

Loose sampling, runaway loops, and flaky servers come with the territory. The runtime caps them with budgets and guardrails and records every step — so a misbehaving model is bounded and debuggable, not a mystery.

Budgets

Token, cost, wall-clock, and iteration caps enforced every turn.

Guardrails

Allow, rewrite, block, or require approval on LM and tool calls.

Resume

Interrupted runs persist their transcript and seal pending tool calls.

Tracing

Every step, tool, and token mapped to OpenTelemetry spans.
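
As a rough picture of what "enforced every turn" can look like, here is a toy loop that checks its budget before each iteration. The field names and `runBounded` are assumptions for this sketch, and cost tracking is left out for brevity.

```typescript
// Hypothetical sketch of per-turn budget checks; not the SDK's types.
interface Budget {
  maxTokens: number;
  maxIterations: number;
  maxWallClockMs: number;
}

interface Usage {
  tokens: number;
  iterations: number;
  startedAtMs: number;
}

// Returns the name of the first exhausted budget, or null if the run may continue.
function exceededBudget(budget: Budget, usage: Usage, nowMs: number): string | null {
  if (usage.tokens >= budget.maxTokens) return "tokens";
  if (usage.iterations >= budget.maxIterations) return "iterations";
  if (nowMs - usage.startedAtMs >= budget.maxWallClockMs) return "wall-clock";
  return null;
}

// A toy agent loop that would run forever if nothing stopped it.
function runBounded(
  budget: Budget,
  tokensPerTurn: number,
  nowMs: () => number,
): { usage: Usage; stoppedBy: string } {
  const usage: Usage = { tokens: 0, iterations: 0, startedAtMs: nowMs() };
  for (;;) {
    const hit = exceededBudget(budget, usage, nowMs());
    if (hit !== null) return { usage, stoppedBy: hit };
    usage.tokens += tokensPerTurn; // stand-in for a real model call
    usage.iterations += 1;
  }
}

const outcome = runBounded(
  { maxTokens: 1000, maxIterations: 5, maxWallClockMs: 60_000 },
  100,
  () => Date.now(),
);
```

Because the check runs before every turn, a runaway loop stops at the first cap it reaches and reports which one.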

A different design center, not a longer feature list

The Vercel AI SDK and Mastra are excellent — for the models they assume. @maniac-ai/agents starts from a different premise: the model is small, local, and fallible. Here's where that changes the engineering.

Where you run
  Others: Hosted frontier APIs first
  @maniac-ai/agents: Point at localhost, any OpenAI-compatible server, no key required

Large tool catalogs
  Others: All schemas in the prompt; fine for big models
  @maniac-ai/agents: Per-turn lexical search surfaces only the top-k relevant tools

Multi-step subtasks
  Others: Inlined into one growing context
  @maniac-ai/agents: Delegated to isolated, depth-bounded subagents; only the answer returns

Malformed tool args
  Others: Assumed rare; often coerced silently
  @maniac-ai/agents: Surfaced as an error, handler skipped, model recovers

Structured output
  Others: Assumed valid
  @maniac-ai/agents: Schema-validated repair loop + fence-tolerant parsing

Reliability
  Others: One strong model
  @maniac-ai/agents: Retry + small→large fallback + per-turn escalation

Cost & runaway control
  Others: Light
  @maniac-ai/agents: Token, cost, wall-clock, and iteration budgets enforced

Build agents on the models you own

@maniac-ai/agents is the runtime inside the Maniac app — or install it standalone and point it at your own local server.

$ npm install @maniac-ai/agents