
Streaming and Reasoning

StreamChunk kinds, agent-level streaming, reasoning configuration, and multimodal Message.content.

Streaming from the runner

runAgentStream and Model.stream both yield progressive output. At the runner level, stream items are trace events (kind: "token", kind: "tool", etc.) terminated by the final AgentResult:

import { runAgentStream } from "@maniac-ai/agents";

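for await (const item of runAgentStream(spec, "Explain quantum tunneling.")) {
  // Skip anything that is not an object so the `in` checks below cannot throw.
  if (item == null || typeof item !== "object") continue;

  // Token events carrying text deltas: print them as they arrive.
  if ("kind" in item && item.kind === "token" && item.chunk_kind === "text") {
    process.stdout.write(item.delta);
  }
  // The stream is terminated by the final AgentResult.
  if ("final" in item) {
    console.log("\nDone:", item.usage);
  }
}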

Maniac.chatStream wraps the same events in StreamEnvelope:

for await (const env of app.chatStream("support", "Hello", { threadId: "t1" })) {
  if (env.type === "event" && env.event.kind === "token") {
    // render token
  }
}

StreamChunk kinds

At the model layer, stream() yields StreamChunk objects:

kind         Description
token        Text delta (chunk_kind: "text")
reasoning    Extended-thinking / chain-of-thought delta
tool_call    Partial or complete tool call
usage        Incremental token usage

mergeStreamChunks folds a chunk stream into InferenceResponse, accumulating reasoning text separately in response.reasoning.
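
To make the folding step concrete, here is a minimal sketch of the idea, not the library's implementation. The Chunk and Folded shapes and the foldChunks name are hypothetical stand-ins for StreamChunk, InferenceResponse, and mergeStreamChunks, and the sketch assumes usage chunks are increments that can be summed.

```typescript
// Hypothetical local shapes for illustration only; the real StreamChunk and
// InferenceResponse types in @maniac-ai/agents may carry more fields.
type Chunk =
  | { kind: "token"; delta: string }
  | { kind: "reasoning"; delta: string }
  | { kind: "usage"; input_tokens: number; output_tokens: number };

interface Folded {
  text: string;
  reasoning: string;
  usage: { input_tokens: number; output_tokens: number };
}

// Fold a sequence of chunks into one response-like object, keeping
// reasoning text separate from the visible answer text.
function foldChunks(chunks: Iterable<Chunk>): Folded {
  const out: Folded = {
    text: "",
    reasoning: "",
    usage: { input_tokens: 0, output_tokens: 0 },
  };
  for (const chunk of chunks) {
    switch (chunk.kind) {
      case "token":
        out.text += chunk.delta;
        break;
      case "reasoning":
        out.reasoning += chunk.delta;
        break;
      case "usage":
        // Assumption: usage chunks are increments, so they are summed.
        out.usage.input_tokens += chunk.input_tokens;
        out.usage.output_tokens += chunk.output_tokens;
        break;
    }
  }
  return out;
}
```

Keeping reasoning in its own accumulator means a UI can show or hide the model's thinking without re-parsing the answer text.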

Correlation IDs

Token events optionally carry turn_id, message_id, block_id, and thread_id for UI routing. See Streaming and tracing.
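
One way a UI might use these IDs is to keep a separate text buffer per message block. The sketch below assumes a simplified TokenEvent shape containing only the fields named above; the routeTokens helper and the "unknown" / "default" fallback keys are illustrative and not part of the library.

```typescript
// Hypothetical token-event shape; only the fields named in the text are modelled.
interface TokenEvent {
  kind: "token";
  delta: string;
  turn_id?: string;
  message_id?: string;
  block_id?: string;
  thread_id?: string;
}

// Accumulate deltas into one buffer per (message_id, block_id) pair so that
// each UI block can be rendered independently.
function routeTokens(events: TokenEvent[]): Map<string, string> {
  const buffers = new Map<string, string>();
  for (const ev of events) {
    // The IDs are optional, so fall back to a shared bucket when they are missing.
    const key = `${ev.message_id ?? "unknown"}/${ev.block_id ?? "default"}`;
    buffers.set(key, (buffers.get(key) ?? "") + ev.delta);
  }
  return buffers;
}
```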

Reasoning configuration

Reasoning-capable families accept a normalized ReasoningConfig on both Agent.reasoning and InferenceRequest.reasoning:

import { OpenAICompatibleModel, AnthropicModel, type Agent } from "@maniac-ai/agents";

const spec: Agent = {
  id: "researcher",
  instructions: "Be thorough.",
  model: new OpenAICompatibleModel({ slug: "gpt-5" }),
  reasoning: { effort: "high" }
};

Field                                            OpenAI-compatible / OpenRouter    Anthropic
effort: "minimal" | "low" | "medium" | "high"    reasoning_effort                  mapped to thinking.budget_tokens (1024 / 4096 / 10000 / 24000)
max_tokens: number                               ignored (chat completions)        thinking.budget_tokens (exact)
summary: "auto" | "concise" | "detailed"         ignored (Responses API only)      no equivalent
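
The Anthropic column can be read as a small lookup. The sketch below is an assumption-laden illustration rather than the adapter's code: it assumes the four budgets map to the four effort levels in the order listed, and that an explicit max_tokens wins when both fields are set. The anthropicBudgetTokens name is hypothetical.

```typescript
type Effort = "minimal" | "low" | "medium" | "high";

// Local copy of the normalized config fields described in the table.
interface ReasoningConfig {
  effort?: Effort;
  max_tokens?: number;
  summary?: "auto" | "concise" | "detailed";
}

// Assumption: budgets pair with effort levels in the order given in the table.
const EFFORT_BUDGET: Record<Effort, number> = {
  minimal: 1024,
  low: 4096,
  medium: 10000,
  high: 24000,
};

// Resolve the value that would be sent as thinking.budget_tokens.
function anthropicBudgetTokens(cfg: ReasoningConfig): number | undefined {
  // Assumption: an exact max_tokens takes precedence over effort.
  if (cfg.max_tokens !== undefined) return cfg.max_tokens;
  if (cfg.effort !== undefined) return EFFORT_BUDGET[cfg.effort];
  return undefined; // no reasoning requested
}
```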

A prepare_step hook can override the reasoning configuration for a single turn by setting request.reasoning. The runner uses that value as-is and does not merge the spec default on top of it.
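
The no-merge behaviour is easiest to see as a plain function. This is a sketch of the precedence rule only; effectiveReasoning is a hypothetical name, and the real runner and hook signatures are not shown here.

```typescript
interface ReasoningConfig {
  effort?: "minimal" | "low" | "medium" | "high";
  max_tokens?: number;
  summary?: "auto" | "concise" | "detailed";
}

// If the hook set request.reasoning, use it wholesale; otherwise fall back
// to the spec default. Fields are never combined across the two.
function effectiveReasoning(
  specDefault: ReasoningConfig | undefined,
  requestOverride: ReasoningConfig | undefined,
): ReasoningConfig | undefined {
  return requestOverride ?? specDefault;
}
```

With a spec default of { effort: "high" } and a per-turn override of { max_tokens: 2000 }, the result carries only max_tokens; the default effort is dropped rather than merged in.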

Response-side reasoning

OpenAICompatibleModel surfaces provider reasoning from:

  • message.reasoning / delta.reasoning — OpenRouter-normalized form
  • message.reasoning_content / delta.reasoning_content — DeepSeek-R1, vLLM, Qwen native form

Both populate InferenceResponse.reasoning on infer() and emit StreamChunk(kind: "reasoning") on stream().
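
A rough sketch of how the two field names might be normalized follows. The ProviderMessage shape and the extractReasoning helper are hypothetical. The text above does not say which field wins when a provider sends both, so preferring reasoning here is an assumption.

```typescript
// Hypothetical slice of a provider message (or streaming delta).
interface ProviderMessage {
  content?: string | null;
  reasoning?: string | null;          // OpenRouter-normalized form
  reasoning_content?: string | null;  // DeepSeek-R1, vLLM, Qwen native form
}

// Return the first non-empty reasoning string, or undefined when the
// provider sent none.
function extractReasoning(msg: ProviderMessage): string | undefined {
  // Assumption: `reasoning` is preferred when both fields are present.
  const value = msg.reasoning || msg.reasoning_content;
  return value || undefined;
}
```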

Multimodal content

Message.content accepts string (back-compat) or ContentPart[]:

type ContentPart =
  | { type: "text"; text: string; cache_control?: { type: "ephemeral" } }
  | { type: "image"; source: ImageSource }
  | { type: "file"; ... };

Anthropic forwards cache_control for prompt caching. OpenAI-compatible adapters strip it with a warning.
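
The strip-with-warning behaviour could look roughly like the sketch below. It is illustrative only: the Part type is a trimmed local copy of ContentPart (the file variant is left out because its fields are elided above), and stripCacheControl and its warn parameter are hypothetical names.

```typescript
type TextPart = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};
// Trimmed local stand-in for ContentPart; image sources are left opaque.
type Part = TextPart | { type: "image"; source: unknown };

// Return a copy of the parts without cache_control, warning once if any
// hint was dropped. The input array is not mutated.
function stripCacheControl(
  parts: Part[],
  warn: (msg: string) => void = console.warn,
): Part[] {
  let stripped = 0;
  const cleaned = parts.map((part): Part => {
    if (part.type === "text" && part.cache_control !== undefined) {
      stripped += 1;
      const { cache_control: _dropped, ...rest } = part;
      return rest;
    }
    return part;
  });
  if (stripped > 0) {
    warn(`cache_control removed from ${stripped} content part(s)`);
  }
  return cleaned;
}
```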

Cancellation

Pass signal: AbortSignal through RunOptions or ModelCallOptions. Aborting raises RunCancelledError with the partial transcript on error.partial.
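
The pattern can be tried without the library using the standard AbortController. In the sketch below, fakeStream and CancelledError are hypothetical stand-ins for a streaming call and RunCancelledError; only the signal plumbing and the error.partial idea are taken from the text above.

```typescript
// Stand-in for RunCancelledError: carries whatever was produced before the abort.
class CancelledError extends Error {
  readonly partial: string;
  constructor(partial: string) {
    super("run cancelled");
    this.name = "CancelledError";
    this.partial = partial;
  }
}

// Stand-in for a streaming call that checks the signal before each item.
async function* fakeStream(
  words: string[],
  signal: AbortSignal,
): AsyncGenerator<string> {
  let partial = "";
  for (const word of words) {
    if (signal.aborted) throw new CancelledError(partial);
    partial += word;
    yield word;
  }
}

// Consume the stream and abort after `limit` items have been seen.
async function consume(limit: number): Promise<string> {
  const controller = new AbortController();
  let seen = 0;
  try {
    for await (const _word of fakeStream(["a", "b", "c", "d"], controller.signal)) {
      seen += 1;
      if (seen === limit) controller.abort();
    }
    return "completed";
  } catch (err) {
    if (err instanceof CancelledError) {
      return `cancelled after: ${err.partial}`;
    }
    throw err;
  }
}
```

Here consume(2) resolves to "cancelled after: ab", and consume(10) resolves to "completed" because the limit is never reached.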
