Streaming and Reasoning

StreamChunk kinds, agent-level streaming, reasoning configuration, and multimodal Message.content.

Streaming from the runner

runAgentStream and Model.stream both yield progressive output. At the runner level, stream items are trace events (kind: "token", kind: "tool", etc.) terminated by the final AgentResult:

import { runAgentStream } from "@maniac-ai/agents";

for await (const item of runAgentStream(spec, "Explain quantum tunneling.")) {
  if (
    item != null &&
    typeof item === "object" &&
    "kind" in item &&
    item.kind === "token" &&
    item.chunk_kind === "text"
  ) {
    process.stdout.write(item.delta);
  }
  if ("final" in item) {
    console.log("\nDone:", item.usage);
  }
}

Maniac.chatStream wraps the same events in StreamEnvelope:

for await (const env of app.chatStream("support", "Hello", { threadId: "t1" })) {
  if (env.type === "event" && env.event.kind === "token") {
    // render token
  }
}

`StreamChunk` kinds

At the model layer, stream() yields StreamChunk objects:

`kind`	Description
`token`	Text delta (`chunk_kind: "text"`)
`reasoning`	Extended-thinking / chain-of-thought delta
`tool_call`	Partial or complete tool call
`usage`	Incremental token usage

mergeStreamChunks folds a chunk stream into InferenceResponse, accumulating reasoning text separately in response.reasoning.

Correlation IDs

Token events optionally carry turn_id, message_id, block_id, and thread_id for UI routing. See Streaming and tracing.

Reasoning configuration

Reasoning-capable families accept a normalized ReasoningConfig on both Agent.reasoning and InferenceRequest.reasoning:

import { OpenAICompatibleModel, AnthropicModel, type Agent } from "@maniac-ai/agents";

const spec: Agent = {
  id: "researcher",
  instructions: "Be thorough.",
  model: new OpenAICompatibleModel({ slug: "gpt-5" }),
  reasoning: { effort: "high" }
};

Field	OpenAI-compatible / OpenRouter	Anthropic
`effort: "minimal" \| "low" \| "medium" \| "high"`	`reasoning_effort`	mapped to `thinking.budget_tokens` (1024 / 4096 / 10000 / 24000)
`max_tokens: number`	ignored (chat completions)	`thinking.budget_tokens` (exact)
`summary: "auto" \| "concise" \| "detailed"`	ignored (Responses API only)	no equivalent

A prepare_step hook can override per-turn by setting request.reasoning; the runner does not merge the spec default on top.

Response-side reasoning

OpenAICompatibleModel surfaces provider reasoning from:

message.reasoning / delta.reasoning — OpenRouter-normalized form
message.reasoning_content / delta.reasoning_content — DeepSeek-R1, vLLM, Qwen native form

Both populate InferenceResponse.reasoning on infer() and emit StreamChunk(kind: "reasoning") on stream().

Multimodal content

Message.content accepts string (back-compat) or ContentPart[]:

type ContentPart =
  | { type: "text"; text: string; cache_control?: { type: "ephemeral" } }
  | { type: "image"; source: ImageSource }
  | { type: "file"; ... };

Anthropic forwards cache_control for prompt caching. OpenAI-compatible adapters strip it with a warning.

Cancellation

Pass signal: AbortSignal through RunOptions or ModelCallOptions. Aborting raises RunCancelledError with the partial transcript on error.partial.