> This is the markdown version of https://www.maniac.ai/adaptive-inference. Visit the full page for interactive content.


\[ MANIAC \]

# rl as a service  
inference api

Reliable AI, out of the box.

RL inference as a swap-in model endpoint: reliability at massive scale, with no prompt engineering to manage.

Get Access · [Agent Docs](https://docs.maniac.ai/agent-setup/agent-setup)

- **0.2** hallucinations per 1K samples
- **20%** avg accuracy gain over GPT 5.2
- **60x** avg cost savings at 1M req/day

[Why RL inference](#production-gap) · [How it works](#how-it-works)

```python
# pip install maniac
from maniac import Maniac

client = Maniac(api_key="your-maniac-api-key")

container = client.containers.create(
    label="my-container",
    initial_model="openai/gpt-5.2",
)

response = client.chat.completions.create(
    model="my-container",
    messages=[
        {"role": "user", "content": "Hello!"},
    ],
)
```

\[ THE PRODUCTION GAP \]

## Big labs aren't designed to give you  
reliability at scale.

Intelligence isn't the same thing as reliability. We've seen it on repeat: enterprise AI pilots that don't make it to production. The lifecycle looks like this:

The enterprise AI journey: ~70% of pilots stall here.

01

### Demo looks great

92% accuracy with GPT 5.2. Pilot approved.

02

### Try to scale it

Cutting-edge models are too expensive to run 100,000 times a day. Team tweaks prompts, switches models.

03

### Babysitting the model

Model ignores explicit prompt instructions. Have to add guardrails, figure out post-training. Customers change, model changes, start over.

04

### Lower the bar

The model still fails too often at what customers need, so the task gets simplified until it's no longer all that useful.

With Maniac

Skip steps 2–4. Maniac bakes reliability into the model endpoint, so the model you call is actually usable in production.

Your app → Maniac: the same OpenAI SDK call, backed by three layers behind the endpoint:

- **Model Optimization**: post-training, distillation, SLMs tuned to your domain
- **Runtime Reliability**: corrections, constraints, retries, schema enforcement
- **Continuous Loop**: traffic → eval → optimize → promote → auto-update

Production-grade reliability, baked into the endpoint:

- **99.9%** schema compliance
- **Drop-in** OpenAI compatible
- **<400ms** p50 latency
- **10M+** calls / day
\[ WHAT YOU STOP BUILDING \]

## Getting an LLM to work  
requires far more than calling a model.

You end up building layers for reliability, formatting, retries, and constraint enforcement. Maniac bakes all of that into the model endpoint.

- **Prompt engineering**: iterating on prompts to get the model to follow instructions. *Handled by Maniac.*
- **Retries & fallbacks**: handling failures, timeouts, and switching between models. *Handled by Maniac.*
- **Output parsing**: extracting structured data from free-text model responses. *Handled by Maniac.*
- **Schema enforcement**: validating outputs match your expected format. *Handled by Maniac.*
- **Eval loops**: building test suites to measure quality over time. *Handled by Maniac.*
- **Fine-tuning**: training custom models when prompting isn't enough. *Handled by Maniac.*
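For a sense of scale, here is the kind of hand-rolled glue these layers replace: a hypothetical retry-plus-schema-check wrapper (the `call_model` callable and `required_keys` schema are illustrative stand-ins, not Maniac APIs):

```python
import json

def call_with_retries(call_model, prompt, required_keys, max_attempts=3):
    """The glue you otherwise maintain yourself: parse, validate, retry."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)              # any chat-completions call
        try:
            data = json.loads(raw)            # output parsing
        except json.JSONDecodeError as e:
            last_error = e
            continue                          # retry on malformed output
        missing = [k for k in required_keys if k not in data]
        if missing:                           # schema enforcement
            last_error = KeyError(f"missing keys: {missing}")
            continue
        return data
    raise RuntimeError(f"model failed after {max_attempts} attempts: {last_error}")
```

Multiply this by every endpoint, every model swap, and every schema change, and it becomes a codebase of its own.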

\[ THREE THINGS, ONE ENDPOINT \]

## Maniac combines what's usually separate.

Model optimization (post-training, SLMs) + Runtime reliability (corrections, constraints, retries) + Simple API surface (drop-in replacement).

| Platform | What they do | What's missing |
| --- | --- | --- |
| OpenAI / Anthropic | Strong base model | No reliability layer |
| Open-source models | Control + cost | You build everything else |
| Guardrails tools | Output constraints | No model optimization |
| Observability | Debugging & logging | Doesn't fix the model |
| **Maniac** | Model optimization + Runtime reliability + Drop-in API | Nothing: all three in one endpoint |

\[ HOW IT WORKS \]

## From first call to  
production-grade model.

Start logging. Maniac handles the rest: post-training, evals, and routing, so you swap to a better model when you're ready, not when you've built an ML platform.

01

### Add the hooks

Point your existing calls at Maniac. Same OpenAI-compatible API: your code doesn't change. Maniac records every input and output, learning what kind of task you have.

[Setup guide →](https://docs.maniac.ai/agent-setup/agent-setup)

```python
# agent.py
# Add the Maniac hook (2 lines of code).
from maniac import Maniac

client = Maniac()

# Use the same OpenAI-style call.
# Maniac records every input + output.
response = client.chat.completions.create(
    model="my-extraction-agent",
    messages=[{"role": "user", "content": prompt}],
)
```

02

### We post-train in the background

After 1,000 logged completions, Maniac automatically kicks off cutting-edge post-training (fine-tuning, distillation, and optimization), all in the background. Your production traffic is unaffected.

```
[maniac] Container: extraction-agent
[maniac] Logged completions: 1,247; threshold reached
[maniac] ▶ Starting post-training pipeline
[maniac]   Training candidate: qwen3-14b-lora-r64
[maniac]   Dataset: 1,247 curated pairs
[maniac]   Estimated time: ~45 min

# Runs in the background.
# Your production traffic is unaffected.
```

03

### See evals vs. frontier models

Maniac evaluates your tuned model against GPT 5.2, Claude Opus 4.6, and other frontier models on your actual data. You see accuracy, cost, and latency side by side.

```
[maniac] Eval complete: results vs frontier:
         GPT 5.2          82.1% accuracy
         Claude Opus 4.6  84.7% accuracy
         Maniac tuned     97.3% accuracy
[maniac] Cost: $0.20/1M tokens vs $15.00 frontier
[maniac] Latency: 340ms p50
```
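A sketch of what "evals on your actual data" boils down to: exact-match accuracy per model over the same labeled set (illustrative code, not Maniac's eval harness):

```python
def accuracy(predictions, labels):
    """Fraction of outputs that exactly match the expected answer."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def eval_side_by_side(model_outputs, labels):
    """model_outputs: {model_name: [prediction, ...]} on the same eval set."""
    return {name: accuracy(preds, labels)
            for name, preds in model_outputs.items()}
```

Because every model is scored on the same labeled traffic, the numbers are directly comparable: a tuned small model can legitimately beat a frontier model on your specific task even while losing on general benchmarks.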

04

### Swap over when you're ready

When the numbers look right, flip inference to Maniac. Same endpoint, same code, now backed by a model built for your workload.

```python
# agent.py
# When you're ready: flip inference to Maniac.
# Same endpoint. Same code. Better model.
response = client.chat.completions.create(
    model="my-extraction-agent",  # now routes to your tuned model
    messages=[{"role": "user", "content": prompt}],
)
```

```
[maniac] ✓ Inference routed to maniac-opt-v3
[maniac] ✓ Zero downtime. Zero code changes.
```

## Engineering Blog

Deep dives on model optimization, agent throughput, and the economics of running intelligence at scale, plus updates from the Maniac team.

[View the blog](/blog)

Model Landscape · Apr 2, 2026

### [Qwen 3.5 vs Gemma 4: the benchmark-by-size comparison](/blog/qwen-3-5-vs-gemma-4-benchmarks-by-size)

A deployment-class comparison of Qwen 3.5 and Gemma 4 that separates official model-card benchmark overlap from third-party Arena AI chat-preference evidence.

VPC · Mar 25, 2026

### [Private cloud and VPC AI agents: run frontier-quality automation without shipping data out of your network](/blog/private-cloud-vpc-ai-agents)

ChatGPT and Claude run on their vendors' clouds, not inside your VPC. Maniac's agent platform lets you describe workflows and automations in natural language and run them on-premises or in your VPC, so data and orchestration stay under your control.

\[ GET STARTED \]

## Stop building  
reliability.

Drop-in OpenAI replacement. Production-grade reliability from the first call. Start in minutes.

Get Access · [Agent Docs](https://docs.maniac.ai/agent-setup/agent-setup)

```
$ pip install maniac
$ maniac init --container my-extraction-agent
  ✓ Container created: my-extraction-agent
  ✓ Initial model: openai/gpt-5
  ✓ Endpoint: https://api.maniac.ai/v1
$ maniac status
  ┌──────────────────────────────────────┐
  │ Container:   my-extraction-agent     │
  │ Status:      ● active                │
  │ Reliability: 99.94% success rate     │
  │ Schema:      99.9% compliance        │
  │ Calls:       2.4M today              │
  └──────────────────────────────────────┘
```

---

*Maniac: high-throughput background agents. Opus-quality outputs at 1/50 of the cost. Learn more at [maniac.ai](https://www.maniac.ai).*