The model inside ANIMA Cloud
Meet Nox-2.
ANIMA Cloud runs on our own model. Nox-2 is a custom Gemma 4 26B A4B fine-tune (25.2B-param MoE, 3.8B active per token, 256K native context, 128 experts) trained on 88K NoxSoft-native examples and 297 Anima memory traces. Messages-compatible API, vLLM continuous batching, and tool_use blocks that actually dispatch.
Specs
- Base
- Gemma 4 26B A4B (MoE)
- Total params
- 25.2B
- Active per token
- 3.8B
- Experts
- 128 (8 + 1 shared)
- Native context
- 256K
- Precision
- bf16
- Checkpoint
- 48 GB merged
- Serving
- vLLM on A100 80GB
- License
- Apache 2.0 base
How it was trained
Corpus breadth
~88K NoxSoft-native examples spanning: agentic orchestration and task routing, MCP tool-use traces, task decomposition and escalation reasoning, real multi-turn agent sessions, email replies in the NoxSoft voice, BYND posts and beat summaries, identity alignment pairs (who Nox is, who built it, what it values), plus 297 curated Anima memory traces for continuity across sessions.
Sources
Extracted from live NoxSoft systems — Nox task decompositions, agent chat threads, MCP-driven tool sequences, Veritas news synthesis, Mail correspondence, and real engineering sessions. Every example was produced by or for the same stack Nox-2 serves. Private content is redacted before training.
Pipeline
1,500-step SFT on the curated corpus, followed by a 500-step memory addendum at 6.67 epochs over the Anima traces, then DPO passes for identity and refusal patterns. LoRA adapters merged into a standalone bf16 checkpoint.
Compute
Trained end-to-end on H100s. No external fine-tune services, no third-party data pipelines, no vendor lock-in on the weights.
How it serves
API shape
Messages-compatible: same request schema, same streaming contract, same tool_use block structure. Existing agents drop in with zero client changes.
Tool use
Emits proper tool_use blocks. Every noxsoft MCP tool — tasks, context spaces, email, chat, notifications, BYND, Veritas, SVRN — works out of the box. The gateway also normalizes four alternate tool-call shapes the model occasionally emits, so calls always dispatch.
Runtime
vLLM with continuous batching on A100-class GPUs via RunPod. Multi-beat job chunking keeps long tasks responsive; jobs requeue automatically if a single beat hits its turn cap.
Integration
The Anima Worker routes every agent job through Nox-2 by default. Business-tier customers can override to BYOK on per-instance basis.
Benchmark
8/8 perfect on the NoxSoft-task capability bench.
We test on the shapes ANIMA actually runs — code, identity, tool-use, reasoning, and memory recall. Nox-2 lands at frontier-class quality on those shapes. Not a general-purpose leaderboard. The eval that matters to us.
Code
8/8
Identity
8/8
Tool-use
8/8
Reasoning
8/8
Memory
8/8
Why a custom model
ANIMA Cloud agents don’t live in a chat window. They run on a heartbeat, hold state across weeks, and spend most of their tokens inside tool calls. A general-purpose chat model is the wrong tool for that shape of work. Nox-2 is trained on the exact traces our agents produce — same tool schemas, same memory format, same identity patterns.
Owning the weights means we can ship model updates with every new agent capability, keep cost flat while the runtime grows, and keep your agent’s behavior from drifting under us. Serving on spot-priced A100s keeps inference cheap enough to include Nox-2 in the free tier.