Tribunus studies how AI agents, inference engines, tools, and distributed runtimes can produce evidence instead of requiring blind trust.
Modern AI safety often focuses on model behavior, model evaluations, and deployment policy. Tribunus focuses on the execution layer beneath agentic systems. The question is not only "what did the model say?" but also "what was the agent allowed to do, what state did it mutate, which model and backend produced the result, what evidence was emitted, and can the execution be replayed or audited?"
PhaseIR compile-time architecture, compute images, oracle validation across backends, deterministic runtime replay, numerical tolerance matrices, and structured failure evidence.
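As an illustration of oracle validation under a numerical tolerance matrix, here is a minimal Python sketch. It is not Tribunus code: the `Tolerance` class, the `TOLERANCES` table, its backend and dtype keys, the tolerance values, and `validate_against_oracle` are all hypothetical names and numbers chosen for the example. The idea is that each backend output is compared element by element against a reference (oracle) output, and a failure yields structured evidence rather than a bare boolean.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    rtol: float  # relative tolerance
    atol: float  # absolute tolerance


# Hypothetical tolerance matrix keyed by (backend, dtype).
# The keys and values are assumptions for illustration only.
TOLERANCES = {
    ("gpu", "float16"): Tolerance(rtol=1e-2, atol=1e-3),
    ("cpu", "float32"): Tolerance(rtol=1e-5, atol=1e-6),
}


def validate_against_oracle(oracle, candidate, tol):
    """Compare a backend output with the oracle output.

    Returns a dict of evidence: whether the check passed, the largest
    absolute error seen, and one record per out-of-tolerance element.
    """
    if len(oracle) != len(candidate):
        return {
            "passed": False,
            "reason": "length_mismatch",
            "max_abs_err": None,
            "failures": [],
        }

    failures = []
    max_abs_err = 0.0
    for index, (expected, actual) in enumerate(zip(oracle, candidate)):
        abs_err = abs(expected - actual)
        max_abs_err = max(max_abs_err, abs_err)
        if not math.isclose(expected, actual, rel_tol=tol.rtol, abs_tol=tol.atol):
            failures.append(
                {
                    "index": index,
                    "expected": expected,
                    "actual": actual,
                    "abs_err": abs_err,
                }
            )

    return {
        "passed": not failures,
        "reason": None if not failures else "out_of_tolerance",
        "max_abs_err": max_abs_err,
        "failures": failures,
    }
```

Calling `validate_against_oracle(oracle, output, TOLERANCES[("cpu", "float32")])` returns a report whose `failures` list pinpoints each divergent element, which is the kind of record a replay or audit step could consume.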
Capability-scoped tool execution, approval gates, state-machine orchestration, plugin permission boundaries, and execution receipts.
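To make capability scoping, approval gates, and execution receipts concrete, the following Python sketch shows one possible shape. Everything in it is an assumption made for illustration (`ToolCall`, `CapabilityDenied`, `canonical_digest`, `execute_with_receipt`, and the receipt fields); it does not describe the project's actual interfaces. A call runs only if its capability has been granted and, where required, approved, and every successful run emits a receipt whose identifier is a hash over its contents.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str        # name of the tool to invoke
    capability: str  # capability the call requires, e.g. "math.add"
    args: dict       # keyword arguments (must be JSON-serializable)


class CapabilityDenied(Exception):
    """Raised when a call lacks a granted capability or an approval."""


def canonical_digest(payload):
    """SHA-256 over a canonical JSON encoding, so equal payloads hash equally."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def execute_with_receipt(call, granted, needs_approval, approve, tools):
    """Run a tool call under capability and approval checks.

    granted:        set of capabilities the agent holds
    needs_approval: set of capabilities that require an explicit approval
    approve:        callable(ToolCall) -> bool, the approval gate
    tools:          mapping of tool name -> callable
    """
    if call.capability not in granted:
        raise CapabilityDenied(f"capability not granted: {call.capability}")
    if call.capability in needs_approval and not approve(call):
        raise CapabilityDenied(f"approval refused: {call.capability}")

    result = tools[call.tool](**call.args)

    # The receipt records digests rather than raw values, so it can be
    # shared without exposing arguments or results.
    receipt = {
        "tool": call.tool,
        "capability": call.capability,
        "args_digest": canonical_digest(call.args),
        "result_digest": canonical_digest({"result": result}),
    }
    receipt["receipt_id"] = canonical_digest(receipt)
    return result, receipt
```

Because the receipt is built from canonical digests only, two executions of the same call with the same result produce identical receipts, which is what makes them comparable during replay or audit. A production design would likely also bind the model, backend, and a timestamp or sequence number into the receipt; those are omitted here to keep the sketch deterministic.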
Agent control planes that run entirely on developer hardware, privacy-preserving execution, offline-capable workflows, and user-controlled provider boundaries.
Semi-trusted peer networks for distributed inference, quorum-verified execution receipts, and DHT-based capability advertisements.
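Since this area is described as designed rather than implemented, the following is only a sketch of what quorum verification of execution receipts could look like. The function name `quorum_verify`, the peer identifiers, and the strict-majority rule are assumptions for illustration. Each peer reports a digest of its result; a digest is accepted only if at least `quorum` peers agree on it, and the peers that disagree are returned as evidence.

```python
from collections import Counter


def quorum_verify(receipts, quorum):
    """Accept a result digest only if enough peers agree on it.

    receipts: mapping of peer id -> result digest reported by that peer
    quorum:   minimum number of matching digests required

    Returns (accepted_digest, dissenting_peers). accepted_digest is None
    when no digest reaches the quorum, in which case the list is empty.
    """
    # Requiring a strict majority guarantees that at most one digest
    # can reach the quorum, so the outcome is never ambiguous.
    if quorum < 1 or 2 * quorum <= len(receipts):
        raise ValueError("quorum must be a strict majority of the peers")

    counts = Counter(receipts.values())
    if not counts:
        return None, []

    digest, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None, []

    dissenting = sorted(peer for peer, d in receipts.items() if d != digest)
    return digest, dissenting
```

For example, with three peers where two report the same digest, `quorum_verify(receipts, quorum=2)` accepts that digest and names the third peer as dissenting, while `quorum=3` accepts nothing.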
Designed — research in progress
| Property | Value |
| --- | --- |
| Model | Qwen2.5 0.5B |
| Layers | 24 |
| Tensors | 556 |
| Quantization | NF4 |
| Primary backend | MLX Metal GPU |
| Fallback backend | Accelerate CPU |
| Verification | Passing — oracle validated |
Hardware vendors, ML systems researchers, safety researchers, and contributors welcome.