Researching verifiable execution for agentic AI

Tribunus studies how AI agents, inference engines, tools, and distributed runtimes can produce evidence instead of requiring blind trust.

Research thesis

Modern AI safety often focuses on model behavior, model evaluations, and deployment policy. Tribunus focuses on the execution layer beneath agentic systems. The question is not only "what did the model say?" but "what was the agent allowed to do, what state did it mutate, which model and backend produced the result, what evidence was emitted, and can the execution be replayed or audited?"

Research programs

1. Verifiable Inference

PhaseIR compile-time architecture, compute images, oracle validation across backends, deterministic runtime replay, numerical tolerance matrices, and structured failure evidence.
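As an illustration of oracle validation with a numerical tolerance matrix, the sketch below compares a backend's output against reference values and returns structured failure evidence. It is a minimal sketch under stated assumptions: the backend keys, the tolerance values, and the names `TOLERANCES`, `Mismatch`, and `validate_against_oracle` are hypothetical and are not taken from PhaseIR or any Tribunus code.

```python
import math
from dataclasses import dataclass

# Hypothetical tolerance matrix keyed by (backend, dtype) -> (rel_tol, abs_tol).
# The numbers are placeholders, not measured Tribunus tolerances.
TOLERANCES = {
    ("metal_gpu", "float16"): (1e-2, 1e-3),
    ("accelerate_cpu", "float32"): (1e-5, 1e-6),
}


@dataclass(frozen=True)
class Mismatch:
    """Structured failure evidence for one out-of-tolerance element."""
    index: int
    expected: float
    actual: float
    abs_error: float


def validate_against_oracle(backend, dtype, oracle, actual):
    """Return a list of Mismatch records; an empty list means the check passed."""
    rel_tol, abs_tol = TOLERANCES[(backend, dtype)]
    if len(oracle) != len(actual):
        raise ValueError("oracle and backend outputs differ in length")
    return [
        Mismatch(i, e, a, abs(a - e))
        for i, (e, a) in enumerate(zip(oracle, actual))
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

Returning every out-of-tolerance element, rather than a single pass/fail flag, keeps the failure itself auditable: the same output can pass under a looser low-precision tolerance and fail under a stricter one.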

2. Governed Agents

Capability-scoped tool execution, approval gates, state-machine orchestration, plugin permission boundaries, and execution receipts.
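One way to picture capability-scoped execution with an approval gate and execution receipts is the sketch below. It is an assumption-laden illustration, not the Tribunus implementation: `GovernedExecutor`, `Receipt`, the capability names, and the receipt fields are all hypothetical, and tool arguments and results are assumed to be JSON-serializable so they can be hashed.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def digest(value) -> str:
    """SHA-256 over a canonical JSON encoding (assumes JSON-serializable input)."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()


@dataclass(frozen=True)
class Receipt:
    """Hypothetical execution receipt: what was requested and what was decided."""
    capability: str
    args_digest: str
    decision: str
    result_digest: Optional[str]


class GovernedExecutor:
    def __init__(self, granted, approval_required=frozenset(),
                 approver=lambda capability, args: False):
        self.granted = frozenset(granted)
        self.approval_required = frozenset(approval_required)
        self.approver = approver  # callback standing in for a human approval gate
        self.receipts = []

    def run(self, capability, tool, **args):
        """Run `tool` only if `capability` is granted and, where required, approved."""
        if capability not in self.granted:
            decision = "denied: capability not granted"
        elif capability in self.approval_required and not self.approver(capability, args):
            decision = "denied: approval withheld"
        else:
            decision = "allowed"
        allowed = decision == "allowed"
        result = tool(**args) if allowed else None
        receipt = Receipt(capability, digest(args), decision,
                          digest(result) if allowed else None)
        self.receipts.append(receipt)  # denials are recorded too
        return result, receipt
```

In this sketch a receipt is emitted for denied calls as well as allowed ones, so the log answers both "what did the agent do?" and "what was it prevented from doing?".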

3. Local-First AI Systems

Agent control planes that run entirely on developer hardware, privacy-preserving execution, offline-capable workflows, and user-controlled provider boundaries.

4. Federated Mutual-Aid Inference (Dharma)

Semi-trusted peer networks for distributed inference, quorum-verified execution receipts, and DHT-based capability advertisements.
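Quorum verification of receipts can be sketched as a vote over the output digests that peers report for the same request. The function name `quorum_digest`, the peer identifiers, and the shape of the input mapping are hypothetical. The sketch also ignores peer authentication and signatures, which a real semi-trusted network would need.

```python
from collections import Counter


def quorum_digest(reported, quorum):
    """Return the output digest reported by at least `quorum` peers, else None.

    `reported` maps a peer id to the digest that peer claims for the result.
    Choosing `quorum` above half the peer count rules out two digests both
    reaching quorum.
    """
    if not reported:
        return None
    digest, votes = Counter(reported.values()).most_common(1)[0]
    return digest if votes >= quorum else None
```

For example, with three peers where two report the same digest, a quorum of 2 accepts that digest and a quorum of 3 rejects the result.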

Evidence registry

Current results

Qwen2.5 0.5B ComputeImage

Model: Qwen2.5 0.5B
Layers: 24
Tensors: 556
Quantization: NF4
Primary backend: MLX Metal GPU
Fallback backend: Accelerate CPU
Verification: Passing (oracle validated)

Open problems

Collaborate

Hardware vendors, ML systems researchers, safety researchers, and contributors welcome.

GitHub · Docs · research@tribunus.dev