5 Architecture Decisions Worth Stealing from an 18K-Star AI Agent Toolkit
A deep dive into 5 architecture decisions from pi-mono: runtime provider registration, dual-layer messaging, injectable tool operations, diff rendering, and session trees. Each one is ready to use in your own AI agent project.
If you’re building an AI agent product, or debating whether to adopt LangChain / Vercel AI SDK, this post dissects a framework-free, built-from-scratch full-stack approach. Five architecture decisions, each one directly applicable to your own project.
pi-mono is the latest creation from Mario Zechner, the author of libGDX — a full-stack AI agent toolkit spanning an LLM API abstraction layer, an agent runtime, and a complete terminal-based coding assistant. 18K stars, 7 packages. I spent an afternoon reading the core code, and every layer contains design decisions that made me rethink my own code.
What This Project Is
pi-mono is a TypeScript monorepo containing 7 packages:
| Package | Purpose |
|---|---|
| pi-ai | Unified streaming API across multiple LLM providers |
| pi-agent-core | Agent runtime: tool calling, state management, message orchestration |
| pi-coding-agent | Interactive coding agent CLI (similar to Claude Code / Aider) |
| pi-tui | Terminal UI library with diff rendering |
| pi-web-ui | Web chat components |
| pi-mom | Slack bot |
| pi-pods | vLLM GPU pod management |
From the lowest-level LLM calls to the highest-level user interactions, every layer is built in-house — no LangChain, no Vercel AI SDK. This “full-stack DIY” choice is worth discussing on its own, but today I want to focus on 5 specific architecture decisions.
The following 5 decisions progress bottom-up — LLM call layer, agent runtime, UI rendering, session persistence. You can steal just one, but understanding how they layer together will help you steal more effectively.
Decision 1: Runtime Provider Registration Instead of Compile-Time Hardcoding
Most LLM libraries import all supported providers directly in code:
```typescript
// Typical approach: compile-time hardcoding
import { openai } from './providers/openai';
import { anthropic } from './providers/anthropic';
import { google } from './providers/google';
// ... plus 20 more
```
pi-ai does it differently. It uses a runtime registry:
```typescript
// api-registry.ts
const apiProviderRegistry = new Map<string, RegisteredApiProvider>();

export function registerApiProvider<TApi extends Api, TOptions>(
  provider: ApiProvider<TApi, TOptions>,
): void {
  apiProviderRegistry.set(provider.api, { provider });
}
```
The 20+ built-in providers self-register at startup via register-builtins.ts. But here’s the key — users can also call registerApiProvider() to register their own providers.
Why does this matter? In enterprise settings, you might be running self-hosted vLLM instances or using a regional model API. With a hardcoded library, you’d have to fork the code or wait for upstream support. With pi-ai, you just write an extension. This connects to what I discussed in designing an LLM fault-tolerance layer — the difference being that pi-ai emphasizes extensibility while the fault-tolerance layer emphasizes reliability.
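To make the pattern concrete, here is a minimal self-contained sketch of runtime registration. The `ApiProvider` shape, the `internal-vllm` name, and the `complete` method are illustrative assumptions, not pi-ai's real interfaces, which carry streaming and richer option types:

```typescript
// A deliberately simplified provider shape (assumption, not pi-ai's real one)
interface ApiProvider {
  api: string;                                   // registry key, e.g. "internal-vllm"
  baseUrl: string;                               // endpoint requests are sent to
  complete: (prompt: string) => Promise<string>; // simplified call surface
}

const apiProviderRegistry = new Map<string, ApiProvider>();

// The extension point: anyone can add a provider at runtime
function registerApiProvider(provider: ApiProvider): void {
  apiProviderRegistry.set(provider.api, provider);
}

// An enterprise user plugs in a self-hosted vLLM endpoint without forking:
registerApiProvider({
  api: "internal-vllm",
  baseUrl: "http://vllm.internal:8000/v1",       // hypothetical internal host
  complete: async (prompt) => `stubbed response for: ${prompt}`,
});

const provider = apiProviderRegistry.get("internal-vllm");
```

The point is that the registry is just a `Map` keyed by provider id; the library's built-ins and your custom providers go through the exact same door.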
In my own toolkit/ai, I took a different route — a declarative config.json that defines provider chains:
```json
{
  "models": {
    "smart": {
      "chain": [
        { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
        { "provider": "openai", "model": "gpt-4.1" }
      ]
    }
  }
}
```
Declarative is simpler but can’t extend to unknown providers. pi-ai’s imperative registration is more flexible, at the cost of requiring users to write code.
My take: For libraries meant for others, runtime registration is the better choice. For products (where callers are fixed), declarative config is enough.
Decision 2: Dual-Layer Messages — App Messages != LLM Messages
This is the most elegant design in the entire project.
In pi-agent-core, the message list maintained by the Agent isn’t in LLM format (Message[]). Instead, it’s a broader AgentMessage[]:
```typescript
// The application layer can have custom message types
type AgentMessage = Message | CustomAgentMessages[keyof CustomAgentMessages];

// Before each LLM call, convert to standard format
convertToLlm: (messages: AgentMessage[]) => Message[]
```
Why add this extra layer?
Imagine you’re building a coding assistant. User actions aren’t limited to “sending messages” — they might switch files, run tests, or see error popups. These events matter to the application (they need to show in the UI, they need to influence subsequent behavior), but the LLM doesn’t need to see all of them.
With dual-layer messages, you can:
- Insert any custom events into the app layer (UI notifications, timers, system state) without polluting the LLM context
- Trim context during conversion: running low on tokens? `transformContext` compresses earlier messages into summaries
- Preserve app state across model switches: switch models and your app messages persist; only the format sent to the LLM changes
```
// Two-stage conversion pipeline
AgentMessage[]
  -> transformContext()  // Optional: trim, inject external context
  -> convertToLlm()      // Required: convert to LLM-compatible format
-> Message[]
  -> streamSimple()      // Call the LLM
```
Most agent frameworks treat messages as a flat list — whatever goes in gets sent to the LLM. pi-mono’s layered approach truly decouples the application layer from the LLM layer.
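Here is a minimal sketch of the dual-layer idea. Only `AgentMessage` and `convertToLlm` come from pi-agent-core's vocabulary; the `AppEvent` variants and the filtering logic are illustrative assumptions:

```typescript
// Standard chat messages the LLM understands
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string };

// App-layer events the LLM never needs to see (hypothetical examples)
type AppEvent =
  | { kind: "file-switched"; path: string }
  | { kind: "tests-run"; passed: boolean };

// The app's history is the union of both layers
type AgentMessage = Message | AppEvent;

// Conversion step: keep chat messages, drop (or summarize) app-only events
function convertToLlm(messages: AgentMessage[]): Message[] {
  return messages.filter((m): m is Message => "role" in m);
}

const history: AgentMessage[] = [
  { role: "user", content: "Refactor this function" },
  { kind: "file-switched", path: "src/agent.ts" },
  { role: "assistant", content: "Done. Want tests?" },
  { kind: "tests-run", passed: true },
];

// Only the two chat messages survive the conversion
const llmInput = convertToLlm(history);
```

In a real agent the conversion would be richer (summarizing events, trimming tokens), but the separation is the point: the UI renders `history`, the LLM sees `llmInput`.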
Decision 3: Injectable Tool Operation Interfaces
The tools in pi-coding-agent (read, write, bash, grep…) never call fs.readFile() directly. Instead, they go through an operations interface:
```typescript
export interface ReadOperations {
  readFile: (path: string) => Promise<Buffer>;
  access: (path: string) => Promise<void>;
  detectImageMimeType?: (path: string) => Promise<string | null>;
}

// Inject the implementation when creating the tool
const readTool = createReadTool(cwd, {
  operations: myCustomReadOps
});
```
The default implementation uses Node.js fs. But you can inject anything — an SSH remote filesystem, a Docker container’s filesystem, even S3.
The same tool code, thanks to the decoupled operations interface, can run on:
- Local terminal
- Slack bot (via pi-mom)
- Remote GPU pod (via pi-pods)
- Web UI (via pi-web-ui)
This is the classic dependency injection pattern, but it’s particularly well-suited for agent tools. Agent tools inherently need to adapt to multiple runtime environments — running locally today, in the cloud tomorrow, in a browser the day after. Separating “what to do” (tool logic) from “how to do it” (filesystem access) is a remarkably forward-thinking decision.
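A sketch of what injection buys you in practice. `ReadOperations` mirrors the interface quoted above (minus the optional MIME hook); `createReadTool` here is a simplified stand-in for pi-coding-agent's factory, and the in-memory backend is purely illustrative:

```typescript
interface ReadOperations {
  readFile: (path: string) => Promise<Buffer>;
  access: (path: string) => Promise<void>;
}

// Simplified stand-in for the real factory: tool logic stays environment-agnostic
function createReadTool(cwd: string, ops: ReadOperations) {
  return {
    name: "read",
    async execute(relPath: string): Promise<string> {
      const full = `${cwd}/${relPath}`;
      await ops.access(full);                        // fail fast if unreadable
      return (await ops.readFile(full)).toString("utf8");
    },
  };
}

// An in-memory implementation, e.g. for tests or a sandboxed web demo
const files = new Map<string, string>([["/repo/README.md", "# hello"]]);
const memoryOps: ReadOperations = {
  readFile: async (p) => {
    const body = files.get(p);
    if (body === undefined) throw new Error(`ENOENT: ${p}`);
    return Buffer.from(body, "utf8");
  },
  access: async (p) => {
    if (!files.has(p)) throw new Error(`EACCES: ${p}`);
  },
};

const readTool = createReadTool("/repo", memoryOps);
```

Swap `memoryOps` for an SSH-backed or Docker-backed implementation and the tool code does not change a single line.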
Decision 4: Diff Rendering + Synchronized Output for Terminal UI
If you’ve used any CLI AI tool, you’ve probably experienced screen flickering — the entire terminal clears and redraws. pi-tui solves this with two techniques.
Technique 1: Three-Strategy Diff Rendering
Instead of clearing the screen every time, the TUI retains every line from the previous frame and compares line-by-line with the new frame:
- First render: Output directly, don’t clear anything
- Width change: Full clear and re-render (unavoidable on resize)
- Normal update: Only redraw from the first changed line; everything above stays untouched
```typescript
// Pseudocode: find the first changed line
for (let i = 0; i < lines.length; i++) {
  if (lines[i] !== previousLines[i]) {
    // Start redrawing from here, leave everything above alone
    moveCursorTo(i);
    clearFromHere();
    renderFrom(i);
    break;
  }
}
```
Technique 2: CSI 2026 Synchronized Output
```typescript
write("\x1b[?2026h"); // Tell the terminal: start buffering, don't draw yet
// ... output all changes ...
write("\x1b[?2026l"); // OK, now draw everything at once
```
This ANSI escape sequence prevents the terminal from refreshing until it receives the end signal. The result: even when many lines need updating, the user sees a single complete frame with no intermediate flickering.
Most Node.js CLI libraries (ink, blessed, etc.) can’t achieve this level of control. pi-tui is purpose-built for AI scenarios (frequent streaming text updates).
Decision 5: Session Trees — Branching in a Single File
A common scenario with coding assistants: you ask the AI to try Approach A, you’re not happy, and you want to go back to the same point to try Approach B.
pi-coding-agent stores sessions in JSONL format, where each message has an id and parentId:
```jsonl
{"id":"m1","type":"user","content":"Refactor this function"}
{"id":"m2","parentId":"m1","type":"assistant","content":"Approach A..."}
{"id":"m3","parentId":"m1","type":"assistant","content":"Approach B..."}
```
m2 and m3 share the same parentId — that’s a branch. A single JSONL file is a tree. Use the /tree command to visualize it:
```
m1 (Refactor this function)
+-- m2 (Approach A...)
|   +-- m4 (Continued iteration on A)
+-- m3 (Approach B...)
    +-- m5 (Continued iteration on B)
```
No need to create a new file for each branch, no “copy-paste session” workarounds. Branching is a first-class citizen.
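Resuming a branch then reduces to walking `parentId` links from a leaf back to the root. The `id`/`parentId` fields match the JSONL format above; the `branchTo` helper is an illustrative sketch, not pi-coding-agent's actual code:

```typescript
interface SessionMessage {
  id: string;
  parentId?: string;                 // absent on the root message
  type: "user" | "assistant";
  content: string;
}

// Return the root-to-leaf path ending at `leafId`
function branchTo(messages: SessionMessage[], leafId: string): SessionMessage[] {
  const byId = new Map(messages.map((m) => [m.id, m]));
  const path: SessionMessage[] = [];
  let cur = byId.get(leafId);
  while (cur !== undefined) {
    path.unshift(cur);               // prepend so the result reads root-first
    cur = cur.parentId !== undefined ? byId.get(cur.parentId) : undefined;
  }
  return path;
}

const session: SessionMessage[] = [
  { id: "m1", type: "user", content: "Refactor this function" },
  { id: "m2", parentId: "m1", type: "assistant", content: "Approach A..." },
  { id: "m3", parentId: "m1", type: "assistant", content: "Approach B..." },
];

// Resuming from m3 replays m1 -> m3 and never sees Approach A
const contextForB = branchTo(session, "m3");
```

Because every record keeps its parent pointer, the "current conversation" is just whichever root-to-leaf path you choose to replay.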
Takeaways: What’s Worth Bringing Home
| Pattern | Best For | Complexity |
|---|---|---|
| Runtime Provider Registration | LLM libraries built for others | Medium |
| Dual-Layer Message Conversion | Any agent app with a UI | Medium |
| Injectable Tool Operations | Agents needing multi-environment support | Low |
| Diff Rendering + CSI 2026 | High-frequency-update CLI tools | High |
| Session Trees | AI tools requiring trial-and-error/branching | Low |
These five decisions aren’t isolated. Together, they form a highly composable system — the same agent core, through different injected tool operations, different message conversions, and different UI layers, adapts to entirely different product forms.
This is also where the monorepo structure proves its worth: each layer can be used independently (you can use pi-ai alone for LLM calls), but when combined, the whole is greater than the sum of its parts.
If you’re building anything AI agent-related, I’d recommend reading at least packages/ai/src/api-registry.ts and packages/agent/src/agent.ts. It’s only about 200 lines combined, but the design density is exceptional.
pi-mono is open source at github.com/badlogic/pi-mono under the MIT license. The architecture patterns I learned from it are being fed back into my own toolkit/ai project.