The Central Nervous System Analogy
Most people think of AI assistants as standalone brains—a single entity that processes and responds. clawbot is different: it's architected like a biological nervous system. Your brain (the AI model) doesn't directly control your muscles; signals travel through nerves (channels), are processed by the spinal cord (Gateway), and execute through motor neurons (system tools).
This distributed architecture is why clawbot can simultaneously listen to WhatsApp, respond via Telegram, execute a shell command, and update its memory—all in parallel. Traditional monolithic AI can't do this because every capability is tightly coupled. clawbot's modular design treats each component as an independent service communicating through well-defined protocols.
clawbot System Architecture
Component Deep Dive: How Each Piece Functions
🌐 The Gateway: Central Command
The Gateway is the heart of clawbot—a persistent Node.js process that never sleeps. Why a dedicated service instead of running AI directly in each channel? Three critical reasons:
- Unified State: Single source of truth for conversation history across all channels
- Resource Efficiency: One connection to AI APIs instead of 15+ separate connections
- Fault Isolation: If WhatsApp crashes, Telegram keeps working—channels are replaceable plugins
The Gateway communicates via WebSockets, not HTTP. Why? HTTP requires the client to constantly ask "any updates?" (polling). WebSockets maintain a persistent two-way connection—the Gateway can push messages to channels the instant they're ready, enabling real-time streaming responses.
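The push model can be sketched in a few lines. This is an illustrative TypeScript fragment, not clawbot's actual code: the `ChannelSocket` interface and `GatewayEvent` shape are assumptions standing in for a real WebSocket connection.

```typescript
// Hypothetical sketch: the Gateway pushes events to connected channels
// the instant they are ready, instead of waiting to be polled.

interface ChannelSocket {
  send(frame: string): void; // stands in for a real WebSocket's send()
}

interface GatewayEvent {
  type: "message" | "token" | "status";
  channel: string;
  payload: unknown;
}

// Serialize an event and push it to every connected channel socket.
function broadcast(sockets: ChannelSocket[], event: GatewayEvent): string {
  const frame = JSON.stringify(event);
  for (const socket of sockets) socket.send(frame);
  return frame;
}

// Fake socket that records frames, standing in for a live connection.
const received: string[] = [];
const fake: ChannelSocket = { send: (f) => received.push(f) };

broadcast([fake], { type: "token", channel: "telegram", payload: "Hel" });
```

Because the Gateway holds the socket open, each generated token can be broadcast immediately — the basis for streaming responses.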
💬 Channels: The Sensory Interface
Channels are independent processes that translate between messaging platform protocols and the Gateway's unified message format. Think of them as protocol adapters—WhatsApp speaks Baileys, Telegram speaks Bot API, but both present messages to the Gateway in identical JSON structure.
This abstraction is powerful: adding a new messaging platform requires writing one channel plugin, not modifying the entire system. Each channel is sandboxed—if it crashes, the Gateway automatically attempts reconnection without affecting other channels.
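The adapter idea can be shown concretely. A hedged sketch follows — the `UnifiedMessage` fields and the raw payload shapes are illustrative assumptions, not clawbot's actual schema or the exact structures Baileys and the Bot API emit.

```typescript
// Hypothetical unified message shape plus two channel adapters.
interface UnifiedMessage {
  channel: "whatsapp" | "telegram";
  sender: string;
  text: string;
  timestamp: number;
}

// Assumed shape of a raw WhatsApp-style payload.
interface RawWhatsApp { key: { remoteJid: string }; body: string; t: number }
// Assumed shape of a raw Telegram-style update.
interface RawTelegram { from: { id: number }; text: string; date: number }

const fromWhatsApp = (m: RawWhatsApp): UnifiedMessage => ({
  channel: "whatsapp", sender: m.key.remoteJid, text: m.body, timestamp: m.t,
});

const fromTelegram = (m: RawTelegram): UnifiedMessage => ({
  channel: "telegram", sender: String(m.from.id), text: m.text, timestamp: m.date,
});
```

Both adapters emit the same structure, so nothing downstream of the Gateway ever branches on which platform a message came from.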
🧠 AI Model Router: The Intelligence Backbone
clawbot doesn't lock you into a single AI provider. The model router implements dynamic provider selection: choose Claude for complex reasoning, GPT-4 for creative tasks, or local Ollama for privacy-critical operations—all within the same conversation.
How does this work technically? Each AI provider has a standardized interface (following the OpenAI Chat Completions API pattern). The router maintains connection pools to all configured providers, routes requests based on configured preferences, and handles automatic failover if a provider is unavailable.
Advanced: Custom Model Routing Logic
You can configure custom routing rules in ~/.clawbot/clawbot.json:
- Route coding tasks to Claude (better at programming)
- Route creative writing to GPT-4 (more imaginative)
- Route sensitive data processing to local Ollama (zero external transmission)
- Automatic fallback to secondary provider if primary fails
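The routing-plus-failover logic above can be sketched as a pure function. The rule names, provider labels, and config shape here are assumptions for illustration — they do not reflect clawbot's actual `clawbot.json` schema.

```typescript
// Illustrative routing sketch with preferred rules and a failover chain.
type Task = "coding" | "creative" | "sensitive" | "general";

interface RoutingConfig {
  rules: Partial<Record<Task, string>>; // task -> preferred provider
  fallback: string[];                   // ordered failover chain
}

// Pick the first available provider: the preferred rule first, then fallbacks.
function route(
  task: Task,
  config: RoutingConfig,
  isUp: (provider: string) => boolean,
): string {
  const preferred = config.rules[task];
  const candidates = preferred
    ? [preferred, ...config.fallback]
    : config.fallback;
  const choice = candidates.find(isUp);
  if (!choice) throw new Error("no provider available");
  return choice;
}

const config: RoutingConfig = {
  rules: { coding: "claude", creative: "gpt-4", sensitive: "ollama" },
  fallback: ["claude", "gpt-4"],
};

// With every provider reachable, a coding task goes to the preferred rule.
route("coding", config, () => true); // "claude"
```

If the preferred provider's health check fails, the same call transparently returns the next entry in the fallback chain — no caller involvement needed.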
💾 Persistent Memory: Conversation State Management
Unlike stateless AI APIs that forget everything between requests, clawbot maintains durable conversation history stored in Markdown files at ~/.clawbot/conversations/. Each conversation is a complete log of messages, tool executions, and AI responses.
Why Markdown instead of a database? Three reasons:
- Human-Readable: You can grep, search, and version control your AI conversations
- Portable: Move your entire conversation history by copying a folder
- AI-Friendly: AI models can natively understand Markdown formatting for context retrieval
The Gateway automatically manages context windows—when a conversation exceeds the model's token limit, it intelligently summarizes older messages while preserving critical context. This enables conversations that span weeks without degradation.
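The mechanics of staying under a token budget can be sketched as follows. This is a simplified assumption-laden illustration: real summarization would call the model itself, and the 4-characters-per-token estimate is a rough heuristic, not clawbot's actual accounting.

```typescript
// Sketch of context-window management under an assumed token budget.
interface Msg { role: "user" | "assistant" | "summary"; text: string }

// Crude token estimate: roughly 4 characters per token.
const estimateTokens = (m: Msg): number => Math.ceil(m.text.length / 4);

function fitContext(history: Msg[], budget: number): Msg[] {
  // Keep the most recent messages that fit, walking backward from newest.
  const kept: Msg[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (used + cost > budget) break;
    kept.unshift(history[i]);
    used += cost;
  }
  const dropped = history.length - kept.length;
  // Placeholder standing in for an AI-generated summary of older messages.
  return dropped > 0
    ? [{ role: "summary", text: `[summary of ${dropped} earlier messages]` }, ...kept]
    : kept;
}
```

Recent turns survive verbatim while older ones collapse into a summary slot, which is how a weeks-long conversation can keep fitting into a fixed window.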
⚙️ Execution Engine: Safe System Control
This is where clawbot differs from ordinary chatbots: genuine task execution. When the AI decides a shell command is needed, the execution engine handles it safely through multiple protection layers:
- Sandboxing: Commands run in restricted environments with limited file system access
- Permission Gates: User-configurable allowlists/denylists for command patterns
- Confirmation Prompts: Destructive operations require explicit user approval
- Audit Logging: Every executed command is logged with timestamp, user, and result
- Timeout Protection: Commands automatically terminate after configurable duration
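The permission-gate layer can be sketched as a deny-wins pattern check. The pattern lists below are illustrative assumptions, not clawbot's actual policy format; a real engine would additionally sandbox the process and enforce a timeout (Node's `child_process.execFile` accepts a `timeout` option for exactly that).

```typescript
// Illustrative permission gate: deny rules win; otherwise the command
// must match an explicit allow rule (default deny).
interface Policy { allow: RegExp[]; deny: RegExp[] }

function isPermitted(command: string, policy: Policy): boolean {
  if (policy.deny.some((re) => re.test(command))) return false;
  return policy.allow.some((re) => re.test(command));
}

const policy: Policy = {
  allow: [/^git /, /^ls\b/, /^echo /],
  deny: [/rm\s+-rf/, /sudo/],
};

isPermitted("git status", policy);    // allowed by the git rule
isPermitted("sudo rm -rf /", policy); // denied: matches a deny rule
```

Anything that matches neither list is rejected, so a forgotten allowlist entry fails closed rather than open.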
🔌 Skills System: Extensible Capabilities
Skills are clawbot's plugin system—folders containing SKILL.md files that teach the AI new capabilities.
How does a Markdown file extend functionality? The AI reads the skill definition and learns:
- What the skill does and when to use it
- What parameters it accepts
- Example commands that trigger it
- Expected behavior and edge cases
Skills can include shell scripts, Node.js modules, or API integration code. The AI calls these tools through structured function calls, receives results, and incorporates them into responses. This is how clawbot can control smart homes, manage cloud infrastructure, or integrate with proprietary internal systems—anyone can write a skill.
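To make the idea concrete, here is a hypothetical SKILL.md skeleton embedded as a string, with a minimal parser that pulls out the name and description. Both the file layout and the parser are assumptions for illustration — the actual SKILL.md format may differ.

```typescript
// Hypothetical SKILL.md content; the real format may differ.
const skillMd = `# weather
Fetches the current forecast for a given city.

## Parameters
- city: name of the city to look up
`;

interface SkillMeta { name: string; description: string }

function parseSkill(md: string): SkillMeta {
  const lines = md.split("\n");
  // The top-level heading gives the skill its name.
  const name = (lines[0] ?? "").replace(/^#\s*/, "").trim();
  // The first non-empty line after the heading is the description.
  const description =
    lines.slice(1).find((l) => l.trim().length > 0)?.trim() ?? "";
  return { name, description };
}

parseSkill(skillMd); // { name: "weather", description: "Fetches the current forecast for a given city." }
```

The AI receives this metadata as context, which is how a plain Markdown file can advertise a new capability without any code changes to the core.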
The Technology Stack: What Powers clawbot
Runtime: Node.js 22+
Modern JavaScript runtime with excellent async I/O performance for real-time communication. Native support for WebSockets, HTTP/2, and worker threads enables parallel task execution.
Language: TypeScript
Type-safe development prevents entire categories of bugs. Strong typing for WebSocket messages, AI responses, and configuration ensures robust error handling.
Communication: WebSocket Protocol
Bidirectional, persistent connections between Gateway and all clients. Enables real-time message streaming, instant notifications, and sub-second latency for AI responses.
Storage: File-Based State
Conversations stored as Markdown, configuration as JSON, skills as structured folders. No database dependency—simpler deployment, easier backups, grep-friendly debugging.
WhatsApp: Baileys Library
Reverse-engineered WhatsApp Web protocol implementation. Maintains a persistent connection using the same protocol your browser uses—no third-party API intermediaries or phone number sharing.
Telegram: grammY Framework
Official Telegram Bot API wrapper with TypeScript support. Long-polling and webhook support for flexible deployment, file uploads, inline keyboards, and full bot capabilities.
AI Integration: Provider Abstraction
Unified interface supporting Anthropic Claude, OpenAI GPT-4, Ollama local models, and Google Gemini. Automatic conversion between different API formats for seamless provider switching.
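One concrete case of that conversion: OpenAI-style chat requests carry the system prompt as a message in the array, while Anthropic's Messages API takes it as a separate top-level `system` field. The sketch below shows that reshaping; the type names are illustrative, not clawbot's actual abstraction layer.

```typescript
// Sketch: convert OpenAI-style chat messages to an Anthropic-style request.
interface ChatMsg { role: "system" | "user" | "assistant"; content: string }

interface AnthropicRequest {
  system?: string; // Anthropic takes the system prompt at the top level
  messages: { role: "user" | "assistant"; content: string }[];
}

function toAnthropic(messages: ChatMsg[]): AnthropicRequest {
  const system = messages.find((m) => m.role === "system")?.content;
  const rest: AnthropicRequest["messages"] = [];
  for (const m of messages) {
    // Everything except the system prompt stays in the messages array.
    if (m.role !== "system") rest.push({ role: m.role, content: m.content });
  }
  return { system, messages: rest };
}
```

With one such converter per provider, the rest of the Gateway can speak a single internal format and switch providers mid-conversation.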
Security: Sandboxed Execution
Shell commands run in restricted environments with configurable permissions. User-defined trust boundaries prevent accidental system damage or unauthorized operations.
Message Flow: Following a Request Through the System
Let's trace what happens when you send: "Remind me to call Mom in 2 hours" via WhatsApp.
The WhatsApp channel normalizes the incoming message into the Gateway's unified format:

```json
{"from": "user", "text": "Remind me to call Mom in 2 hours", "channel": "whatsapp"}
```

The request passes from the channel through the Gateway to the model router, and the AI responds with a structured tool call that flows back through the execution engine:

```
schedule_reminder(time="+2h", message="Call Mom")
```

This 19-step choreography happens in under 2 seconds. The distributed architecture enables each component to work independently—the WhatsApp channel doesn't know about Claude, Claude doesn't know about WhatsApp, yet they coordinate seamlessly through the Gateway's orchestration.
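The final hop of that trace — routing a named tool call to its handler — can be sketched as a small dispatch table. The registry shape and handler signature are illustrative assumptions, not clawbot's actual tool interface.

```typescript
// Minimal dispatch sketch: the AI emits a named tool call and the
// Gateway routes it to a registered handler.
type ToolHandler = (args: Record<string, string>) => string;

const tools = new Map<string, ToolHandler>();
tools.set("schedule_reminder", ({ time, message }) =>
  `reminder set for ${time}: ${message}`);

function dispatch(name: string, args: Record<string, string>): string {
  const handler = tools.get(name);
  if (!handler) throw new Error(`unknown tool: ${name}`);
  return handler(args);
}

dispatch("schedule_reminder", { time: "+2h", message: "Call Mom" });
// → "reminder set for +2h: Call Mom"
```

Because the table is populated at registration time, a new skill only has to add an entry — the dispatch code never changes.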
Why This Architecture Matters
Most AI assistants are monolithic—all logic in one application, tightly coupled to specific platforms. clawbot's distributed design enables capabilities impossible in traditional architectures:
- Hot-Swappable Components: Upgrade AI models without restarting channels. Replace WhatsApp with Telegram without touching the Gateway.
- Horizontal Scaling: Run multiple channel instances across different machines, all connecting to one Gateway.
- Fault Tolerance: If one channel crashes, others continue operating. If the Gateway restarts, channels automatically reconnect.
- Multi-Tenancy: One Gateway can serve multiple users with isolated conversation spaces—perfect for family deployments.
- Extensibility: Community members can build new channels, skills, or tools without access to core code.
🔐 Security Through Architecture
The distributed design isn't just about flexibility—it's a security boundary. Channels run in separate processes with limited permissions. If a channel is compromised, attackers gain access only to that messaging platform, not your entire system. The Gateway enforces authentication, and the execution engine applies additional sandboxing before any system commands run.
Technical Decisions and Trade-offs
Why WebSockets Instead of HTTP REST?
HTTP is request-response: client asks, server answers. AI responses often take 5-15 seconds. Without WebSockets, you'd need polling (wasteful) or long-polling (complex). WebSockets enable streaming responses—the AI begins sending words the moment it generates them, creating the perception of instant responsiveness.
Why File Storage Instead of a Database?
Databases add deployment complexity—installation, configuration, backup strategies. For personal AI (1-10 users), files provide sufficient performance while remaining portable, inspectable, and version-controllable. You can grep your conversation history, back it up with Dropbox, or track changes with Git.
Why Node.js Instead of Python/Go/Rust?
Node.js excels at I/O-bound tasks (real-time communication, API calls). JavaScript's async/await model maps naturally to AI workflows (send request, await response, process result). The npm ecosystem provides mature libraries for every messaging platform. TypeScript adds safety without sacrificing development speed.
Ready to Deploy Your Own?
Understanding the architecture is step one. Step two is building it.