LLM Providers

Use any model, mix and match per agent

MUXI is LLM-agnostic. You can run proprietary APIs (OpenAI, Anthropic, Google), managed services (Bedrock, Vertex), or self-hosted models (Ollama, vLLM, llama.cpp) in the same formation.

Test different models. LLM performance varies significantly by task. A model that excels at reasoning may struggle with creative writing. Run your actual prompts through several models to find the best fit for each agent's role.

Powered by OneLLM

Under the hood, MUXI uses OneLLM - a unified interface for 200+ language models across all major providers. Every model OneLLM supports works out of the box with MUXI.

Supported providers include:

  • Proprietary APIs: OpenAI, Anthropic, Google (Gemini), Mistral, Cohere, AI21
  • Cloud platforms: AWS Bedrock, Google Vertex AI, Azure OpenAI
  • Self-hosted: Any model supported via Ollama or llama.cpp
  • Any OpenAI-compatible endpoint

Key ideas

  • Per-agent model selection: Choose the best model per agent (e.g., reasoning vs. writing vs. high-volume), and change without redeploying code.
  • Provider flexibility: Any OpenAI-compatible endpoint works; custom providers can be added via HTTP.
  • Failover ready: Swap providers when rate limits or outages occur.
  • Token efficiency: Combine with synopsis caching and tool-indexing to minimize context size.

How to configure

Agent-specific model overrides in agents/*.afs:

# agents/researcher.afs
schema: "1.0.0"
id: researcher
name: Researcher
description: Research specialist

system_message: |
  You are a research specialist.
  Your job is to gather accurate information...

llm_models:
  - text: "openai/gpt-4.1"
# agents/writer.afs
schema: "1.0.0"
id: writer
name: Writer
description: Content writer

system_message: |
  You are a content writer.
  Your job is to create clear, engaging content...

llm_models:
  - text: "anthropic/claude-3.5-sonnet"
# agents/high-volume.afs
schema: "1.0.0"
id: high-volume
name: High Volume Agent
description: High volume processing

system_message: |
  You are a data processor.
  Your job is to handle high-volume tasks efficiently...

llm_models:
  - text: "http://llama.example.com/v1/llama-3-70b-instruct"

Model choice matrix example

  • Reasoning-heavy: GPT-5, Claude Opus 4.5
  • Long-form writing: Claude Sonnet 4.5, Gemini 3
  • High-volume or on-prem: Llama 3 (vLLM/llama.cpp), Mistral

Related docs