Request Lifecycle

The complete journey from user request to agent response

Ask ChatGPT Ask Claude.ai Open in Cursor

This is the definitive guide to understanding how MUXI processes requests - from the moment a user sends a message to the final response delivery. Follow along to see how MUXI's orchestration intelligence works.

What Makes MUXI Different

MUXI isn't just a request-response system. It's an intelligent processing pipeline that:

Adapts to complexity - Routes simple queries to single agents, decomposes complex requests into multi-agent workflows
Maintains context - Three-tier memory system ensures coherent conversations
Clarifies ambiguity - Multi-turn clarification resolves unclear requests before processing
Orchestrates intelligence - Coordinates multiple AI agents with specialized capabilities
Optimizes execution - Dynamically chooses sync/async processing based on estimated time
Ensures consistency - Applies configurable soul across all agents
Handles complexity - Supports SOPs (Standard Operating Procedures) for repeatable workflows

The Request Journey

A request passing through MUXI undergoes:

Session & Memory Initialization - Context loading from four memory layers
Credential Detection - Intercepts credential-related requests before processing
Clarification & Actionability - Resolves unclear requests via multi-turn dialog
Intelligent Routing - Priority-based: agent specification → SOP match → complexity analysis
SOP-First Processing - Standard procedures override all other routing when matched
Workflow Analysis - Complex requests trigger multi-agent orchestration
Agent Processing - Tool execution, agent-to-agent delegation, parallel tasks
Response Generation - Batch, streaming, or webhook delivery
Soul Application - Style and tone consistency regardless of which agents were involved
Memory Updates - Learning from interactions for future context

The system seamlessly handles everything from simple queries ("What's the weather?") to complex orchestrations ("Analyze my codebase, generate security audit, create Linear issues, and notify my team").

Complete Flow Diagram

flowchart TD
    Start([User Sends Request])

    %% Input Methods
    Start -->|API/MCP/SDK/CLI/Embedded| InitReq[Initialize Request]

    %% Session Check
    InitReq --> CheckSession{Check Session}
    CheckSession -->|New Session| CreateSession[Create Session ID]
    CheckSession -->|Existing| UseSession[Use Existing Session]

    CreateSession --> ReqTracking[Request Tracking
- Generate Request ID
- Start Timer
- Log Event]
    UseSession --> ReqTracking

    %% File Upload Check
    ReqTracking --> HasFiles{Has File
Uploads?}
    HasFiles -->|Yes| ProcessFiles[Process Files
- Store in temp
- Extract metadata
- Update context]
    HasFiles -->|No| InitMemory[Initialize Memory
Systems]
    ProcessFiles --> InitMemory

    %% Memory Systems
    InitMemory --> SmartBuffer[Smart Buffer Memory
- Load last N messages
- Vector similarity search
- Auto-summarization
- FIFO with multiplier]

    SmartBuffer --> LongTermCheck{Long-term
Enabled?}
    LongTermCheck -->|Yes| LongTermMem[Long-term Memory
- PostgreSQL/SQLite
- User preferences
- Historical context
- Semantic search]
    LongTermCheck -->|No| BuildContext

    LongTermMem --> BuildContext[Build User Context
- Merge preferences
- Apply context
- Load user history
- Extract patterns]

    BuildContext --> WorkingMem[Working Memory
- Current task state
- Active file refs
- Tool outputs
- Session data]

    WorkingMem --> FormatMsg[Format Message with Full Context
- User context
- Recent history
- Current state]

    %% Credential Check
    FormatMsg --> CredentialCheck{Need
Credentials?}
    CredentialCheck -->|SERVICE_USE| HandleCredentials[Handle Credentials
- Check MCP registry
- Verify user has creds
- Redirect or prompt]
    CredentialCheck -->|CREDENTIAL_REQUEST| HandleCredentials
    CredentialCheck -->|NONE| PendingClarification

    HandleCredentials -->|Handled| End([Return Response])
    HandleCredentials -->|Has Credentials| PendingClarification

    %% Clarification Check
    PendingClarification{Has pending
Clarification?}
    PendingClarification -->|Yes| ProcessClarification[Process Clarification
Response]
    PendingClarification -->|No| NeedClarification{Need
Clarification?}

    ProcessClarification --> NeedClarification
    NeedClarification -->|Yes| InitClarification[Initialize Clarification
- Analyze gaps
- Generate questions
- Store context]
    NeedClarification -->|No| IsActionable{Is actionable?}

    InitClarification --> WriteQuestion[Write Clarification
Question]
    WriteQuestion --> ApplySoul1[[Apply Overlord Soul]]

    %% Main Processing Path
    IsActionable -->|Yes| AgentSpecified{Agent
Specified?}
    IsActionable -->|No| ApplySoul1

    AgentSpecified -->|Yes| RequestAgent[Request Specific
Agent]
    AgentSpecified -->|No| SOPMatch{SOP
Matched?}
    RequestAgent --> AgentProcessing

    %% SOP and Workflow Path
    SOPMatch -->|Yes| ExecuteSOP[Execute SOP
- Load template
- Decompose tasks
- Execute workflow]
    SOPMatch -->|No| ComplexCheck{Complex
Request?}

    ExecuteSOP --> AgentProcessing
    ComplexCheck -->|≥Threshold| WorkflowDecomp[Workflow
Decomposition
- Create task graph
- Assign agents
- Execute parallel]
    ComplexCheck -->|Best Agent]

    WorkflowDecomp --> AgentProcessing
    AutoRoute --> AgentProcessing

    %% Plan Confirmation
    ComplexCheck2{Requires plan
confirmation?}
    WorkflowDecomp --> ComplexCheck2
    ComplexCheck2 -->|Yes| AskConfirmation[Ask user for
confirmation]
    ComplexCheck2 -->|No| EstimateTime

    AskConfirmation -->|Approved| EstimateTime
    AskConfirmation -->|Declined| WorkflowDecomp

    %% Time Estimation
    EstimateTime[Estimate Execution Time
- Analyze task complexity
- Check tool requirements
- Review historical data]

    EstimateTime --> TimeThreshold{Estimated
Time?}
    TimeThreshold -->|≥Threshold| NotifyUser[Notify User of Estimated Time
- Return task ID
- Provide time estimate
- Setup webhook]
    TimeThreshold -->|- Load agent config
- Apply system prompt
- Initialize context]

    NotifyUser --> ProcessAgent

    %% Agent Processing
    ProcessAgent --> AgentProcessing[Agent Processing]

    AgentProcessing --> NeedTools{Need Tools?}
    NeedTools -->|Yes| MCPTools[MCP Tool Calls
- Connect to server
- Execute tools
- Handle responses]
    NeedTools -->|No| NeedHelp{Need Help
from another
Agent?}

    MCPTools --> ProcessToolResults[Process Tool Results
- Parse outputs
- Update context
- Check errors]

    ProcessToolResults --> NeedHelp

    %% Agent Communication
    NeedHelp -->|Yes| A2A[A2A Communication
- Format request
- Call internal/external agent
- Await response]
    NeedHelp -->|No| MoreProcessing{Need more
Processing?}

    A2A --> HandoffType{Handoff
Type?}
    HandoffType -->|Delegation| DelegateAgent[Delegate to Agent
- Transfer context
- Wait for completion
- Aggregate results]
    HandoffType -->|Consultation| ConsultAgent[Consult Agent
- Request specific info
- Continue processing
- Integrate insights]
    HandoffType -->|Parallel| ParallelAgents[Parallel Agents
- Split tasks
- Execute concurrently
- Merge results]

    DelegateAgent --> AgentResponse[Agent Response
- Collect outputs
- Update memory
- Format results]
    ConsultAgent --> AgentResponse
    ParallelAgents --> AgentResponse

    AgentResponse --> MoreProcessing
    MoreProcessing -->|Yes| AgentProcessing
    MoreProcessing -->|No| GenerateResponse[Generate Response
- Format output
- Add explanations
- Include artifacts]

    %% Final Response Path
    GenerateResponse --> ApplySoul2[Apply Overlord Soul
- Maintain tone consistency
- Apply style preferences
- Format for user
- Add personality touches]

    ApplySoul1 --> ApplySoul2

    ApplySoul2 --> ResponseMode{Response
Mode?}

    ResponseMode -->|Async| WebhookDelivery[Webhook Delivery
- Format payload
- Call webhook URL
- Retry on failure]
    ResponseMode -->|Sync| UserPref{User
Preference?}

    UserPref -->|Batch| BatchResponse[Batch Response
- Complete processing
- Return full response
- Single payload]
    UserPref -->|Stream| StreamResponse[Stream Response
- Chunk generation
- Progressive delivery
- Real-time updates]

    WebhookDelivery --> UpdateMemory[Update Memory
- Buffer: conversation
- Working: context
- Long-term: persist]
    BatchResponse --> UpdateMemory
    StreamResponse --> UpdateMemory

    UpdateMemory --> HasArtifacts{Has
Artifacts?}
    HasArtifacts -->|Yes| AttachArtifacts[Attach Artifacts
- PDFs
- Files
- Data URLs]
    HasArtifacts -->|No| FinalizeResponse[Finalize Response
- Add metadata
- Log completion
- Update metrics]

    AttachArtifacts --> FinalizeResponse

    FinalizeResponse --> End([Return to User])

    class Start,End startEnd
    class CheckSession,HasFiles,LongTermCheck,PendingClarification,NeedClarification,IsActionable,AgentSpecified,SOPMatch,ComplexCheck,ComplexCheck2,TimeThreshold,NeedTools,NeedHelp,HandoffType,MoreProcessing,ResponseMode,UserPref,HasArtifacts decision
    class UseSession,CreateSession warning

flowchart TD
    Start([User Sends Request])

    %% Input Methods
    Start -->|API/MCP/SDK/CLI/Embedded| InitReq[Initialize Request]

    %% Session Check
    InitReq --> CheckSession{Check Session}
    CheckSession -->|New Session| CreateSession[Create Session ID]
    CheckSession -->|Existing| UseSession[Use Existing Session]

    CreateSession --> ReqTracking[Request Tracking<br/>- Generate Request ID<br/>- Start Timer<br/>- Log Event]
    UseSession --> ReqTracking

    %% File Upload Check
    ReqTracking --> HasFiles{Has File<br/>Uploads?}
    HasFiles -->|Yes| ProcessFiles[Process Files<br/>- Store in temp<br/>- Extract metadata<br/>- Update context]
    HasFiles -->|No| InitMemory[Initialize Memory<br/>Systems]
    ProcessFiles --> InitMemory

    %% Memory Systems
    InitMemory --> SmartBuffer[Smart Buffer Memory<br/>- Load last N messages<br/>- Vector similarity search<br/>- Auto-summarization<br/>- FIFO with multiplier]

    SmartBuffer --> LongTermCheck{Long-term<br/>Enabled?}
    LongTermCheck -->|Yes| LongTermMem[Long-term Memory<br/>- PostgreSQL/SQLite<br/>- User preferences<br/>- Historical context<br/>- Semantic search]
    LongTermCheck -->|No| BuildContext

    LongTermMem --> BuildContext[Build User Context<br/>- Merge preferences<br/>- Apply context<br/>- Load user history<br/>- Extract patterns]

    BuildContext --> WorkingMem[Working Memory<br/>- Current task state<br/>- Active file refs<br/>- Tool outputs<br/>- Session data]

    WorkingMem --> FormatMsg[Format Message with Full Context<br/>- User context<br/>- Recent history<br/>- Current state]

    %% Credential Check
    FormatMsg --> CredentialCheck{Need<br/>Credentials?}
    CredentialCheck -->|SERVICE_USE| HandleCredentials[Handle Credentials<br/>- Check MCP registry<br/>- Verify user has creds<br/>- Redirect or prompt]
    CredentialCheck -->|CREDENTIAL_REQUEST| HandleCredentials
    CredentialCheck -->|NONE| PendingClarification

    HandleCredentials -->|Handled| End([Return Response])
    HandleCredentials -->|Has Credentials| PendingClarification

    %% Clarification Check
    PendingClarification{Has pending<br/>Clarification?}
    PendingClarification -->|Yes| ProcessClarification[Process Clarification<br/>Response]
    PendingClarification -->|No| NeedClarification{Need<br/>Clarification?}

    ProcessClarification --> NeedClarification
    NeedClarification -->|Yes| InitClarification[Initialize Clarification<br/>- Analyze gaps<br/>- Generate questions<br/>- Store context]
    NeedClarification -->|No| IsActionable{Is actionable?}

    InitClarification --> WriteQuestion[Write Clarification<br/>Question]
    WriteQuestion --> ApplySoul1[[Apply Overlord Soul]]

    %% Main Processing Path
    IsActionable -->|Yes| AgentSpecified{Agent<br/>Specified?}
    IsActionable -->|No| ApplySoul1

    AgentSpecified -->|Yes| RequestAgent[Request Specific<br/>Agent]
    AgentSpecified -->|No| SOPMatch{SOP<br/>Matched?}
    RequestAgent --> AgentProcessing

    %% SOP and Workflow Path
    SOPMatch -->|Yes| ExecuteSOP[Execute SOP<br/>- Load template<br/>- Decompose tasks<br/>- Execute workflow]
    SOPMatch -->|No| ComplexCheck{Complex<br/>Request?}

    ExecuteSOP --> AgentProcessing
    ComplexCheck -->|≥Threshold| WorkflowDecomp[Workflow<br/>Decomposition<br/>- Create task graph<br/>- Assign agents<br/>- Execute parallel]
    ComplexCheck -->|<Threshold| AutoRoute[Auto-Route to<br/>Best Agent]

    WorkflowDecomp --> AgentProcessing
    AutoRoute --> AgentProcessing

    %% Plan Confirmation
    ComplexCheck2{Requires plan<br/>confirmation?}
    WorkflowDecomp --> ComplexCheck2
    ComplexCheck2 -->|Yes| AskConfirmation[Ask user for<br/>confirmation]
    ComplexCheck2 -->|No| EstimateTime

    AskConfirmation -->|Approved| EstimateTime
    AskConfirmation -->|Declined| WorkflowDecomp

    %% Time Estimation
    EstimateTime[Estimate Execution Time<br/>- Analyze task complexity<br/>- Check tool requirements<br/>- Review historical data]

    EstimateTime --> TimeThreshold{Estimated<br/>Time?}
    TimeThreshold -->|≥Threshold| NotifyUser[Notify User of Estimated Time<br/>- Return task ID<br/>- Provide time estimate<br/>- Setup webhook]
    TimeThreshold -->|<Threshold| ProcessAgent[Process with Agent<br/>- Load agent config<br/>- Apply system prompt<br/>- Initialize context]

    NotifyUser --> ProcessAgent

    %% Agent Processing
    ProcessAgent --> AgentProcessing[Agent Processing]

    AgentProcessing --> NeedTools{Need Tools?}
    NeedTools -->|Yes| MCPTools[MCP Tool Calls<br/>- Connect to server<br/>- Execute tools<br/>- Handle responses]
    NeedTools -->|No| NeedHelp{Need Help<br/>from another<br/>Agent?}

    MCPTools --> ProcessToolResults[Process Tool Results<br/>- Parse outputs<br/>- Update context<br/>- Check errors]

    ProcessToolResults --> NeedHelp

    %% Agent Communication
    NeedHelp -->|Yes| A2A[A2A Communication<br/>- Format request<br/>- Call internal/external agent<br/>- Await response]
    NeedHelp -->|No| MoreProcessing{Need more<br/>Processing?}

    A2A --> HandoffType{Handoff<br/>Type?}
    HandoffType -->|Delegation| DelegateAgent[Delegate to Agent<br/>- Transfer context<br/>- Wait for completion<br/>- Aggregate results]
    HandoffType -->|Consultation| ConsultAgent[Consult Agent<br/>- Request specific info<br/>- Continue processing<br/>- Integrate insights]
    HandoffType -->|Parallel| ParallelAgents[Parallel Agents<br/>- Split tasks<br/>- Execute concurrently<br/>- Merge results]

    DelegateAgent --> AgentResponse[Agent Response<br/>- Collect outputs<br/>- Update memory<br/>- Format results]
    ConsultAgent --> AgentResponse
    ParallelAgents --> AgentResponse

    AgentResponse --> MoreProcessing
    MoreProcessing -->|Yes| AgentProcessing
    MoreProcessing -->|No| GenerateResponse[Generate Response<br/>- Format output<br/>- Add explanations<br/>- Include artifacts]

    %% Final Response Path
    GenerateResponse --> ApplySoul2[Apply Overlord Soul<br/>- Maintain tone consistency<br/>- Apply style preferences<br/>- Format for user<br/>- Add personality touches]

    ApplySoul1 --> ApplySoul2

    ApplySoul2 --> ResponseMode{Response<br/>Mode?}

    ResponseMode -->|Async| WebhookDelivery[Webhook Delivery<br/>- Format payload<br/>- Call webhook URL<br/>- Retry on failure]
    ResponseMode -->|Sync| UserPref{User<br/>Preference?}

    UserPref -->|Batch| BatchResponse[Batch Response<br/>- Complete processing<br/>- Return full response<br/>- Single payload]
    UserPref -->|Stream| StreamResponse[Stream Response<br/>- Chunk generation<br/>- Progressive delivery<br/>- Real-time updates]

    WebhookDelivery --> UpdateMemory[Update Memory<br/>- Buffer: conversation<br/>- Working: context<br/>- Long-term: persist]
    BatchResponse --> UpdateMemory
    StreamResponse --> UpdateMemory

    UpdateMemory --> HasArtifacts{Has<br/>Artifacts?}
    HasArtifacts -->|Yes| AttachArtifacts[Attach Artifacts<br/>- PDFs<br/>- Files<br/>- Data URLs]
    HasArtifacts -->|No| FinalizeResponse[Finalize Response<br/>- Add metadata<br/>- Log completion<br/>- Update metrics]

    AttachArtifacts --> FinalizeResponse

    FinalizeResponse --> End([Return to User])

    class Start,End startEnd
    class CheckSession,HasFiles,LongTermCheck,PendingClarification,NeedClarification,IsActionable,AgentSpecified,SOPMatch,ComplexCheck,ComplexCheck2,TimeThreshold,NeedTools,NeedHelp,HandoffType,MoreProcessing,ResponseMode,UserPref,HasArtifacts decision
    class UseSession,CreateSession warning

Phase-by-Phase Breakdown

Phase 1: Entry & Session Management

How requests arrive:

REST API - Primary HTTP interface for web apps
MCP Protocol - Native AI-to-AI communication
SDKs - Python, TypeScript, Go client libraries
CLI - Command-line interface
Embedded - Direct library integration

Session handling:

New user → Generate session ID
Existing user → Load session
         ↓
Track request:
  - request_id: req_abc123
  - user_id: user@example.com
  - session_id: sess_xyz789
  - timestamp: 2025-01-09T10:30:00Z

Request ID reuse:

Multi-turn clarifications reuse same request_id
Enables complete trace of conversation
Simplifies debugging and observability

Phase 2: File Upload Processing

If user uploads files:

Files received
     ↓
Store in /tmp/muxi_uploads/{session_id}/
     ↓
Extract metadata:
  - MIME type
  - File size
  - Content preview
     ↓
Add to context for agent access

Supported files:

Documents (PDF, DOCX, TXT, MD)
Images (PNG, JPG, GIF)
Data (CSV, JSON, YAML)
Code (various languages)

Phase 3: Memory System Initialization

Three-tier memory loading:

1. Smart Buffer Memory

Load last N messages (default: 50)
     ↓
Vector similarity search (if enabled)
     ↓
Auto-summarization for old messages
     ↓
FIFO eviction with importance weighting

Features:

Multiplier system (N × multiplier for effective size)
Preserves critical messages longer
Context window optimization

2. Long-Term Memory (Optional)

Load user profile from PostgreSQL/SQLite
     ↓
User preferences:
  - Communication style
  - Expertise level
  - Response format
     ↓
Historical context:
  - Past interactions
  - Learned patterns
  - Domain knowledge
     ↓
Semantic search:
  - "previous discussions about API design"
  - Relevance threshold: 0.7
  - Time range: 30 days

3. Working Memory

Session state:
  - Current task progress
  - Active file references
  - Tool call results
  - Temporary data (TTL)

Message enhancement with context priority:

=== USER SYNOPSIS ===
[Cached user profile - multi-user mode only]

=== LONG-TERM MEMORIES ===
[Top 3 relevant memories from vector search]

=== RECENT CONVERSATION ===
[Last N messages from buffer memory]

=== CURRENT REQUEST ===
User: [actual message from user]

Why this order?

User synopsis first (identity/preferences)
Long-term memories second (historical patterns)
Recent conversation third (immediate context)
Current request last (highest priority)

Phase 4: Credential Detection & Handling

Critical: Happens BEFORE clarification analysis.

Check for credential needs:
  - SERVICE_USE: "Show my GitHub repos"
  - CREDENTIAL_REQUEST: "Add my GitHub token"
         ↓
User has credentials?
  ├─ Yes → Continue processing
  └─ No → Handle based on mode:
           - Redirect mode: "Please add credentials at https://..."
           - Dynamic mode: "Please provide your GitHub token:"
                   ↓
                User provides token
                   ↓
                Validate via MCP connection
                   ↓
                Store encrypted
                   ↓
                Discover identity ("ranaroussi")
                   ↓
                Retry original request

For multiple accounts:

User:  "Check my GitHub repos"
Agent: "You have 2 GitHub accounts: ranaroussi, lilyautomaze. Which one?"
User:  "ranaroussi"
Agent: [Uses ranaroussi's credentials]

Phase 5: Clarification System

Detects unclear requests:

LLM analyzes message:
  - Missing information?
  - Ambiguous references?
  - Conflicting instructions?
  - Insufficient context?
  - Needs confirmation?
         ↓
Needs clarification?
  ├─ Yes → Ask focused question
  │        Store clarification state
  │        Wait for user response
  │        Reuse same request_id
  └─ No → Continue to routing

Multi-turn clarification:

User:  "Deploy the app"
Agent: "Which app? You have 3: web-api, mobile-backend, admin-panel"
User:  "web-api"
Agent: "Which environment?"
User:  "production"
Agent: [Deploys web-api to production]

All turns share same request_id for complete trace.

Context switch detection:

[Clarifying deployment]
Agent: "Which environment?"
User:  "Actually, can you help me with something else?"
Agent: [Abandons clarification, starts fresh]

Phase 6: Actionability Check

Is the request actionable?

Actionable:
  ✓ "Deploy the app"
  ✓ "Search for API docs"
  ✓ "Create a report"

Not actionable:
  ✗ "That sounds good"
  ✗ "Thanks!"
  ✗ "Okay"

For non-actionable:

Agent responds directly:
"You're welcome! Let me know if you need anything else."
         ↓
Apply soul
         ↓
Return response
         ↓
Done (no agent processing needed)

Phase 7: Intelligent Routing

Priority-based routing:

1. Agent Specified?
   "Ask @researcher to find info..."
   → Route to specified agent

2. SOP Matched?
   "Onboard new customer"
   → Execute "customer-onboarding" SOP

3. Complexity Analysis
   Complexity score ≥ threshold?
   → Multi-agent workflow
   → Single agent routing

Agent Specification

User:  "@researcher search for AI trends"
     ↓
Route directly to researcher agent
Skip SOP matching and complexity analysis

SOP Matching

User message → Semantic similarity search
     ↓
Match SOP templates:
  - "customer-onboarding"
  - "refund-processing"
  - "incident-response"
     ↓
Similarity ≥ threshold?
  ├─ Yes → Execute SOP workflow
  └─ No → Continue to complexity analysis

SOPs override everything when matched - ensures consistent execution.

Complexity Analysis

LLM scores request complexity (0-10):
  - Number of steps required
  - Multiple domains involved
  - Tool requirements
  - Subtask dependencies
         ↓
Score ≥ 7 (configurable threshold)?
  ├─ Yes → Workflow decomposition
  └─ No → Single agent routing

Single agent routing:

Analyze request requirements:
  - Required tools
  - Domain expertise
  - Agent capabilities
         ↓
Score each agent:
  - Capability match
  - Tool access
  - Past performance
         ↓
Route to highest scoring agent

Phase 8: Workflow Decomposition (Complex Requests)

When complexity score ≥ threshold:

LLM decomposes request:
"Analyze codebase, create security audit, file issues"
         ↓
Subtasks:
  1. Code analysis → researcher agent
  2. Security audit → security agent
  3. Create issues → developer agent
  4. Send notification → notifier agent
         ↓
Build task graph:
  - Dependencies (task 2 depends on task 1)
  - Parallel execution (tasks 3 and 4 can run together)
  - Agent assignments
         ↓
Confirm plan (if enabled):
"I'll analyze the code, run a security audit, create
 Linear issues, and notify your team. Proceed?"
         ↓
User approves → Execute workflow
User declines → Revise plan

Parallel execution:

Task 1 (Analysis) → Completes
         ↓
    ┌────┴────┐
Task 2        Task 3
(Audit)       (Issues)
[Parallel execution]
         ↓
    Both complete
         ↓
Task 4 (Notify)

Phase 9: Execution Time Estimation

Before starting execution:

Estimate execution time based on:
  - Task complexity
  - Number of subtasks
  - Tool requirements
  - Historical data
  - Agent workload
         ↓
Estimated time ≥ threshold (e.g., 10 seconds)?
  ├─ Yes → Switch to async mode
  │        Return: "Task started, ID: task_abc123"
  │        Setup webhook
  │        Execute in background
  └─ No → Execute synchronously
          Return response when done

Async notification:

Immediate response:
{
  "status": "processing",
  "task_id": "task_abc123",
  "estimated_time": "45 seconds",
  "webhook_url": "https://your-app.com/webhook"
}

[Later, when done]
POST https://your-app.com/webhook
{
  "task_id": "task_abc123",
  "status": "completed",
  "result": "..."
}

Phase 10: Agent Processing

Agent executes the task:

Agent receives:
  - Enhanced message (with context)
  - System prompt
  - Available tools
  - Memory state
         ↓
Agent processes:
  - Analyzes request
  - Plans approach
  - Decides tool use
         ↓
Need tools?
  ├─ Yes → Call MCP tools
  └─ No → Continue processing
         ↓
Need help from another agent?
  ├─ Yes → A2A communication
  └─ No → Generate response

MCP Tool Execution

Agent: "I need to search the web"
         ↓
Select tool: web-search
         ↓
Prepare arguments: {"query": "AI trends 2025"}
         ↓
Call MCP server
         ↓
Receive results
         ↓
Parse and integrate into response
         ↓
Continue processing or return result

Tool chaining (automatic error recovery):

Call tool → Error: "Repository not found"
         ↓
Agent analyzes error
         ↓
Agent: "I'll create the repository first"
         ↓
Call create_repo tool
         ↓
Retry original operation
         ↓
Success!

A2A Communication (Agent-to-Agent)

Three handoff types:

1. Delegation - Transfer complete task:

Developer Agent: "I need security review"
         ↓
Delegate to Security Agent
         ↓
Security Agent: [Performs complete review]
         ↓
Returns full audit report
         ↓
Developer Agent: [Uses report in response]

2. Consultation - Request specific info:

Backend Agent: "What data format does frontend need?"
         ↓
Consult Frontend Agent
         ↓
Frontend Agent: "JSON with these fields: ..."
         ↓
Backend Agent: [Continues with that knowledge]

3. Parallel - Split work across agents:

Overlord: "Build analytics dashboard"
         ↓
Parallel execution:
  - Data Engineer: Design schema
  - Backend Dev: Build API
  - Frontend Dev: Create UI
  - DevOps: Set up monitoring
         ↓
All agents work simultaneously
         ↓
Aggregate results

Agent Decision Making

Agent continuously evaluates:
  - Have I answered the question?
  - Do I need more information?
  - Should I call a tool?
  - Should I consult another agent?
  - Am I stuck (need to escalate)?
         ↓
Decision → Action → Evaluate → Repeat

Phase 11: Soul Application

The Overlord applies consistent soul:

Agent responses (may vary in style)
         ↓
Overlord soul instructions:
  - Tone: "professional but friendly"
  - Style: "concise and clear"
  - Voice: "active voice, no jargon"
  - Format: "markdown with headers"
         ↓
All responses transformed to match soul
         ↓
Consistent user experience regardless of agents used

Example:

Agent: "The analysis has been completed. Results show..."
         ↓
After soul:
"I've analyzed the data for you. Here's what I found..."

Soul maintains:

Consistent tone across agents
User's preferred communication style
Format preferences (verbose vs concise)
Language/terminology level

Phase 12: Response Generation

Three delivery modes:

1. Synchronous Batch

Wait for complete processing
         ↓
Generate full response
         ↓
Return JSON:
{
  "content": "Complete response text...",
  "request_id": "req_abc123",
  "agents_used": ["researcher", "writer"],
  "tools_called": ["web-search", "file-read"],
  "duration_ms": 3420
}

2. Synchronous Streaming

Open SSE connection
         ↓
Stream chunks as generated:

event: chunk
data: {"text": "I've analyzed"}

event: chunk
data: {"text": " the data"}

event: tool_start
data: {"tool": "web-search"}

event: tool_end
data: {"tool": "web-search", "result": "..."}

event: chunk
data: {"text": " and found..."}

event: done
data: {"request_id": "req_abc123"}

User sees response appear in real-time (typewriter effect).

3. Asynchronous Webhook

Immediate response:
{
  "status": "processing",
  "task_id": "task_abc123",
  "estimated_time": "60 seconds"
}
         ↓
[Processing happens in background]
         ↓
When complete, POST to webhook:
{
  "task_id": "task_abc123",
  "status": "completed",
  "result": "...",
  "duration_ms": 58230
}

Phase 13: Artifact Handling

If agent generated files:

Agent created:
  - report.pdf
  - chart.png
  - data.csv
         ↓
Process artifacts:
  - Generate base64 data URLs
  - Extract metadata (MIME, size)
  - Create download links
         ↓
Attach to response:
{
  "content": "Here's your report",
  "artifacts": [
    {
      "name": "report.pdf",
      "type": "application/pdf",
      "size": 245678,
      "data_url": "data:application/pdf;base64,..."
    }
  ]
}

Artifact types:

Documents (PDF, DOCX)
Charts (PNG, SVG)
Data files (CSV, JSON)
Code files (Python, JavaScript)

Phase 14: Memory Updates

After response delivered:

1. Buffer Memory Update:
   - Add user message
   - Add agent response
   - Update vector index (if enabled)
   - Trigger summarization (if buffer near capacity)

2. Working Memory Update:
   - Increment request count
   - Update last interaction time
   - Clear temporary tool outputs
   - Update task state

3. Long-Term Memory Update (if enabled):
   - Extract learnings from interaction
   - Update user preferences
   - Store interaction metadata
   - Update patterns and trends

What gets learned:

User preferences ("prefers Python over JavaScript")
Interaction patterns (asks for detailed explanations)
Domain knowledge (works on e-commerce platform)
Tool usage (frequently uses GitHub integration)

Phase 15: Observability Events

Events emitted throughout lifecycle:

Phase	Event	When
Entry	`request.received`	Request arrives
Session	`session.created`	New session
Memory	`memory.loaded`	Context loaded
Credential	`credential.needed`	Credentials required
Clarification	`clarification.started`	Unclear request
Routing	`agent.selected`	Agent chosen
SOP	`sop.matched`	SOP triggered
Workflow	`workflow.decomposed`	Complex request
Agent	`agent.processing`	Agent starts
Tool	`tool.invoked`	MCP tool called
A2A	`agent.delegated`	Agent-to-agent
Response	`response.generated`	Response ready
Streaming	`chunk.sent`	Chunk streamed
Memory	`memory.updated`	Memory persisted
Complete	`request.completed`	Done

Use for:

Debugging request flow
Performance monitoring
Usage analytics
Error tracking

Performance Timing

Typical latency breakdown:

Phase	Typical Time	Details
Auth & Routing	1-5ms	Minimal overhead
Session Load	5-20ms	Retrieve session
Memory Loading	10-100ms	3-tier memory
Credential Check	1-10ms	Cache hit: <1ms
Clarification	500ms-3s	LLM analysis
Complexity Analysis	500ms-2s	LLM scoring
Agent Selection	10-50ms	Scoring logic
LLM Call	500ms-10s	Model dependent
Tool Execution	100ms-5s	MCP tools
A2A Communication	500ms-10s	Remote agent
Memory Update	5-50ms	Database write
Total (simple)	1-15s	Single agent
Total (complex)	10-60s	Multi-agent

Error Handling

Errors handled at each phase:

Phase	Error Type	Handling
Auth	Invalid API key	401 Unauthorized
Session	Session not found	Create new session
Memory	Database error	Log, use buffer only
Clarification	Too many turns	Force processing
Routing	No agent match	Use default agent
Agent	Agent failure	Try fallback agent
LLM	API timeout	Retry with backoff
Tool	Tool error	Agent sees error, adapts
A2A	Agent unavailable	Return error to agent
Response	Streaming failed	Fallback to batch
Memory	Update failed	Log, continue

Graceful degradation:

Long-term memory fails → Use buffer only
Tool fails → Agent works around it
Streaming fails → Return batch response
Agent fails → Try another agent

Request Lifecycle Examples

Example 1: Simple Query

User:  "What's the weather in SF?"
         ↓
Session: Load/create
Memory: Load buffer (recent messages)
Credentials: None needed
Clarification: None needed (clear request)
Actionability: Actionable
Routing: Auto-route to assistant agent
Agent: Calls weather tool
Tool: Returns weather data
Agent: Formats response
Soul: Applied
Response: "It's 68°F and sunny in San Francisco"
Memory: Updated
         ↓
Total time: ~2 seconds

Example 2: Ambiguous Request

User:  "Deploy the app"
         ↓
Session: Existing
Memory: Load context
Credentials: None needed
Clarification: Multiple apps exist
Agent: "Which app? web-api, mobile-backend, admin-panel"
User:  "web-api"
Clarification: "Which environment?"
User:  "production"
Actionability: Now clear
Routing: Auto-route to devops agent
Agent: Runs deployment
Tool: kubectl deploy
Agent: Confirms success
Response: "Deployed web-api to production. Version 1.2.3 is now live."
Memory: Updated
         ↓
Total time: ~15 seconds (with clarifications)

Example 3: Complex Multi-Agent Workflow

User:  "Analyze my codebase, run security audit, create issues for problems"
         ↓
Session: Existing
Memory: Load context + user preferences
Credentials: GitHub credentials needed → Already stored
Clarification: None (clear intent)
Actionability: Actionable
Routing: Complexity score: 9/10 → Workflow decomposition
Workflow:
  Task 1: Code analysis → researcher agent
  Task 2: Security audit → security agent (depends on task 1)
  Task 3: Create issues → developer agent (depends on task 2)
Execution:
  - Researcher: Scans codebase, generates report
  - Security: Reviews report, finds 5 vulnerabilities
  - Developer: Creates 5 GitHub issues
Aggregation: Combine all results
Soul: Applied
Response: "I've analyzed your codebase and found 5 security issues.
           Created GitHub issues #123-#127 for each one. Here's the summary..."
Artifacts: security-audit.pdf attached
Memory: Updated
         ↓
Total time: ~45 seconds (async mode)

Example 4: Credential Collection

User:  "Show my GitHub repos"
         ↓
Session: New
Memory: Empty buffer
Credentials: Check → User has no GitHub credentials
Mode: Dynamic
Agent: "I need access to GitHub. Please provide your personal access token:"
User:  "ghp_xxxxxxxxxxxxx"
Validation: Connects to GitHub, validates token
Identity: Discovers username: "ranaroussi"
Storage: Encrypts and stores credential
Retry: Original request with credentials
Agent: Calls GitHub API with user's token
Tool: Returns repositories
Response: "You have 15 repositories: ..."
Memory: Updated with credential info
         ↓
Total time: ~10 seconds (with credential collection)

Key Takeaways

What makes this lifecycle special:

Intelligence at every stage - LLM analysis drives routing, not hardcoded rules
Context preservation - 3-tier memory ensures coherent conversations
Automatic error recovery - Agents adapt when tools fail
Seamless complexity handling - Same pipeline for simple and complex requests
User privacy - Per-user credentials and memory isolation
Flexible delivery - Batch, streaming, or async based on needs
Consistent experience - Soul ensures uniform style
Complete observability - Events at every stage for debugging

The result: A request processing system that feels intelligent, not mechanical.

Learn More

How Orchestration Works - Deep dive into the Overlord
Memory Internals - Technical memory system details
Clarification System - How ambiguity is handled
Agent Collaboration - A2A communication patterns
Observability - Monitoring and debugging

We use cookies