# Streaming Responses

Real-time responses as they're generated.

MUXI streams responses using Server-Sent Events (SSE), reducing time-to-first-token from seconds to milliseconds.
## Why Streaming?
| Metric | Without Streaming | With Streaming |
|---|---|---|
| Time to first token | 2-10 seconds | ~500ms |
| User experience | Wait... wait... wall of text | Typewriter effect |
| Memory usage | Buffer entire response | Chunk by chunk |
Performance baseline (typical):

- Time to first token: 400-800ms
- Subsequent chunks: 50-150ms
- End-to-end (500-1500 tokens): 2-6s, depending on model
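To check these baselines against your own deployment, you can time chunk arrivals client-side. A minimal sketch; the `latencySummary` helper is illustrative, not part of any MUXI SDK:

```typescript
// Summarize streaming latency from chunk arrival times (ms since request start).
// Time-to-first-token is the first arrival; avgGapMs is the mean inter-chunk delay.
function latencySummary(arrivalsMs: number[]): { ttftMs: number; avgGapMs: number } {
  if (arrivalsMs.length === 0) throw new Error('no chunks received');
  const ttftMs = arrivalsMs[0];
  let gapTotal = 0;
  for (let i = 1; i < arrivalsMs.length; i++) {
    gapTotal += arrivalsMs[i] - arrivalsMs[i - 1];
  }
  const avgGapMs = arrivalsMs.length > 1 ? gapTotal / (arrivalsMs.length - 1) : 0;
  return { ttftMs, avgGapMs };
}

// Usage: record performance.now() - start as each chunk arrives,
// then pass the list of arrival times to latencySummary.
```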
## Enable Streaming

```yaml
# formation.afs
overlord:
  response:
    streaming: true
```
## SSE Format

Responses arrive as Server-Sent Events, separated by blank lines:

```
event: chunk
data: {"text": "Hello", "agent": "assistant"}

event: chunk
data: {"text": " there", "agent": "assistant"}

event: chunk
data: {"text": "!", "agent": "assistant"}

event: done
data: {"session_id": "sess_abc123"}
```
## Using Streaming

**cURL:**

```bash
curl -N http://localhost:8001/v1/chat \
  -H "Accept: text/event-stream" \
  -H "X-Muxi-Client-Key: fmc_..." \
  -d '{"message": "Tell me a story"}'
```

**Python:**

```python
for chunk in formation.chat_stream("Tell me a story"):
    print(chunk.text, end="", flush=True)
print()  # Newline at end
```

**TypeScript:**

```typescript
for await (const chunk of formation.chatStream('Tell me a story')) {
  process.stdout.write(chunk.text);
}
console.log(); // Newline at end
```

**Go:**

```go
stream, err := formation.ChatStream("Tell me a story")
if err != nil {
    log.Fatal(err)
}
for chunk := range stream.Chunks {
    fmt.Print(chunk.Text)
}
fmt.Println() // Newline at end
```
**Raw `fetch` (no SDK):**

```javascript
const response = await fetch('http://localhost:8001/v1/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream',
    'X-Muxi-Client-Key': 'fmc_...'
  },
  body: JSON.stringify({ message: 'Tell me a story' })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value);
  // Parse SSE format and update UI
  console.log(chunk);
}
```
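The loop above logs raw SSE text; to act on individual events you need to split and parse it. A minimal parser sketch for the two fields MUXI uses (`event:` and `data:`); `parseSSE` and `SSEEvent` are illustrative names, not SDK APIs:

```typescript
interface SSEEvent {
  event: string;
  data: string;
}

// Parse a buffer of SSE text into events. Events are separated by blank
// lines; comment lines starting with ':' (heartbeats) are ignored.
function parseSSE(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  for (const block of raw.split(/\n\n+/)) {
    let event = 'message'; // SSE default event name
    let data = '';
    for (const line of block.split('\n')) {
      if (line.startsWith(':')) continue; // heartbeat/comment
      if (line.startsWith('event:')) event = line.slice(6).trim();
      if (line.startsWith('data:')) data += line.slice(5).trim();
    }
    if (data) events.push({ event, data });
  }
  return events;
}
```

In production, buffer partial input across reads: a network chunk can end mid-event, so only parse up to the last blank-line separator and carry the remainder forward.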
## Event Types

| Event | When | Data |
|---|---|---|
| `chunk` | Text generated | `{"text": "...", "agent": "..."}` |
| `tool_start` | Tool invoked | `{"tool": "...", "args": {...}}` |
| `tool_end` | Tool completed | `{"tool": "...", "result": "..."}` |
| `error` | Error occurred | `{"error": "...", "code": "..."}` |
| `done` | Stream complete | `{"session_id": "..."}` |
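One way to consume these in TypeScript is a discriminated union over the table above. A sketch; the `describe` helper and its rendering strings are assumptions for illustration, not part of the SDK:

```typescript
// Each variant mirrors a row of the event table.
type MuxiEvent =
  | { event: 'chunk'; data: { text: string; agent: string } }
  | { event: 'tool_start'; data: { tool: string; args: Record<string, unknown> } }
  | { event: 'tool_end'; data: { tool: string; result: string } }
  | { event: 'error'; data: { error: string; code: string } }
  | { event: 'done'; data: { session_id: string } };

// Turn an event into display text; the switch is exhaustive, so adding a
// new event type becomes a compile error until it's handled.
function describe(e: MuxiEvent): string {
  switch (e.event) {
    case 'chunk':      return e.data.text;
    case 'tool_start': return `[running ${e.data.tool}...]`;
    case 'tool_end':   return `[${e.data.tool} finished]`;
    case 'error':      return `Error ${e.data.code}: ${e.data.error}`;
    case 'done':       return ''; // stream finished; nothing to render
  }
}
```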
## Tool Events

When an agent uses tools, you see the process:

```
event: chunk
data: {"text": "Let me search for that..."}

event: tool_start
data: {"tool": "web-search", "args": {"query": "AI trends 2025"}}

event: tool_end
data: {"tool": "web-search", "result": "Found 10 results..."}

event: chunk
data: {"text": "Based on my research, "}

event: chunk
data: {"text": "here are the latest AI trends..."}
```
This lets you show users what's happening:
- "Searching the web..."
- "Querying database..."
- "Reading file..."
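A sketch of that mapping from tool name to status text; apart from `web-search`, the tool names here are assumptions for illustration:

```typescript
// User-facing status strings keyed by tool name.
const toolStatus: Record<string, string> = {
  'web-search': 'Searching the web...',
  'database':   'Querying database...',
  'file-read':  'Reading file...',
};

// Fall back to a generic message for tools without a custom string.
function statusFor(tool: string): string {
  return toolStatus[tool] ?? `Running ${tool}...`;
}
```

Show `statusFor(event.data.tool)` on `tool_start` and clear it on `tool_end`.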
## React Integration

```tsx
function Chat() {
  const [messages, setMessages] = useState<string[]>([]);
  const [currentMessage, setCurrentMessage] = useState('');

  const sendMessage = async (text: string) => {
    setMessages(prev => [...prev, `You: ${text}`]);
    setCurrentMessage('');

    // Accumulate locally: reading `currentMessage` after the loop would
    // return the stale value captured when sendMessage was called.
    let fullText = '';
    for await (const chunk of formation.chatStream(text)) {
      fullText += chunk.text;
      setCurrentMessage(fullText);
    }

    setMessages(prev => [...prev, `Assistant: ${fullText}`]);
    setCurrentMessage('');
  };

  return (
    <div>
      {messages.map((m, i) => <p key={i}>{m}</p>)}
      {currentMessage && <p>Assistant: {currentMessage}▌</p>}
    </div>
  );
}
```
## Connection Management

### Timeouts

Configure server timeouts for long responses:

```yaml
server:
  write_timeout: 60s
```
### Keep-Alive

MUXI sends periodic heartbeats:

```
: heartbeat

event: chunk
data: {"text": "..."}
```

Heartbeats prevent proxy/load balancer timeouts.
### Client Reconnection

If the connection drops, resume with the `Last-Event-ID` header:

```bash
curl -H "Last-Event-ID: 42" ...
```
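On the client side, this means tracking the last `id:` value seen and sending it when reconnecting. A sketch, assuming the server emits `id:` fields and honors `Last-Event-ID`; `resumeHeaders` is an illustrative helper, not an SDK function:

```typescript
// Build request headers for a (re)connect attempt. On the first attempt
// lastEventId is null; after a drop, pass the last `id:` value seen so
// the server can resume the stream from that point.
function resumeHeaders(lastEventId: string | null): Record<string, string> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream',
  };
  if (lastEventId !== null) {
    headers['Last-Event-ID'] = lastEventId;
  }
  return headers;
}

// Usage: fetch('http://localhost:8001/v1/chat', { method: 'POST',
//   headers: resumeHeaders(lastSeenId), body: JSON.stringify({ message }) })
```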
## Fallback to Non-Streaming

If streaming fails or is disabled, request JSON instead:

```bash
curl http://localhost:8001/v1/chat \
  -H "Accept: application/json" \
  -d '{"message": "Hello"}'
```

Returns the complete response as JSON:

```json
{
  "text": "Hello! How can I help?",
  "agent": "assistant",
  "session_id": "sess_abc123"
}
```
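A client that supports both modes can branch on the response's Content-Type. A sketch; `isEventStream` is an illustrative helper, not an SDK function:

```typescript
// True when the server is streaming SSE; false for a buffered JSON fallback.
function isEventStream(contentType: string | null): boolean {
  return (contentType ?? '').toLowerCase().includes('text/event-stream');
}

// Usage with fetch:
//   const res = await fetch(/* ...chat endpoint... */);
//   if (isEventStream(res.headers.get('Content-Type'))) { /* read SSE chunks */ }
//   else { const body = await res.json(); /* body.text holds the full reply */ }
```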
## Next Steps

- **Build Custom UI** - Frontend streaming integration
- **Async Operations** - Background task processing