Response Formats Deep Dive

Technical implementation of LLM soul-based formatting

Ask ChatGPT Ask Claude.ai Open in Cursor

MUXI's response format system uses LLM soul instructions combined with post-processing validation to generate naturally formatted content while maintaining technical correctness.

Architecture

Hybrid Approach

User Request
     ↓
Soul Enhancement (LLM instructions)
     ↓
Agent Processing (natural generation)
     ↓
Post-Processing Validation (JSON/HTML only)
     ↓
Formatted Response

Why hybrid?

LLM instructions: Natural, contextually appropriate formatting
Post-processing: Ensures technical validity for structured formats

Soul Enhancement

How It Works

The system modifies the agent's system prompt based on format:

# Internal implementation
def _get_format_instruction(format: str) -> str:
    if format == "markdown":
        return """
        Format your response using proper markdown with headers (# ## ###),
        bullet points, bold/italic text, and code blocks where appropriate.
        Use markdown syntax for structure and emphasis.
        """

    elif format == "text":
        return """
        Format your response as plain text with no markdown formatting,
        special characters, or HTML. Use simple text formatting like
        line breaks and spacing for structure.
        """

    elif format == "json":
        return """
        Format your response as valid JSON. Use appropriate data structures
        (objects, arrays, strings, numbers, booleans). Ensure all JSON is
        properly formatted and parseable.
        """

    elif format == "html":
        return """
        Format your response as valid HTML with proper semantic tags like
        <h1>, <h2>, <p>, <ul>, <li>, <strong>, <em>, and <code>. Include
        proper structure and ensure all tags are properly closed.
        """

Soul Injection

# Simplified flow
enhanced_soul = f"{base_soul}

{format_instruction}"

llm_response = await llm.chat(
    messages=messages,
    system=enhanced_soul  # Enhanced with format instructions
)

The LLM naturally generates formatted responses based on enhanced instructions.

Post-Processing Validation

JSON Validation

def validate_json_response(content: str) -> str:
    """Validate and reformat JSON response."""
    try:
        # Parse to ensure validity
        parsed = json.loads(content)

        # Re-serialize with consistent formatting
        return json.dumps(parsed, indent=2, ensure_ascii=False)

    except json.JSONDecodeError as e:
        # Invalid JSON - log and return as-is
        logger.warning(f"Invalid JSON response: {e}")
        return content

Why validate?

Ensures response is parseable
Consistent formatting (indentation, spacing)
Catches LLM errors early

HTML Validation

from bs4 import BeautifulSoup

def validate_html_response(content: str) -> str:
    """Validate and reformat HTML response."""
    try:
        # Parse with BeautifulSoup
        soup = BeautifulSoup(content, 'html.parser')

        # Validate structure
        if not soup.find():
            raise ValueError("No valid HTML tags found")

        # Return prettified HTML
        return soup.prettify()

    except Exception as e:
        logger.warning(f"Invalid HTML response: {e}")
        return content

Validation includes:

Proper tag nesting
Closed tags
Semantic structure
Auto-correction of minor issues

MUXI streams tokens as the model generates them. Use SSE for simple, one-way streams (chat, dashboards) or WebSockets for bidirectional interactions. Long tasks can start streaming, then hand off to async workflows with progress events.

Protocols: SSE, WebSockets
When to use: Chat UIs, live dashboards, long-running jobs needing updates
See also: Streaming, Async Operations

Markdown

No validation - LLMs are good at markdown naturally.

Features:

Headers (#, ##, ###)
Bold and italic
Code blocks with syntax highlighting
Lists (bulleted and numbered)
Links and references
Tables

Example response:

# Analysis Results

## Key Findings

1. **Performance**: 95% improvement
2. **Cost**: Reduced by 40%

### Recommendations

- Optimize caching strategy
- Scale horizontally for peak loads

bash

Deploy command

kubectl apply -f deployment.yaml

Plain Text

No validation - flexible by design.

Features:

Clean, unformatted text
Line breaks for structure
No special characters
Maximum compatibility

Example response:

Analysis Results

Key Findings:

1. Performance improved by 95%
2. Cost reduced by 40%

Recommendations:
- Optimize caching strategy
- Scale horizontally for peak loads

Deployment: kubectl apply -f deployment.yaml

JSON

Validated and reformatted.

Structure:

{
  "content": "The actual response content",
  "type": "response",
  "format": "json",
  "metadata": {
    // Optional metadata
  }
}

Validation:

Must be valid JSON
Parsed with json.loads()
Re-serialized with consistent formatting

Error handling:

if validation_fails:
    # Return original content with error metadata
    return {
        "content": original_content,
        "error": "Invalid JSON",
        "format": "json"
    }

HTML

Validated with BeautifulSoup.

Allowed tags:

Structural: div, span, section
Headers: h1 through h6
Text: p, strong, em, code
Lists: ul, ol, li
Links: a
Pre-formatted: pre, code

Not allowed:

Scripts: script
Styles: style (inline styles ok)
Forms: form, input (security)
Iframes: iframe (security)

Validation:

# Auto-closes unclosed tags
<p>Text<p>More  →  <p>Text</p><p>More</p>

# Fixes nesting
<strong><em>Text</strong></em>  →  <strong><em>Text</em></strong>

# Removes dangerous tags
<p>Text<script>alert()</script></p>  →  <p>Text</p>

Configuration

Formation YAML

# formation.afs
overlord:
  soul: "You are a helpful assistant"

  response:
    format: "markdown"           # Default format
    streaming: true              # Works with all formats
    validation: true             # Enable post-processing (JSON/HTML)

Runtime Override

from muxi.runtime.formation import Formation

formation = Formation()
await formation.load("formation.afs")

# Get overlord instance
overlord = await formation.start_overlord()

# Override format
overlord.response_format = "json"
response = await overlord.chat("Extract data...")

# Reset to formation default
overlord.response_format = None

API Request

curl -X POST http://localhost:8001/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Analyze this data",
    "format": "json"
  }'

Streaming Behavior

Format-Specific Streaming

Markdown and Text:

event: chunk
data: {"text": "# Header

"}

event: chunk
data: {"text": "Some **bold** text"}

Streams naturally, readable as it arrives.

JSON:

event: chunk
data: {"text": "{\"name\": "}

event: chunk
data: {"text": "\"John\""}

event: chunk
data: {"text": ", \"email\": "}

event: chunk
data: {"text": "\"john@example.com\""}

event: chunk
data: {"text": "}"}

Client must collect all chunks, then parse complete JSON.

HTML:

event: chunk
data: {"text": "<h1>"}

event: chunk
data: {"text": "Header"}

event: chunk
data: {"text": "</h1>"}

Similar to JSON - collect and validate at end.

Performance Considerations

Token Overhead

Format instructions add ~50-100 tokens to system prompt:

Format	Instruction Tokens	Impact
Markdown	~60 tokens	Minimal
Text	~50 tokens	Minimal
JSON	~70 tokens	Minimal
HTML	~80 tokens	Minimal

Cost impact: <0.01% for typical conversations.

Validation Overhead

Operation	Time	When
JSON parse	<1ms	Every JSON response
HTML parse	1-5ms	Every HTML response
Markdown	0ms	No validation
Text	0ms	No validation

Latency impact: Negligible (<0.1% of total).

Error Handling

Invalid JSON

# Agent returns invalid JSON
response_text = "{name: 'John'}"  # Missing quotes

# Validation fails
try:
    json.loads(response_text)
except JSONDecodeError:
    # Log error, return wrapped response
    return {
        "content": response_text,
        "error": "Invalid JSON returned by agent",
        "format": "json"
    }

Invalid HTML

# Agent returns malformed HTML
response_text = "<p>Text<strong>Bold"  # Unclosed tags

# BeautifulSoup auto-corrects
soup = BeautifulSoup(response_text, 'html.parser')
corrected = soup.prettify()
# Result: <p>Text<strong>Bold</strong></p>

BeautifulSoup is forgiving - auto-closes tags, fixes nesting.

Best Practices

1. Clear Schema Expectations

agents:
  - id: extractor
    system_message: |
      Extract customer information as JSON with fields:
      - name (string, required)
      - email (string, required)
      - phone (string, optional)
      - company (string, optional)

      Use null for missing optional fields.

2. Validation in Code

from pydantic import BaseModel

class Customer(BaseModel):
    name: str
    email: str
    phone: str | None = None
    company: str | None = None

# Validate response
response = await overlord.chat("Extract customer info...")
data = json.loads(response.content)
customer = Customer(**data)  # Validates schema

3. Retry on Failure

max_retries = 3
for attempt in range(max_retries):
    response = await overlord.chat(message)

    try:
        data = json.loads(response.content)
        break  # Success
    except JSONDecodeError:
        if attempt == max_retries - 1:
            raise  # Give up after retries
        # Retry with clarification
        message = f"{message}

Previous response was invalid JSON. Please return valid JSON."

Future Enhancements

Planned Features

Pydantic Schema Enforcement yaml response: format: json schema: "schemas/customer.json"
Custom Validators yaml response: format: json validator: "validators/custom.py"
Format Auto-Detection yaml response: format: auto # Detect from context

Debugging

Enable Validation Logs

logging:
  level: debug
  validators: true

Logs all validation attempts and failures.

Inspect Raw Responses

response = await overlord.chat("Extract data...")
print("Raw:", response.raw_content)       # Before validation
print("Validated:", response.content)    # After validation

Learn More

Structured Output Concept - User-facing explanation
Tools & MCP - Tools also return structured data
API Reference - Complete API documentation

Response Formats Deep Dive

Technical implementation of LLM soul-based formatting

Architecture

Hybrid Approach

Soul Enhancement

How It Works

Soul Injection

Post-Processing Validation

JSON Validation

HTML Validation

Format-Specific Behavior

Real-Time Streaming

Markdown

Deploy command

Plain Text

JSON

HTML

Configuration

Formation YAML

Runtime Override

API Request

Streaming Behavior

Format-Specific Streaming

Performance Considerations

Token Overhead

Validation Overhead

Error Handling

Invalid JSON

Invalid HTML

Best Practices

1. Clear Schema Expectations

2. Validation in Code

3. Retry on Failure

Future Enhancements

Planned Features

Debugging

Enable Validation Logs

Inspect Raw Responses

Learn More

Deploy command

We use cookies