A.I. Smart Router
Intelligently routes requests to the optimal AI model via tiered classification, automatic failover handling, and cost optimization.
How It Works (Silent by Default)
The router operates transparently: users send messages normally and get a response from the model best suited to the task. No special commands are required.

Optional visibility: include [show routing] in any message to see the routing decision.
Tiered Classification System
The router uses a three-tier decision process:
┌─────────────────────────────────────────────────────────────────┐
│ TIER 1: INTENT DETECTION │
│ Classify the primary purpose of the request │
├─────────────────────────────────────────────────────────────────┤
│ CODE │ ANALYSIS │ CREATIVE │ REALTIME │ GENERAL │
│ write/debug │ research │ writing │ news/live │ Q&A/chat │
│ refactor │ explain │ stories │ X/Twitter │ translate │
│ review │ compare │ brainstorm │ prices │ summarize │
└──────┬───────┴──────┬──────┴─────┬──────┴─────┬─────┴─────┬─────┘
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ TIER 2: COMPLEXITY ESTIMATION │
├─────────────────────────────────────────────────────────────────┤
│ SIMPLE (Tier $) │ MEDIUM (Tier $$) │ COMPLEX (Tier $$$)│
│ • One-step task │ • Multi-step task │ • Deep reasoning │
│ • Short response OK │ • Some nuance │ • Extensive output│
│ • Factual lookup │ • Moderate context │ • Critical task │
│ → Haiku/Flash │ → Sonnet/Grok/GPT │ → Opus/GPT-5 │
└──────────────────────────┴─────────────────────┴───────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ TIER 3: SPECIAL CASE OVERRIDES │
├─────────────────────────────────────────────────────────────────┤
│ CONDITION │ OVERRIDE TO │
│ ─────────────────────────────────────┼─────────────────────────│
│ Context >100K tokens │ → Gemini Pro (1M ctx) │
│ Context >500K tokens │ → Gemini Pro ONLY │
│ Needs real-time data │ → Grok (regardless) │
│ Image/vision input │ → Opus or Gemini Pro │
│ User explicit override │ → Requested model │
└──────────────────────────────────────┴──────────────────────────┘
Intent Detection Patterns
CODE Intent
- Keywords: write, code, debug, fix, refactor, implement, function, class, script, API, error, bug, compile, test, PR, commit
- File extensions mentioned: .py, .js, .ts, .go, .rs, .java, etc.
- Code blocks in the input
ANALYSIS Intent
- Keywords: analyze, explain, compare, research, understand, why, how, evaluate, assess, review, investigate, examine
- Long-form questions
- "Help me understand..."
CREATIVE Intent
- Keywords: write (story/poem/essay), create, brainstorm, imagine, design, draft, ideate
- Fiction/narrative requests
- Marketing/copywriting requests
REALTIME Intent
- Keywords: now, today, current, latest, trending, news, happening, live, price, score, weather
- X/Twitter mentions
- Stock/crypto tickers
- Sports scores
GENERAL Intent (default)
- Simple Q&A
- Translation
- Summarization
- Conversational chat
MIXED Intent (multiple intents detected)
When a request contains several clear intents (e.g., "write code to analyze this data and explain it creatively"):
- Identify the primary intent — what is the main deliverable?
- Route to the most capable model — mixed tasks demand versatility
- Default to COMPLEX complexity — multiple intents = multiple steps
Examples:
- "Write code and explain how it works" → CODE (primary) + ANALYSIS → route to Opus
- "Summarize this and get the latest news about it" → realtime wins → Grok
- "Write a story based on real current events" → REALTIME + CREATIVE → Grok (realtime wins)
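As a sketch, the keyword patterns above can be implemented with word-boundary matching plus a language-agnostic check for code signals. The keyword lists are abbreviated and illustrative, not the router's full pattern set:

```python
import re

# Abbreviated keyword lists (illustrative — the real router includes more patterns)
INTENT_KEYWORDS = {
    "CODE":     ["debug", "fix", "refactor", "implement", "function", "class", "api", "bug", "test"],
    "ANALYSIS": ["analyze", "explain", "compare", "research", "why", "evaluate", "review"],
    "CREATIVE": ["story", "poem", "essay", "brainstorm", "imagine", "draft"],
    "REALTIME": ["now", "today", "latest", "news", "live", "price", "score", "weather"],
}

# Code blocks and file extensions are language-agnostic signals
CODE_SIGNALS = re.compile(r"```|\.(py|js|ts|go|rs|java)\b")

def classify_intent(request: str) -> str:
    """Tier 1: pick the intent with the most keyword hits; GENERAL by default."""
    text = request.lower()
    if CODE_SIGNALS.search(text):
        return "CODE"
    scores = {intent: sum(bool(re.search(rf"\b{re.escape(kw)}\b", text)) for kw in kws)
              for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "GENERAL"
```

Note the `\b` boundaries: without them, "latest" would spuriously match the CODE keyword "test".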
Language Handling
Non-English requests are handled normally — all supported models are multilingual:
| Model | Non-English Support |
|---|---|
| Opus/Sonnet/Haiku | Excellent (100+ languages) |
| GPT-5 | Excellent (100+ languages) |
| Gemini Pro/Flash | Excellent (100+ languages) |
| Grok | Good (major languages) |
Intent detection still works because:
- Keyword patterns include common non-English equivalents
- CODE intent is detected via file extensions and code blocks (language-agnostic)
- Complexity is estimated from query length (works across languages)
Edge case: if the intent is unclear due to language, default to GENERAL intent and MEDIUM complexity.
Complexity Signals
SIMPLE ($)
- Short query (<50 words)
- Single question mark
- "quick question", "just tell me", "briefly"
- Yes/no format
- Unit conversions, definitions
MEDIUM ($$)
- Moderate-length query (50-200 words)
- Multiple aspects involved
- "explain", "describe", "compare"
- Some context provided
COMPLEX ($$$)
- Long query (>200 words) or a complex task
- "step by step", "thoroughly", "in detail"
- Multi-part questions
- Critical/important qualifiers
- Research, analysis, or creative work
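These signals combine into a simple heuristic. The thresholds come from the lists above; the exact weighting of signal phrases is an illustrative assumption:

```python
def estimate_complexity(request: str) -> str:
    """Tier 2: word count plus explicit signal phrases (illustrative heuristic)."""
    words = len(request.split())
    text = request.lower()
    complex_signals = ["step by step", "thoroughly", "in detail", "critical", "important"]
    simple_signals = ["quick question", "just tell me", "briefly"]
    if words > 200 or any(s in text for s in complex_signals):
        return "COMPLEX"
    if words < 50 and (any(s in text for s in simple_signals) or request.count("?") <= 1):
        return "SIMPLE"
    return "MEDIUM"
```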
Routing Matrix
| Intent | SIMPLE | MEDIUM | COMPLEX |
|---|---|---|---|
| CODE | Sonnet | Opus | Opus |
| ANALYSIS | Flash | GPT-5 | Opus |
| CREATIVE | Sonnet | Opus | Opus |
| REALTIME | Grok | Grok | Grok-3 |
| GENERAL | Flash | Sonnet | Opus |
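The matrix translates directly into a lookup table. The lowercase model handles and the `sonnet` default for unknown pairs are illustrative assumptions:

```python
ROUTING_MATRIX = {
    # (intent, complexity): preferred model — mirrors the table above
    ("CODE", "SIMPLE"): "sonnet",     ("CODE", "MEDIUM"): "opus",      ("CODE", "COMPLEX"): "opus",
    ("ANALYSIS", "SIMPLE"): "flash",  ("ANALYSIS", "MEDIUM"): "gpt-5", ("ANALYSIS", "COMPLEX"): "opus",
    ("CREATIVE", "SIMPLE"): "sonnet", ("CREATIVE", "MEDIUM"): "opus",  ("CREATIVE", "COMPLEX"): "opus",
    ("REALTIME", "SIMPLE"): "grok",   ("REALTIME", "MEDIUM"): "grok",  ("REALTIME", "COMPLEX"): "grok-3",
    ("GENERAL", "SIMPLE"): "flash",   ("GENERAL", "MEDIUM"): "sonnet", ("GENERAL", "COMPLEX"): "opus",
}

def select_model(intent: str, complexity: str) -> str:
    """Look up the preferred model; fall back to a safe default for unknown pairs."""
    return ROUTING_MATRIX.get((intent, complexity), "sonnet")
```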
Token Exhaustion & Automatic Model Switching
When a model becomes unavailable mid-session (token quota exhausted, rate limit reached, API error), the router automatically switches to the next best available model and notifies the user.
Notification Format
When a model switch occurs due to exhaustion, the user is notified:
┌─────────────────────────────────────────────────────────────────┐
│ ⚠️ MODEL SWITCH NOTICE │
│ │
│ Your request could not be completed on claude-opus-4-5 │
│ (reason: token quota exhausted). │
│ │
│ ✅ Request completed using: anthropic/claude-sonnet-4-5 │
│ │
│ The response below was generated by the fallback model. │
└─────────────────────────────────────────────────────────────────┘
Switch Reasons
| Reason | Description |
|---|---|
| Token quota exhausted | Daily/monthly token limit reached |
| Rate limit exceeded | Too many requests per minute |
| Context window exceeded | Input too large for the model |
| API timeout | Model took too long to respond |
| API error | Provider returned an error |
| Model unavailable | Model temporarily offline |
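The handler code in this document catches a typed exception per switch reason (`TokenQuotaExhausted`, `RateLimitExceeded`, etc.). A minimal exception hierarchy matching those names — assumed here, not a published API — might look like:

```python
class ModelSwitchError(Exception):
    """Base class for conditions that trigger fallback to the next model."""

class TokenQuotaExhausted(ModelSwitchError): ...
class RateLimitExceeded(ModelSwitchError): ...
class ContextWindowExceeded(ModelSwitchError): ...
class APITimeout(ModelSwitchError): ...

class APIError(ModelSwitchError):
    """Provider returned an error; keeps the code for the notification text."""
    def __init__(self, code: int, message: str = ""):
        super().__init__(f"API error {code}: {message}")
        self.code = code
```

A shared base class lets generic callers catch `ModelSwitchError` while the fallback loop still distinguishes reasons.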
Implementation

```python
def execute_with_fallback(primary_model: str, fallback_chain: list[str], request: str) -> Response:
    """
    Execute request with automatic fallback and user notification.
    """
    attempted_models = []
    switch_reason = None
    # Try primary model first
    models_to_try = [primary_model] + fallback_chain
    for model in models_to_try:
        try:
            response = call_model(model, request)
            # If we switched models, prepend notification
            if attempted_models:
                notification = build_switch_notification(
                    failed_model=attempted_models[0],
                    reason=switch_reason,
                    success_model=model
                )
                return Response(
                    content=notification + "\n\n---\n\n" + response.content,
                    model_used=model,
                    switched=True
                )
            return Response(content=response.content, model_used=model, switched=False)
        except TokenQuotaExhausted:
            attempted_models.append(model)
            switch_reason = "token quota exhausted"
            log_fallback(model, switch_reason)
            continue
        except RateLimitExceeded:
            attempted_models.append(model)
            switch_reason = "rate limit exceeded"
            log_fallback(model, switch_reason)
            continue
        except ContextWindowExceeded:
            attempted_models.append(model)
            switch_reason = "context window exceeded"
            log_fallback(model, switch_reason)
            continue
        except APITimeout:
            attempted_models.append(model)
            switch_reason = "API timeout"
            log_fallback(model, switch_reason)
            continue
        except APIError as e:
            attempted_models.append(model)
            switch_reason = f"API error: {e.code}"
            log_fallback(model, switch_reason)
            continue
    # All models exhausted
    return build_exhaustion_error(attempted_models)

def build_switch_notification(failed_model: str, reason: str, success_model: str) -> str:
    """Build user-facing notification when model switch occurs."""
    return f"""⚠️ **MODEL SWITCH NOTICE**

Your request could not be completed on `{failed_model}` (reason: {reason}).

✅ **Request completed using:** `{success_model}`

The response below was generated by the fallback model."""

def build_exhaustion_error(attempted_models: list[str]) -> Response:
    """Build error when all models are exhausted."""
    models_tried = ", ".join(attempted_models)
    return Response(
        content=f"""❌ **REQUEST FAILED**

Unable to complete your request. All available models have been exhausted.

**Models attempted:** {models_tried}

**What you can do:**
1. **Wait** — Token quotas typically reset hourly or daily
2. **Simplify** — Try a shorter or simpler request
3. **Check status** — Run `/router status` to see model availability

If this persists, your human may need to check API quotas or add additional providers.""",
        model_used=None,
        switched=False,
        failed=True
    )
```
Fallback Priority on Token Exhaustion
When a model is exhausted, the router selects the next best model for the same task type:
| Original Model | Fallback Priority (equivalent capability) |
|---|---|
| Opus | Sonnet → GPT-5 → Grok-3 → Gemini Pro |
| Sonnet | GPT-5 → Grok-3 → Opus → Haiku |
| GPT-5 | Sonnet → Opus → Grok-3 → Gemini Pro |
| Gemini Pro | Flash → GPT-5 → Opus → Sonnet |
| Grok-2/3 | (warn: no real-time fallback available) |
User Confirmation
After a model switch, the agent should note in its response:
- The original model was unavailable
- Response quality may differ from the original model's typical output
This ensures transparency and sets appropriate expectations.
Streaming Responses and Fallback
Streaming responses need special consideration for fallback handling:

```python
async def execute_with_streaming_fallback(primary_model: str, fallback_chain: list[str], request: str):
    """
    Handle streaming responses with mid-stream fallback.

    If a model fails DURING streaming (not before), the partial response is lost.
    Strategy: Don't start streaming until first chunk received successfully.
    """
    models_to_try = [primary_model] + fallback_chain
    for model in models_to_try:
        try:
            # Test with non-streaming ping first (optional, adds latency)
            # await test_model_availability(model)

            # Start streaming
            stream = await call_model_streaming(model, request)
            first_chunk = await stream.get_first_chunk(timeout=10_000)  # 10s timeout for first chunk

            # If we got here, model is responding — continue streaming
            yield first_chunk
            async for chunk in stream:
                yield chunk
            return  # Success
        except (FirstChunkTimeout, StreamError) as e:
            log_fallback(model, str(e))
            continue  # Try next model

    # All models failed
    yield build_exhaustion_error(models_to_try)
```

Key insight: wait for a model's first chunk before committing to it. If the first chunk times out, fall back before showing the user any partial response.
Retry Timing Configuration

```python
RETRY_CONFIG = {
    "initial_timeout_ms": 30_000,        # 30s for first attempt
    "fallback_timeout_ms": 20_000,       # 20s for fallback attempts (faster fail)
    "max_retries_per_model": 1,          # Don't retry same model
    "backoff_multiplier": 1.5,           # Not used (no same-model retry)
    "circuit_breaker_threshold": 3,      # Failures before skipping model entirely
    "circuit_breaker_reset_ms": 300_000  # 5 min before trying failed model again
}
```

Circuit breaker: if a model fails 3 times within 5 minutes, skip it entirely for the next 5 minutes. This avoids hammering a service that is already down.
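A minimal circuit breaker honoring these thresholds could look like the sketch below. The injectable `clock` parameter is an assumption added for testability, not part of RETRY_CONFIG:

```python
import time

class CircuitBreaker:
    """Skip a model after `threshold` failures; allow retries after `reset_s` seconds."""

    def __init__(self, threshold: int = 3, reset_s: float = 300.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_s = reset_s
        self.clock = clock
        self.failures: dict[str, list[float]] = {}  # model -> failure timestamps

    def record_failure(self, model: str) -> None:
        self.failures.setdefault(model, []).append(self.clock())

    def is_open(self, model: str) -> bool:
        """True = circuit open = skip this model for now."""
        now = self.clock()
        recent = [t for t in self.failures.get(model, []) if now - t < self.reset_s]
        self.failures[model] = recent  # drop stale failures
        return len(recent) >= self.threshold
```

The fallback loop would check `is_open(model)` before each attempt and call `record_failure(model)` on each caught exception.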
Fallback Chains
When the preferred model fails (rate limited, API down, errors, etc.), cascade to the next option:
CODE tasks
Opus → Sonnet → GPT-5 → Gemini Pro
ANALYSIS tasks
Opus → GPT-5 → Gemini Pro → Sonnet
CREATIVE tasks
Opus → GPT-5 → Sonnet → Gemini Pro
REALTIME tasks
Grok-2 → Grok-3 → (warn: no real-time fallback)
GENERAL tasks
Flash → Haiku → Sonnet → GPT-5
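These chains can live in one table. The routing code elsewhere in this document references `MASTER_FALLBACK_CHAINS`; the dict below is a sketch of its likely shape, filtered against whatever models are actually available:

```python
MASTER_FALLBACK_CHAINS = {
    "code":     ["opus", "sonnet", "gpt-5", "gemini-pro"],
    "analysis": ["opus", "gpt-5", "gemini-pro", "sonnet"],
    "creative": ["opus", "gpt-5", "sonnet", "gemini-pro"],
    "realtime": ["grok-2", "grok-3"],   # warn: no real-time fallback beyond these
    "general":  ["flash", "haiku", "sonnet", "gpt-5"],
}

def fallback_chain(task_type: str, available: set[str]) -> list[str]:
    """Return the chain for a task type, keeping only models that are available."""
    return [m for m in MASTER_FALLBACK_CHAINS.get(task_type, []) if m in available]
```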
Long-Context Handling (tiered by size)
┌─────────────────────────────────────────────────────────────────┐
│ LONG CONTEXT FALLBACK CHAIN │
├─────────────────────────────────────────────────────────────────┤
│ TOKEN COUNT │ FALLBACK CHAIN │
│ ───────────────────┼───────────────────────────────────────────│
│ 128K - 200K │ Opus (200K) → Sonnet (200K) → Gemini Pro │
│ 200K - 1M │ Gemini Pro → Flash (1M) → ERROR_MESSAGE │
│ > 1M │ ERROR_MESSAGE (no model supports this) │
└─────────────────────┴───────────────────────────────────────────┘
Implementation:

```python
def handle_long_context(token_count: int, available_models: dict) -> str | ErrorMessage:
    """Route long-context requests with graceful degradation."""
    # Tier 1: 128K - 200K tokens (Opus/Sonnet can handle)
    if token_count <= 200_000:
        for model in ["opus", "sonnet", "haiku", "gemini-pro", "flash"]:
            if model in available_models and get_context_limit(model) >= token_count:
                return model
    # Tier 2: 200K - 1M tokens (only Gemini)
    elif token_count <= 1_000_000:
        for model in ["gemini-pro", "flash"]:
            if model in available_models:
                return model
    # Tier 3: > 1M tokens (nothing available)
    # Fall through to error

    # No suitable model found — return helpful error
    return build_context_error(token_count, available_models)

def build_context_error(token_count: int, available_models: dict) -> ErrorMessage:
    """Build a helpful error message when no model can handle the input."""
    # Find the largest available context window
    max_available = max(
        (get_context_limit(m) for m in available_models),
        default=0
    )
    # Determine what's missing
    missing_models = []
    if "gemini-pro" not in available_models and "flash" not in available_models:
        missing_models.append("Gemini Pro/Flash (1M context)")
    if token_count <= 200_000 and "opus" not in available_models:
        missing_models.append("Opus (200K context)")
    # Format token count for readability
    if token_count >= 1_000_000:
        token_display = f"{token_count / 1_000_000:.1f}M"
    else:
        token_display = f"{token_count // 1000}K"
    return ErrorMessage(
        title="Context Window Exceeded",
        message=f"""Your input is approximately **{token_display} tokens**, which exceeds the context window of all currently available models.

**Required:** Gemini Pro (1M context) {"— currently unavailable" if "gemini-pro" not in available_models else ""}
**Your max available:** {max_available // 1000}K tokens

**Options:**
1. **Wait and retry** — Gemini may be temporarily down
2. **Reduce input size** — Remove unnecessary content to fit within {max_available // 1000}K tokens
3. **Split into chunks** — I can process your input sequentially in smaller pieces

Would you like me to help split this into manageable chunks?""",
        recoverable=True,
        suggested_action="split_chunks"
    )
```
Example error output:
⚠️ Context Window Exceeded
Your input is approximately **340K tokens**, which exceeds the context
window of all currently available models.
Required: Gemini Pro (1M context) — currently unavailable
Your max available: 200K tokens
Options:
1. Wait and retry — Gemini may be temporarily down
2. Reduce input size — Remove unnecessary content to fit within 200K tokens
3. Split into chunks — I can process your input sequentially in smaller pieces
Would you like me to help split this into manageable chunks?
Dynamic Model Discovery
The router detects available providers automatically at runtime:
1. Check configured auth profiles
2. Build available model list from authenticated providers
3. Construct routing table using ONLY available models
4. If preferred model unavailable, use best available alternative
Example: if only Anthropic and Google are configured:
- CODE tasks → Opus (Anthropic available ✓)
- REALTIME tasks → ⚠️ no Grok → fall back to Opus and warn the user
- Long documents → Gemini Pro (Google available ✓)
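A sketch of the discovery step, assuming a simple provider-to-models map and boolean auth-profile results (both hypothetical — the real auth-profile format is not specified here):

```python
# Hypothetical static map of which models each provider serves
PROVIDER_MODELS = {
    "anthropic": ["opus", "sonnet", "haiku"],
    "openai":    ["gpt-5"],
    "google":    ["gemini-pro", "flash"],
    "xai":       ["grok-2", "grok-3"],
}

def discover_models(auth_profiles: dict[str, bool]) -> set[str]:
    """Build the available-model set from providers with valid credentials."""
    return {model
            for provider, authenticated in auth_profiles.items() if authenticated
            for model in PROVIDER_MODELS.get(provider, [])}
```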
Cost Optimization
When complexity is low, the router factors in cost:
| Model | Cost Tier | Use Case |
|---|---|---|
| Gemini Flash | $ | Simple tasks, high volume |
| Claude Haiku | $ | Simple tasks, fast responses |
| Claude Sonnet | $$ | Medium complexity |
| Grok 2 | $$ | Real-time needs only |
| GPT-5 | $$ | General-purpose fallback |
| Gemini Pro | $$$ | Long-context needs |
| Claude Opus | $$$$ | Complex/critical tasks |
Rule: never use Opus ($$$$) for a task that Flash ($) can handle.
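One way to encode this rule is to tag each model with its tier and always prefer the cheapest capable candidate. The tier map mirrors the table above; `cheapest_capable` is an illustrative helper, not part of a published API:

```python
# Cost tiers from the table above (tier string length doubles as a sort key)
COST_TIERS = {
    "flash": "$", "haiku": "$",
    "sonnet": "$$", "grok-2": "$$", "gpt-5": "$$",
    "gemini-pro": "$$$",
    "opus": "$$$$",
}

def cheapest_capable(candidates: list[str]) -> str:
    """Among capable candidates, pick the one in the lowest cost tier."""
    # Unknown models sort as most expensive, so they are never preferred
    return min(candidates, key=lambda m: len(COST_TIERS.get(m, "$$$$$")))
```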
User Controls
Show the Routing Decision
Add [show routing] to any message:
[show routing] What's the weather in NYC?
The output includes:
[Routed → xai/grok-2-latest | Reason: REALTIME intent detected | Fallback: none available]
Force a Specific Model
Explicit overrides:
- "use grok: ..." → force Grok
- "use claude: ..." → force Opus
- "use gemini: ..." → force Gemini Pro
- "use flash: ..." → force Gemini Flash
- "use gpt: ..." → force GPT-5
Check Router Status
Ask "router status" or "/router" to see:
- Available providers
- Configured models
- Current routing table
- Recent routing decisions
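A status report could be rendered from the discovered state. The exact layout below is illustrative, not a specified output format:

```python
def build_status_report(available: dict[str, bool], recent_decisions: list[str]) -> str:
    """Render a `/router status`-style report (format is illustrative)."""
    lines = ["ROUTER STATUS", ""]
    for model, up in sorted(available.items()):
        lines.append(f"  {'✓' if up else '✗'} {model}")
    lines.append("")
    lines.append("Recent decisions:")
    lines.extend(f"  - {d}" for d in recent_decisions[-5:])  # last five only
    return "\n".join(lines)
```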
Implementation Notes
For the Agent Implementation
When processing a request:
1. DETECT available models (check auth profiles)
2. CLASSIFY intent (code/analysis/creative/realtime/general)
3. ESTIMATE complexity (simple/medium/complex)
4. CHECK special cases (context size, vision, explicit override)
5. FILTER by cost tier based on complexity ← BEFORE model selection
6. SELECT model from filtered pool using routing matrix
7. VERIFY model available, else use fallback chain (also cost-filtered)
8. EXECUTE request with selected model
9. IF failure, try next in fallback chain
10. LOG routing decision (for debugging)
Cost-Aware Routing Flow (critical ordering)

```python
def route_with_fallback(request):
    """
    Main routing function with CORRECT execution order.

    Cost filtering MUST happen BEFORE routing table lookup.
    """
    # Step 1: Discover available models
    available_models = discover_providers()

    # Step 2: Classify intent
    intent = classify_intent(request)

    # Step 3: Estimate complexity
    complexity = estimate_complexity(request)
    token_count = estimate_tokens(request)  # (assumed helper: estimates input size in tokens)

    # Step 4: Check special-case overrides (these bypass cost filtering)
    if user_override := get_user_model_override(request):
        return execute_with_fallback(user_override, [], request)  # No cost filter for explicit override
    if token_count > 128_000:
        return handle_long_context(token_count, available_models)  # Special handling
    if needs_realtime(request):
        return execute_with_fallback("grok-2", ["grok-3"], request)  # Realtime bypasses cost

    # ┌─────────────────────────────────────────────────────────────┐
    # │ STEP 5: FILTER BY COST TIER — THIS MUST COME FIRST!         │
    # │                                                             │
    # │ Cost filtering happens BEFORE the routing table lookup,     │
    # │ NOT after. This ensures "what's 2+2?" never considers       │
    # │ Opus even momentarily.                                      │
    # └─────────────────────────────────────────────────────────────┘
    allowed_tiers = get_allowed_tiers(complexity)
    # SIMPLE  → ["$"]
    # MEDIUM  → ["$", "$$"]
    # COMPLEX → ["$", "$$", "$$$", "$$$$"]
    cost_filtered_models = {
        model: meta for model, meta in available_models.items()
        if COST_TIERS.get(model) in allowed_tiers
    }

    # Step 6: NOW select from cost-filtered pool using routing preferences
    preferences = ROUTING_PREFERENCES.get((intent, complexity), [])
    for model in preferences:
        if model in cost_filtered_models:  # Only consider cost-appropriate models
            selected_model = model
            break
    else:
        # No preferred model in cost-filtered pool — use cheapest available
        selected_model = select_cheapest(cost_filtered_models)

    # Step 7: Build cost-filtered fallback chain
    task_type = get_task_type(intent, complexity)
    full_chain = MASTER_FALLBACK_CHAINS.get(task_type, [])
    filtered_chain = [m for m in full_chain if m in cost_filtered_models and m != selected_model]

    # Step 8-10: Execute with fallback + logging
    return execute_with_fallback(selected_model, filtered_chain, request)

def get_allowed_tiers(complexity: str) -> list[str]:
    """Return allowed cost tiers for a given complexity level."""
    return {
        "SIMPLE": ["$"],                        # Budget only — no exceptions
        "MEDIUM": ["$", "$$"],                  # Budget + standard
        "COMPLEX": ["$", "$$", "$$$", "$$$$"],  # All tiers — complex tasks deserve the best
    }.get(complexity, ["$", "$$"])

# Example flow for "what's 2+2?":
#
# 1. available_models = {opus, sonnet, haiku, flash, grok-2, ...}
# 2. intent = GENERAL
# 3. complexity = SIMPLE
# 4. (no special cases)
# 5. allowed_tiers = ["$"]  ← SIMPLE means $ only
#    cost_filtered_models = {haiku, flash, grok-2}  ← Opus/Sonnet EXCLUDED
# 6. preferences for (GENERAL, SIMPLE) = [flash, haiku, grok-2, sonnet]
#    first match in cost_filtered = flash ✓
# 7. fallback_chain = [haiku, grok-2]  ← Also cost-filtered
# 8. execute with flash
#
# Result: Opus is NEVER considered, not even momentarily.
```
Cost Optimization: Two Approaches
┌─────────────────────────────────────────────────────────────────┐
│ COST OPTIMIZATION IMPLEMENTATION OPTIONS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ APPROACH 1: Explicit filter_by_cost() (shown above) │
│ ───────────────────────────────────────────────────────────── │
│ • Calls get_allowed_tiers(complexity) explicitly │
│ • Filters available_models BEFORE routing table lookup │
│ • Most defensive — impossible to route wrong tier │
│ • Recommended for security-critical deployments │
│ │
│ APPROACH 2: Preference ordering (implicit) │
│ ───────────────────────────────────────────────────────────── │
│ • ROUTING_PREFERENCES lists cheapest capable models first │
│ • For SIMPLE tasks: [flash, haiku, grok-2, sonnet] │
│ • First available match wins → naturally picks cheapest │
│ • Simpler code, relies on correct preference ordering │
│ │
│ This implementation uses BOTH for defense-in-depth: │
│ • Preference ordering provides first line of cost awareness │
│ • Explicit filter_by_cost() guarantees tier enforcement │
│ │
│ For alternative implementations that rely solely on │
│ preference ordering, see references/models.md for the │
│ filter_by_cost() function if explicit enforcement is needed. │
│ │
└─────────────────────────────────────────────────────────────────┘
Spawning with a Different Model
Use sessions_spawn for model routing:

```
sessions_spawn(
    task: "user's request",
    model: "selected/model-id",
    label: "task-type-query"
)
```

Security
- Never send sensitive data to untrusted models
- API keys are handled only via environment/auth profiles
- See references/security.md for full security guidelines
Model Details
See references/models.md for detailed capabilities and pricing.

