Semantic Routing

Route requests based on the "meaning" or keywords of a prompt.

Semantic Routing allows you to route requests based on the intent of the user's prompt rather than just a model name. Octo Router supports two engines for this classification: Embedding and Keyword.

Engine Types

1. Embedding Engine (Smart)

The embedding engine uses a local ONNX model to convert prompts into embedding vectors and compares them against your intent groups using cosine similarity.

Best for: Complex intents, natural language, and high accuracy.

routing:
  policies:
    semantic:
      enabled: true
      engine: "embedding"
      threshold: 0.45      
      model_path: "assets/models/embedding.onnx"
      groups:
        - name: "coding"
          intent_description: "Programming or architecture questions."
          examples:
            - "how do I fix a null pointer exception?"
            - "write a binary search in python"
          allow_providers: ["openai", "anthropic"]

2. Keyword Engine (Fast)

The keyword engine performs simple string matching against defined keywords in the prompt. It's extremely fast and requires no local model.

Best for: Routing specific commands, simple tags, or when minimizing CPU/Memory usage.

routing:
  policies:
    semantic:
      enabled: true
      engine: "keyword"
      groups:
        - name: "image-gen"
          intent_keywords: ["generate", "draw", "image", "paint"]
          allow_providers: ["openai"]
        
        - name: "fast-chat"
          intent_keywords: ["hello", "hi", "hey"]
          allow_providers: ["gemini"]

How it Works

Regardless of the engine, the flow remains the same:

  1. Analysis: The prompt is processed by either the Embedding or Keyword engine.
  2. Filtering: If a match is found, the router restricts candidates to the allow_providers defined for that group.
  3. Routing: The final selection strategy (Weighted, Cost, etc.) is applied to the filtered pool.
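
As a concrete trace, here is how a single prompt moves through those three steps under a minimal keyword configuration (the group and provider names are illustrative):

routing:
  policies:
    semantic:
      enabled: true
      engine: "keyword"
      groups:
        - name: "image-gen"
          intent_keywords: ["draw", "image"]
          allow_providers: ["openai"]

# Prompt: "draw a sunset over the mountains"
# 1. Analysis:  the keyword engine finds "draw" in the prompt -> group "image-gen"
# 2. Filtering: the candidate pool is restricted to ["openai"]
# 3. Routing:   the selection strategy (Weighted, Cost, etc.) picks from that pool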

Which engine should I use?

| Feature | Embedding Engine | Keyword Engine |
| --- | --- | --- |
| Logic | Neural (local model) | String matching |
| Accuracy | High (context-aware) | Simple (exact/partial) |
| Resources | Requires CPU/memory for ONNX | Near zero |
| Use case | Intent-based routing | Tag/keyword-based routing |

Defining Intent Groups

Intent groups are the core logic of semantic routing. They define how prompts are classified and which providers are allowed for each category.

Group Configuration Reference

| Field | Engine | Description |
| --- | --- | --- |
| name | Both | A unique identifier for the group (e.g., coding, image-gen). |
| intent_description | Embedding | A natural language summary of what this group covers. Used to prime the embedding engine. |
| examples | Embedding | Few-shot phrases that represent the intent. More examples lead to better matching. |
| intent_keywords | Keyword | A list of specific words or tags that trigger the group when found in the prompt. |
| allow_providers | Both | Optional. Restricts the router to these specific providers when this group is matched. |
| required_capability | Both | Optional. Filters providers based on their catalog capabilities (e.g., vision, coding). |
| use_system_default | Embedding | If true, appends built-in examples from Octo Router's core library (see below). |
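
Putting these fields together, a single embedding group might look like the following sketch (the group name, examples, and providers are illustrative):

routing:
  policies:
    semantic:
      enabled: true
      engine: "embedding"
      groups:
        - name: "vision"                          # unique identifier for the group
          intent_description: "Analyzing, describing, or answering questions about images."
          examples:
            - "what is in this picture?"
            - "describe the attached screenshot"
          allow_providers: ["openai", "gemini"]   # optional: restrict the provider pool
          required_capability: "vision"           # optional: filter by catalog capability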

Best Practices for Better Matching

To ensure the Embedding Engine is accurate, follow these guidelines:

  1. Be Descriptive: In intent_description, use clear, distinct language. Instead of "coding", use "Requests involving writing, debugging, or explaining software code and architecture."
  2. Quality Samples: Provide 5-10 examples that cover different ways a user might ask for the same thing.
  3. Avoid Overlap: If two groups (e.g., support and billing) have very similar examples, the router may struggle to distinguish them. Keep them distinct (see the sketch after this list).
  4. Keyword Precision: Use the Keyword engine for unambiguous commands (like /image) and the Embedding engine for conversational intent.
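
For example, support and billing groups stay distinguishable when their descriptions and examples emphasize different vocabulary. A hypothetical sketch, showing only the groups list:

groups:
  - name: "support"
    intent_description: "Technical problems, error messages, and troubleshooting product issues."
    examples:
      - "the app crashes when I open settings"
      - "I keep getting a timeout error on login"

  - name: "billing"
    intent_description: "Invoices, payments, refunds, and subscription plan changes."
    examples:
      - "why was I charged twice this month?"
      - "how do I upgrade my plan?"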

Extending System Defaults

Octo Router comes with pre-tuned intent examples for common categories like coding and fast-chat. You can inherit these examples to save time while still defining your own providers and capabilities.

Global Toggle: extend_default

Set extend_default: true at the policy level to enable the inheritance mechanism.

Group Toggle: use_system_default

Set use_system_default: true within a specific group to pull in built-in examples.

Example: Advanced Coding Setup

In this example, we inherit the system's "coding" examples (which include various programming queries) but restrict the providers to OpenAI and Anthropic.

routing:
  policies:
    semantic:
      enabled: true
      engine: "embedding"
      extend_default: true # Enable the extension system
      groups:
        - name: "coding"
          use_system_default: true # Inherit pre-tuned coding examples
          allow_providers: ["openai", "anthropic"]
          required_capability: "coding" 

Built-in Groups

The following groups are supported for inheritance:

  • coding: Programming and software development questions (inherited in the example above).
  • fast-chat: Greetings, basic math, and general knowledge.

Technical Configuration

For fine-grained control over the Embedding Engine, you can tune the following technical parameters:

| Option | Default | Description |
| --- | --- | --- |
| threshold | 0.5 | The minimum cosine similarity score (0.0 to 1.0) required to trigger a match. Higher values require the prompt to be closer to your examples. |
| default_group | N/A | The group to use if no other group meets the threshold. This acts as a global fallback for semantic routing. |
| model_path | N/A | Absolute or relative path to the local ONNX embedding model (e.g., MiniLM). |
| shared_lib_path | N/A | Optional. Path to the ONNX Runtime library (.so, .dylib, or .dll). Useful if the library is not in your system's standard search path. |

Tuning the Threshold

  • Precision (0.7+): Only route when you are very certain of the intent. Leads to more "default group" fallbacks.
  • Recall (0.3 - 0.5): More aggressive routing. Useful if your intent groups are very distinct.
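
Putting these options together, a precision-leaning setup with a global fallback might look like this (the paths and group name are illustrative):

routing:
  policies:
    semantic:
      enabled: true
      engine: "embedding"
      threshold: 0.7                 # precision-leaning: only route on confident matches
      default_group: "general"       # fallback when no group clears the threshold
      model_path: "assets/models/embedding.onnx"
      shared_lib_path: "/usr/local/lib/libonnxruntime.so"
      groups:
        - name: "general"
          intent_description: "General questions that do not fit a more specific group."
          allow_providers: ["openai"]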

Troubleshooting

If using the Embedding engine, ensure your docker-compose.yml has the correct ONNX_LIB_PATH set (e.g., /usr/local/lib/libonnxruntime.so). For the Keyword engine, no extra setup is required.
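
For reference, a minimal docker-compose.yml service entry might set the variable like this (the service name, image, and volume mount are illustrative):

services:
  octo-router:
    image: octo-router:latest
    environment:
      - ONNX_LIB_PATH=/usr/local/lib/libonnxruntime.so
    volumes:
      - ./assets/models:/app/assets/models   # make the local embedding model available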
