# Semantic Routing
Route requests based on the "meaning" or keywords of a prompt.
Semantic Routing allows you to route requests based on the intent of the user's prompt rather than just a model name. Octo Router supports two engines for this classification: Embedding and Keyword.
## Engine Types
### 1. Embedding Engine (Smart)
The embedding engine uses a local ONNX model to convert prompts into mathematical vectors and compares them against intent groups using cosine similarity.
Best for: Complex intents, natural language, and high accuracy.
```yaml
routing:
  policies:
    semantic:
      enabled: true
      engine: "embedding"
      threshold: 0.45
      model_path: "assets/models/embedding.onnx"
      groups:
        - name: "coding"
          intent_description: "Programming or architecture questions."
          examples:
            - "how do I fix a null pointer exception?"
            - "write a binary search in python"
          allow_providers: ["openai", "anthropic"]
```

### 2. Keyword Engine (Fast)
The keyword engine performs simple string matching against defined keywords in the prompt. It's extremely fast and requires no local model.
Best for: Routing specific commands, simple tags, or when minimizing CPU/Memory usage.
```yaml
routing:
  policies:
    semantic:
      enabled: true
      engine: "keyword"
      groups:
        - name: "image-gen"
          intent_keywords: ["generate", "draw", "image", "paint"]
          allow_providers: ["openai"]
        - name: "fast-chat"
          intent_keywords: ["hello", "hi", "hey"]
          allow_providers: ["gemini"]
```

## How it Works
Regardless of the engine, the flow remains the same:
- Analysis: The prompt is processed by either the Embedding or Keyword engine.
- Filtering: If a match is found, the router restricts candidates to the `allow_providers` defined for that group.
- Routing: The final selection strategy (Weighted, Cost, etc.) is applied to the filtered pool (see the sketch below).
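To make the flow concrete, the sketch below pairs a keyword group with a weighted final selection. Note that the top-level `strategy` key is an assumption used for illustration; the actual key that configures the selection strategy may differ in your deployment.

```yaml
routing:
  strategy: "weighted"   # Assumed key: stands in for the final selection step
  policies:
    semantic:
      enabled: true
      engine: "keyword"
      groups:
        - name: "image-gen"
          intent_keywords: ["generate", "draw", "image"]
          # Filtering: on a match, only these providers stay in the pool.
          allow_providers: ["openai"]
```

A prompt containing "draw" matches `image-gen`, the candidate pool is filtered to `openai`, and the weighted strategy then picks from whatever remains.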
## Which engine should I use?
| Feature | Embedding Engine | Keyword Engine |
|---|---|---|
| Logic | Neural (Local Model) | String Matching |
| Accuracy | High (Context Aware) | Simple (Exact/Partial) |
| Resources | Requires CPU/Memory for ONNX | Near Zero |
| Use Case | Intent-based routing | Tag/Keyword-based routing |
## Defining Intent Groups
Intent groups are the core logic of semantic routing. They define how prompts are classified and which providers are allowed for each category.
### Group Configuration Reference
| Field | Engine | Description |
|---|---|---|
| `name` | Both | A unique identifier for the group (e.g., `coding`, `image-gen`). |
| `intent_description` | Embedding | A natural-language summary of what this group covers. Used to prime the embedding engine. |
| `examples` | Embedding | Few-shot phrases that represent the intent. More examples lead to better matching. |
| `intent_keywords` | Keyword | A list of specific words or tags that trigger the group when found in the prompt. |
| `allow_providers` | Both | Optional. Restricts the router to these specific providers when this group is matched. |
| `required_capability` | Both | Optional. Filters providers based on their catalog capabilities (e.g., `vision`, `coding`). |
| `use_system_default` | Embedding | If `true`, appends built-in examples from Octo Router's core library (see below). |
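For example, an embedding-engine group using the optional fields might look like the sketch below. The group name, example phrases, and the `vision` capability are illustrative; under the keyword engine, `intent_keywords` would take the place of `intent_description` and `examples`.

```yaml
groups:
  - name: "vision"
    intent_description: "Questions about the contents of an uploaded image or screenshot."
    examples:
      - "what is in this picture?"
      - "describe the attached screenshot"
    allow_providers: ["openai"]       # optional: restrict the candidate pool
    required_capability: "vision"     # optional: filter by catalog capability
    use_system_default: false         # skip built-in examples for this group
```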
### Best Practices for Better Matching
To ensure the Embedding Engine is accurate, follow these guidelines:
- Be Descriptive: In `intent_description`, use clear, distinct language. Instead of "coding", use "Requests involving writing, debugging, or explaining software code and architecture."
- Quality Samples: Provide 5-10 `examples` that cover different ways a user might ask for the same thing (a complete sketch follows this list).
- Avoid Overlap: If two groups (e.g., `support` and `billing`) have very similar examples, the router may struggle to distinguish them. Keep them distinct.
- Keyword Precision: Use the Keyword engine for unambiguous commands (like `/image`) and the Embedding engine for conversational intent.
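Putting these guidelines together, a well-scoped group might look like the following. The first two example phrases come from earlier in this guide; the rest are illustrative.

```yaml
groups:
  - name: "coding"
    intent_description: "Requests involving writing, debugging, or explaining software code and architecture."
    examples:
      - "how do I fix a null pointer exception?"
      - "write a binary search in python"
      - "why does my unit test fail intermittently?"
      - "refactor this function to be easier to read"
      - "explain the difference between a process and a thread"
      - "design a database schema for a bookmarking app"
```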
## Extending System Defaults
Octo Router comes with pre-tuned intent examples for common categories like coding and fast-chat. You can inherit these examples to save time while still defining your own providers and capabilities.
### Global Toggle: `extend_default`
Set `extend_default: true` at the policy level to enable the inheritance mechanism.
### Group Toggle: `use_system_default`
Set `use_system_default: true` within a specific group to pull in built-in examples.
### Example: Advanced Coding Setup

In this example, we inherit the system's "coding" examples (which include various programming queries) but restrict the providers to OpenAI and Anthropic.
```yaml
routing:
  policies:
    semantic:
      enabled: true
      engine: "embedding"
      extend_default: true   # Enable the extension system
      groups:
        - name: "coding"
          use_system_default: true   # Inherit pre-tuned coding examples
          allow_providers: ["openai", "anthropic"]
          required_capability: "coding"
```

### Built-in Groups
The following groups are supported for inheritance:

- `coding`: Various programming and software queries (inherited in the example above).
- `fast-chat`: Greetings, basic math, and general knowledge.
## Technical Configuration
For fine-grained control over the Embedding Engine, you can tune the following technical parameters:
| Option | Default | Description |
|---|---|---|
| `threshold` | `0.5` | The minimum cosine similarity score (0.0 to 1.0) required to trigger a match. Higher values require the prompt to be closer to your examples. |
| `default_group` | N/A | The group to use if no other group meets the threshold. This acts as a global fallback for semantic routing. |
| `model_path` | N/A | Absolute or relative path to the local ONNX embedding model (e.g., MiniLM). |
| `shared_lib_path` | N/A | Optional. Path to the ONNX Runtime library (`.so`, `.dylib`, or `.dll`). Useful if the library is not in your system's standard search path. |
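Put together, a tuned embedding configuration might look like the sketch below. The paths reuse values from examples elsewhere in this guide; adjust them to your installation.

```yaml
routing:
  policies:
    semantic:
      enabled: true
      engine: "embedding"
      threshold: 0.5                # minimum cosine similarity for a match
      default_group: "fast-chat"    # fallback when no group clears the threshold
      model_path: "assets/models/embedding.onnx"
      shared_lib_path: "/usr/local/lib/libonnxruntime.so"   # optional: explicit runtime path
      extend_default: true          # allow groups to inherit built-in examples
      groups:
        - name: "coding"
          use_system_default: true
        - name: "fast-chat"
          use_system_default: true
```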
### Tuning the Threshold
- Precision (0.7+): Only route when you are very certain of the intent. Leads to more "default group" fallbacks.
- Recall (0.3 - 0.5): More aggressive routing. Useful if your intent groups are very distinct.
## Troubleshooting
If using the Embedding engine, ensure your `docker-compose.yml` has the correct `ONNX_LIB_PATH` set (e.g., `/usr/local/lib/libonnxruntime.so`). For the Keyword engine, no extra setup is required.
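As a sketch, a compose service might wire this up as follows. The service name, image tag, and volume mount are placeholders; only the `ONNX_LIB_PATH` variable and library path come from this guide.

```yaml
services:
  octo-router:                  # placeholder service name
    image: octo-router:latest   # placeholder image tag
    environment:
      # Tell the router where to find the ONNX Runtime shared library.
      ONNX_LIB_PATH: /usr/local/lib/libonnxruntime.so
    volumes:
      # Assumed layout: mount the embedding model referenced by model_path.
      - ./assets/models:/app/assets/models
```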