
Introduction

Overview of the Octo Router LLM Gateway.

Welcome to the Octo Router documentation. Octo Router is a high-performance LLM gateway designed to optimize cost, latency, and resilience for production-grade AI applications.

What is Octo Router?

Octo Router acts as a middleware layer between your application and various LLM providers (OpenAI, Anthropic, Google Gemini). Instead of hardcoding model names, your application sends requests to Octo Router, which selects the best provider for each request based on your policies.
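For example, if the gateway exposes an OpenAI-compatible endpoint (an assumption here; the base URL, port, API key, and model alias below are placeholders), switching an existing application over can be as simple as changing the client's base URL:

```python
# Minimal sketch: point an existing OpenAI SDK client at Octo Router instead of
# the provider directly. The base URL, key, and model alias are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical Octo Router address
    api_key="octo-router-key",            # hypothetical gateway-issued key
)

# The model name can be a logical alias; the router resolves it to a concrete
# provider and model according to the routing policies you configure.
response = client.chat.completions.create(
    model="default",  # hypothetical alias resolved by the router
    messages=[{"role": "user", "content": "Summarize our Q3 sales report."}],
)
print(response.choices[0].message.content)
```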

Key Features

  • Semantic Routing: Classify user intent using local ONNX embeddings without external API calls.
  • Granular Cost Management: Set strict budgets per provider and let the router automatically skip overspent endpoints to protect your margins.
  • Redis-Backed State: Share usage tracking, rate limits, and budgets across multiple router instances.
  • Zero-Downtime Reload: Update your configuration via API without dropping connections.
  • Resilience: Circuit breakers open for failing providers, and traffic automatically falls back to healthy ones.
  • Fallback Configuration: Define which providers handle requests when the primary is unavailable (see the sketch after this list).
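
To make the budget, fallback, and reload features concrete, here is a rough sketch of a policy update. The policy fields and the /admin/config endpoint are illustrative assumptions, not Octo Router's documented API; see the Guided Setup for the actual configuration format.

```python
# Illustrative sketch only: the policy schema and the /admin/config endpoint are
# assumptions for demonstration, not Octo Router's documented API.
import requests

policy = {
    "routes": {
        "default": {
            "primary": "openai/gpt-4o",  # hypothetical provider/model identifier
            "fallbacks": ["anthropic/claude-3-5-sonnet", "google/gemini-1.5-pro"],
        }
    },
    "budgets": {
        "openai": {"monthly_usd": 500},     # router skips this provider once the budget is spent
        "anthropic": {"monthly_usd": 300},
    },
}

# Hypothetical zero-downtime reload: push the new policy to a running router
# instance without restarting it or dropping in-flight requests.
resp = requests.post("http://localhost:8080/admin/config", json=policy, timeout=10)
resp.raise_for_status()
print("Configuration reloaded:", resp.json())
```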

Getting Started

Ready to optimize your LLM stack? Follow the Guided Setup to get up and running in minutes.
