How to add fallback models
Use multiple deployments for one model to improve availability and manage provider limits.
Use fallback models to give one custom model multiple deployments. This improves availability when one provider endpoint, region, or API key reaches a limit or becomes unavailable.
How fallback models work
A fallback model is another deployment for the same model. You add it in Configure deployment by clicking Add another deployment.
Use fallback models when you want the selected model to keep answering through another endpoint, region, or API key. The model stays the same, but Odeus has another deployment to try if one deployment reaches its limit or becomes unavailable.
Each deployment has its own API Key, Model Name / ID, optional Tokens per minute limit, and enabled toggle. Odeus routes requests across the enabled deployments for that model.
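As a rough mental model, the per-deployment settings can be pictured as a small record. The sketch below is illustrative only: the `Deployment` class, its field names, and the placeholder keys and model IDs are assumptions made for this example, not part of any Odeus API or configuration format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical mirror of the fields shown in Configure deployment."""
    api_key: str                             # API Key
    model_name: str                          # Model Name / ID
    tokens_per_minute: Optional[int] = None  # optional limit; None means not set
    enabled: bool = True                     # the enabled toggle


def enabled_deployments(deployments: List[Deployment]) -> List[Deployment]:
    """Return only the deployments that are eligible to receive requests."""
    return [d for d in deployments if d.enabled]


# Three deployments of the same model; the values are placeholders.
deployments = [
    Deployment(api_key="KEY_A", model_name="example-model-east", tokens_per_minute=60_000),
    Deployment(api_key="KEY_B", model_name="example-model-west", tokens_per_minute=30_000),
    Deployment(api_key="KEY_C", model_name="example-model-backup", enabled=False),
]

eligible = enabled_deployments(deployments)
```

Here `eligible` holds the first two deployments only, because the third one has its toggle switched off.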
How to add fallback models
For the full setup flow, see How to add your own models.
- Go to Workspace Settings -> Models and click Add custom model.
- Select the model.
- Complete the model configuration until you reach Configure deployment.
- Click Add another deployment.
<img src="https://mintcdn.com/odeus-34/QuWFEjiJvL67287m/images/fallback_model_setup.png?fit=max&auto=format&n=QuWFEjiJvL67287m&q=85&s=a1f15ae765e00173151b1d3c94a24ccc" alt="Fallback model deployment setup with multiple deployments configured" style={{borderRadius: '6px'}} width="1616" height="1008" data-path="images/fallback_model_setup.png" />
- Add the API Key and Model Name / ID for the fallback deployment.
<img src="https://mintcdn.com/odeus-34/QuWFEjiJvL67287m/images/fallback_deployment2.png?fit=max&auto=format&n=QuWFEjiJvL67287m&q=85&s=91f1f3afceb048bc4fe9fb78c76989a6" alt="Fallback deployment configuration fields for API key and model name" style={{borderRadius: '6px'}} width="1724" height="1272" data-path="images/fallback_deployment2.png" />
- Optional: set a Tokens per minute limit for the fallback deployment.
- Click Test & continue, then click Save model after the test passes.
How requests are routed
Odeus routes requests across enabled deployments for the selected model. Disabled deployments are not used.
- Set a Tokens per minute limit on every deployment when deployments have different capacity. Deployments with higher limits receive more traffic.
- Leave Tokens per minute limit empty when deployments should share traffic evenly. Odeus rotates requests across available deployments.
- If a deployment reaches its limit or becomes unavailable, Odeus tries another enabled deployment.
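The routing rules above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Odeus's actual routing algorithm, which this page does not specify: it assumes weighted random selection proportional to the Tokens per minute limit when every enabled deployment has one, uniform random selection otherwise, and a simple try-the-next-one loop on failure. The names `Deployment`, `routing_order`, and `send_with_fallback` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Deployment:
    name: str
    tokens_per_minute: Optional[int] = None  # None means no limit is set
    enabled: bool = True


def routing_order(deployments: List[Deployment], rng: random.Random) -> List[Deployment]:
    """Order the enabled deployments for one request.

    If every enabled deployment has a limit, higher limits are more likely to
    come first (assumed weighting). Otherwise the order is uniformly random.
    """
    pool = [d for d in deployments if d.enabled]  # disabled deployments are never used
    use_weights = bool(pool) and all(d.tokens_per_minute for d in pool)
    order = []
    while pool:
        weights = [d.tokens_per_minute for d in pool] if use_weights else None
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        order.append(pool.pop(index))
    return order


def send_with_fallback(
    deployments: List[Deployment],
    call: Callable[[Deployment], str],
    rng: Optional[random.Random] = None,
) -> str:
    """Try deployments in routing order until one of them answers."""
    rng = rng or random.Random()
    last_error = None
    for deployment in routing_order(deployments, rng):
        try:
            return call(deployment)
        except Exception as error:  # e.g. limit reached or endpoint unavailable
            last_error = error
    raise RuntimeError("no enabled deployment could serve the request") from last_error


# Usage: the primary deployment is down, so the request lands on the fallback.
deployments = [
    Deployment("primary", tokens_per_minute=60_000),
    Deployment("fallback", tokens_per_minute=30_000),
    Deployment("disabled", tokens_per_minute=90_000, enabled=False),
]


def fake_call(deployment: Deployment) -> str:
    if deployment.name == "primary":
        raise TimeoutError("primary is unavailable")
    return f"answered by {deployment.name}"


result = send_with_fallback(deployments, fake_call, random.Random(0))
```

In this example `result` is `"answered by fallback"` regardless of which deployment is tried first: the primary always fails, and the disabled deployment is never considered even though it has the highest limit.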