    Available Models

    Subscriptions

    Always-On, LoRA, and Embeddings models are included in every subscription.

    Always-On Models

    These models are included in all subscriptions. Per-token pricing is also available with usage-based billing.

    Model Details

    More information about each model is available via the /openai/v1/models endpoint.
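As a rough sketch of querying that endpoint (the base URL below is a placeholder and the Bearer-token auth scheme is an assumption; confirm both against the Getting Started guide), you could list models like this:

```python
import os
from urllib.request import Request

# Placeholder values -- substitute your actual API base URL and key.
BASE_URL = os.environ.get("SYNTHETIC_BASE_URL", "https://api.example.com")
API_KEY = os.environ.get("SYNTHETIC_API_KEY", "your-api-key")

def build_models_request(base_url: str, api_key: str) -> Request:
    """Build a GET request for the /openai/v1/models listing endpoint."""
    return Request(
        f"{base_url}/openai/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = build_models_request(BASE_URL, API_KEY)
# To actually send it (requires a valid key):
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       for model in json.load(resp)["data"]:
#           print(model["id"])
```

The request is built separately from being sent so you can inspect the URL and headers before spending a real API call.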

    Model                                                Provider      Context length   Status
    hf:MiniMaxAI/MiniMax-M2.5                            Synthetic     187k tokens      ✓ Included
    hf:moonshotai/Kimi-K2.5                              Synthetic     256k tokens      ✓ Included
    hf:nvidia/Kimi-K2.5-NVFP4                            Synthetic     256k tokens      ✓ Included
    hf:nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4    Synthetic     256k tokens      ✓ Included
    hf:zai-org/GLM-4.7                                   Synthetic     198k tokens      ✓ Included
    hf:zai-org/GLM-4.7-Flash                             Synthetic     192k tokens      ✓ Included
    hf:zai-org/GLM-5                                     Synthetic     192k tokens      ✓ Included
    hf:deepseek-ai/DeepSeek-V3.2                         Fireworks     159k tokens      ✓ Included
    hf:MiniMaxAI/MiniMax-M2.1                            Fireworks     192k tokens      ✓ Included
    hf:moonshotai/Kimi-K2-Instruct-0905                  Fireworks     256k tokens      ✓ Included
    hf:moonshotai/Kimi-K2-Thinking                       Fireworks     256k tokens      ✓ Included
    hf:openai/gpt-oss-120b                               Fireworks     128k tokens      ✓ Included
    hf:deepseek-ai/DeepSeek-R1-0528                      Together AI   128k tokens      ✓ Included
    hf:deepseek-ai/DeepSeek-V3                           Together AI   128k tokens      ✓ Included
    hf:meta-llama/Llama-3.3-70B-Instruct                 Together AI   128k tokens      ✓ Included
    hf:Qwen/Qwen3-235B-A22B-Thinking-2507                Together AI   256k tokens      ✓ Included
    hf:Qwen/Qwen3-Coder-480B-A35B-Instruct               Together AI   256k tokens      ✓ Included
    hf:Qwen/Qwen3.5-397B-A17B                            Together AI   256k tokens      ✓ Included

    LoRA Models

    What's a LoRA?

    Low-rank adapters — called "LoRAs" — are small, efficient fine-tunes that run on top of existing models. They can modify a model to be much more effective at specific tasks.
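    As a minimal illustration of the idea (a toy sketch of the standard LoRA technique, not this service's implementation), a LoRA leaves the base weight W frozen and trains two small matrices A and B, computing W·x + B(A·x). Because A and B have a small rank, the adapter trains and stores far fewer parameters than a full fine-tune:

```python
import numpy as np

# Toy dimensions: a 4096x4096 layer with a rank-8 adapter.
d_in, d_out, rank = 4096, 4096, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in)) * 0.01  # frozen base weight (stand-in values)
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable "down" projection
B = np.zeros((d_out, rank))                    # "up" projection, zero-initialized so
                                               # the adapter starts as a no-op

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)  # adapted forward pass

full_params = d_in * d_out              # parameters in a full fine-tune of this layer
adapter_params = rank * (d_in + d_out)  # parameters the LoRA actually trains
print(f"LoRA trains {adapter_params:,} params vs {full_params:,} for a full fine-tune")
# -> LoRA trains 65,536 params vs 16,777,216 for a full fine-tune
```

    The zero-initialized B matrix is why a freshly attached adapter leaves the base model's behavior unchanged until training moves it away from zero.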

    We support LoRAs for the following base models:

    Model                                     Provider      Context length   Status
    meta-llama/Llama-3.2-1B-Instruct          Together AI   128k tokens      ✓ Included
    meta-llama/Llama-3.2-3B-Instruct          Together AI   128k tokens      ✓ Included
    meta-llama/Meta-Llama-3.1-8B-Instruct     Together AI   128k tokens      ✓ Included
    meta-llama/Meta-Llama-3.1-70B-Instruct    Together AI   128k tokens      ✓ Included

    Embedding Models

    Embedding models convert text into numerical vectors for search, clustering, and other applications.

    There's no additional charge for using embeddings, and embeddings requests don't count against your subscription rate limit.

    Model                                Provider    Context length   Status
    hf:nomic-ai/nomic-embed-text-v1.5    Fireworks   8k tokens        ✓ Included
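
    A sketch of an embeddings request body, assuming the /embeddings endpoint accepts the standard OpenAI-style payload shape (the model name comes from the table above; base URL and auth are omitted here):

```python
import json

def build_embeddings_payload(texts: list[str]) -> dict:
    """Build an OpenAI-style /embeddings request body."""
    return {
        "model": "hf:nomic-ai/nomic-embed-text-v1.5",
        "input": texts,
    }

payload = build_embeddings_payload(["first document", "second document"])
body = json.dumps(payload)  # POST this JSON to the /embeddings endpoint
```

    Since embeddings requests don't count against the subscription rate limit, batching documents into a single `input` list is mainly a latency optimization rather than a quota concern.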

    Getting Started

    Ready to start using our models? Check out:

    • Getting Started Guide - Your first API call
    • chat/completions - Most popular endpoint for conversations
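
    To make that first call concrete, here is a minimal chat/completions request body, assuming the standard OpenAI chat payload shape and using an Always-On model from the table above (base URL and auth are covered in the Getting Started guide):

```python
import json

def build_chat_payload(user_message: str) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": "hf:meta-llama/Llama-3.3-70B-Instruct",
        "messages": [
            {"role": "user", "content": user_message},
        ],
    }

payload = build_chat_payload("Hello! Summarize what a LoRA is in one sentence.")
body = json.dumps(payload)  # POST this JSON to the /chat/completions endpoint
```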

    Need help choosing the right model? Join our Discord community for recommendations!