AI Model Library

Browse our curated collection of AI models to meet your various needs. From text generation to image creation, a one-stop solution.

Found 142 models
Free
De
DeepSeek: DeepSeek V3 0324 (free)
ID:deepseek/deepseek-chat-v3-0324:free

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well on a variety of tasks.

DeepSeek
64K
De
DeepSeek: DeepSeek V3 0324
ID:deepseek/deepseek-chat-v3-0324

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well on a variety of tasks.

DeepSeek
64K
$0.27/M
$1.1/M
Free
Go
Google: Gemma 3 27B (free)
ID:google/gemma-3-27b-it:free

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it)

Google
128K
Go
Google: Gemma 3 27B
ID:google/gemma-3-27b-it

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it)

Google
128K
$0.3/M
$0.5/M
Op
dall-e-3
ID:dall-e-3

dall-e-3

OpenAI
$0.001/request
To
black-forest-labs/FLUX.1-redux
ID:black-forest-labs/flux.1-redux

FLUX.1 Redux [dev] is an adapter for all FLUX.1 base models for image variation generation. Given an input image, FLUX.1 Redux can reproduce the image with slight variation, allowing to refine a given image. It naturally integrates into more complex workflows unlocking image restyling.

Together
$0.025/request
Op
tts-1-hd
ID:tts-1-hd

No description available

OpenAI
$300/M
Free
To
black-forest-labs/FLUX.1-schnell-Free
ID:black-forest-labs/flux.1-schnell-free

`FLUX.1 [schnell]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/). # Key Features 1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. 2. Trained using latent adversarial diffusion distillation, `FLUX.1 [schnell]` can generate high-quality images in only 1 to 4 steps. 3. Released under the `apache-2.0` licence, the model can be used for personal, scientific, and commercial purposes. # Usage We provide a reference implementation of `FLUX.1 [schnell]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux). Developers and creatives looking to build on top of `FLUX.1 [schnell]` are encouraged to use this as a starting point. ## API Endpoints The FLUX.1 models are also available via API from the following sources - [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`) - [replicate.com](https://replicate.com/collections/flux) - [fal.ai](https://fal.ai/models/fal-ai/flux/schnell) - [mystic.ai](https://www.mystic.ai/black-forest-labs/flux1-schnell) ## ComfyUI `FLUX.1 [schnell]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow. ## Diffusers To use `FLUX.1 [schnell]` with the 🧨 diffusers python library, first install or upgrade diffusers ```shell pip install -U diffusers ``` Then you can use `FluxPipeline` to run the model ```python import torch from diffusers import FluxPipeline pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16) pipe.enable_model_cpu_offload() #save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power prompt = "A cat holding a sign that says hello world" image = pipe( prompt, guidance_scale=0.0, num_inference_steps=4, max_sequence_length=256, generator=torch.Generator("cpu").manual_seed(0) ).images[0] image.save("flux-schnell.png") ``` To learn more check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation --- # Limitations - This model is not intended or able to provide factual information. - As a statistical model this checkpoint might amplify existing societal biases. - The model may fail to generate output that matches the prompts. - Prompt following is heavily influenced by the prompting-style. # Out-of-Scope Use The model and its derivatives may not be used - In any way that violates any applicable national, federal, state, local or international law or regulation. - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content. - To generate or disseminate verifiably false information and/or content with the purpose of harming others. - To generate or disseminate personal identifiable information that can be used to harm an individual. - To harass, abuse, threaten, stalk, or bully individuals or groups of individuals. - To create non-consensual nudity or illegal pornographic content. - For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation. - Generating or facilitating large-scale disinformation campaigns.

Together
Go
Google: Gemini 2.0 Flash Lite
ID:google/gemini-2.0-flash-lite-001

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5), all at extremely economical token prices.

Google
1M
$0.075/M
$0.3/M
Qw
Qwen: Qwen2.5 VL 72B Instruct
ID:qwen/qwen2.5-vl-72b-instruct

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

Qwen
131K
$0.7/M
$0.7/M
Me
Meta: Llama 3 70B (Base)
ID:meta-llama/llama-3-70b

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 70B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Meta Llama
8K
$0.59/M
$0.79/M
Me
Meta: Llama 3 8B (Base)
ID:meta-llama/llama-3-8b

Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This is the base 8B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Meta Llama
8K
$0.05/M
$0.08/M
An
Anthropic: Claude 3.7 Sonnet
ID:anthropic/claude-3.7-sonnet

Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. It introduces a hybrid reasoning approach, allowing users to choose between rapid responses and extended, step-by-step processing for complex tasks. The model demonstrates notable improvements in coding, particularly in front-end development and full-stack updates, and excels in agentic workflows, where it can autonomously navigate multi-step processes. Claude 3.7 Sonnet maintains performance parity with its predecessor in standard mode while offering an extended reasoning mode for enhanced accuracy in math, coding, and instruction-following tasks. Read more at the [blog post here](https://www.anthropic.com/news/claude-3-7-sonnet)

Anthropic
200K
$3/M
$15/M
Pe
Perplexity: R1 1776
ID:perplexity/r1-1776

Note: As this model does not return <think> tags, thoughts will be streamed by default directly to the `content` field. R1 1776 is a version of DeepSeek-R1 that has been post-trained to remove censorship constraints related to topics restricted by the Chinese government. The model retains its original reasoning capabilities while providing direct responses to a wider range of queries. R1 1776 is an offline chat model that does not use the perplexity search subsystem. The model was tested on a multilingual dataset of over 1,000 examples covering sensitive topics to measure its likelihood of refusal or overly filtered responses. [Evaluation Results](https://cdn-uploads.huggingface.co/production/uploads/675c8332d01f593dc90817f5/GiN2VqC5hawUgAGJ6oHla.png) Its performance on math and reasoning benchmarks remains similar to the base R1 model. [Reasoning Performance](https://cdn-uploads.huggingface.co/production/uploads/675c8332d01f593dc90817f5/n4Z9Byqp2S7sKUvCvI40R.png) Read more on the [Blog Post](https://perplexity.ai/hub/blog/open-sourcing-r1-1776)

Perplexity
128K
$2/M
$8/M
20%
Op
OpenAI: o3 Mini High
ID:openai/o3-mini-high

OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini) with reasoning_effort set to high. o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities. The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.

OpenAI
200K
$1.1/M
$4.4/M
20%
De
DeepSeek: R1
ID:deepseek/deepseek-r1

# DeepSeek-R1 ## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the [Usage Recommendation](#usage-recommendations) section.** <p align="center"> <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-R1/raw/main/figures/benchmark.jpg"> </p> ## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. ## 3. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1. <div align="center"> | Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 | </div> ### Distilled Model Evaluation <div align="center"> | Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 | </div> ## 4. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com), and switch on the button "DeepThink" We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/) ## 5. How to Run Locally ### DeepSeek-R1 Models Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm): ```shell vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager ``` You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang) ```bash python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2 ``` ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}." 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting "\<think\>\n\n\</think\>") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "\<think\>\n" at the beginning of every output.**

DeepSeek
164K
$3/M
$8/M
Go
Google: Gemini Flash 2.0
ID:google/gemini-2.0-flash-001

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

Google
1M
$0.1/M
$0.4/M
De
DeepSeek: DeepSeek R1 Distill Llama 70B
ID:deepseek/deepseek-r1-distill-llama-70b

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 70.0 - MATH-500 pass@1: 94.5 - CodeForces Rating: 1633 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

DeepSeek
131K
$0.23/M
$0.69/M
Ri
Sao10K: Llama 3 8B Lunaris
ID:sao10k/l3-lunaris-8b

Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge. Created by [Sao10k](https://huggingface.co/Sao10k), this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning. For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1.

Rifx.Online
8K
$0.03/M
$0.06/M
Ri
Inflatebot: Mag Mell R1 12B
ID:inflatebot/mn-mag-mell-r1

Mag Mell is a merge of pre-trained language models created using mergekit, based on [Mistral Nemo](/mistralai/mistral-nemo). It is a great roleplay and storytelling model which combines the best parts of many other models to be a general purpose solution for many usecases. Intended to be a general purpose "Best of Nemo" model for any fictional, creative use case. Mag Mell is composed of 3 intermediate parts: - Hero (RP, trope coverage) - Monk (Intelligence, groundedness) - Deity (Prose, flair)

Rifx.Online
16K
$0.9/M
$0.9/M
Me
Meta: Llama 3.3 70B Instruct
ID:meta-llama/llama-3.3-70b-instruct

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)

Meta Llama
131K
$0.13/M
$0.4/M
Op
text-embedding-3-small
ID:text-embedding-3-small

text-embedding-3-small is OpenAI's cost-effective text embedding model, serving as the lightweight version in the text-embedding-3 series. This model maintains good performance while offering a more economical pricing option. ## Key Features - **Cost-Effective**: Approximately 1/6 the price of text-embedding-3-large - **Multilingual Support**: Supports embeddings for 100+ languages - **Context Length**: Handles up to 8192 tokens of input - **Embedding Dimension**: Fixed 1536-dimensional embedding vectors ## Performance - Outperforms the older text-embedding-ada-002 on most tasks - Slightly lower performance than text-embedding-3-large, but sufficient for most applications ## Use Cases 1. Text Similarity Matching 2. Information Retrieval 3. Text Classification 4. Small-scale RAG Applications 5. Personal Projects or Budget-conscious Commercial Applications ## Pricing - Input: $0.00002 / 1K tokens - Output: Free ## Usage Recommendations - Ideal for projects with limited budgets requiring quality embeddings - Recommended for testing and development - Consider upgrading to text-embedding-3-large if highest embedding quality is required text-embedding-3-small offers an excellent balance between performance and cost, making it particularly suitable for startups and budget-sensitive applications.

OpenAI
$0.02/M
Am
Amazon: Nova Lite 1.0
ID:amazon/nova-lite-v1

Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy. With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input.

Amazon
300K
$0.06/M
$0.24/M
Un
Toppy M 7B
ID:undi95/toppy-m-7b

A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit. List of merged models: - NousResearch/Nous-Capybara-7B-V1.9 - [HuggingFaceH4/zephyr-7b-beta](/huggingfaceh4/zephyr-7b-beta) - lemonilia/AshhLimaRP-Mistral-7B - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b - Undi95/Mistral-pippa-sharegpt-7b-qlora #merge #uncensored

Undi95
4K
$0.07/M
$0.07/M
Un
ReMM SLERP 13B
ID:undi95/remm-slerp-l2-13b

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge

Undi95
4K
$1.125/M
$1.125/M
Mi
Mistral: Pixtral 12B
ID:mistralai/pixtral-12b

The first image to text model from Mistral AI. Its weight was launched via torrent per their tradition: https://x.com/mistralai/status/1833758285167722836

MistralAI
4K
$0.1/M
$0.1/M
Mi
Phi-3.5 Mini 128K Instruct
ID:microsoft/phi-3.5-mini-128k-instruct

Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as [Phi-3 Mini](/microsoft/phi-3-mini-128k-instruct). The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.

Microsoft Azure
128K
$0.1/M
$0.1/M
Op
OpenAI: ChatGPT-4o
ID:openai/chatgpt-4o-latest

Dynamic model continuously updated to the current version of [GPT-4o](/openai/gpt-4o) in ChatGPT. Intended for research and evaluation. Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.

OpenAI
128K
$5/M
$15/M
An
Anthropic: Claude 3.5 Sonnet (2024-06-20)
ID:anthropic/claude-3.5-sonnet-20240620

Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at: - Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems) #multimodal

Anthropic
200K
$3/M
$15/M
Me
Meta: Llama 3.2 1B Instruct
ID:meta-llama/llama-3.2-1b-instruct

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance. Supporting eight core languages and fine-tunable for more, Llama 1.3B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
$0.01/M
$0.02/M
Qw
Qwen: QwQ 32B Preview
ID:qwen/qwq-32b-preview

## Introduction **QwQ-32B-Preview** is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising analytical abilities while having several important limitations: 1. **Language Mixing and Code-Switching**: The model may mix languages or switch between them unexpectedly, affecting response clarity. 2. **Recursive Reasoning Loops**: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer. 3. **Safety and Ethical Considerations**: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it. 4. **Performance and Benchmark Limitations**: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding. **Specification**: - Type: Causal Language Models - Training Stage: Pretraining & Post-training - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 32.5B - Number of Paramaters (Non-Embedding): 31.0B - Number of Layers: 64 - Number of Attention Heads (GQA): 40 for Q and 8 for KV - Context Length: Full 32,768 tokens For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwq-32b-preview/). You can also check Qwen2.5 [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/). ## Requirements The code of Qwen2.5 has been in the latest Hugging face `transformers` and we advise you to use the latest version of `transformers`. With `transformers<4.37.0`, you will encounter the following error: ``` KeyError: 'qwen2' ``` ## Quickstart Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents. ```python from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "Qwen/QwQ-32B-Preview" model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype="auto", device_map="auto" ) tokenizer = AutoTokenizer.from_pretrained(model_name) prompt = "How many r in strawberry." messages = [ {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."}, {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) model_inputs = tokenizer([text], return_tensors="pt").to(model.device) generated_ids = model.generate( **model_inputs, max_new_tokens=512 ) generated_ids = [ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) ] response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] ``` ## Citation If you find our work helpful, feel free to give us a cite. ``` @misc{qwq-32b-preview, title = {QwQ: Reflect Deeply on the Boundaries of the Unknown}, url = {https://qwenlm.github.io/blog/qwq-32b-preview/}, author = {Qwen Team}, month = {November}, year = {2024} } @article{qwen2, title={Qwen2 Technical Report}, author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan}, journal={arXiv preprint arXiv:2407.10671}, year={2024} } ```

Qwen
33K
$0.15/M
$0.6/M
Go
Google: Gemini 2.0 Flash Experimental
ID:google/gemini-2.0-flash-exp

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to [Gemini 1.5 Flash](google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini 1.5 Pro](google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

Google
1M
$0.2/M
$0.6/M
Free
Go
Google: Gemini 2.0 Flash Experimental (free)
ID:google/gemini-2.0-flash-exp:free

Gemini 2.0 Flash offers a significantly faster time to first token (TTFT) compared to [Gemini 1.5 Flash](google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini 1.5 Pro](google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.

Google
1M
Free
De
DeepSeek: R1 (free)
ID:deepseek/deepseek-r1:free

# DeepSeek-R1 ## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the [Usage Recommendation](#usage-recommendations) section.** <p align="center"> <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-R1/raw/main/figures/benchmark.jpg"> </p> ## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. ## 3. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1. <div align="center"> | Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 | </div> ### Distilled Model Evaluation <div align="center"> | Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 | </div> ## 4. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com), and switch on the button "DeepThink" We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/) ## 5. How to Run Locally ### DeepSeek-R1 Models Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm): ```shell vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager ``` You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang) ```bash python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2 ``` ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}." 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting "\<think\>\n\n\</think\>") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "\<think\>\n" at the beginning of every output.**

DeepSeek
164K
Free
Me
Meta: Llama 3.3 70B Instruct (free)
ID:meta-llama/llama-3.3-70b-instruct:free

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)

Meta Llama
131K
Free
NV
NVIDIA: Llama 3.1 Nemotron 70B Instruct (free)
ID:nvidia/llama-3.1-nemotron-70b-instruct:free

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

NVIDIA
131K
Free
Qw
Qwen: Qwen2.5 VL 72B Instruct (free)
ID:qwen/qwen2.5-vl-72b-instruct:free

Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

Qwen
131K
Free
Go
Google: Gemini 2.0 Flash Thinking Experimental (free)
ID:google/gemini-2.0-flash-thinking-exp-1219:free

Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning capabilities in its responses than the [base Gemini 2.0 Flash model](/google/gemini-2.0-flash-exp).

Google
40K
Free
So
Rogue Rose 103B v0.2 (free)
ID:sophosympatheia/rogue-rose-103b-v0.2:free

Rogue Rose demonstrates strong capabilities in roleplaying and storytelling applications, potentially surpassing other models in the 103-120B parameter range. While it occasionally exhibits inconsistencies with scene logic, the overall interaction quality represents an advancement in natural language processing for creative applications. It is a 120-layer frankenmerge model combining two custom 70B architectures from November 2023, derived from the [xwin-stellarbright-erp-70b-v2](https://huggingface.co/sophosympatheia/xwin-stellarbright-erp-70b-v2) base.

Sophosympatheia
4K
Free
Go
Google: Gemini Pro 2.0 Experimental (free)
ID:google/gemini-2.0-pro-exp-02-05:free

Gemini 2.0 Pro Experimental is a bleeding-edge version of the Gemini 2.0 Pro model. Because it's currently experimental, it will be **heavily rate-limited** by Google. Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

Google
2M
Free
Go
Google: Gemini Flash Lite 2.0 Preview (free)
ID:google/gemini-2.0-flash-lite-preview-02-05:free

Gemini Flash Lite 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](google/gemini-pro-1.5). Because it's currently in preview, it will be **heavily rate-limited** by Google. This model will move from free to paid pending a general rollout on February 24th, at $0.075 / $0.30 per million input / ouput tokens respectively.

Google
1M
Free
De
DeepSeek: R1 Distill Llama 70B (free)
ID:deepseek/deepseek-r1-distill-llama-70b:free

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 70.0 - MATH-500 pass@1: 94.5 - CodeForces Rating: 1633 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

DeepSeek
131K
Free
Qw
Qwen: Qwen VL Plus (free)
ID:qwen/qwen-vl-plus:free

Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers significant performance across a broad range of visual tasks.

Qwen
8K
De
DeepSeek: R1 Distill Qwen 1.5B
ID:deepseek/deepseek-r1-distill-qwen-1.5b

DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks. Other benchmark results include: - AIME 2024 pass@1: 28.9 - AIME 2024 cons@64: 52.7 - MATH-500 pass@1: 83.9 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

DeepSeek
131K
$0.18/M
$0.18/M
De
DeepSeek: R1 Distill Llama 8B
ID:deepseek/deepseek-r1-distill-llama-8b

DeepSeek R1 Distill Llama 8B is a distilled large language model based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 50.4 - MATH-500 pass@1: 89.1 - CodeForces Rating: 1205 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Hugging Face: - [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) - [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |

DeepSeek
32K
$0.04/M
$0.04/M
De
DeepSeek: R1 Distill Qwen 14B
ID:deepseek/deepseek-r1-distill-qwen-14b

DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 69.7 - MATH-500 pass@1: 93.9 - CodeForces Rating: 1481 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

DeepSeek
64K
$0.15/M
$0.15/M
De
DeepSeek: R1 Distill Qwen 32B
ID:deepseek/deepseek-r1-distill-qwen-32b

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 72.6 - MATH-500 pass@1: 94.3 - CodeForces Rating: 1691 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

DeepSeek
131K
$0.12/M
$0.18/M
De
DeepSeek: R1 (nitro)
ID:deepseek/deepseek-r1:nitro

# DeepSeek-R1 ## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. **NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the [Usage Recommendation](#usage-recommendations) section.** <p align="center"> <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-R1/raw/main/figures/benchmark.jpg"> </p> ## 2. Model Summary --- **Post-Training: Large-Scale Reinforcement Learning on the Base Model** - We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. - We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models. --- **Distillation: Smaller Models Can Be Powerful Too** - We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future. - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. ## 3. Evaluation Results ### DeepSeek-R1-Evaluation For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of $0.6$, a top-p value of $0.95$, and generate 64 responses per query to estimate pass@1. <div align="center"> | Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 | |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------| | | Architecture | - | - | MoE | - | - | MoE | | | # Activated Params | - | - | 37B | - | - | 37B | | | # Total Params | - | - | 671B | - | - | 671B | | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 | | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** | | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** | | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** | | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 | | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 | | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 | | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** | | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** | | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** | | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** | | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 | | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 | | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 | | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 | | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** | | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** | | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** | | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** | | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** | | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 | </div> ### Distilled Model Evaluation <div align="center"> | Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating | |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------| | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 | | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 | | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** | | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 | | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 | | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 | | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 | | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 | | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 | | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 | </div> ## 4. Chat Website & API Platform You can chat with DeepSeek-R1 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com), and switch on the button "DeepThink" We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/) ## 5. How to Run Locally ### DeepSeek-R1 Models Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally. **NOTE: Hugging Face's Transformers has not been directly supported yet.** ### DeepSeek-R1-Distill Models DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models. For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm): ```shell vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager ``` You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang) ```bash python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2 ``` ### Usage Recommendations **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:** 1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs. 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.** 3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}." 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results. Additionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting "\<think\>\n\n\</think\>") when responding to certain queries, which can adversely affect the model's performance. **To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "\<think\>\n" at the beginning of every output.**

DeepSeek
164K
$3/M
$8/M
Ri
MiniMax: MiniMax-01
ID:minimax/minimax-01

MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context of up to 4 million tokens. The text model adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The image model adopts the “ViT-MLP-LLM” framework and is trained on top of the text model. To read more about the release, see: https://www.minimaxi.com/en/news/minimax-01-series-2

Rifx.Online
1M
$0.2/M
$1.1/M
Mi
Microsoft: Phi 4
ID:microsoft/phi-4

[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. For more information, please see [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)

Microsoft Azure
16K
$0.07/M
$0.14/M
30%
Op
OpenAI: o1-preview
ID:rifx/o1-preview

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1). Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.

OpenAI
128K
$15/M
$60/M
40%
Op
OpenAI: o1-mini
ID:rifx/o1-mini

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1). Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.

OpenAI
128K
$3/M
$12/M
De
DeepSeek V3
ID:deepseek/deepseek-chat-v3

## 1. Introduction We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. <p align="center"> <img width="80%" src="https://wsrv.nl/?url=https://huggingface.co/deepseek-ai/DeepSeek-V3/resolve/main/figures/benchmark.png"> </p> ## 2. Model Summary --- **Architecture: Innovative Load Balancing Strategy and Training Objective** - On top of the efficient architecture of DeepSeek-V2, we pioneer an auxiliary-loss-free strategy for load balancing, which minimizes the performance degradation that arises from encouraging load balancing. - We investigate a Multi-Token Prediction (MTP) objective and prove it beneficial to model performance. It can also be used for speculative decoding for inference acceleration. --- **Pre-Training: Towards Ultimate Training Efficiency** - We design an FP8 mixed precision training framework and, for the first time, validate the feasibility and effectiveness of FP8 training on an extremely large-scale model. - Through co-design of algorithms, frameworks, and hardware, we overcome the communication bottleneck in cross-node MoE training, nearly achieving full computation-communication overlap. This significantly enhances our training efficiency and reduces the training costs, enabling us to further scale up the model size without additional overhead. - At an economical cost of only 2.664M H800 GPU hours, we complete the pre-training of DeepSeek-V3 on 14.8T tokens, producing the currently strongest open-source base model. The subsequent training stages after pre-training require only 0.1M GPU hours. --- **Post-Training: Knowledge Distillation from DeepSeek-R1** - We introduce an innovative methodology to distill reasoning capabilities from the long-Chain-of-Thought (CoT) model, specifically from one of the DeepSeek R1 series models, into standard LLMs, particularly DeepSeek-V3. Our pipeline elegantly incorporates the verification and reflection patterns of R1 into DeepSeek-V3 and notably improves its reasoning performance. Meanwhile, we also maintain a control over the output style and length of DeepSeek-V3. --- ## 3. Model Downloads <div align="center"> | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** | | :------------: | :------------: | :------------: | :------------: | :------------: | | DeepSeek-V3-Base | 671B | 37B | 128K | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3-Base) | | DeepSeek-V3 | 671B | 37B | 128K | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3) | </div> **NOTE: The total size of DeepSeek-V3 models on HuggingFace is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.** To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: [How_to Run_Locally](#6-how-to-run-locally). For developers looking to dive deeper, we recommend exploring [README_WEIGHTS.md](./README_WEIGHTS.md) for details on the Main Model weights and the Multi-Token Prediction (MTP) Modules. Please note that MTP support is currently under active development within the community, and we welcome your contributions and feedback. ## 4. Evaluation Results ### Base Model #### Standard Benchmarks <div align="center"> | | Benchmark (Metric) | # Shots | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | DeepSeek-V3 | |---|-------------------|----------|--------|-------------|---------------|---------| | | Architecture | - | MoE | Dense | Dense | MoE | | | # Activated Params | - | 21B | 72B | 405B | 37B | | | # Total Params | - | 236B | 72B | 405B | 671B | | English | Pile-test (BPB) | - | 0.606 | 0.638 | **0.542** | 0.548 | | | BBH (EM) | 3-shot | 78.8 | 79.8 | 82.9 | **87.5** | | | MMLU (Acc.) | 5-shot | 78.4 | 85.0 | 84.4 | **87.1** | | | MMLU-Redux (Acc.) | 5-shot | 75.6 | 83.2 | 81.3 | **86.2** | | | MMLU-Pro (Acc.) | 5-shot | 51.4 | 58.3 | 52.8 | **64.4** | | | DROP (F1) | 3-shot | 80.4 | 80.6 | 86.0 | **89.0** | | | ARC-Easy (Acc.) | 25-shot | 97.6 | 98.4 | 98.4 | **98.9** | | | ARC-Challenge (Acc.) | 25-shot | 92.2 | 94.5 | **95.3** | **95.3** | | | HellaSwag (Acc.) | 10-shot | 87.1 | 84.8 | **89.2** | 88.9 | | | PIQA (Acc.) | 0-shot | 83.9 | 82.6 | **85.9** | 84.7 | | | WinoGrande (Acc.) | 5-shot | **86.3** | 82.3 | 85.2 | 84.9 | | | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | **74.2** | 67.1 | | | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | **56.8** | 51.3 | | | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | **82.7** | **82.9** | | | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | **41.5** | 40.0 | | | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | **79.6** | | Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | **65.2** | | | MBPP (Pass@1) | 3-shot | 65.0 | 72.6 | 68.4 | **75.4** | | | LiveCodeBench-Base (Pass@1) | 3-shot | 11.6 | 12.9 | 15.5 | **19.4** | | | CRUXEval-I (Acc.) | 2-shot | 52.5 | 59.1 | 58.5 | **67.3** | | | CRUXEval-O (Acc.) | 2-shot | 49.8 | 59.9 | 59.9 | **69.8** | | Math | GSM8K (EM) | 8-shot | 81.6 | 88.3 | 83.5 | **89.3** | | | MATH (EM) | 4-shot | 43.4 | 54.4 | 49.0 | **61.6** | | | MGSM (EM) | 8-shot | 63.6 | 76.2 | 69.9 | **79.8** | | | CMath (EM) | 3-shot | 78.7 | 84.5 | 77.3 | **90.7** | | Chinese | CLUEWSC (EM) | 5-shot | 82.0 | 82.5 | **83.0** | 82.7 | | | C-Eval (Acc.) | 5-shot | 81.4 | 89.2 | 72.5 | **90.1** | | | CMMLU (Acc.) | 5-shot | 84.0 | **89.5** | 73.7 | 88.8 | | | CMRC (EM) | 1-shot | **77.4** | 75.8 | 76.0 | 76.3 | | | C3 (Acc.) | 0-shot | 77.4 | 76.7 | **79.7** | 78.6 | | | CCPM (Acc.) | 0-shot | **93.0** | 88.5 | 78.6 | 92.0 | | Multilingual | MMMLU-non-English (Acc.) | 5-shot | 64.0 | 74.8 | 73.8 | **79.4** | </div> Note: Best results are shown in bold. Scores with a gap not exceeding 0.3 are considered to be at the same level. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks. For more evaluation details, please check our paper. #### Context Window <p align="center"> <img width="80%" src="https://wsrv.nl/?url=https://huggingface.co/deepseek-ai/DeepSeek-V3/resolve/main/figures/niah.png"> </p> Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V3 performs well across all context window lengths up to **128K**. ### Chat Model #### Standard Benchmarks (Models larger than 67B) <div align="center"> | | **Benchmark (Metric)** | **DeepSeek V2-0506** | **DeepSeek V2.5-0905** | **Qwen2.5 72B-Inst.** | **Llama3.1 405B-Inst.** | **Claude-3.5-Sonnet-1022** | **GPT-4o 0513** | **DeepSeek V3** | |---|---------------------|---------------------|----------------------|---------------------|----------------------|---------------------------|----------------|----------------| | | Architecture | MoE | MoE | Dense | Dense | - | - | MoE | | | # Activated Params | 21B | 21B | 72B | 405B | - | - | 37B | | | # Total Params | 236B | 236B | 72B | 405B | - | - | 671B | | English | MMLU (EM) | 78.2 | 80.6 | 85.3 | **88.6** | **88.3** | 87.2 | **88.5** | | | MMLU-Redux (EM) | 77.9 | 80.3 | 85.6 | 86.2 | **88.9** | 88.0 | **89.1** | | | MMLU-Pro (EM) | 58.5 | 66.2 | 71.6 | 73.3 | **78.0** | 72.6 | 75.9 | | | DROP (3-shot F1) | 83.0 | 87.8 | 76.7 | 88.7 | 88.3 | 83.7 | **91.6** | | | IF-Eval (Prompt Strict) | 57.7 | 80.6 | 84.1 | 86.0 | **86.5** | 84.3 | 86.1 | | | GPQA-Diamond (Pass@1) | 35.3 | 41.3 | 49.0 | 51.1 | **65.0** | 49.9 | 59.1 | | | SimpleQA (Correct) | 9.0 | 10.2 | 9.1 | 17.1 | 28.4 | **38.2** | 24.9 | | | FRAMES (Acc.) | 66.9 | 65.4 | 69.8 | 70.0 | 72.5 | **80.5** | 73.3 | | | LongBench v2 (Acc.) | 31.6 | 35.4 | 39.4 | 36.1 | 41.0 | 48.1 | **48.7** | | Code | HumanEval-Mul (Pass@1) | 69.3 | 77.4 | 77.3 | 77.2 | 81.7 | 80.5 | **82.6** | | | LiveCodeBench (Pass@1-COT) | 18.8 | 29.2 | 31.1 | 28.4 | 36.3 | 33.4 | **40.5** | | | LiveCodeBench (Pass@1) | 20.3 | 28.4 | 28.7 | 30.1 | 32.8 | 34.2 | **37.6** | | | Codeforces (Percentile) | 17.5 | 35.6 | 24.8 | 25.3 | 20.3 | 23.6 | **51.6** | | | SWE Verified (Resolved) | - | 22.6 | 23.8 | 24.5 | **50.8** | 38.8 | 42.0 | | | Aider-Edit (Acc.) | 60.3 | 71.6 | 65.4 | 63.9 | **84.2** | 72.9 | 79.7 | | | Aider-Polyglot (Acc.) | - | 18.2 | 7.6 | 5.8 | 45.3 | 16.0 | **49.6** | | Math | AIME 2024 (Pass@1) | 4.6 | 16.7 | 23.3 | 23.3 | 16.0 | 9.3 | **39.2** | | | MATH-500 (EM) | 56.3 | 74.7 | 80.0 | 73.8 | 78.3 | 74.6 | **90.2** | | | CNMO 2024 (Pass@1) | 2.8 | 10.8 | 15.9 | 6.8 | 13.1 | 10.8 | **43.2** | | Chinese | CLUEWSC (EM) | 89.9 | 90.4 | **91.4** | 84.7 | 85.4 | 87.9 | 90.9 | | | C-Eval (EM) | 78.6 | 79.5 | 86.1 | 61.5 | 76.7 | 76.0 | **86.5** | | | C-SimpleQA (Correct) | 48.5 | 54.1 | 48.4 | 50.4 | 51.3 | 59.3 | **64.8** | Note: All models are evaluated in a configuration that limits the output length to 8K. Benchmarks containing fewer than 1000 samples are tested multiple times using varying temperature settings to derive robust final results. DeepSeek-V3 stands as the best-performing open-source model, and also exhibits competitive performance against frontier closed-source models. </div> #### Open Ended Generation Evaluation <div align="center"> | Model | Arena-Hard | AlpacaEval 2.0 | |-------|------------|----------------| | DeepSeek-V2.5-0905 | 76.2 | 50.5 | | Qwen2.5-72B-Instruct | 81.2 | 49.1 | | LLaMA-3.1 405B | 69.3 | 40.5 | | GPT-4o-0513 | 80.4 | 51.1 | | Claude-Sonnet-3.5-1022 | 85.2 | 52.0 | | DeepSeek-V3 | **85.5** | **70.0** | Note: English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric. </div> ## 5. Chat Website & API Platform You can chat with DeepSeek-V3 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/sign_in) We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.com](https://platform.deepseek.com/) ## 6. How to Run Locally DeepSeek-V3 can be deployed locally using the following hardware and open-source community software: 1. **DeepSeek-Infer Demo**: We provide a simple and lightweight demo for FP8 and BF16 inference. 2. **SGLang**: Fully support the DeepSeek-V3 model in both BF16 and FP8 inference modes. 3. **LMDeploy**: Enables efficient FP8 and BF16 inference for local and cloud deployment. 4. **TensorRT-LLM**: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon. 5. **vLLM**: Support DeekSeek-V3 model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism. 6. **AMD GPU**: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes. 7. **Huawei Ascend NPU**: Supports running DeepSeek-V3 on Huawei Ascend devices. Since FP8 training is natively adopted in our framework, we only provide FP8 weights. If you require BF16 weights for experimentation, you can use the provided conversion script to perform the transformation. Here is an example of converting FP8 weights to BF16: ```shell cd inference python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights ``` **NOTE: Huggingface's Transformers has not been directly supported yet.** ### 6.1 Inference with DeepSeek-Infer Demo (example only) #### Model Weights & Demo Code Preparation First, clone our DeepSeek-V3 GitHub repository: ```shell git clone https://github.com/deepseek-ai/DeepSeek-V3.git ``` Navigate to the `inference` folder and install dependencies listed in `requirements.txt`. ```shell cd DeepSeek-V3/inference pip install -r requirements.txt ``` Download the model weights from HuggingFace, and put them into `/path/to/DeepSeek-V3` folder. #### Model Weights Conversion Convert HuggingFace model weights to a specific format: ```shell python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16 ``` #### Run Then you can chat with DeepSeek-V3: ```shell torchrun --nnodes 2 --nproc-per-node 8 generate.py --node-rank $RANK --master-addr $ADDR --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200 ``` Or batch inference on a given file: ```shell torchrun --nnodes 2 --nproc-per-node 8 generate.py --node-rank $RANK --master-addr $ADDR --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE ``` ### 6.2 Inference with SGLang (recommended) [SGLang](https://github.com/sgl-project/sglang) currently supports MLA optimizations, FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput performance among open-source frameworks. Notably, [SGLang v0.4.1](https://github.com/sgl-project/sglang/releases/tag/v0.4.1) fully supports running DeepSeek-V3 on both **NVIDIA and AMD GPUs**, making it a highly versatile and robust solution. Here are the launch instructions from the SGLang team: https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3 ### 6.3 Inference with LMDeploy (recommended) [LMDeploy](https://github.com/InternLM/lmdeploy), a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows. For comprehensive step-by-step instructions on running DeepSeek-V3 with LMDeploy, please refer to here: https://github.com/InternLM/lmdeploy/issues/2960 ### 6.4 Inference with TRT-LLM (recommended) [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) now supports the DeepSeek-V3 model, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom branch of TRTLLM specifically for DeepSeek-V3 support through the following link to experience the new features directly: https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3. ### 6.5 Inference with vLLM (recommended) [vLLM](https://github.com/vllm-project/vllm) v0.6.6 supports DeepSeek-V3 inference for FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers _pipeline parallelism_ allowing you to run this model on multiple machines connected by networks. For detailed guidance, please refer to the [vLLM instructions](https://docs.vllm.ai/en/latest/serving/distributed_serving.html). Please feel free to follow [the enhancement plan](https://github.com/vllm-project/vllm/issues/11539) as well. ### 6.6 Recommended Inference Functionality with AMD GPUs In collaboration with the AMD team, we have achieved Day-One support for AMD GPUs using SGLang, with full compatibility for both FP8 and BF16 precision. For detailed guidance, please refer to the [SGLang instructions](#63-inference-with-lmdeploy-recommended). ### 6.7 Recommended Inference Functionality with Huawei Ascend NPUs The [MindIE](https://www.hiascend.com/en/software/mindie) framework from the Huawei Ascend community has successfully adapted the BF16 version of DeepSeek-V3. For step-by-step guidance on Ascend NPUs, please follow the [instructions here](https://modelers.cn/models/MindIE/deepseekv3). ## 7. License This code repository is licensed under [the MIT License](LICENSE-CODE). The use of DeepSeek-V3 Base/Chat models is subject to [the Model License](LICENSE-MODEL). DeepSeek-V3 series (including Base and Chat) supports commercial use. ## 8. Citation ``` ``` ## 9. Contact If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).

DeepSeek
64K
$0.14/M
$0.28/M
Op
OpenAI: o1-mini
ID:openai/o1-mini

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1). Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.

OpenAI
128K
$3/M
$12/M
Op
OpenAI: o1
ID:openai/o1

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).

OpenAI
200K
$15/M
$60/M
Free
Go
Google: Gemini 2.0 Flash Thinking Experimental (free)
ID:google/gemini-2.0-flash-thinking-exp:free

Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning capabilities in its responses than the [base Gemini 2.0 Flash model](/google/gemini-2.0-flash-exp).

Google
40K
50%
Ev
EVA Llama 3.33 70b
ID:eva-unit-01/eva-llama-3.33-70b

EVA Llama 3.33 70b is a roleplay and storywriting specialist model. It is a full-parameter finetune of [Llama-3.3-70B-Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct) on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model This model was built with Llama by Meta.

Eva-unit-01
16K
$4/M
$6/M
X-
xAI: Grok 2 Vision 1212
ID:x-ai/grok-2-vision-1212

Grok 2 Vision 1212 advances image-based AI with stronger visual comprehension, refined instruction-following, and multilingual support. From object recognition to style analysis, it empowers developers to build more intuitive, visually aware applications. Its enhanced steerability and reasoning establish a robust foundation for next-generation image solutions. To read more about this model, check out [xAI's announcement](https://x.ai/blog/grok-1212).

X-AI
33K
$2/M
$10/M
Ri
Sao10K: Llama 3.3 Euryale 70B
ID:sao10k/l3.3-euryale-70b

Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).

Rifx.Online
8K
$1.5/M
$1.5/M
Op
text-embedding-3-large
ID:text-embedding-3-large

text-embedding-3-large is OpenAI's latest text embedding model released in 2024. Compared to its predecessors, it offers several significant improvements: ## Key Features - **Enhanced Performance**: Outperforms the previous text-embedding-ada-002 model on most tasks - **Better Multilingual Support**: Supports embeddings for 100+ languages - **Longer Context**: Handles up to 8192 tokens of input - **Dimension Selection**: Allows embedding dimensions between 256 and 3072 ## Use Cases 1. Semantic Search 2. Text Classification 3. Recommendation Systems 4. Similarity Matching 5. RAG (Retrieval Augmented Generation) Applications ## Pricing - Input: $0.00013 / 1K tokens - Output: Free ## Usage Recommendations - Recommended for production environments requiring high-quality embeddings - Consider text-embedding-3-small for cost-sensitive applications - Choose embedding dimensions based on your specific needs to balance performance and efficiency text-embedding-3-large is one of the most powerful text embedding models available today, particularly suitable for applications requiring high-quality text representations.

OpenAI
$0.13/M
An
Magnum v4 72B
ID:anthracite-org/magnum-v4-72b

This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet(https://openrouter.ai/anthropic/claude-3.5-sonnet) and Opus(https://openrouter.ai/anthropic/claude-3-opus). The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-2.5-72b-instruct).

Anthracite-org
33K
$1.875/M
$2.25/M
Ba
baichuan3-turbo
ID:baichuan/baichuan3-turbo

Baichuan3-Turbo is an advanced artificial intelligence language model designed to provide users with efficient and intelligent natural language processing solutions. Leveraging the latest deep learning technologies, this model boasts powerful text generation and comprehension capabilities, making it suitable for a wide range of applications including conversational systems, content creation, and information retrieval. ### Key Features: 1. **Efficiency**: Baichuan3-Turbo employs optimized algorithms that significantly enhance processing speed, allowing for quick responses to user queries. 2. **Diversity**: The model supports multiple languages and dialects, catering to the needs of users from various regions and increasing its versatility across different scenarios. 3. **Contextual Understanding**: By deeply learning contextual information, Baichuan3-Turbo excels at understanding user intent, generating more relevant and natural responses. 4. **Customizability**: Users can fine-tune the model according to specific requirements, ensuring alignment with industry standards or particular task demands. 5. **Safety and Compliance**: The design prioritizes data privacy and compliance with regulations, ensuring user information security while adhering to relevant laws. ### Application Scenarios: - **Customer Service**: Automating responses to customer inquiries to improve service efficiency. - **Content Generation**: Assisting creators in generating high-quality articles, reports, or social media content. - **Educational Support**: Providing personalized learning suggestions and answering student questions. - **Data Analysis**: Extracting key insights from large volumes of text for in-depth analysis. In summary, Baichuan3-Turbo delivers exceptional performance and flexible application capabilities, offering innovative solutions across various industries and serving as a vital tool in advancing the process of intelligent automation.

Baichuan
32K
$1.7/M
$1.7/M
Ba
baichuan4
ID:baichuan/baichuan4

### Baichuan4 Model Introduction Baichuan4 is a state-of-the-art artificial intelligence language model designed to enhance natural language understanding and generation capabilities. Built on cutting-edge deep learning techniques, Baichuan4 is tailored for diverse applications, ranging from conversational AI and content creation to data analysis and customer support. #### Key Features: 1. **Enhanced Performance**: Baichuan4 incorporates advanced algorithms that optimize processing efficiency, delivering faster response times and improved interaction quality. 2. **Multilingual Support**: The model is capable of understanding and generating text in multiple languages, making it accessible to a global audience and suitable for various linguistic contexts. 3. **Deep Contextual Awareness**: With its ability to comprehend nuanced context, Baichuan4 provides more accurate interpretations of user intent, resulting in responses that are not only relevant but also contextually appropriate. 4. **Adaptability**: Users can easily customize the model for specific use cases or industries, allowing for fine-tuning that meets unique operational needs or standards. 5. **Ethical Design**: The development of Baichuan4 emphasizes user privacy and ethical considerations, ensuring compliance with data protection regulations while maintaining high standards of security. #### Application Scenarios: - **Customer Interaction**: Streamlining customer service operations by automating responses and providing instant support. - **Content Creation**: Assisting writers in producing high-quality articles, marketing materials, or social media posts with ease. - **Educational Tools**: Supporting educators and learners with personalized tutoring solutions and resource recommendations. - **Data Mining and Analysis**: Extracting valuable insights from large datasets or unstructured text to facilitate informed decision-making. In conclusion, Baichuan4 stands at the forefront of AI language models, offering powerful capabilities that drive innovation across various fields. Its combination of performance, versatility, and ethical design makes it an essential tool for organizations seeking to leverage AI in their operations.

Baichuan
32K
$14.3/M
$14.3/M
Mo
moonshot-v1-8k
ID:moonshot/moonshot-v1-8k

### Moonshot-v1-8k Model Introduction Moonshot-v1-8k is a large-scale language model developed by Moonshot AI, known for its exceptional natural language processing capabilities. Utilizing advanced deep learning techniques, this model has been trained on a vast corpus of text data, enabling it to understand and generate human-like language, thereby providing users with an efficient and intelligent interactive experience. #### Key Features: 1. **Powerful Semantic Understanding**: Moonshot-v1-8k excels in semantic comprehension, accurately interpreting user inputs and generating appropriate responses. 2. **Instruction Following Ability**: The model effectively follows user instructions, handling everything from simple question answering to complex task execution with ease. 3. **Efficient Text Generation**: With support for up to 8192 tokens in context windows, Moonshot-v1-8k is particularly suited for real-time interactions involving short texts, quickly producing coherent and relevant output. 4. **Versatile Application Scenarios**: The model can be applied across various domains such as chatbots, customer support, content creation, and educational assistance, significantly enhancing its usability. 5. **Easy Integration and Use**: Moonshot-v1-8k offers API access that allows developers to seamlessly integrate the model into various applications, making the implementation of intelligent features more straightforward. #### Application Scenarios: - **Customer Support**: Automating customer inquiries to enhance response speed and service quality. - **Content Creation**: Assisting writers in generating articles, reports, or other forms of textual content. - **Educational Assistance**: Providing personalized learning suggestions and resolving student queries. - **Social Media Management**: Creating engaging social media posts to boost user interaction rates. In summary, Moonshot-v1-8k is a powerful language model with extensive potential applications across multiple fields. Its broad usability makes it an essential tool for advancing artificial intelligence developments. By leveraging this model, users can enhance productivity and achieve higher levels of human-computer interaction.

Moonshot
8K
$1.9/M
$1.9/M
Am
Amazon: Nova Pro 1.0
ID:amazon/nova-pro-v1

Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX). Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and at analyzing financial documents. **NOTE**: Video input and tool calling is not supported at this time.

Amazon
300K
$0.8/M
$3.2/M
Am
Amazon: Nova Micro 1.0
ID:amazon/nova-micro-v1

Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has simple mathematical reasoning and coding abilities.

Amazon
128K
$0.035/M
$0.14/M
Op
GPT-4o mini
ID:gpt-4o-mini

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.

OpenAI
128K
$0.15/M
$0.6/M
40%
Op
gpt-4o
ID:rifx/gpt-4o

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)

OpenAI
128K
$2.5/M
$10/M
40%
Op
GPT-4o mini
ID:rifx/gpt-4o-mini

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.

OpenAI
128K
$0.15/M
$0.6/M
Gr
MythoMax 13B (extended)
ID:gryphe/mythomax-l2-13b:extended

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge _These are extended-context endpoints for [MythoMax 13B](/gryphe/mythomax-l2-13b). They may have higher prices._

Gryphe
8K
$1.125/M
$1.125/M
Free
Gr
MythoMax 13B (free)
ID:gryphe/mythomax-l2-13b:free

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge _These are extended-context endpoints for [MythoMax 13B](/gryphe/mythomax-l2-13b). They may have higher prices._

Gryphe
8K
Un
ReMM SLERP 13B (extended)
ID:undi95/remm-slerp-l2-13b:extended

A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge

Undi95
4K
$1.125/M
$1.125/M
Go
Google: PaLM 2 Code Chat 32k
ID:google/palm-2-codechat-bison-32k

PaLM 2 fine-tuned for chatbot conversations that help with code-related questions.

Google
33K
$1/M
$2/M
01
01.AI: Yi Large
ID:01-ai/yi-large

The Yi Large model was designed by 01.AI with the following usecases in mind: knowledge search, data classification, human-like chat bots, and customer service. It stands out for its multilingual proficiency, particularly in Spanish, Chinese, Japanese, German, and French. Check out the [launch announcement](https://01-ai.github.io/blog/01.ai-yi-large-llm-launch) to learn more.

01-ai
33K
$3/M
$3/M
Mi
Mistral Large 2411
ID:mistralai/mistral-large-2411

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/). It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.

MistralAI
128K
$2/M
$6/M
Mi
Mistral Large 2407
ID:mistralai/mistral-large-2407

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/). It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.

MistralAI
128K
$2/M
$6/M
Mi
Mistral: Pixtral Large 2411
ID:mistralai/pixtral-large-2411

Pixtral Large is a 124B open-weights multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is available under the Mistral Research License (MRL) for research and educational use; and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.

MistralAI
128K
$2/M
$6/M
Pe
Perplexity: Llama 3.1 Sonar 70B
ID:perplexity/llama-3.1-sonar-large-128k-chat

Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/perplexity/llama-3.1-sonar-large-128k-online) of this model has Internet access.

Perplexity
131K
$1/M
$1/M
Pe
Perplexity: Llama 3.1 Sonar 8B
ID:perplexity/llama-3.1-sonar-small-128k-chat

Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. This is a normal offline LLM, but the [online version](/perplexity/llama-3.1-sonar-small-128k-online) of this model has Internet access.

Perplexity
131K
$0.2/M
$0.2/M
Op
OpenChat 3.5 7B
ID:openchat/openchat-7b

OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels. - For OpenChat fine-tuned on Mistral 7B, check out [OpenChat 7B](/openchat/openchat-7b). - For OpenChat fine-tuned on Llama 8B, check out [OpenChat 8B](/openchat/openchat-8b). #open-source

Openchat
8K
$0.055/M
$0.055/M
Op
OpenAI: GPT-3.5 Turbo 16k (older v1106)
ID:openai/gpt-3.5-turbo-1106

An older GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021.

OpenAI
16K
$1/M
$2/M
Free
Un
Toppy M 7B (free)
ID:undi95/toppy-m-7b:free

A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit. List of merged models: - NousResearch/Nous-Capybara-7B-V1.9 - [HuggingFaceH4/zephyr-7b-beta](/huggingfaceh4/zephyr-7b-beta) - lemonilia/AshhLimaRP-Mistral-7B - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b - Undi95/Mistral-pippa-sharegpt-7b-qlora #merge #uncensored

Undi95
4K
Me
Meta: LlamaGuard 2 8B
ID:meta-llama/llama-guard-2-8b

This safeguard model has 8B parameters and is based on the Llama 3 family. Just like is predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification. LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated. For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API. It has demonstrated strong performance compared to leading closed-source models in human evaluations. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
8K
$0.18/M
$0.18/M
Mi
Mixtral 8x7B (base)
ID:mistralai/mixtral-8x7b

A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mixtral 8x7B Instruct](/mistralai/mixtral-8x7b-instruct) for an instruct-tuned model. #moe

MistralAI
33K
$0.54/M
$0.54/M
Mi
Mistral Small
ID:mistralai/mistral-small

Cost-efficient, fast, and reliable option for use cases such as translation, summarization, and sentiment analysis.

MistralAI
32K
$0.2/M
$0.6/M
Mi
Mistral Tiny
ID:mistralai/mistral-tiny

This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than [Mistral 7B](/mistralai/mistral-7b-instruct-v0.1), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.

MistralAI
32K
$0.25/M
$0.25/M
Go
Google: Gemini Pro 1.0
ID:google/gemini-pro

Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

Google
33K
$0.5/M
$1.5/M
Me
Llama 3 Lumimaid 70B
ID:neversleep/llama-3-lumimaid-70b

The NeverSleep team is back, with a Llama 3 70B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary. To enhance it's overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Meta Llama
8K
$3.375/M
$4.5/M
Al
Goliath 120B
ID:alpindale/goliath-120b

A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale. Credits to - [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge the model - [mergekit](https://github.com/cg123/mergekit). - [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios. #merge

Alpindale
6K
$9.375/M
$9.375/M
Go
Google: Gemini Pro Vision 1.0
ID:google/gemini-pro-vision

Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response. See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

Google
16K
$0.5/M
$1.5/M
Mi
WizardLM-2 7B
ID:microsoft/wizardlm-2-7b

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger opensource leading models It is a finetune of [Mistral 7B Instruct](/mistralai/mistral-7b-instruct), using the same technique as [WizardLM-2 8x22B](/microsoft/wizardlm-2-8x22b). To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/). #moe

Microsoft Azure
32K
$0.055/M
$0.055/M
Go
Google: Gemini Pro 1.5
ID:google/gemini-pro-1.5

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

Google
2M
$1.25/M
$5/M
Co
Cohere: Command R+
ID:cohere/command-r-plus

command-r-plus-08-2024 is an update of the [Command R+](/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same. Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed). Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

Cohere
128K
$2.85/M
$14.25/M
Da
Databricks: DBRX 132B Instruct
ID:databricks/dbrx-instruct

DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and [Mixtral-8x7b](/mistralai/mixtral-8x7b) on standard industry benchmarks for language understanding, programming, math, and logic. It uses a fine-grained mixture-of-experts (MoE) architecture. 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. See the launch announcement and benchmark results [here](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm). #moe

Databricks
33K
$1.08/M
$1.08/M
Ai
AI21: Jamba Instruct
ID:ai21/jamba-instruct

The Jamba-Instruct model, introduced by AI21 Labs, is an instruction-tuned variant of their hybrid SSM-Transformer Jamba model, specifically optimized for enterprise applications. - 256K Context Window: It can process extensive information, equivalent to a 400-page novel, which is beneficial for tasks involving large documents such as financial reports or legal documents - Safety and Accuracy: Jamba-Instruct is designed with enhanced safety features to ensure secure deployment in enterprise environments, reducing the risk and cost of implementation Read their [announcement](https://www.ai21.com/blog/announcing-jamba) to learn more. Jamba has a knowledge cutoff of February 2024.

Ai21
256K
$0.5/M
$0.7/M
Ri
Llama 3 Euryale 70B v2.1
ID:sao10k/l3-euryale-70b

Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom formatting / reply formats. - Very creative, lots of unique swipes. - Is not restrictive during roleplays.

Rifx.Online
8K
$0.35/M
$0.4/M
Mi
Mistral: Mistral 7B Instruct
ID:mistralai/mistral-7b-instruct

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*

MistralAI
33K
$0.055/M
$0.055/M
Mi
Phi-3 Mini 128K Instruct
ID:microsoft/phi-3-mini-128k-instruct

Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing. At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date.

Microsoft Azure
128K
$0.1/M
$0.1/M
Mi
Phi-3 Medium 128K Instruct
ID:microsoft/phi-3-medium-128k-instruct

Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing. At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance. For 4k context length, try [Phi-3 Medium 4K](/microsoft/phi-3-medium-4k-instruct).

Microsoft Azure
128K
$1/M
$1/M
Go
Google: Gemini Flash 1.5
ID:google/gemini-flash-1.5

Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter. Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

Google
1M
$0.075/M
$0.3/M
Co
Cohere: Command
ID:cohere/command

Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

Cohere
4K
$0.95/M
$1.9/M
Co
Cohere: Command R
ID:cohere/command-r

Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. Read the launch post [here](https://txt.cohere.com/command-r/). Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).

Cohere
128K
$0.475/M
$1.425/M
Free
Qw
Qwen 2 7B Instruct (free)
ID:qwen/qwen-2-7b-instruct:free

Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning. It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen
33K
Go
Google: Gemma 2 27B
ID:google/gemma-2-27b-it

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Google
8K
$0.27/M
$0.27/M
Al
Magnum 72B
ID:alpindale/magnum-72b

From the maker of [Goliath](https://openrouter.ai/alpindale/goliath-120b), Magnum 72B is the first in a new family of models designed to achieve the prose quality of the Claude 3 models, notably Opus & Sonnet. The model is based on [Qwen2 72B](https://openrouter.ai/qwen/qwen-2-72b-instruct) and trained with 55 million tokens of highly curated roleplay (RP) data.

Alpindale
16K
$3.75/M
$4.5/M
Free
Go
Google: Gemma 2 9B (free)
ID:google/gemma-2-9b-it:free

Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness. See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Google
8K
Go
Google: Gemma 2 9B
ID:google/gemma-2-9b-it

Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness. See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).

Google
8K
$0.06/M
$0.06/M
Mi
Mistral: Codestral Mamba
ID:mistralai/codestral-mamba

A 7.3B parameter Mamba-based model designed for code and reasoning tasks. - Linear time inference, allowing for theoretically infinite sequence lengths - 256k token context window - Optimized for quick responses, especially beneficial for code productivity - Performs comparably to state-of-the-art transformer models in code and reasoning tasks - Available under the Apache 2.0 license for free use, modification, and distribution

MistralAI
256K
$0.25/M
$0.25/M
Mi
Mistral: Mistral Nemo
ID:mistralai/mistral-nemo

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.

MistralAI
128K
$0.13/M
$0.13/M
Qw
Qwen 2 7B Instruct
ID:qwen/qwen-2-7b-instruct

Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning. It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen
33K
$0.054/M
$0.054/M
Me
Meta: Llama 3.1 405B (base)
ID:meta-llama/llama-3.1-405b

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version. It has demonstrated strong performance compared to leading closed-source models in human evaluations. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
$2/M
$2/M
Free
Go
Google: Gemini Pro 1.5 Experimental
ID:google/gemini-pro-1.5-exp

Google's latest multimodal model, supporting image and video in text or chat prompts. Optimized for language tasks including: - Code generation - Text generation - Text editing - Problem solving - Recommendations - Information extraction - Data extraction or generation - AI agents Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms). #multimodal

Google
2M
An
Anthropic: Claude 3.5 Haiku (2024-10-22)
ID:anthropic/claude-3.5-haiku-20241022

Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries. It does not support image inputs. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use)

Anthropic
200K
$1/M
$5/M
An
Anthropic: Claude 3 Opus
ID:anthropic/claude-3-opus

Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

Anthropic
200K
$15/M
$75/M
An
Anthropic: Claude 3 Sonnet
ID:anthropic/claude-3-sonnet

Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family) #multimodal

Anthropic
200K
$3/M
$15/M
An
Anthropic: Claude 3 Haiku
ID:anthropic/claude-3-haiku

Claude 3 Haiku is Anthropic's fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku) #multimodal

Anthropic
200K
$0.25/M
$1.25/M
An
Anthropic: Claude 3.5 Haiku
ID:anthropic/claude-3.5-haiku

Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries. It does not support image inputs. See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use)

Anthropic
200K
$1/M
$5/M
An
Anthropic: Claude 3.5 Sonnet
ID:anthropic/claude-3.5-sonnet

Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at: - Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems) #multimodal

Anthropic
200K
$3/M
$15/M
Qw
Qwen2-VL 7B Instruct
ID:qwen/qwen-2-vl-7b-instruct

Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. - Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen
33K
$0.1/M
$0.1/M
Op
OpenAI: o1-preview
ID:openai/o1-preview

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1). Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.

OpenAI
128K
$15/M
$60/M
Ai
AI21: Jamba 1.5 Large
ID:ai21/jamba-1-5-large

Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality. It features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis. Built on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency. Read their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.

Ai21
256K
$2/M
$8/M
Ri
Llama 3.1 Euryale 70B v2.2
ID:sao10k/l3.1-euryale-70b

Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/sao10k/l3-euryale-70b).

Rifx.Online
8K
$0.35/M
$0.4/M
Ai
AI21: Jamba 1.5 Mini
ID:ai21/jamba-1-5-mini

Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency. It works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models. This model uses less computer memory and works faster with longer texts than previous designs. Read their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.

Ai21
256K
$0.2/M
$0.4/M
No
Nous: Hermes 3 70B Instruct
ID:nousresearch/hermes-3-llama-3.1-70b

Hermes 3 is a generalist language model with many improvements over [Hermes 2](/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 70B is a competitive, if not superior finetune of the [Llama-3.1 70B foundation model](/meta-llama/llama-3.1-70b-instruct), focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

NousreSearch
131K
$0.4/M
$0.4/M
No
Nous: Hermes 3 405B Instruct
ID:nousresearch/hermes-3-llama-3.1-405b

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.

NousreSearch
131K
$1.79/M
$2.49/M
Free
Me
Meta: Llama 3.2 11B Vision Instruct (free)
ID:meta-llama/llama-3.2-11b-vision-instruct:free

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
Me
Meta: Llama 3.2 11B Vision Instruct
ID:meta-llama/llama-3.2-11b-vision-instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
$0.055/M
$0.055/M
Me
Lumimaid v0.2 8B
ID:neversleep/llama-3.1-lumimaid-8b

Lumimaid v0.2 8B is a finetune of [Llama 3.1 8B](/meta-llama/llama-3.1-8b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged. Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).

Meta Llama
131K
$0.1875/M
$1.125/M
Op
GPT-4o
ID:gpt-4o

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)

OpenAI
128K
$2.5/M
$10/M
Op
OpenAI: GPT-4o
ID:openai/gpt-4o

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities. For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)

OpenAI
128K
$2.5/M
$10/M
Op
OpenAI: GPT-4o-mini
ID:openai/gpt-4o-mini

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective. GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/). Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.

OpenAI
128K
$0.15/M
$0.6/M
Go
Google: Gemini 1.5 Flash-8B
ID:google/gemini-flash-1.5-8b

Gemini 1.5 Flash-8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results. [Click here to learn more about this model](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/). Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

Google
1M
$0.0375/M
$0.15/M
In
Inflection: Inflection 3 Productivity
ID:inflection/inflection-3-productivity

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines For emotional intelligence similar to Pi, see [Inflect 3 Pi](/inflection/inflection-3-pi) See [Inflection's announcement](https://inflection.ai/blog/enterprise) for more details.

Inflection
8K
$2.5/M
$10/M
In
Inflection: Inflection 3 Pi
ID:inflection/inflection-3-pi

Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It excels in scenarios like customer support, roleplay, and emotional intelligence.

Inflection
8K
$2.5/M
$10/M
Qw
Qwen2.5 7B Instruct
ID:qwen/qwen-2.5-7b-instruct

Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context Support up to 128K tokens and can generate up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen
131K
$0.27/M
$0.27/M
Th
Rocinante 12B
ID:thedrummer/rocinante-12b

Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported: - Expanded vocabulary with unique and expressive word choices - Enhanced creativity for vivid narratives - Adventure-filled and captivating stories

Thedrummer
33K
$0.25/M
$0.5/M
Me
Meta: Llama 3.2 3B Instruct
ID:meta-llama/llama-3.2-3b-instruct

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
$0.03/M
$0.05/M
Free
Me
Meta: Llama 3.2 3B Instruct (free)
ID:meta-llama/llama-3.2-3b-instruct:free

Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
Qw
Qwen2-VL 72B Instruct
ID:qwen/qwen-2-vl-72b-instruct

Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements: - SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. - Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. - Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc. For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL). Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen
33K
$0.4/M
$0.4/M
Qw
Qwen2.5 72B Instruct
ID:qwen/qwen-2.5-72b-instruct

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context Support up to 128K tokens and can generate up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).

Qwen
131K
$0.35/M
$0.4/M
Free
Me
Meta: Llama 3.2 1B Instruct (free)
ID:meta-llama/llama-3.2-1b-instruct:free

Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance. Supporting eight core languages and fine-tunable for more, Llama 1.3B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
Me
Meta: Llama 3.2 90B Vision Instruct
ID:meta-llama/llama-3.2-90b-vision-instruct

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md). Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).

Meta Llama
131K
$0.35/M
$0.4/M