Best Local AI Models for CPU in 2025

Written by Soham Pratap

September 25, 2025

“Why does AI use a GPU instead of a CPU?” is one of the most asked questions today. AI models consume a lot of computing power, and most of that work is massively parallel computation, which GPUs handle far better than CPUs. CPUs are built for sequential tasks, so they are inefficient at running large AI models. But this does not prevent us from running smaller AI models on our PCs.

If you are looking to build a good AI PC workstation, you can have a look at my article, where I have written a comprehensive guide on building your first AI PC rig.

Anyway, I’m here to talk about the best AI models that you can run on a CPU. For this article, I personally tested each of these small language models on a CPU using Ollama.

1. Gemma 3n


Gemma 3n is a small model developed by Google. It comes in two parameter sizes, Gemma 3n Effective 2B (E2B) and Gemma 3n Effective 4B (E4B), and is trained on data covering 140 spoken languages.

Gemma 3n was created in close collaboration with leading mobile hardware manufacturers. It shares architecture with the next generation of Gemini Nano to empower a new wave of intelligent, on-device applications. – Google


Gemma 3n is a privacy-focused, mobile-first large language model. It uses the MatFormer (Matryoshka Transformer) architecture, which nests smaller sub-models inside a larger one so that inference can run at a reduced effective parameter count, cutting compute and memory requirements. This is why you can run a 4B-parameter LLM on your CPU.

This model is capable of speech recognition, translation, and audio analysis. You can also mix image and text inputs while interacting with it.

To install Gemma 3n via Ollama –
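
ollama pull gemma3n:e2b

The e2b tag pulls the smaller Effective 2B variant, which is the safer starting point on a CPU-only machine; swap in gemma3n:e4b for the larger variant. Both tag names here assume the current naming in the Ollama model library.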

2. TinyLlama


TinyLlama is a very small LLM with only 1.1B parameters: a pretrained Llama-architecture model trained on 3 trillion tokens. Training took just 90 days on 16 A100-40G GPUs. The default Ollama build is a 4-bit quantized model whose weights are only about 637 MB.

Apart from Gemma 3n, this is one of the best models for running easily on most modern CPUs. It was made specifically for deployment on devices with restrictive memory and computational resources. With TinyLlama it is possible to perform tasks like real-time machine translation without any internet connection (a sketch follows after the install command).
If you’re looking for a lightweight assistive model inside an AI agent, TinyLlama could be very useful.

To install it via Ollama –


ollama pull tinyllama:latest
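
As a minimal sketch of the offline-translation use case, assuming the Ollama server is running locally on its default port 11434, you can call TinyLlama through Ollama’s REST API –

curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama:latest",
  "prompt": "Translate to French: The weather is nice today.",
  "stream": false
}'

The generated text comes back in the response field of the JSON reply, and once the model has been pulled, no internet connection is needed.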

3. DeepSeek R1 1.5B


While DeepSeek R1 is a very large model capable of processing large amounts of data, it is also available in a distilled 1.5B-parameter version that can easily be run on your CPU. This version has a 128K context window and is only 1.1GB in size, which makes it efficient to run on a CPU.

DeepSeek R1 1.5B is the only model in this list that can be run on a CPU and also has reasoning capabilities, so if you require a basic text-generation or code-support agent, this model is a great choice.

Resource Requirements –

  • RAM: Minimum 8GB; works effectively on modern CPUs
  • GPU: Optional but recommended — 4GB VRAM for quantized versions
  • Storage: Only 1.1GB for the quantized model
  • CPU Performance: 25.71 tokens/s on high-core CPU configurations (see below for how to measure your own rate)
  • GPU Performance: over 202 tokens/s with GPU acceleration
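
Throughput varies a lot with hardware, so treat the numbers above as ballpark figures. To measure your own rate, run the model with Ollama’s --verbose flag, which prints token statistics (including the eval rate in tokens per second) after each reply –

ollama run deepseek-r1:1.5b "Why is the sky blue?" --verbose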

The model can run on –

  • Standard laptops and consumer-grade hardware
  • Apple MacBook Air with M2/M3 chips and 8GB RAM
  • Edge devices and mobile platforms
  • Web browsers using WebGPU

To install DeepSeek R1 1.5B via Ollama –


ollama pull deepseek-r1:1.5b
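
Because R1 is a reasoning model, it emits its chain of thought inside <think> … </think> tags before the final answer. A quick way to see this, again assuming the local Ollama server on its default port, is the chat endpoint –

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:1.5b",
  "messages": [{ "role": "user", "content": "Is 2027 a prime number?" }],
  "stream": false
}'

If you wire the model into an agent, strip the <think> block out of message.content before presenting the answer.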

4. SmolLM2


SmolLM2 is another small LLM that can run on a CPU. It is version 2 of the earlier SmolLM and comes in three sizes, 135M, 360M, and 1.7B parameters, which can run on both older and newer CPUs. The LLM is capable of running on mobile devices, laptops, and desktop computers.

The model was trained on 11 trillion tokens using various datasets, including code. You can use this model for tasks such as summarizing, rewriting, and generating text. The 1.7B model is about 1.8 GB and the smallest 135M model is about 271 MB.

You can have a look at the comparison of SmolLM2 to other models –

Benchmark                               SmolLM2-1.7B   Llama3.2-1B   Qwen2.5-1.5B
ARC (Science & Reasoning)               60.5%          49.2%         58.5%
OpenBookQA (Science & Reasoning)        42.2%          38.4%         40.0%
HellaSwag (Commonsense Reasoning)       68.7%          61.4%         66.4%
CommonsenseQA (Commonsense Reasoning)   43.6%          41.2%         34.1%
WinoGrande (Commonsense Reasoning)      59.4%          57.8%         59.3%
PIQA (Commonsense Reasoning)            77.6%          74.8%         76.1%
MMLU (Knowledge)                        42.3%          36.6%         41.1%
TriviaQA (Knowledge)                    36.7%          28.1%         20.9%
GSM8K (Math)                            31.0%          7.2%          61.3%

To install SmolLM2 via Ollama –
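
ollama pull smollm2:1.7b

The 360m and 135m tags pull the smaller variants for older hardware (tag names as listed in the Ollama model library). A one-line summarization test then looks like this –

ollama run smollm2:1.7b "Summarize in two sentences: <paste your text here>"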

5. Qwen 2.5


Qwen 2.5 was developed by Alibaba and trained on Alibaba’s large-scale datasets. Three model families are available under Qwen 2.5: Qwen 2.5, Qwen 2.5-Coder, and Qwen 2.5-Math, each with its own specialty. You can use Qwen 2.5-Coder for coding and Qwen 2.5-Math for mathematics; both have undergone substantial enhancements for their respective purposes.

The models are available in various sizes, ranging from 0.5B to 72B parameters. Smaller sizes can be run on your CPU, while larger ones require GPUs. The available sizes are below –

  • Qwen 2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
  • Qwen 2.5-Coder: 1.5B and 7B, with 32B on the way
  • Qwen 2.5-Math: 1.5B, 7B, and 72B

The 0.5B and 1.5B models are only 398 MB and 986 MB respectively, both with a 32K context window. These models were trained on 18 trillion tokens. The models support over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

To install Qwen via Ollama –
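
ollama pull qwen2.5:1.5b

Pick the 0.5b tag for the smallest download, and note that the coder variant lives under its own name in the Ollama library (for example qwen2.5-coder:1.5b). To try the multilingual side, a quick test –

ollama run qwen2.5:1.5b "Translate 'How much does this cost?' into Japanese and Spanish."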

6. Phi-4 Mini


The very last model on my list is Phi-4 Mini. It is slightly larger than models like TinyLlama, Gemma 3n, and SmolLM2: it comes with 3.8B parameters, a 128K-token context window, and a download size of about 2.5 GB. If your CPU can run this model, it will cover most of your smaller AI requirements.

The model was trained on publicly available data using an innovative approach developed by Microsoft Research. It is intended to support commercial and research use and to be incorporated into various AI applications.

Phi-4 Mini is well-suited to PCs with constrained memory or compute, and it also delivers strong reasoning in math and logic. The model is designed to accelerate research on language and multimodal models and to serve as a building block for generative AI-powered features.

To install Phi-4 Mini via Ollama –


ollama pull phi4-mini:3.8b
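
To try the math-and-logic reasoning it is known for, a quick smoke test from the shell (single quotes keep the dollar signs from being expanded) –

ollama run phi4-mini:3.8b 'A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?'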

Conclusion

In reviewing the best small language models that can run on CPUs, we found LLMs that have been scaled down to sizes that work within CPU constraints. These smaller models are also good candidates for fine-tuning on your own datasets.

Because their memory and compute requirements are modest, these models are easy to run. I believe Gemma 3n is the best model for someone looking to run a model on the CPU. All of these models can be run via Ollama or LM Studio.

If you have more questions, leave them in the comments section and I will address them.


An AI and hardware enthusiast passionate about pushing the boundaries of technology. I design, train, and execute cutting-edge AI models while also building powerful AI-enhanced PCs and custom rigs. I also provide consultancy to help individuals and businesses unlock the full potential of AI-driven solutions.
