Blog Post

Xiaomi MiMo AI Models: How Xiaomi Built a Powerful Open Source AI Lineup

Xiaomi’s MiMo AI models are redefining the global AI race with powerful open source models for text, voice, vision, and automation. From lower costs to self-hosted frontier AI, this new lineup could change how businesses build AI products at scale.

Xiaomi MiMo AI Models: How Xiaomi Built a Powerful Open Source AI Lineup - Blog post featured image

When most people hear "Xiaomi," they think of cheap phones, fitness bands, and electric scooters. That picture is now badly out of date.

Over the last six months, Xiaomi's AI team (called MiMo) has quietly shipped eight different AI models. Their newest one matches the best from OpenAI and Anthropic on the hardest tasks, costs about 80% less to use, and is free for anyone to download. By April 2026, Xiaomi's models were running roughly 21% of all traffic on OpenRouter, the world's largest AI marketplace. OpenAI was at about 7.5%.

That gap is not a typo. A phone company is now ahead of OpenAI on one of the most-watched usage charts in the industry.

Here is what the lineup actually looks like, why it matters, and what we are doing with it at Axentia.

How Xiaomi Got Here

The MiMo team is led by Luo Fuli, who used to work on DeepSeek (the Chinese lab that made headlines last year for matching ChatGPT at a fraction of the cost). Xiaomi has committed at least $8.7 billion to AI over three years. The release pace shows that the money is moving:

  • December 2025: their first model, MiMo-V2-Flash
  • March 2026: a trio called V2-Pro, V2-Omni, and V2-TTS
  • April 2026: the V2.5 family, plus a full voice stack

Each release has gotten meaningfully better. The April 2026 launch is the one that put them on most people's radar.

The Models That Read and Write

Think of these as the "brains" of the lineup. They do the same things ChatGPT or Claude do: answer questions, write emails, generate code, analyze documents.

MiMo-V2-Flash (December 2025)

Flash is the cheap, fast option. Roughly 3x faster than most competitors, and priced at about 3% of what Claude or GPT-5 cost per use. For high-volume work like customer support automation, document summarization, or anything where you are paying per question, this changed the math.

It also topped the global leaderboard for solving real-world software bugs, beating every other open-source model.

MiMo-V2-Pro (March 2026)

Pro is the bigger, smarter sibling. It can hold the equivalent of a 1,500-page document in memory at once and reason across all of it. It was also the first open model that genuinely competed with Claude and GPT on agent work, where the AI uses tools, runs code, and completes multi-step jobs on its own.

Pro is tuned to work especially well with OpenClaw, an open-source agent framework that we deploy for clients at Axentia. For OpenClaw projects specifically, Pro became the first open model we trusted with serious work.

MiMo-V2-Omni (March 2026)

Omni added eyes and ears. It can look at images, watch videos, and listen to audio, all in the same model. Xiaomi uses it inside their robotics and self-driving car projects, where the AI needs to actually see the world and act on what it sees.

MiMo-V2.5 (April 2026)

V2.5 merged Pro and Omni into a single model. One AI that can read documents, look at images, watch videos, listen to audio, and act on all of it. For everyday users, that means you can show it a photo of your fridge and ask for dinner ideas, drop in a video tutorial and get a step-by-step summary, or record a meeting and have it pull out action items, all without switching tools.

It scores roughly the same as Google's Gemini 3 Pro on video understanding, which is the current high water mark.

MiMo-V2.5-Pro (April 2026): The Headline Model

This is the flagship and the model that has people talking. Three demos from the launch tell the story better than any benchmark.

Demo 1: It built a compiler from scratch in an afternoon. A compiler is one of the harder pieces of software a computer science student ever writes. The reference project for this test normally takes a Peking University CS major several weeks to finish. MiMo-V2.5-Pro did it in 4.3 hours, working entirely on its own, and scored a perfect 233 out of 233 on the hidden test suite.

Demo 2: It built a working video editor over 11 hours. From a few simple prompts, the model produced a real desktop video editing app with a multi-track timeline, clip trimming, fade effects, audio mixing, and an export feature. About 8,000 lines of working code. No human in the loop. This is the kind of project that would normally cost a small dev team a few weeks.

Demo 3: It designed a real microchip. A graduate-level analog circuit design task that normally takes a trained engineer several days. MiMo-V2.5-Pro did it in about an hour by repeatedly running circuit simulations, reading the results, and tweaking the design.

The point is not the demos themselves. The point is that this model can sustain hours of focused work on hard problems without going off the rails. That is the property that turns "AI can help you with tasks" into "AI can actually finish jobs for you."

It matches Claude Opus 4.6 (Anthropic's flagship) on most tests, while costing about $1 per million words of input versus $5 for Opus. It is also fully open source, so a company can download it and run it on their own servers if they want to.

The Models That Speak and Listen

This is the voice half of the lineup, and it is the part most businesses will end up using first.

MiMo-V2-TTS and the V2.5-TTS Series

TTS stands for text-to-speech. Give the model a script, and it reads it out loud in a natural human voice. Xiaomi's version goes further than most. You can describe how the voice should sound in plain English ("speak slowly, sounds tired, gentle but firm") and the model just does it. No fiddling with sliders. It also handles emotions, accents, dialects, and even singing.

The V2.5 release added two new capabilities:

  • VoiceDesign lets you create an entirely new voice from a single text description. Useful for game characters, audiobooks, or branded customer service voices.
  • VoiceClone lets you clone an existing voice from a small audio sample. Useful when you want one specific person's voice across all your content.

Both work in Chinese and English, and you can drop little tags into the script to control emotion at specific moments (a sigh, a laugh, a pause).

MiMo-V2.5-ASR

ASR is the opposite of TTS. It listens to audio and writes down what was said. This is the model that powers transcription, voice assistants, meeting notes, and anything else where speech becomes text.

The reason this one matters is that most transcription tools fall apart in the real world. Background noise, multiple people talking at once, accents, mixed languages, song lyrics: all of these things break most systems. Xiaomi's ASR was trained specifically for messy real-world audio. It beats OpenAI's Whisper (the most popular open-source option) on English, and it handles Chinese dialects far better than anything else open source.

It is also free to download and run on your own servers.

What This Actually Means If You Are Not Building AI

Even if you are not in tech, three things from this release matter.

The price of good AI just dropped a lot. Whatever you were paying for ChatGPT, Claude, or Gemini at scale, there is now a comparable option for a fraction of the cost. For most businesses, this changes which AI features are worth building.

You can finally run frontier AI on your own servers. Most companies have data they cannot send to OpenAI or Google for legal or competitive reasons. Until recently, the open-source alternatives were a step behind. They are not anymore. If your business has been waiting for this to be possible, the wait is over.

Voice agents are about to get good. Voice has been the weak link in AI products for years. Robot voices, terrible transcription, no emotional control. Xiaomi's voice models close most of that gap. Expect a lot more voice-first AI products in the next twelve months, and expect them to actually work this time.

The Bigger Picture

There is a story underneath all of this that is worth noticing. Six months ago, the "AI race" was a small group of American labs with very expensive models. Today, a Chinese phone company is on top of the world's biggest AI usage chart, shipping models that are free to use, free to download, and competitive with anything in the US.

That does not mean OpenAI or Anthropic are in trouble. They are still pushing the frontier. But for the average company picking which AI to build with, the choice is no longer just between three or four expensive American options. The Chinese open-source ecosystem is now genuinely competitive, and Xiaomi has just become one of its most important players.

If your team has not looked at MiMo yet, it is worth a few hours of your engineers' time. And if you want help figuring out which model fits which problem, that is exactly what we do at Axentia.

Explore More Articles

Discover other insightful articles and stories from our blog.