Multimodal models

realtime multimodal model

GPT-4o

OpenAI

GPT-4o is a realtime multimodal foundation model from OpenAI.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
flagship multimodal model

Gemini 2.5 Pro

Google DeepMind

Gemini 2.5 Pro is a flagship multimodal foundation model from Google DeepMind.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
lightweight multimodal model

Gemini 2.5 Flash

Google DeepMind

Gemini 2.5 Flash is a lightweight multimodal foundation model from Google DeepMind.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
high-end reasoning model

Claude Opus 4

Anthropic

Claude Opus 4 is a high-end reasoning foundation model from Anthropic.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
mainstream reasoning model

Claude Sonnet 4

Anthropic

Claude Sonnet 4 is a mainstream reasoning foundation model from Anthropic.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
omni model family

Qwen Omni

阿里通义

Qwen Omni is an open-source omni model family from 阿里通义.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
multimodal model family

Qwen VL

阿里通义

Qwen VL is an open-source multimodal model family from 阿里通义.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
audio understanding model

Qwen Audio

阿里通义

Qwen Audio is an open-source audio understanding model from 阿里通义.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
open multimodal models

DeepSeek VL

DeepSeek

DeepSeek VL is a line of open multimodal foundation models from DeepSeek.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
multimodal reasoning model

Seed 1.5

字节 Seed

Seed 1.5 is a multimodal reasoning foundation model from 字节 Seed, part of the China ecosystem.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal | China ecosystem
multimodal foundation models

Hunyuan Models

腾讯混元

Hunyuan Models covers the multimodal foundation models from 腾讯混元, part of the China ecosystem.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal | China ecosystem
ERNIE model family

ERNIE Models

百度文心

ERNIE Models covers the ERNIE foundation model family from 百度文心, part of the China ecosystem.

closed-source / platform | model access / platform distribution | No API
Foundation model | China ecosystem | Multimodal
multimodal model family

MiniMax Models

MiniMax

MiniMax Models is useful for seeing MiniMax across text, voice, video, and globally-oriented product layers.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal | China ecosystem
multimodal model family

Step Models

阶跃星辰

Step Models is a multimodal foundation model family from 阶跃星辰, part of the China ecosystem.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal | China ecosystem
video generation model

Wan 2.1

阿里通义

Wan 2.1 is an open-source video generation model from 阿里通义.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
video generation model

CogVideoX

智谱 AI

CogVideoX is an open-source video generation model from 智谱 AI.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
video generation model

Sora 2

OpenAI

Sora 2 is a video generation foundation model from OpenAI.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
image generation model

GPT Image 1.5

OpenAI

GPT Image 1.5 is an image generation foundation model from OpenAI.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
vision language model

InternVL

OpenGVLab

InternVL is an open-source vision language model from OpenGVLab.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
lightweight multimodal

MiniCPM-V

OpenBMB

MiniCPM-V is an open-source lightweight multimodal model from OpenBMB.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
vision language model

Pixtral

Mistral AI

Pixtral is a vision language foundation model from Mistral AI.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
multimodal base model

PaliGemma

Google

PaliGemma is an open-source multimodal base model from Google.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
open multimodal model

Molmo

Allen AI

Molmo is an open multimodal foundation model from Allen AI.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
vision foundation model

Florence-2

Microsoft

Florence-2 is an open-source vision foundation model from Microsoft.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
Popular

ChatGPT

OpenAI

ChatGPT is OpenAI's mainstream AI assistant entry point, combining general Q&A, writing, search, file analysis, and multimodal interaction in one product.

Closed Source / Platform | Free / Subscription | API
Text Model | Chat | API
Multimodal

Gemini

Google

Gemini is one of Google's consumer AI entry points, with real strength coming from its linkage to search, Google Workspace, Android, and multimodal capabilities.

Closed Source / Platform | Free / Subscription | API
Text Model | Chat | API
Long-form

Claude

Anthropic

Claude is Anthropic's main end-user assistant, best known for long-form handling, stable writing, document understanding, and enterprise-oriented safety.

Closed Source / Platform | Free / Subscription | API
Text Model | Chat | API
Realtime

Grok

xAI

Grok is xAI's text model product, oriented toward realtime workflows and available through official access.

Closed Source / Platform | Free / Subscription | No API
Text Model | Chat | Multimodal
China popular

Kimi

Moonshot AI

Kimi is Moonshot AI's most representative consumer product, best known for long context, Chinese-language experience, and information synthesis.

Closed Source / Platform | Free / Subscription | API
China Model | API | Chat
Mass-market AI

豆包

ByteDance

Doubao is ByteDance's mainstream AI entry, with value in broad consumer reach, low barrier for Chinese users, and linkage to ByteDance's content ecosystem.

Closed Source / Platform | Free / Subscription | No API
China Model | Chat | Multimodal
Alibaba

通义千问

Alibaba Cloud

Tongyi Qianwen is Alibaba's major end-user AI entry, but its real importance lies in how it connects to the Qwen family, Alibaba Cloud, and enterprise ecosystem.

Closed Source / Platform | Free / Subscription | API
China Model | API | Chat
Reasoning

DeepSeek

DeepSeek

DeepSeek is one of the most watched reasoning-oriented AI assistants in China, with its main appeal in reasoning quality and cost efficiency rather than flashy features.

Closed Source / Platform | Free / Subscription | API
China Model | API | Chat
Tencent

腾讯元宝

Tencent

腾讯元宝 (Tencent Yuanbao) is a China AI model product from Tencent, oriented toward Tencent workflows and official access.

Closed Source / Platform | Free / Subscription | No API
China Model | Chat | Multimodal
Zhipu

智谱清言

Zhipu AI

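智谱清言 (Zhipu Qingyan) is an all-round AI assistant built on GLM-5, skilled at conversation, writing, and programming. It answers questions, sparks ideas, and understands images and documents to improve learning and work efficiency.

Closed Source / Platform | Free / Subscription | API
China Model | API | Chat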
official model

Aurora Image

xAI

Aurora Image is an official foundation model from xAI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Aya Vision 32B

Cohere

Aya Vision 32B is an official open-source foundation model from Cohere with API access.

open-source / self-hosted | open source / self-hosted | API
Foundation model | API | Open source
open model

Chameleon 7B

Meta

Chameleon 7B is an open multimodal foundation model from Meta.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
official model

CogView 4

智谱 AI

CogView 4 is an official foundation model from 智谱 AI with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

CogVLM2

智谱 AI

CogVLM2 is an official foundation model from 智谱 AI with API access, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | API
Foundation model | API | China ecosystem
official model

Command A Vision

Cohere

Command A Vision is an official foundation model from Cohere with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Cosmos Predict1

NVIDIA

Cosmos Predict1 is an official foundation model from NVIDIA with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Cosmos Reason1

NVIDIA

Cosmos Reason1 is an official foundation model from NVIDIA with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Cosmos Transfer1

NVIDIA

Cosmos Transfer1 is an official foundation model from NVIDIA with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

DeepSeek Janus Pro

DeepSeek

DeepSeek Janus Pro is an official foundation model from DeepSeek with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
reasoning model family

DeepSeek Models

DeepSeek

DeepSeek Models is useful for viewing DeepSeek's overall layout across general reasoning, deep thinking, multimodal capability, and API cost efficiency.

closed-source / platform | model access / platform distribution | No API
Foundation model | China ecosystem | Multimodal
official model

DeepSeek VL2

DeepSeek

DeepSeek VL2 is an official foundation model from DeepSeek with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Gemini 1.5 Flash

Google DeepMind

Gemini 1.5 Flash is an official foundation model from Google DeepMind with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Gemini 1.5 Pro

Google DeepMind

Gemini 1.5 Pro is an official foundation model from Google DeepMind with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Gemini 2.0 Flash

Google DeepMind

Gemini 2.0 Flash is an official foundation model from Google DeepMind with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Gemini 2.0 Flash Lite

Google DeepMind

Gemini 2.0 Flash Lite is an official foundation model from Google DeepMind with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Gemini 2.0 Flash Live

Google DeepMind

Gemini 2.0 Flash Live is an official foundation model from Google DeepMind with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
multimodal model family

Gemini Models

Google DeepMind

Gemini Models is the multimodal foundation model family from Google DeepMind.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
GLM model family

GLM Models

智谱 AI

GLM Models summarizes Zhipu AI's Chinese model family, reasoning capabilities, and platform entry points.

closed-source / platform | model access / platform distribution | No API
Foundation model | China ecosystem | Multimodal
official model

GLM-4V 9B

智谱 AI

GLM-4V 9B is an official foundation model from 智谱 AI with API access, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | API
Foundation model | API | China ecosystem
official model

GPT Image 1

OpenAI

GPT Image 1 is an official foundation model from OpenAI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
general multimodal model

GPT-4.1

OpenAI

GPT-4.1 is a general multimodal foundation model from OpenAI.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
official model

GPT-4o mini

OpenAI

GPT-4o mini is an official foundation model from OpenAI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

GPT-4o mini Realtime

OpenAI

GPT-4o mini Realtime is an official foundation model from OpenAI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

GPT-4o Realtime

OpenAI

GPT-4o Realtime is an official foundation model from OpenAI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
open model

Granite Vision 3.2

IBM

Granite Vision 3.2 is an open multimodal foundation model from IBM.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
official model

Grok 2 Vision

xAI

Grok 2 Vision is an official foundation model from xAI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Grok Live

xAI

Grok Live is an official foundation model from xAI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
reasoning model family

Grok Models

xAI

Grok Models is the reasoning foundation model family from xAI.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
official model

Hailuo 02

MiniMax

Hailuo 02 is an official foundation model from MiniMax with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Hailuo Video 01

MiniMax

Hailuo Video 01 is an official foundation model from MiniMax with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Hunyuan 3D 2.0

腾讯混元

Hunyuan 3D 2.0 is an official foundation model from 腾讯混元 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Hunyuan Video

腾讯混元

Hunyuan Video is an official foundation model from 腾讯混元 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Hunyuan Vision

腾讯混元

Hunyuan Vision is an official foundation model from 腾讯混元 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
image generation

Imagen 3

Google DeepMind

Imagen 3 is an image generation product from Google DeepMind.

closed-source / platform | free / paid | No API
Image generation | Multimodal
official model

Imagen 4

Google DeepMind

Imagen 4 is an official foundation model from Google DeepMind with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
open model family

Llama

Meta

Llama is an open foundation model family from Meta.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
open model

Llama 3.2 11B Vision

Meta

Llama 3.2 11B Vision is an open multimodal foundation model from Meta.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
open model

Llama 3.2 90B Vision

Meta

Llama 3.2 90B Vision is an open multimodal foundation model from Meta.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
open model

Llama 4 Maverick

Meta

Llama 4 Maverick is an open multimodal foundation model from Meta.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
open model

Llama 4 Scout

Meta

Llama 4 Scout is an open multimodal foundation model from Meta.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
official model

MedGemma

Google DeepMind

MedGemma is an official open-source foundation model from Google DeepMind with API access.

open-source / self-hosted | open source / self-hosted | API
Foundation model | API | Open source
Multimodal

MiniMax

MiniMax

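MiniMax is a leading global general-purpose AI technology company committed to "co-creating intelligence with everyone." It has independently developed a series of multimodal general-purpose large models and launched a series of AI-native products worldwide, serving more than 200 million users.

Closed Source / Platform | Free / Subscription | API
China Model | API | Multimodal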
reasoning model family

Nemotron

NVIDIA

Nemotron is a reasoning foundation model family from NVIDIA.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal
closed model family

OpenAI Models

OpenAI

OpenAI Models aggregates OpenAI's flagship foundation, reasoning, realtime, and embedding model entry points.

closed-source / platform | model access / platform distribution | No API
Foundation model | Chat | Multimodal
official model

PaddleOCR-VL

百度文心

PaddleOCR-VL is an official foundation model from 百度文心 with API access, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | API
Foundation model | API | China ecosystem
open model

Phi-3.5 Vision

Microsoft

Phi-3.5 Vision is an open multimodal foundation model from Microsoft.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
open model

Phi-4 Multimodal

Microsoft

Phi-4 Multimodal is an open multimodal foundation model from Microsoft.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
official model

Pixtral 12B

Mistral AI

Pixtral 12B is an official foundation model from Mistral AI with API access.

open-source / self-hosted | open source / self-hosted | API
Foundation model | API | Multimodal
open model

QVQ-72B Preview

阿里通义

QVQ-72B Preview is an open foundation model from 阿里通义, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | China ecosystem | Open source
open model family

Qwen

阿里通义

Qwen is one of the most complete China-based open model families, spanning text, vision, audio, coding, and omni directions.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Multimodal | Open source
open model

Qwen2.5 Audio 7B

阿里通义

Qwen2.5 Audio 7B is an open foundation model from 阿里通义, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | China ecosystem | Open source
open model

Qwen2.5 Omni 7B

阿里通义

Qwen2.5 Omni 7B is an open foundation model from 阿里通义, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | China ecosystem | Open source
open model

Qwen2.5 VL 72B

阿里通义

Qwen2.5 VL 72B is an open foundation model from 阿里通义, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | China ecosystem | Open source
open model

Qwen2.5 VL 7B

阿里通义

Qwen2.5 VL 7B is an open foundation model from 阿里通义, part of the China ecosystem.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | China ecosystem | Open source
foundation model family

Seed Models

字节 Seed

Seed Models aggregates ByteDance foundation-model and multimodal entry points with an emphasis on productization and content workflows.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal | China ecosystem
multimodal model platform

SenseNova

商汤科技

SenseNova is a multimodal foundation model platform from 商汤科技, part of the China ecosystem.

closed-source / platform | model access / platform distribution | No API
Foundation model | Multimodal | China ecosystem
official model

SenseNova 3D

商汤日日新

SenseNova 3D is an official foundation model from 商汤日日新 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

SenseNova Vision

商汤日日新

SenseNova Vision is an official foundation model from 商汤日日新 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
open model

SmolVLM 500M

Hugging Face

SmolVLM 500M is an open multimodal foundation model from Hugging Face.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
open model

SmolVLM2 2.2B

Hugging Face

SmolVLM2 2.2B is an open multimodal foundation model from Hugging Face.

open-source / self-hosted | open source / self-hosted | No API
Foundation model | Open source | Multimodal
text to video model

Sora

OpenAI

Sora is a text-to-video generation product from OpenAI.

closed-source / platform | free trial / subscription | No API
Multimodal | Video editing
official model

Step 1V

阶跃星辰

Step 1V is an official foundation model from 阶跃星辰 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Step Audio

阶跃星辰

Step Audio is an official foundation model from 阶跃星辰 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Step R1 V Mini

阶跃星辰

Step R1 V Mini is an official foundation model from 阶跃星辰 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Step Video I2V

阶跃星辰

Step Video I2V is an official foundation model from 阶跃星辰 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
official model

Step Video T2V

阶跃星辰

Step Video T2V is an official foundation model from 阶跃星辰 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
video model

Veo

Google DeepMind

Veo is a video generation product from Google DeepMind.

closed-source / platform | free trial / subscription | No API
Multimodal | Video editing
video generation

Veo 2

Google DeepMind

Veo 2 is a video generation product from Google DeepMind.

closed-source / platform | free trial / subscription | No API
Video editing | Multimodal
official model

Veo 3

Google DeepMind

Veo 3 is an official foundation model from Google DeepMind with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Voxtral Mini

Mistral AI

Voxtral Mini is an official foundation model from Mistral AI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Voxtral Small

Mistral AI

Voxtral Small is an official foundation model from Mistral AI with API access.

closed-source / platform | model access / platform distribution | API
Foundation model | API | Multimodal
official model

Yi Vision

零一万物

Yi Vision is an official foundation model from 零一万物 with API access, part of the China ecosystem.

closed-source / platform | model access / platform distribution | API
Foundation model | API | China ecosystem
Selection guide

How to choose Multimodal models

  • Verify input type first: image-text, video, and audio understanding are distinct capabilities and should be filtered by real input.
  • Evaluate recognition accuracy and context length before checking mixed multi-turn input support.
  • For integrations, focus on API latency, image size limits, and batch-processing support.
  • Before commercial use, verify privacy policy, retention rules for uploads, and media compliance.
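
The first and third checks above can be sketched as a simple shortlist filter. This is only an illustration under stated assumptions: the `ModelEntry` record, its field names, the `shortlist` helper, and the three placeholder entries are all hypothetical and do not come from this catalog or from any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical record for one catalog entry; field names are assumptions."""
    name: str
    inputs: frozenset      # supported input types, e.g. {"image", "audio"}
    has_api: bool          # whether the entry is listed with API access
    max_image_mb: float    # largest accepted image upload, in megabytes


def shortlist(entries, required_inputs, need_api=True, min_image_mb=0.0):
    """Keep entries that accept every required input type and meet integration needs."""
    required = frozenset(required_inputs)
    return [
        e.name
        for e in entries
        if required <= e.inputs                 # filter by real input type first
        and (e.has_api or not need_api)         # then by API availability
        and e.max_image_mb >= min_image_mb      # then by image size limit
    ]


# Placeholder entries for illustration only, not real catalog data.
CATALOG = [
    ModelEntry("model-a", frozenset({"image", "text"}), True, 20.0),
    ModelEntry("model-b", frozenset({"image", "audio", "text"}), False, 10.0),
    ModelEntry("model-c", frozenset({"image", "audio", "video", "text"}), True, 5.0),
]

print(shortlist(CATALOG, {"image", "audio"}))
```

Filtering on input type before anything else mirrors the order of the checklist: a model that cannot accept the required media is ruled out regardless of its API or pricing. Accuracy, latency, and privacy terms still need to be checked by hand against each vendor's documentation.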

What matters first on Multimodal models category pages?

Start with official access, pricing model, API support, open/closed status, and common use cases.