AI Concepts·intermediate
Multimodal
An AI that can handle text plus images, audio, or video — not just typed input.
Early LLMs only did text in, text out. Modern models are multimodal — you can feed them images, PDFs, audio, sometimes video, and they'll reason about all of it alongside your text.
Practical examples:
- Upload a screenshot of a bug and ask "what's wrong?"
- Photograph a handwritten recipe and ask for the ingredient list as JSON.
- Paste a chart and ask the model to describe the trend.
Mosaic students meet this early when they upload homework photos to claude.ai. "Multimodal" is just the technical name for that capability.