AI Concepts·intermediate
Training data
The enormous pile of text the AI learned from — before you ever talked to it.
Before a model can answer anything, it's trained on an enormous corpus of text. Books, websites, code, documentation, transcripts. The model reads it all and learns the patterns.
Two things follow:
- The model only knows what was in its training data. If something happened after the data was collected (the cutoff), or was never written down, the model doesn't know about it.
- Biases in the data end up in the model. If the data skews, the output skews too.
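Both points can be seen in a toy sketch. This is not how a real language model is trained; it is a minimal word-pair counter, and the corpus, `train`, and `predict` are hypothetical names made up for illustration. It only shows that a model built from text repeats the text's skew and has nothing to say about words it never saw.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical, deliberately skewed corpus: "python" follows "is" twice, "rust" once.
corpus = [
    "the best language is python",
    "the best language is python",
    "the best language is rust",
]
model = train(corpus)

print(predict(model, "is"))       # python (the majority in the data wins)
print(predict(model, "quantum"))  # None (the word never appeared in the data)
```

Changing the corpus so that "rust" appears more often flips the first answer, which is the sense in which skewed data gives skewed output.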
Training data has a cutoff date. Ask the model "what do you know about events after January 2024?" and, if that is past its cutoff, it'll usually tell you that it has no information. For anything after the cutoff date, it has nothing.
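A cutoff can be pictured as a date filter applied when the training set is assembled. The sketch below is illustrative only: the cutoff date, the documents, and the helper names `build_training_set` and `knows_about` are all assumptions, and real pipelines are far more involved.

```python
from datetime import date

# Hypothetical cutoff, chosen to match the "January 2024" example.
CUTOFF = date(2024, 1, 31)

# Hypothetical documents, each paired with the date it was written.
documents = [
    (date(2023, 6, 1), "release notes for version 1"),
    (date(2023, 12, 15), "release notes for version 2"),
    (date(2024, 5, 20), "release notes for version 3"),
]

def build_training_set(docs, cutoff):
    """Keep only the text written on or before the cutoff date."""
    return [text for written, text in docs if written <= cutoff]

def knows_about(training_set, phrase):
    """Check whether any training text mentions the phrase."""
    return any(phrase in text for text in training_set)

training_set = build_training_set(documents, CUTOFF)

print(knows_about(training_set, "version 2"))  # True (written before the cutoff)
print(knows_about(training_set, "version 3"))  # False (written after the cutoff)
```

Anything dated after `CUTOFF` never reaches the training set, so from the model's side it does not exist.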