AI Concepts·intermediate

Training data

The enormous pile of text the AI learned from — before you ever talked to it.

Before a model can answer anything, it's trained on an enormous corpus of text. Books, websites, code, documentation, transcripts. The model reads it all and learns the patterns.

Two things follow:

  • The model only knows what was in its training data. If something happened after the data was collected (the cutoff), or was never written down, the model doesn't know about it.
  • Biases in the data end up in the model. If the data skews, the output skews too.
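
Both points can be seen in a toy sketch. This is not how a real language model is built; it only counts which word follows which, and the names `train`, `predict`, and the three-sentence corpus are made up for illustration. The idea carries over, though: the "model" can only repeat patterns from its data, and a skew in the data becomes a skew in the output.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Count which word follows each word in the training sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None  # the word never appeared in the training data
    return counts.most_common(1)[0][0]

# Hypothetical, deliberately skewed training data.
corpus = [
    "the nurse said she was tired",
    "the nurse said she was late",
    "the nurse said he was busy",
]
model = train(corpus)

skewed = predict(model, "said")    # follows the majority pattern in the data
unseen = predict(model, "doctor")  # "doctor" was never in the data
print(skewed, unseen)
```

Here "said" is followed by "she" in two of the three sentences, so that is what the toy model predicts, and it has nothing at all to say about "doctor" because that word was never in its data.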

Training data has a cutoff date. Ask the model "what do you know about events after January 2024?" and it'll usually tell you where its knowledge stops. It knows nothing about anything that happened after that date.
