If you've ever used ChatGPT and noticed it confidently telling you something that later turned out to be wrong, you've experienced what experts call a "hallucination." Now, OpenAI has explained why this happens and what it's doing to fix the problem.
In plain words, hallucinations are when an AI makes up answers that sound convincing but aren't true. For example, OpenAI researchers asked a chatbot about the title of a scientist's PhD dissertation. The AI confidently gave three answers -- every one of them false. It even got his birthday wrong. The problem isn't that the model is "lying"; it's that the way these systems are trained actually encourages guessing.
Here's why. Imagine you're taking a multiple-choice test. If you don't know the answer, you might guess. Sometimes you'll get lucky and earn points. If you leave it blank, you get nothing. Over time, guessing looks like the smarter strategy. Chatbots work in a similar way. Most evaluations of AI only check whether answers are right or wrong. Saying "I don't know" counts as zero, while guessing could get credit. So, the AI learns that it's better to guess even if it's wrong.
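To make that incentive concrete, here is a minimal sketch of the expected score on an accuracy-only benchmark. The numbers and function names are illustrative, not taken from OpenAI's research; the point is just that under this grading, guessing never scores worse than abstaining.

```python
# Illustrative only: expected benchmark score under accuracy-only grading.
# A guess that is right with probability p earns 1 point; a wrong guess
# and an "I don't know" both earn 0.

def expected_score_guess(p_correct: float) -> float:
    """Expected points for guessing when the guess is right with probability p_correct."""
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0

def expected_score_abstain() -> float:
    """Points for answering 'I don't know' under accuracy-only grading."""
    return 0.0

for p in (0.1, 0.3, 0.5):
    print(f"p={p:.1f}  guess={expected_score_guess(p):.2f}  abstain={expected_score_abstain():.2f}")

# Even a 10% chance of being right beats abstaining, so the grader
# effectively trains the model to bluff.
```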
OpenAI says this is the key reason hallucinations still exist, even as models like GPT-5 have become much better at reasoning. The company argues that tests need to change. Instead of only rewarding accuracy, scoreboards should penalize confident wrong answers more than honest uncertainty. In other words, it should be better for a chatbot to admit it doesn't know something than to make up an answer.
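One hedged way to picture that change: suppose a wrong answer costs points instead of scoring zero. The specific penalty below is made up for illustration and is not OpenAI's actual metric, but it shows how abstaining becomes the better move whenever confidence is low.

```python
# Illustrative scoring rule, not OpenAI's actual benchmark change:
# right answer = +1, "I don't know" = 0, wrong answer = -penalty.

def expected_score(p_correct: float, penalty: float) -> float:
    """Expected points if the model guesses under the penalized rule."""
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

def should_abstain(p_correct: float, penalty: float) -> bool:
    """Abstaining (0 points) beats guessing when the expected score is negative,
    i.e. when p_correct < penalty / (1 + penalty)."""
    return expected_score(p_correct, penalty) < 0.0

penalty = 1.0  # wrong answers cost as much as right answers earn
for p in (0.2, 0.5, 0.8):
    print(f"p={p:.1f}  expected={expected_score(p, penalty):+.2f}  abstain={should_abstain(p, penalty)}")

# With this rule, a model that is only 20% sure is better off saying
# "I don't know" -- exactly the behavior the article says tests should reward.
```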
The issue also comes down to how these models are built. During training, they learn to predict the "next word" in billions of sentences. This works well for things like grammar and spelling, which follow clear patterns. But when it comes to rare facts, like someone's exact birthday, there are no obvious patterns, so errors are unavoidable.
Still, OpenAI insists hallucinations aren't an unsolvable glitch. A model can reduce errors by knowing its limits and abstaining when uncertain. Smaller models, in fact, sometimes do this better than bigger ones.
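That "know your limits" idea can be sketched as a simple confidence threshold layered on top of whatever answers a model produces. Everything here (the answer_with_abstention name, the toy confidence values, the threshold) is hypothetical; real systems estimate uncertainty in far more sophisticated ways.

```python
# Hypothetical sketch of abstention: if the model's best answer
# isn't confident enough, return "I don't know" instead of guessing.

def answer_with_abstention(candidates: dict[str, float], threshold: float = 0.75) -> str:
    """candidates maps possible answers to the model's confidence in each.
    Returns the top answer only if its confidence clears the threshold."""
    best_answer, confidence = max(candidates.items(), key=lambda item: item[1])
    if confidence < threshold:
        return "I don't know"
    return best_answer

# Toy example: three plausible-sounding dissertation titles, none well supported.
print(answer_with_abstention({"Title A": 0.40, "Title B": 0.35, "Title C": 0.25}))
# -> "I don't know"

# When the evidence is strong, the model still answers.
print(answer_with_abstention({"Paris": 0.95, "Lyon": 0.05}))
# -> "Paris"
```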
The bottom line: ChatGPT and other AI tools are getting more reliable, but they're not perfect yet. OpenAI's new research suggests the fix lies not just in smarter models but also in smarter ways of grading them -- rewarding honesty over confident mistakes.