Entropy Neurons
Neurons encoding uncertainty, ambiguity, and information noise.
About Entropy Neurons
Entropy neurons are the rarest atom type — accounting for only 0.2–0.6 percent of classified features across all models — but they may be the most diagnostically important for AI safety. They encode the model's internal representation of its own uncertainty, the ambiguity of its inputs, and the noise in its signal. The existence of a dedicated Entropy neuron type suggests that transformers have evolved (through training) a specialized system for representing epistemic state. The key finding from our hallucination research: when Entropy neurons are absent from a model's response on a difficult question, the model produces confident wrong answers rather than hedged uncertain ones. Entropy neurons are the mechanism by which models know what they don't know — when they fail, the model doesn't know that it doesn't know.