Have you ever noticed that we tend to to skip certain words and we don’t always notice when there are two of of them? (Did you find the two doubles in this sentence?)
This happens because we read probabilistically. We don’t look at each word, figure it out, then move to the next word. Our eyes make ballistic movements: they don’t run smoothly across the page, but rather move in jerks, called saccades, from position to position. We are not looking at a video feed of a page, but rather at a series of “tiny snapshots”.
Unconsciously, we are trying to minimize the number of saccades it takes to read something. So our brain is trying to guess where to aim the next saccade based on what we’ve read so far and the different alternatives we are considering for what’s next. The goal is to figure out what it says, which means to raise the probability of one of the possible words/phrases/sentences/discourses as much as possible.
In a lot of cases, we are not going to be aiming at every word, or even aiming so as to put every word into the small high-detail area at the center of our vision at all. If we can be pretty sure what a word is without looking at it, or without looking at all the letters, we can skip the saccade that we would have spent looking at it. The word “the” is very predictable (i.e., not very informative), very common, and very short. In fact, it’s the most common word in English. It’s also part of a very restricted syntactic class (think “part of speech”; “the” is a determiner), so there aren’t many alternatives and all of them are much less frequent. All of this adds up to “the” being the least informative word in just about all contexts. In a sense, you probably don’t actually read most instances of the word “the”.
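To make “not very informative” a little more concrete, here is a rough sketch in Python. It scores each word by its surprisal, the negative log (base 2) of how often it occurs, so frequent words get fewer bits. The function name `surprisal_bits` and the sample sentence are made up for illustration. A real estimate would need a large corpus and a model of the surrounding context, which this toy version leaves out.

```python
import math
from collections import Counter


def surprisal_bits(text):
    """Map each word to -log2 of its relative frequency in `text`.

    Assumption: frequency alone stands in for predictability here.
    Readers also use context, which this sketch does not model.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return {word: -math.log2(count / total) for word, count in counts.items()}


# Toy sample invented for this example; not a real corpus.
sample = "the cat sat on the mat and the dog sat by the door"
info = surprisal_bits(sample)

# List words from least to most informative.
for word in sorted(info, key=info.get):
    print(f"{word:5s} {info[word]:.2f} bits")
```

Even in this tiny sample, “the” comes out with the fewest bits, since it is the most frequent word. That matches the idea that it is the easiest word to skip.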