About the Inferential System I
A further examination of perception will tell us more about how the Inferential System works. This information is useful, but it doesn't have a lot of direct practical applications, because I am still discussing perception, which works about as well as it is going to work. So if you want to skip this essay (and the next) and go on to the more interesting stuff, that's fine. This essay will still be here when you want to read it.
The subtheme for this essay (and the next) is that there is good evidence for the Inferential System. Some people will naturally and easily believe in intuition and flow. Others will be more skeptical. Intuition and flow seem flaky, because they contradict our culture's beliefs about how the brain works. Our culture's beliefs are wrong. But skeptics need evidence, and without evidence, they will not believe. So the evidence in these essays is meant to bring the skeptics on board. In the last essay, the evidence led ineluctably to the conclusion that you have a powerful unconscious system for drawing conclusions. This essay brings out the nature of that system.
Automatic Operation
The Stroop task is simply to name the color of the ink each word is printed in. So if you see a word printed in blue, you should say "blue." In the Stroop effect, the words themselves are color names. So you might see the word 'green' printed in blue.
Of course, you are supposed to ignore what the word says, because that is irrelevant to the task. In practice, you cannot ignore the word. There is some chance that you will say "green" when you see the word 'green' printed in blue. Most of the time you will correctly say "blue", but it will take you longer because the word says 'green'.
The Stroop effect exists because perception occurs automatically, even if you don't consciously want to perceive something. For example, when you see the word 'green', your brain perceives "green", even though your consciousness knows that reading the word will mislead you. In terms of our model of perception, this means that the Inferential System takes in information and produces conclusions automatically. Conscious intention is not needed to start this process, and conscious intention cannot easily stop it.
You didn't really need the Stroop effect to come to that conclusion. You do not need to be consciously looking for money to see a dollar lying in the street, and you don't need to be intentionally listening to someone in order to hear that they are talking to you.
When people argue for the existence of the Inferential System (or something like it, since of course they don't use that term), they usually point to automatic perception. For example, Bankei, a 17th-century Japanese Zen Buddhist, said, "As you're all turned this way listening to me talk, if out back there's the cawing of crows, the chirping of sparrows or the rustling of the wind, even though you're not deliberately trying to hear each of these sounds, you recognize and distinguish each one. Nobody here can claim he heard these sounds because he'd made up his mind beforehand to listen for them when they were made." (Haskel, 1984, pp. 4-5)
Just Conclusions and the Word Superiority Effect
Suppose one group of subjects is given the task of distinguishing whether B or C is presented on a computer screen. Another group has to distinguish whether BALL or CALL is presented. Distinguishing BALL from CALL requires distinguishing B from C, so, logically, the two tasks are the same. Nonetheless, subjects can distinguish BALL and CALL slightly faster than they can distinguish B and C. In other words, people are better at perceiving words than they are at perceiving letters. This is called the word superiority effect.
Why? There is the boring possibility that the extra letters help locate the B and C on the computer screen. But this isn't the explanation, because distinguishing BRLL and CRLL is no quicker than distinguishing B and C. Instead, BALL and CALL are distinguished more quickly ONLY because they are meaningful words.
This finding is counterintuitive; that is, most people find it surprising. That's because it contradicts our culture's theory of how the brain works. Somewhere in perception, presumably, the subjects have to distinguish B and C before they can distinguish BALL and CALL. Because the letters are distinguished first, subjects should be able to report them first, or at least as quickly. Or so you might think.
But that logic assumes that any stage of perceptual processing can become conscious, which isn't true. The processing in the Inferential System is not conscious; only the conclusion of that processing is conscious. So, yes, within the Inferential System, B and C presumably have to be distinguished before the words can be distinguished. But that intermediate result is not available to consciousness, so it doesn't help the subjects respond.
Apparently, when it comes to producing conclusions, the Inferential System is better at producing meaningful words than individual letters. Perhaps it has had more practice, or perhaps meaning is just more important. In any case, conclusions about meaningful words, such as BALL and CALL, occur more quickly than conclusions about individual letters.
The ultimate experiment on this was done by Purcell and Stewart. (In my opinion, this is one of the best experiments in psychology.) A drawing of a face was presented on a computer screen. Sometimes the face was normal, and sometimes it was scrambled -- the parts of the face were in random locations. Just as people can discriminate BALL and CALL faster than B and C, people can discriminate normal faces faster than scrambled faces -- this is the object superiority effect, whereby meaningful objects can be perceived faster than parts of meaningful objects.
The subjects in Purcell and Stewart’s experiment did not have to distinguish faces. All they had to do was see the face. To be more precise, they had to say whether the face was on the right side or the left side of the computer screen, and answering that question requires only seeing the face. The catch was that the face was presented so briefly that it was hard to see.
Purcell and Stewart reasoned that if this “nonsense” about consciousness and unconsciousness were really true -- meaningful objects become conscious faster than meaningless objects -- then their subjects would be more likely to see normal faces than scrambled faces. To their surprise, Purcell and Stewart discovered that this nonsense was true: given presentations of equal duration, subjects were more likely to consciously experience a normal face than a scrambled face.
Anthony Marcel was the first person I know of to argue for an alien unconscious containing information that consciousness cannot understand. But in his model of perception, the results of every stage of processing are available to consciousness. So once letters are identified, they could become directly available to consciousness. His model therefore predicts, incorrectly, that letters should be perceived faster than words, so the model cannot be right. Instead, the Inferential System completes its processing, and only then does it present its conclusion to consciousness. The "preliminary" results, such as letters or parts of faces, become conscious if and only if they are part of the final conclusion.
In a funny way, this explains why drawing is difficult. If you could consciously experience a purely visual picture, untainted by any depth, meaning, or organization, then to draw what is in front of you, all you would have to do is replicate that picture on the piece of paper. In fact, your supposedly visual conscious experience is imbued with depth, meaning, and organization. It takes skill to actually perceive the purely visual aspects of a picture. For example, once I needed to draw an ellipse. I had seen ellipses and I thought I could visualize an ellipse, but when I tried to draw one on paper, it didn't look right. (That means my brain knew what an ellipse looked like well enough to tell me that I wasn't drawing one.) Then I had an insight -- if I took a coin and viewed it at an angle, the outline of the coin would be an ellipse. So I tried to copy on paper what I was "seeing" when I looked at a coin at an angle. That didn't help; it still didn't look like an ellipse. What I was consciously "seeing" was a round coin tilted away from me, not the flat ellipse that was actually in front of my eyes.
Similarly, when you listen to language, you don't hear raw sounds; you hear the basic sound categories of your language, which are called phonemes. So you hear things like the 't' sound and the 'd' sound. It is very difficult to hear the underlying sound. That makes it very difficult to learn a foreign language that sorts sounds differently from English. For example, in Hindi, there is a 't' phoneme, a 'd' phoneme (usually spelled dh), and a third phoneme right in between these two. You have to practice just to hear the difference between these phonemes -- your Inferential System naturally splits the sounds into the English 't' and the English 'd'. You think you are hearing the real sound, but you are not.
Similarly, if you listen to someone speaking a language you do not know, it will sound as if they are talking fast, with no space between the words. That impression is correct; in fact, there is rarely any real space between words when people speak. But the same is true of English. When you listen to English, you "hear" a space between words, but that is just your brain sorting the sound into words. Someone who did not know English would hear you as talking fast with no space between words.
Combining Sensory Systems
Suppose a subject is listening, via earphones, to a repetition of the sound “ba.” Meanwhile, the subject is looking at a video of someone repeatedly saying “ga”, in synchrony with the sound “ba.” The subject will hear the sound “da.” This is called the McGurk effect. "Da" is in between "ga" and "ba" in terms of where the constriction of your vocal tract occurs, and hence is in between the two in sound. So the subjects are "simply" averaging the two sounds.
Why does visual input influence what subjects hear? "Ba" is the only real sound going into the auditory system, so you would expect subjects to just hear "ba".
But you are not conscious of the output of your auditory system, at least not directly. Instead, it feeds its information into your Inferential System. Meanwhile, the visual system is also feeding information into your Inferential System. In the case of lip reading, that visual information is converted to auditory information. You can experience this if you turn off the sound on your TV and just watch the people talking. Occasionally, you will hear a word or phrase. It won’t sound like actual speech; it might sound more like when you talk to yourself. The point remains that you will be getting an auditory experience from a purely visual picture.
These two different pieces of auditory information are then combined in the Inferential System. The Inferential System can't produce two auditory images at once, so it assumes that there is one sound, averages the two inputs, and concludes "da".
The usual perceptual experience is, of course, that we DO know which sensory modality is producing our perceptions. For example, you will look at a cat, perceive a cat, think that your perception of the cat is based on vision, and be correct. If the cat meows, you know you are also using auditory information.
It is rare that we do not know which sensory modality is responsible for our conscious experience, but that situation can occur. Once my wife was about to say something, then she changed her mind and said nothing. To encourage her to say what she had wanted to say, I said “What?” Then she told me what she was going to say. This had happened several times before: I had known that she was going to speak, and then she didn't. So this was not unusual. However, in the past, I had used visual cues. Or at least I thought I was using visual cues -- I would see her mouth open, but then she wouldn't say anything. This time, as I realized soon after, my wife was in one room and I was in another, and I couldn't see her. That puzzled me for a while -- how did I know she was going to speak if she didn't speak and I couldn't see her?
I finally realized that she might have made sounds. For example, she might have made a sound in drawing a breath, or in closing her throat while exhaling. The point is, I did not know which sensory system had provided the information needed to conclude that she was going to speak but did not, even though I was quite confident in my conclusion. In fact, I assumed I was using visual information until I realized I didn't have any visual information.
More commonly, when we do not know which sensations produced a conclusion from the Inferential System, we call that conclusion an intuition. In my grilled-cheese example, the visual image of a cooked sandwich could have been produced by visual input, such as bubbles, or by olfactory input. At the time, I had no idea that any of my senses was providing cues.
Another time I had an intuition that my soup was boiling. Because I thought my soup had not been cooking long enough to boil, I ignored my intuition and returned to my work. I was interrupted again by the thought that my soup was boiling. I considered whether there could be any cues that my soup was boiling. That drew my attention to smell, and yes, I could smell my soup. And yes, when I finally checked, it was indeed boiling. If I had known that my first thought came from olfactory information, I would have checked my sense of smell immediately.
Conclusion
The Inferential System has its mysteries, but we can also know a lot about it. This essay discussed how it works automatically (it needs no prompting or direction from consciousness) and how it combines evidence from all of the sensory modalities. It also presented ample evidence that we are conscious of just the conclusions of the Inferential System and cannot introspect into its processing.