HAL's Legacy
 
Raymond Kurzweil Interview
Interviewer, Dr. David G. Stork
Stork: Tell us, what was it like the first time you saw 2001: A Space Odyssey?
Kurzweil: It was quite gripping because it was the first movie I’d seen that presented an intelligent machine. While it’s not my conception of the first intelligent machine, it was quite a compelling vision. It sparked imagination in what was feasible.

Stork: HAL could recognize people’s voices. Tell us about some of the problems in making machines understand speech.
Kurzweil: There’s a great deal of ambiguity in language. That’s something that Alan Turing recognized. Until machines reach human levels of intelligence, speech recognition is not going to operate at quite human levels. But there’s a lot we can do with current technology. As computers have become more powerful at an exponential rate, we’re actually making linear progress. And it is good enough now for millions of people to use.

Stork: So tell us about some of the actual steps and the problems you have to confront when you make those steps?
Kurzweil: Speech is a communication medium with many levels. We have to be concerned with the production process of speech, the resonant cavities of the vocal tract, what the tongue and the lips are capable of doing and the sounds that they make. Then there’s the phonetic level where we put these sounds together in the basic letters of speech, the basic phonemes. T or P sounds or different vowels. Then there’s dialect where different regions of even the same nation that may speak the same language will put these phonemes together in different sequences. And finally we have words that are strung together to create meaning, and then we create many levels of higher ambiguity. Understanding context really requires human knowledge.

Stork: So how is learning important to this process?
Kurzweil: Learning is really key to any pattern recognition problem. My field is pattern recognition. The human brain is devoted primarily to pattern recognition. We do that by learning from real-world situations. And we’ve learned that when we build computer systems that recognize patterns, whether it’s printed letters or speech sounds, learning from actual real-world examples is quite critical. All the commercial and research systems are constantly learning at many different levels from the environment. They will constantly be adapting to changes in background noise and sound levels and voice quality. They will actually learn how a specific speaker pronounces different words. A system will learn the common mistakes that it might make on my speech, so as I correct it, it will learn from those errors. It will learn how I as a user string words together and what kinds of word sequences I’m likely to use. It learns how to resolve ambiguities. If it makes a mistake and I correct it, it will be able to learn from that in the future. A typical speech recognition system will have a file of user-specific information that is constantly being adapted as the user uses the system. That’s critical to improving quality and accuracy. We also need to learn in general about the whole group of users. So we’ll model different dialects. We will build up large databases of speech and language patterns in each dialect, and so on.
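As a rough illustration of the user-specific adaptation described above, the following Python sketch keeps per-user counts of word sequences and uses them to choose between acoustically similar candidates. The class name `UserBigramModel` and its methods are hypothetical, and a real recognizer would combine this kind of evidence with acoustic scores rather than rely on it alone.

```python
from collections import defaultdict


class UserBigramModel:
    """Hypothetical per-user profile: counts of which word follows which."""

    def __init__(self):
        # counts[prev][cur] = how often this user said `cur` right after `prev`
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, words):
        """Update the profile from a confirmed or corrected transcript."""
        for prev, cur in zip(words, words[1:]):
            self.counts[prev][cur] += 1

    def choose(self, prev, candidates):
        """Pick the candidate this user most often says after `prev`.

        On a tie (for example, no data yet) the first candidate wins.
        """
        return max(candidates, key=lambda word: self.counts[prev][word])


model = UserBigramModel()
before = model.choose("over", ["their", "there"])  # no data yet: first candidate
model.learn("put it over there".split())           # the user corrects the system
after = model.choose("over", ["their", "there"])   # now prefers the learned word
```

The counts play the role of the "file of user-specific information": each correction changes what the system will guess the next time the same ambiguity comes up.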

Stork: Take us back to the state of the art of speech recognition in 1968. What was it like back then?
Kurzweil: Speech recognizers were capable of recognizing speech provided you… spoke… like… this…. They had very small vocabularies and could recognize maybe 100 words. They had to be trained on every single word. They weren’t really developing invariance properties of the word; they were matching sequences of sound that represented each word. The computers weren’t powerful at that time. Not continuous speech. In the 1970s we had the big Federal program funded by the Defense Department that began to do some pioneering research in large vocabulary systems, even systems that could recognize continuous speech. We allowed those systems to operate in non-real time, so someone would speak, it would be recorded, and then the computer could operate 100 or 1000 times slower than real time on the assumption that computers would get faster, which was a good assumption. A lot of pioneering techniques were developed then that we still use today.

Stork: Kubrick and Clarke envisioned flawless speech recognition in the year 2001. How far are we from that state?
Kurzweil: One of the plot concepts of the movie is that machine intelligence would be flawless. Which we recognize now is a flawed assumption. We won’t really reach human levels of speech recognition until machines are operating at a human level of intelligence. Ultimately language does embody our intelligence. We can reflect our entire intelligence in spoken language and written language. That was a concept that the film did actually deal with. We won’t reach those levels in my opinion for at least 30 years. I think we will reach them. I think Arthur Clarke was presenting a vision that wasn’t tied to a specific date. He was really presenting a vision of what things might be like and what some of the moral and ethical issues would be when this event occurred. In my mind we’re thirty years away from machines achieving human levels of intelligence. At which point machines will match and perhaps exceed human speech recognition capabilities, but still won’t be perfect.

Stork: Can you describe the Turing test?
Kurzweil: Alan Turing was describing a test where human beings could assess the potential intelligence level of a machine. A human judge interviews a computer and a human being with text messages, and he can’t see either of them. If the judge was unable to tell the difference, then the computer was deemed to have passed the test and to be operating at a human level of intelligence. In this case written language is sufficient to represent human intelligence.

Stork: You touched on some of the moral and ethical dilemmas that will occur when we have machines with human levels of intelligence. Why don’t you outline some of them?
Kurzweil: One ethical worry is whether these machines might be harmful to humans, whether they might go off on their own and create their own next generation. Right now humans, using computers as amplifiers of our own intelligence, create the next generation of machines. Humans still drive the process. But if you have computers operating at human levels, they may actually be thinking faster because our neurons are actually very slow. They’re millions of times slower than electronic circuits. If you have a brain that’s as dense as a human brain, and organized in a similar way but operating at electronic speeds, it would be thinking far faster than a human. If they had enough knowledge they could then design the next generation of computers that would be even more powerful.
The evolution of technology then would go on without human supervision. Will these machines be our allies, or servants, our friends? There’s concern about that. My own view is that the intelligence that we’re developing is a projection of our human civilization and even when machines are operating at human levels, they will continue to be an expression, an amplification of human civilization. But there are dangers from intelligent computers and robots that could conceivably be malevolent. We worry about that because we know that humans can be malevolent. There are also dangers from non-intelligent robots that are self-replicating and could become a non-biological plague of some kind.
These are dangers. Technology has always been a double-edged sword. I think in fact it is the major challenge confronting the human race in the 21st century. I tend to be optimistic. But I don’t think it is a set of dangers that we should just sweep under the rug. I think we have to deal with it through ethical standards, law enforcement and technological safeguards, and being mindful of how we apply technology. But technology is power. And even in this past century it has amplified both our creative and destructive impulses, and as it gets more and more powerful at an exponential rate, both creative and destructive sides will continue to amplify.

Stork: When HAL kills the crew, who is to blame?
Kurzweil: Part of the responsibility lies in the basic assumption that machine intelligence would be perfect. We are discovering that as machines are attacking more intelligent problems like recognizing human speech, or diagnosing illnesses, that these are areas that don’t necessarily have perfect answers, and perfect performance is inherently impossible. Machines, as they attack more intelligent problems are going to be imperfect. We have to accept that and build moral codes into machines as they’re dealing with more sophisticated types of processes.

Stork: In the design of AI and speech recognition systems, how much should we base on research of how humans solve these problems?
Kurzweil: We are beginning to learn some things about how the human brain works. That is useful information. In the mid-90s we learned about what the brain does in the early stages of processing sound information. We applied that in the front end of speech recognition software and it improved the accuracy. So as we learn more about how the brain works, it does give us clues. We also use what we call biologically inspired models of intelligence. We know a little bit about how neurons are connected together and we can build emulations of them. Simplified ones called neural nets. Those do intelligent things if set up correctly. It’s not an exact copy of the human brain but inspired by what little we know. As we get more powerful models of how neurons work and better interconnection information from brain scanning, our ability to create these biologically inspired models of intelligence will improve and ultimately that will guide us into developing intelligent software. That is at least one important source of information. Because we have an example of an intelligent entity in our midst, which is the human brain. And there is a lot that we can, and will, and are learning from it.

Stork: But in some problems, like computer chess, the successful systems approached the problem very unlike the way humans do.
Kurzweil: Sometimes we can solve a problem very differently than human beings. The airplane flies very differently than biological animals like birds – we solved it in a very different way. Chess being a very computational game, we could use the prodigious power of computers to pick out every combination of move and counter move (or at least billions of those combinations) to play a good game of chess. That’s something that humans can’t do. We actually use our pattern recognition capabilities.
Ultimately computers can do both. They can emulate our pattern recognition capabilities, which today they don’t do nearly as well as humans. And combine it with their advantage in the speed and accuracy of their memories. So ultimately computers can combine some of their natural advantages of machine intelligence – speed, accuracy and sharing ability of memory with the more human like qualities and our ability to recognize subtle patterns.
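The move and counter-move search described above can be sketched on a game far smaller than chess. The Python example below is an illustration only, not how any chess program mentioned here works: it assumes a toy game in which two players alternately take 1 to 3 stones from a pile and whoever takes the last stone wins. The function names `can_force_win` and `best_move` are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_force_win(stones):
    """True if the player about to move can force a win from this position."""
    if stones == 0:
        # The opponent just took the last stone, so the player to move has lost.
        return False
    # Try every legal move; a move is winning if it leaves the opponent
    # in a position from which they cannot force a win.
    return any(
        not can_force_win(stones - take)
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones):
    """Return a winning number of stones to take, or None if every move loses."""
    for take in (1, 2, 3):
        if take <= stones and not can_force_win(stones - take):
            return take
    return None
```

For example, `best_move(5)` returns 1 (leaving the opponent with 4, a losing pile), while `best_move(4)` returns `None`. The toy game is small enough to enumerate completely. Chess is not, which is why the answer above speaks of examining billions of combinations rather than all of them.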

Stork: Can you tell us about the problem of text to speech and the history of text to speech?
Kurzweil: We actually developed the first full text to speech system in 1975, and it requires a number of different levels of processing. First you have to figure out how each word is pronounced. There are some general rules but the rules make mistakes so you need an exception dictionary. You then need to do some analysis of the sentence and what the words are doing in the sentence so you can get a reasonable inflection pattern. Not at human levels of emphasis but at least something that doesn’t sound totally robotic. You then need to emulate the structure of the human vocal tract and actually model that in a computer and create the sound waves that the human vocal tract produces. Over time our models have become more and more accurate and synthetic speech has been sounding more and more natural, but not nearly at a human level.
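The first stage described above, general pronunciation rules backed by an exception dictionary, can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the rule tables, the phoneme labels (loosely ARPAbet-like), and the function name `pronounce` are hypothetical and far simpler than any real text to speech system.

```python
# Words whose pronunciation the general rules would get wrong.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
}

# Toy letter-to-sound rules: two-letter patterns are tried before single letters.
DIGRAPHS = {"sh": "SH", "ch": "CH", "th": "TH"}
LETTERS = {"a": "AE", "e": "EH", "i": "IH", "o": "AA", "u": "AH", "c": "K"}


def pronounce(word):
    """Return a list of phoneme labels for `word`."""
    word = word.lower()
    # 1. The exception dictionary overrides the rules.
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    # 2. Otherwise apply the general rules from left to right.
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.append(DIGRAPHS[pair])
            i += 2
        else:
            letter = word[i]
            # Letters without a rule fall back to their own uppercase label.
            phonemes.append(LETTERS.get(letter, letter.upper()))
            i += 1
    return phonemes
```

With these tables, `pronounce("ship")` gives `["SH", "IH", "P"]` from the rules alone, while `pronounce("one")` gives `["W", "AH", "N"]` from the exception dictionary. Without that entry the rules would produce `["AA", "N", "EH"]`, which is the kind of mistake the dictionary exists to catch.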
