Wham - A tale of speech recognition.


"Wham!" said the man as he entered the hotel lift. Wham? I paused a millisecond or two to ponder his meaning. Wham? There was a popular beat combo of that name; could this be their number one fan? No, the man standing in front of me was too-old-to-disco, too-young-to-cha-cha-cha. Not the right age group at all. My neural datapack tried another route. I was in Spain at the time; perhaps he was a foreigner. I have worked on several translation projects for operating systems and applications over the years and know a little of ten languages. A quick scan through my vocabulary produced no match. My brain was now hurting, trying desperately to make sense of a senseless situation. I took another look at the man. He was pale-skinned with brown hair, certainly not from a Latin country, but not blond enough to be Nordic. He was wearing a singlet vest, a pair of shorts that finished at the knee and a pair of sandals. A-Ha! This confirmed it: he was British. It was all coming together now. He was from the north (I really can’t say where) and he was referring to the weather. He said ‘Warm’ but I heard ‘Wham’. All this analysis happened in the blink of an eye. The human brain, with its 40-million-year development cycle (and still in beta), had taken into account my 40-odd years of life experience, what I could see, what I could hear and where I was, and resolved a meaning that my ears alone could not have found.

Speech recognition systems have been around for a few years, but there has been a flurry of activity recently as the technology improves enough to make the dream a reality. Speech recognition, however, is only the first small step on the road to computer understanding. Working out what the person using the words actually means is the task that must be completed successfully before truly intelligent machines can evolve.

Take my friend in the Benidorm lift: he said ‘warm’ but what he meant was ‘it’s bloody hot’, typical British understatement. The fiery Latin types, on the other hand, can make the smallest annoyance into a drama. If we are to have intelligent machines listening to us and acting upon what they hear, then computer understanding is critical. If the Enterprise computer hears someone say "I cannae give her any more, Cap'n, she's gonna blow", it needs to know just who is saying it, and something of that person's temperament, before it can work out whether A) the photon tubes are just running a bit hot, or B) they are in total meltdown.

Readers interested in speech recognition should read the following InfoHighway news briefs.

4Home Productions, a subsidiary of Computer Associates, announced a deal with IBM aimed at adding speech recognition to consumer software developed with 4Home's Meta4 technology (Story 6724).

Philips has teamed up with Technology for Business (TFB) to offer the industry's first "continuous stream" voice dictation system. Continuous-stream speech-recognition systems allow the user to talk normally without leaving a small silence--between--each--word. TFB have used a software toolkit developed by Philips to produce a dictation system for solicitors (Story 7736).

More recently, Compaq has announced it will work with PureSpeech Inc. to develop speaker-independent, continuous speech recognition for personal computers. Speaker-independent systems will let people use the technology without first having to train the computer to recognise their voice. This technology is not yet available; Eckhard Pfeiffer, Compaq's president and CEO, said that it will be implemented in Compaq systems "over the next few years." (Story 8247).

As with any new technology, litigation has reared its ugly head in the speech recognition arena; details are in Stories 6554 and 8094.

Ken Clark - 10th July 1995