Denny (Xiao Wang)_11200007
Report on Automatic Speech Recognition Technology
An important technology in the field of sound
Automatic Speech Recognition (ASR) is an interdisciplinary technology that spans acoustics, information technology (IT) and other fields. In the last two decades, ASR has made significant progress and has moved from the laboratory to the market. Scientists expect that in the next 10 years ASR will enter industry, household appliances, interactive media, communications, automotive electronics, medicine, family services, consumer electronics and other fields. Many experts regard ASR as one of the ten key information technology developments of 2000-2010.
ASR Related Theory
ASR rests on three basic principles. First, the linguistic information in a speech signal is encoded in the way its short-term amplitude spectrum changes over time. Second, speech can be read: the acoustic signal can be represented by a few dozen distinct discrete symbols. Third, voice interaction is a cognitive process, so it cannot be separated from the syntactic, semantic and pragmatic structure of language. The basic operation of the ASR process is described by Grabianowski (2011).
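To make the first principle concrete, here is a minimal Python sketch of a short-term amplitude spectrum. It is an illustration only, not the front end of any real recognizer: the function names, the 8 kHz sample rate, the 10 ms frames and the synthetic two-tone signal are all assumptions chosen for the example, and a naive DFT is used so that only the standard library is needed.

```python
import cmath
import math

def short_time_spectrum(signal, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one frame at bin k (rectangular window).
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra

def dominant_frequencies(signal, sample_rate, frame_len, hop):
    """Return the strongest frequency (Hz) in each frame."""
    bin_hz = sample_rate / frame_len
    spectra = short_time_spectrum(signal, frame_len, hop)
    return [max(range(len(m)), key=m.__getitem__) * bin_hz for m in spectra]

def tone(hz, n_samples, sample_rate):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * hz * i / sample_rate)
            for i in range(n_samples)]

sample_rate = 8000  # assumed sample rate
# 50 ms of a 200 Hz tone followed by 50 ms of a 400 Hz tone.
signal = tone(200, 400, sample_rate) + tone(400, 400, sample_rate)
peaks = dominant_frequencies(signal, sample_rate, frame_len=80, hop=80)
print(peaks)  # five frames at 200.0 Hz, then five frames at 400.0 Hz
```

The list of per-frame peaks is the kind of "pattern of change over time" the first principle refers to: the content of the signal shows up as a change in the spectrum from one short frame to the next.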
The idea of inventing automatic speech recognition was on the agenda even before the invention of computer technology; the earliest version of "VODER" can be regarded as a prototype of ASR (Dudley, 1939). The "Radio Rex" toy dog, produced in the 1920s, may have been the first voice recognition device: when the dog was called, it would pop out from its base. The earliest computer-based speech recognition system was Audrey, developed at AT&T Bell Labs, which could recognize the ten digits spoken in English. Its recognition method was to track the resonance peaks (formants) of the speech, and the system reached ninety-eight percent accuracy. In the late 1950s, Denes of University College London added grammar probabilities to speech recognition. Then, in the 1960s, artificial neural networks were introduced to speech recognition. The two major breakthroughs of this era were Linear Predictive Coding (LPC) and Dynamic Time Warping. The most significant breakthrough in ASR was the application of the Hidden Markov Model (Ferguson, 1980). Baum developed the relevant mathematical reasoning, Rabiner and others studied it further, and Kai-Fu Lee of Carnegie Mellon University finally created Sphinx, the first large-vocabulary speech recognition system based on hidden Markov models (Lee, 1989).
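Dynamic time warping, named above as one of the early breakthroughs, can be sketched in a few lines of Python. This is a textbook form of the algorithm under simple assumptions, not the method of any particular historical system: the sequences are plain lists of numbers standing in for per-frame features, the local cost is the absolute difference, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

# Illustrative data: the same "word" spoken at two speeds, and a different one.
template = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
other = [3, 3, 1, 1, 3]

print(dtw_distance(template, slow))   # 0.0: only the timing differs
print(dtw_distance(template, other) > 0)
```

The point of the technique is visible in the first result: a template and a slower utterance of the same pattern align at zero cost, so differences in speaking rate do not count as differences in content.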
The development of ASR can be seen in the accompanying graph. After nearly five decades of research, speech recognition technologies have finally entered the marketplace, benefiting users in a variety of ways. Throughout the development of such systems, knowledge of speech production and perception was used to establish the technological foundation for the resulting speech recognizers. All of these technologies came about because of significant research contributions from academia, private industry and government. As the technology continues to mature, it is clear that many new applications will emerge and become part of our way of life, thereby taking full advantage of machines that are partially able to mimic human speech capabilities.
(Juang and Rabiner, 2004)
ASR is developing in two directions. The first is large-vocabulary continuous speech recognition, used mainly in dictation machines on computers and in voice information services that combine the internet with the telephone network; these systems are implemented on computer platforms. The second is small, portable voice products for applications such as voice dialing on wireless phones, voice control of automotive equipment, smart toys and remote-controlled home appliances. Most of these applications are implemented in specialized hardware. In recent years, the rapid development of Application-Specific Integrated Circuits (ASICs) for speech signal processing and of voice recognition Systems on Chip (SoCs) has created extremely favorable conditions for their wide adoption.
Speech recognition technology has an important influence on the development of modern society. It makes people's lives simpler and faster. The trend in human development is to make life ever more convenient, and voice recognition technology meets exactly that demand. Ordinary people may not understand the principles behind the technology, but its applications are all around them, for example in voice recording systems, interactive media and voice control.
The most basic application of speech recognition is the voice-entry system. A person speaks to a computer or other device, and the speech works like typing: the voice signal is recognized, converted into a digital signal and finally displayed as text. It can be used on devices such as computers and mobile phones. In the past, large data-entry jobs cost a great deal of time and labour spent typing non-stop on a keyboard; with voice input, data-entry efficiency can be increased by 10 to 12 times. Moreover, recognition accuracy is gradually improving. In the near future, voice input could largely replace the traditional mouse and keyboard.
Traditional media such as television, newspapers and radio merely present a message to viewers or listeners; there is little interaction between the media and the audience. As society develops rapidly, interactive media have emerged to meet people's wish to participate in and respond to the information that publishers provide. Voice recognition offers one answer to this demand: with interactive features, such as those of Web 2.0 sites, the audience can interact with information providers from their own computers, for example by filling in a questionnaire, and some of these sites even offer voice control.
Ten years ago, science fiction described a scenario like this: a modern house has no power switches; the owner speaks a few instructions, and the furniture and electrical appliances automatically respond. At that time we believed this could not happen, but these fictional scenarios are becoming a reality in our daily lives, which is a great convenience to individuals. Within a number of years, each family may have an electronic butler that people can talk to and tell what they need, and this electronic housekeeper will act as soon as it receives a voice command. The technology could reach the market very soon.
Speech recognition can also be used for entertainment. A recent and very popular iPhone game applies this concept and is played by voice. It is popular in the App Store because it is easy to control through the built-in voice control function: saying “Ahhh” makes the aircraft take off and saying “Pah” fires shells, which is indeed very entertaining.
The Future of ASR
It can be predicted that in the next 10 years speech recognition systems will be applied more extensively in our daily life. A variety of voice recognition systems will appear on the market, and I would expect recognition technology to be adapted to suit these different systems. In the short term, however, I do not expect speech recognition to become as seamless as talking with a real person, because building such a system is still a huge challenge for scientists. I do expect gradual improvements in accuracy and a higher recognition rate across accents, tones, intonation and other speech characteristics.
Today's speech recognition technology, especially in small- and medium-vocabulary speaker-independent systems, achieves more than 98% recognition accuracy, and speaker-dependent systems achieve even higher accuracy. These technologies generally meet application requirements. With large-scale integrated circuit technology, these complex speech recognition systems can be implemented entirely on dedicated chips and mass-produced. Some mobile phones and electrical appliances already come with built-in speech recognition, and products such as voice notepads and smart voice toys may integrate voice recognition and voice synthesis in the near future. People can call a spoken dialogue system to ask about tickets, travel or banking information and get good results. Speech recognition is an interdisciplinary field that is gradually becoming a key technology in the man-machine interface; it also makes it possible for people to do without the keyboard and operate machines by voice command. Speech technology has become a competitive new high-tech field that will have a large impact on our daily life.
1. Dudley, H. 1939. The Vocoder, Bell Labs Record, Vol. 17, pp. 122-126.
2. Ferguson, J.D. 1980. Hidden Markov Analysis: An Introduction, in Hidden Markov Models for Speech, Institute for Defense Analyses, Princeton, NJ.
3. Grabianowski, Ed. 2011. How Speech Recognition Works, HowStuffWorks.
4. Juang, B.H. and Rabiner, Lawrence R. 2004. Automatic Speech Recognition – A Brief History of the Technology Development, Rutgers University and the University of California, Santa Barbara.
5. Lee, Kai-Fu. 1989. Automatic Speech Recognition: The Development of the SPHINX System, Springer.
6. Web interactive audio / video agent with voice recognition.
7. Pah! For iPhone – Voice Controlled Action Shooter.
8. Smart-Home Voice Recognition and Control with Virtual Human.