1) an introduction to speech on Android devices and components of a voice-based personal assistant;
2) text-to-speech synthesis using Google TTS;
3) getting and processing speech recognition results;
4) creating and using libraries for TTS and speech recognition;
5) developing voice-based apps such as a simple conversation assistant;
6) future trends.
Virtual Agents will be one of the “hot topics” this year at SpeechTEK. Here are the sessions, including one by the authors of our book:
The fourth annual Mobile Voice Conference takes place 3–5 March 2014 at the Hyatt Fisherman’s Wharf hotel, San Francisco.
The Mobile Voice Conference examines the practical, business, and technical implications of voice and multimodal interfaces for mobile phones and other mobile devices.
The conference is co-organized by Applied Voice Input Output Society (AVIOS) and Bill Meisel, President, TMA Associates, an industry consulting firm, and editor of SpeechStrategy News, the industry’s professional newsletter.
Michael McTear will be attending the conference to present a paper by Dr Richard Wallace of Pandorabots entitled “AIML 2.0 – Virtual assistant technology for a mobile era”. AIML and the Pandorabots web service were featured in chapter 8 of our book: Dialogs with Virtual Personal Assistants.
Visit the Mobile Voice Conference web page to view the program and to find out about recent trends in mobile voice applications.
If you are interested in a guided tour through the last 40 years of speech recognition, Reddy, Baker, and Huang show you the advances in the field from their rich perspective and extensive experience.
Raj Reddy is winner of the Turing Award and founding Director of the Robotics Institute at Carnegie Mellon University, James Baker is founder of Dragon Systems, and Xuedong Huang is founder of the Speech Group at Microsoft.
To learn all the details, you can read their survey in the latest issue of the Communications of the ACM: http://cacm.acm.org/magazines/2014/1/170863-a-historical-perspective-of-speech-recognition.
Language models (LMs) capture the most probable word sequences in your app. Ideally, a good LM assigns high probability to correct phrases and low probability to incorrect ones.
This way, if the acoustic model of your speech recognizer assigns similar probabilities to two phrases that sound pretty much the same, for example:
Speech technology rules
Speech enology rules
the LM can help to select one or the other.
N-grams basically compute the probability of each word conditioned on the N−1 previous words. That is, in a 2-gram (bigram) model, where each word depends only on the word immediately before it, it would compute:
P(speech technology rules) = P(speech) * P(technology | speech) * P(rules | technology)
P(speech enology rules) = P(speech) * P(enology | speech) * P(rules | enology)
In this case P(speech technology rules) > P(speech enology rules), since with a good LM, P(technology | speech) will be much higher than P(enology | speech).
How is it possible to calculate all these probabilities? By counting word sequences in a large collection of phrases: a linguistic corpus.
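To make the counting idea concrete, here is a minimal sketch in Python that estimates bigram probabilities from a tiny toy corpus (the three-sentence corpus and the function name `p_bigram` are our own illustrations, not from any particular toolkit; a real LM would use millions of sentences and smoothing for unseen word pairs):

```python
from collections import Counter

# Toy corpus standing in for the large text collections discussed above.
corpus = [
    "speech technology rules",
    "speech technology is fun",
    "speech recognition works",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)                 # count individual words
    bigrams.update(zip(words, words[1:]))  # count adjacent word pairs

def p_bigram(word, prev):
    """Maximum-likelihood estimate: P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

# "technology" follows "speech" in 2 of the 3 occurrences of "speech":
print(p_bigram("technology", "speech"))  # 2/3
print(p_bigram("enology", "speech"))     # 0.0 (never seen in the corpus)
```

With these counts the LM prefers “speech technology rules” over “speech enology rules”, exactly as in the example above.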
Building your own LM
In the book we use the speech recognizer provided by Google. Google trains on an enormous amount of text (imagine all the information they gather just from web searches). To build your own LM you would also need a large amount of data. Fortunately, you can obtain a lot of free text from ebooks and newspapers that are available on the Internet.
An interesting initiative has also appeared recently: the 1 billion word language modeling benchmark, which is available to you! Check it out here: https://code.google.com/p/1-billion-word-language-modeling-benchmark/
Integrating your LM in a speech recognizer
To build a speech recognizer for your apps that uses your brand-new LM, check out PocketSphinx, a great open-source tool for developers.
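CMU Sphinx tools load n-gram LMs in the standard ARPA text format, so it helps to know what such a file looks like. Below is a hedged sketch that writes a toy ARPA file from hand-picked counts (the counts are made up for illustration, probabilities are unsmoothed maximum-likelihood estimates, and backoff weights are omitted, all of which a real LM toolkit such as the CMU SLM toolkit or SRILM would handle for you):

```python
import math

# Hypothetical toy counts; in practice a toolkit derives these from your corpus.
unigrams = {"<s>": 3, "speech": 3, "technology": 2, "rules": 1, "</s>": 3}
bigrams = {("speech", "technology"): 2, ("technology", "rules"): 1}

total = sum(unigrams.values())

def log10_prob(count, denom):
    """ARPA files store base-10 log probabilities."""
    return round(math.log10(count / denom), 4)

lines = ["\\data\\", f"ngram 1={len(unigrams)}", f"ngram 2={len(bigrams)}", ""]
lines.append("\\1-grams:")
for word, c in unigrams.items():
    lines.append(f"{log10_prob(c, total)}\t{word}")
lines += ["", "\\2-grams:"]
for (w1, w2), c in bigrams.items():
    lines.append(f"{log10_prob(c, unigrams[w1])}\t{w1} {w2}")
lines += ["", "\\end\\"]

arpa = "\n".join(lines)
print(arpa)
```

The resulting text, saved to a file, is the kind of model a Sphinx-family decoder can be pointed at; for production use, let a toolkit generate it with proper smoothing and backoff.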
Find out more
If you want to learn more about the statistical foundations of language modelling, you will find this book very interesting:
Are you interested in adding speech to your Android apps? Do you wish to explore new ways of interacting with your Android devices?
You are in luck! Our book is already out to show you how to use different speech and language processing technologies to build impressive apps. You can get it now from:
And if you can’t wait to start playing, you can already access the code here: https://github.com/zoraidacallejas/Sandra