SpeechTEK 2014 Tutorial: Develop Voice Applications for Android

STKU-3 – Develop Voice Applications for Android
9:00 a.m. – 12:00 p.m.
Michael McTear, Professor – University of Ulster
Learn how to develop voice-based apps for Android devices using open source software. Through a series of step-by-step examples, you learn how to create apps that talk and recognize your speech. Looking first at the basic technologies of text-to-speech and speech recognition, this workshop shows you how to create more complex apps that can perform useful tasks. By the end of the workshop, you will know how to develop a simple voice-based personal assistant that you can customize to suit your own needs.
This workshop covers
1) an introduction to speech on Android devices and components of a voice-based personal assistant;
2) text-to-speech synthesis using Google TTS;
3) getting and processing speech recognition results;
4) creating and using libraries for TTS and speech recognition;
5) developing voice-based apps such as a simple conversation assistant;
6) future trends.
Attendees receive a copy of Voice Application Development for Android by Michael F. McTear and Zoraida Callejas.

Virtual Agents At SpeechTEK 2014

Virtual Agents will be one of the “hot topics” this year at SpeechTEK. Here are the sessions, including one by the authors of our book:

Mobile Voice Conference Summary

Mobile Voice 2014 was a great success. There were many interesting papers and in particular there was a lot of focus on personal assistants.

Here is a link to a summary by the conference program organizer, Bill Meisel.

Mobile Voice Conference

The fourth annual Mobile Voice Conference takes place from 3 – 5 March, 2014 at the Hyatt Fisherman’s Wharf hotel, San Francisco.

The Mobile Voice Conference examines the practical, business, and technical implications of voice and multimodal interfaces for mobile phones and other mobile devices.

The conference is co-organized by Applied Voice Input Output Society (AVIOS) and Bill Meisel, President, TMA Associates, an industry consulting firm, and editor of SpeechStrategy News, the industry’s professional newsletter.

Michael McTear will be attending the conference to present a paper by Dr Richard Wallace of Pandorabots entitled “AIML 2.0 – Virtual assistant technology for a mobile era”. AIML and the Pandorabots web service are featured in chapter 8 of our book, Dialogs with Virtual Personal Assistants.

Visit the Mobile Voice Conference web page to view the program and to find out about recent trends in mobile voice applications.

To the 70s and back: a journey through 40 years of speech recognition

If you are interested in a guided tour through the last 40 years of speech recognition, Reddy, Baker, and Huang are willing to show you the advances in the field from their rich perspective and extensive experience.

Raj Reddy is a Turing Award winner and founding director of the Robotics Institute at Carnegie Mellon University; James Baker is the founder of Dragon Systems; and Xuedong Huang is the founder of the Speech Group at Microsoft.

Learn more

To learn all the details, you can read their survey in the latest issue of Communications of the ACM: http://cacm.acm.org/magazines/2014/1/170863-a-historical-perspective-of-speech-recognition.

N-Gram language models

A language model (LM) defines the most probable word sequences in your app. Ideally, a good LM assigns high probability to correct phrases and low probability to incorrect ones.

This way, if the acoustic model of your speech recognizer assigns similar probabilities to two phrases that sound pretty much the same, for example:

Speech technology rules

Speech enology rules

the LM can help to select one or the other.

An N-gram model basically computes the probability of each word given the N−1 previous words. That is, a 2-gram (bigram) model would compute:

P(speech technology rules) = P(speech) * P(technology | speech) * P(rules | technology)

P(speech enology rules) = P(speech) * P(enology | speech) * P(rules | enology)

In this case P(speech technology rules) > P(speech enology rules) because, if we have a good LM, P(technology | speech) will be much higher than P(enology | speech).

How is it possible to calculate all these probabilities? By counting word occurrences in a large collection of phrases: a linguistic corpus.
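To make the idea concrete, here is a minimal sketch in Python of estimating bigram probabilities by counting. The toy corpus is invented purely for the example; a real recognizer would be trained on vastly more data:

```python
from collections import Counter

# Toy corpus standing in for a real one, which would contain millions of phrases.
corpus = [
    "speech technology rules",
    "speech technology is fun",
    "speech recognition rules",
]

# Count unigrams and bigrams, padding each phrase with a start symbol <s>.
unigrams, bigrams = Counter(), Counter()
for phrase in corpus:
    words = ["<s>"] + phrase.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p(word, prev):
    """Maximum-likelihood estimate: P(word | prev) = count(prev word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def phrase_prob(phrase):
    """P(w1 ... wn) = P(w1 | <s>) * P(w2 | w1) * ... * P(wn | wn-1)."""
    words = ["<s>"] + phrase.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= p(word, prev)
    return prob

print(phrase_prob("speech technology rules"))  # 1 * 2/3 * 1/2 ≈ 0.333
print(phrase_prob("speech enology rules"))     # 0.0: "enology" never follows "speech"
```

Note that real systems multiply many small numbers, so in practice they sum log probabilities instead to avoid numerical underflow.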

Building your own LM

In the book we use the speech recognizer provided by Google, whose LM is trained on an enormous number of phrases (imagine all the information they gather just from web searches). To build your own LM you would also need a large amount of data. Fortunately, you can obtain a lot of free text from ebooks and newspapers that are available on the Internet.
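One consequence of needing so much data is sparsity: even with a lot of text, many perfectly valid word pairs will never appear in your corpus and would get probability zero under plain counting. Smoothing techniques redistribute some probability mass to unseen N-grams. As a self-contained sketch (toy corpus and vocabulary invented for the example), the simplest scheme, add-one (Laplace) smoothing, looks like this:

```python
from collections import Counter

# Invented toy training corpus; real LMs are trained on far more text.
corpus = ["speech technology rules", "speech recognition rules"]

unigrams, bigrams = Counter(), Counter()
for phrase in corpus:
    words = ["<s>"] + phrase.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

# Vocabulary: every word we might score, including ones unseen in training.
vocab = set(unigrams) | {"enology"}
V = len(vocab)

def p_laplace(word, prev):
    """Add-one smoothing: pretend every possible bigram was seen once more."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(p_laplace("technology", "speech"))  # seen bigram: (1+1)/(2+6) = 0.25
print(p_laplace("enology", "speech"))     # unseen bigram: (0+1)/(2+6) = 0.125
```

Unseen bigrams now receive a small but non-zero probability, at the cost of slightly discounting the ones actually observed.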

An interesting initiative has also appeared recently: the 1 billion word language modeling benchmark, which is freely available. Check it out here: https://code.google.com/p/1-billion-word-language-modeling-benchmark/

Integrating your LM in a speech recognizer

To build a speech recognizer for your apps that uses your brand-new LM, check out PocketSphinx, a great open source tool for developers.
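For orientation, toolkits in the Sphinx ecosystem typically exchange statistical LMs as text files in the ARPA format: each line holds a log10 probability, the N-gram itself, and (for lower-order entries) a back-off weight. The fragment below is schematic, with made-up numbers, purely to show the shape of such a file:

```
\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.4771 speech     -0.3010
-0.7782 technology -0.3010
-0.7782 rules      -0.3010

\2-grams:
-0.1761 speech technology
-0.3010 technology rules

\end\
```

A file like this (or a binary version of it) is what you would point PocketSphinx at as its language model.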

Find out more

If you want to learn more about the statistical foundations of language modelling, you will find this book very interesting:

Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze.

The book is out!

Are you interested in adding speech to your Android apps? Do you wish to explore new ways of interacting with your Android devices?

You’re in luck! Our book is now out to show you how to use different speech and language processing technologies to build impressive apps. You can get it now from:

And if you can’t wait to start playing, you can already access the code here: https://github.com/zoraidacallejas/Sandra

Have fun!