Simple Voice Interactions
Available on GitHub
A closer look at similarity
Try out the ComparisonTest project to see which orthographic and phonetic similarity values are obtained for any sample words you can think of.
Keep your distance
There are several alternatives to the distance measures employed in the chapter, for example Euclidean distance, the Jaccard index, Hamming distance, the Sørensen similarity index, or the Metaphone phonetic algorithm. Some of them are also implemented in Apache Commons. Investigate!
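As a starting point for that investigation, here is a minimal sketch of three of the measures listed above, written in plain Java without any library. The class name DistanceSketch and the choice of character bigrams as the sets to compare are assumptions made for illustration; they are not taken from the chapter's code or from Apache Commons.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class DistanceSketch {

    // Hamming distance: number of positions at which two
    // equal-length strings differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Strings must have equal length");
        }
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }

    // Set of overlapping two-character substrings (assumed tokenization).
    static Set<String> bigrams(String s) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.substring(i, i + 2));
        }
        return result;
    }

    // Jaccard index: size of the intersection divided by size of the union.
    static double jaccard(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0; // two strings without bigrams are treated as identical
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) common.size() / union.size();
    }

    // Sørensen index: twice the intersection divided by the sum of set sizes.
    static double sorensen(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return 2.0 * common.size() / (x.size() + y.size());
    }

    public static void main(String[] args) {
        System.out.println(hamming("karolin", "kathrin")); // 3
        System.out.println(jaccard("night", "nacht"));     // 1 shared bigram out of 7
        System.out.println(sorensen("night", "nacht"));    // 0.25
    }
}
```

Note that Hamming distance is only defined for strings of equal length, which limits its usefulness for comparing a recognized word with app names of arbitrary length.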
A better method for normalization
The normalization method presented in this chapter only removes spaces and converts the text to lowercase. A more sophisticated normalization could cope with situations in which the user says just one word of a two-word name (e.g. ‘kindle’ instead of ‘kindle reader’). You can use some of the methods of the Java String class, such as contains, in combination with the similarity criteria.
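One possible way to combine contains with the normalization described above is sketched below. The class NameMatcherSketch, its method names and the MIN_LENGTH threshold are hypothetical; the threshold is only there so that a very short input such as ‘in’ does not match every name that happens to contain those letters.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class NameMatcherSketch {

    // Assumed guard against trivially short inputs.
    static final int MIN_LENGTH = 3;

    // Same normalization as described in the text:
    // remove whitespace and change to lowercase.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    // True if the normalized spoken input appears inside the normalized name,
    // e.g. "kindle" inside "kindlereader".
    static boolean partialMatch(String spoken, String appName) {
        String s = normalize(spoken);
        String n = normalize(appName);
        return s.length() >= MIN_LENGTH && n.contains(s);
    }

    public static void main(String[] args) {
        System.out.println(partialMatch("Kindle", "Kindle Reader")); // true
        System.out.println(partialMatch("maps", "Kindle Reader"));   // false
    }
}
```

A partial match like this could be treated as a strong candidate, with the similarity criteria used as a fallback when no name contains the spoken input.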
The importance of being Levenshtein
In VoiceLaunch we presented an interface in which the user selects the similarity criterion used to decide which app to launch. In a real setting, however, it would be better to let the app decide automatically how to use the similarity criteria. For example, it could compute all the distance measures and assign each a different weight in order to select the most similar word.
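The weighting idea could be sketched as follows. This is not the VoiceLaunch implementation: the class WeightedSimilaritySketch and its methods are hypothetical, the orthographic score is a normalized Levenshtein distance, and the second score is only a crude stand-in for a phonetic measure (it compares the words after removing vowels), used here to keep the example free of external libraries.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class WeightedSimilaritySketch {

    // Classic Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Orthographic similarity in [0, 1]: 1 means identical strings.
    static double orthographic(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Crude stand-in for a phonetic measure (an assumption, not Soundex):
    // compare the consonant skeletons of both words.
    static double phoneticLike(String a, String b) {
        return orthographic(a.replaceAll("[aeiou]", ""), b.replaceAll("[aeiou]", ""));
    }

    // Weighted sum of both scores; the weights are expected to add up to 1.
    static double combined(String a, String b, double wOrtho, double wPhon) {
        return wOrtho * orthographic(a, b) + wPhon * phoneticLike(a, b);
    }

    // Returns the candidate with the highest combined score.
    static String bestMatch(String spoken, List<String> candidates,
                            double wOrtho, double wPhon) {
        String best = null;
        double bestScore = -1.0;
        for (String c : candidates) {
            double score = combined(spoken, c, wOrtho, wPhon);
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> apps = Arrays.asList("kindle", "calendar", "camera");
        System.out.println(bestMatch("kindel", apps, 0.5, 0.5)); // kindle
    }
}
```

The weights themselves are a design decision; equal weights are used above only as a placeholder, and suitable values would have to be found by testing with real recognition results.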
Saying yes or no
A more advanced confirmation dialogue would allow the user to change parts of their query rather than simply saying ‘yes’ or ‘no’. The example in the chapter could result in a round of confirmation dialogues in which the app continually misrecognizes the user’s query and no progress is made. A more satisfactory approach would be to correct only the part of the recognized input that is wrong rather than asking the user to repeat the whole query. For example:
App: What is your query?
User: What is the capital of France?
App: Did you say: what is the capital of France?
User: No, not France, Sweden
App: You want to know: what is the capital of Sweden?
User: Yes
App: (launches query: What is the capital of Sweden)
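A very simple way to handle a correction such as the one in this dialogue is sketched below. It assumes that the user's reply follows the pattern ‘no not X Y’, where X occurs in the original query and Y is its replacement; the class CorrectionSketch and the method applyCorrection are hypothetical names. Because recognition results often arrive without punctuation, the sketch does not rely on commas to separate X from Y.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class CorrectionSketch {

    // Returns the corrected query, or null if the reply is not a correction
    // of the assumed form "no not <wrong part> <right part>".
    static String applyCorrection(String query, String reply) {
        // Drop common punctuation and split the reply into words.
        String[] words = reply.replaceAll("[,.!?;:]", " ").trim().split("\\s+");
        if (words.length < 4
                || !words[0].equalsIgnoreCase("no")
                || !words[1].equalsIgnoreCase("not")) {
            return null;
        }
        // Try the longest possible "wrong part" first and accept the first
        // one that actually occurs in the original query.
        for (int split = words.length - 1; split > 2; split--) {
            String wrong = String.join(" ", Arrays.copyOfRange(words, 2, split));
            String right = String.join(" ", Arrays.copyOfRange(words, split, words.length));
            Matcher m = Pattern
                    .compile("\\b" + Pattern.quote(wrong) + "\\b", Pattern.CASE_INSENSITIVE)
                    .matcher(query);
            if (m.find()) {
                return m.replaceFirst(Matcher.quoteReplacement(right));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String query = "What is the capital of France?";
        // Prints: What is the capital of Sweden?
        System.out.println(applyCorrection(query, "No, not France, Sweden"));
    }
}
```

When applyCorrection returns null, the app could fall back to the plain yes/no confirmation used in the chapter. A real dialogue would need to handle many more ways of phrasing a correction than this single pattern.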
W. Meisel (ed.) VUI Visions. TMA Associates, 2006.
W. Meisel (ed.) Speech in the User Interface: Lessons from Experience. TMA Associates, 2010.
Greg Milette and Adam Stroud. Professional Android™ Sensor Programming, Wrox, 2012. Chapter 18.
R. Pieraccini. The Voice in the Machine, MIT Press, 2012.