A language model (LM) defines which word sequences are most probable in your application. Ideally, a good LM assigns high probability to correct phrases and low probability to incorrect ones.
This way, if the acoustic model of your speech recognizer assigns similar probabilities to two phrases that sound very much alike, for example:
Speech technology rules
Speech enology rules
the LM can help the recognizer choose between them.
An n-gram model approximates the probability of each word by conditioning it on the N-1 previous words. That is, in a 2-gram (bigram) model, each word is conditioned on the single word before it:
P(speech technology rules) = P(speech) * P(technology | speech) * P(rules | technology)
P(speech enology rules) = P(speech) * P(enology | speech) * P(rules | enology)
In this case, P(speech technology rules) > P(speech enology rules), because with a good LM, P(technology | speech) will be much higher than P(enology | speech).
How is it possible to calculate all these probabilities? By counting word occurrences in a large collection of phrases: a linguistic corpus.
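As a minimal sketch, here is how such conditional probabilities can be estimated from counts over a corpus. The tiny corpus below is an invented illustration; a real LM would be trained on millions of sentences:

```python
from collections import Counter

# Toy corpus; a real LM would be trained on millions of sentences.
corpus = [
    "speech technology rules",
    "speech technology is fun",
    "speech recognition uses technology",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_bigram(word, prev):
    """Maximum-likelihood estimate: P(word | prev) = count(prev word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# "technology" follows "speech" in 2 of the 3 sentences; "enology" never does.
print(p_bigram("technology", "speech"))  # 0.6666666666666666
print(p_bigram("enology", "speech"))     # 0.0
```

Note that unseen word pairs get probability zero with this raw maximum-likelihood estimate, which is why real LMs apply smoothing techniques.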
Building your own LM
In the book we use the speech recognizer provided by Google. Google trains its models on an enormous number of phrases (imagine all the data they gather just from web searches). To build your own LM you would also need a large amount of text. Fortunately, you can obtain plenty of free text from ebooks and newspapers available on the Internet.
An interesting initiative has also appeared recently: the 1 Billion Word Language Modeling Benchmark, which is freely available. Check it out here: https://code.google.com/p/1-billion-word-language-modeling-benchmark/
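Once you have gathered text, training an LM and scoring whole phrases with it can be sketched as follows. This example uses a bigram model with add-one (Laplace) smoothing; the two training sentences and the `<s>` start-of-sentence marker are illustrative assumptions, not part of any particular toolkit:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams, padding each sentence with a <s> start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """Chain-rule probability with add-one smoothing, so unseen bigrams
    get a small non-zero probability instead of zeroing out the product."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

unigrams, bigrams = train_bigram_lm([
    "speech technology rules",
    "speech technology is everywhere",
])
p_good = sentence_probability("speech technology rules", unigrams, bigrams)
p_bad = sentence_probability("speech enology rules", unigrams, bigrams)
print(p_good > p_bad)  # True
```

Thanks to smoothing, the unseen phrase still gets a small non-zero probability, but the phrase whose bigrams were observed during training wins, which is exactly the behavior the recognizer needs.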
Integrating your LM in a speech recognizer
To build a speech recognizer for your apps that uses your brand-new LM, check out PocketSphinx, a great open-source toolkit for developers.
Find out more
If you want to learn more about the statistical foundations of language modeling, you will find this book very interesting: