How Google processes documents
An article many people will find useful:
http://www.internetnews.com/xSP/article.php/3487041
Pay special attention to this part:
The company also is applying machine learning to its system to give better results. Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words.
And, as a consequence, this:
To do this, the system tries to cluster concepts into "reasonably coherent" subclusters that seem related. These clusters, some tiny and some huge, are named automatically. Then, when a query comes in, the system produces a probability score for the various clusters. This kind of machine learning has had little success in academic trials, Hoelzle said, because they didn't have enough data. "If you have enough data, you get reasonably good answers out of it."
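The article doesn't say how Google actually does this, so here is only a toy sketch of the idea, under loudly stated assumptions. The concept clusters (CLUSTERS) are made up by hand, not learned and auto-named from data the way the quote describes. Cluster "probabilities" come from simple word counting with a smoothing constant. Documents are compared by cosine similarity of their cluster distributions. All names (CLUSTERS, cluster_probabilities, cosine, shared_words) are hypothetical and exist only for illustration. The point is that a query and a document can match without sharing a single word.

```python
import math
import re

# HYPOTHETICAL hand-made concept clusters. In the system described in the
# article the clusters are learned from data and named automatically;
# here they are fixed by hand purely for illustration.
CLUSTERS = {
    "bay_area": {"bay", "area", "berkeley", "oakland", "san", "francisco"},
    "cooking": {"cooking", "cuisine", "vegetarian", "recipe", "chef"},
    "education": {"class", "classes", "course", "courses", "lesson"},
    "cars": {"engine", "tyre", "garage", "sedan"},
}

SMOOTHING = 0.1  # small pseudo-count so no cluster gets exactly zero mass


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_probabilities(text: str) -> dict[str, float]:
    """Count how many of the text's words fall into each cluster,
    add smoothing, and normalise into a probability distribution."""
    words = tokenize(text)
    scores = {
        name: SMOOTHING + sum(1 for w in words if w in members)
        for name, members in CLUSTERS.items()
    }
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity between two cluster distributions."""
    dot = sum(p[k] * q[k] for k in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


def shared_words(a: str, b: str) -> set[str]:
    """Words that literally appear in both texts."""
    return set(tokenize(a)) & set(tokenize(b))


if __name__ == "__main__":
    query = "Bay Area cooking class"
    good = "Berkeley courses: vegetarian cuisine"
    bad = "Used sedan with new engine and tyres"
    q = cluster_probabilities(query)
    print("shared words:", shared_words(query, good))
    print("query vs good:", round(cosine(q, cluster_probabilities(good)), 3))
    print("query vs bad:", round(cosine(q, cluster_probabilities(bad)), 3))
```

In this toy setup the Berkeley listing shares no words with the query, yet it scores about 0.85 against it, while the unrelated car ad scores about 0.12. As the quote suggests, a real system would need a lot of data to learn clusters good enough for this to work.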
So, what do you think?

Explains a lot, doesn't it?
That's how it is!