A problem with the standard importance function? Trading off query terms against one another

Working notes ahead! This post is different to my last two posts. Those posts were broad reviews of topics of general interest (at least if you’re interested in data-driven intelligence) – the Pregel graph framework, and the vector space model of documents. This post is not a review or distillation of a topic in the…

Documents as geometric objects: how to rank documents for full-text search

When we type a query into a search engine – say “Einstein on relativity” – how does the search engine decide which documents to return? When the document is on the web, part of the answer to that question is provided by the PageRank algorithm, which analyses the link structure of the web to determine…