The web is a great way of outsourcing tasks to specialized parallel supercomputers. Here’s a crude order-of-magnitude estimate of the amount of computing power used in a single Google search.
The size of the Google server cluster is no longer public, but online estimates typically describe it as containing about half a million commodity machines, each comparable in power to the personal computers widely used by consumers. As a rough estimate, let’s say about 200,000 of those are involved in serving search results.
I don’t know how many searches are done using Google, but online estimates put it in the range of hundreds of millions per day, which averages out to a few thousand per second; at peak times that’s perhaps 10,000 searches per second.
In usability studies, Google has found that users are less happy with their search experience when it takes longer than 0.4 seconds to serve a page, so they aim to serve most of their pages in 0.4 seconds or less. In practice they’ve got to process queries even faster, since network communication can easily chew up much of that 0.4 seconds; for simplicity, we’ll assume the full 0.4 seconds is available for processing.
What this means is that at peak times roughly 10,000 × 0.4 = 4,000 searches are in flight at any given moment, so the Google cluster’s effort is being distributed across approximately 4,000 searches.
Put another way, each time you search, you’re making use of the equivalent of (very) roughly 200,000 / 4,000 = 50 machines.
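To make the arithmetic explicit, here’s the whole estimate as a few lines of Python. All of the input numbers are just the rough guesses above, not anything Google has published.

```python
# Back-of-envelope estimate of machines per Google search.
# All figures are the rough guesses from the post, not published numbers.
machines_serving_search = 200_000   # assumed machines devoted to serving search
searches_per_second = 10_000        # assumed peak query rate
latency_budget = 0.4                # seconds allowed to serve a results page

# Searches "in flight" at any instant at peak load.
concurrent_searches = searches_per_second * latency_budget       # 4,000

# Machines effectively working on a single search.
machines_per_search = machines_serving_search / concurrent_searches   # ~50

print(f"~{concurrent_searches:.0f} concurrent searches, "
      f"~{machines_per_search:.0f} machines per search")
```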
I’d be very interested to know what fraction of the world’s computing power is contained in such supercomputers, versus the fraction in personal computers. Even more interesting would be a graph of how that fraction is changing over time. My guess is that at some point in the not-too-distant future most computing power will be in specialized services like Google’s, not in personal computers.
One additional question that might be important in this approximation is the fraction of novel queries. Google can cache the results of common queries (e.g. searching for “Britney Spears” is probably computationally cheaper for Google than searching for “lateral geniculate nucleus”).
On the other hand, as Google attempts to deliver personalized results, using your web history to tailor the results to you, this caching will become less important.
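To make the caching point concrete, here’s a toy Python sketch. It’s purely illustrative and has nothing to do with how Google actually implements caching; the function names and cache size are made up.

```python
from functools import lru_cache

def full_search(query: str) -> list[str]:
    # Stand-in for the expensive work of consulting the index across many machines.
    return [f"result for {query!r}"]

@lru_cache(maxsize=100_000)
def cached_search(query: str) -> list[str]:
    # Popular queries ("Britney Spears") are answered from the cache after the
    # first request; rare ones ("lateral geniculate nucleus") pay the full cost.
    return full_search(query)
```

A personalized version would have to key the cache on the user as well as the query, which makes repeat hits far rarer, so caching buys much less.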
Google also works as a calculator. Interestingly, there are unstated limits on what it’ll calculate for you free of charge.
Try typing in “100!” and “1000!” and you’ll see what I mean.
Hi Jonathan – Yep, you’re right. On average, I guess that means somewhat more than 50 computers are involved in uncached queries, and fewer than 50 in cached ones.
(It seems like 1 computer may be enough for the cached queries, but I don’t know enough about how Google’s redundant filesystem works to be sure of that.)