I've done extensive work with caching in web applications, and I think the solution lies at the intersection of the following fairly simple facts:
- RAM is cheap, especially if you buy in bulk. 1U servers can carry 192 GB today. [*]
- Distributed algorithms like MapReduce are fairly well understood. Data partitioning kicks ass for workloads like this.
- The set of words with the "hardest hits", i.e. those that get typed first, is actually finite. By "finite" I mean something on the order of "the number of words in various languages" (for example, about 500,000 for English according to Wikipedia). First-page results for all of these can be cached verbatim, even taking into account slight per-user customization (at least by binning users into large groups). The remaining cases (phrases, uncommon words) can be handled normally.
- Broadband is really common among the group of people who will be most impressed by this feature.
- Bandwidth can be massively better utilized with HTTP compression, and by storing the compressed results in the cache rather than compressing on the fly (in fact, it would be faster to decompress on the fly for the few clients that don't support compression).
- To enhance all of the above, distributed data centers can be used (e.g. per-continent or per-country, as Google already does).
- You only need to be as fast as human perception requires, not fast in absolute terms. The "fades" in page loading offer precious milliseconds for the servers to respond in!
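To make the caching idea concrete, here is a minimal sketch of the kind of cache I have in mind: pre-rendered first-page results keyed by (query, user bucket), stored gzip-compressed so the common case is a straight byte copy and only non-gzip clients pay for on-the-fly decompression. All names here are illustrative assumptions, not anyone's actual implementation.

```python
import gzip

class ResultCache:
    """Hypothetical in-memory cache of pre-rendered, pre-compressed result pages."""

    def __init__(self):
        # (query, user_bucket) -> gzip-compressed HTML bytes
        self._store = {}

    def put(self, query, user_bucket, html):
        # Compress once at cache-fill time, not on every request.
        self._store[(query, user_bucket)] = gzip.compress(html.encode("utf-8"))

    def get(self, query, user_bucket, client_accepts_gzip):
        blob = self._store.get((query, user_bucket))
        if blob is None:
            return None, None  # cache miss: fall through to the normal query path
        if client_accepts_gzip:
            return blob, "gzip"  # send the stored bytes untouched
        # Rare path: decompress on the fly for clients without gzip support.
        return gzip.decompress(blob), "identity"
```

The second element of the return value stands in for the `Content-Encoding` header the front end would send.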
All this doesn't make it any less of an impressive achievement. Salute!
[*] Off-topic: did you know that the newest-generation Atom CPUs have memory bandwidth comparable to last-generation Core 2 CPUs? For some perverse reason I'm fond of the idea of massively parallel applications running on a large number of power-saving servers...