Google Search and Who Knew?

This is a bit off-topic, but I found it useful and others may too.

After all these years using Google, I finally read through the stuff on the Google Web Search Help Center. Duh, should have done this long ago. For example, I had no idea you could do synonym searches.

But what surprised me the most was Google’s statement “keep in mind that the order in which the terms are typed will affect the search results.” How many thousands of searches have I made and never noticed this? But it’s true. The search [vacation paris] produces 560,000 results, while the search [paris vacation] produces 913,000 results. Note: I’ve used square brackets [] for readability only. In an actual search, of course, the brackets would not be used.

I figured an AND would be commutative like an INTERSECT set operator. You get the vacation result set (call it V) and the paris result set (call it P). V intersect P would give the same results as P intersect V. But V AND P != P AND V in a search.
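To make the set intuition concrete, here is a minimal Python sketch. The document IDs in V and P are made up for illustration; they are not real Google results.

```python
# Hypothetical document IDs matching each term (not real search data).
V = {"doc1", "doc2", "doc3", "doc5"}   # pages containing "vacation"
P = {"doc2", "doc3", "doc4", "doc6"}   # pages containing "paris"

v_and_p = V & P   # V INTERSECT P
p_and_v = P & V   # P INTERSECT V

# Pure set intersection does not care about operand order.
print(v_and_p == p_and_v)
print(sorted(v_and_p))
```

If a search AND worked purely this way, both word orders would return the same pages, which is exactly what I expected and not what the result counts suggest.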

I didn’t see any explanation of this in Google Web Search Help, but found a pretty helpful discussion that makes sense. Search word order matters because [bakers dozen] means something different from [dozen bakers], and factoring word order into the relevance algorithm improves search results.
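Here is one way word order could enter a relevance score. This is a toy scorer made up for illustration, not Google's algorithm: the function name toy_score, the phrase_bonus parameter, and the sample page are all assumptions. It gives one point per query term found on the page, plus a bonus when the terms appear next to each other in the same order as the query.

```python
def toy_score(query, text, phrase_bonus=2):
    """Toy relevance: 1 point per query term present, plus a bonus
    if the terms appear consecutively in the same order as the query."""
    q = query.lower().split()
    words = text.lower().split()
    score = sum(1 for term in q if term in words)
    n = len(q)
    # Look for the query terms as a contiguous, ordered run in the text.
    if any(words[i:i + n] == q for i in range(len(words) - n + 1)):
        score += phrase_bonus
    return score

# Hypothetical page text, not a real search result.
page = "a bakers dozen is thirteen loaves"
score_in_order = toy_score("bakers dozen", page)   # terms present and in order
score_reversed = toy_score("dozen bakers", page)   # same terms, reversed order
print(score_in_order, score_reversed)
```

Both queries match the same page, so a pure intersection treats them alike, but the in-order query scores higher here. A ranking that rewards order in this way would sort the same pages differently depending on how the terms were typed.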

Well, ok, I get the bakers dozen idea. But vacation paris and paris vacation? Why do these two searches produce such different results?

Which brings me to a couple of peeves about search.

It should be possible to string search result sets together. Here’s an illustration. Suppose I wanted to find out how the search V AND P differs from P AND V. I could do the V AND P search to get a result set (call it R1). Then I could do the P AND V search to get another result set (call it R2). With these two result sets, I could compute R2 MINUS R1 to find out which results were in R2 but not in R1. Also interesting would be R1 MINUS R2 and R1 INTERSECT R2. In other words, I’d like to produce search result sets that could be refined with other search result sets. This would certainly help with complex queries.
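If the two result lists could be saved, the comparison itself would be easy. Here is a rough Python sketch, assuming each result set is just a ranked list of URLs; the URLs below are invented placeholders, not actual search results.

```python
# Hypothetical ranked results for each query (made-up URLs).
r1 = ["a.example/paris", "b.example/trips", "c.example/deals"]    # [vacation paris]
r2 = ["b.example/trips", "d.example/hotels", "a.example/paris"]   # [paris vacation]

s1, s2 = set(r1), set(r2)

# List comprehensions keep the original rank order, which plain sets would lose.
only_in_r2 = [url for url in r2 if url not in s1]   # R2 MINUS R1
only_in_r1 = [url for url in r1 if url not in s2]   # R1 MINUS R2
in_both = [url for url in r1 if url in s2]          # R1 INTERSECT R2

print(only_in_r2)
print(only_in_r1)
print(in_both)
```

The hard part is not this arithmetic but getting complete result sets out of the search engine in the first place.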

Which brings me to a second but related peeve. Right now we do searches. But to use the web as a research tool, we need to do queries. For example, here’s a very simple query that you can’t currently translate into a Google search: “give me a list of all the web design courses”. The search ["web design" ~course] produces roughly 22 million results. That’s great if you’re looking for a few examples. But it’s lousy if you want a reasonably complete universe of unique web design courses.

In his book Ambient Findability, Peter Morville has an interesting discussion about different types of searches (pp. 49-50). He distinguishes between sample, existence, and exhaustive searches. In a sample search, you’re looking for a few examples. Google does this great. An existence search is just a binary yes/no search (does document x exist?). Google also does this great. An exhaustive search should return all of the relevant items. As Morville discusses, the effectiveness of exhaustive searches falls rapidly with collection size. So it’s no wonder that this type of query/search is not available.

I suppose you could get something like an exhaustive search using Amazon Web Services and the Alexa search engine. I think you can get up to 10 million results returned that can be saved in a file for further manipulation. And maybe for a specific high return-value query, I would do this. But not for day-to-day work.

OK, that’s it for this search novice.

  1. Usmanou Nsangou — August 12, 2008 @ 5:47 pm

    Hi Gary,
    Great topic! Have you tried the new search engine yet?
    http://www.cuil.com; it’s supposed to rival google.

    Usmanou

  2. Gary Lewis — August 13, 2008 @ 7:14 am

    Hi Usmanou - Great to hear from you! I haven’t done much with Cuil yet. Sounds cool though :-)

    Please say hi to the DA folks for me.