Google: Search Technology for the Millennium

Here in the U.S., Google is everyone’s favorite search engine. That’s because it’s fast and user-friendly, and it has a huge database. But the real clincher is that it works like magic: you can find anything. Google hit the Web in 1998.

Sergey Brin and Larry Page were grad students at Stanford, working on a class project to identify meaningful patterns in Web link structure. They became fascinated with the significance of “backlinks” (pages linking back to a site), realizing this information could be used to build a better search engine.

Named by a Kid

Google was derived from the word googol, coined by Milton Sirotta, the nine-year-old nephew of mathematician Edward Kasner. It refers to the number 10 to the 100th power (1 followed by 100 zeros), reflecting the company’s goal of organizing the vast collection of documents on the Web. After presenting their project to an angel investor, Brin and Page received a check made out to Google Inc. They pondered it for a couple of weeks, then opened an account in the name Google, Inc. in order to use the cash. And the rest is history.

The PageRank Revolution

Google turned the search world on its head with PageRank. It proved to be a real technological breakthrough: most major search engines now use link popularity to judge relevancy. How does PageRank work?

“Google’s PageRank search technology works by first identifying the link structure of the entire Web, then ranking individual pages based on the number and importance of pages linked to them,” said Google software engineer Matt Cutts. I gather from talking to Cutts that importance (the popularity and relevance of the backlink) is more critical than the number of backlinks.
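To make the idea concrete, here is a minimal sketch of a simplified PageRank calculation. It is an illustration only, not the code Google runs: the function name, the six-page toy web, and the iteration count are all hypothetical, and the damping value of 0.85 is an assumed default of the kind usually quoted for the published algorithm. The toy web is arranged so that page "x" has a single backlink from a highly ranked page, while page "y" has three backlinks from pages nobody links to.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated redistribution of rank.

    links maps each page name to the list of pages it links to.
    Returns a dict mapping each page to a score; scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "x" has one backlink (from the popular "hub"),
# while "y" has three backlinks (from "a", "b", and "c", which have none).
toy_web = {
    "hub": ["x"],
    "x": ["hub"],
    "a": ["hub", "y"],
    "b": ["hub", "y"],
    "c": ["hub", "y"],
    "y": ["hub"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, "x" ends up ranked well above "y" despite having fewer backlinks, which matches the point that the importance of a backlink can outweigh the raw count.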

Does Google Have a Weakness?

Google works better on searches for specific information (like snowfall in Sweden) than for general information (like dogs). Because search results aren’t categorized, they become unwieldy for broad terms. In case you didn’t know, Google has a directory with category listings, but most people use Google.com.

The newer search engines like Wisenut and Teoma have started classifying results by category. For instance, Teoma (in Beta) divides the dogs query into folders: Dogs Breeds, Dogs Training, Dogs Cats, German Shepherd Dogs, Animals Shelter, Dogs Lovers, Great Pyrenees, Humane Society. Most users don’t know how to narrow their searches because they’re not familiar with research procedures.

Will Google follow the trend and start categorizing? “Google is in its second generation of experimenting with category based results,” replied Cutts. “Users apparently do not like having too many category options, but presenting clear and concise categories is important to users.”

Google’s Success Formula

Google has two main sources of revenue: advertising and search services. Cutts reports that their AdWords program currently yields up to five times the average click-through rate for traditional banner ads.

The real cash cow might be Google’s search services to major portals and corporate sites. Google has about 130 customers in over 30 countries. Such customers include Yahoo! and its international properties, Sony and its global affiliates, AOL/Netscape, Cisco Systems, and many others. Partners pay Google an upfront search service fee and a CPM (cost per thousand result sets delivered) to power search on their Web sites. In other words, Google receives a fee for every search conducted on its partners’ sites.
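The pricing model described above (an upfront fee plus a CPM charge) comes down to simple arithmetic. The sketch below shows the calculation; the function name and every dollar figure are hypothetical, chosen only to illustrate the formula, and do not reflect any actual Google contract.

```python
def search_service_fee(upfront_fee, cpm_rate, result_sets_delivered):
    """Total partner cost: upfront fee plus the CPM rate
    charged for every 1,000 result sets delivered."""
    return upfront_fee + cpm_rate * result_sets_delivered / 1000


# Hypothetical figures: $10,000 upfront, $5 per 1,000 result sets,
# and 2,000,000 result sets delivered over the billing period.
total = search_service_fee(10_000, 5, 2_000_000)
print(f"Total fee: ${total:,.2f}")
```

With these made-up numbers, the CPM portion alone equals the upfront fee, which shows how per-search charges can become the larger share of the bill as a partner’s search volume grows.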

New Google Toolbar

Are you one of the several million who have downloaded the Google Toolbar? Google just released a beta version that permits you to vote on site popularity. This could bring human opinion into the equation rather than placing the sole emphasis on link structure.

If you download the beta version, you can rank your search results with a voting button. Will they incorporate this feedback into their algorithm? “Rather than using the votes to tinker with the specific rankings of particular pages or sites, the feature would most likely be used to bolster the relevance of overall results,” replied Cutts. He said the data collected thus far looks promising, but it’s too soon to draw any conclusions.

How Does Google’s Algorithm Work?

It ranks sites by the words listed on the page and by the keyword phrases in the title and description. The spider considers about 25 factors, including “keyword” and “description” meta tags and the page’s popularity (based on the number and importance of sites linked to the page).

How to get high rankings? Cutts says the guidelines are pretty simple. “Stay away from hidden text, hidden links, cloaking, sneaky redirects, lots of duplicate content on different domains, and doorway pages. Webmasters should also stay away from programs that send automatic queries to Google. The worst thing you can do is try to cheat: shortcuts to boost PageRank or rankings usually do more harm than good. Even if an SEO does think they’ve found a shortcut, about 2/3rds of the time it may be a sting operation. Don’t bother with link exchanges, signing guestbooks, or other tricks — the best use of a webmaster’s time is building good content and honestly promoting their site. When Google punishes spam like cloaking, we sometimes take out not only the cloaked domain but the SEO’s client as well.”

Google’s Crystal Ball

Google wants a deeper, fresher, and more personalized index. “The future will be less about features and more about the overall usefulness of an engine,” noted Cutts. “We believe users want relevancy, but they also want quick, clean results with proven integrity.” Do you see XML in the future? “Not anytime soon,” replied Cutts. “The main benefit of HTML is that anyone can write it. That’s part of why the Web had such meteoric growth. XML is great for machine-to-machine communication, but it’s much more difficult for a person to produce by hand.”

Google plans to increase its lead across the board in the coming year. “We’ll be introducing new ways to search. We don’t want to give away any secrets, but Google will provide many helpful surprises in 2002,” said Cutts. As usual, Google’s focus will be on search and the user experience.

(Note: Since this interview, Google announced the availability of the Google Search Appliance, described as “an integrated hardware/software solution that extends the power of Google.com to corporate intranets and web servers.”)

How Deep?

Google provides support for hundreds of file formats found in the deep Web: PDF, RTF, PostScript, Word, Excel, PowerPoint, and more. It crawls and indexes millions of dynamic pages. Every 28 days, Google indexes 3 billion Web documents; it performs a “fresh crawl” of more than 3 million important Web pages every day. Google’s news crawl gives you up-to-the-minute headlines on news queries.

About the Author:

Web-Ignite CEO Paul Bruemmer is an SEO Today columnist. He writes for ClickZ and many other Internet marketing publications. Bruemmer has been quoted in The Los Angeles Times, MarketingSherpa and Silicon 2.0. He’s a frequent speaker at Internet.com’s Search Engine Strategies. Founded in 1995, Web-Ignite has helped promote over 15,000 Web sites and was recognized by ICONOCAST and MarketingSherpa as a top SEO firm based on reputation. Client testimonials report search engine traffic increases of 150 to 500 percent.
