Research News

On the Origins of Google

Even in the early days of the Internet, people saw the need for better interfaces to growing data collections. A graduate student supported by an NSF digital library project at Stanford University uncovered the missing links in Web page ranking.

In the primordial ooze of Internet content several hundred million seconds ago (1993), fewer than 100 Web sites inhabited the planet. Early clans of information seekers hunted for data among the far larger populations of text-only Gopher sites and FTP file-sharing servers. This was the world in the years before Google.

Even in this primitive Internet world, the need for more accessible interfaces to growing data collections had already been recognized. The National Science Foundation led the multi-agency Digital Library Initiative (DLI) that, in 1994, made its first six awards. One of those awards supported a Stanford University project led by professors Hector Garcia-Molina and Terry Winograd.

None of the early DLI proposals -- submitted before the World Wide Web experienced its Cambrian explosion -- explicitly included research into the Web. However, by the time DLI funding began, the information landscape had changed.

In 1994, some of the first Web search tools crawled out of the Internet sea. Two Stanford students started Yahoo!, a manually constructed "table of contents" for Web sites. Other early search engines emerged, such as Lycos and WebCrawler, and began automatically indexing Web pages, focusing on keyword-based techniques to rank search results.

Around the same time, one of the graduate students funded under the NSF-supported DLI project at Stanford took an interest in the Web as a "collection." The student was Larry Page.

Page uncovered the missing links, so to speak, in Web page ranking. His evolutionary leap was to recognize that the act of linking one page to another required conscious effort, which in turn was evidence of human judgment about the link's destination. Individually, each link was a simple but effective tool. But collectively, millions of these links provided a key adaptation for the natural selection of search results.

Page was soon joined by Sergey Brin, another Stanford graduate student working on the DLI project. (Brin was supported by an NSF Graduate Student Fellowship.) Together, Page and Brin constructed an ambitious prototype in their Stanford student offices. The equipment for the prototype, called BackRub, was funded by the DLI project and by industrial contributions.

The prototype used well-established technology to crawl from page to page by following links. However, in addition to compiling a standard text index, the prototype also mapped out a vast family tree that reflected the Web links among pages.
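As a rough illustration of crawling by following links while recording the link structure, here is a minimal Python sketch. It is not BackRub's code: the names `LinkExtractor`, `crawl`, and `fetch`, and the `example.test` URLs, are hypothetical, and pages are served from an in-memory dictionary rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl that returns {page URL: [linked URLs]}.

    `fetch` is a hypothetical callable that returns a page's HTML,
    or None when the page is unavailable.
    """
    graph = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in graph:
            continue  # already visited
        html = fetch(url)
        if html is None:
            graph[url] = []  # known page, but no content to parse
            continue
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links against the page they appear on.
        targets = [urljoin(url, href) for href in parser.hrefs]
        graph[url] = targets
        queue.extend(t for t in targets if t not in graph)
    return graph


# Assumed toy "Web" of three pages, held in memory.
PAGES = {
    "http://example.test/a": '<a href="/b">b</a> <a href="c">c</a>',
    "http://example.test/b": '<a href="/a">a</a>',
    "http://example.test/c": "<p>no links here</p>",
}

link_graph = crawl("http://example.test/a", PAGES.get)
```

The resulting dictionary is the kind of link map the article describes: for each page, the list of pages it points to.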

To calculate rankings from this family tree, the pair developed the PageRank method. In short, the method ranks a particular Web page highly if many other highly ranked Web pages link to it. Those other pages' rankings, in turn, depend on the pages that link to them. Such logic could spiral out of control, but the PageRank calculation eventually settles because, as a rule, the more distantly related a page is, the less it contributes to the final rank of its descendants.
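The circular definition above can be resolved by repeated approximation. The following Python sketch shows one common way to do that (power iteration with a damping factor); it is an illustration under stated assumptions, not Google's implementation. The function name `pagerank`, the damping value of 0.85, the convergence tolerance, and the four-page example graph are all assumptions introduced here. The damping factor is what makes distant pages count for less: each extra link hop multiplies a page's contribution by a number below 1.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """Approximate PageRank for a graph given as {page: [pages it links to]}.

    The damping value and stopping rule are illustrative assumptions.
    """
    # Include pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally ranked.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes a damped share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Stop once the ranks have essentially stopped changing.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Assumed toy graph: "c" is linked to by three pages, "d" by none.
example_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
example_ranks = pagerank(example_links)
```

In this toy graph, page "c" ends up ranked highest because three pages link to it, and page "a" ranks well mainly because the highly ranked "c" links to it, which mirrors the idea that a link from an important page counts for more.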

Page and Brin wrote an initial paper on their ideas and the theoretical underpinnings of PageRank, and they tested the fitness of the ranking approach on live Web data -- initially a test set of 24 million pages. PageRank survives as one of the main components of today's Google search service.

By late 1997, as the Dot-Com Era began to flourish, the BackRub approach had proved to be sound, expandable and popular. By the end of the Early DLI Age in 1998, Page and Brin had obtained funding that allowed them to move their growing hardware facility from the Stanford campus into a friend’s garage and to incorporate Google, Inc.

The rest, as they say, is history.

  -- David Hart