Grub is "an open source, distributed internet crawler!"1 It is also known as grubclient (so as not to be confused with GRUB, the boot loader).

For about two and a half years now, Kord Campbell has been trying to make Grub an important part of the internet. He has set up a "for profit" company that uses the distributed computing power of volunteers plus a central database to construct a comprehensive, up-to-date list of URLs. His goal is to index the entire web, updating the list every day!

This Open Source project is licensed under the GNU General Public License (GPL), with the source code hosted at SourceForge.

One business model has his company selling a live feed of its database to large search engines like Yahoo and Google. These search engines can then get an "up to the minute" status report on a URL, which means fewer broken links in the search results. Search engines can also use statistics on how frequently a URL changes to prioritize their updates, re-indexing pages that have changed first to give more accurate and timely search results.
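To make the prioritization idea concrete, here is a minimal Python sketch of how a search engine might consume such a feed. Everything in it is an assumption for illustration: the UrlStats record, its fields, and the example.com URLs are hypothetical and do not describe Grub's actual feed format.

```python
from dataclasses import dataclass


@dataclass
class UrlStats:
    """Hypothetical per-URL record from a crawler feed (not Grub's real format)."""
    url: str
    checks: int    # how many times the crawler fetched the URL
    changes: int   # how many of those fetches found changed content
    alive: bool    # whether the most recent fetch succeeded


def change_rate(stats: UrlStats) -> float:
    """Fraction of fetches that found new content; 0.0 if never checked."""
    return stats.changes / stats.checks if stats.checks else 0.0


def reindex_order(feed: list[UrlStats]) -> list[str]:
    """Drop dead links, then list live URLs from most to least frequently changing."""
    live = [s for s in feed if s.alive]
    return [s.url for s in sorted(live, key=change_rate, reverse=True)]


# Made-up sample data for illustration only.
feed = [
    UrlStats("http://example.com/about", checks=10, changes=1, alive=True),
    UrlStats("http://example.com/old", checks=10, changes=5, alive=False),
    UrlStats("http://example.com/news", checks=10, changes=9, alive=True),
]
order = reindex_order(feed)
```

In this sketch the dead link never reaches the re-index queue, and the frequently changing news page is scheduled ahead of the rarely changing about page.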

He also recognizes that the "little guy" may need a low-usage feed, which he would offer as a free service.

What's in it for me? The grub client has a feature that allows local servers to be crawled as a priority. This means that your URLs will be updated in the database first. In theory, every web server could also run Grub. As the content of a web site changes, so would the data collected by the search engines using the Grub database.

Grub can be set to use as much or as little bandwidth as you like. If you donate 1% of a DSL connection (3.5 megabits per second), Grub would index your entire site plus a few others about once a day (an estimated 2,400 pages a day at most). This is a very low CPU / bandwidth cost for the advantages it offers. Furthermore, with a local crawler, you would have better control over what content gets indexed and what doesn't.
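As a rough check on those figures, the arithmetic can be sketched in Python. The 3.5 megabits per second, 1%, and 2,400 pages a day come from the estimate above; the function names, the reading of "mega" as one million bits, and the assumption that the crawler runs around the clock are illustrative assumptions only.

```python
SECONDS_PER_DAY = 86_400


def daily_byte_budget(link_mbps: float, share: float) -> float:
    """Bytes per day the crawler may transfer, given a link speed and a donated share.

    Assumes 1 megabit = 1,000,000 bits and that the crawler runs all day.
    """
    bits_per_second = link_mbps * 1_000_000 * share
    return bits_per_second / 8 * SECONDS_PER_DAY


def implied_page_size(budget_bytes: float, pages_per_day: int) -> float:
    """Average bytes per page implied by a daily budget and a daily page count."""
    return budget_bytes / pages_per_day


budget = daily_byte_budget(3.5, 0.01)      # 1% of a 3.5 Mbit/s DSL line
implied = implied_page_size(budget, 2_400)  # the estimate of 2,400 pages a day

print(f"daily budget: {budget / 1e6:.0f} MB")
print(f"implied average page size: {implied / 1e3:.1f} KB")
```

Under these assumptions the donated 1% works out to about 378 MB a day, so the 2,400-page estimate corresponds to an average of roughly 157 KB transferred per page, including any images or overhead counted against the budget.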

Grub has been featured on Slashdot and currently has about 900 registered clients, but only a small fraction of those are actually running.

Distributed computing is here. Now! The power of Open Source + idle CPU / bandwidth = next generation of software systems. Grub is there at the bleeding edge.

P.S. The mascot for Grub is officially "Grubby" the cute little worm.


1 http://www.grub.org


Update: March 17, 2003 - The search engine "LookSmart" (http://www.looksmart.com/) acquired grub.org for $1.3 million in cash and stock, according to CNET news.

Ref: http://news.com.com/2100-1032-993591.html


Grub was Slashdotted (again) on April 19, 2003, at 10:15 PM.