rp says WonkoDSane says (to me, by mistake) I don't want to add a response WU to Beowulf Cluster, but you may want to add that the other reason why SETI doesn't use one big SC is because it can't afford to do such things.
What is a beowulf cluster or Beowulf cluster vs Supercomputer Beowulf cluster is essentially a number of commodity or of the shelf PCs tied together using some sort of protocol. Developed by NASA, Beowulf clusters gained much attention in the last few years. Often at a fraction of a cost of a "real" supercomputer a beowulf cluster is comparable (or even exceeds) the computational power of a super computer. For instance the a cluster made out of 320 ubiquitous P3-800 CPUs takes 34th place on the TOP500 super computer list with peak performance of over a teraflop! What is required for a beowulf cluster Essentially all that is needed is a number of available machines. Historically most of the software written for beowulf clusters is Linux based. Due to centralized nature of beowulf clusters machines will need to communicate to the central server. Ethernet is both cheap and widely available which makes it especially convenient. On top of Linux you'll have to run some sort of communication layer which will provide the required infrastructure between the nodes of your beowulf cluster. There are few choices; among them OSCAR from NCSA and LAM-MPI developed by a team from University of Notre Dame. We've chosen LAM-MPI because of it's simplicity and because of the fact that it's being continuously developed. Why would you want to have a beowulf cluster Although there are certain drawbacks to beowulf clusters they provide a cheap alternative to buying a "real" super computer. For instance in a school like Langara there are hundreds unused computers during off peak hours. Students doing simulations or other research could possibly use these idle computers to their advantage. Supercomputers in general and beowulf clusters in particular have been used for everything from chess playing to cancer and nuclear research. Moving away from serious research subjects we've found that there are applications out there that will allow you to encode your music into mp3 format using distributed approach - namely clustering software. MPI-LAM Message passing interface (MPI) is one of the widely recognized standards for communication between beowulf nodes. Local Area Multicomputer (LAM) from the University of Notre Dameis one of the most popular implementation of the MPI protocol. It provides an abstraction layer for C/++, Perl, Python and other popular languages and enables them to create multi-computer parallel application. MPI-LAM deals with communication between heterogenous nodes and provides standard MPI interface. LAM provides an extensive set of debugging applications. It supports MPI-1 and parts of MPI-2. LAM is not only restricted to Linux. It will compile on many commercial Unicies. Our particular implementation For the ease of implementation we've decided to use NFS to mount the shared software (MPI-LAM) across all the nodes. On boot up we pass the required parameters to the LILO which boots the workstation from the network. On the server, we have a directory structure which is mirrored through the network across every node. Since LAM does all of it's processing in /tmp we have it created as a RAM drive which improves performance immensely. Since we're using a very recent kernel, we can set the ram disk size to be dynamic (max 8MB) so it only uses as much RAM as it actually has files in /tmp. Thoughts on this setup Having been through this process, we'd say that using NFS-root is worthwhile for a large, fast network, but that on a 10MB network it's too expensive to be justified. We've tested nodes running at 10MB while playing with home network, and while stability wasn't compromised, it hugely magnified the time the test suite took to run. On the other hand even though mounting the shared software from one central repository eases the maintenance, the performance is hit because Ethernet is the primary method of communication between the nodes in the beowulf cluster. Further tests should be performed comparing performance in a NFS-network vs no-NFS network. Hardware used for the tests Home test network (successfully tested with LAM-MPI test suite): server: celeron 500, 192MB RAM node1: celeron 900, 128MB RAM node2: Pentium III 933, 256MB RAM Dungeon test network (marginal stability at best): server: Pentium 133, 32MB RAM nodes 1,4,5: Pentium 133, 32MB RAM These nodes were the most stable, likely because they were on the same hub as the server. The rest were on the other side of a coax uplink, which seemed a bit flaky under load. Problems & Difficulties for a prospective user of a beowulf clusters Like practically any other task in life the difficulty for a first time user will depend on his or her expertise. As was mentioned, Beowulf clusters are usually built using Linux. However using MPI-LAM many other forms of Unicies become available to the user. Once the user establishes his operating system preference, the network topology will have to chosen. Ethernet seems to be a winning choice because of it's availability. Last but not least is the software to drive the cluster. The communication software doesn't have to be written from scratch; instead communication libraries like LAM-MPI can be used. The user then, will have to write his own application employing the communication abstraction layer to achieve his particular goal. To summarize, the decisions in building a beowulf cluster will have to be made in roughly the four following categories: 1. Operating System 2. Network Topology 3. Communication 4. Software From personal experience (we're fairly familiar with everything up to "Software") it seems that writing a massively parallel application is the hardest task of all. Summary & Conclusion The purpose of this assignment was to research and try to build a Linux Beowulf Cluster. Hopefully this paper managed to show that we've successfully scaled both problems. MPI-LAM was chosen to implement the cluster as a result of our research. We've successfully built a node network and run the tests which prove the cluster functions as promised. Although writing software for a parallel cluster seems like beyond the scope of this course (and many magnitudes harder than this project) it's nevertheless something we may want to peruse for the next assignment.
printable version chaos
Everything2 Help