What is a cluster?

From WakeDEAC

Computational clusters, traditionally called Beowulf clusters or, when based on the Linux operating system, Linux clusters, vary greatly in design depending on the problems they are meant to solve. There are, however, some commonalities that span different cluster designs.

Cluster Types

A cluster is a set of distinct computer systems that are linked together to perform sets of calculations. These computers can be a heterogeneous collection of workstations that share some common set of tools needed by all of the calculations. For example, the SETI@home project uses a wide variety of hardware and operating systems as a form of distributed cluster to analyze an extremely large data set. In the SETI@home project, the freely downloadable software contains all the tools necessary to perform its calculations.

A more common cluster setup uses almost identical hardware running almost identical OS configurations. This makes management of the cluster much simpler: adding new computers (or nodes) is a simple matter of cloning the setup onto the new node. In the SETI@home example above, the project managers do not directly support the computers their program runs on; at worst, each computer has its own system manager (the owner of the home-based PC). Clearly, this situation is not desirable in a larger-scale facility.

Minimal Hardware

In addition to the nodes that provide the CPU power (compute nodes), a central server is typically deployed as well. This server provides several benefits. First, since distributing tasks to the compute nodes requires a lot of trust, the server node typically acts as a gateway (and firewall) between the compute nodes and the (untrustworthy) Internet. Second, in acting as the gateway, the server provides a central login point where users develop their applications. This central location accepts the tasks, or "job requests", from the users and distributes the workload appropriately across the cluster. A job scheduler determines which job requests should be processed now and which should be queued until a later time. This determination is governed by a scheduling policy set by the administrator, typically with user input. Third, the server acts as a central location for managing the compute nodes at the operating system and hardware level.
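The job scheduler's role can be illustrated with a minimal sketch. It assumes a very simple policy that is not taken from any particular scheduler: jobs are ordered by a priority number (lower runs first), ties are broken by submission order, and a job starts only when enough compute nodes are free. The names `Job`, `Scheduler`, `submit`, `dispatch`, and `finish` are hypothetical and exist only for this example.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Job:
    # Ordering uses (priority, seq) only; lower priority value runs first.
    priority: int
    seq: int
    name: str = field(compare=False)
    nodes_needed: int = field(compare=False)


class Scheduler:
    """Toy scheduler: strict priority order, no backfilling (an assumed policy)."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.queue = []          # heap of waiting jobs
        self._seq = count()      # submission counter used to break ties

    def submit(self, name, nodes_needed, priority=0):
        """Accept a job request and place it in the queue."""
        job = Job(priority, next(self._seq), name, nodes_needed)
        heapq.heappush(self.queue, job)

    def dispatch(self):
        """Start queued jobs while the job at the head of the queue fits.

        Returns the names of the jobs that were started.
        """
        started = []
        while self.queue and self.queue[0].nodes_needed <= self.free_nodes:
            job = heapq.heappop(self.queue)
            self.free_nodes -= job.nodes_needed
            started.append(job.name)
        return started

    def finish(self, nodes_released):
        """Return nodes to the free pool when a job completes."""
        self.free_nodes += nodes_released
```

With this policy, a small job waiting behind a large one stays queued even if it would fit on the free nodes. Real scheduling policies are usually more elaborate; the sketch only shows how a policy decides between running a job request now and holding it in the queue.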

In addition to the server, some internal cluster networking is required so that the server can distribute the job tasks and the compute nodes can talk to each other. For parallel applications, Fast Ethernet (100 Mbps) is an absolute minimum. Faster connectivity options are available, including Gigabit Ethernet, Fibre Channel, and Myrinet. The choice of networking infrastructure is critically tied to the types of problems that the cluster is designed to solve.
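To give a rough sense of why link speed matters, the sketch below estimates how long it takes to move a block of data between two nodes at a given nominal link rate. It is an idealized calculation: it assumes the full nominal bandwidth is available and ignores latency and protocol overhead, both of which also affect parallel applications. The function name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(size_bytes, link_mbps):
    """Idealized time to send size_bytes over a link rated at link_mbps.

    Assumes 1 Mbps = 1,000,000 bits per second and no overhead or latency.
    """
    bits = size_bytes * 8
    return bits / (link_mbps * 1_000_000)


one_gigabyte = 10**9  # bytes

fast_ethernet = transfer_seconds(one_gigabyte, 100)      # 80.0 seconds
gigabit_ethernet = transfer_seconds(one_gigabyte, 1000)  # 8.0 seconds
```

Under these assumptions, an application that exchanges a gigabyte of data between nodes spends about 80 seconds communicating over Fast Ethernet, compared with about 8 seconds over Gigabit Ethernet.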

Those are the basic components required to build a cluster. For a small cluster that will not grow significantly, these components could be enough. Any plan for expansion requires additional infrastructure beyond what is mentioned here.