The WFU DEAC Cluster provides the critical infrastructure researchers need to upload research codes reliably, perform large-scale computations, store their actively used results, and trust that their data will persist in the event of a storage failure. A comprehensive list of these services can be found on the Services page.
Below is a list of the hardware resources currently in production in the WFU DEAC Cluster facility.
Current DEAC compute-node configurations:
194 - IBM Blades (8 cores each) - 1,552 cores total
94 - Cisco B-series Blades - 2,412 cores total
5 - IBM GPU Blades (12 cores each) - 60 cores total
2 - UCS GPU Blades (32 cores each) - 64 cores total
Total GPU cores - 12,032 cores
Total x86 cores - 4,088 cores
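As a quick sanity check on the list above, the published per-blade core counts reproduce the x86 total. The short Python sketch below takes the Cisco B-series figure directly from the list, since no per-blade count is given there:

    # Quick sanity check of the x86 core counts listed above.
    ibm_blades     = 194 * 8   # 1,552 cores
    cisco_b_series = 2_412     # total taken directly from the list (no per-blade count given)
    ibm_gpu_blades = 5 * 12    # 60 cores
    ucs_gpu_blades = 2 * 32    # 64 cores

    total_x86 = ibm_blades + cisco_b_series + ibm_gpu_blades + ucs_gpu_blades
    print(total_x86)           # 4088, matching the "Total x86 cores" figure above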
197.86 TB for Research Data:
The WFU DEAC cluster uses multiple storage devices that host shared storage via NFS (Network File System). These high-speed devices present storage to all cluster nodes while providing flexible configurations, quick snapshot backups, and easy growth. Storage device information is as follows (a short sketch for checking these NFS mounts from a compute node appears after the device descriptions):
NetApp FS8040 Storage Device:
Serves as the principal data store for home directories and actively used research data on the cluster. A Solid State Disk flash pool provides fast reads/writes and higher-performance I/O.
EMC VNX 5600 Storage Device:
Provides archive storage for research groups that need to free up space on the primary research storage.
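All of this storage is presented to the compute nodes over NFS, so a simple way to confirm what a given node actually sees is to inspect its mount table. The following is a minimal sketch in Python; it assumes nothing about the actual DEAC export names or mount points and only reports what the node already has mounted:

    #!/usr/bin/env python3
    """List the NFS mounts visible on this cluster node by parsing /proc/mounts.

    A minimal sketch: it makes no assumptions about the actual DEAC export
    names or mount points and only reports what the node already has mounted.
    """

    def nfs_mounts(proc_mounts="/proc/mounts"):
        """Return (device, mount_point, fs_type) tuples for NFS filesystems."""
        mounts = []
        with open(proc_mounts) as fh:
            for line in fh:
                device, mount_point, fs_type = line.split()[:3]
                if fs_type in ("nfs", "nfs4"):
                    mounts.append((device, mount_point, fs_type))
        return mounts

    if __name__ == "__main__":
        for device, mount_point, fs_type in nfs_mounts():
            print(f"{fs_type:5s}  {device}  ->  {mount_point}")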
Parallel Programs and Inter-Processor Communication
Many parallel computation problems require a great deal of coordination traffic between processors and nodes to ensure that calculations remain accurate and consistent. While bandwidth can be a large component of this inter-processor communication (IPC), the messages passed are typically quite small, which makes the latency of creating and transmitting each message a critical performance factor.
Traditional gigabit Ethernet switches typically have latencies in the 50-80 microsecond range. For a CPU capable of 3 gigaflops, which translates to 3 floating-point operations every nanosecond, a single 50-microsecond wait represents roughly 150,000 floating-point operations' worth of idle time, so processors can sit idle for extraordinary lengths of time waiting for data from a participating node.
The industry-standard high-speed interconnect technology available today is InfiniBand (IB). The IB specification and the measured performance of these adapters and switch technologies yield node-to-node message latencies of around 1-2 microseconds.
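One common way to observe these latencies directly is a two-rank ping-pong test. The sketch below is a minimal example, assuming mpi4py is available on the cluster (the filename pingpong.py is arbitrary); it bounces a small message between two ranks, reports the average one-way latency, and converts it into the number of floating-point operations a 3-gigaflop processor could have completed while waiting:

    #!/usr/bin/env python3
    """Two-rank MPI ping-pong latency sketch (assumes mpi4py is installed).

    Run with two ranks placed on different nodes, for example:
        mpirun -np 2 python pingpong.py
    """
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    ITERATIONS = 1000
    message = bytearray(8)          # small 8-byte payload, latency-dominated

    comm.Barrier()
    start = MPI.Wtime()
    for _ in range(ITERATIONS):
        if rank == 0:
            comm.Send(message, dest=1, tag=0)
            comm.Recv(message, source=1, tag=0)
        elif rank == 1:
            comm.Recv(message, source=0, tag=0)
            comm.Send(message, dest=0, tag=0)
    elapsed = MPI.Wtime() - start

    if rank == 0:
        one_way_us = elapsed / (2 * ITERATIONS) * 1e6
        # At ~3 gigaflops, each microsecond of waiting costs ~3,000 FLOPs.
        idle_flops = one_way_us * 3_000
        print(f"average one-way latency: {one_way_us:.2f} us "
              f"(~{idle_flops:,.0f} FLOPs of idle time per message at 3 GFLOPS)")

Running the same test once over the gigabit Ethernet path and once over the InfiniBand path makes the difference described above concrete.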
122 IBM blades in the cluster have been upfitted with the Voltaire 4X DDR Expansion Card (CFFh) and connect via the IB pass-through module to a Voltaire ISR 9024D-M 24-port InfiniBand DDR switch.
Cisco Core Ethernet Environment
The network design and connectivity of the WFU DEAC Cluster follow the standard, best-practice "switch block" approach: two gateway switches provide redundant connectivity to the WFU campus and secure the cluster through redundant firewall switch modules. The back-end connectivity for the computational nodes is driven by two key Cisco technologies, VSS and DFC, which ensure the bandwidth (raw speed) and throughput (packet rates) necessary for high-performance computing. In addition, the serviceability inherent in the VSS technology permits network maintenance concurrent with active computational research.
Looking forward, this recently implemented design change supports several capabilities for research that were not previously possible:
General Notes on HPL:
UCS on HPL:
IBM on HPL: