DEAC Cluster

The WFU DEAC Cluster provides the critical infrastructure necessary for researchers to reliably upload research codes, perform large-scale computations, store their actively used results, and have confidence in the persistence of their data in the event of storage failures. A comprehensive list of these services can be found on the Services page.

Below is a list of the hardware resources currently in production within the WFU DEAC Cluster facility.

64-bit Hardware

Current cluster configuration:

  • 196 IBM blades (8 cores each) - 1568 cores total
    • 76 blades with 16GB RAM
    • 30 blades with 48GB RAM
    • 84 blades with 96GB RAM
    • 6 blades with 144GB RAM
  • 56 Cisco B-series blades (128GB RAM each) - 1004 cores total
    • 29 blades with 16 cores
    • 27 blades with 20 cores
  • Total: 2572 cores (recomputed in the sketch below)
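
The totals above follow directly from the per-blade counts; as a quick check, the short Python sketch below recomputes them using only the figures in the list (nothing is assumed beyond what is listed):

    # Recompute the cluster core totals from the inventory above.
    ibm_blades = {"16GB": 76, "48GB": 30, "96GB": 84, "144GB": 6}
    ibm_cores = sum(ibm_blades.values()) * 8      # 196 blades x 8 cores = 1568

    cisco_cores = 29 * 16 + 27 * 20               # 464 + 540 = 1004

    total_cores = ibm_cores + cisco_cores
    print(ibm_cores, cisco_cores, total_cores)    # 1568 1004 2572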

Storage

60 TB for Research Data: The WFU DEAC cluster utilizes multiple storage devices attached to a SAN (Storage Area Network) through which several disk server infrastructure nodes export the space. The specific storage devices, configurations, and primary functions are:
  • IBM DS4200 Storage Device: principal data store for home directories and actively used research data on the cluster.
    • 13TB via 32 SATA drives, 500GB capacity, 7200 RPM
    • 27TB via 48 SATA drives, 750GB capacity, 7200 RPM
  • IBM DS3400 Storage Device: provides storage for the infrastructure systems and services and supports the large data stores needed for research data.
    • 5.4TB via 24 SAS drives, 300GB capacity, 10000 RPM (infrastructure, VMs)
    • 20TB via 24 SATA drives, 1TB capacity, 7200 RPM (research data)

Network

Parallel Programs and Inter-processor Communication
Many parallel computations require a great deal of coordination traffic between processors/nodes to keep the calculations accurate and consistent. While bandwidth can be a significant component of this inter-processor communication (IPC), the messages passed are typically quite small, making the latency of creating and transmitting each message a critical performance factor.

Traditional gigabit Ethernet switches typically have latencies in the 50-80 microsecond range. For a CPU capable of 3 gigaflops, that is, 3 floating-point operations every nanosecond, each such delay represents well over a hundred thousand forgone operations, so processors can sit idle for long stretches waiting for data from a participating node.

The industry-standard high-speed interconnect technology available today is Infiniband (IB). The IB specification and measured performance of these adapters and switch technologies yield node-to-node message latencies of around 1-2 microseconds.
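
To put those latency figures in perspective, the short Python sketch below converts each latency range into the number of floating-point operations a 3-gigaflop processor could have executed while waiting on a single message (a back-of-the-envelope illustration only; real codes overlap communication and computation to varying degrees):

    # Floating-point operations forgone while a 3 GFLOPS core waits on one message.
    FLOPS = 3e9    # 3 gigaflops = 3 floating-point operations per nanosecond

    latencies = {
        "gigabit Ethernet (50-80 us)": (50e-6, 80e-6),
        "Infiniband (1-2 us)":         (1e-6, 2e-6),
    }

    for name, (low, high) in latencies.items():
        print(f"{name}: {low * FLOPS:,.0f} - {high * FLOPS:,.0f} operations idle per message")

    # gigabit Ethernet (50-80 us): 150,000 - 240,000 operations idle per message
    # Infiniband (1-2 us): 3,000 - 6,000 operations idle per message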

Voltaire Infiniband
122 IBM blades in the cluster have been fitted with the Voltaire 4X DDR Expansion Card (CFFh) and connect via the IB pass-through module to a Voltaire ISR 9024D-M 24-port Infiniband DDR switch.
Cisco Core Ethernet Environment
The network design and connectivity available in the WFU DEAC Cluster follow the standard, best-practices approach of a "switch block" design: two gateway switches provide redundant connectivity to the WFU campus and security for the cluster through redundant firewall switch modules. The back-end connectivity for the computational nodes is driven by two key Cisco technologies, VSS and DFC, to ensure the availability of the bandwidth (raw speed) and throughput (packet rates) necessary for high-performance computing. In addition, the serviceability inherent in the VSS technology permits network maintenance concurrent with active computational research.

Looking forward, this recently implemented design change supports several capabilities for research that were not previously possible:
  • 10 Gbps connectivity internally between compute node chassis
  • 10 Gbps network connectivity to NC-REN and other research networks
  • Dedicated optical connectivity to another site via NC-REN and NLR or I2 (DWDM)

HPL

General notes on HPL:
  • High Performance Linpack (HPL) is the benchmark used to rank clusters on the Top500 list.
  • The program was used to benchmark the peak performance of the DEAC cluster.
  • For the Basic Linear Algebra Subprograms (BLAS) libraries, ATLAS was built and tuned for the IBM BladeCenter and Cisco UCS blades.
  • The module openmpi/1.6-intel was used for spawning the MPI processes.
  • HPL on the UCS blades:
    • HPL was launched to utilize all available UCS blades (448 cores).
    • The plot below shows peak performance in GFLOPS as the matrix size N increases.
    • Top peak performance reached: 3.32 TFLOPS.
  • HPL on the IBM blades:
    • HPL was launched to utilize all BladeCenter chassis types:
      • 96GB, Ethernet, 14 blades each (BC03, BC04, BC05)
      • 96GB, Infiniband, 14 blades each (BC06, BC07, BC08)
      • 48GB, Infiniband, 14 blades each (BC12, BC13)
      • 16GB, Ethernet, 14 blades each (BC09, BC10, BC11)
      • 16GB, Infiniband, 14 blades each (BC01, BC02, BC14)
    • The plot below shows peak performance in GFLOPS as the matrix size N increases (a sketch for estimating the matrix size N from available memory follows this list).
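
The exact HPL.dat inputs for these runs are not recorded on this page, but a common rule of thumb is to choose the matrix size N so that the N x N array of 8-byte doubles fills roughly 80% of the aggregate memory of the nodes in the run. The Python sketch below estimates N for the UCS run under that assumption; the 28-blade figure is itself an assumption derived from 448 cores at 16 cores per blade and is not stated in the original results.

    from math import sqrt, floor

    # Rule-of-thumb HPL problem size: an N x N matrix of 8-byte doubles
    # filling ~80% of aggregate memory (the actual N values used in the
    # DEAC runs are not recorded here).
    def hpl_n_estimate(total_mem_bytes, fraction=0.8):
        return floor(sqrt(fraction * total_mem_bytes / 8))

    # Assumed UCS partition: 28 blades (448 cores / 16 cores per blade),
    # each with 128GB RAM as listed in the hardware section above.
    ucs_mem_bytes = 28 * 128 * 1024**3
    print(hpl_n_estimate(ucs_mem_bytes))    # roughly 620,000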
