Overview

Our HPC Resources

The main components of our infrastructure


The WFU DEAC Cluster provides the critical infrastructure necessary for researchers to reliably upload research codes, perform large-scale computations, store their actively utilized results, and have confidence in the persistence of their data in the event of storage failures. A comprehensive list of these services can be found on the Services page.

Below is a list of the hardware resources currently in production within the WFU DEAC Cluster facility.


64-bit Hardware

Current DEAC compute-node configurations:

194 - IBM Blades (8 cores each) - 1,552 cores total
  • 75 Blades with 16GB RAM
  • 30 Blades with 48GB RAM
  • 83 Blades with 96GB RAM
  • 6 Blades with 144GB RAM

94 - Cisco B-series Blades - 2,412 cores total
  • 29 Blades with 16 cores -- 128GB RAM
  • 27 Blades with 20 cores -- 128GB RAM
  • 22 Blades with 32 cores -- 128GB RAM
  • 16 Blades with 44 cores -- 256GB RAM

5 - IBM GPU Blades (12 cores each) - 60 cores total
  • 4 GPU cards per node
  • All are NVIDIA Tesla S2050
  • 448 CUDA cores per Tesla

2 - UCS GPU Blades (32 cores each) - 64 cores total
  • 1 GPU card per node
  • All are NVIDIA Tesla M6
  • 1,536 CUDA cores per Tesla

Total GPU cores - 12,032 cores
Total x86 cores - 4,088 cores
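The totals above follow directly from the per-type counts. As a quick sanity check when node counts change, the short sketch below simply re-derives them; it is illustrative only and not part of the cluster inventory.

    /* core_totals.c - re-derive the DEAC core totals from the node counts above
     * (illustrative sanity check; all figures are copied from this page). */
    #include <stdio.h>

    int main(void)
    {
        int ibm_cpu = 194 * 8;                               /* IBM blades, 8 cores each      */
        int cisco   = 29 * 16 + 27 * 20 + 22 * 32 + 16 * 44; /* Cisco B-series blades         */
        int ibm_gpu = 5 * 12;                                /* IBM GPU blades, 12 cores each */
        int ucs_gpu = 2 * 32;                                /* UCS GPU blades, 32 cores each */

        int cuda    = 5 * 4 * 448 + 2 * 1 * 1536;            /* Tesla S2050 + Tesla M6 cores  */

        printf("x86 cores:  %d\n", ibm_cpu + cisco + ibm_gpu + ucs_gpu); /* 4,088  */
        printf("CUDA cores: %d\n", cuda);                                /* 12,032 */
        return 0;
    }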

Storage

197.86 TB for Research Data:
The WFU DEAC cluster utilizes multiple storage devices hosting shared storage via NFS (Network File System). These high-speed storage devices map storage to all cluster nodes while providing flexible configurations, quick snapshot backup capability, and easy growth. Storage device information is as follows:

NetApp FS8040 Storage Device:
Its primary function is to serve as the principal data store for home directories and actively used research data on the cluster. A Solid State Disk flash pool provides fast reads/writes and higher-performance I/O:

  • 14TB via 24 Solid State Drives, 800GB capacity
  • 194TB via 120 SATA drives, 2TB capacity, 7200 RPM
  • 177.86TB of usable storage via NFS

EMC VNX 5600 Storage Device:
Provides archive storage for research groups that need to clear space on primary storage.

  • 32TB via 8 NL-SAS drives, 2TB capacity, 7200 RPM
  • 20TB of usable storage via NFS

Network

Parallel Programs and Inter-Processor Communication

Many parallel computation problems require a great deal of coordination traffic between processors and nodes to ensure accurate, consistent results. While bandwidth can be a significant component of this inter-processor communication (IPC), the messages passed are typically quite small, which makes the latency of creating and transmitting each message a critical performance factor.

Traditional gigabit Ethernet switches typically have latencies in the 50-80 microsecond range. For CPUs capable of 3 gigaflops, that is, 3 floating point operations every nanosecond, this can leave processors idle for extraordinary lengths of time while waiting for data from a participating node.

The industry-standard high-speed interconnect technology available today is Infiniband (IB). The IB specification, and the measured performance of these adapters and switch technologies, yield node-to-node message latencies of around 1-2 microseconds.
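To make those latency figures concrete, the usual way to measure node-to-node message latency is a small MPI ping-pong test. The sketch below is a minimal, hedged example, not a DEAC-provided tool; the iteration count and message size are illustrative assumptions.

    /* ping_pong.c - minimal MPI ping-pong latency sketch (illustrative only).
     * Launch with at least two ranks placed on different nodes to measure
     * node-to-node small-message latency over Ethernet or Infiniband. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int iters = 10000;  /* assumed iteration count */
        char msg = 0;             /* 1-byte payload: measures latency, not bandwidth */

        MPI_Barrier(MPI_COMM_WORLD);         /* every rank synchronizes here */

        if (size >= 2 && rank < 2) {         /* only ranks 0 and 1 do the ping-pong */
            double t0 = MPI_Wtime();
            for (int i = 0; i < iters; i++) {
                if (rank == 0) {
                    MPI_Send(&msg, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(&msg, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                } else {
                    MPI_Recv(&msg, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(&msg, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t1 = MPI_Wtime();
            if (rank == 0)
                printf("approximate one-way latency: %.2f microseconds\n",
                       (t1 - t0) / (2.0 * iters) * 1e6);
        }

        MPI_Finalize();
        return 0;
    }

Compiled with an MPI wrapper compiler (for example mpicc) and launched with one rank per node, a run of this sort produces the microsecond-scale latency figures quoted above.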

Voltaire Infiniband

122 IBM blades in the cluster have been fitted with the Voltaire 4X DDR Expansion Card (CFFh) and connect through the IB pass-through module to a Voltaire ISR 9024D-M 24-port Infiniband DDR switch.

Cisco Core Ethernet Environment

The network design and connectivity of the WFU DEAC Cluster follow the standard, best-practices approach of a "switch block" design: two gateway switches provide redundant connectivity to the WFU campus and secure the cluster through redundant firewall switch modules. The back-end connectivity for the computational nodes is driven by two key Cisco technologies, VSS and DFC, to ensure the availability of the bandwidth (raw speed) and throughput (packet rates) necessary for high-performance computing. In addition, the serviceability inherent in VSS permits network maintenance concurrent with active computational research.

Looking forward, this recently implemented design change supports several capabilities for research that had not previously been possible:

  • 10 Gbps connectivity internally between compute node chassis
  • 10 Gbps network connectivity to NC-REN and other research networks
  • Dedicated optical connectivity to another site via NC-REN and NLR or I2 (DWDM)

High Performance Linpack (HPL)

General Notes on HPL:
  • High Performance Linpack (HPL) is the benchmark used to rank clusters for the Top500 list.
  • The program was utilized to benchmark the peak performance of the DEAC cluster.
  • For the Basic Linear Algebra Subprograms (BLAS) libraries, ATLAS was built and tuned for the IBM BladeCenter and Cisco UCS blades.
  • The module openmpi/1.6-intel was used for spawning multiple processes.

UCS on HPL:
  • HPL was launched to utilize all available UCS blades (448 cores).
  • The plot below shows peak performance in GFLOPS as the matrix size N increases.
  • Top peak performance reached = 3.32 TeraFLOPS
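HPL results are normally judged against theoretical peak, R_peak = cores x clock rate x floating point operations per cycle. The sketch below is a back-of-the-envelope calculation using the 3.32 TeraFLOPS / 448-core result above; the clock rate and FLOPs-per-cycle values are assumptions for illustration only, not documented properties of the UCS blades, so substitute the actual processor specifications to get a meaningful efficiency number.

    /* hpl_efficiency.c - rough HPL efficiency estimate for the 448-core UCS run.
     * The clock rate and FLOPs/cycle below are ASSUMED for illustration. */
    #include <stdio.h>

    int main(void)
    {
        double rmax_gflops     = 3320.0;  /* measured: 3.32 TeraFLOPS from the run above */
        int    cores           = 448;     /* all available UCS cores in the run          */
        double clock_ghz       = 2.6;     /* ASSUMPTION: example clock rate              */
        double flops_per_cycle = 8.0;     /* ASSUMPTION: e.g. 8 double-precision FLOPs   */

        double rpeak_gflops = cores * clock_ghz * flops_per_cycle;

        printf("R_max  = %.0f GFLOPS (%.2f GFLOPS/core)\n",
               rmax_gflops, rmax_gflops / cores);
        printf("R_peak = %.0f GFLOPS (from the assumed clock and FLOPs/cycle)\n",
               rpeak_gflops);
        printf("Efficiency = %.1f%%\n", 100.0 * rmax_gflops / rpeak_gflops);
        return 0;
    }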


IBM on HPL:
  • HPL was launched on the IBM blades to utilize all BladeCenter chassis types:
    • 96GB, Ethernet, 14 Blades (BC03, BC04, BC05)
    • 96GB, Infiniband, 14 Blades (BC06, BC07, BC08)
    • 48GB, Infiniband, 14 Blades (BC12, BC13)
    • 16GB, Ethernet, 14 Blades (BC09, BC10, BC11)
    • 16GB, Infiniband, 14 Blades (BC01, BC02, BC14)
  • The plot below shows peak performance in GFLOPS as the matrix size N increases.
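The problem size N is the main knob in runs like these: HPL factors an N x N matrix of double-precision values spread across the participating nodes, so N is normally chosen so that the matrix fills a large fraction of aggregate RAM, which is why the chassis memory sizes listed above matter. The sketch below illustrates that common rule of thumb; the 80% memory fraction and the example node group are illustrative assumptions, not documented DEAC settings.

    /* hpl_problem_size.c - rule-of-thumb HPL problem size from aggregate memory.
     * HPL factors an N x N matrix of doubles (8 bytes each); a common guideline
     * is to size N so the matrix uses roughly 80% of total RAM.  The fraction
     * and the example node group below are illustrative assumptions. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int    blades   = 14;    /* e.g. a 14-blade, 96GB group from the list above */
        double gb_per   = 96.0;  /* RAM per blade                                   */
        double fraction = 0.80;  /* ASSUMPTION: typical memory fraction for HPL     */

        double mem_bytes = blades * gb_per * 1024.0 * 1024.0 * 1024.0;
        long   n         = (long)sqrt(fraction * mem_bytes / 8.0); /* 8 bytes/double */

        printf("Suggested HPL problem size N ~ %ld\n", n);
        return 0;
    }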