NOVEL YALE UNIVERSITY COMPUTER CLUSTER
ACHIEVES RECORD BANDWIDTH
New Haven, April 6, 2006 – Despite the great strides in high-performance computing (HPC), its full potential in science and its broader impact on society have yet to be realized. While HPC to date has been able to leverage increased CPU power, developments have lagged in deploying, accessing, searching, and mining large datasets. Over the past 10 years, input/output (I/O) access and throughput to large-scale (disk) storage has lagged the increase in CPU power by a factor of 10. Some have argued that this problem could be the Achilles’ heel of massively parallel computing (see, e.g., the cover story in the March 23, 2006 issue of Nature).
A novel 57-node computer cluster at Yale University, purchased with the support of the Air Force Office of Scientific Research, promises to pave the way toward correcting the imbalance between computing and I/O. The Yale cluster has now achieved 10 gigabytes per second of aggregate I/O throughput from a single directory in a 14-terabyte single-namespace file system spread across 114 disk drives. The new cluster architecture uses conventional off-the-shelf processing nodes, with the single-namespace parallel file system created from local storage attached to each node. With this design, clusters can become highly effective for data-intensive applications ranging from bioinformatics to machine learning to satellite imagery to commerce to multimedia, at a fraction of the cost of traditional clusters that rely on networked storage.
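As a rough sanity check on these figures (an illustrative back-of-envelope calculation, not a measurement reported in the release), the quoted aggregate throughput corresponds to modest average rates per node and per drive:

    # Back-of-envelope averages implied by the reported figures; illustrative only.
    aggregate_gb_s = 10.0   # reported aggregate I/O throughput (GB/s)
    nodes = 57              # processing nodes in the cluster
    drives = 114            # disk drives behind the single-namespace file system

    print(f"~{aggregate_gb_s / nodes * 1000:.0f} MB/s per node on average")    # ~175 MB/s
    print(f"~{aggregate_gb_s / drives * 1000:.0f} MB/s per drive on average")  # ~88 MB/s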
The Yale researchers address the issue of HPC data storage and access using a ‘divide-and-conquer’ strategy in which local storage units are aggregated into a massive single-namespace file system. Until now, ‘networked storage’ with ‘physically contiguous’ data access pools has been the norm in cluster computing. At Yale, direct-attached data pools are made logically contiguous, thereby closing the gap between compute capability and large-scale storage access and throughput. With this ‘divide-and-conquer’ approach to cluster storage architecture, extremely high application performance is realized on the broad range of problems and applications for which there would otherwise be a significant I/O bottleneck.
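The release does not name the file system software involved, but the throughput-aggregation side of the approach can be illustrated with a minimal sketch: each node writes its share of a dataset to its own direct-attached disk, and the aggregate bandwidth is the sum over nodes. The single-namespace layer that presents those local disks as one parallel file system is assumed rather than shown, and the path and transfer size below are illustrative assumptions.

    # Illustrative MPI sketch (mpi4py): each rank writes its shard to node-local,
    # direct-attached storage, and the aggregate write bandwidth is summed over ranks.
    # The path and transfer size are assumptions, not details from the release.
    import os
    import time
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    chunk = b"\0" * (256 * 1024 * 1024)           # 256 MB written by each rank
    local_path = f"/local/scratch/shard_{rank}"   # hypothetical node-local disk path

    comm.Barrier()
    start = time.time()
    with open(local_path, "wb") as f:
        f.write(chunk)
        f.flush()
        os.fsync(f.fileno())                      # force the data out to the local disk
    comm.Barrier()
    elapsed = time.time() - start

    total_bytes = comm.allreduce(len(chunk), op=MPI.SUM)
    if rank == 0:
        size = comm.Get_size()
        print(f"aggregate ~{total_bytes / elapsed / 1e9:.2f} GB/s across {size} ranks")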
Balanced data-intensive computing requires data throughput to disk of roughly 1 gigabyte per second per terabyte of storage (1 GB/s/TB). Indeed, the so-called StorCloud Grand Challenge Problem is to achieve 1 GB/s/TB. For enterprise and large-scale scientific cluster supercomputing, the current records are about 0.07 GB/s/TB. With the Yale cluster, throughput of 0.7 GB/s/TB has now been achieved, even for data transfers to a single directory whose contents are spread across many compute nodes. This is expected to open a new era in massively parallel HPC.
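The 0.7 GB/s/TB figure follows directly from the numbers quoted above (a simple consistency check, not an additional result):

    # Balance metric from the release: aggregate throughput per unit of storage.
    throughput_gb_s = 10.0   # aggregate I/O throughput (GB/s)
    capacity_tb = 14.0       # single-namespace file system capacity (TB)

    ratio = throughput_gb_s / capacity_tb
    print(f"{ratio:.2f} GB/s/TB")   # ~0.71 GB/s/TB, roughly 10x the prior ~0.07 GB/s/TB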
To enable a focused effort on these data-intensive problems, Yale has formed the Center for Hyperscalable Mathematics and Computing as a joint effort among its Mathematics, Applied Mathematics, and Computer Science departments. The lead faculty involved include Professors Steven Orszag and Ronald Coifman of Mathematics and Steven Zucker of Computer Science. A proposal for a five-year grant to support work at this new Center was submitted to the Department of Energy in March. More than 20 senior faculty members from Yale, Boston University, and Princeton University are involved.