UA’s free high-performance computing (HPC) resources are popular, and demand is increasing. Researchers from a broad spectrum of disciplines are exploring new areas such as machine learning (training computers to learn patterns from data) and deep learning (a form of machine learning built on multi-layered neural networks). The UITS Research Technologies group has been working to meet this new demand head-on.
El Gato Reconfiguration
For the first time, research principal investigators will receive a standard allocation of 7,000 CPU-hours per month on the newly revamped El Gato HPC system. El Gato was launched in 2013 with a National Science Foundation grant awarded to the Department of Astronomy. The astronomers had priority use of 70% of the system’s capacity, and the remaining 30% was available to campus researchers. When the grant expired last year, the system could have been scrapped. Instead, Research Technologies staff decided to maximize computing availability for the campus.
Chris Reidy, Systems Administrator Principal for the Research Data Center, pulled the processing nodes from El Gato and reconfigured them. The system still specializes in GPU (graphics processing unit) architecture, with 50 dual-GPU nodes and 40 single-GPU nodes. This architecture is ideal for graphics processing, such as creating images, and it also allows extremely fast processing of other data. The Department of Astronomy decided to invest in an additional 45 nodes for its priority use. Chris and other UITS staff took turns installing 3,100 feet (about 0.6 miles) of new cabling. That’s enough cable to run from the Computer Center to the football stadium!
The HPC infrastructure team then updated El Gato’s operating system to the latest version so that it can run modern code. They also put El Gato on the same scheduling system as the Research Data Center’s Ocelote system, making it easier for researchers who use both systems to queue their jobs.
The Ocelote system now has additional compute hours available as well. After examining usage patterns, Research Technologies staff determined they could increase the previous allocation of 24,000 CPU-hours per month by 50% without affecting other system needs. Ocelote now offers a total of 36,000 CPU-hours per month.
This year is also a refresh year in the Research Data Center, and planning is underway for a new system. The refurbished El Gato is serving as a test case for a new management tool that is expected to make system management more efficient. When the new system comes online in the first quarter of 2020, available compute time will increase further, giving campus researchers even more room to expand their imaginations.