Recommended prerequisites:
You will have a better foundation for this workshop if you have attended these earlier workshops, from this semester or prior semesters:
- The World is at your Command...line
- Intro to High Performance Computing (HPC) with the Rice NOTS cluster
Alternatively, you should be comfortable running a SLURM batch job on a high-performance computing cluster and have experience with basic file navigation and manipulation using Bash shell commands.
Overview:
AI models are everywhere, and it seems like everyone at Rice wants to build or use them as part of their research! Rice has invested heavily to help you achieve those goals. This course introduces Rice's new GPU-centric, high-performance computing cluster for AI/ML workloads: Rice's AI Network GPU Engine (RANGE).
Please note: This workshop covers RANGE, not RAPID. RAPID is also a cluster for running AI/ML workloads, but despite the similar name and purpose, it is a completely separate computing system. RANGE uses newer, larger GPUs, and has a much greater capacity to run large AI models across multiple GPUs.
This workshop will cover the concepts needed to run or train resource-intensive AI models on RANGE, including:
- GPUs and related computing concepts for AI/ML models
- Overview of RANGE architecture and capabilities
- Requesting an allocation of GPU time on RANGE
- Understanding job scheduling: partitions, wall times, and the SLURM scheduler
- Storing and managing your data on RANGE
- Managing software environments using environment modules (Lmod)
- Managing and monitoring jobs
- Demonstration of running a simple machine learning application
Contact information:
Please contact researchdata@rice.edu if you have questions about the Data@Rice workshop series.