Paper Title
Scheduling Beyond CPUs for HPC
Authors
Abstract
High performance computing (HPC) is undergoing significant changes. Emerging HPC applications comprise both compute-intensive and data-intensive applications. To meet the intense I/O demand of emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload changes, forces schedulers to consider multiple resources beyond CPUs (e.g., burst buffers) in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based not only on their CPU requirements but also on other schedulable resources such as burst buffers. BBSched formulates the scheduling problem as a multi-objective optimization (MOO) problem and rapidly solves it using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enable system managers to explore potential tradeoffs among various resources and therefore obtain better utilization of all the resources. Trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.
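To make the multi-objective idea concrete, the following is a minimal sketch and not the BBSched implementation. It assumes two objectives (compute nodes used and burst buffer capacity used), a small window of queued jobs, and hypothetical names (`Job`, `pareto_front`) and capacities. For brevity it swaps the multi-objective genetic algorithm named in the abstract for exhaustive enumeration, which is only practical for a handful of jobs; what it illustrates is the kind of Pareto set of tradeoff solutions a system manager could choose from.

```python
from itertools import combinations
from typing import Iterator, List, NamedTuple, Tuple


class Job(NamedTuple):
    name: str
    nodes: int   # requested compute nodes (hypothetical unit)
    bb_gb: int   # requested burst buffer capacity in GB (hypothetical unit)


Selection = Tuple[Tuple[Job, ...], Tuple[int, int]]


def feasible_selections(jobs: List[Job], free_nodes: int,
                        free_bb_gb: int) -> Iterator[Selection]:
    """Yield every non-empty job subset that fits in both free resources."""
    for r in range(1, len(jobs) + 1):
        for subset in combinations(jobs, r):
            nodes = sum(j.nodes for j in subset)
            bb = sum(j.bb_gb for j in subset)
            if nodes <= free_nodes and bb <= free_bb_gb:
                yield subset, (nodes, bb)


def dominates(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if a is at least as good as b in every objective and differs from b."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def pareto_front(jobs: List[Job], free_nodes: int,
                 free_bb_gb: int) -> List[Selection]:
    """Return the feasible selections that no other feasible selection dominates."""
    candidates = list(feasible_selections(jobs, free_nodes, free_bb_gb))
    return [(s, v) for s, v in candidates
            if not any(dominates(other, v) for _, other in candidates)]


# Illustrative queue window; all numbers are made up for the example.
queue = [Job("A", 60, 10), Job("B", 30, 70), Job("C", 40, 20), Job("D", 10, 60)]
front = pareto_front(queue, free_nodes=100, free_bb_gb=100)
for subset, (nodes, bb) in front:
    print([j.name for j in subset], "nodes used:", nodes, "burst buffer GB used:", bb)
```

In this toy window, no single selection maximizes both resources at once: one selection fills all compute nodes while leaving most of the burst buffer idle, and another does the opposite. A CPU-centric policy would consider only the first objective, whereas the Pareto set exposes the tradeoff explicitly.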