Sequence-to-sequence models for workload interference

Authors

Prats, David Buchaca; Marcual, Joan; Berral, Josep Lluís; Carrera, David

Abstract

Co-scheduling jobs in data centers is challenging: jobs can compete for resources, leading to severe slowdowns or failed executions. Efficient job placement in environments where resources are shared requires awareness of how jobs interfere during execution, going far beyond ineffective resource overbooking techniques. Current techniques, most of which already involve machine learning and job modeling, are based on summarizing workload behavior across time instead of focusing on the effective job requirements at each instant of the execution. In this work we propose a methodology for modeling the co-scheduling of jobs in data centers, based on their behavior towards resources and execution time, using sequence-to-sequence models based on recurrent neural networks. The goal is to forecast the resource footprint of co-executed jobs along their execution time, from the profiles shown by the individual jobs, to enhance the placement decisions of resource managers and schedulers. The methods presented here are validated using High Performance Computing benchmarks based on different frameworks (such as Hadoop and Spark) and applications (CPU bound, IO bound, machine learning, SQL queries...). Experiments show that the model can correctly identify the resource usage trends of previously seen and even unseen co-scheduled jobs.
