Resource Utilization Monitoring for Raw Data Query Processing

Paper Title

Resource Utilization Monitoring for Raw Data Query Processing

Authors

Mayank Patel, Minal Bhise

Abstract

Scientific experiments, simulations, and modern applications generate large amounts of data. Data is stored in raw format to avoid the high loading time of traditional database management systems. Researchers have proposed many techniques to improve query execution time on raw data and to reduce data loading time for traditional systems. The core of all the proposed techniques is efficient utilization of resources, achieved by processing only the required data or by reducing operations on data. Caching processed data in main memory or on disk can address this issue and avoid repeated processing of data. However, resource limitations such as main memory space, storage IO speed, and additional storage space requirements on disk need to be considered to provide reliable and scalable solutions for cloud or in-house deployments. This paper presents improvements to the raw data query processing framework by integrating a resource monitoring module. The experiments were performed using the Sloan Digital Sky Survey (SDSS), a well-known scientific dataset. Analysis of the monitored resources revealed that sampling queries had the lowest resource utilization. PostgresRAW can answer simple 0-JOIN queries faster than PostgreSQL, whereas complex queries with one or more JOINs need to be answered using PostgreSQL to reduce workload execution time (WET). The results section discusses the resource requirements of simple, complex, and sampling queries. The analysis of query types and resource utilization patterns assisted in proposing Query Complexity Aware (QCA) and Resource Utilization Aware (RUA) data partitioning techniques for raw engines and DBMS to reduce cost or data-to-result time.
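
The split between 0-JOIN queries and queries with one or more JOINs can be illustrated with a minimal Python sketch. It is not the paper's QCA technique, which partitions data rather than routing individual queries; it only shows the idea of classifying a query by its JOIN count. The function names, the engine labels, and the regex-based JOIN counting are assumptions made for illustration; the abstract does not describe how queries are classified.

```python
import re

# Hypothetical engine labels. The abstract names the two systems,
# but it does not define any routing interface.
RAW_ENGINE = "PostgresRAW"
DBMS = "PostgreSQL"


def count_joins(sql: str) -> int:
    """Count explicit JOIN keywords as a crude proxy for query complexity.

    Assumption: only the explicit JOIN keyword is counted. Implicit joins
    (comma-separated tables in FROM) are missed, and the word "join"
    inside a string literal would be counted by mistake.
    """
    return len(re.findall(r"\bjoin\b", sql, flags=re.IGNORECASE))


def route_query(sql: str) -> str:
    """Send 0-JOIN queries to the raw engine, all others to the DBMS."""
    return RAW_ENGINE if count_joins(sql) == 0 else DBMS


if __name__ == "__main__":
    # Illustrative queries only; the table and column names are placeholders.
    simple = "SELECT ra, dec FROM photoobj WHERE ra > 180"
    joined = ("SELECT p.ra FROM photoobj p "
              "JOIN specobj s ON p.objid = s.bestobjid")
    print(route_query(simple))
    print(route_query(joined))
```

A real classifier would more likely parse the SQL or inspect the query plan than match keywords, and would also need a rule for sampling queries, which the abstract reports as the least resource-intensive type.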
