HPCache: memory-efficient OLAP through proportional caching revisited
The VLDB Journal (IF 4.2), Pub Date: 2023-12-22, DOI: 10.1007/s00778-023-00828-7
Hamish Nicholson, Periklis Chrysogelos, Anastasia Ailamaki

Analytical engines rely on in-memory data caching to avoid storage accesses and provide timely responses by keeping the most frequently accessed data in memory. Purely frequency- and time-based caching decisions, however, are a proxy for the expected query execution speedup only when storage accesses are significantly slower than in-memory query processing. Fast storage, by contrast, offers loading times that approach fully in-memory query response times, so purely frequency-based statistics cannot capture the impact of a caching decision on query execution. For example, caching the input of a frequent query that spends most of its time processing joins is less beneficial than caching a page for a slightly less frequent but scan-heavy query. Thus, existing caching policies waste valuable memory space caching input data that offers little-to-no acceleration for analytics. This paper proposes HPCache, a buffer management policy that enables fast analytics on high-bandwidth storage by efficiently using the available in-memory space. HPCache caches data based on its speedup potential instead of relying on frequency-based statistics. We show that, with fast storage, the benefit of in-memory caching varies significantly across queries; therefore, we quantify the efficiency of caching decisions and formulate an optimization problem. We implement HPCache in Proteus and show that (i) estimating speedup potential improves memory space utilization, and (ii) simple runtime statistics suffice to infer speedup. We show that HPCache achieves up to a 1.75x speedup over frequency-based caching policies by caching column proportions and automatically tuning them. Overall, HPCache enables efficient use of the in-memory space for input caching in the presence of fast storage, without requiring workload predictions.
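
The contrast between frequency-based and speedup-based caching can be illustrated with a small sketch. This is not the HPCache algorithm or the optimization problem formulated in the paper, neither of which is given in the abstract. It is a toy greedy fractional allocation that assumes the time saved grows linearly with the cached proportion of a column. The `Column` fields, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    size_gb: float   # memory needed to cache the whole column
    freq: float      # accesses per time window (made-up statistic)
    saved_s: float   # assumed seconds saved per access when fully cached


def by_frequency(col):
    """Frequency-only score: ignores how much caching actually helps."""
    return col.freq


def by_benefit_density(col):
    """Assumed score: seconds saved per time window, per GB of memory."""
    return col.freq * col.saved_s / col.size_gb


def allocate(columns, budget_gb, score):
    """Fill the budget greedily in descending score order.

    Returns the cached proportion (0.0 to 1.0) of each column; the last
    column that fits only partly receives a fractional share.
    """
    remaining = budget_gb
    plan = {}
    for col in sorted(columns, key=score, reverse=True):
        cached_gb = min(col.size_gb, remaining)
        plan[col.name] = cached_gb / col.size_gb
        remaining -= cached_gb
    return plan


def saved_seconds(columns, plan):
    """Total time saved, assuming benefit is linear in the cached proportion."""
    return sum(c.freq * c.saved_s * plan[c.name] for c in columns)


# Hypothetical workload mirroring the abstract's example: a frequent
# join-heavy query gains little from caching its input, while a slightly
# less frequent scan-heavy query gains a lot.
cols = [
    Column("join_heavy_input", size_gb=10.0, freq=10.0, saved_s=0.1),
    Column("scan_heavy_input", size_gb=10.0, freq=8.0, saved_s=1.0),
]

freq_plan = allocate(cols, 10.0, by_frequency)
density_plan = allocate(cols, 10.0, by_benefit_density)
print("frequency-based:", freq_plan, saved_seconds(cols, freq_plan))
print("benefit-based:  ", density_plan, saved_seconds(cols, density_plan))
```

With these made-up numbers, the frequency ranking spends the whole budget on the join-heavy input, while the benefit ranking caches the scan-heavy input and saves more time overall. With a larger budget, the lower-ranked column receives a partial share, which is the sense in which this sketch caches column proportions.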




Updated: 2023-12-22