Paper Title
CTDGM: A Data Grouping Model Based on Cache Transaction for Unstructured Data Storage Systems
Authors
Abstract
Cache prefetching has become the mainstream data access optimization strategy in data centers. However, the rapid growth of unstructured data generates massive numbers of pairwise access relationships, which impose a heavy computational burden on existing prefetching models and severely degrade data access performance. We propose a cache-transaction-based data grouping model (CTDGM) that addresses these problems by optimizing the feature representation method and the grouping efficiency. First, we define the cache transaction and propose a method for extracting the cache transaction feature (CTF). Second, we design a data chunking algorithm based on CTF and spatiotemporal locality to make relationship calculation more efficient. Third, we build CTDGM by constructing a relation graph that partitions data into independent groups according to the strength of their access relations. In experiments on the MSR dataset, compared with state-of-the-art methods, our algorithm increases the cache hit rate by 12% on average with a small cache (0.001% of all the data), which in turn reduces the number of data I/O accesses by 50% when the cache size is less than 0.008% of all the data.
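The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the definition of a cache transaction, the CTF extraction method, or the chunking step, so the code below makes simplifying assumptions. A transaction is assumed to be a fixed-size run of consecutive accesses, relation strength is assumed to be the number of transactions two blocks share, and groups are the connected components of the graph that keeps only edges at or above a threshold. The names split_transactions, relation_strength, group_blocks, window, and min_strength are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def split_transactions(trace, window):
    """Assumption: a transaction is a fixed-size run of consecutive accesses."""
    return [trace[i:i + window] for i in range(0, len(trace), window)]


def relation_strength(transactions):
    """Count how many transactions each unordered pair of blocks shares."""
    strength = defaultdict(int)
    for tx in transactions:
        for a, b in combinations(sorted(set(tx)), 2):
            strength[(a, b)] += 1
    return strength


def group_blocks(trace, window=3, min_strength=2):
    """Return the connected components of the thresholded relation graph."""
    parent = {block: block for block in trace}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    strengths = relation_strength(split_transactions(trace, window))
    for (a, b), count in strengths.items():
        if count >= min_strength:  # keep only strong relations as edges
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for block in parent:
        groups[find(block)].add(block)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "d", "e", "f", "g"]
    print(group_blocks(trace))
```

In this toy trace only blocks a and b share two transactions, so they form one group and every other block stays in a group of its own. A prefetcher built on such groups would load the whole group when any member is requested; thresholding the graph is what keeps the number of tracked pairwise relations manageable.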