DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression

Paper title

DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression

Authors

Jisung Park, Jeonggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu

Abstract

Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center. To maximize data-reduction efficiency, existing post-deduplication delta-compression techniques perform delta compression along with traditional data deduplication and lossless compression. Unfortunately, we observe that existing techniques achieve significantly lower data-reduction ratios than the optimal due to their limited accuracy in identifying similar data blocks. In this paper, we propose DeepSketch, a new reference search technique for post-deduplication delta compression that leverages the learning-to-hash method to achieve higher accuracy in reference search for delta compression, thereby improving data-reduction efficiency. DeepSketch uses a deep neural network to extract a data block's sketch, i.e., to create an approximate data signature of the block that can preserve similarity with other blocks. Our evaluation using eleven real-world workloads shows that DeepSketch improves the data-reduction ratio by up to 33% (21% on average) over a state-of-the-art post-deduplication delta-compression technique.
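The abstract gives no implementation details, so the following is only a toy illustration of the general idea of sketch-based reference search, not the authors' method. It assumes a hand-written SimHash (`toy_sketch`) as a stand-in for the learned sketch that DeepSketch extracts with a deep neural network, and it uses zlib's preset-dictionary feature as a stand-in for a real delta encoder. All function names, the 64-bit sketch width, and the distance threshold are hypothetical choices made for this example.

```python
import hashlib
import zlib

SKETCH_BITS = 64  # assumed sketch width, chosen only for illustration


def toy_sketch(block: bytes) -> int:
    """Stand-in for a learned sketch: SimHash over 4-byte shingles.

    DeepSketch would obtain the sketch from a trained neural network;
    this hand-written hash only mimics the "similar blocks get similar
    sketches" property.
    """
    counts = [0] * SKETCH_BITS
    for i in range(max(1, len(block) - 3)):
        digest = hashlib.blake2b(block[i:i + 4], digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(SKETCH_BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(SKETCH_BITS) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def find_reference(sketch: int, stored, max_distance: int = 16):
    """Return the index of the stored block whose sketch is closest,
    or None if no sketch is within max_distance bits."""
    best, best_d = None, max_distance + 1
    for idx, (stored_sketch, _block) in enumerate(stored):
        d = hamming(sketch, stored_sketch)
        if d < best_d:
            best, best_d = idx, d
    return best


def delta_encode(block: bytes, reference: bytes) -> bytes:
    """Compress block against reference (zlib preset dictionary)."""
    c = zlib.compressobj(zdict=reference)
    return c.compress(block) + c.flush()


def delta_decode(delta: bytes, reference: bytes) -> bytes:
    """Invert delta_encode given the same reference block."""
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(delta) + d.flush()
```

In a usage scenario, each already-stored block would be kept as a `(sketch, block)` pair; an incoming non-duplicate block is sketched, `find_reference` picks the closest stored block, and only the delta against that reference is written. If no reference is close enough, the block would fall back to ordinary lossless compression.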
