Sequence of Hashes Compression in Data De-duplication
March 25-March 27
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DCC.2008.80
Data Compression Conference (dcc 2008)
Data de-duplication is a simple compression method that has become very popular in storage archival and backup. It has the advantage of direct, random access to any piece ("chunk") of a file in one table lookup; that is not the case with differential file compression, the other common storage archival method. The compression efficiency (chunk matching) of de-duplication improves for smaller chunk sizes; however, the sequence of hashes replacing the de-duplicated object (file) grows significantly. We propose a simple scheme to shrink the list of hashes generated during de-duplication of an object. The resulting list is orders of magnitude smaller than what a customary compression algorithm (gzip) achieves, which has a significant impact on overall de-duplication efficiency.
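
The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-1 digests, neither of which is specified in the abstract, and the names `dedup`, `read_chunk`, and `restore` are hypothetical. It shows only the baseline de-duplication step (one stored copy per unique chunk, plus a per-object list of hashes); it does not implement the hash-list compression scheme the paper proposes.

```python
import hashlib


def dedup(data: bytes, chunk_size: int, store: dict) -> list:
    """Split data into fixed-size chunks (an assumption), keep one copy of
    each unique chunk in `store`, and return the object's list of hashes."""
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha1(chunk).digest()  # 20-byte hash, assumed choice
        store.setdefault(digest, chunk)        # store each unique chunk once
        recipe.append(digest)
    return recipe


def read_chunk(recipe: list, store: dict, index: int) -> bytes:
    """Random access to one chunk of the object with a single table lookup."""
    return store[recipe[index]]


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the whole object from its list of hashes."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Synthetic, highly repetitive input chosen only for illustration.
    data = (b"A" * 4096 + b"B" * 4096) * 8
    for size in (4096, 1024):
        store = {}
        recipe = dedup(data, size, store)
        hash_bytes = sum(len(d) for d in recipe)
        print(size, len(recipe), len(store), hash_bytes)
```

Under these assumptions, shrinking the chunk size from 4096 to 1024 bytes quadruples the number of hashes kept for the object, which is the growth in the hash list that the abstract sets out to reduce.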
Index Terms:
Data De-duplication, cryptographic hashes compression
Citation:
Subashini Balachandran, Cornel Constantinescu, "Sequence of Hashes Compression in Data De-duplication," dcc, pp.505, Data Compression Conference (dcc 2008), 2008