XML Syntax Conscious Compression
Snowbird, Utah March 28-March 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DCC.2006.85
Data Compression Conference (DCC'06)
S. Harrusi, Tel Aviv University, Israel
A. Averbuch, Tel Aviv University, Israel
A. Yehudai, Tel Aviv University, Israel

XML is the standard format for representing and sharing content on the Web. It is a highly verbose language, especially in its duplication of metadata in the form of element and attribute names. As XML content becomes more widespread, so does the demand to reduce XML data volume. This paper presents the best XML compression ratios reported to date. The advantage of the proposed technique over other XML compression techniques is that it uses syntactic information to enhance compression; it is a fully syntax-based XML compressor. The syntactic information is extracted from XML documents by an XML parser, and we developed a new XML parser-generator for that purpose. The parser-generator uses a syntactic dictionary of the XML (DTD, XML Schema, etc.) to create efficient and compact XML parsers. It is adapted to streaming technologies and can be used in a wide variety of XML applications, such as validators, converters, gateways, routers, browsers, and editors. The parsers' symbols are encoded by a prediction by partial matching (PPM) codec.
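To illustrate the general idea of schema-driven encoding, here is a minimal Python sketch. It is not the authors' parser-generator or codec. The `SCHEMA` dictionary is a hypothetical, hand-written stand-in for a DTD, the function names are invented for this example, and `zlib` is swapped in for the PPM coder only because the Python standard library has no PPM implementation. The sketch handles tags only (no text or attributes). Because both encoder and decoder know the schema, each child tag is replaced by a small index into the list of children the schema allows at that point, so tag names never appear in the output.

```python
import xml.etree.ElementTree as ET
import zlib

# Hypothetical toy "syntactic dictionary": the child element names each
# element may contain. A real system would derive this from a DTD or schema.
SCHEMA = {
    "library": ["book"],
    "book": ["title", "author", "year"],
    "title": [],
    "author": [],
    "year": [],
}

END = 0  # symbol meaning "this element has no further children"


def encode(elem, out):
    """Append one symbol per child (1-based index into the allowed list), then END."""
    allowed = SCHEMA[elem.tag]
    for child in elem:
        out.append(allowed.index(child.tag) + 1)
        encode(child, out)
    out.append(END)


def decode(tag, symbols):
    """Rebuild an element from a shared symbol iterator, using the same schema."""
    elem = ET.Element(tag)
    allowed = SCHEMA[tag]
    for sym in symbols:
        if sym == END:
            break
        elem.append(decode(allowed[sym - 1], symbols))
    return elem


def structure_symbols(xml_text):
    """Return the symbol list describing the tag structure of xml_text."""
    out = []
    encode(ET.fromstring(xml_text), out)
    return out


def pack(symbols):
    """Entropy-code the symbols; zlib is an assumed substitute for PPM."""
    return zlib.compress(bytes(symbols))


def unpack(data):
    return list(zlib.decompress(data))
```

A round trip would call `structure_symbols(doc)`, pass the result through `pack` and `unpack`, and then call `decode("library", iter(symbols))`; the root tag name is assumed to be known from the schema.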

We compare the performance of our algorithm with that of existing XML compression techniques. The proposed algorithm achieves better compression ratios than XML compression techniques that do not utilize syntactic structure. The advantage is most evident on XML data sets that contain only tags and no free text.

Citation:
S. Harrusi, A. Averbuch, A. Yehudai, "XML Syntax Conscious Compression," Data Compression Conference (DCC'06), pp. 402-411, 2006