Document sanitization, i.e., the process of removing or generalizing sensitive information in order to reduce the security classification of a document, is widely used today in information-sharing applications. Traditional document sanitization systems focus on removing or generalizing certain words and phrases, but do not take into account the utility of the sanitized documents. This leads to a gap between the sanitized documents and the users’ requirements. This paper proposes a formal framework and conceptual algorithms for optimal document sanitization based on meta-labeling. Each document is associated with a meta-label, which determines both the security label and the utility of the document. In the sanitization process, the system first computes a new meta-label for the sanitized version and then sanitizes the document through mediators guided by the new meta-label. Algorithms are provided to compute a new meta-label that is proven to satisfy the security requirements and to provide maximal utility with respect to the users’ requirements, which are themselves represented by a meta-label.
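The two-step process described above (compute a new meta-label, then sanitize through mediators) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the meta-label is modeled as a mapping from attribute to generalization level, and the attribute names, the `SENSITIVITY` table, the `security_label` policy, the `utility` measure, the exhaustive search, and the `MEDIATORS` rewrite tables are all hypothetical. The paper's actual formal model and algorithms may differ substantially.

```python
from itertools import product

# Hypothetical model: a meta-label maps each attribute to a generalization level.
ATTRS = ("name", "location")
MAX_LEVEL = 2  # assumed scale: 0 = verbatim, 1 = generalized, 2 = removed

# Assumed sensitivity of each attribute when left verbatim.
SENSITIVITY = {"name": 2, "location": 1}


def security_label(meta):
    """Assumed policy: classification is the highest residual sensitivity."""
    return max(max(SENSITIVITY[a] - meta[a], 0) for a in ATTRS)


def utility(meta, required):
    """Count attributes no more generalized than the users' meta-label tolerates."""
    return sum(meta[a] <= required[a] for a in ATTRS)


def best_meta_label(current, required, clearance):
    """Exhaustively search for a feasible meta-label with maximal utility."""
    candidates = []
    for levels in product(range(MAX_LEVEL + 1), repeat=len(ATTRS)):
        meta = dict(zip(ATTRS, levels))
        # Sanitization can only generalize further, never restore detail.
        if any(meta[a] < current[a] for a in ATTRS):
            continue
        if security_label(meta) <= clearance:
            candidates.append(meta)
    # Prefer higher utility, then less total generalization.
    return max(
        candidates,
        key=lambda m: (utility(m, required), -sum(m.values())),
        default=None,
    )


# Hypothetical mediators: one rewrite per attribute and generalization level.
MEDIATORS = {
    "name": {0: "Alice Smith", 1: "an analyst", 2: "[removed]"},
    "location": {0: "Fairfax, VA", 1: "Virginia", 2: "[removed]"},
}


def sanitize(template, meta):
    """Apply the mediator selected by the meta-label to each attribute slot."""
    return template.format(**{a: MEDIATORS[a][meta[a]] for a in ATTRS})


current = {"name": 0, "location": 0}   # the original document is fully detailed
required = {"name": 1, "location": 0}  # users tolerate a generalized name only
new_meta = best_meta_label(current, required, clearance=1)
print(new_meta)
print(sanitize("{name} met the source in {location}.", new_meta))
```

The exhaustive search is only practical for a handful of attributes; it is used here because it makes the "satisfy security, then maximize utility" ordering explicit, not because it reflects how the paper computes the optimal meta-label.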
Index Terms:
Document Sanitization, Meta-label, Data Utility
Citation:
Lei Zhang, Alexander Brodsky, Vipin Swarup, Sushil Jajodia, "A Framework for Maximizing Utility of Sanitized Documents Based on Meta-labeling," 2008 IEEE Workshop on Policies for Distributed Systems and Networks (POLICY), pp. 181-188, 2008