There has been a recent surge in work on probabilistic databases, propelled in large part by the huge increase in noisy data sources -- sensor data, experimental data, data from uncurated sources, and many others. There is a growing need to be able to flexibly represent the uncertainties in the data, and to efficiently query the data. Building on existing probabilistic database work, we present a unifying framework which allows a flexible representation of correlated tuple- and attribute-level uncertainties. An important capability of our representation is the ability to represent shared correlation structures in the data. We provide motivating examples to illustrate when such shared correlation structures are likely to exist. Representing shared correlation structures allows the use of sophisticated inference techniques based on lifted probabilistic inference that, in turn, allow us to achieve significant speedups while computing probabilities for results of user-submitted queries.
Citation:
Prithviraj Sen, Amol Deshpande, Lise Getoor, "Representing Tuple and Attribute Uncertainty in Probabilistic Databases," Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), pp. 507-512, 2007.