Declarative Querying for Biological Sequences
Atlanta, Georgia April 03-April 07
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2006.47
22nd International Conference on Data ...
Sandeep Tata, University of Michigan
James S. Friedman, University of Michigan
Anand Swaroop, University of Michigan
The ongoing revolution in life sciences research is producing vast amounts of genetic and proteomic sequence data. Scientists want to pose increasingly complex queries on this data, but current methods for querying biological sequences are primitive and largely procedural. This limits the ease with which complex queries can be posed, and often results in very inefficient query plans. There is a growing and urgent need for declarative and efficient methods for querying biological sequence data. In this paper, we introduce a system called Periscope/SQ which addresses this need. Queries in our system are based on a well-defined extension of relational algebra. We introduce new physical operators and support for novel indexes in the database. As part of the optimization framework, we describe a new technique for selectivity estimation of string pattern matching predicates that is more accurate than previous methods. We also describe a simple, yet highly effective algorithm to optimize sequence queries. Finally, using a real-world application in eye genetics, we show how Periscope/SQ can be used to achieve a speedup of two orders of magnitude over existing procedural methods!
Citation:
Sandeep Tata, James S. Friedman, Anand Swaroop, "Declarative Querying for Biological Sequences," 22nd International Conference on Data Engineering (ICDE'06), p. 87, 2006