loading...
Integer Programming Approaches to Haplotype Inference by Pure Parsimony
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2006.24April-June 2006 (vol. 3 no. 2) pp. 141-154
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
In 2003, Gusfield introduced the Haplotype Inference by Pure Parsimony (HIPP) problem and presented an integer program (IP) that quickly solved many simulated instances of the problem [1]. Although it solved well on small instances, Gusfield's IP can be of exponential size in the worst case. Several authors [2], [3] have presented polynomial-sized IPs for the problem. In this paper, we further the work on IP approaches to HIPP. We extend the existing polynomial-sized IPs by introducing several classes of valid cuts for the IP. We also present a new polynomial-sized IP formulation that is a hybrid between two existing IP formulations and inherits many of the strengths of both. Many problems that are too complex for the exponential-sized formulations can still be solved in our new formulation in a reasonable amount of time. We provide a detailed empirical comparison of these IP formulations on both simulated and real genotype sequences. Our formulation can also be extended in a variety of ways to allow errors in the input or model the structure of the population under consideration.

[1] 141 D. Gusfield, “Haplotype Inference by Pure Parsimony,” Proc. 14th Ann. Symp. Combinatorial Pattern Matching, pp. 144-155, 2003.
[2] B.V. Halldórsson, V. Bafna, N. Edwards, R. Lippert, S. Yooseph, and S. Istrail, “A Survey of Computational Methods for Determining Haplotypes,” Computational Methods for SNPs and Haplotype Inference: Proc. DIMACS/RECOMB Satellite Workshop, pp. 26-47, 2004.
[3] G. Lancia, C.M. Pinotti, and R. Rizzi, “Haplotyping Populations by Pure Parsimony: Complexity of Exact and Approximation Algorithms,” INFORMS J. Computing, vol. 16, no. 4, pp. 348-359, 2004.
[4] A.G. Clark, K.M. Weiss, D.A. Nickerson, S.L. Taylor, A. Buchanan, J. Stengard, V. Salomaa, E. Vartiainen, M. Perola, E. Boerwinkle, and C.F. Sing, “Haplotype Structure and Population Genetic Inferences from Nucleotide-Sequence Variation in Human Lipoprotein Lipase,” Am. J. Human Genetics, vol. 63, no. 2, pp. 595-612, 1998.
[5] D. Gusfield, “Inference of Haplotypes from Samples of Diploid Populations: Complexity and Algorithms,” J. Computational Biology, vol. 8, no. 3, pp. 305-313. 2001.
[6] N. Edwards, Personal communication to D. Brown, Aug. 2004.
[7] K. Kalpakis and P. Namjoshi, “Haplotype Phasing Using Semidefinite Programming,” Proc. Fifth IEEE Int'l Symp. Bioinformatics and Bioeng., pp. 145-152, 2005.
[8] J. He and A. Zelikovsky, “Linear Reduction for Haplotype Inference,” Proc. Fourth Ann. Workshop Algorithms in Bioinformatics, pp. 242-253, 2004.
[9] M.J. Daly, J.D. Rioux, S.F. Schaffner, T.J. Hudson, and E.S. Lander, “High-Resolution Haplotype Structure in the Human Genome,” Nature Genetics, vol. 29, no. 2, pp. 229-232, 2001.
[10] N. Patil et al., “Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21,” Science, vol. 294, no. 5547, pp. 1719-1723, 2001.
[11] A.G. Clark, “Inference of Haplotypes from PCR-Amplified Samples of Diploid Populations,” Molecular Biology and Evolution, vol. 7, no. 2, pp. 111-112, 1990.
[12] K. Lindblad-Toh et al., “Genome Sequence, Comparative Analysis and Haplotype Structure of the Domestic Dog,” Nature, vol. 438, no. 7069, pp. 803-809, 2005.
[13] D.A. Hinds, L.L. Stuve, G.B. Nilsen, E. Halperin, E. Eskin, D.G. Ballinger, K.A. Frazer, and D.R. Cox, “Whole-Genome Patterns of Common DNA Variation in Three Human Populations,” Science, vol. 307, no. 5712, pp. 1072-1079, 2005.
[14] The Int'l HapMap Consortium, “A Haplotype Map of the Human Genome,” Nature, vol. 437, no. 7063, pp. 1299-1300, 2005.
[15] L. Wang and Y. Xu, “Haplotype Inference by Maximum Parsimony,” Bioinformatics, vol. 19, no. 14, pp. 1773-1780, 2003.
[16] D.G. Brown and I.M. Harrower, “A New Integer Programming Formulation for the Pure Parsimony Problem in Haplotype Analysis,” Proc. Fourth Ann. Workshop Algorithms in Bioinformatics, pp. 254-265, 2004.
[17] Z. Li, W. Zhou, X.S. Zhang, and L. Chen, “A Parsimonious Tree-Grow Method for Haplotype Inference,” Bioinformatics, vol. 21, no. 17, pp. 3475-3481, 2005.
[18] Y.-T. Huang, K.-M. Chao, and T. Chen, “An Approximation Algorithm for Haplotype Inference by Maximum Parsimony,” J. Computational Biology, vol. 12, no. 10, pp. 1261-1274, 2005.
[19] J.E. Mitchell, Branch-and-Cut Algorithms for Combinatorial Optimization Problems, pp. 65-77. Oxford Univ. Press, Jan. 2002.
[20] A. Charnes, “Optimality and Degeneracy in Linear Programming,” Econometrica, vol. 20, pp. 160-170, 1952.
[21] C. Bron and J. Kerbosch, “Algorithm 457: Finding All Cliques of an Undirected Graph,” Comm. ACM, vol. 16, no. 9, pp. 575-577, 1973.
[22] ILOG CPLEX 8. 1: Reference Manual, ILOG SA, Dec. 2002.
[23] R.R. Hudson, “Generating Samples under a Wright-Fisher Neutral Model of Genetic Variation,” Bioinformatics, vol. 18, no. 2, pp. 337-338, 2002.
[24] The Int'l HapMap Consortium, “Integrating Ethics and Science in the International HapMap Project,” Nature Rev. Genetics, vol. 5, no. 6, pp. 467-475, 2004.
[25] T. Barzuza, J. Beckmann, R. Shamir, and I. Pe'er, “Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs,” Proc. 14th Ann. Symp. Combinatorial Pattern Matching, pp. 14-31, 2004.
[26] V. Bafna, D. Gusfield, S. Hannenhalli, and S. Yooseph, “A Note on Efficient Computation of Haplotypes via Perfect Phylogeny,” J. Computational Biology, vol. 11, no. 5, pp. 858-866, 2004.
[27] V. Bafna, D. Gusfield, G. Lancia, and S. Yooseph, “Haplotyping as Perfect Phylogeny: A Direct Approach,” J. Computational Biology, vol. 10, pp. 323-340, 2003.

Index Terms:
Computations on discrete structures, integer programming, biology and genetics, haplotype inference.
Citation:
Daniel G. Brown, Ian M. Harrower, "Integer Programming Approaches to Haplotype Inference by Pure Parsimony," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 141-154, Apr.-June 2006, doi:10.1109/TCBB.2006.24
Usage of this product signifies your acceptance of the Terms of Use.