|
Dr. John L. Spouge, Senior Investigator, National Center for Biotechnology Information
|
Some New Biological Applications for the Theory of BLAST Statistics
The statistics of BLAST alignment are rightly considered fundamental results in bioinformatics, but they are not as well understood as they deserve to be. On one hand, Karlin and Dembo's seminal paper on gapless BLAST alignments (1) passes over, in favor of more important results, several useful special cases in which p-values have analytic formulas (2). On the other hand, Karlin and Dembo investigated gapless BLAST statistics not just for sequences of independent letters but also for Markov sequences. Furthermore, the statistics for gapped BLAST are understood mainly through simulations (3). This talk discusses some new biological findings derived from BLAST statistics.
My group reduced the computation time for determining BLAST p-values for an arbitrary scoring scheme from 2 days to 1 second, making real-time database searches with arbitrary BLAST scoring schemes possible for the first time (4,5). The reduced computation times should also make new bioinformatics applications possible. As an example, I will discuss joint work in progress with Martin Frith at CBRC on over-alignment p-values, which quantify the dependability of the flanks of biological alignments. As a specific application, the work suggests that spurious flanks probably comprise more than 18% of the human-fugu genome alignment in the UCSC database.
As an example of the analytic solutions for gapless BLAST, I consider an application to "positional regulomics", which exploits a multiple sequence alignment anchored on a specific genomic landmark, e.g., the transcription start site. The natural prejudice when associating a piece of DNA with its biological function is to regard the DNA sequence as "having" some intrinsic biological function (e.g., TRANSFAC contains transcription factor binding "sequences"). Against the backdrop of this prejudice, the following phenomenon of "positional regulation" is extraordinary: the regulatory function of a DNA sequence can vary with its position relative to a genomic landmark (6).
If time permits, I will also give an application of the gapless BLAST p-values for Markov chains to finding repeats in biological sequences (7).
References
|
| (1) | Karlin, S. and Dembo, A. (1992) Advances in Applied Probability, 24, 113-140. |
| (2) | Frith, M.C., Spouge, J.L., Hansen, U. and Weng, Z. (2002) Nucleic Acids Res, 30, 3214-3224. |
| (3) | Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V. and Altschul, S.F. (2001) Nucleic Acids Research, 29, 2994-3005. |
| (4) | Park, Y., Sheetlin, S. and Spouge, J.L. (2005) J Phys A: Math Gen, 38, 97-108. |
| (5) | Sheetlin, S., Park, Y. and Spouge, J.L. (2005) Nucleic Acids Res, 33, 4987-4994. |
| (6) | Tharakaraman, K., Bodenreider, O., Landsman, D., Spouge, J.L. and Marino-Ramirez, L. (2008) Nucleic Acids Res, 36, 2777-2786. |
| (7) | Spouge, J.L. (2007) Journal of Applied Probability, 44, 1122-1122. |
Profile
| Employment |
|
| 2002-present |
Adjunct Professor of Bioinformatics College of Engineering Boston University |
| 2001-present |
Tenured Senior Investigator National Center for Biotechnology Information National Library of Medicine Sponsor: Dr. D.J. Lipman |
| 1989-2001 |
Visiting Scientist National Center for Biotechnology Information National Library of Medicine Sponsor: Dr. D.A. Benson |
| 1986 (on leave of absence from NIH, April 9 to July 31) |
|
Research Assistant Institute of Science and Technology University of Manchester Sponsor: Prof. R.F. Stepto On Loan to: Laboratory of Prof. W. Burchard Institut für Makromolekulare Chemie Freiburg, West Germany |
| 1985-1988 |
Visiting Associate Laboratory of Mathematical Biology National Institutes of Health Sponsor: Dr. J.V. Maizel |
| 1983-1985 |
Postdoctoral Fellow Theoretical Biology and Biophysics Group Los Alamos National Laboratory, Los Alamos, NM Sponsor: Dr. A.S. Perelson |
| 1979-1983 |
Postdoctoral Fellow of the Canadian Medical Research Council |
| Education |
|
| 1979-1983 |
Oxford University, D. Phil., Mathematics Thesis Title: A Probabilistic Approach to Coagulation Supervisor: Dr. J. M. Hammersley |
| 1975-1979 |
University of British Columbia, M.D. |
| 1971-1975 |
University of British Columbia, Honors BSc Mathematics |