Alignment vs. Sum of All Alignments Scoring for Motif Extraction

Paul Horton

In this paper we classify a large body of work on motif extraction (also known as gapless local multiple alignment) into two basic optimization problems.
Most work in this field can easily be cast as an optimization problem with the set of all possible alignments as its feasible set.
We describe two additional ways two define the feasible set for this problem and show that those definitions lead to theoretically efficient algorithms for fixed motif lengths. We also introduce a heuristic for this problem formulation, the MM heuristic, which is procedurally similar to the EM heuristic. We show limited, but very promising performance data for that heuristic.
In contrast, we show that the optimization problem implicit in the EM heuristic, as it has been applied to this field, is significantly different. It is naturally characterized as having the set of all possible model parameters as its feasible set and the sum of a likelihood function, summed over all alignments, as its scoring function. Rephrased in HMM terminology we show that: the EM heuristic used in the field of motif extraction is a special case of the usual EM heuristic used for learning HMM's using sum of paths likelihood, while MM and the others use Viterbi learning.


Real World Computing Partnership