2014-10: I was interviewed on Biostars!
2014-08: "Frameshift alignment: statistics and post-genomic applications" has been published in Bioinformatics!
2014-03: "Explaining the correlations among properties of mammalian promoters" has been published in Nucleic Acids Research! (data)
2014-02: "Improved search heuristics find 20 000 new alignments between human and mouse genomes" has been published in Nucleic Acids Research!


Welcome to Martin Frith's homepage at the CBRC

I am a computational biologist working at the CBRC, which is part of AIST. The CBRC is on the island of Odaiba, a futuristic entertainment district near central Tokyo.

Long term aim: decipher genome sequences

            Bible                Human genome
Age         3 thousand years     3 billion years
Length      4 million letters    3 billion letters

Genomes are palimpsests of unimaginable antiquity, which hold the secrets to technology more advanced than any achievement of human civilization. We live in exciting times: genomes have been sequenced only recently, and we have barely begun to decipher them.

Research style

I have two research styles: developing software tools for analyzing biological data, and investigating biological questions computationally. Recently I have been sucked into tool development, but I'd like to return to biological questions sometime.


LAST is a general-purpose, high-throughput sequence aligner. It can compare multi-gigabase datasets to each other, use sequence quality data in a rigorous fashion, align DNA to proteins with frameshifts, and estimate the reliability of each aligned column. (Genome Research, 2011)
tantan masks low-complexity regions in biological sequences. It aims to prevent spurious alignments when searching for homologs (evolutionarily related sequences), and it does so much more reliably than previous methods. (Nucleic Acids Research, 2011)
seg-suite provides tools for manipulating segments and alignments. It can compose alignments, find intersections, etc. (Unpublished)
DNemulator is a package for simulating DNA sequencing errors, polymorphisms, cytosine methylation and bisulfite conversion. (Nucleic Acids Research, 2012)
Paraclu is a method for finding clusters in data attached to sequences, for example, counts of transcription starts along genome sequences. It imposes minimal prior assumptions, and it typically finds a hierarchy of clusters within clusters. (Genome Research, 2008)
GLAM2 is a method for discovering motifs (recurring sequence patterns) in sequences. It allows motif instances to vary by insertions and deletions. It is part of the MEME Suite. (PLoS Computational Biology, 2008)
Clover tests whether known sequence motifs are over-represented in a set of DNA sequences. (Nucleic Acids Research, 2004)
Cluster-Buster finds clusters of pre-specified motifs in DNA sequences. (Nucleic Acids Research, 2003)


You can find most of my published articles by searching PubMed for Frith MC. (A few are listed under just Frith M.)


Join me

I welcome postdocs and visitors to come and work with me, but you would probably need your own funding. Likely sources include JSPS and HFSP, and there will be others depending on your nationality and other circumstances. Here is a funding guide for Europeans (pdf). Strong quantitative skills are desirable, e.g. from a background in physics or mathematics. Knowledge of biology is not essential, but a willingness to learn about and deal with messy biological details is. Here are some project ideas, although original projects are especially welcome. Knowledge of Japanese is not necessary, and Japanese applicants are also welcome. You can apply to work at the CBRC here.


Email: martin followed by @ followed by This may change periodically to avoid spam. For personal email, please use my address.
Address: AIST Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan. Access.
Tel: I prefer email. Fax: +81-3-3599-8081


Last modified 2014-10-21