LAST finds similar regions between sequences.
LAST can:
- Handle big sequence data, e.g.:
  - Compare two vertebrate genomes.
  - Align billions of DNA reads to a genome.
- Indicate the reliability of each aligned column.
- Use sequence quality data properly.
- Compare DNA to proteins, with frameshifts.
- Compare PSSMs (position-specific scoring matrices) to sequences.
- Calculate the likelihood of chance similarities between random sequences.
LAST cannot (yet):
Documents:
What distinguishes LAST from BLAST and similar tools (e.g. BLAT,
LASTZ, YASS)?
- The main difference is that it copes more efficiently with
repeat-rich sequences (e.g. genomes). For example: it can align reads
to genomes without repeat-masking, without becoming overwhelmed by
repetitive hits.
What distinguishes LAST from DNA read mapping tools?
- The main difference is that it can find weak similarities, with
many mismatches and gaps.
Here are some dotplots made using LAST:
Publications:
- Adaptive seeds tame genomic sequence comparison.
  SM Kielbasa, R Wan, K Sato, P Horton, MC Frith, Genome Research 2011.
- Incorporating sequence quality data into alignment improves DNA read mapping.
  MC Frith, R Wan, P Horton, NAR 2010.
- Parameters for accurate genome alignment.
  MC Frith, M Hamada, P Horton, BMC Bioinformatics 2010.
The main technical innovation is that LAST finds initial matches
based on their multiplicity (how many times they occur), instead of
using a fixed length (e.g. BLAST uses 11-mers). To find these
variable-length matches, it uses a suffix array (inspired by Vmatch).
To achieve high sensitivity, it uses a spaced suffix array (or subset
suffix array), analogous to spaced seeds (or subset seeds).
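
The idea of choosing initial matches by multiplicity can be sketched
in a few lines of Python. This is only a toy illustration, not LAST's
actual code: the function names (build_suffix_array, narrow,
adaptive_seed) and the max_multiplicity parameter are made up for this
example, the suffix array is built naively, and spaced or subset seeds
are not modelled. Starting at one query position, the match is
lengthened one letter at a time until it occurs at most
max_multiplicity times in the genome.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by suffix text.
    # Fine for a toy example, far too slow for real genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def char_at(text, pos, offset):
    # Letter at pos + offset, or "" (sorts before any letter) past the end.
    i = pos + offset
    return text[i] if i < len(text) else ""


def narrow(text, sa, lo, hi, depth, c):
    # Suffixes in sa[lo:hi] share their first `depth` letters, so they are
    # sorted by the letter at offset `depth`. Binary-search the sub-range
    # whose letter at that offset equals c.
    def bound(lo, hi, strict):
        while lo < hi:
            mid = (lo + hi) // 2
            x = char_at(text, sa[mid], depth)
            if x < c or (strict and x == c):
                lo = mid + 1
            else:
                hi = mid
        return lo

    new_lo = bound(lo, hi, False)
    new_hi = bound(new_lo, hi, True)
    return new_lo, new_hi


def adaptive_seed(text, sa, query, start, max_multiplicity):
    # Hypothetical helper: lengthen the match starting at query[start]
    # until it occurs at most max_multiplicity times in text.
    # Returns (match length, sorted match positions), or None.
    lo, hi = 0, len(sa)
    for depth in range(len(query) - start):
        lo, hi = narrow(text, sa, lo, hi, depth, query[start + depth])
        if lo == hi:
            return None  # the query letters do not occur in text
        if hi - lo <= max_multiplicity:
            return depth + 1, sorted(sa[lo:hi])
    return None  # query ended while the match was still too frequent


genome = "ACACACGTACAC"
sa = build_suffix_array(genome)
print(adaptive_seed(genome, sa, "ACGT", 0, 1))  # (3, [4])
print(adaptive_seed(genome, sa, "ACGT", 0, 5))  # (1, [0, 2, 4, 8, 10])
```

In this example "A" and "AC" each occur five times in the toy genome,
so with a limit of 1 the match grows to "ACG", which occurs once. A
fixed-length seed would instead report every occurrence of a
repetitive word, which is what overwhelms repeat-rich comparisons.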
Development:
Contact:
last (ATmark) cbrc (dot) jp