LAST finds similar regions between sequences.
LAST can:
- Handle big sequence data, e.g.:
  - Compare two vertebrate genomes
  - Align billions of DNA reads to a genome
- Indicate the reliability of each aligned column.
- Use sequence quality data.
- Compare DNA to proteins, with frameshifts.
- Calculate the likelihood of chance similarities between random sequences.
- Do split and spliced alignment.
- Train alignment parameters for unusual kinds of sequence (e.g. nanopore).
Development & old versions:
Here are some dotplots made using LAST:
What distinguishes LAST from BLAST and similar tools (e.g. BLAT)?
- The main difference is that it copes more efficiently with
repeat-rich sequences (e.g. genomes). For example, it can align reads
to genomes without repeat-masking, and without becoming overwhelmed by
repetitive hits.
What distinguishes LAST from DNA read mapping tools?
- The main difference is that it can find weak similarities, with
many mismatches and gaps.
The main technical innovation is that LAST finds initial matches
based on their multiplicity, instead of using a fixed length
(e.g. BLAST uses 11-mers). To find these variable-length matches, it
uses a suffix array (inspired by Vmatch). To achieve high
sensitivity, it uses a spaced suffix array (or subset suffix array),
analogous to spaced seeds (or subset seeds).
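
The idea of multiplicity-based matches can be illustrated with a toy
sketch. This is not LAST's code: it assumes a naive, unspaced suffix
array and a rule of "extend the match until it occurs at most
max_multiplicity times in the reference". The function names
(build_suffix_array, count_occurrences, adaptive_seed) and the example
sequences are made up for illustration.

```python
def build_suffix_array(text):
    """Naive construction: sort all suffix start positions.

    Fine for a toy example; real tools use far faster methods.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Count suffixes of `text` that start with `pattern`.

    Uses two binary searches over the suffix array `sa`.
    """
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose k-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose k-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


def adaptive_seed(text, sa, query, start, max_multiplicity):
    """Shortest match from query[start] that is rare enough.

    Returns (length, occurrences), or None if no match starting at
    `start` occurs between 1 and `max_multiplicity` times.
    """
    for end in range(start + 1, len(query) + 1):
        n = count_occurrences(text, sa, query[start:end])
        if n == 0:
            return None  # longer extensions cannot match either
        if n <= max_multiplicity:
            return end - start, n
    return None


if __name__ == "__main__":
    reference = "ATATATATGCATAT"  # repeat-rich toy "genome"
    sa = build_suffix_array(reference)
    for query in ("ATGC", "GCAT", "ATAT"):
        print(query, adaptive_seed(reference, sa, query, 0, 1))
```

In this sketch, a query that starts inside the repeat needs a longer
match before it becomes rare, while a query that starts in unique
sequence gets a short one. A fixed-length seed would instead produce
many hits in the repeat regardless of how common the k-mer is.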
Public mailing list:
last-align (ATmark) googlegroups (dot) com