LAST Performance Tuning

This document tells you how to trade off speed, sensitivity, memory use, and disk use.

Ideally, the default settings would always work well. Unfortunately, there is too great a variety of challenging alignment tasks, and the LAST developers lack experience with most of them.

LAST must have some defaults, and any choice will displease someone. It is therefore not valid to claim that "LAST is faster but less sensitive than method X", or "slower but more sensitive than method Y", without trying non-default settings.

Sparsity options

It's advisable to use at most one sparsity option: combining them will likely give poor sensitivity.

lastal -k

By default lastal looks for initial matches starting at every position in the query sequence(s), but -k2 makes it check every 2nd position, -k3 every 3rd position, etc. Compared to the other sparsity options, this increases speed the most while reducing sensitivity the least.

lastdb -w

By default lastdb indexes every position in the reference sequence(s), but -w2 makes it index every 2nd position, -w3 every 3rd position, etc. Compared to the other sparsity options, this decreases memory and disk use the most while reducing sensitivity the least.

This has a complex effect on the speed and sensitivity of lastal. LAST uses initial matches that are sufficiently rare: this option makes it lose some matches (because those positions are not indexed), but gain others (because they are rarer). In practice, small values of w (e.g. 2) sometimes make lastal slower and more sensitive, but very large values will eventually reduce sensitivity.
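One way to see why combining lastal -k with lastdb -w can hurt sensitivity is the following minimal Python sketch. It assumes, purely for illustration, that the sampled positions are the multiples of k and w counted from 0 (the real phase may differ), and the function name usable_starts is hypothetical. It counts the positions inside one exact match where an initial match could start, i.e. positions that are both checked in the query and indexed in the reference.

```python
def usable_starts(ref_start, query_start, match_length, w=1, k=1):
    """Count positions in an exact match that are both indexed and checked.

    Assumption: lastdb -w indexes reference positions divisible by w, and
    lastal -k checks query positions divisible by k (phase counted from 0).
    """
    count = 0
    for offset in range(match_length):
        ref_pos = ref_start + offset
        query_pos = query_start + offset
        if ref_pos % w == 0 and query_pos % k == 0:
            count += 1
    return count


# A 60-letter exact match, reference position 100 paired with query position 40.
print(usable_starts(100, 40, 60))            # no sparsity
print(usable_starts(100, 40, 60, w=2))       # -w2 only
print(usable_starts(100, 40, 60, k=2))       # -k2 only
# Same match, but the query is shifted by one letter: with both options,
# the indexed and checked positions never coincide.
print(usable_starts(100, 41, 60, w=2, k=2))
```

Under these assumptions, either option alone halves the number of usable start positions, whereas both together can leave none at all, depending on how the query and reference coordinates line up.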

Among other aligners, MegaBLAST indexes every 5th position, and BLAT indexes every 11th position.

lastdb -W

This makes LAST check for initial matches starting at only some positions, in both query and reference. This decreases memory and disk use, and increases speed of lastdb and lastal, but reduces sensitivity.

Specifically, this makes LAST look for initial matches starting only at positions that are "minimum" in any window of W consecutive positions. A position is "minimum" if the sequence starting there is alphabetically earlier than the sequences starting at the other positions in the window.

The fraction of positions that are "minimum" is roughly: 2 / (W + 1).
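The 2 / (W + 1) estimate can be checked with a small simulation. The sketch below is not LAST's implementation: it assumes a uniformly random DNA sequence, plain alphabetical comparison of whole suffixes, and a hypothetical helper named minimum_positions. It marks every position that is the minimum of at least one window, then compares the observed fraction with the formula.

```python
import random


def minimum_positions(seq, W):
    """Return positions that are "minimum" in at least one window of W positions."""
    n = len(seq)
    # rank[i] = alphabetical rank of the sequence starting at position i
    order = sorted(range(n), key=lambda i: seq[i:])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    chosen = set()
    for start in range(n - W + 1):
        # the position in this window whose suffix is alphabetically earliest
        chosen.add(min(range(start, start + W), key=rank.__getitem__))
    return chosen


random.seed(1)  # fixed seed so the run is repeatable
seq = "".join(random.choice("ACGT") for _ in range(4000))
W = 9
observed = len(minimum_positions(seq, W)) / len(seq)
expected = 2 / (W + 1)
print(f"W={W}: observed fraction {observed:.3f}, formula {expected:.3f}")
```

Real genomes are not random (low-complexity regions in particular behave differently), so the formula should be read as a rough guide only.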

lastdb8 & lastal8

If your reference has more than about 4 billion letters, 8-byte LAST may be beneficial. Ordinary (4-byte) LAST cannot directly handle so much data, so it splits it into volumes, which is inefficient. 8-byte LAST can handle such data without splitting it into volumes, but it uses more memory.

8-byte LAST combines well with lastdb option -w or -W, which reduce memory usage. Something like lastdb8 -W63 enables rapid, huge-scale homology search, with moderate memory usage, but low sensitivity.

Other options

lastal -m

This option trades speed for sensitivity. It sets the rareness limit for initial matches: initial matches are lengthened until they occur at most this many times in the lastdb volume. The default is 10. So -m100 makes it more sensitive but slower, by using more initial matches.

lastal -l

This option makes lastal faster but less sensitive. It sets the minimum length of initial matches, e.g. -l50 means length 50. (The default is 1.) This can make it much faster, and the sensitivity is adequate if the alignments contain long, gapless, high-identity matches.

lastal -C

This option (gapless alignment culling) can make lastal faster but less sensitive. It can also reduce redundant output. For example, -C2 makes it discard alignments (before gapped extension) whose query coordinates lie in those of 2 or more stronger alignments. This works well for aligning long, repeat-rich, indel-poor sequences (e.g. mammal chromosomes) without repeat-masking.
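The culling rule can be sketched in a few lines of Python. This is a simplified illustration, not lastal's code: alignments are represented as hypothetical (query_start, query_end, score) tuples, "stronger" is assumed to mean a strictly higher score, and every alignment is compared against all others, including ones that are themselves discarded.

```python
def cull(alignments, limit):
    """Keep alignments whose query range lies inside fewer than `limit`
    stronger alignments. Each alignment is (query_start, query_end, score)."""
    kept = []
    for start, end, score in alignments:
        stronger = sum(
            1
            for other_start, other_end, other_score in alignments
            if other_score > score
            and other_start <= start
            and end <= other_end
        )
        if stronger < limit:
            kept.append((start, end, score))
    return kept


alignments = [
    (0, 100, 90),
    (10, 90, 80),    # inside 1 stronger alignment
    (20, 80, 70),    # inside 2 stronger alignments
    (200, 260, 40),  # weak, but not covered by anything
]
print(cull(alignments, 2))
print(cull(alignments, 1))
```

With a limit of 2, only the alignment nested inside two stronger ones is dropped. The weak but uncovered alignment survives either way, which is how culling removes redundant repeat hits without discarding unique ones.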

lastal -z

This option can make lastal faster but less sensitive. It sets the maximum score drop in alignments, in the gapped extension phase. Lower values make it faster, by quitting unpromising extensions sooner. The default aims at best accuracy.

You can set this option in several ways: perhaps the most intuitive is via maximum gap length. For example, -z10g sets the maximum score drop such that the longest possible gap length is 10.

lastal -x

This option (preliminary gapped extension) can make lastal faster but less sensitive. For example, -x2g makes it extend gapped alignments with a maximum gap length of 2, discard those with score below the gapped score threshold, then redo the survivors with the final max score drop (z).

lastal -M

This option requests "minimum-difference" alignment, which is faster but cruder than standard gapped alignment. This treats all matches the same, and minimizes the number of differences (mismatches plus gaps).

lastal -j1

This option requests gapless alignment, which is even faster. (You could get the same effect by using very high gap costs, but -j1 is faster because it skips the gapping phase entirely.)

lastal -f

Option -fTAB reduces the output size, which can improve speed.

lastdb -i

This option makes lastdb faster, but disables some lastal options. If lastdb is too slow, try -i10.

lastdb -C2

This option may make lastal a bit faster, but uses more memory and disk, and makes lastdb slower. If these downsides are no problem, you may as well try it.

Repeat masking

This can make LAST much faster, produce less output, and reduce memory and disk usage. Please see last-repeats.html.