Go to the LAST homepage:
http://last.cbrc.jp/
There are 2 ways of using LAST:
Click on "LAST web service".
We need to input some sequences. Open a new browser window, and go here:
http://last.cbrc.jp/tutorial/webfiles/
Let's compare the human and UNI (ウニ) mitochondria!
Click on human-mito.fasta. The sequence is in the standard fasta format:
Copy and paste the sequence into the LAST web form.
- CTRL-a: select all
- CTRL-c: copy
- CTRL-v: paste
Next, copy and paste uni-mito.fasta into the other box.
Click "Align".
The dotplot shows which parts of the human sequence align to which parts of the UNI sequence.
To see the detailed alignments, click "Coloured alignments".
Each alignment has a score, which indicates the strength of the similarity.
The colours indicate the reliability of each aligned column:
Blue means very reliable (probability > 0.999), and yellow means unreliable (probability ≤ 0.5). The reliability close to gaps is often low. This is because the gaps' correct position is ambiguous.
TASK
Try the HOYA (ホヤ) mitochondrion too. Which mitochondria are most similar?
OPTIONAL TASK
Try some other animal mitochondria. You can get them from here:
http://www.ncbi.nlm.nih.gov/genomes/OrganelleResource.cgi?taxid=33208
Click on an Accession, then choose "Display Settings" → "FASTA (text)".
To use LAST on our own computer, we must use a unix command line (e.g. Linux, Mac, Windows+Cygwin).
Go to the LAST homepage:
http://last.cbrc.jp/
Click on "Download LAST", and save the latest version on your computer.
Open a terminal window:
Now you need to figure out which directory LAST got saved in (maybe "Downloads"?) and move there using the cd (change directory) command:
$ cd Downloads
To see what files are present, use the ls (list) command:
$ ls
You should see something like last-274.zip. Unzip it:
$ unzip last-274.zip
This created a new directory called last-274. Move into it:
$ cd last-274
Now, type make to compile the main LAST programs:
$ make
Next, install the programs. You can do either a standard installation (using sudo to request administrator permission):
$ sudo make install
Or a single-user installation:
$ make install prefix=~
In the latter case, you might need to log out and back in before your computer recognizes the new programs.
Download human-mito.fasta and uni-mito.fasta from here:
http://last.cbrc.jp/tutorial/files/
Open a terminal window, and cd into the directory where the files got saved (maybe "Downloads"?):
$ cd Downloads
First, use lastdb to make an "index" of the human sequence:
$ lastdb my-index human-mito.fasta
Unix hint: You do not have to type all of human-mito.fasta. You can just type the first few letters, then press the TAB key.
Point: You can replace my-index with any name you like. For example, neko.
lastdb created some new files. Type ls to see them:
$ ls human-mito.fasta my-index0.des my-index0.ssp my-index.prj y-pestis.fasta last-274.zip my-index0.prj my-index0.suf plague.fastq my-index0.bck my-index0.sds my-index0.tis uni-mito.fasta
Next, use lastal to compare the UNI sequence to my-index:
$ lastal my-index uni-mito.fasta > out.maf
Point: You can replace out.maf with any name you like.
This made a new file called out.maf. Type ls to see it:
$ ls human-mito.fasta my-index0.des my-index0.ssp my-index.prj uni-mito.fasta last-274.zip my-index0.prj my-index0.suf out.maf y-pestis.fasta my-index0.bck my-index0.sds my-index0.tis plague.fastq
To look at the alignments, use less -S:
$ less -S out.maf
You can use the arrow keys to move around. To quit looking, type "q".
The alignments are in MAF format, which looks like this:
a score=546 s Human 5915 1432 + 16569 CGTTGACTATTCTCTACAAACCACAAAGACATTGGAACACTATACCTATTA s UNI 5796 1432 + 15650 CGATGATTATTTTCTACTAACCACAAGGACATCGGAACACTTTATTTAATT a score=316 s Human 14795 1078 + 16569 CATTCATCGACCTCCCCACCCCATCCAACATCTCCGCATGATGAAACTTC s UNI 14560 1078 + 15650 CATTCGTTGACCTCCCCCTTCCCTCCAACCTTTCCATTTGGTGAAACTCG a score=224 s Human 9099 892 + 16569 ATCTTCACAATTCTAATTCTACTGACTATCCTAGAAATCGCTGTCGCCTTAA s UNI 9221 893 + 15650 ATCCTTATTATTTTTATTTTATTATTTGTGCTAGAAATAGGAGTAGCCTGCA a score=158 s Human 7582 664 + 16569 TTAATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTACTTCCCCTATCA s UNI 7707 664 + 15650 TTAATGGGAACTTGAGCACAGTTTGGTCTACAAGATGCATCCTCCCCTCTTA
TASK
Compare these alignments to the ones we found with the web service. Can you guess what all the numbers mean?
(Hint: MAF uses "0-based coordinates", but the web alignments use "1-based coordinates".)
Here is more information on MAF format:
http://genome.ucsc.edu/FAQ/FAQformat.html#format5
If you type each command on its own, it prints a short usage reminder:
$ lastdb lastdb: please give me an output name and sequence file(s) Usage: lastdb [options] output-name fasta-sequence-file(s) Prepare sequences for subsequent alignment with lastal. Main Options: -h: show all options and their default settings -p: interpret the sequences as proteins -c: soft-mask lowercase letters
$ lastal lastal: please give me a database name and sequence file(s) Usage: lastal [options] lastdb-name fasta-sequence-file(s)
The -h option prints a detailed help message, including all options and their default settings:
$ lastdb -h
$ lastal -h
The full documentation is at the LAST website:
We have sequenced some DNA from an outbreak of plague (ペスト)! Our DNA is in fastq format. Let's look at it:
http://last.cbrc.jp/tutorial/files/plague.fastq
It looks like this:
Each sequence has 4 lines.
Current DNA sequencers often make errors. They indicate the reliability of each base using ASCII-coded numbers. We can understand them by consulting an ASCII table:
http://www.ie.u-ryukyu.ac.jp/~e085739/_images/ascii1.png
The symbols indicate error probabilities, like this:
| Symbol | ASCII | error probability |
|---|---|---|
| ! | 33 | 1 |
| + | 43 | 0.1 |
| 5 | 53 | 0.01 |
| ? | 63 | 0.001 |
TASK: What is the symbol for error probability 0.0001?
Here is more information on fastq format:
We would like to compare our DNA to a reference genome of the plague bacterium, Yersinia pestis.
We have the reference genome in fasta format. Let's look at it:
http://last.cbrc.jp/tutorial/files/y-pestis.fasta
![]()
Download plague.fastq and y-pestis.fasta from here:
http://last.cbrc.jp/tutorial/files/
How should we compare them? Let's check the documentation:
http://last.cbrc.jp/doc/last-tutorial.html → Example 6
First, make an index of the genome:
$ lastdb -m1111110 yp y-pestis.fasta
The -m1111110 option makes it better at finding short, strong alignments. By default, it is better at finding long, weak alignments. (If you are interested, 1111110 represents a spaced seed.)
Next, compare our DNA to the genome:
$ lastal -Q1 yp plague.fastq > hit.maf
The -Q1 option tells lastal that the input is in fastq format.
Let's look at the alignments:
$ less -S hit.maf
a score=300 s Y.pestis 3891 50 + 4653728 ATCGCCAACTGAGGTGCTGCCAGTAAACCTACCACCATCTCTTCGATA s read360 0 50 + 50 ATCGCCAACTGAGGTGCTGCCAGTAAACCTACCACCATCTCTTCGATA q read360 9BBCCCBCCCCCBBCBCCBCBCCBCBCBBCCCB:6@C?CBCB<ACBBC a score=300 s Y.pestis 17011 50 + 4653728 ATAACCGCTGAAAGCATCTAAGCGGGAAACTTGCCTCGAGATGAGTC s read520 0 50 + 50 ATAACCGCTGAAAGCATCTAAGCGGGAAACTTGCCTCGAGATGAGTC q read520 8CC?CCCC=CC@CBCCBC@BCCCCCBCCCCCCC@CCCC@CBCC@CCC ...
/read724
We can type "n" to repeat the search, and "N" to search backwards.
TASK
How many alignments are there for read724? What are their scores?
Often, we assume that each DNA read has at most one true alignment. In that case, last-map-probs can estimate the probability that each alignment is true.
Let's look at the documentation:
http://last.cbrc.jp/doc/last-map-probs.html
Let's follow the "typical usage":
$ lastal -Q1 -e120 yp plague.fastq | last-map-probs.py -s150 > h.maf
Unix point: The "|" is a pipe. It sends the output of one command (lastal) to the input of another command (last-map-probs).
Let's look at the result:
$ less -S h.maf
a score=300 mismap=0 s Y.pestis 1635255 50 + 4653728 CCGGTGGTCCCTGTGGCATAGAGTTCGTGTTGTGCCAGTAGATCT s read662 0 50 - 50 CCGGTGGTCCCTGTGGCATAGAGTTCGTGTTGTGCCAGTAGATCT q read662 @?@@@@><???;?<AAAA>:B=B;B:BA?C@B@CC?@B@=ABB@? a score=300 mismap=0.00408535 s Y.pestis 1664783 50 + 4653728 CGTTGATGTGCGCACTGTTGCGGCTGTGATCGCCGACTGGACCGG s read652 0 50 - 50 CGTTGATGTGCGCACTGTTGCGGCTGTGATCGCCGACTGGACCGG q read652 >B@=>?:A=@9A><B>??BB<@<@@A:@:AAB@BA75B@@A@>>7
last-map-probs gives each alignment a mismap probability: the probability that it is wrong (assuming that each DNA read has one correct alignment). It discards alignments with mismap > 0.01.
TASK: What happened to read724?
To understand this more clearly, let's keep the alignments with mismap > 0.01. The last-map-probs option -mX means "discard alignments with mismap > X". To keep all the alignments, use -m1:
$ lastal -Q1 -e120 yp plague.fastq | last-map-probs.py -s150 -m1 > z.maf
TASK: What happened to read724 now?
We can convert the alignments to another format, using maf-convert.py:
$ maf-convert.py blast h.maf > h.blast
This converted them to BLAST format:
$ less -S h.blast
Which looks like:
Query= read360
(50 letters)
>Y.pestis
Length = 4653728
Score = 300
Identities = 50/50 (100%)
Strand = Plus / Plus
Query: 1 ATCGCCAACTGAGGTGCTGCCAGTAAACCTACCACCATCTCTTCGATAAG 50
||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3892 ATCGCCAACTGAGGTGCTGCCAGTAAACCTACCACCATCTCTTCGATAAG 3941
...
We can convert them to other formats too. Look at the help message:
$ maf-convert.py -h Usage: maf-convert.py --help maf-convert.py axt my-alignments.maf maf-convert.py blast my-alignments.maf maf-convert.py html my-alignments.maf maf-convert.py psl my-alignments.maf maf-convert.py sam my-alignments.maf maf-convert.py tab my-alignments.maf Read MAF-format alignments & write them in another format.
TASK
Convert the alignments to some other formats. Which ones are easy to read? Which ones are compact (good for huge data)?
TASK
(Hint: it might be easier with some formats than others.)