last-dotplot
============

This script makes a dotplot, a.k.a. Oxford Grid, of pairwise sequence
alignments in MAF or LAST tabular format. It requires the `Python
Imaging Library <https://pillow.readthedocs.io/>`_ to be installed.
It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

To get a nicer font, try something like::

  last-dotplot -f /usr/share/fonts/liberation/LiberationSans-Regular.ttf alns alns.png

or::

  last-dotplot -f /Library/Fonts/Arial.ttf alns alns.png

If the fonts are located somewhere different on your computer, change
this as appropriate.

Choosing sequences
------------------

If there are too many sequences, the dotplot will be very cluttered,
or the script might give up with an error message. You can exclude
sequences with names like "chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st (horizontal) genome, and
"-2" selects sequences from the 2nd (vertical) genome. 'chr[!U]*' is
a *pattern* that specifies names starting with "chr", followed by any
character except U, followed by anything.

==========  =============================
Pattern     Meaning
==========  =============================
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.

You can specify more than one pattern, e.g. this gets sequences whose
names consist of "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png

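
The pattern rules above resemble shell-style wildcards, so they can be
approximated with Python's standard ``fnmatch`` module. This is a
minimal sketch, not the script's actual code: the function
``name_matches`` is hypothetical, and "the part after the dot" is
assumed here to mean the text after the last dot.

.. code-block:: python

  from fnmatch import fnmatchcase

  def name_matches(name, patterns):
      """Return True if any pattern matches the name or its after-dot part."""
      # Assumption: "the part after the dot" is the text after the last dot.
      candidates = [name]
      if "." in name:
          candidates.append(name.rsplit(".", 1)[1])
      return any(fnmatchcase(c, p) for c in candidates for p in patterns)

  names = ["chr1", "chr10", "chrUn_random522", "hg19.chr7", "chr100"]
  # Exclude names whose 4th character is U.
  print([n for n in names if name_matches(n, ["chr[!U]*"])])
  # Keep names made of "chr" plus one or two characters.
  print([n for n in names if name_matches(n, ["chr?", "chr??"])])

Passing several patterns mirrors repeating the ``-1`` or ``-2`` option:
a name is kept if any one of the patterns matches.
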

You can also specify a sequence range; for example, this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png

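
The example implies that a range such as ``chr9:0-1000`` uses
zero-based, end-exclusive coordinates, since it covers exactly 1000
bases. A minimal sketch of splitting such a specification follows. The
helper ``parse_range`` is hypothetical and is not claimed to reflect
how the script itself parses its arguments.

.. code-block:: python

  def parse_range(spec):
      """Split 'name:start-end' into (name, start, end).

      A specification without a colon is returned as (spec, None, None).
      """
      # Assumption: the range follows the last colon in the specification.
      name, sep, coords = spec.rpartition(":")
      if not sep:
          return spec, None, None
      start, end = (int(x) for x in coords.split("-"))
      return name, start, end

  name, start, end = parse_range("chr9:0-1000")
  # With end-exclusive coordinates, the number of bases is end - start.
  print(name, start, end, end - start)
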

Options
-------

  -h, --help
      Show a help message, with default option values, and exit.
  -v, --verbose
      Show progress messages and data about the plot.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st (horizontal) genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd (vertical) genome.
  -x WIDTH, --width=WIDTH
      Maximum width in pixels.
  -y HEIGHT, --height=HEIGHT
      Maximum height in pixels.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --sort1=N
      Put the 1st genome's sequences left-to-right in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --sort2=N
      Put the 2nd genome's sequences top-to-bottom in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --trim1
      Trim unaligned sequence flanks from the 1st (horizontal) genome.
  --trim2
      Trim unaligned sequence flanks from the 2nd (vertical) genome.
  --border-pixels=INT
      Number of pixels between sequences.
  --border-color=COLOR
      Color for pixels between sequences.

Text options
~~~~~~~~~~~~

  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  --rot1=ROT
      Text rotation for the 1st genome: h(orizontal) or v(ertical).
  --rot2=ROT
      Text rotation for the 2nd genome: h(orizontal) or v(ertical).
  --lengths1
      Show sequence lengths for the 1st (horizontal) genome.
  --lengths2
      Show sequence lengths for the 2nd (vertical) genome.

Annotation options
~~~~~~~~~~~~~~~~~~

These options read annotations of sequence segments, and draw them as
colored horizontal or vertical stripes. This looks good only if the
annotations are reasonably sparse: e.g. you can't sensibly view 20000
gene annotations in one small dotplot.

  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome. They are drawn as stripes, with
      coordinates given by the first three BED fields. The color is
      specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", otherwise pale purple.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.
  --rmsk1=FILE
      Read repeat annotations for the 1st genome, in RepeatMasker .out
      or rmsk.txt format. The color is pale purple for "low
      complexity" and "simple repeats", else pale red for "+" strand
      and pale blue for "-" strand.
  --rmsk2=FILE
      Read repeat annotations for the 2nd genome.

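
The color rule described for ``--bed1`` and ``--bed2`` can be sketched
as below. This is an illustration under stated assumptions: the
function ``bed_color`` and the three pale RGB values are hypothetical
placeholders (the script's actual shades are not given here), the
strand and RGB values are read from the 6th and 9th fields as in the
standard BED layout, and the RGB field counts as present only when it
holds three comma-separated integers.

.. code-block:: python

  # Hypothetical shades: the script's real color values are not stated here.
  PALE_RED = (255, 220, 220)
  PALE_BLUE = (220, 220, 255)
  PALE_PURPLE = (240, 220, 240)

  def bed_color(line):
      """Choose a stripe color for one whitespace-separated BED line."""
      fields = line.split()
      # Field 9 (RGB) wins when it holds three comma-separated integers.
      if len(fields) > 8:
          parts = fields[8].split(",")
          if len(parts) == 3 and all(p.isdigit() for p in parts):
              return tuple(int(p) for p in parts)
      # Otherwise fall back on field 6 (strand), if there is one.
      strand = fields[5] if len(fields) > 5 else "."
      if strand == "+":
          return PALE_RED
      if strand == "-":
          return PALE_BLUE
      return PALE_PURPLE

  print(bed_color("chr7 100 200"))
  print(bed_color("chr7 100 200 geneA 0 -"))
  print(bed_color("chr7 100 200 geneA 0 + 100 200 0,128,0"))
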

Gene options
~~~~~~~~~~~~

  --genePred1=FILE
      Read gene annotations for the 1st genome in `genePred format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format9>`_.
  --genePred2=FILE
      Read gene annotations for the 2nd genome.
  --exon-color=COLOR
      Color for exons.
  --cds-color=COLOR
      Color for protein-coding regions.

Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~

Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.

  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.

An unsequenced gap will be shown only if it covers at least one whole
pixel.

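
One way to read this rule: if each pixel represents a fixed number of
bases, a gap is drawn only when some pixel lies entirely inside it.
The sketch below assumes that pixel ``i`` covers bases
``[i * bases_per_pixel, (i + 1) * bases_per_pixel)`` and that gap
coordinates are zero-based and end-exclusive. Both the function name
and this pixel layout are assumptions, not taken from the script.

.. code-block:: python

  def covers_whole_pixel(gap_start, gap_end, bases_per_pixel):
      """Return True if some pixel lies entirely within [gap_start, gap_end)."""
      # Index of the first pixel that starts at or after the gap start.
      first_full = -(-gap_start // bases_per_pixel)  # ceiling division
      # Index of the pixel that contains (or starts at) the gap end.
      end_pixel = gap_end // bases_per_pixel
      return first_full < end_pixel

  # A 1000-base gap aligned to a pixel boundary fills that pixel.
  print(covers_whole_pixel(1000, 2000, 1000))
  # The same length shifted by half a pixel straddles two pixels and fills neither.
  print(covers_whole_pixel(1500, 2500, 1000))

Under this reading, whether a gap appears depends on the image
resolution as well as on the gap's length and position.
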

Colors
~~~~~~

Colors can be specified in `various ways described here
<http://effbot.org/imagingbook/imagecolor.htm>`_.