last-dotplot
============

This script makes a dotplot, a.k.a. Oxford Grid, of pair-wise sequence
alignments in MAF or LAST tabular format. It requires the Python
Imaging Library to be installed. It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

To get a nicer font, try something like::

  last-dotplot -f /usr/share/fonts/truetype/freefont/FreeSans.ttf alns alns.png

If the fonts are located somewhere different on your computer, change
this as appropriate.

Choosing sequences
------------------

If there are too many sequences, the dotplot will be very cluttered,
or the script might give up with an error message. You can exclude
sequences with names like "chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st genome, and "-2" selects
sequences from the 2nd genome. 'chr[!U]*' is a *pattern* that
specifies names starting with "chr", followed by any character except
U, followed by anything.

==========  =============================
Pattern     Meaning
----------  -----------------------------
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.

You can specify more than one pattern, e.g. this gets sequences with
names starting in "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png

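The pattern rules above resemble shell-style wildcards. As a rough
sketch (not the script's actual code), they can be imitated with
Python's standard ``fnmatch`` module. The function name
``name_matches`` is hypothetical, and reading "the part after the dot"
as everything after the first dot is an assumption::

  from fnmatch import fnmatchcase

  def name_matches(name, patterns):
      """Return True if any pattern matches the name or its post-dot part."""
      candidates = [name]
      if "." in name:
          # Assumption: "the part after the dot" means after the FIRST dot.
          candidates.append(name.split(".", 1)[1])
      # A sequence is selected if at least one pattern matches a candidate.
      return any(fnmatchcase(c, p) for c in candidates for p in patterns)

  name_matches("chr7", ["chr[!U]*"])             # True
  name_matches("chrUn_random522", ["chr[!U]*"])  # False
  name_matches("hg19.chr7", ["chr?"])            # True, via the post-dot part
  name_matches("chr10", ["chr?", "chr??"])       # True, via the second pattern
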
You can also specify a sequence range; for example this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png

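The range example is consistent with zero-based, end-exclusive
coordinates, since ``0-1000`` spans exactly 1000 bases. A minimal
sketch of splitting such an argument follows. The helper name
``parse_seq_range`` and its fallback behavior are assumptions made for
illustration, not the script's own parser::

  def parse_seq_range(spec):
      """Split 'name:start-end' into (name, start, end).

      A spec without a valid numeric range is returned whole,
      with None for both coordinates.
      """
      name, colon, span = spec.rpartition(":")
      if not colon:
          return spec, None, None
      start, dash, end = span.partition("-")
      if not (dash and start.isdigit() and end.isdigit()):
          return spec, None, None
      return name, int(start), int(end)

  name, start, end = parse_seq_range("chr9:0-1000")
  # name == "chr9", start == 0, end == 1000, so end - start == 1000 bases
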
Options
-------

  -h, --help
      Show a help message, with default option values, and exit.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd genome.
  -x WIDTH, --width=WIDTH
      Maximum width in pixels.
  -y HEIGHT, --height=HEIGHT
      Maximum height in pixels.
  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --trim1
      Trim unaligned sequence flanks from the 1st genome.
  --trim2
      Trim unaligned sequence flanks from the 2nd genome.
  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome. They are drawn as rectangles,
      with coordinates given by the first three BED fields. The color
      is specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", or pale purple otherwise.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.

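The color rule for BED annotations can be sketched as below. This is
an illustration only: the function name ``bed_annotation`` is
hypothetical, the color names stand in for RGB values that this
document does not state, and treating an RGB field that is not of the
form ``R,G,B`` (such as ``0``) as absent is an assumption::

  def bed_annotation(line):
      """Return (seq_name, beg, end, color) for one BED line."""
      fields = line.split()
      # The rectangle comes from the first three BED fields.
      seq_name, beg, end = fields[0], int(fields[1]), int(fields[2])
      strand = fields[5] if len(fields) > 5 else None    # 6th BED field
      item_rgb = fields[8] if len(fields) > 8 else None  # 9th BED field
      if item_rgb and item_rgb.count(",") == 2:
          color = tuple(int(i) for i in item_rgb.split(","))
      elif strand == "+":
          color = "pale red"     # placeholder name, not an actual RGB value
      elif strand == "-":
          color = "pale blue"    # placeholder name
      else:
          color = "pale purple"  # placeholder name
      return seq_name, beg, end, color

  bed_annotation("chr1\t100\t200\tgeneA\t0\t+")
  # ("chr1", 100, 200, "pale red")
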
Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~

Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.

  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.

An unsequenced gap will be shown only if it covers at least one whole
pixel.

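To illustrate the whole-pixel rule, here is a simplified sketch. It
assumes a single sequence drawn from pixel 0, zero-based end-exclusive
gap coordinates, and a fixed whole number of bases per pixel. None of
these details are confirmed by this document, and the name
``covers_whole_pixel`` is hypothetical::

  def covers_whole_pixel(gap_beg, gap_end, bases_per_pixel):
      """Return True if [gap_beg, gap_end) contains at least one full pixel."""
      # Index of the first pixel that starts at or after gap_beg (ceiling).
      first_full = -(-gap_beg // bases_per_pixel)
      # Number of pixel boundaries at or before gap_end (floor).
      last_edge = gap_end // bases_per_pixel
      return first_full < last_edge

  covers_whole_pixel(100, 200, 100)  # True: the gap is exactly pixel 1
  covers_whole_pixel(150, 250, 100)  # False: it only overlaps pixels 1 and 2 partly
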