doc/last-dotplot.txt
author Martin C. Frith
Tue Oct 03 18:25:07 2017 +0900 (2017-10-03)
changeset 878 20f5c97a3cfd
parent 866 5182d8528ce9
child 898 f6a9c15287ea
permissions -rw-r--r--
Add text-rotation options to last-dotplot
last-dotplot
============

This script makes a dotplot, a.k.a. Oxford Grid, of pair-wise sequence
alignments in MAF or LAST tabular format.  It requires the `Python
Imaging Library <https://pillow.readthedocs.io/>`_ to be installed.
It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

To get a nicer font, try something like::

  last-dotplot -f /usr/share/fonts/liberation/LiberationSans-Regular.ttf alns alns.png

or::

  last-dotplot -f /Library/Fonts/Arial.ttf alns alns.png

If the fonts are located somewhere different on your computer, change
this as appropriate.

Choosing sequences
------------------

If there are too many sequences, the dotplot will be very cluttered,
or the script might give up with an error message.  You can exclude
sequences with names like "chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st (horizontal) genome, and
"-2" selects sequences from the 2nd (vertical) genome.  'chr[!U]*' is
a *pattern* that specifies names starting with "chr", followed by any
character except U, followed by anything.

==========  =============================
Pattern     Meaning
----------  -----------------------------
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.
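
The pattern rules above coincide with the shell-style wildcards of
Python's standard ``fnmatch`` module, so you can try out a pattern
before plotting.  The following is only a sketch, not the script's own
code: the helper ``name_matches`` is hypothetical, and it assumes that
"the part after the dot" means the text after the last dot.

```python
from fnmatch import fnmatchcase

def name_matches(name, patterns):
    """Return True if the name, or the part after its dot, matches any pattern."""
    candidates = [name]
    if "." in name:
        # Assumption: for names with several dots, use the part after the last one.
        candidates.append(name.rsplit(".", 1)[1])
    # Case-sensitive shell-style matching: *, ?, [abc], [!abc]
    return any(fnmatchcase(c, p) for c in candidates for p in patterns)

names = ["chr7", "chr10", "chrUn_random522", "hg19.chr7"]
kept = [n for n in names if name_matches(n, ["chr[!U]*"])]
print(kept)
```

Passing several patterns, e.g. ``["chr?", "chr??"]``, mimics giving
option "-1" or "-2" more than once: a name is kept if any pattern
matches.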

You can specify more than one pattern, e.g. this gets sequences with
names starting with "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png

You can also specify a sequence range; for example this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png
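
Since "chr9:0-1000" means the first 1000 bases, the range appears to
be zero-based with an exclusive end.  The sketch below shows one way
such an argument could be split up; ``parse_seq_range`` is a
hypothetical helper for illustration, and the script's own parsing may
differ.

```python
def parse_seq_range(spec):
    """Split 'name:start-end' into (name, start, end).

    A spec without a valid range is returned as (spec, None, None).
    """
    name, colon, coords = spec.rpartition(":")
    if not colon:
        return spec, None, None
    start, dash, end = coords.partition("-")
    if not (dash and start.isdigit() and end.isdigit()):
        return spec, None, None
    return name, int(start), int(end)

name, start, end = parse_seq_range("chr9:0-1000")
# Assumption: zero-based start, exclusive end, so the length is end - start.
print(name, end - start)
```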

Options
-------

  -h, --help
      Show a help message, with default option values, and exit.
  -v, --verbose
      Show progress messages and data about the plot.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st (horizontal) genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd (vertical) genome.
  -x WIDTH, --width=WIDTH
      Maximum width in pixels.
  -y HEIGHT, --height=HEIGHT
      Maximum height in pixels.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --sort1=N
      Put the 1st genome's sequences left-to-right in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --sort2=N
      Put the 2nd genome's sequences top-to-bottom in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --trim1
      Trim unaligned sequence flanks from the 1st (horizontal) genome.
  --trim2
      Trim unaligned sequence flanks from the 2nd (vertical) genome.
  --border-pixels=INT
      Number of pixels between sequences.
  --border-color=COLOR
      Color for pixels between sequences.

Text options
~~~~~~~~~~~~

  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  --rot1=ROT
      Text rotation for the 1st genome: h(orizontal) or v(ertical).
  --rot2=ROT
      Text rotation for the 2nd genome: h(orizontal) or v(ertical).
  --lengths1
      Show sequence lengths for the 1st (horizontal) genome.
  --lengths2
      Show sequence lengths for the 2nd (vertical) genome.

Annotation options
~~~~~~~~~~~~~~~~~~

These options read annotations of sequence segments, and draw them as
colored horizontal or vertical stripes.  This looks good only if the
annotations are reasonably sparse: e.g. you can't sensibly view 20000
gene annotations in one small dotplot.

  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome.  They are drawn as stripes, with
      coordinates given by the first three BED fields.  The color is
      specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", or pale purple otherwise.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.
  --rmsk1=FILE
      Read repeat annotations for the 1st genome, in RepeatMasker .out
      or rmsk.txt format.  The color is pale purple for "low
      complexity" and "simple repeats", else pale red for "+" strand
      and pale blue for "-" strand.
  --rmsk2=FILE
      Read repeat annotations for the 2nd genome.
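
The color rule for BED annotations can be sketched as follows.  This
is an illustration only: ``bed_stripe`` is a hypothetical helper, the
three pale RGB values are made up for the example (the script's actual
shades are not given here), and it relies on the standard BED layout
where the strand is the 6th field and the RGB value is the 9th.

```python
# Hypothetical shades, chosen only for this example.
PALE_RED = (255, 204, 204)
PALE_BLUE = (204, 204, 255)
PALE_PURPLE = (238, 204, 238)

def bed_stripe(line):
    """Return (sequence name, start, end, color) for one BED line."""
    fields = line.split()
    seq_name, start, end = fields[0], int(fields[1]), int(fields[2])
    strand = fields[5] if len(fields) > 5 else "."
    rgb = fields[8] if len(fields) > 8 else ""
    if rgb.count(",") == 2:
        # An explicit RGB field such as "255,0,0" takes priority.
        color = tuple(int(x) for x in rgb.split(","))
    elif strand == "+":
        color = PALE_RED
    elif strand == "-":
        color = PALE_BLUE
    else:
        color = PALE_PURPLE
    return seq_name, start, end, color

print(bed_stripe("chr1\t100\t200\tgeneA\t0\t-"))
```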

Gene options
~~~~~~~~~~~~

  --genePred1=FILE
      Read gene annotations for the 1st genome in `genePred format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format9>`_.
  --genePred2=FILE
      Read gene annotations for the 2nd genome.
  --exon-color=COLOR
      Color for exons.
  --cds-color=COLOR
      Color for protein-coding regions.

Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~

Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.

  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.

An unsequenced gap will be shown only if it covers at least one whole
pixel.
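
To see why a long gap can still be invisible, here is a sketch of the
"whole pixel" test.  It assumes that each pixel stands for a fixed
number of bases and that pixel boundaries fall at multiples of that
number, which may not be exactly how the script lays out its pixels;
``covers_whole_pixel`` is a hypothetical helper.

```python
def covers_whole_pixel(gap_start, gap_end, bases_per_pixel):
    """Return True if the gap [gap_start, gap_end) contains a complete pixel."""
    # First pixel boundary at or after the gap start (integer ceiling division).
    first_boundary = -(-gap_start // bases_per_pixel) * bases_per_pixel
    # The pixel starting at that boundary must end within the gap.
    return first_boundary + bases_per_pixel <= gap_end

# With 1000 bases per pixel, a 1000-base gap is shown only if it lines up
# with a pixel, whereas a 2000-base gap always covers at least one pixel.
print(covers_whole_pixel(1000, 2000, 1000))
print(covers_whole_pixel(500, 1500, 1000))
```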

Colors
~~~~~~

Colors can be specified in `various ways described here
<http://effbot.org/imagingbook/imagecolor.htm>`_.