doc/last-dotplot.txt
author Martin C. Frith
Tue Oct 03 18:25:07 2017 +0900 (2017-10-03)
changeset 878 20f5c97a3cfd
parent 866 5182d8528ce9
child 898 f6a9c15287ea
permissions -rw-r--r--
Add text-rotation options to last-dotplot
last-dotplot
============

This script makes a dotplot, a.k.a. Oxford Grid, of pair-wise sequence
alignments in MAF or LAST tabular format.  It requires the `Python
Imaging Library <https://pillow.readthedocs.io/>`_ to be installed.
It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

To get a nicer font, try something like::

  last-dotplot -f /usr/share/fonts/liberation/LiberationSans-Regular.ttf alns alns.png

or::

  last-dotplot -f /Library/Fonts/Arial.ttf alns alns.png

If the fonts are located somewhere different on your computer, change
the path as appropriate.

Choosing sequences
------------------

If there are too many sequences, the dotplot will be very cluttered,
or the script might give up with an error message.  You can exclude
sequences with names like "chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st (horizontal) genome, and
"-2" selects sequences from the 2nd (vertical) genome.  'chr[!U]*' is
a *pattern* that specifies names starting with "chr", followed by any
character except U, followed by anything.

==========  =============================
Pattern     Meaning
==========  =============================
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.

You can specify more than one pattern, e.g. this gets sequences with
names starting with "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png
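
The pattern rules above can be tried out with a short Python sketch.
It is an illustration only, not the script's own code: the helper name
``name_matches`` is hypothetical, Python's ``fnmatch`` module is used
because it happens to support the same four wildcard forms, and "the
part after the dot" is assumed to mean everything after the first dot.

```python
from fnmatch import fnmatchcase


def name_matches(name, patterns):
    """Return True if any pattern matches the whole name,
    or the part of the name after its dot (hypothetical helper)."""
    candidates = [name]
    if "." in name:
        # Assumption: split at the first dot only.
        candidates.append(name.split(".", 1)[1])
    return any(fnmatchcase(c, p) for c in candidates for p in patterns)


print(name_matches("chr7", ["chr[!U]*"]))             # True
print(name_matches("chrUn_random522", ["chr[!U]*"]))  # False
print(name_matches("hg19.chr7", ["chr?", "chr??"]))   # True
print(name_matches("chr10_alt", ["chr?", "chr??"]))   # False
```

Passing several patterns mirrors repeating "-1" or "-2" on the command
line: a name is kept if any one of the patterns matches it.
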

You can also specify a sequence range; for example this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png
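
As a rough sketch of how such a range could be read, the Python below
splits ``PATTERN:BEG-END`` into its parts.  The function name
``split_range`` is hypothetical and the script's real parsing may
differ.  It assumes zero-based, end-exclusive coordinates, which is
consistent with ``0-1000`` meaning the first 1000 bases.

```python
def split_range(spec):
    """Split 'PATTERN:BEG-END' into (pattern, beg, end).

    Returns (spec, None, None) if there is no numeric range at the end.
    Hypothetical helper, for illustration only.
    """
    pattern, sep, coords = spec.rpartition(":")
    if sep:
        beg, dash, end = coords.partition("-")
        if dash and beg.isdigit() and end.isdigit():
            return pattern, int(beg), int(end)
    return spec, None, None


pattern, beg, end = split_range("chr9:0-1000")
print(pattern, beg, end, end - beg)  # chr9 0 1000 1000
print(split_range("chr[!U]*"))       # ('chr[!U]*', None, None)
```
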

Options
-------

  -h, --help
      Show a help message, with default option values, and exit.
  -v, --verbose
      Show progress messages and data about the plot.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st (horizontal) genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd (vertical) genome.
  -x WIDTH, --width=WIDTH
      Maximum width in pixels.
  -y HEIGHT, --height=HEIGHT
      Maximum height in pixels.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --sort1=N
      Put the 1st genome's sequences left-to-right in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --sort2=N
      Put the 2nd genome's sequences top-to-bottom in order of: their
      appearance in the input (0), their names (1), their lengths (2).
  --trim1
      Trim unaligned sequence flanks from the 1st (horizontal) genome.
  --trim2
      Trim unaligned sequence flanks from the 2nd (vertical) genome.
  --border-pixels=INT
      Number of pixels between sequences.
  --border-color=COLOR
      Color for pixels between sequences.

Text options
~~~~~~~~~~~~

  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  --rot1=ROT
      Text rotation for the 1st genome: h(orizontal) or v(ertical).
  --rot2=ROT
      Text rotation for the 2nd genome: h(orizontal) or v(ertical).
  --lengths1
      Show sequence lengths for the 1st (horizontal) genome.
  --lengths2
      Show sequence lengths for the 2nd (vertical) genome.

Annotation options
~~~~~~~~~~~~~~~~~~

These options read annotations of sequence segments, and draw them as
colored horizontal or vertical stripes.  This looks good only if the
annotations are reasonably sparse: e.g. you can't sensibly view 20000
gene annotations in one small dotplot.

  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome.  They are drawn as stripes, with
      coordinates given by the first three BED fields.  The color is
      specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", otherwise pale purple.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.
  --rmsk1=FILE
      Read repeat annotations for the 1st genome, in RepeatMasker .out
      or rmsk.txt format.  The color is pale purple for "low
      complexity" and "simple repeats", else pale red for "+" strand
      and pale blue for "-" strand.
  --rmsk2=FILE
      Read repeat annotations for the 2nd genome.
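
The color rule for BED annotations can be written out as a small
Python sketch.  This is not the script's code: ``stripe_color`` is a
hypothetical name, the pale colors are returned as plain descriptions
because their exact values are not given here, and an RGB field that
is not of the form ``R,G,B`` (for example ``0``) is assumed to count
as absent.  In BED format the strand is the 6th field and the RGB
value is the 9th.

```python
def stripe_color(bed_line):
    """Pick a stripe color for one BED line (hypothetical helper)."""
    fields = bed_line.split()
    strand = fields[5] if len(fields) > 5 else None
    item_rgb = fields[8] if len(fields) > 8 else None
    # Assumption: only an "R,G,B" triple counts as a present RGB field.
    if item_rgb and item_rgb.count(",") == 2:
        return tuple(int(x) for x in item_rgb.split(","))
    if strand == "+":
        return "pale red"
    if strand == "-":
        return "pale blue"
    return "pale purple"


print(stripe_color("chr1 100 200"))                           # pale purple
print(stripe_color("chr1 100 200 geneA 0 +"))                 # pale red
print(stripe_color("chr1 100 200 geneA 0 - 100 200 255,0,0")) # (255, 0, 0)
```
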

Gene options
~~~~~~~~~~~~

  --genePred1=FILE
      Read gene annotations for the 1st genome in `genePred format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format9>`_.
  --genePred2=FILE
      Read gene annotations for the 2nd genome.
  --exon-color=COLOR
      Color for exons.
  --cds-color=COLOR
      Color for protein-coding regions.

Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~

Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.

  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.

An unsequenced gap will be shown only if it covers at least one whole
pixel.
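
One way to picture the "whole pixel" rule is the Python sketch below.
It is only an illustration under stated assumptions: each pixel is
taken to span a fixed whole number of bases, gap coordinates are
zero-based and end-exclusive, and ``covers_whole_pixel`` is a
hypothetical name.  The script's actual scaling is not described here.

```python
def covers_whole_pixel(beg, end, bases_per_pixel):
    """True if the gap [beg, end) fully contains at least one pixel.

    Assumes pixel p spans bases [p * bases_per_pixel,
    (p + 1) * bases_per_pixel).  Hypothetical helper.
    """
    first_pixel_inside = -(-beg // bases_per_pixel)  # ceiling division
    pixels_ending_by = end // bases_per_pixel        # floor division
    return first_pixel_inside < pixels_ending_by


print(covers_whole_pixel(1000, 2000, 1000))  # True
print(covers_whole_pixel(1500, 2500, 1000))  # False
```

Under these assumptions a gap can be as long as one pixel and still
not be shown, if it straddles a pixel boundary as in the second call.
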

Colors
~~~~~~

Colors can be specified in `various ways described here
<https://pillow.readthedocs.io/en/stable/reference/ImageColor.html>`_.