doc/last-dotplot.txt
author Martin C. Frith
Tue Apr 04 11:51:15 2017 +0900 (2017-04-04)
Added last-dotplot options to show BED features.

last-dotplot
============

This script makes a dotplot, a.k.a. Oxford Grid, of pair-wise sequence
alignments in MAF or LAST tabular format.  It requires the Python
Imaging Library to be installed.  It can be used like this::

  last-dotplot my-alignments my-plot.png

The output can be in any format supported by the Imaging Library::

  last-dotplot alns alns.gif

To get a nicer font, try something like::

  last-dotplot -f /usr/share/fonts/truetype/freefont/FreeSans.ttf alns alns.png

If the fonts are located somewhere different on your computer, change
this as appropriate.


Choosing sequences
------------------

If there are too many sequences, the dotplot will be very cluttered,
or the script might give up with an error message.  You can exclude
sequences with names like "chrUn_random522" like this::

  last-dotplot -1 'chr[!U]*' -2 'chr[!U]*' alns alns.png

Option "-1" selects sequences from the 1st genome, and "-2" selects
sequences from the 2nd genome.  'chr[!U]*' is a *pattern* that
specifies names starting with "chr", followed by any character except
U, followed by anything.

==========  =============================
Pattern     Meaning
----------  -----------------------------
``*``       zero or more of any character
``?``       any single character
``[abc]``   any character in abc
``[!abc]``  any character not in abc
==========  =============================

If a sequence name has a dot (e.g. "hg19.chr7"), the pattern is
compared to both the whole name and the part after the dot.

You can specify more than one pattern, e.g. this gets sequences with
names consisting of "chr" followed by one or two characters::

  last-dotplot -1 'chr?' -1 'chr??' alns alns.png

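
The matching rules above can be tried out in Python, whose standard
``fnmatch`` module supports the same four wildcards.  The following is
only a sketch of the documented behavior, not the script's own code:
``name_matches`` is a hypothetical helper, and it assumes that "the
part after the dot" means the text after the last dot in the name.

```python
from fnmatch import fnmatchcase


def name_matches(name, patterns):
    """Return True if any pattern matches the whole name or the part after the dot."""
    # Assumption: for "hg19.chr7" the part after the dot is "chr7"
    # (text after the last dot); a name without a dot is used as-is.
    base = name.rsplit(".", 1)[-1]
    return any(fnmatchcase(name, p) or fnmatchcase(base, p) for p in patterns)


names = ["chr1", "chr10", "chrUn_random522", "hg19.chr7", "chrX_alt"]

# Equivalent of: -1 'chr?' -1 'chr??'
selected = [n for n in names if name_matches(n, ["chr?", "chr??"])]
print(selected)  # ['chr1', 'chr10', 'hg19.chr7']
```

Here "chrX_alt" is rejected because ``?`` stands for exactly one
character, so neither pattern can cover the longer name.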

You can also specify a sequence range; for example this gets the first
1000 bases of chr9::

  last-dotplot -1 chr9:0-1000 alns alns.png

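
Since ``chr9:0-1000`` is described as the first 1000 bases, the range
appears to use a zero-based start and an exclusive end.  The sketch
below shows one way such an argument could be split into a pattern and
a range.  It is an illustration under that assumption, and
``parse_range`` is a hypothetical name, not a function of the script.

```python
def parse_range(spec):
    """Split 'chr9:0-1000' into (pattern, start, end).

    start and end are None when spec carries no numeric range.
    """
    pattern, colon, coords = spec.rpartition(":")
    if colon:
        beg, dash, end = coords.partition("-")
        if dash and beg.isdigit() and end.isdigit():
            return pattern, int(beg), int(end)
    # No ":start-end" suffix: treat the whole argument as a pattern.
    return spec, None, None


pattern, beg, end = parse_range("chr9:0-1000")
print(pattern, end - beg)  # chr9 1000
```

Under this reading, the number of bases selected is simply the end
minus the start.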

Options
-------

  -h, --help
      Show a help message, with default option values, and exit.
  -1 PATTERN, --seq1=PATTERN
      Which sequences to show from the 1st genome.
  -2 PATTERN, --seq2=PATTERN
      Which sequences to show from the 2nd genome.
  -x WIDTH, --width=WIDTH
      Maximum width in pixels.
  -y HEIGHT, --height=HEIGHT
      Maximum height in pixels.
  -f FILE, --fontfile=FILE
      TrueType or OpenType font file.
  -s SIZE, --fontsize=SIZE
      TrueType or OpenType font size.
  -c COLOR, --forwardcolor=COLOR
      Color for forward alignments.
  -r COLOR, --reversecolor=COLOR
      Color for reverse alignments.
  --trim1
      Trim unaligned sequence flanks from the 1st genome.
  --trim2
      Trim unaligned sequence flanks from the 2nd genome.
  --bed1=FILE
      Read `BED-format
      <https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_
      annotations for the 1st genome.  They are drawn as rectangles,
      with coordinates given by the first three BED fields.  The color
      is specified by the RGB field if present, else pale red if the
      strand is "+", pale blue if "-", or pale purple otherwise.
  --bed2=FILE
      Read BED-format annotations for the 2nd genome.

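
The color rule for BED annotations can be sketched in Python as
follows.  This is an illustration rather than the script's own code:
``bed_rectangle`` is a hypothetical helper, the three pale RGB values
are made-up placeholders (the exact shades are not documented here),
and fields are assumed to be separated by whitespace.  The field
positions follow the standard BED layout, where strand is the 6th
field and itemRgb is the 9th.

```python
# Hypothetical shades: the documentation names the colors, not their values.
PALE_RED = (255, 230, 230)
PALE_BLUE = (230, 230, 255)
PALE_PURPLE = (245, 230, 255)


def bed_rectangle(line):
    """Return (seq_name, start, end, color) for one BED line."""
    fields = line.split()
    name, beg, end = fields[0], int(fields[1]), int(fields[2])
    strand = fields[5] if len(fields) > 5 else "."
    rgb = fields[8] if len(fields) > 8 else ""
    if rgb.count(",") == 2:
        # An itemRgb value such as "255,0,0" takes priority.
        color = tuple(int(x) for x in rgb.split(","))
    elif strand == "+":
        color = PALE_RED
    elif strand == "-":
        color = PALE_BLUE
    else:
        color = PALE_PURPLE
    return name, beg, end, color


print(bed_rectangle("chr1\t100\t200\tgeneA\t0\t+"))
```

A line with only three fields has neither strand nor RGB information,
so in this sketch it falls through to the pale purple default.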

Unsequenced gap options
~~~~~~~~~~~~~~~~~~~~~~~

Note: these "gaps" are *not* alignment gaps (indels): they are regions
of unknown sequence.

  --gap1=FILE
      Read unsequenced gaps in the 1st genome from an agp or gap file.
  --gap2=FILE
      Read unsequenced gaps in the 2nd genome from an agp or gap file.
  --bridged-color=COLOR
      Color for bridged gaps.
  --unbridged-color=COLOR
      Color for unbridged gaps.

An unsequenced gap will be shown only if it covers at least one whole
pixel.

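
To make the "whole pixel" condition concrete, here is a small Python
sketch.  It is not taken from the script: ``covers_whole_pixel`` is a
hypothetical helper, and it assumes that each pixel represents a fixed
number of bases (``bp_per_pixel``) and that pixel k spans the
half-open interval from ``k * bp_per_pixel`` to
``(k + 1) * bp_per_pixel``, counted from the start of the sequence.

```python
def covers_whole_pixel(beg, end, bp_per_pixel):
    """True if the gap [beg, end) contains at least one complete pixel."""
    # Index of the first pixel that starts at or after beg (ceiling division).
    first_full = -(-beg // bp_per_pixel)
    # Number of pixels that end at or before end (floor division).
    last_full = end // bp_per_pixel
    return first_full < last_full


# With 1000 bases per pixel:
print(covers_whole_pixel(1000, 2000, 1000))  # True: exactly pixel 1
print(covers_whole_pixel(500, 1600, 1000))   # False: straddles two pixels
```

The second case shows that a gap longer than one pixel's worth of
bases can still be hidden if it does not fully contain any single
pixel.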