doc/last-papers.txt
author Martin C. Frith
Tue Feb 28 16:35:02 2017 +0900 (2017-02-28)
changeset 834 1514199bb31b
Detailed papers on LAST
=======================

LAST has many ingredients, some of which are described in these
papers.  If you find an ingredient useful, please cite the
corresponding paper.  Citation is important because it provides
feedback on which research work was useful, and helps to justify the
research to society.

1. `Adaptive seeds tame genomic sequence comparison`__.  Kiełbasa SM,
   Wan R, Sato K, Horton P, Frith MC.  Genome Res. 2011 21(3):487-93.

   __ http://genome.cshlp.org/content/21/3/487.long

   This describes the main algorithms used by LAST.

2. `Incorporating sequence quality data into alignment improves DNA
   read mapping`__.  Frith MC, Wan R, Horton P.  Nucleic Acids
   Res. 2010 38(7):e100.

   __ http://nar.oxfordjournals.org/content/38/7/e100.long

   How LAST uses sequence quality data.

3. `Parameters for Accurate Genome Alignment`__.  Frith MC, Hamada M,
   Horton P.  BMC Bioinformatics. 2010 11:80.

   __ http://www.biomedcentral.com/1471-2105/11/80

   Choice of score parameters, ambiguity of alignment columns, and
   gamma-centroid alignment.

4. `A new repeat-masking method enables specific detection of
   homologous sequences`__.  Frith MC.  Nucleic Acids Res. 2011
   39(4):e23.

   __ http://nar.oxfordjournals.org/content/39/4/e23.long

   This describes the tantan algorithm for finding simple /
   low-complexity / tandem repeats, which reliably prevents
   non-homologous alignments, unlike other repeat finders.

5. `Gentle masking of low-complexity sequences improves homology
   search`__.  Frith MC.  PLoS One. 2011 6(12):e28819.

   __ http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0028819

   This describes what LAST does with repeats after they have been
   found.

6. `Probabilistic alignments with quality scores: an application to
   short-read mapping toward accurate SNP/indel detection`__.  Hamada
   M, Wijaya E, Frith MC, Asai K.  Bioinformatics. 2011
   27(22):3085-92.

   __ http://bioinformatics.oxfordjournals.org/content/27/22/3085.long

   Describes probabilistic alignment using sequence quality data, and
   LAMA alignment.

7. `A mostly traditional approach improves alignment of
   bisulfite-converted DNA`__.  Frith MC, Mori R, Asai K.  Nucleic
   Acids Res. 2012 40(13):e100.

   __ http://nar.oxfordjournals.org/content/40/13/e100.long

   This describes alignment of bisulfite-converted DNA, and an update
   for use of fastq quality data that allows for non-uniform base
   frequencies.

8. `An approximate Bayesian approach for mapping paired-end DNA reads
   to a reference genome`__.  Shrestha AM, Frith MC.
   Bioinformatics. 2013 29(8):965-72.

   __ http://bioinformatics.oxfordjournals.org/content/29/8/965.long

   This describes the algorithm used by last-pair-probs.

9. `Improved search heuristics find 20,000 new alignments between
   human and mouse genomes`__.  Frith MC, Noé L.  Nucleic Acids
   Res. 2014 42(7):e59.

   __ http://nar.oxfordjournals.org/content/42/7/e59.long

   This describes sensitive DNA seeding (MAM8 and MAM4).

10. `Frameshift alignment: statistics and post-genomic
    applications`__.  Sheetlin SL, Park Y, Frith MC, Spouge JL.
    Bioinformatics. 2014 30(24):3575-82.

    __ http://bioinformatics.oxfordjournals.org/content/30/24/3575.long

    Describes DNA-versus-protein alignment allowing for frameshifts.

11. `Split-alignment of genomes finds orthologies more accurately`__.
    Frith MC, Kawaguchi R.  Genome Biology. 2015 16:106.

    __ http://www.genomebiology.com/content/16/1/106

    Describes the split alignment algorithm, and its application to
    whole genome alignment.

12. `Training alignment parameters for arbitrary sequencers with
    LAST-TRAIN`__.  Hamada M, Ono Y, Asai K, Frith MC.
    Bioinformatics.

    __ https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btw742

    Describes last-train.

External methods
----------------

LAST of course owes its ideas to much previous research.  Listed here
are only the implementations that LAST uses directly.

* `The Gumbel pre-factor k for gapped local alignment can be estimated
  from simulations of global alignment`__.  Sheetlin S, Park Y, Spouge
  JL.  Nucleic Acids Res. 2005 33(15):4987-94.

  __ http://nar.oxfordjournals.org/content/33/15/4987.long

  Describes how E-values are calculated.

* `New finite-size correction for local alignment score
  distributions`__.  Park Y, Sheetlin S, Ma N, Madden TL, Spouge JL.
  BMC Res Notes. 2012 5:286.

  __ http://www.biomedcentral.com/1756-0500/5/286

  Describes a correction that makes the E-values more accurate for
  short sequences.

* `GNU Parallel - The Command-Line Power Tool`__.  Tange O.  ;login:
  The USENIX Magazine. 2011:42-47.

  __ https://www.usenix.org/publications/login/february-2011-volume-36-number-1/gnu-parallel-command-line-power-tool

  It seems traditional not to cite this kind of ingredient, which is
  unfortunate because the same reasons for citation apply.