doc/last-papers.txt
author Martin C. Frith
Tue Feb 28 16:35:02 2017 +0900 (2017-02-28)
Detailed papers on LAST
=======================

LAST has many ingredients, some of which are described in these
papers.  If you find an ingredient useful, please cite the
corresponding paper.  Citation is important because it provides
feedback on which research work was useful, and helps to justify the
research to society.

1. `Adaptive seeds tame genomic sequence comparison`__.  Kiełbasa SM,
   Wan R, Sato K, Horton P, Frith MC.  Genome Res. 2011 21(3):487-93.

   __ http://genome.cshlp.org/content/21/3/487.long

   This describes the main algorithms used by LAST.

2. `Incorporating sequence quality data into alignment improves DNA
   read mapping`__.  Frith MC, Wan R, Horton P.  Nucleic Acids
   Res. 2010 38(7):e100.

   __ http://nar.oxfordjournals.org/content/38/7/e100.long

   This describes how LAST uses sequence quality data.

3. `Parameters for accurate genome alignment`__.  Frith MC, Hamada M,
   Horton P.  BMC Bioinformatics. 2010 11:80.

   __ http://www.biomedcentral.com/1471-2105/11/80

   This describes the choice of score parameters, the ambiguity of
   alignment columns, and gamma-centroid alignment.

4. `A new repeat-masking method enables specific detection of
   homologous sequences`__.  Frith MC.  Nucleic Acids Res. 2011
   39(4):e23.

   __ http://nar.oxfordjournals.org/content/39/4/e23.long

   This describes the tantan algorithm for finding simple /
   low-complexity / tandem repeats, which reliably prevents
   non-homologous alignments, unlike other repeat finders.

5. `Gentle masking of low-complexity sequences improves homology
   search`__.  Frith MC.  PLoS One. 2011 6(12):e28819.

   __ http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0028819

   This describes what LAST does with repeats after they have been
   found.

6. `Probabilistic alignments with quality scores: an application to
   short-read mapping toward accurate SNP/indel detection`__.  Hamada
   M, Wijaya E, Frith MC, Asai K.  Bioinformatics. 2011
   27(22):3085-92.

   __ http://bioinformatics.oxfordjournals.org/content/27/22/3085.long

   Describes probabilistic alignment using sequence quality data, and
   LAMA alignment.

7. `A mostly traditional approach improves alignment of
   bisulfite-converted DNA`__.  Frith MC, Mori R, Asai K.  Nucleic
   Acids Res. 2012 40(13):e100.

   __ http://nar.oxfordjournals.org/content/40/13/e100.long

   This describes alignment of bisulfite-converted DNA, and an update
   to the use of fastq quality data that allows for non-uniform base
   frequencies.

8. `An approximate Bayesian approach for mapping paired-end DNA reads
   to a reference genome`__.  Shrestha AM, Frith MC.
   Bioinformatics. 2013 29(8):965-72.

   __ http://bioinformatics.oxfordjournals.org/content/29/8/965.long

   This describes the algorithm used by last-pair-probs.

9. `Improved search heuristics find 20,000 new alignments between
   human and mouse genomes`__.  Frith MC, Noé L.  Nucleic Acids
   Res. 2014 42(7):e59.

   __ http://nar.oxfordjournals.org/content/42/7/e59.long

   This describes sensitive DNA seeding (MAM8 and MAM4).

10. `Frameshift alignment: statistics and post-genomic
    applications`__.  Sheetlin SL, Park Y, Frith MC, Spouge JL.
    Bioinformatics. 2014 30(24):3575-82.

    __ http://bioinformatics.oxfordjournals.org/content/30/24/3575.long

    Describes DNA-versus-protein alignment allowing for frameshifts.

11. `Split-alignment of genomes finds orthologies more accurately`__.
    Frith MC, Kawaguchi R.  Genome Biology. 2015 16:106.

    __ http://www.genomebiology.com/content/16/1/106

    Describes the split alignment algorithm, and its application to
    whole genome alignment.

12. `Training alignment parameters for arbitrary sequencers with
    LAST-TRAIN`__.  Hamada M, Ono Y, Asai K, Frith MC.
    Bioinformatics.

    __ https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btw742

    Describes last-train.

External methods
----------------

LAST of course owes its ideas to much previous research.  Only
implementations that are directly used in LAST are listed here.

* `The Gumbel pre-factor k for gapped local alignment can be estimated
  from simulations of global alignment`__.  Sheetlin S, Park Y, Spouge
  JL.  Nucleic Acids Res. 2005 33(15):4987-94.

  __ http://nar.oxfordjournals.org/content/33/15/4987.long

  Describes how E-values are calculated.

* `New finite-size correction for local alignment score
  distributions`__.  Park Y, Sheetlin S, Ma N, Madden TL, Spouge JL.
  BMC Res Notes. 2012 5:286.

  __ http://www.biomedcentral.com/1756-0500/5/286

  Describes a correction that makes the E-values more accurate for
  short sequences.

* `GNU Parallel - The Command-Line Power Tool`__.  Tange O.  ;login:
  The USENIX Magazine. 2011:42-47.

  __ https://www.usenix.org/publications/login/february-2011-volume-36-number-1/gnu-parallel-command-line-power-tool

  It seems traditional not to cite this kind of ingredient, which is
  unfortunate because the same reasons for citation apply.