Added parallel-fasta and parallel-fastq.
author Martin C. Frith
Tue Nov 05 18:40:02 2013 +0900 (2013-11-05)
changeset 365 12b5485bf653
parent 364 4d5149f35e8c
child 366 13dd8ce7563d
doc/last-map-probs.txt
doc/last-split.txt
doc/last-tutorial.txt
makefile
scripts/parallel-fasta
scripts/parallel-fastq
--- a/doc/last-map-probs.txt	Tue Nov 05 15:16:01 2013 +0900
+++ b/doc/last-map-probs.txt	Tue Nov 05 18:40:02 2013 +0900
@@ -56,16 +56,12 @@
 Using multiple CPUs
 -------------------
 
-With large datasets, it's important to go faster by using multiple
-CPUs.  One way to do that is by using GNU parallel
-(http://www.gnu.org/software/parallel/)::
+This will run the pipeline on all your CPU cores::
 
-  parallel --pipe -L4 "lastal -Q1 -e120 hu | last-map-probs.py -s150" < reads.fastq > myalns.maf
+  parallel-fastq "lastal -Q1 -e120 hu | last-map-probs.py -s150" < reads.fastq > myalns.maf
 
-The "-L4" tells it that each fastq record is 4 lines, so there should
-be no line wrapping or blank lines.  Beware that older versions of GNU
-parallel were slow when using --pipe -L, so be sure to use a recent
-version.
+It requires GNU parallel to be installed
+(http://www.gnu.org/software/parallel/).
 
 Limitations
 -----------
--- a/doc/last-split.txt	Tue Nov 05 15:16:01 2013 +0900
+++ b/doc/last-split.txt	Tue Nov 05 18:40:02 2013 +0900
@@ -55,6 +55,16 @@
   lastdb -m1111110 db genome.fasta
   lastal -Q1 -e120 db q.fastq | last-split -c0 -t0.004 -g db > out.maf
 
+Going faster by parallelization
+-------------------------------
+
+For example, split alignment of DNA reads to a genome::
+
+  parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
+
+This requires GNU parallel to be installed
+(http://www.gnu.org/software/parallel/).
+
 Output
 ------
 
@@ -126,18 +136,6 @@
   lastdb -m1111110 db genome.fasta
   lastal -Q1 -e120 db q.fastq | last-split -c0 > out.maf
 
-Going faster by parallelization
--------------------------------
-
-With large datasets, it's important to go faster by using multiple
-CPUs.  One way to do that is by using GNU parallel
-(http://www.gnu.org/software/parallel/)::
-
-  parallel --pipe -L4 "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
-
-Beware that older versions of GNU parallel were slow when using --pipe
--L, so be sure to use a recent version.
-
 Options
 -------
 
--- a/doc/last-tutorial.txt	Tue Nov 05 15:16:01 2013 +0900
+++ b/doc/last-tutorial.txt	Tue Nov 05 18:40:02 2013 +0900
@@ -234,38 +234,40 @@
 -----------------------------------
 
 If you have more than one query sequence, you can go faster by
-aligning them in parallel.  One way to do that is by using GNU
-parallel (http://www.gnu.org/software/parallel/).  (Beware that GNU
-parallel had some efficiency bugs that were fixed in late 2012 / early
-2013, so be sure to use a recent version.)
-
-If you have fasta queries in separate files (e.g. chr*.fa), then
-instead of this::
-
-  lastal mydb chr*.fa > myalns.maf
-
-Try this::
-
-  parallel lastal mydb ::: chr*.fa > myalns.maf
-
-If you have fasta queries in one file, then instead of this::
+aligning them in parallel.  Instead of this::
 
   lastal mydb queries.fa > myalns.maf
 
-Try this::
+try this::
 
-  parallel --pipe --recstart '>' lastal mydb < queries.fa > myalns.maf
+  parallel-fasta "lastal mydb" < queries.fa > myalns.maf
 
-If you have fastq queries in one file, then instead of this::
+Instead of this::
 
   lastal -Q1 -e120 db q.fastq | last-split > out.maf
 
-Try this::
+try this::
 
-  parallel --pipe -L4 "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
+  parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
 
-(The "-L4" tells it that each fastq record is 4 lines, so there should
-be no line wrapping or blank lines.)
+Instead of this::
+
+  zcat queries.fa.gz | lastal mydb > myalns.maf
+
+try this::
+
+  zcat queries.fa.gz | parallel-fasta "lastal mydb" > myalns.maf
+
+Notes:
+
+* These require GNU parallel to be installed
+  (http://www.gnu.org/software/parallel/).
+
+* You can use various GNU parallel options to control the number of
+  simultaneous jobs, use remote computers, etc.
+
+* parallel-fastq assumes that each fastq record is 4 lines, so there
+  should be no line wrapping or blank lines.
 
 Example 9: Ambiguity of alignment columns
 -----------------------------------------
--- a/makefile	Tue Nov 05 15:16:01 2013 +0900
+++ b/makefile	Tue Nov 05 18:40:02 2013 +0900
@@ -7,7 +7,7 @@
 bindir = $(exec_prefix)/bin
 install: all
 	mkdir -p $(bindir)
-	cp src/last?? src/last-split scripts/*.?? $(bindir)
+	cp src/last?? src/last-split scripts/* $(bindir)
 
 clean:
 	@cd src && $(MAKE) clean
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/parallel-fasta	Tue Nov 05 18:40:02 2013 +0900
@@ -0,0 +1,8 @@
+#! /bin/sh
+
+parallel --gnu --version > /dev/null || exit 1
+
+parallel --gnu --minversion 20130222 > /dev/null ||
+echo $(basename $0): warning: old version of parallel, might be slow 1>&2
+
+exec parallel --gnu --pipe --recstart '>' "$@"
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/parallel-fastq	Tue Nov 05 18:40:02 2013 +0900
@@ -0,0 +1,8 @@
+#! /bin/sh
+
+parallel --gnu --version > /dev/null || exit 1
+
+parallel --gnu --minversion 20130222 > /dev/null ||
+echo $(basename $0): warning: old version of parallel, might be slow 1>&2
+
+exec parallel --gnu --pipe -L4 "$@"
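
Because parallel-fastq passes -L4 to GNU parallel, it relies on every fastq record being exactly 4 lines. Input can be checked for that before a long run. The following is only a minimal sketch, assuming a POSIX sh and awk are available; check_fastq4 is a hypothetical helper name and is not part of this changeset:

```shell
#! /bin/sh

# Hypothetical helper (not part of this changeset): read fastq on stdin
# and return non-zero unless it consists of strict 4-line records.
# The check is positional, because a quality line may itself begin
# with "@" or "+", so matching on those characters alone is unreliable.
check_fastq4() {
    awk '
        # Line 1 of each record must be an "@" header.
        NR % 4 == 1 && !/^@/ {
            printf "line %d: expected @ header\n", NR; bad = 1; exit
        }
        # Line 3 of each record must be a "+" separator.
        NR % 4 == 3 && !/^[+]/ {
            printf "line %d: expected + separator\n", NR; bad = 1; exit
        }
        END {
            # A truncated final record leaves a line count that is
            # not a multiple of 4.
            if (!bad && NR % 4 != 0) {
                printf "%d lines: not a multiple of 4\n", NR; bad = 1
            }
            exit bad
        }
    ' 1>&2
}
```

It could be run ahead of the pipeline from the tutorial, for example:

  check_fastq4 < q.fastq && parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf

Wrapped sequence lines or stray blank lines shift the record positions, so they usually show up as a misplaced header or separator.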