Added parallel-fasta and parallel-fastq.
1.1 --- a/doc/last-map-probs.txt Tue Nov 05 15:16:01 2013 +0900
1.2 +++ b/doc/last-map-probs.txt Tue Nov 05 18:40:02 2013 +0900
1.3 @@ -56,16 +56,12 @@
1.4 Using multiple CPUs
1.5 -------------------
1.6
1.7 -With large datasets, it's important to go faster by using multiple
1.8 -CPUs. One way to do that is by using GNU parallel
1.9 -(http://www.gnu.org/software/parallel/)::
1.10 +This will run the pipeline on all your CPU cores::
1.11
1.12 - parallel --pipe -L4 "lastal -Q1 -e120 hu | last-map-probs.py -s150" < reads.fastq > myalns.maf
1.13 + parallel-fastq "lastal -Q1 -e120 hu | last-map-probs.py -s150" < reads.fastq > myalns.maf
1.14
1.15 -The "-L4" tells it that each fastq record is 4 lines, so there should
1.16 -be no line wrapping or blank lines. Beware that older versions of GNU
1.17 -parallel were slow when using --pipe -L, so be sure to use a recent
1.18 -version.
1.19 +It requires GNU parallel to be installed
1.20 +(http://www.gnu.org/software/parallel/).
1.21
1.22 Limitations
1.23 -----------
2.1 --- a/doc/last-split.txt Tue Nov 05 15:16:01 2013 +0900
2.2 +++ b/doc/last-split.txt Tue Nov 05 18:40:02 2013 +0900
2.3 @@ -55,6 +55,16 @@
2.4 lastdb -m1111110 db genome.fasta
2.5 lastal -Q1 -e120 db q.fastq | last-split -c0 -t0.004 -g db > out.maf
2.6
2.7 +Going faster by parallelization
2.8 +-------------------------------
2.9 +
2.10 +For example, split alignment of DNA reads to a genome::
2.11 +
2.12 + parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
2.13 +
2.14 +This requires GNU parallel to be installed
2.15 +(http://www.gnu.org/software/parallel/).
2.16 +
2.17 Output
2.18 ------
2.19
2.20 @@ -126,18 +136,6 @@
2.21 lastdb -m1111110 db genome.fasta
2.22 lastal -Q1 -e120 db q.fastq | last-split -c0 > out.maf
2.23
2.24 -Going faster by parallelization
2.25 --------------------------------
2.26 -
2.27 -With large datasets, it's important to go faster by using multiple
2.28 -CPUs. One way to do that is by using GNU parallel
2.29 -(http://www.gnu.org/software/parallel/)::
2.30 -
2.31 - parallel --pipe -L4 "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
2.32 -
2.33 -Beware that older versions of GNU parallel were slow when using --pipe
2.34 --L, so be sure to use a recent version.
2.35 -
2.36 Options
2.37 -------
2.38
3.1 --- a/doc/last-tutorial.txt Tue Nov 05 15:16:01 2013 +0900
3.2 +++ b/doc/last-tutorial.txt Tue Nov 05 18:40:02 2013 +0900
3.3 @@ -234,38 +234,40 @@
3.4 -----------------------------------
3.5
3.6 If you have more than one query sequence, you can go faster by
3.7 -aligning them in parallel. One way to do that is by using GNU
3.8 -parallel (http://www.gnu.org/software/parallel/). (Beware that GNU
3.9 -parallel had some efficiency bugs that were fixed in late 2012 / early
3.10 -2013, so be sure to use a recent version.)
3.11 -
3.12 -If you have fasta queries in separate files (e.g. chr*.fa), then
3.13 -instead of this::
3.14 -
3.15 - lastal mydb chr*.fa > myalns.maf
3.16 -
3.17 -Try this::
3.18 -
3.19 - parallel lastal mydb ::: chr*.fa > myalns.maf
3.20 -
3.21 -If you have fasta queries in one file, then instead of this::
3.22 +aligning them in parallel. Instead of this::
3.23
3.24 lastal mydb queries.fa > myalns.maf
3.25
3.26 -Try this::
3.27 +try this::
3.28
3.29 - parallel --pipe --recstart '>' lastal mydb < queries.fa > myalns.maf
3.30 + parallel-fasta "lastal mydb" < queries.fa > myalns.maf
3.31
3.32 -If you have fastq queries in one file, then instead of this::
3.33 +Instead of this::
3.34
3.35 lastal -Q1 -e120 db q.fastq | last-split > out.maf
3.36
3.37 -Try this::
3.38 +try this::
3.39
3.40 - parallel --pipe -L4 "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
3.41 + parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf
3.42
3.43 -(The "-L4" tells it that each fastq record is 4 lines, so there should
3.44 -be no line wrapping or blank lines.)
3.45 +Instead of this::
3.46 +
3.47 + zcat queries.fa.gz | lastal mydb > myalns.maf
3.48 +
3.49 +try this::
3.50 +
3.51 + zcat queries.fa.gz | parallel-fasta "lastal mydb" > myalns.maf
3.52 +
3.53 +Notes:
3.54 +
3.55 +* These require GNU parallel to be installed
3.56 + (http://www.gnu.org/software/parallel/).
3.57 +
3.58 +* You can use various GNU parallel options to control the number of
3.59 + simultaneous jobs, use remote computers, etc.
3.60 +
3.61 +* parallel-fastq assumes that each fastq record is 4 lines, so there
3.62 + should be no line wrapping or blank lines.
3.63
3.64 Example 9: Ambiguity of alignment columns
3.65 -----------------------------------------
4.1 --- a/makefile Tue Nov 05 15:16:01 2013 +0900
4.2 +++ b/makefile Tue Nov 05 18:40:02 2013 +0900
4.3 @@ -7,7 +7,7 @@
4.4 bindir = $(exec_prefix)/bin
4.5 install: all
4.6 mkdir -p $(bindir)
4.7 - cp src/last?? src/last-split scripts/*.?? $(bindir)
4.8 + cp src/last?? src/last-split scripts/* $(bindir)
4.9
4.10 clean:
4.11 @cd src && $(MAKE) clean
5.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000
5.2 +++ b/scripts/parallel-fasta Tue Nov 05 18:40:02 2013 +0900
5.3 @@ -0,0 +1,8 @@
5.4 +#! /bin/sh
5.5 +
5.6 +parallel --gnu --version > /dev/null || exit 1
5.7 +
5.8 +parallel --gnu --minversion 20130222 > /dev/null ||
5.9 +echo "$(basename "$0"): warning: old version of parallel, might be slow" 1>&2
5.10 +
5.11 +exec parallel --gnu --pipe --recstart '>' "$@"
6.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000
6.2 +++ b/scripts/parallel-fastq Tue Nov 05 18:40:02 2013 +0900
6.3 @@ -0,0 +1,8 @@
6.4 +#! /bin/sh
6.5 +
6.6 +parallel --gnu --version > /dev/null || exit 1
6.7 +
6.8 +parallel --gnu --minversion 20130222 > /dev/null ||
6.9 +echo "$(basename "$0"): warning: old version of parallel, might be slow" 1>&2
6.10 +
6.11 +exec parallel --gnu --pipe -L4 "$@"
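
The tutorial note in this change says that parallel-fastq assumes each fastq record is 4 lines, with no line wrapping or blank lines. A minimal sketch of a pre-flight check for that assumption follows. The name `check_fastq_4line` is hypothetical (it is not part of LAST or GNU parallel), and the check only looks at line positions; it does not fully validate fastq.

```shell
#! /bin/sh

# Hypothetical helper, not part of LAST: read fastq on stdin and check that
# it looks like strict 4-line records, as the -L4 in parallel-fastq assumes.
# Returns 0 if the input passes, 1 (with a message on stderr) otherwise.
check_fastq_4line() {
    awk '
        BEGIN { bad = 0 }
        # Blank lines break the 4-line grouping.
        $0 == "" { print "line " NR ": blank line"; bad = 1; exit }
        # First line of each record must be an @ header.
        NR % 4 == 1 && !/^@/ { print "line " NR ": expected @ header"; bad = 1; exit }
        # Third line of each record must be a + separator.
        NR % 4 == 3 && !/^\+/ { print "line " NR ": expected + separator"; bad = 1; exit }
        END {
            if (!bad && NR % 4 != 0) {
                print "line count " NR " is not a multiple of 4"
                bad = 1
            }
            exit bad
        }
    ' >&2
}
```

For example, `check_fastq_4line < q.fastq && parallel-fastq "lastal -Q1 -e120 db | last-split" < q.fastq > out.maf` would run the pipeline only if the check passes. The check goes by line position rather than by pattern alone, because quality lines may themselves begin with `@`.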