Introduction
The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
Next-generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-read sequences (possibly with quality information).
The main processing of such FASTA/FASTQ files is mapping (aka aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRiMP, LastZ, MAQ and many others.
It is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome - manipulating the sequences to produce better mapping results.
The FASTX-Toolkit tools perform some of these preprocessing tasks.
Convert FASTQ files to FASTA files.
Chart Quality Statistics and Nucleotide Distribution
Collapsing identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts)
Shortening reads in a FASTQ or FASTA file (removing barcodes or noise).
Renames the sequence identifiers in a FASTQ/A file.
Removing sequencing adapters / linkers
Producing the Reverse-complement of each sequence in a FASTQ/FASTA file.
FASTQ/A Barcode splitter
Splitting a FASTQ/FASTA file containing multiple samples
Changes the width of sequence lines in a FASTA file
FASTA Nucleotide Changer
Converts FASTA sequences from/to RNA/DNA
FASTQ Quality Filter
Filters sequences based on quality
FASTQ Quality Trimmer
Trims (cuts) sequences based on quality
Masks nucleotides with 'N' (or other character) based on quality
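To make the collapsing step in the list above concrete, here is a minimal Python sketch of the idea: identical sequences are merged into one record while the number of reads is kept. It is not the toolkit's implementation. The function names `read_fasta` and `collapse` and the `rank-count` identifier scheme in the output are assumptions made for illustration.

```python
from collections import Counter

def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def collapse(records):
    """Return (sequence, count) pairs, most frequent first (ties by sequence)."""
    counts = Counter(seq for _, seq in records)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

fasta = """>r1
ACGT
>r2
TTGA
>r3
ACGT
""".splitlines()

collapsed = collapse(read_fasta(fasta))
for rank, (seq, count) in enumerate(collapsed, start=1):
    # Hypothetical identifier scheme: rank and read count joined by a dash.
    print(f">{rank}-{count}")
    print(seq)
```

Keeping the count alongside each unique sequence means later steps can map every distinct sequence once and still recover how many reads it represents.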
These tools can be used in two forms:
Web-based (with Galaxy).
Galaxy's Test website already contains some of the FASTX-toolkit tools.
Running the tools from the command line (or as part of a script).
Tools demonstration
Visit the Hannon lab public Galaxy server to see a demonstration of these (and other) tools.
02-Feb-2010 - Version 0.0.13
New tools:
fastq_masker (suggested by Ben Bimber)
fastx_trimmer can trim N nucleotides from the end of the sequences (a new command line option, and a separate tool in Galaxy)
fastx_clipper accepts minimum adapter length to clip (requested by Erick Antezana, command line only)
Improved Galaxy integration:
Almost all tools have working functional tests (except the plotting tools and barcode splitter).
Plotting tools (nucleotide distribution and quality boxplot) detect the input file type and show a detailed warning if given a FASTA/Q file as input (hopefully reducing bug reports).
Tools read the input FASTQ type (sanger or solexa) and use the correct quality ASCII offset (33 for sanger, 64 for solexa).
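The ASCII offsets mentioned above can be illustrated with a short Python sketch. It only shows the offset subtraction (33 for sanger, 64 for solexa) and is not taken from the toolkit's source. The constant and function names are assumptions. Solexa and Phred scores are also defined on different scales, and that conversion is deliberately left out here.

```python
SANGER_OFFSET = 33  # offset stated in the text for sanger-type FASTQ
SOLEXA_OFFSET = 64  # offset stated in the text for solexa-type FASTQ

def decode_quality(quality_line, offset):
    """Convert a FASTQ quality string to a list of integer scores.

    Each character's ASCII code minus the offset gives the raw score.
    No scale conversion between Solexa and Phred scores is attempted.
    """
    return [ord(ch) - offset for ch in quality_line]

# The same scores encoded with the two different offsets.
sanger_scores = decode_quality("II5+", SANGER_OFFSET)
solexa_scores = decode_quality("hhTJ", SOLEXA_OFFSET)

print(sanger_scores)
print(solexa_scores)
```

Decoding a file with the wrong offset shifts every score by 31, which is why reading the FASTQ type from the input matters.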
Dec-2009 - Version 0.0.12
Never officially released.
24-Nov-2009 - Version 0.0.11
New tools: fastx_uncollapser, fastq_quality_filter.
New features: fastx_collapser can re-collapse an already-collapsed FASTA file; fastx_trimmer can trim N bases from the end of the sequence.
Minor compilation bug-fixes.
10-Aug-2009 - Version 0.0.10
Bug fix on Mac OS X (reported by Joshua Waterfall).
New tool: FASTX-Renamer (based on suggestion+patch by Charles Plessy).
New undocumented command-line argument: -Q NN handles FASTQ ASCII quality with a user-specified offset (was hard-coded as 64 in previous versions). Requested by Erick Antezana.
Barcode-Splitter: improved Galaxy integration. Output files are stored directly in Galaxy's files database, so an external web server is no longer needed.
Uses libgtextutils-0.5 library (as a dynamic library)
Version 0.0.9
Never released.
12-Mar-2009 - Version 0.0.8
Minor changes to compilation stage, as suggested by users.
FASTX-toolkit should now compile cleanly on Mac OS X.
No new features were added.
Using libgtextutils-0.3 library.
24-Mar-2009 - Version 0.0.7
Added Fasta-Formatter and Fasta-Nucleotide-Changer tools.
Using libgtextutils-0.1 library.