Introduction
The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
Next-generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-read sequences (possibly with quality information).
The main processing of such FASTA/FASTQ files is mapping (also known as aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRiMP, LastZ, MAQ, and many others.
However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome, manipulating the sequences to produce better mapping results.
The FASTX-Toolkit tools perform some of these preprocessing tasks.
Available Tools
- FASTQ-to-FASTA converter
  Converts FASTQ files to FASTA files.
- FASTQ Information
  Charts quality statistics and nucleotide distribution.
- FASTQ/A Collapser
  Collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts).
- FASTQ/A Trimmer
  Shortens reads in a FASTQ or FASTA file (removing barcodes or noise).
- FASTQ/A Renamer
  Renames the sequence identifiers in a FASTQ/A file.
- FASTQ/A Clipper
  Removes sequencing adapters / linkers.
- FASTQ/A Reverse-Complement
  Produces the reverse-complement of each sequence in a FASTQ/FASTA file.
- FASTQ/A Barcode splitter
  Splits a FASTQ/FASTA file containing multiple samples.
- FASTA Formatter
  Changes the width of sequence lines in a FASTA file.
- FASTA Nucleotide Changer
  Converts FASTA sequences from/to RNA/DNA.
- FASTQ Quality Filter
  Filters sequences based on quality.
- FASTQ Quality Trimmer
  Trims (cuts) sequences based on quality.
- FASTQ Masker
  Masks nucleotides with 'N' (or other character) based on quality.
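To make a few of these operations concrete, here is a minimal Python sketch of what FASTQ-to-FASTA conversion, reverse-complementing, and collapsing amount to. It is an illustration only, not the toolkit's implementation: the function names are hypothetical, the parser assumes well-formed four-line FASTQ records, and the collapsed output is returned as plain (sequence, count) pairs rather than in the toolkit's own output format.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# All names below are hypothetical helpers for illustration;
# they are not part of the FASTX-Toolkit.


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from well-formed 4-line records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # the '+' separator line carries no data we need
        qual = next(it)
        yield header[1:], seq, qual  # drop the leading '@'


def fastq_to_fasta(records: Iterable[Tuple[str, str, str]]) -> str:
    """Keep identifier and sequence, discard quality information."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def collapse(seqs: Iterable[str]) -> List[Tuple[str, int]]:
    """Merge identical sequences, keeping a read count for each."""
    return Counter(seqs).most_common()
```

Collapsing before mapping means each distinct sequence is aligned once, with the count carried along so abundance information is not lost.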
These tools can be used in two forms:
- Web-based (with Galaxy):
  Galaxy's Test website already contains some of the FASTX-toolkit tools.
- Command-line:
  running the tools from the command line (or as part of a script).
Tools demonstration
Visit the Hannon lab public Galaxy server to see a demonstration of these (and other) tools.
News
02-Feb-2010 - Version 0.0.13
New tools: fastq_masker (suggested by Ben Bimber)
New features:
fastx_trimmer can trim N nucleotides from the end of the sequences (a new command line option, and a separate tool in Galaxy)
fastx_clipper accepts minimum adapter length to clip (requested by Erick Antezana, command line only)
Improved Galaxy integration:
Almost all tools have working functional tests (except the plotting tools and barcode splitter).
Plotting tools (nucleotide distribution and quality boxplot) detect input file type and show a detailed warning if given a FASTA/Q file as input (hopefully reducing bug reports).
Tools read the input FASTQ type (sanger or solexa) and use the correct quality ASCII offset (33 for sanger, 64 for solexa).
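The offset matters because each character in a FASTQ quality line encodes a score as its ASCII code minus the offset. The sketch below illustrates that arithmetic for the two offsets mentioned above; the function and dictionary names are hypothetical, and it only shows the subtraction, not how the tools themselves detect the FASTQ type.

```python
from typing import List

# Offsets as stated above: 33 for sanger, 64 for solexa.
# The names here are illustrative, not taken from the toolkit.
QUALITY_OFFSETS = {"sanger": 33, "solexa": 64}


def decode_qualities(quality_line: str, fastq_type: str = "sanger") -> List[int]:
    """Convert an ASCII quality string into numeric scores."""
    offset = QUALITY_OFFSETS[fastq_type]
    return [ord(ch) - offset for ch in quality_line]
```

The same character decodes to different scores under the two offsets (for example, 'h' is 71 with offset 33 and 40 with offset 64), so using the wrong offset silently distorts any quality-based filtering or trimming.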
Dec-2009 - Version 0.0.12
Never officially released.
24-Nov-2009 - Version 0.0.11
New tools: fastx_uncollapser, fastq_quality_filter.
New features: fastx_collapser can re-collapse an already-collapsed FASTA file; fastx_trimmer can trim N bases from the end of the sequence.
Minor compilation bug-fixes.
10-Aug-2009 - Version 0.0.10
Bug fix on Mac OS X (reported by Joshua Waterfall).
New tool: FASTX-Renamer (based on a suggestion and patch by Charles Plessy).
New undocumented command line argument: -Q NN handles FASTQ ASCII quality with a user-specified offset (was hard-coded as 64 in previous versions). Requested by Erick Antezana.
Barcode-Splitter: improved Galaxy integration. Output files are now stored directly in Galaxy's files database, so an external web server is no longer needed.
Uses libgtextutils-0.5 library (as a dynamic library)
Version 0.0.9
Never released.
12-Mar-2009 - Version 0.0.8
Minor changes to compilation stage, as suggested by users.
FASTX-toolkit should now compile cleanly on Mac OS X.
No new features were added.
Using libgtextutils-0.3 library.
24-Mar-2009 - Version 0.0.7
Added Fasta-Formatter and Fasta-Nucleotide-Changer tools.
Using libgtextutils-0.1 library.