FASTX-Toolkit

FASTQ/A short-reads pre-processing tools

Introduction

The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.

Next-generation sequencing machines usually produce FASTA or FASTQ files containing many short-read sequences (possibly with quality information).

The main processing of such FASTA/FASTQ files is mapping (also known as aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRiMP, LastZ, MAQ and many others.

However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome, manipulating the sequences to produce better mapping results.

The FASTX-Toolkit tools perform some of these preprocessing tasks.
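
As an illustration of what such a preprocessing step can look like, here is a minimal Python sketch that trims a fixed number of bases from the end of each read. It is not the toolkit's own implementation; the function names (read_fastq, trim_end) are hypothetical, and the parser assumes simple four-line FASTQ records with no line-wrapped sequences.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) -- a hypothetical record layout for this sketch
Record = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse simple 4-line FASTQ records (assumes no line-wrapped sequences)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line is ignored
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def trim_end(record: Record, n: int) -> Record:
    """Remove the last n bases, and their quality characters, from a read."""
    header, seq, qual = record
    keep = max(len(seq) - n, 0)
    return header, seq[:keep], qual[:keep]


reads = ["@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"]
trimmed = [trim_end(r, 3) for r in read_fastq(reads)]
print(trimmed)
```

The sequence and quality strings are trimmed together so that each remaining base keeps its own quality character.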

Available Tools


These tools can be used in two forms:
  1. Web-based (with Galaxy).
    Galaxy's Test website already contains some of the FASTX-toolkit tools.
  2. Command-line:
    Run the tools from the command line (or as part of a script).

Tools demonstration

Visit the Hannon lab public galaxy server to see a demonstration of these (and other) tools.

News

02-Feb-2010 - Version 0.0.13

New tools:
   fastq_masker (suggested by Ben Bimber)
New features:
   fastx_trimmer can trim N nucleotides from the end of the sequences (a new command line option, and a separate tool in Galaxy)
   fastx_clipper accepts minimum adapter length to clip (requested by Erick Antezana, command line only)
Improved Galaxy integration:
   Almost all tools have working functional tests (except the plotting tools and barcode splitter).
   Plotting tools (nucleotide distribution and quality boxplot) detect input file type and show a detailed warning if given a FASTA/Q file as input (hopefully reducing bug reports).
   Tools read the input FASTQ type (sanger or solexa) and use the correct quality ASCII offset (33 for sanger, 64 for solexa).
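
To make the offset concrete, the following Python sketch converts a FASTQ quality line into numeric scores using the two offsets mentioned above. It only illustrates the arithmetic and is not the toolkit's code; the names OFFSETS and decode_qualities are hypothetical, and how the resulting numbers should be interpreted for each FASTQ type is outside the scope of the sketch.

```python
# ASCII offsets as stated in the release note: 33 for sanger, 64 for solexa.
OFFSETS = {"sanger": 33, "solexa": 64}


def decode_qualities(quality_line: str, fastq_type: str = "sanger") -> list:
    """Subtract the type-specific ASCII offset from each quality character."""
    offset = OFFSETS[fastq_type]
    return [ord(ch) - offset for ch in quality_line]


sanger_scores = decode_qualities("II5+", "sanger")
solexa_scores = decode_qualities("hhTJ", "solexa")
print(sanger_scores, solexa_scores)
```

Using the wrong offset shifts every score by 31, which is why reading the FASTQ type from the input matters.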

Dec-2009 - Version 0.0.12

Never officially released.

24-Nov-2009 - Version 0.0.11

New tools: fastx_uncollapser, fastq_quality_filter.
New features: fastx_collapser can re-collapse an already-collapsed FASTA file; fastx_trimmer can trim N bases from the end of the sequence.
Minor compilation bug-fixes.

10-Aug-2009 - Version 0.0.10

Bug fix on Mac OS X (reported by Joshua Waterfall).
New tool: FASTX-Renamer (based on suggestion+patch by Charles Plessy).
New undocumented command line argument: -Q NN handles FASTQ ASCII quality with a user-specified offset (was hard-coded as 64 in previous versions). Requested by Erick Antezana.
Barcode-Splitter: improved galaxy integration - stores output files directly into galaxy's files database; no need for external webserver anymore.
Uses libgtextutils-0.5 library (as a dynamic library).

Version 0.0.9

Never released.

12-Mar-2009 - Version 0.0.8

Minor changes to compilation stage, as suggested by users.
FASTX-toolkit should now compile cleanly on Mac OS X.
No new features were added.
Using libgtextutils-0.3 library.

24-Mar-2009 - Version 0.0.7

Added Fasta-Formatter and Fasta-Nucleotide-Changer tools.
Using libgtextutils-0.1 library.

25-Feb-2009 - Version 0.0.6

Initial public release.