nxCode - DNA Barcode Designer and Decoder for Next-Gen Sequencing

Cold Spring Harbor Labs

Usage

Page contents
  1. Workflow with nxCode
  2. Decoder (nxDecoder)
  3. Barcode design tool (nxCodeBuilder)

Workflow with nxCode (recommended)

  1. Decide how many barcodes your experiment needs, and whether the barcode set must meet any special biochemical constraints (such as GC content, or restriction sites that must not appear in the barcodes).
  2. Search the barcode sets available on this website for one suitable for your experiment. If you find one, download it. If not, use the nxCode barcode design tool (nxCodeBuilder) to build your own barcode set according to your requirements and constraints.
  3. Incorporate the barcode set (or a subset of it) into your experiment, and run it.
  4. Download the nxCode Decoder (nxDecoder) and run it, as described below, on the data from your sequencer.

nxDecoder

General description
nxCode Decoder (nxDecoder) is a tool for decoding sequenced data that contains a given set of DNA barcodes. It is currently optimized for data obtained with Illumina's Genome Analyzer (Solexa).

Input:
  1. FASTA/FASTQ file containing the data to be decoded (normally the output from your sequencer).
  2. FASTA file containing the barcode-set that was used in the experiment (can be either downloaded from this website or produced using the nxCode barcode design tool).
    Useful tip: The barcode file can also contain a description for each barcode (written in its header). This description then appears in the output file each time a sequence is decoded as that barcode.
  3. Probability table in .xls format, such as the one included in the package (for further discussion, see Algorithmic details).
Examples:
  1. Input file
  2. Barcode-set file
  3. Probability table
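The barcode file is ordinary FASTA with an optional description in each header. As a minimal sketch (Python, not part of the nxCode package) of reading such a file into (description, sequence) pairs, assuming everything after '>' is the description and that a sequence may wrap over several lines:

```python
def parse_barcode_fasta(text):
    """Parse barcode FASTA text into (description, sequence) pairs.

    Assumption: everything after '>' on a header line is the barcode's
    description, and the sequence may span several lines until the next
    header. Lines before the first header are ignored.
    """
    records = []
    description, seq_lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(seq_lines).upper()))
            description, seq_lines = line[1:].strip(), []
        elif description is not None:
            seq_lines.append(line)
    if description is not None:
        records.append((description, "".join(seq_lines).upper()))
    return records


# Hypothetical barcode file with one description per barcode.
example = """>liver_sample_1
ACGTAC
>liver_sample_2
tgcatg
"""
barcodes = parse_barcode_fasta(example)
print(barcodes)  # [('liver_sample_1', 'ACGTAC'), ('liver_sample_2', 'TGCATG')]
```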

Output:

A FASTA file (or multiple files, if the optional --split parameter is used) containing the input sequenced data, with headers noting the specimen each sequence most likely originated from, along with two numbers: Event Likelihood and Decode Quality (for more details about these numbers and the methods used for decoding, see Algorithmic details).

Examples:
  1. output file
  2. output directory (if --split is used)
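With --split, the decoder writes one FASTA file per barcode. The Python sketch below mimics that grouping on a hypothetical in-memory list of (read_id, barcode_label, sequence) tuples; the tuple layout, the .fa file names and the character substitution for labels such as *unknown* are assumptions, not nxDecoder's actual file naming.

```python
import os
import tempfile
from collections import defaultdict


def split_by_barcode(decoded, out_dir):
    """Write one FASTA file per barcode label and return {label: path}.

    `decoded` is a hypothetical in-memory form of decoder output: a list of
    (read_id, barcode_label, sequence) tuples.
    """
    groups = defaultdict(list)
    for read_id, label, seq in decoded:
        groups[label].append((read_id, seq))

    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for label, reads in groups.items():
        # Replace characters such as '*' (as in "*unknown*") that are awkward in
        # file names; labels differing only in such characters would collide.
        safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in label)
        path = os.path.join(out_dir, safe + ".fa")
        with open(path, "w") as fh:
            for read_id, seq in reads:
                fh.write(f">{read_id}\n{seq}\n")
        paths[label] = path
    return paths


reads = [("r1", "bc1", "ACGTTT"), ("r2", "*unknown*", "GGCCAA"), ("r3", "bc1", "TTAACC")]
with tempfile.TemporaryDirectory() as tmp:
    written = split_by_barcode(reads, tmp)
    with open(written["bc1"]) as fh:
        bc1_text = fh.read()
    unknown_name = os.path.basename(written["*unknown*"])
print(bc1_text)
```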
Synopsis
nxDecoder.pl --file_to_decode=s --bc_file=s --prob_table=s
             [--split] [--graph] [--start_pos_in_oligo=i]
             [--ligated_with_t] [--clip] [--exact_match] [--small_header]
             [--min_likelihood=f] [--min_quality=f]
             [--min_prob_to_consider=f] [--debug] [--man]

Mandatory input parameters
--file_to_decode=s Path to a FASTA or FASTQ file containing the sequenced material for decoding.
--prob_table=s Path and filename of the probability table, in .xls format.
--bc_file=s Path to a FASTA file containing the DNA barcodes used, with optional descriptions as headers in the FASTA file.
Optional input parameters
--start_pos_in_oligo=i The starting position of the barcode in the oligo.
Default value: 1

--ligated_with_t Indicates that the first nucleotide immediately after the barcode in the oligo is T.
Use if a T nucleotide was used for sticky-end ligation of the barcodes; this improves accuracy.
Off by default.
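To see where --start_pos_in_oligo and --ligated_with_t point within a read, here is a hypothetical Python helper that extracts the barcode window (1-based start position, matching the default of 1) and optionally checks for the T right after it. How nxDecoder actually uses that T to improve accuracy is covered in Algorithmic details; this only illustrates the positions.

```python
def barcode_window(read, bc_len, start_pos_in_oligo=1, ligated_with_t=False):
    """Return (barcode_part, t_ok) for one read.

    start_pos_in_oligo is 1-based, matching the decoder's default of 1.
    When ligated_with_t is True, t_ok says whether the base right after the
    barcode is T; otherwise t_ok is None.
    """
    start = start_pos_in_oligo - 1  # 1-based option -> 0-based index
    end = start + bc_len
    barcode_part = read[start:end]
    t_ok = read[end:end + 1] == "T" if ligated_with_t else None
    return barcode_part, t_ok


# Barcode ACGTAC starting at position 5, followed by the ligation T.
part, t_ok = barcode_window("NNNNACGTACTGGATC", bc_len=6,
                            start_pos_in_oligo=5, ligated_with_t=True)
print(part, t_ok)  # ACGTAC True
```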
--split Generates a folder containing a FASTA file for each barcode. Each file contains ONLY the sequences that the decoder assigned to THAT barcode.
--small_header All headers written to the output file will contain ONLY the description from the barcode FASTA header.
--exact_match Decodes only perfect matches to the barcodes.
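Conceptually, exact-match decoding is a dictionary lookup from barcode sequence to description. A minimal Python sketch with hypothetical names (not nxDecoder's code), assuming that any non-matching window is reported as *unknown*, the label the manual uses for unidentified sequences:

```python
UNKNOWN = "*unknown*"


def make_exact_decoder(barcodes):
    """Build a decoder that accepts only perfect matches.

    `barcodes` is a list of (description, sequence) pairs. Any window that is
    not identical to a barcode is reported as *unknown*.
    """
    lookup = {seq.upper(): description for description, seq in barcodes}

    def decode(window):
        return lookup.get(window.upper(), UNKNOWN)

    return decode


decode = make_exact_decoder([("sample_1", "ACGTAC"), ("sample_2", "TGCATG")])
print(decode("ACGTAC"))  # sample_1
print(decode("ACGTAA"))  # *unknown* (a single mismatch is not tolerated)
```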
--min_likelihood=f Only events at or above event_likelihood f will be printed to the output file.
Legal values: 0 <= f <= 10 (for more details about event_likelihood, see Algorithmic details).
--min_quality=f Only events at or above decode_quality f will be printed to the output file.
Legal values: 0 <= f <= 1 (for more details about decode_quality, see Algorithmic details).
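Both thresholds keep an event only when its value is at or above the given minimum. A Python sketch of that rule on hypothetical (read_id, event_likelihood, decode_quality) tuples, using the thresholds 5 and 0.8 from the decoder's usage examples; treating an unset option as 0 is an assumption.

```python
def passes_thresholds(event_likelihood, decode_quality,
                      min_likelihood=0.0, min_quality=0.0):
    """True if both values are at or above their minimums."""
    if not 0 <= min_likelihood <= 10:
        raise ValueError("min_likelihood must satisfy 0 <= f <= 10")
    if not 0 <= min_quality <= 1:
        raise ValueError("min_quality must satisfy 0 <= f <= 1")
    return event_likelihood >= min_likelihood and decode_quality >= min_quality


# Hypothetical decoded events: (read_id, event_likelihood, decode_quality).
events = [("r1", 7.2, 0.95), ("r2", 4.9, 0.99), ("r3", 5.0, 0.80), ("r4", 8.0, 0.79)]
kept = [rid for rid, lik, q in events
        if passes_thresholds(lik, q, min_likelihood=5, min_quality=0.8)]
print(kept)  # ['r1', 'r3']
```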
--graph Generates a graph of the abundance of each barcode in the sequenced data.

--clip
Clips the barcode out of the sequence in the output FASTA file. Sequences that were not identified (decoded as *unknown*) are not clipped.
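A Python sketch of --clip on a single read. The manual only states that the barcode is clipped and that *unknown* reads are left unchanged; removing exactly the bc_len barcode bases and keeping any bases before --start_pos_in_oligo are assumptions.

```python
UNKNOWN = "*unknown*"


def clip_barcode(read, label, bc_len, start_pos_in_oligo=1):
    """Remove the barcode bases from a decoded read.

    Reads decoded as *unknown* are returned unchanged, as the manual states.
    Assumption: only the bc_len barcode bases are removed; bases before
    start_pos_in_oligo (1-based) are kept.
    """
    if label == UNKNOWN:
        return read
    start = start_pos_in_oligo - 1
    return read[:start] + read[start + bc_len:]


print(clip_barcode("ACGTACGGATC", "sample_1", bc_len=6))                       # GGATC
print(clip_barcode("NNACGTACGG", "sample_1", bc_len=6, start_pos_in_oligo=3))  # NNGG
print(clip_barcode("ACGTACGGATC", UNKNOWN, bc_len=6))                          # ACGTACGGATC
```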
--min_prob_to_consider=f Minimum probability of an event to take into account during preprocessing. Higher values speed up preprocessing at the expense of accuracy (for a discussion of min_prob_to_consider, see Algorithmic details).

Default values:
1e-09 for barcodes shorter than 10 nucleotides
1e-10 for barcodes of exactly 10 nucleotides
1e-11 for barcodes longer than 10 nucleotides
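Read as three non-overlapping ranges (shorter than 10, exactly 10, longer than 10 nucleotides), the default table reduces to a simple lookup. A Python sketch with a hypothetical function name:

```python
def default_min_prob_to_consider(bc_len):
    """Decoder default for --min_prob_to_consider, by barcode length in nucleotides."""
    if bc_len < 10:
        return 1e-09
    if bc_len == 10:
        return 1e-10
    return 1e-11


for n in (8, 10, 12):
    print(n, default_min_prob_to_consider(n))  # 8 1e-09 / 10 1e-10 / 12 1e-11
```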

--debug
Prints debug messages to the screen.

--man
Prints this manual.

Examples
./nxDecoder.pl --file_to_decode=solexa_output.fa --prob_table=prob_table.hash --bc_file=my_barcodes.fa --clip --split

./nxDecoder.pl --file_to_decode=solexa_output.fa --prob_table=prob_table.hash --bc_file=my_barcodes.fa --start_pos_in_oligo=5 --ligated_with_t --graph --min_likelihood=5 --min_quality=0.8

nxCodeBuilder

General description
nxCodeBuilder is a tool for designing custom error-resistant DNA barcode sets for next-gen sequencing.
It is currently optimized for making barcode sets to be used with Illumina's Genome Analyzer (Solexa).

Input:
  1. Barcode length (number of nucleotides).
  2. Expected decoding accuracy (for more details, see Algorithmic details).
  3. Probability table in .xls format, which comes standard with the package (for further discussion, see Algorithmic details).
Examples:
  1. Barcode length: 6, 7, 8 etc.
  2. Expected Accuracy: 0.99, 0.995 etc.
  3. Probability table.

Output:

A FASTA file containing the barcode set made with the given input parameters. Use this file as the barcode file (--bc_file) for the decoder when decoding sequenced data obtained with this barcode set.
(For more details about the methods used to make a barcode set, see Algorithmic details.)

Examples:
  1. output file
Synopsis
nxCodeBuilder.pl --bc_len=i --exp_acc=f --prob_table=s
                 [--start_pos_in_oligo=i] [--too_long_homopoly=i]
                 [--gc_min=f] [--gc_max=f] [--tm_hist]
                 [--seed_index=i] [--ligated_with_t] [--bc_num_limit=i]
                 [--forbidden_seqs=s] [--min_prob_to_consider=f]
                 [--shuffle] [--debug] [--man]
Mandatory input parameters
--bc_len=i Number of nucleotides allotted to the barcode.
--exp_acc=f Expected minimum probability of correct decoding.
Legal values: 0 < f < 1
Higher values produce smaller barcode sets, and vice versa.
(For more details about exp_acc, see Algorithmic details.)
--prob_table=s Path and filename of the probability table, in .xls format.
(For more details about how prob_table is used in the algorithm, see Algorithmic details.)
Optional input parameters
--start_pos_in_oligo=i Position of the barcode in the oligo.
Later positions are generally more error-prone, producing smaller barcode sets.
Default: 1
--too_long_homopoly=i Do not consider barcodes with homopolymers of length i or more (e.g., CGAAAATT will not be considered for i=4 or less).
Default: 0
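The homopolymer rule can be checked by measuring the longest run of identical bases. A Python sketch reproducing the CGAAAATT example; treating the default of 0 as "no filtering" is an assumption, since a literal threshold of 0 would reject every barcode.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of one repeated nucleotide."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def passes_homopolymer_filter(seq, too_long_homopoly=0):
    """False if seq contains a homopolymer of length too_long_homopoly or more.

    Assumption: 0 (the documented default) disables the filter.
    """
    if too_long_homopoly <= 0:
        return True
    return longest_homopolymer(seq) < too_long_homopoly


for i in (0, 3, 4, 5):
    print(i, passes_homopolymer_filter("CGAAAATT", i))
# prints, one per line: 0 True, 3 False, 4 False, 5 True
```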
--gc_min=f Minimum GC content of barcodes.
Legal values: 0 <= f <= 1
Default: 0
--gc_max=f Maximum GC content of barcodes.
Legal values: 0 <= f <= 1
Default: 1
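GC content is the fraction of G and C bases, so --gc_min and --gc_max bound that fraction. A minimal Python sketch; treating both bounds as inclusive is an assumption about nxCodeBuilder's behavior.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def passes_gc_filter(seq, gc_min=0.0, gc_max=1.0):
    """Assumption: both bounds are inclusive."""
    return gc_min <= gc_content(seq) <= gc_max


print(gc_content("ACGTAAGT"))                    # 0.375
print(passes_gc_filter("ACGTAAGT", 0.25, 0.65))  # True
print(passes_gc_filter("GGGGCCCC", 0.25, 0.65))  # False
```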
--tm_hist Draws a melting temperature histogram of the produced barcode set, saved to a file called:
tm_hist_hour_min_sec.png
Note: requires the 'dan' program from the EMBOSS package and the GD::Graph::bars Perl module.
Off by default.
--seed_index=i Index of the first candidate to be considered in the lexicographic search.
Default: 0
--ligated_with_t Indicates that the first nucleotide immediately after the barcode in the oligo is T. Use if a T nucleotide was used for sticky-end ligation of the barcodes; this improves accuracy.
Off by default.
--bc_num_limit=i Maximum number of barcodes needed. The program stops considering candidates once this limit is reached (if it is reached).
--forbidden_seqs=s Path and filename of a file containing sequences that must not appear in any barcode (e.g., restriction sites).
Each forbidden sequence should appear on a line of its own.
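A Python sketch of the --forbidden_seqs check: read one sequence per line and reject any barcode that contains one of them. It checks only the barcode string itself (not its reverse complement, nor its junctions with flanking oligo sequence), and the file contents are a hypothetical example using the EcoRI (GAATTC) and BamHI (GGATCC) sites.

```python
import os
import tempfile


def load_forbidden_seqs(path):
    """Read one forbidden sequence per line, skipping blank lines."""
    with open(path) as fh:
        return [line.strip().upper() for line in fh if line.strip()]


def contains_forbidden(barcode, forbidden):
    """True if any forbidden sequence occurs inside the barcode string."""
    barcode = barcode.upper()
    return any(site in barcode for site in forbidden)


# Hypothetical restriction_sites.txt with EcoRI and BamHI sites.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("GAATTC\nGGATCC\n\n")
forbidden = load_forbidden_seqs(tmp.name)
os.remove(tmp.name)

print(forbidden)                                    # ['GAATTC', 'GGATCC']
print(contains_forbidden("ACGAATTCGT", forbidden))  # True
print(contains_forbidden("ACGTACGT", forbidden))    # False
```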
--min_prob_to_consider=f Minimum probability of barcode distortions that will be taken into account during code production. Lower values increase runtime and code size.
(For more details about min_prob_to_consider, see Algorithmic details.)
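Why a lower --min_prob_to_consider costs runtime can be seen by enumerating the distortions of one barcode and dropping those less likely than the threshold. The Python sketch that follows uses a deliberately simple, hypothetical error model (independent substitutions at a single error rate p_err) in place of nxCode's probability table, so its numbers are illustrative only.

```python
from itertools import combinations, product

BASES = "ACGT"


def likely_distortions(barcode, p_err, min_prob):
    """Yield (distorted_sequence, probability) pairs with probability >= min_prob.

    Hypothetical error model, NOT nxCode's probability table: each position
    is miscalled independently with probability p_err, and a miscall is
    equally likely to be any of the three other bases.
    """
    if not 0 <= p_err < 0.75:
        raise ValueError("p_err must be in [0, 0.75) for the pruning below")
    n = len(barcode)
    for m in range(n + 1):  # m = number of substituted positions
        prob = (p_err / 3) ** m * (1 - p_err) ** (n - m)
        if prob < min_prob:
            # With p_err < 0.75, p_err/3 < 1 - p_err, so each extra substitution
            # lowers the probability: every larger m is pruned as well.
            break
        for positions in combinations(range(n), m):
            choices = [[b for b in BASES if b != barcode[i]] for i in positions]
            for subs in product(*choices):
                seq = list(barcode)
                for i, b in zip(positions, subs):
                    seq[i] = b
                yield "".join(seq), prob


coarse = list(likely_distortions("ACGT", p_err=0.01, min_prob=1e-3))
fine = list(likely_distortions("ACGT", p_err=0.01, min_prob=1e-6))
print(len(coarse), len(fine))  # 13 67
```

Under this toy model, lowering the threshold from 1e-3 to 1e-6 grows the set considered for the 4-nucleotide barcode ACGT from 13 to 67 sequences (the barcode itself, plus all single and then all double substitutions).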
--shuffle Considers candidate barcodes in random order instead of lexicographic order.
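--seed_index, --bc_num_limit and --shuffle all describe a scan over candidate barcodes in lexicographic (or random) order. The Python sketch shows such a greedy scan; its compatibility test (a minimum Hamming distance, min_dist) is a stand-in chosen for illustration, NOT nxCodeBuilder's criterion, which uses the probability table and --exp_acc as described in Algorithmic details. Starting at seed_index without wrapping around, and applying seed_index after shuffling, are also assumptions. As with --exp_acc, a stricter requirement (larger min_dist) yields a smaller set.

```python
import random
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def greedy_barcode_set(bc_len, min_dist, seed_index=0, bc_num_limit=None,
                       shuffle=False, rng_seed=None, keep=lambda s: True):
    """Greedy scan over all 4**bc_len candidates; accept each one that is
    compatible with every barcode accepted so far.

    Stand-in compatibility rule (assumption): Hamming distance >= min_dist.
    `keep` is a hypothetical hook for per-barcode filters (GC, homopolymers, ...).
    """
    candidates = ["".join(p) for p in product("ACGT", repeat=bc_len)]  # lexicographic
    if shuffle:
        random.Random(rng_seed).shuffle(candidates)
    chosen = []
    # Assumption: the scan starts at seed_index and does not wrap around.
    for cand in candidates[seed_index:]:
        if not keep(cand):
            continue
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
            if bc_num_limit is not None and len(chosen) >= bc_num_limit:
                break
    return chosen


print(greedy_barcode_set(2, min_dist=2))                  # ['AA', 'CC', 'GG', 'TT']
print(greedy_barcode_set(2, min_dist=2, bc_num_limit=2))  # ['AA', 'CC']
print(len(greedy_barcode_set(2, min_dist=1)))             # 16
```

The scan is quadratic in the number of accepted barcodes, which is one reason a limit such as --bc_num_limit is useful for long barcodes.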
--debug Prints debug messages during code construction.
--man Prints this manual.
Examples
./nxCodeBuilder.pl --bc_len=6 --exp_acc=0.999 --prob_table=my_table.xls

./nxCodeBuilder.pl --bc_len=8 --exp_acc=0.995 --prob_table=my_table.xls --gc_min=0.25 --gc_max=0.65 --too_long_homopoly=4

./nxCodeBuilder.pl --bc_len=9 --exp_acc=0.999 --prob_table=my_table.xls --start_pos_in_oligo=4 --ligated_with_t --forbidden_seqs=./restriction_sites.txt