A self-optimizing base caller for the Illumina/Solexa Genome Analyzer (patent pending)

What is Alta-Cyclic?
Alta-Cyclic is a novel base caller for the Illumina Genome Analyzer (also known as Solexa) that produces more accurate and longer reads.

What is the Alta-Cyclic license?
Alta-Cyclic is available free for academic, non-profit and personal use. See the license here.

I do not fall into any of these license categories. What should I do?
Contact us for licensing information.

How do I cite Alta-Cyclic?
Please cite: Erlich Y, Mitra PP, Delabastide M, McCombie WR & Hannon GJ. Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nat Methods. 2008 Jul 6.

Where can I get some background about next generation sequencing technologies?
I recommend the Nature Methods primer for newcomers, and this excellent paper by Michael Metzker for more technically oriented readers.

What are the main differences between Alta-Cyclic and the Illumina base caller?
There are several differences between the two base callers. First, Alta-Cyclic optimizes its parameters before calling: it actually tests different combinations and checks which one works best. Second, Alta-Cyclic corrects fluorophore cross-talk dynamically, instead of applying a static cross-talk correction based on values collected at the beginning of the run. This is done using a machine-learning approach (SVM). Finally, Alta-Cyclic estimates the phasing parameters from the last cycles of the run using a novel model.
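For readers new to phasing: in every cycle some molecules in a cluster fail to incorporate a base (phasing, they lag behind) and some incorporate an extra one (prephasing, they run ahead), so the fraction of molecules that are in step with the cycle number shrinks as the run goes on. This is why phasing matters most in the late cycles. The Python sketch below propagates a simple model of this with a constant lag probability and a constant lead probability per cycle. It is only an illustration under these assumptions, not Alta-Cyclic's novel phasing model, and the names `phasing_distribution`, `p_lag` and `p_lead` are hypothetical.

```python
def phasing_distribution(n_cycles, p_lag, p_lead):
    """Return probs, where probs[k] is the fraction of molecules in a cluster
    that have incorporated exactly k bases after n_cycles cycles.

    Assumed (illustrative) model: in every cycle each molecule
      - incorporates no base with probability p_lag (phasing),
      - incorporates two bases with probability p_lead (prephasing),
      - incorporates exactly one base otherwise.
    """
    if p_lag < 0 or p_lead < 0 or p_lag + p_lead > 1:
        raise ValueError("need p_lag, p_lead >= 0 and p_lag + p_lead <= 1")
    p_step = 1.0 - p_lag - p_lead
    probs = [1.0]  # before the first cycle nothing has been incorporated
    for _ in range(n_cycles):
        nxt = [0.0] * (len(probs) + 2)
        for k, mass in enumerate(probs):
            nxt[k] += mass * p_lag       # lags behind by one more base
            nxt[k + 1] += mass * p_step  # stays in step
            nxt[k + 2] += mass * p_lead  # runs ahead by one more base
        probs = nxt
    return probs


if __name__ == "__main__":
    # Illustrative rates only; real rates depend on the run.
    for cycle in (1, 12, 24, 36):
        in_phase = phasing_distribution(cycle, 0.01, 0.005)[cycle]
        print(f"cycle {cycle:2d}: in-phase fraction {in_phase:.3f}")
```

The signal read at cycle n comes mostly from molecules at position n, and the rest of the distribution is what blurs the later cycles.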

Is there a stand-alone version of Alta-Cyclic?
Sorry, Alta-Cyclic is currently available only as a cluster-based version.

Which job submission system does my cluster need?
Alta-Cyclic supports SGE (Sun Grid Engine).

I have a small or weak cluster, and I noticed that you used a fairly large and powerful one. Can I still use Alta-Cyclic?
Yes. The run time will be longer though.

Note that Alta-Cyclic speeds up the training stage by killing jobs that take 'too much time'. These jobs usually correspond to 'bad' coordinates of the parameter search (for example, completely wrong phasing parameters). In the current version (v0.1.0), 'too much time' is hard-coded, but you can change it easily:

(1) Open in a text editor
(2) Go to the line 'Readonly my $D_LIFETIME_FOR_OPT' and increase its value.
(3) Do the same for $D_LIFETIME_FOR_ALL_CYCLE.
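The idea behind these time limits can be sketched locally in Python: start a job, and kill it if it outlives its allotted lifetime. This is only an illustration under our own assumptions; Alta-Cyclic applies its limits to its cluster jobs, and `run_with_lifetime` is a hypothetical helper, not part of Alta-Cyclic. Raising $D_LIFETIME_FOR_OPT or $D_LIFETIME_FOR_ALL_CYCLE plays the same role as raising `lifetime_s` here.

```python
import subprocess
import sys


def run_with_lifetime(cmd, lifetime_s):
    """Run cmd and wait for it; if it runs longer than lifetime_s seconds,
    kill it. Returns the job's exit code, or None if the job was killed."""
    try:
        # subprocess.run kills the child and waits for it before raising
        # TimeoutExpired, so no stray process is left behind.
        completed = subprocess.run(cmd, timeout=lifetime_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # A job that finishes within its lifetime is kept...
    print(run_with_lifetime([sys.executable, "-c", "pass"], 10))
    # ...while one that overruns its lifetime is killed.
    print(run_with_lifetime([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

A limit that is too short kills good jobs along with bad ones, which is why a slower cluster may need larger values.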

What is the two-bit (2bit) format that you use for the reference genome, and how can I convert a genome in FASTA format to it?
More details about this format are available here. Jim Kent developed a command-line tool for converting FASTA to 2bit, which you can get here.
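To give a feel for the format: the DNA in a .2bit file is stored with two bits per base (T=00, C=01, A=10, G=11), four bases per byte, with the first base in the two most significant bits. The Python sketch below shows only this packing step; `pack_2bit` and `unpack_2bit` are hypothetical helpers. A real .2bit file also has a header, a sequence index, and separate lists of N blocks and lowercase (masked) blocks, so for actual conversion use Kent's faToTwoBit tool (for example, 'faToTwoBit genome.fa genome.2bit').

```python
BASE_TO_BITS = {"T": 0b00, "C": 0b01, "A": 0b10, "G": 0b11}
BITS_TO_BASE = "TCAG"  # index = 2-bit code


def pack_2bit(seq):
    """Pack a DNA string into bytes, 4 bases per byte, first base in the high bits.
    Simplification: any non-ACGT character (e.g. N) is stored as T, and the
    information that it was an N is lost (real .2bit files keep N blocks)."""
    seq = seq.upper()
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for j in range(4):
            base = chunk[j] if j < len(chunk) else "T"  # pad the last byte with T (00)
            byte = (byte << 2) | BASE_TO_BITS.get(base, 0b00)
        out.append(byte)
    return bytes(out)


def unpack_2bit(data, length):
    """Inverse of pack_2bit; length drops the padding at the end."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = pack_2bit("GATTACA")
    print(len(packed), unpack_2bit(packed, 7))  # 2 GATTACA
```

A 7-base sequence fits in 2 bytes, which is where the roughly fourfold saving over one-byte-per-base FASTA comes from.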

What is the "--is_reference_genome_local" option in brave_heart?
This option tells brave_heart whether the folder containing the reference genome is installed only on the submission node ('Y') or is already installed on the remote execution nodes.

What is the "--not_nice" option?
To keep improving the computational algorithm and the training sets, Alta-Cyclic reports the parameters found during training back to CSHL. If you prefer not to report these parameters, use the '--not_nice' option.

When should I use the '--auto' option for karma?
If you have a GAII machine and want to call all lanes, use it. If you have a GAI machine, or you want to call only some of the lanes (for instance, lanes 1-3), you can still use the '--auto' option, but you must set the number of tiles or lanes.

I keep getting error messages like:
 Content of Cluster/Errors/
 +only 46236 lines were collected. Expected:75000
 at /tmp/6669723.57.public.q/ line 58
 main::main() called at /tmp/6669723.57.public.q/ line 37
 3 messages
 Cleaining file
What am I doing wrong?
Alta-Cyclic expects a certain training-set size at each stage of the training. If the collected set is smaller, it reports an error but does not stop. If the number of lines is very low (say, 20000), you may want to restart the run with a larger value for the '--maximum_lines_for_training' option. Note that '--maximum_lines_for_training' is not the size of the training input; it is the size of the starting sample. The actual training set is smaller, because some reads contain 'N' and others do not pass the alignment threshold.
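A back-of-the-envelope way to pick a new value: if the fraction of the starting sample that survives the 'N' and alignment filters stays roughly the same, the starting sample must grow in proportion to the shortfall. The Python sketch below assumes exactly that; `required_starting_lines` is a hypothetical helper, and the 100000-line starting sample in the example is an illustrative number, not Alta-Cyclic's default.

```python
def required_starting_lines(expected, collected, current_start):
    """Estimate the starting-sample size needed to collect `expected` training lines.

    Assumes the survival rate collected / current_start (reads without 'N'
    that pass the alignment threshold) stays the same for a larger sample.
    """
    if collected <= 0 or current_start <= 0:
        raise ValueError("collected and current_start must be positive")
    # Integer ceiling of expected * current_start / collected (no float rounding).
    return -(-expected * current_start // collected)


if __name__ == "__main__":
    # 75000 and 46236 come from the error message above;
    # 100000 is a made-up starting sample for illustration.
    print(required_starting_lines(75000, 46236, 100000))  # 162212
```

The estimate is only as good as the constant-survival assumption; a different subset of reads may filter differently.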

How can I recover from a broken run?
Simply run 'brave_heart --load Path_to_Training_Folder_of_Broken_Run/egg'. Alta-Cyclic will identify the last saved state and the location of the training files, and will continue from the latest possible step.
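Resuming like this follows a common checkpointing pattern: record each completed step in a state file, and on restart skip every step already recorded. The Python sketch below illustrates only the pattern; the JSON state file and `run_with_checkpoints` are our own assumptions and say nothing about the format of Alta-Cyclic's egg file.

```python
import json
import os


def run_with_checkpoints(stages, state_path):
    """Run (name, callable) stages in order, skipping those already recorded
    as completed in state_path. Returns the list of completed stage names."""
    done = []
    if os.path.exists(state_path):
        with open(state_path) as fh:
            done = json.load(fh)["completed"]
    for name, func in stages:
        if name in done:
            continue  # finished in an earlier (possibly broken) run
        func()
        done.append(name)
        # Save after every stage, so a crash loses at most the stage in progress.
        with open(state_path, "w") as fh:
            json.dump({"completed": done}, fh)
    return done
```

If a stage raises, the state file still lists every stage before it, so calling `run_with_checkpoints` again with the same `state_path` picks up at the failed stage.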

Why do I get this error message:
Error in option spec: "random_walk=f{2}"
You are running an Alta-Cyclic version older than v2.2. In addition, you should upgrade your Perl package Getopt::Long to at least version 2.35 (see the next question).

Why do I get this error message:
Getopt::Long version 2.35 required--this is only version 2.34
Your current version of Getopt::Long does not support advanced options that Alta-Cyclic requires. Please update it as follows:
1. sudo cpan
2. install Getopt::Long
You are ready to go!
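To check which Getopt::Long version your perl loads, before or after upgrading, the quickest manual test is 'perl -e "use Getopt::Long 2.35;"', which fails with the same error message as above when the installed version is too old. The Python sketch below does the same check programmatically: it asks perl for $Getopt::Long::VERSION and compares it with 2.35 as a decimal number, the usual way Perl module versions such as 2.34 and 2.35 compare. The helper names are hypothetical.

```python
import subprocess
from decimal import Decimal

REQUIRED = Decimal("2.35")


def installed_getopt_long_version(perl="perl"):
    """Return the Getopt::Long version string that perl loads, or None if
    perl is missing or cannot load the module."""
    try:
        result = subprocess.run(
            [perl, "-MGetopt::Long", "-e", "print $Getopt::Long::VERSION"],
            capture_output=True, text=True)
    except FileNotFoundError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def is_new_enough(version, required=REQUIRED):
    """Compare a decimal module version such as '2.34' with the required one.
    A developer-release suffix like '_01' is ignored."""
    return Decimal(version.split("_")[0]) >= required


if __name__ == "__main__":
    version = installed_getopt_long_version()
    if version is None:
        print("could not query perl for Getopt::Long")
    else:
        print(version, "OK" if is_new_enough(version) else "too old, please upgrade")
```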