Researchers at Cold Spring Harbor Laboratory
have devised a pooling-based multiplexing method that allows them to
sequence tens of thousands of samples in a single second-generation
sequencing run, many more than existing barcoding methods can handle.
Though the group is not the only one to have developed a pooling
method for multiplexed second-gen sequencing (a Columbia University
team has developed a related approach), it may be the first to explore
its potential for clinical applications: identifying carriers of
genetic diseases in certain Orthodox Jewish communities.
The original motivation for developing the multiplexing method, which
was published online in Genome Research last month, was to
cost-effectively sequence genome-wide
collections of short hairpin RNAs contained in bacterial clone
libraries, and to link each sequence back to its clone, according to
Greg Hannon, a professor at Cold Spring Harbor Lab and the senior
author of the paper.
Until recently, he and his team sequenced each clone by capillary
sequencing technology, since second-generation sequencing platforms did
not allow them to link sequence reads back to specific clones. That,
however, was an expensive approach.
"We have spent literally many millions over the years, certainly
more than $10 million, sequence-verifying clones by conventional
sequencing," Hannon said.
Over the last few years, researchers as well as vendors of
second-generation sequencing platforms have come up with barcoding
strategies in which each sample is tagged with a unique oligonucleotide
prior to sequencing. However, these approaches generally allow only
dozens to hundreds of samples to be multiplexed.
In order to sequence tens of thousands of samples in
parallel, the Cold Spring Harbor researchers decided to mix them in
certain patterns to create pools, where each pool, but not each
individual sample within it, is tagged with an oligo barcode. Since it
is known which pools contain which samples, each sequence can be
assigned to an individual sample with high confidence based on the
pattern of pools in which it appears.
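To make the decoding idea concrete, here is a minimal Python sketch under loudly stated assumptions: the pool names, the 10 sample names and the pair-of-pools layout are hypothetical and are not the design used in the paper. Each sample goes into a unique pair of five barcoded pools, and a sequence is traced back to every sample whose pools all contained it.

```python
from itertools import combinations

# Hypothetical toy design (NOT the researchers' actual design):
# 5 barcoded pools, and each of 10 samples goes into a unique pair of pools.
POOLS = ["P1", "P2", "P3", "P4", "P5"]
DESIGN = {
    f"S{i}": frozenset(pair)
    for i, pair in enumerate(combinations(POOLS, 2))
}


def decode(design, observed_pools):
    """Return every sample whose pools all showed the sequence.

    The barcodes identify only pools, so a sequence is traced back to a
    sample by matching the set of pools it was read in against the design.
    """
    observed = set(observed_pools)
    return sorted(name for name, pools in design.items() if pools <= observed)


# A sequence read only with the P2 and P4 barcodes points to one sample.
print(decode(DESIGN, {"P2", "P4"}))  # ['S5']
```

Here 10 samples share only five barcodes; the match is unambiguous because each sequence in this example comes from a single sample.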
The strategy the researchers are using to pool the samples is based
on the Chinese remainder theorem, which has been known for almost 2,000
years, according to Yaniv Erlich, a graduate student in Hannon's lab
and the first author of the paper. Another article that focuses on the
mathematical aspects of the method is in preparation, he said.
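The Chinese remainder theorem states that if window sizes w1, ..., wk are pairwise coprime, the remainders of x divided by each wi determine x uniquely for any 0 ≤ x < w1·...·wk. The sketch below is one way such a pooling scheme can work, assuming specimen x is placed in pool (x mod wi) of each window; the window sizes 7, 9 and 10, the 500 specimens and all function names are hypothetical illustrations, not values or code from the paper. It decodes only the simplest case, a sequence carried by a single specimen.

```python
from itertools import combinations
from math import gcd, prod

# Hypothetical window sizes; they must be pairwise coprime for the CRT to apply.
WINDOWS = (7, 9, 10)
N_SAMPLES = 500  # must not exceed prod(WINDOWS) == 630 for unique decoding

assert all(gcd(a, b) == 1 for a, b in combinations(WINDOWS, 2))


def pools_for(sample):
    """Pools a specimen is placed in: one per window, chosen by sample mod w."""
    return {(i, sample % w) for i, w in enumerate(WINDOWS)}


def crt(residues):
    """Smallest non-negative x with x % w == r for every (r, w) pair."""
    m = prod(WINDOWS)
    x = 0
    for r, w in zip(residues, WINDOWS):
        mi = m // w
        x += r * mi * pow(mi, -1, w)  # pow(mi, -1, w) is the modular inverse
    return x % m


def decode_single(observed_pools):
    """Recover the specimen index from the pools a rare sequence appeared in."""
    residues = dict(observed_pools)  # window index -> residue
    return crt([residues[i] for i in range(len(WINDOWS))])


total_pools = sum(WINDOWS)  # 26 pools for 500 specimens
print(total_pools, decode_single(pools_for(123)))  # 26 123
```

In this toy setting, 26 pools suffice to tell 500 specimens apart, because the product of the windows (630) exceeds the number of specimens while their sum stays small.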
Erlich's aim was to "minimize the amount of robotics we are using, the
amount of sequencing, and the number of pools," he said. This sets the
new method apart from other pooling strategies, he added, such as
those used in BAC pooling, which often focus on minimizing the number of
pools and therefore create large pools that require a lot of robotics time.
In their paper, the researchers tested their method, which uses 384
barcodes, on the Illumina Genome Analyzer by sequencing two libraries,
each consisting of about 40,000 bacterial clones and comprising
approximately 20,000 different microRNAs. They achieved more than 97
percent accuracy.
At present, they are analyzing libraries with more than 60,000
clones, according to Erlich, and in theory, it is possible to analyze
more than 100,000 samples.
The method, dubbed "DNA Sudoku," is currently best suited to analyzing
sequences, or genotypes, that are rare: for example, rare alleles in a
population, or shRNAs in a clone library. "If we have two alleles with
the same frequency, we cannot use this method to distinguish between
these," Erlich said. In addition, sufficient sequencing depth is
necessary to assign sequences with high confidence.
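A simplified illustration of why rare sequences decode cleanly while more common ones do not: when a sequence is carried by more than one specimen, the positive pools can also match specimens that do not carry it. The sketch reuses the hypothetical CRT-style layout (windows 7, 9 and 10; 500 specimens) and, as an assumption, ignores the read counts that a real decoder would likely also weigh.

```python
# Hypothetical pairwise-coprime windows and specimen count (illustrative only).
WINDOWS = (7, 9, 10)
N_SAMPLES = 500


def pools_for(sample):
    """One pool per window: specimen x goes into pool x mod w."""
    return {(i, sample % w) for i, w in enumerate(WINDOWS)}


def consistent_specimens(carriers):
    """Specimens whose every pool is positive, ignoring read counts."""
    positive = set().union(*(pools_for(c) for c in carriers))
    return [x for x in range(N_SAMPLES) if pools_for(x) <= positive]


print(consistent_specimens({123}))   # [123]
print(consistent_specimens({0, 1}))  # [0, 1, 91, 190, 280, 351, 441]
```

With a single carrier, only that specimen fits the pattern. With just two carriers, five non-carriers (91, 190, 280, 351 and 441) fit as well, so the pool pattern alone cannot say which specimens carry the sequence.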
Sequencing technologies with longer reads than existing ones,
such as the technology developed by Pacific Biosciences, could
eventually enable researchers to analyze more common genotypes because
the long reads "pick up natural variation among individuals" that can
be used to distinguish between samples, according to Erlich.
With the new method, which the researchers have patented, analyzing a
clone library costs five to 10 times less than it would with Sanger
sequencing, according to Hannon. He said it now costs
between $50,000 and $80,000 to analyze the same number of clones "that
would have constituted a fairly substantially complete library in the
past."
The Cold Spring Harbor scientists are not the only ones to explore
pooling strategies for multiplexed sequencing. Researchers at Columbia
University, for example, have developed a related approach, which also appeared in Genome Research last month.
As part of that paper, the researchers devised a simulation, using
short-read data from one of the pilot projects of the 1,000 Genomes
Project, to test how well their approach extracts rare variants.
According to Itsik Pe'er, a professor in the department of computer
science at Columbia and one of the authors, the original aim was to
develop a method for resequencing candidate genomic intervals across
hundreds or thousands of cases.
"I believe it is even more exciting for many experiments where
related sequences are to be obtained from many sources in parallel," he
told In Sequence by e-mail.
Because Pe'er's lab is computational, he and his colleagues have not
yet used their method in a sequencing project, but others have
expressed interest in the approach, he said.
Multiplexed Carrier Testing
Apart from sequencing clone libraries, the Cold Spring Harbor
researchers are also about to test their new method in a project that
involves genotyping large numbers of human samples.
In collaboration with Dor Yeshorim, a New York-based organization
that aims to prevent genetic diseases in participating Orthodox Jewish
communities, the researchers plan to analyze several thousand
previously characterized human samples in order to determine their
carrier state for certain genetic diseases.
According to Erlich, Dor Yeshorim is one of the largest
genetic centers in North America, processing more than 20,000 samples
per year. Ashkenazi and Sephardic Jews have an increased risk of being
carriers of a number of recessive genetic disorders, such as Tay-Sachs
disease or cystic fibrosis, and Dor Yeshorim offers to genotype members
of participating Orthodox Jewish communities, which have a large
percentage of such carriers, as young adults.
The organization does not report results back to participants but instead
provides each of them with a number that encodes their carrier state.
Only if two participants want to get married do they submit their
numbers to the organization to find out whether or not their children
are likely to develop a recessive genetic disease, or whether they are
"compatible or incompatible for the marriage," according to Erlich.
Since the program was started in the 1980s, it has helped to nearly
eliminate Tay-Sachs disease in participating communities, he said.
Under their collaboration, Cold Spring Harbor will analyze several
thousand previously characterized samples provided by Dor Yeshorim and
assess whether or not they are carriers for certain genetic diseases.
"The vision is to take 10 loci, 8,000 specimens, [and] sequence them in
one Illumina run," Erlich said. The panel of genes to be tested could
be increased in the future, he added.
One part of the project is to validate the new method and to
compare the sequencing-based results to those derived from standard
genotyping. Another part will be to identify new causative mutations in
cases where a disease allele is known to exist but the precise mutation
is unknown, according to Hannon.
Another possible application of the multiplexed sequencing method is
in HLA testing, according to the researchers, although this will be
more difficult because the state of both alleles in the genome needs to
be inferred, and because complex haplotypes need to be reconstructed
from short reads. "Theoretically, it should be feasible," Erlich said.