The Bioinformatics group consists of data analysts, software engineers, and computational biologists. The group has developed analytical pipelines to manage, store, annotate, and report on data produced by the Illumina sequencing platforms. We employ a combination of vendor, third-party, and in-house tools and databases to provide data-quality metrics, integrated candidate reports, and relevant biological and clinical context for experimental platform data.
The Bioinformatics team can help with:
- Designing custom bait sets
- Experimental design
- DNA sequence analysis including:
  - Variant detection (SNV, Indel, CNV, and structural variants)
  - Variant annotation
- RNA sequence analysis including:
  - Allele-specific expression
  - Fusion analysis
  - Differential expression
- Sample QC evaluation and troubleshooting
- Customized analyses based on the specific needs of your project
- Specialized analysis methods for:
  - PDX models
  - Cell-free DNA
  - Detection of viral DNA in tumor samples
We also develop new tools and strategies in a research setting that are then translated to the clinic. Our latest developments include BreaKmer for detecting structural rearrangements and RobustCNV for detecting changes in gene copy number. In addition, the team is developing methods to analyze samples derived from PDX models, cell-free DNA, and single-cell sequencing.
In concert with developing new and updated offerings, the Bioinformatics group has begun porting much of its analytics pipeline infrastructure to the cloud (currently Google Cloud Platform, GCP) to increase data-processing throughput and to accommodate third parties and collaborators that work primarily in the cloud.
BreaKmer (Abo et al., 2015) is designed to detect larger genomic structural variations from single-sample, aligned, short-read, target-captured high-throughput sequence data. It detects variants that produce split-read signatures in the aligned reads, such as inter- and intra-chromosomal rearrangements and insertion/deletion events large enough to result in split reads. Briefly, the method extracts "misaligned" sequences from a targeted region, such as split reads and unmapped mates, assembles a contig from these reads, and re-aligns the contig to make a variant call. It classifies detected variants as "insertions/deletions," "tandem duplications," "inversions," and "translocations."
Abo, RP, Ducar, M, Garcia, EP, Thorner, AR, Rojas-Rudilla, V, Lin, L, Sholl, LM, Hahn, WC, Meyerson, M, Lindeman, NI, Van Hummelen, P, MacConaill, LE (2015). BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers. Nucleic Acids Research, 43(3): e19.
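The first step described for BreaKmer, pulling "misaligned" reads out of a targeted region, can be illustrated with a small sketch. This is not BreaKmer's own code: the function names, the `min_clip` cutoff, and the choice to treat soft-clipped and unmapped reads as candidates are assumptions made for illustration, and the contig assembly and re-alignment steps are not shown. It reads plain SAM text lines and uses only the standard SAM FLAG and CIGAR fields.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_UNMAPPED = 0x4  # SAM flag bit: the read itself is unmapped


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may flank soft clips, so drop them at both ends.
    while ops and ops[0][1] == "H":
        ops.pop(0)
    while ops and ops[-1][1] == "H":
        ops.pop()
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def is_candidate(flag, cigar, min_clip=10):
    """True for unmapped reads and reads with a long soft clip.

    min_clip is a hypothetical cutoff, not a BreaKmer parameter.
    """
    if flag & FLAG_UNMAPPED:
        return True
    return max(soft_clip_lengths(cigar)) >= min_clip


def extract_candidates(sam_lines, min_clip=10):
    """Yield (read_name, sequence) for candidate reads in SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, cigar, seq = fields[0], int(fields[1]), fields[5], fields[9]
        if is_candidate(flag, cigar, min_clip):
            yield name, seq


# Toy records: a clean alignment, a soft-clipped read, and an unmapped mate.
seq, qual = "A" * 50, "F" * 50
sam = [
    "@HD\tVN:1.6",
    "\t".join(["r1", "99", "chr1", "100", "60", "50M", "=", "300", "250", seq, qual]),
    "\t".join(["r2", "99", "chr1", "120", "60", "30M20S", "=", "320", "250", seq, qual]),
    "\t".join(["r3", "69", "*", "0", "0", "*", "chr1", "120", "0", seq, qual]),
]
candidate_names = [name for name, _ in extract_candidates(sam)]
print(candidate_names)
```

In this toy input only `r2` (20 soft-clipped bases) and `r3` (unmapped) are kept; a real pipeline would read a BAM file restricted to the captured regions rather than a list of strings.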
Copy number variants are now being identified using RobustCNV, a new algorithm developed at the Center for Cancer Genomics (CCG).
RobustCNV relies on localized changes in the mapping depth of sequenced reads to identify changes in copy number at the loci sampled during targeted capture. This strategy includes a normalization step in which systematic bias in mapping depth is reduced or removed by fitting a model against a panel of normals and by removing residual GC bias using a loess fit. Normalized coverage data are then segmented using Circular Binary Segmentation (Olshen et al., 2004). Finally, copy number calls are assigned using an adaptive calling strategy that adjusts calling thresholds based on the post-normalization variability in each sample.
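The normalization and adaptive calling ideas can be illustrated with a simplified sketch. It is not RobustCNV itself: the per-target median of the normals stands in for the fitted model, the loess GC correction and Circular Binary Segmentation are omitted (segments are supplied as index ranges), and the noise estimate based on the median absolute deviation, along with the names `k` and `min_shift`, are assumptions chosen for illustration.

```python
import math
import statistics


def scale_by_median(depths):
    """Divide each depth by the sample median so library size cancels out."""
    med = statistics.median(depths)
    return [d / med for d in depths]


def log2_ratios(sample_depth, normal_panel):
    """Log2 ratio of a sample to the per-target median of the normals.

    Assumes every depth is positive (zero-coverage targets filtered out).
    """
    sample = scale_by_median(sample_depth)
    normals = [scale_by_median(n) for n in normal_panel]
    reference = [statistics.median(col) for col in zip(*normals)]
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def adaptive_thresholds(ratios, k=3.0, min_shift=0.2):
    """Return (loss_cut, gain_cut); the band widens for noisier samples."""
    center = statistics.median(ratios)
    mad = statistics.median([abs(r - center) for r in ratios])
    sigma = 1.4826 * mad  # MAD scaled to approximate a standard deviation
    shift = max(k * sigma, min_shift)  # hypothetical floor on the cutoff
    return center - shift, center + shift


def call_segments(ratios, segments, k=3.0, min_shift=0.2):
    """Label each (start, end) index range as gain, loss, or neutral."""
    loss_cut, gain_cut = adaptive_thresholds(ratios, k, min_shift)
    calls = []
    for start, end in segments:
        mean = statistics.mean(ratios[start:end])
        if mean > gain_cut:
            calls.append("gain")
        elif mean < loss_cut:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy data: 12 targets sharing a capture bias pattern, three normals at
# different sequencing depths, and a tumor with the last three targets gained.
pattern = [100, 200, 150, 120] * 3
panel = [[d * f for d in pattern] for f in (1.0, 1.1, 0.9)]
noise = [1.00, 1.05, 0.95, 1.02, 0.98, 1.03, 0.97, 1.00, 1.04, 2.0, 2.1, 1.9]
tumor = [2 * d * n for d, n in zip(pattern, noise)]

ratios = log2_ratios(tumor, panel)
calls = call_segments(ratios, [(0, 9), (9, 12)])
print(calls)
```

In this toy input the first nine targets are called neutral and the last three are called as a gain. Because the cutoffs scale with each sample's residual noise, a noisier sample needs a larger shift in segment mean before a gain or loss is called.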
This strategy is most effective when the panel of normals contains samples whose pattern of systematic bias closely matches the bias in the tumor samples. When this is not the case, the normalized data can remain noisy and CNV calls may be difficult to make correctly. For this reason, normal samples of similar tissue quality, age, fixation, and processing to the tumor samples should be included in all studies where identifying copy number variants is an objective.