Computational Biology
Applying Quantitative Sciences and Information Technologies to
Answer Biological Questions
Featured here are two examples where computational biology is
helping to make sense of an increasingly information-rich
environment.
Insights into Stem Cell Differentiation
The laboratory of stem cell biologist Stuart Orkin, MD, chair of the Department of Pediatric Oncology, identified a novel enzyme that safeguards the
identity and pluripotency of embryonic stem (ES) cells. Together
with computational biologist
Guo-Cheng ("GC") Yuan, PhD, of the Department of Biostatistics and Computational Biology, Orkin is defining the novel
mechanisms that regulate cellular identity and cell-fate switching
during ES cell development. This work relies on an understanding of
the molecules that play essential roles in development and in the
proliferation of tumor cells.
Histones, the spool-like proteins around which DNA winds to form
chromatin, are critical to embryonic development because they
undergo methylation and other epigenetic modifications that affect
gene expression. Enzymes called methyltransferases transfer a
methyl group onto the tails of histones at fixed locations on
chromatin. This marks the gene at that site for repression or
activation, depending on where methylation occurs in the histone.
One of the protein complexes that deposit these methylation
marks is Polycomb repressive complex 2 (PRC2), which acts as a
master epigenetic regulator of ES cells. To maintain pluripotency
of the cell, PRC2 binds to developmental genes and mediates
methylation via the EZH2 methyltransferase within the complex,
thereby repressing differentiation genes. When the cell is destined
to develop into different lineages, however, PRC2 dissociates
from its target genes, allowing them to be fully expressed and
differentiation to occur.
Until recently, scientists believed that EZH2, which is
up-regulated in some cancers, was the only enzyme directly
responsible for methylation on histone H3 lysine 27 (H3K27). Then
Xiaohua Shen, PhD, a research fellow in the Orkin laboratory,
identified a new methyltransferase, EZH1, which is homologous to
EZH2 and able to transfer methylation marks to H3K27. After
interrogating a 45-million-probe microarray to locate the marks and
target genes of both EZH1 and EZH2, Orkin turned to Yuan to analyze
the enormous data set. "Traditional biochemical and molecular
analyses are rudimentary compared to what a true computational
biologist can do," says Orkin.
Yuan and his postdoctoral fellow, Yingchun Liu, PhD, sifted
through the dizzying array of numbers in order to find the loci
where the EZH proteins bind, to map these loci to their chromosomal
locations, and to search complex databases to uncover the genes at
those sites. They also assessed whether the binding sites of EZH1
correspond to the same genomic regions as the H3K27 marks; indeed,
both EZH1 and EZH2 co-localize with H3K27 methylation marks on chromatin. Genome-wide
study and computational analysis thus confirmed biochemical and
genetic evidence that EZH1 compensates for, and complements, EZH2
by targeting the same genes. Interestingly, in cells lacking EZH2,
only one-third of target genes retained H3K27 marks due to the
presence of EZH1. These genes were more often associated with
lineage differentiation, while genes losing H3K27 marks were
associated with non-developmental functions.
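The co-localization check described above can be pictured as an interval-overlap problem. The following is a minimal sketch, not the method Yuan and Liu used, which the article does not describe. It assumes binding sites and marked regions are already available as (chromosome, start, end) tuples with half-open coordinates; the function name and the toy coordinates are hypothetical.

```python
from bisect import bisect_right


def fraction_overlapping(sites, marks):
    """Fraction of binding sites that overlap at least one marked region.

    Both arguments are lists of (chrom, start, end) tuples using
    half-open coordinates [start, end).
    """
    # Group marked regions by chromosome and merge overlapping or touching
    # ones, so each chromosome holds sorted, disjoint intervals.
    merged_by_chrom = {}
    for chrom, start, end in sorted(marks):
        merged = merged_by_chrom.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts_by_chrom = {
        chrom: [region[0] for region in regions]
        for chrom, regions in merged_by_chrom.items()
    }

    hits = 0
    for chrom, start, end in sites:
        merged = merged_by_chrom.get(chrom, [])
        starts = starts_by_chrom.get(chrom, [])
        # Index of the last merged region that begins before the site ends.
        i = bisect_right(starts, end - 1) - 1
        if i >= 0 and merged[i][1] > start:
            hits += 1
    return hits / len(sites) if sites else 0.0


# Toy coordinates, invented for illustration only.
ezh1_sites = [("chr1", 100, 200), ("chr1", 500, 600),
              ("chr2", 50, 80), ("chr3", 10, 20)]
h3k27_marks = [("chr1", 150, 300), ("chr1", 250, 400), ("chr2", 80, 120)]

overlap_fraction = fraction_overlapping(ezh1_sites, h3k27_marks)
print(overlap_fraction)
```

Merging the marked regions first means each site needs only one binary search. With a probe set in the tens of millions, that matters more than it does in this toy case.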
"Other scientists believed that EZH1 has no role whatsoever,"
says Orkin, who is renowned for defining the transcription factors
governing differentiation. "But with GC's help, we made a solid
case to the contrary. This collaboration gave us real confidence in
our data and insight into the function of these genes."
More exciting work will happen in the next step of this
research, says Yuan. "The real question is: how does EZH1 know
which set of genes to target when EZH2 is depleted?" It's a mystery
he hopes to help solve using a new computational method that he
developed for other purposes but has since adapted to study
Polycomb binding.
"The identification of EZH1 as a novel methyltransferase acting
on H3K27 demonstrates the diversity in mammalian Polycomb
repressive complexes," explains Orkin. "This discovery should set
the stage for new developments in the role of chromatin in stem
cell pluripotency and cancer biology."
Profiling Ovarian Cancer
One day, after consoling yet another patient whose ovarian
cancer had stubbornly resisted platinum agents, clinical
investigator Ursula Matulonis, MD, of Medical Oncology, sought the expertise
of computational biologist John Quackenbush, PhD, of Biostatistics and Computational
Biology. She wanted to apply modern molecular techniques, such as
DNA microarrays, to understand platinum resistance in ovarian
cancer, the most deadly gynecologic malignancy. The
cross-disciplinary partnership that the two investigators forged
that day has reached beyond the laboratories of Dana-Farber to
produce the largest set of ovarian cancer genomic profiles to
date.
Figure: Ovarian cancer samples were evaluated by genome-wide gene
expression profiling, and the data were subjected to principal
component analysis. The samples grouped into three distinct classes
(colored red, blue, and green in the original figure).
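As a rough illustration of how principal component analysis can separate samples by expression pattern, the sketch below finds the leading component of a tiny, invented expression matrix by power iteration. It is not the investigators' pipeline, which the article does not describe. The function name and all values are hypothetical, and real data would normally go through a numerical library's SVD routine.

```python
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (component, scores) for the leading principal component.

    rows: list of samples, each a list of expression values (one per gene).
    Power iteration is applied to X^T X, where X is the mean-centered data.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]

    def project(vec):
        # Score of each sample along the direction `vec`.
        return [sum(c[j] * vec[j] for j in range(p)) for c in centered]

    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        scores = project(vec)
        new = [sum(centered[i][j] * scores[i] for i in range(n))
               for j in range(p)]
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0.0:  # all samples identical: no direction of variance
            break
        vec = [x / norm for x in new]
    return vec, project(vec)


# Six invented samples over four genes, built as three loose groups.
samples = [
    [9.0, 8.0, 1.0, 1.0], [8.0, 9.0, 1.0, 2.0],  # group A
    [5.0, 5.0, 5.0, 5.0], [5.0, 4.0, 5.0, 6.0],  # group B
    [1.0, 1.0, 9.0, 8.0], [2.0, 1.0, 8.0, 9.0],  # group C
]

component, scores = first_principal_component(samples)
for label, score in zip("AABBCC", scores):
    print(label, round(score, 2))
```

In this toy case the three groups fall at opposite ends and the middle of the first component. The sign of a component is arbitrary, so only the relative positions of the samples are meaningful.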
Although their pilot project was limited by the small number of
fresh-frozen tumor samples available, serendipitous events enabled
the two investigators to dramatically scale up their joint effort.
Quackenbush happened to meet a former colleague, now at Illumina,
who had developed a new gene expression assay for paraffin-embedded
tissues. Meanwhile, Matulonis was working with pathologists Ronny Drapkin, MD, PhD, of Medical Oncology, and Michelle
Hirsch, MD, PhD, of Brigham and Women's Hospital, who had recently
used paraffin-embedded tissues from a tumor bank to build a tissue
microarray – a paraffin block of 100 or more microtumor cores on a
single slide. Quackenbush and Matulonis quickly recognized the
power of combining these two new resources. "We thought that if the
results of our gene expression assays could be confirmed using a
simple antibody test on the tissue microarray," says Quackenbush,
"we might have a test with the potential for immediate clinical
impact."
With the pathologists joining the partnership and Illumina on
board, the study began in earnest. Quackenbush's team extracted DNA
and RNA from the same tumor samples used to create the tissue
microarray and sent the purified nucleic acids to Illumina. The
company generated data on mRNA, microRNA, copy number variation,
and DNA methylation, which it returned to Dana-Farber for
analysis.
Discoveries deriving from these data – such as markers
indicating that a patient is likely to become platinum-resistant or
platinum-sensitive – may lead to more effective diagnostics and
treatments. One early discovery showed that tumor samples separate
into distinct molecular subgroups, a finding that may guide future
treatment of patients with ovarian cancer. Investigators are now
examining whether these subgroups are associated with outcomes or
other clinical measures. The team is also analyzing mRNA and
microRNA (which can bind to mRNA, targeting it for degradation) to
look for anti-correlations: instances where an increase in a microRNA
coincides with a decrease in its target mRNA. "Integrating these two types of data may provide
more complete information than analyzing either type alone,"
Quackenbush explains. "Most importantly, Dana-Farber now has the
ability to address basic questions about ovarian cancer."
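The search for anti-correlations mentioned above can be sketched as a pairwise correlation scan. The example below is an assumption-laden illustration, not the team's analysis: it uses Pearson correlation across samples with an arbitrary cutoff of -0.8, and the microRNA names, gene names, and expression values are all invented.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)


def anticorrelated_pairs(mirna, mrna, threshold=-0.8):
    """List (microRNA, gene, r) triples with r at or below the threshold.

    mirna, mrna: dicts mapping a name to expression levels measured in
    the same samples, in the same order. The threshold is an assumption.
    """
    pairs = []
    for mir_name, mir_levels in mirna.items():
        for gene, gene_levels in mrna.items():
            r = pearson(mir_levels, gene_levels)
            if r <= threshold:
                pairs.append((mir_name, gene, r))
    return sorted(pairs, key=lambda pair: pair[2])


# Hypothetical names and values across five samples.
mirna_levels = {"miR-A": [1, 2, 3, 4, 5], "miR-B": [2, 2, 3, 2, 3]}
mrna_levels = {"GENE1": [10, 8, 6, 4, 2], "GENE2": [1, 3, 2, 5, 4]}

candidates = anticorrelated_pairs(mirna_levels, mrna_levels)
for mir_name, gene, r in candidates:
    print(mir_name, gene, round(r, 3))
```

A strong negative correlation only nominates a pair for follow-up. On its own it does not show that the microRNA targets that mRNA.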