Computational Biology
Designing Tools to Unravel Transcription Factor Interactions Genome-Wide
After earning her PhD at Stanford University, computational
biologist Xiaole (Shirley) Liu set her sights on joining an institution
with a biology specialty, first-rate mentors, and the resources
needed to start her career. In 2002 she joined the Department of Biostatistics and Computational Biology at Dana-Farber.
One of Liu's earliest mentors at Dana-Farber was
physician-scientist Myles Brown, MD, now a colleague and frequent collaborator. The
Brown lab, which studies the function of hormone receptors in human
cancers, had recently conducted a genome-wide analysis of the
interactions between estrogen receptor (ER) and the DNA sequences
it recognizes. ER, a transcription factor that is overactive in
about 70 percent of breast tumors, binds to specific DNA sequences
called cis-regulatory elements and acts as a master on-off switch
for target genes. The sophisticated technology Brown used in the
study contained probes for all the non-repetitive human genome
sequences at 35-base-pair resolution, resulting in an avalanche of
data.
Brown turned to Liu to tunnel through the data to locate all of
ER's binding sites and the genes they up- or downregulate. Liu and
colleagues designed data analysis and modeling algorithms
specifically for the project. Using these tools, they discovered
several thousand authentic ER binding sites in previously
unexplored regions of the genome and mapped these sites to the
genes they control. Surprisingly, the vast majority of these
binding sites occurred not in promoters, but in enhancers, tens to
hundreds of kilobases away from their targets.
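The promoter-versus-enhancer distinction can be illustrated with a small sketch. This is not the algorithm Liu's group used; it only labels each binding site by its distance to the nearest transcription start site (TSS). The 1-kilobase promoter window, the function names, and all coordinates are assumptions made up for illustration.

```python
from bisect import bisect_left

# Hypothetical cutoff: a site within 1 kb of a TSS counts as promoter-proximal.
PROMOTER_WINDOW_BP = 1_000


def distance_to_nearest_tss(site_pos, sorted_tss):
    """Return the absolute distance (bp) from a site to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates
    on the same chromosome as the site.
    """
    i = bisect_left(sorted_tss, site_pos)
    # Only the TSS just before and just after the site can be the nearest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(site_pos - t) for t in candidates)


def classify_site(site_pos, sorted_tss, window=PROMOTER_WINDOW_BP):
    """Label a site 'promoter' if it lies within the window of a TSS, else 'distal'."""
    if distance_to_nearest_tss(site_pos, sorted_tss) <= window:
        return "promoter"
    return "distal"


# Made-up coordinates on a single chromosome, for illustration only.
tss = [10_000, 250_000, 900_000]
sites = [10_400, 120_000, 251_500, 600_000]
labels = [classify_site(s, tss) for s in sites]
```

In this toy input only the first site falls inside the promoter window; the others sit tens to hundreds of kilobases from any TSS, the pattern the study reported for most ER binding sites.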
Furthermore, through integrative modeling of myriad data sets
(e.g., binding, gene expression, and genomic sequences), Liu's
group showed for the first time that even ER binding sites distant
from genes are still functional. Later studies from other groups
validated these findings and demonstrated that transcription factor
binding to thousands of enhancer regions in the genome is the norm,
not the exception.
Liu and colleagues also analyzed the enriched sequence patterns
around ER binding sites and identified ER's collaborating partners,
other transcription factors that cooperate with ER and correlate
with the ER level in breast tumor samples. Remarkably, the
collaborating partners for upregulated genes were distinct from
those for late-response downregulated genes.
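The idea of looking for enriched sequence patterns around binding sites can be sketched as a comparison of how often a motif appears in bound regions versus background regions. This is a simplified illustration, not the method Liu's group used: real analyses typically rely on position weight matrices and statistical tests. The motif, the sequences, the pseudocount, and the function names below are all made up for the example.

```python
import re


def fraction_with_motif(sequences, pattern):
    """Return the fraction of sequences containing at least one match.

    Only the given strand is searched; the reverse complement is ignored
    to keep the sketch short.
    """
    regex = re.compile(pattern)
    hits = sum(1 for seq in sequences if regex.search(seq))
    return hits / len(sequences)


def fold_enrichment(bound, background, pattern, pseudocount=0.01):
    """Ratio of motif frequency in bound versus background sequences.

    The pseudocount (an arbitrary choice here) avoids division by zero
    when the motif never occurs in the background set.
    """
    bound_frac = fraction_with_motif(bound, pattern)
    background_frac = fraction_with_motif(background, pattern)
    return (bound_frac + pseudocount) / (background_frac + pseudocount)


# Toy sequences and a toy motif, for illustration only.
bound_regions = ["AATGACCTT", "GGTGACCAA", "CCCCGGGG", "TTTGACCGA"]
background_regions = ["AAAATTTT", "CCTGACCGG", "GGGGCCCC", "ATATATAT"]
toy_motif = "TGACC"

enrichment = fold_enrichment(bound_regions, background_regions, toy_motif)
```

A motif that is markedly more frequent near binding sites than in the background would point to a candidate collaborating factor, which would then need to be confirmed experimentally.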
"Biologists can generate massive amounts of data in a few
weeks," says Liu, "but analyses can take months or even longer."
The ER data set, for example, took three postdoctoral fellows
almost two years to unravel. Liu and colleagues are now building
tools to automate the process and to create a knowledge base for
storing genome-wide interaction data. The tools, collectively called
the cistrome (for the cis-elements bound by transcription factors
across the genome), will soon be available to Dana-Farber scientists
through a Web server and, within a year, to investigators
worldwide.