• Computational Biology

    Designing Tools to Unravel Transcription Factor Interactions Genome-Wide

    As a PhD graduate from Stanford University, computational biologist Xiaole (Shirley) Liu set her sights on joining an institution with a biology specialty, first-rate mentors, and the resources needed to start her career. In 2002 she joined the Department of Biostatistics and Computational Biology at Dana-Farber.

    One of Liu's earliest mentors at Dana-Farber was physician-scientist Myles Brown, MD, now a colleague and frequent collaborator. The Brown lab, which studies the function of hormone receptors in human cancers, had recently conducted a genome-wide analysis of the interactions between estrogen receptor (ER) and the DNA sequences it recognizes. ER, a transcription factor that is overactive in about 70 percent of breast tumors, binds to specific DNA sequences called cis-regulatory elements, and acts as a master on-off switch for target genes. The sophisticated technology Brown used in the study contained probes for all the non-repetitive human genome sequences at 35-base-pair resolution, resulting in an avalanche of data.

    Brown turned to Liu to tunnel through the data to locate all of ER's binding sites and the genes they up- or downregulate. Liu and colleagues designed data analysis and modeling algorithms specifically for the project. Using these tools, they discovered several thousand authentic ER binding sites in previously unexplored regions of the genome and mapped these sites to the genes they control. Surprisingly, the vast majority of these binding sites occurred not in promoters, but in enhancers, tens to hundreds of kilobases away from their targets.

    Furthermore, through integrative modeling of myriad data sets (e.g., binding, gene expression, and genomic sequences), Liu's group showed for the first time that even ER binding sites distant from genes are still functional. Later studies from other groups validated these findings and demonstrated that transcription factor binding to thousands of enhancer regions in the genome is the norm, not the exception.

    Liu and colleagues also analyzed the enriched sequence patterns around ER binding sites and identified ER's collaborating partners: other transcription factors that cooperate with ER and correlate with the ER level in breast tumor samples. Remarkably, the collaborating partners for upregulated genes were distinct from those for late-response downregulated genes.

    "Biologists can generate massive amounts of data in a few weeks," says Liu, "but analyses can take months or even longer." The ER data set, for example, took three postdoctoral fellows almost two years to unravel. Liu and colleagues are now building tools to automate the process and to create a knowledge base for storing genome-wide interaction data. Called the cistrome (cis-elements bound by transcription factors across the genome), these tools will soon be available to Dana-Farber scientists through a Web server and, in a year, to investigators worldwide.
