The NIH Catalyst, September–October 2005

NHGRI Hosts 'Geek-Fest' To Start the Ball Rolling
STEP 1: SEQUENCING THE HUMAN GENOME —
STEP 2: INTERPRETING IT — ENTER ENCODE

by Jim Swyers

Brainstorm: More than 100 genomic scientists from three continents met in Rockville for three days in July to plumb the depths of the human genome sequence

In 2000, a small group of computer scientists working in relative isolation at the University of California, Santa Cruz, developed a fairly simple yet powerful software tool to assemble the almost-complete human genome sequence.

Now, five years after the completion of an initial draft of the human genome sequence, it is no longer possible for any single group of individuals—or scientific discipline for that matter—to make sense of all the data that have been amassed about the human genome.

Rather, deciphering the information embedded in the 3-billion–base human genome sequence requires the combined intellect and talents of multidisciplinary teams of scientists working collaboratively worldwide.

An initial surprise about the human genome is that instead of the expected 100,000 or so genes, it appears to contain only about 20,000 to 25,000 genes. Moreover, most of the estimated 5 percent of the human genome sequence believed to be functionally important based on evolutionary conservation does not encode protein.

Establishing functions of these conserved noncoding sequences (as well as other functional regions that are not evolutionarily conserved) represents a high-priority goal of genomics programs worldwide.

In an effort to identify all the functional elements in the human genome, NHGRI recently launched the ENCODE (Encyclopedia of DNA Elements) project. ENCODE involves the large-scale generation of experimental and computational data and the rigorous integration and analyses of the results.

In the initial phase, an international group of investigators—the ENCODE Consortium—is focused on analyzing the same set of 44 genomic regions that together account for 1 percent (~30 million bases) of the human genome.

Informing the ENCODE Project

To establish how best to analyze the large amount of data already generated by ENCODE, NHGRI’s extramural and intramural research divisions, along with the NIH Intramural Sequencing Center (NISC), a major participant in ENCODE, jointly sponsored an intensive three-day workshop. More than 100 ENCODE Consortium scientists attended, representing leading public and private genomic research organizations in the U.S., Canada, Europe, Japan, and Singapore.

The objective of the workshop was to begin the detailed analysis required to evaluate and compare the effectiveness of the many different technologies currently being used to find functional elements in the human genome.

"All we ask," NHGRI Scientific Director and NISC Director Eric Green told the assembled group at the workshop onset, "is that you be flexible, spontaneous, and productive."

The proceedings—affectionately dubbed the "Rockville Geek-fest" by NHGRI Director Francis Collins in tribute to the fact that most participants were bioinformaticians and computer scientists—were designed to catalyze the rigorous analysis of ENCODE data and to help assess progress in interpreting the targeted 1 percent of the human genome with respect to functional sequences.

"Your job is to compare all of the different methods used to date in studying this 1 percent of the human genome, establish what you have and have not learned, and speculate about which approaches are ready to be used in analyzing the entire human genome," Collins said.

Ewan Birney, a senior scientist at the European Molecular Biology Laboratory–European Bioinformatics Institute and coordinator of ENCODE data analyses, said that the ultimate goal of the workshop was to solidify the functioning of five distinctive but collaborative groups that would eventually lead the ENCODE Consortium in writing "high-impact" research papers about the areas of study addressed by ENCODE.

Gerps and targs, anyone?: One of the five analysis groups works in real time during the smaller-group breakout sessions

The five analysis groups—formed before the workshop and composed of consortium members—were assigned to study:

Sequence alignments and conservation

Genes and transcripts

Transcriptional regulation

Chromatin structure and replication

Sequence variation.

The goal is for each group to submit a major paper for publication within the next year.

A Casual But High-Stakes Affair

Over the course of the three days, participants alternated between meeting in their separate groups and massing for presentations and question-and-answer sessions.

Clad in shorts and tee shirts, and continuously consuming high-carbohydrate snacks and soft drinks, participants spent much of their group sessions staring intensely at their laptop screens, occasionally breaking their concentration to ask other group members about a particular sequence alignment or how a particular piece of data was derived.

Words like "transfrags," "cagetags," "ditags," "gerps," "targs," "bincons," and "phastcons" were the major lexicon, and "USB thumb drives" were the major currency.

Although the typical 16-hour days at the workshop were grueling for many participants, most said it was well worth the sacrifice because they were getting the opportunity to work side-by-side with collaborators they had never before met in person. Many also said that the stakes were too high not to be involved firsthand in the proceedings.

Garry Cutting, professor of pediatrics at the Johns Hopkins University School of Medicine in Baltimore, for instance, was thinking about his cystic fibrosis (CF) patients when he emphasized the need to understand better the genetic regulation of the CF gene, which resides in one of the ENCODE target regions.

"Even though the cystic fibrosis gene was discovered 16 years ago," Cutting said to his workshop colleagues, "we still do not understand what sequence elements regulate its expression. If we did, we might be able to use those elements in designing treatments for patients."

He noted that although many mutations have been identified, "we also have patients whose mutations have escaped detection. Their mutations must exist somewhere other than where we are looking."

In the realm of his particular primary interest, then, he expects that ongoing gatherings will yield nothing less than "discovering the location of all the elements that regulate transcription of the CF gene."

The Wave of the Future

Workshop organizers were similarly optimistic at the conclusion of what was considered a quite productive experience, agreeing that collaborations and large-group interactions are the wave of the future in genomics research.

"The human genome is so complex that its full interpretation will require the hard work of large, diverse teams of energetic investigators. This just can’t be done by isolated groups any longer," observed Elise Feingold, NHGRI extramural program director and co-coordinator of ENCODE.

"We see this as only the beginning of an extraordinarily important set of collaborations—I am tingling with excitement about all of the positive outcomes of this meeting," Birney said. Collins agreed that there would be much to expand upon for years to come "when all the dust from this workshop settles."

For more information about the ENCODE project, see the website.
