Blogging the biotechnology revolution

Systems Biology is changing the way biology is done. Is it a fad or is it effective? This blog tracks current happenings and helps you stay on top of the field. You can find a list of relevant papers at systems biology paper watch. Have you heard a talk or read a paper in bioinformatics / systems biology you would like to tell other people about? Email: bioinfblog@gmail.com and get the word out!

Monday, December 03, 2007

RECOMB Satellite Conference on Systems Biology, La Jolla, CA
Thanks to Rohith and Han-yu for blogging the talks at this conference! This conference brings together the systems biology community, with selected papers published in Molecular Systems Biology.


The keynotes:

Trey Ideker - Comparative Genomics

Vista rears its ugly head as Trey's talk is delayed by technical difficulties! And we're off. The well-known explosion in sequence-specific information has been coupled with a less well-known exponential rise in interaction data. Sequence data has served as the catalyst for many interesting avenues of research; how can interaction data be used as well? We're given a survey of such applications:

(1) DNA Damage Response --- ChIP-chip assays are combined with knockout experiments to derive a basic molecular interaction network that governs the transcriptional response of a cell to DNA damage. There may be many false positives, but it provides a great first-generation, systems-level examination of an interesting phenotype.

(2) Cancer diagnosis --- Many previous attempts have been made at classifying different types of cancer (in this case, breast cancer metastasis) using gene expression levels. In two studies over the past 7 years, two groups have come up with different sets of genes that correctly classify patients into one of two types of breast cancer. Intriguingly, there is very little overlap between these two gene sets; can this be cleaned up? The approach: combine a molecular interaction network with patient expression data to find sub-networks of genes whose expression discriminates between the two types of cancer. These modules display increased reproducibility and improved classification accuracy, although both measures still remain somewhat low. Most interesting, however, is that we are recovering true causative genes, as opposed to downstream perturbed genes! One question: can we discern which modules are causative and which are downstream? The future will tell.
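For readers who want to see what "finding discriminative sub-networks" might look like in practice, here is a minimal Python sketch under our own assumptions: each gene is z-scored across patients, a sub-network's per-patient "activity" is the mean z-score of its genes, and a connected gene set is grown greedily from a seed gene for as long as a two-class t-statistic keeps improving. The t-statistic, the greedy stopping rule, and the toy data are hypothetical stand-ins chosen for illustration; we don't claim this is the scoring used in the work presented.

```python
import statistics

def zscore_rows(expr):
    """Z-score each gene's expression across patients (population SD)."""
    z = {}
    for gene, values in expr.items():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values) or 1.0  # guard against constant genes
        z[gene] = [(v - mu) / sd for v in values]
    return z

def activity(z, genes):
    """Per-patient sub-network activity: mean z-score of the member genes."""
    n_patients = len(next(iter(z.values())))
    return [statistics.mean(z[g][i] for g in genes) for i in range(n_patients)]

def t_score(values, labels):
    """Hypothetical score: absolute Welch t-statistic between classes 0 and 1."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    diff = abs(statistics.mean(a) - statistics.mean(b))
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se

def greedy_subnetwork(seed, network, z, labels, max_size=5):
    """Grow a connected gene set from `seed`, adding the neighbour that most
    improves the score; stop when no neighbour helps or max_size is reached."""
    members = {seed}
    best = t_score(activity(z, members), labels)
    while len(members) < max_size:
        frontier = {n for g in members for n in network.get(g, ())
                    if n not in members and n in z}
        if not frontier:
            break
        score, gene = max((t_score(activity(z, members | {n}), labels), n)
                          for n in sorted(frontier))
        if score <= best:
            break
        members.add(gene)
        best = score
    return members, best

# Toy data (hypothetical): 3 genes x 6 patients; class 0 vs class 1
# (e.g. non-metastatic vs metastatic).
expr = {
    "A": [1, 2, 1, 3, 4, 3],
    "B": [2, 1, 2, 3, 3, 4],
    "C": [5, 1, 3, 2, 4, 3],
}
network = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
labels = [0, 0, 0, 1, 1, 1]
z = zscore_rows(expr)
members, best = greedy_subnetwork("A", network, z, labels)
print(sorted(members))  # ['A', 'B']
print(f"score={best:.2f}")
```

The search stops at {A, B} because adding the noisy gene C dilutes the class difference. A real analysis would likely also need permutation tests to judge whether a sub-network scores better than chance.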



Manolis Kellis - Regulatory Network Inference

Can comparative genomics be used to annotate specific functions for a particular region of a genome? This is a pretty interesting look at how evolutionary signatures can be used to discern regulatory motifs. Individual motifs seem to be well regulated across the genome. Hmm... how is a "motif" defined here (by sequence alone?)? Does regulatory sequence == promoters? Anyway, there seem to be positional biases in the presence of the regulatory motifs. Some of these motifs overlap with microRNA sequences. They've developed a framework for identifying microRNA signatures, and thus discovered new miRNAs and new categories of microRNA families. A pretty surprising discovery: the anti-sense strand of an miRNA may provide more powerful regulation. His lab has combined these evolutionary signatures and the discovered motifs to assemble a regulatory network that covers over 80% of the transcription factors, and many of the edges are supported both by the literature and by co-expression experiments. It seems like a pretty unique way to generate a regulatory network. I just have some questions about the definition of these "signatures" and "motifs". Can this approach be extended to humans?
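To make "evolutionary signature" a little more concrete, here is a toy Python sketch of one simple conservation statistic: the fraction of a candidate motif's occurrences in a reference genome that appear unchanged at the aligned position in every other species. The gap-free alignment, the motif, and the comparison motif are all hypothetical. A real analysis would at least need to handle gaps and the reverse strand, and would compare each motif against many control motifs of similar composition; we don't claim this is the exact score used in the talk.

```python
def conservation_rate(motif, alignment):
    """Fraction of motif occurrences in the reference (row 0) that appear
    unchanged at the same aligned position in every other species.
    Assumes a gap-free alignment and scans the forward strand only."""
    ref, others = alignment[0], alignment[1:]
    k = len(motif)
    hits = [i for i in range(len(ref) - k + 1) if ref[i:i + k] == motif]
    if not hits:
        return 0.0
    conserved = sum(all(seq[i:i + k] == motif for seq in others) for i in hits)
    return conserved / len(hits)

# Toy gap-free alignment of three hypothetical species (reference first).
alignment = [
    "CACGTGAACACGTGTT",
    "CACGTGAACACGAGTT",
    "CACGTGTTCACGTGTT",
]
print(conservation_rate("CACGTG", alignment))  # 0.5: one of two copies conserved
print(conservation_rate("GAACAC", alignment))  # 0.0: its single copy is not conserved
```

A motif whose conservation rate is consistently higher than that of comparable control motifs is a candidate functional element.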


Timothy Hughes - Cracking the Second Genetic Code

An exciting title, to be sure. What is the second genetic code? Answer: protein-DNA interactions! The sequence specificity of a DNA-binding protein can be "cracked" by analyzing the sequences that your protein of interest binds to. They use a unique array on which 8-mer DNA sequences are each represented in many different larger contexts (32-mers). The plan is to express the DNA-binding portion of your favorite protein and run it over this custom array. They've generated a large data set comprising DNA-binding motifs for over 100 transcription factors. They then clustered the TFs based on their DNA-binding sequences and found that they're mostly all different (clusters have low membership). They refine this analysis by examining particular 6-mers within the sequences, and find similarities between TFs that wouldn't have been found at the larger scale.
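Here is a small Python sketch of why looking at 6-mers can reveal similarities that 8-mers hide: two hypothetical TFs whose top 8-mers share nothing can still share a 6-mer core. Each k-mer is merged with its reverse complement (a site can be read on either strand) and the sets are compared by Jaccard similarity. The sequences and the similarity measure are our assumptions for illustration, not the method or data from the talk.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)

def kmer_set(sequences, k):
    """All canonical k-mers contained in a TF's top-scoring sequences."""
    return {canonical(s[i:i + k]) for s in sequences for i in range(len(s) - k + 1)}

def jaccard(a, b):
    """Shared k-mers divided by all k-mers seen for either TF."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical top 8-mers for two TFs.
tf1 = ["TTGACGTC"]
tf2 = ["ATGACGTA"]
print(jaccard(kmer_set(tf1, 8), kmer_set(tf2, 8)))  # 0.0: no shared 8-mer
print(jaccard(kmer_set(tf1, 6), kmer_set(tf2, 6)))  # 0.2: both contain TGACGT
```

At the 8-mer level the two TFs look unrelated, while the 6-mer view picks up the common TGACGT core.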



Ulrike Gaul - Decoding Transcription Control in Drosophila Segmentation

To study the evolution of segmentation modules in the fly, Dr. Gaul and coworkers started by scanning genomic sequences for cis-regulatory elements that minimize the free energy of binding between DNA and known segmentation TFs, using the known binding preferences of the maternal and zygotic gap transcription factors. Surprisingly, the identified binding sites do not correlate with conserved blocks in the genomes. They further showed that changes in expression correlate with binding-site turnover but not with sequence turnover, suggesting that sequence conservation is not a reliable indicator of functional conservation. Combined with TF distributions along the anterior-posterior (AP) axis measured in the lab, a thermodynamic model was developed to predict the expression levels of target genes. A substantial fraction of the predicted expression levels were validated in the lab. (We got a little lost after this...)
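For intuition about what a "thermodynamic model" of expression means, here is a deliberately simplified Python sketch: each binding site is occupied with equilibrium probability w/(1 + w), where w = concentration * exp(-dG/kT), and expression is a logistic function of the summed, weighted occupancies (activators positive, repressors negative). Treating sites as independent (no cooperativity or competition), the weights, the energies, and the toy AP gradients are all our assumptions; the model presented in the talk is likely considerably more detailed.

```python
import math

def occupancy(conc, delta_g):
    """Equilibrium probability that one site is bound.
    conc is in units of a reference dissociation constant; delta_g is the
    binding free energy in kT units (more negative = tighter binding)."""
    w = conc * math.exp(-delta_g)
    return w / (1.0 + w)

def predicted_expression(sites, conc, weights, basal=-2.0):
    """Logistic readout of independently bound sites (a simplifying assumption).
    Positive weights act as activators, negative weights as repressors."""
    drive = basal + sum(weights[tf] * occupancy(conc[tf], dg) for tf, dg in sites)
    return 1.0 / (1.0 + math.exp(-drive))

# Hypothetical enhancer: two activator sites and one repressor site.
sites = [("activator", -1.0), ("activator", 0.5), ("repressor", -0.5)]
weights = {"activator": 3.0, "repressor": -4.0}

# Toy AP axis (0 = anterior, 1 = posterior): the activator decays from the
# anterior end, the repressor peaks in the middle of the embryo.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    conc = {"activator": 5.0 * math.exp(-3.0 * x),
            "repressor": 5.0 * math.exp(-((x - 0.5) ** 2) / 0.02)}
    print(f"x={x:.2f}  expression={predicted_expression(sites, conc, weights):.3f}")
```

Fitting such a model means adjusting the energies and weights until the predicted pattern matches measured expression; here they are simply fixed by hand.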



Naama Barkai - Evolution of Gene Expression

Here we get a talk on the "logic" of gene expression control: how can we consider the interplay between phenotype, gene expression, and environmental control? They begin by examining ribosome synthesis. Why? Because it's one of the dominant expenditures of biosynthetic energy (interesting stat: ~2000 ribosomes produced per minute). It makes sense to use this as a benchmark for examining growth rate, as ribosome biosynthesis is expected to correlate with growth rate. To see how gene expression correlates with growth rate, they grew yeast in chemostats with different dilution rates, producing yeast with varying growth rates. The mRNA of these various yeast "strains" can then be sampled and examined. Again, ribosomal protein expression is used as a proxy for cell growth. But there is one problem: how do we separate internal signals from external (environmental) signals? By "confusing" them. Rather than examining the chemostat at steady state, create a pulse-like stress and make sure to measure both growth rate and transcriptional profile. As one would expect, we see a delayed decline in growth following the perturbation, along with a simultaneous decline in ribosome biogenesis and ribosomal proteins. However, in other perturbations such as heat shock or NaCl addition, ribosome biogenesis and ribosomal proteins drop before the accompanying drop in growth rate.
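Why do different dilution rates give yeast with different growth rates? In a chemostat, biomass X changes as dX/dt = (mu - D) * X, so a steady state requires the specific growth rate mu to equal the dilution rate D. The Python sketch below Euler-integrates a standard Monod chemostat model to show mu settling at D for several dilution rates. Monod kinetics and every parameter value are textbook-style assumptions, not numbers from the talk.

```python
def simulate_chemostat(dilution, mu_max=0.5, k_s=0.1, s_in=10.0, yield_=0.5,
                       x0=0.1, s0=10.0, dt=0.01, hours=200.0):
    """Euler-integrate a Monod chemostat (units: g/L and 1/h, assumed values).
    Returns final (biomass, substrate, specific growth rate)."""
    x, s = x0, s0
    for _ in range(int(hours / dt)):
        mu = mu_max * s / (k_s + s)                    # Monod growth rate
        dx = (mu - dilution) * x                       # growth minus washout
        ds = dilution * (s_in - s) - mu * x / yield_   # feed/outflow minus uptake
        x, s = x + dt * dx, s + dt * ds
    return x, s, mu_max * s / (k_s + s)

for d in (0.1, 0.2, 0.3):
    x, s, mu = simulate_chemostat(d)
    print(f"D={d:.1f}/h  steady-state mu={mu:.3f}/h  biomass={x:.2f} g/L")
```

Whatever the starting conditions, the culture settles where the nutrient level is just low enough for cells to grow at the rate they are washed out, which is why the experimenter can dial in growth rate through the pump speed.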


Marc Vidal - Interactome Networks

He starts with a reminder that all of biology comes down to chemistry. Our graph abstraction of interactome networks reduces the elegant complexity of ribbon-and-wire models of protein interactions. So how do we go about detecting these pairwise interactions? A throwback to the original Fields and Song Y2H paper. Interactome maps have evolved continuously over time, encompassing an ever-increasing number of nodes and covering increasingly complex organisms. The value we gain from networks increases when we begin to "color", or annotate, both the nodes and the edges, allowing actual biological facts to be extracted from them. Shift to the topic of a human interaction network. First up: what is the size of this network? How many genes are contained in the human genome? MAPPIT: a new technology for examining interactions, based not on transcription factors but on a membrane-protein platform. Final messages: human binary networks contain on the order of 1 million interactions! High-throughput data can be of higher quality than curated low-throughput data. An interesting comparison: the time it took to go from the development of the Sanger method to large-scale genome sequencing (~15 years) is almost the same as the time it took to go from the development of the Y2H method to the production of large-scale interactome networks. Now, an overview of disease in relation to interactome networks:

(1) Inherited Ataxias --- Ataxia nodes have a much smaller mean path length than non-disease nodes.

(2) Diseasome (a new ome!) --- all nodes are colored by the type of disease that gene is known to be associated with. We then see cliques enriched for a particular color, or disease.

(3) Epstein-Barr virus and virus human protein interaction maps --- Examination of viral proteins and their interaction with host proteins. EBV proteins are found to interact preferentially with human hub proteins!
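The "mean path length" comparison in item (1) boils down to averaging shortest-path distances over pairs of nodes. Here is a minimal Python sketch using breadth-first search on a toy, unweighted, undirected network. The network and the gene names are hypothetical, and a real analysis would compare the disease set against many randomly sampled gene sets rather than against all nodes as done here.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def bfs_distances(graph, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def mean_path_length(graph, nodes):
    """Mean shortest-path length over all connected pairs drawn from `nodes`."""
    total = pairs = 0
    for a, b in combinations(sorted(nodes), 2):
        d = bfs_distances(graph, a).get(b)
        if d is not None:
            total += d
            pairs += 1
    return total / pairs if pairs else float("nan")

# Toy network (hypothetical): three "ataxia" proteins clustered around hub H.
graph = build_graph([("ATX1", "H"), ("H", "ATX2"), ("ATX2", "ATX3"),
                     ("H", "X1"), ("X1", "X2"), ("X2", "X3")])
print(mean_path_length(graph, {"ATX1", "ATX2", "ATX3"}))  # 2.0
print(mean_path_length(graph, set(graph)))                 # about 2.38
```

Running one BFS per source node and reusing it for all targets would be faster on a real interactome; the pairwise version is kept here for readability.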

What is the fundamental unit of human networks? The gene (protein), i.e. the node, or is it the edge?

Examining edge perturbations versus node perturbations might help us to acquire new insights in the future.



Arul Chinnaiyan - Bioinformatics as an Engine for Oncology Discovery

Cancer has been suggested to be a disease of pathways. The question Arul wanted to address is how to use bioinformatics as a tool for discovering molecular mechanisms, as opposed to just nominating genes. They have tried to understand the aggressive behavior of cancers by profiling different stages of prostate cancer. Traditionally, high-throughput biology has been done in isolation. They pursued an integrated view, linking different studies in order to find overlap between molecular concepts, which are sets of co-expressed genes under different conditions. By connecting molecular concepts that share a significant proportion of genes, they drew a molecular concept map and used it to model prostate cancer progression. They then profiled prostate cancer tissues using ChIP-chip to identify repressed genes, and found that this set of Polycomb-repressed genes is correlated with patient survival and shared by other aggressive cancer types.

Shifting gears to an important discovery: oncogenic gene fusions. Before their studies, it was generally believed that gene fusions are not common events in epithelial tumors such as prostate cancer. The most famous gene fusion example is BCR-ABL in CML, caused by a translocation between two chromosomes. They came up with a simple outlier detection method for gene expression profiles, called COPA (Cancer Outlier Profile Analysis). COPA was run on a compendium of cancer expression profiles to identify genes with heterogeneous expression across patients with the same type of cancer. The key discovery came when they went back and sequenced the identified genes in prostate cancer patients: they showed that the heterogeneous expression of the identified ETS genes was caused by fusion with TMPRSS2. As the fusions are specific to prostate cancer, they are currently setting up screening protocols for diagnosis and prognosis, and are searching for compounds that can inhibit the gene fusion for rational drug design.
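COPA is simple enough to sketch. As we understand the published procedure, each gene's expression is median-centered and divided by its median absolute deviation (MAD), and genes are then ranked by a high percentile (for example the 75th, 90th, or 95th) of the transformed values, so that genes strongly over-expressed in only a subset of samples rise to the top. The Python below is our own minimal sketch with hypothetical gene names and data; skipping genes whose MAD is zero is our choice, not part of any official implementation.

```python
import statistics

def copa_rank(expr, percentile=90):
    """Rank genes by the chosen percentile of their median-centred,
    MAD-scaled expression; outlier-prone genes come first."""
    scores = {}
    for gene, values in expr.items():
        med = statistics.median(values)
        mad = statistics.median([abs(v - med) for v in values])
        if mad == 0:
            continue  # scaling is undefined for genes with no spread
        scaled = [(v - med) / mad for v in values]
        cuts = statistics.quantiles(scaled, n=100, method="inclusive")
        scores[gene] = cuts[percentile - 1]  # cut point for this percentile
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy profiles (hypothetical), 10 samples each.
expr = {
    "gene_outlier": [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 9.0, 10.0, 11.0],
    "gene_spread": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "gene_flat": [2.0] * 10,
}
for gene, score in copa_rank(expr):
    print(gene, round(score, 2))
```

A plain t-test would not flag gene_outlier as strongly, because most samples look normal; scaling by the MAD keeps the few extreme samples from inflating the spread estimate, which is what lets outliers stand out.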