Blogging the biotechnology revolution: December 2005

"An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets" by Daniel Schwartz and Steven Gygi, Gygi lab at Harvard Medical School, published in Nature Biotechnology on November 4, 2005.

[Historical Digression] Edwin Krebs and Edmond Fischer, Nobel Laureates for discovering the regulatory role of protein phosphorylation in the 1950s. Courtesy of the UW Department of Biochemistry.

This paper presents the first method for finding motifs for phosphorylation sites in proteins. Recent refinements in strategies to identify phosphorylation sites, and the resulting increase in the number of known phosphorylation sites, have made this new "substrate-driven" approach possible. Using a phosphorylated peptide data set and a background peptide data set, the algorithm calculates the probability of observing residue X at position Y in the phosphorylation motif, given a background probability of Z calculated from the second data set. It then constructs the motif in a recursive building procedure, in which statistically significant residue/position pairs are identified (recursive motif building, step 1). Next it removes from the phosphorylation and background data sets all the sequences containing the motif (set reduction, step 2). Steps 1 and 2 are repeated until step 1 reveals no further significant residue/position pairs, leaving a final set of significant motifs.
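To make the two-step loop concrete, here is a rough Python sketch of how I read it. This is not the authors' code: the binomial tail test, the pseudocount on the background frequency, the 0.01 threshold, the toy peptides, and every function name are my own assumptions for illustration.

```python
from collections import Counter
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_pair(fg, bg, skip, alpha):
    """Most significant (p_value, position, residue) outside `skip`, or None."""
    best = None
    for pos in range(len(fg[0])):
        if pos in skip:
            continue
        fg_counts = Counter(s[pos] for s in fg)
        bg_counts = Counter(s[pos] for s in bg)
        for res, k in fg_counts.items():
            # Background probability Z, with a pseudocount (my assumption).
            z = (bg_counts[res] + 1) / (len(bg) + 2)
            p = binom_tail(k, len(fg), z)
            if p < alpha and (best is None or p < best[0]):
                best = (p, pos, res)
    return best


def build_motif(fg, bg, center, alpha):
    """Step 1: greedily fix significant residue/position pairs."""
    fixed = {}
    while fg:
        hit = best_pair(fg, bg, set(fixed) | {center}, alpha)
        if hit is None:
            break
        _, pos, res = hit
        fixed[pos] = res
        fg = [s for s in fg if s[pos] == res]
        bg = [s for s in bg if s[pos] == res]
    return fixed


def matches(seq, motif):
    return all(seq[pos] == res for pos, res in motif.items())


def find_motifs(fg, bg, center, alpha=0.01):
    """Alternate step 1 (motif building) and step 2 (set reduction)."""
    motifs = []
    while fg:
        motif = build_motif(fg, bg, center, alpha)
        if not motif:
            break
        motifs.append(motif)
        fg = [s for s in fg if not matches(s, motif)]
        bg = [s for s in bg if not matches(s, motif)]
    return motifs


# Toy 5-mers with the phosphorylated serine at index 2 (made-up data).
FG = ["AKSPE", "GDSPQ", "TASPL", "LESPK", "KGSPA",
      "DTSPG", "ELSPT", "QASPD", "GKSLE", "TDSAQ"]
BG = ["AKSLE", "GDSAQ", "TASGA", "AESKL", "LGSPD"] * 4

print(find_motifs(FG, BG, center=2))  # [{3: 'P'}]
```

On this toy input the only pair that passes the threshold is proline at the +1 position, so the loop stops after one motif.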

The biological problem addressed here is an important one, and its solution will hopefully lead to the direct discovery of potential phosphorylation sites in proteins of interest. However, the method presented is quite naive. The authors seem to recognize this and describe it as a "starting point for future research" in the Outlook section. I think refinements such as taking structure into account, or grouping amino acids by property (acidic/basic, hydrophobic/hydrophilic), would be appropriate to improve performance.

"A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis" by Janes et al. from the Lauffenburger and Yaffe groups at MIT published in Science December 9, 2005.

This work explores the pathways to cytokine-induced apoptosis by perturbing the inputs (in this case TNF, EGF, and insulin) to the well-characterized apoptotic pathway. By perturbing the inputs and measuring various outputs, they build a predictor of the apoptotic outputs based on partial least squares. They find that this works pretty well, and that breaking down the data matrices into principal components produces a "basis set" of proteins that seems to be most important in predicting apoptotic outputs. Using this approach, they can implicate molecular circuits previously unknown to be involved in this specific signaling network.

Overall it was a good paper, a bit involved, but it really showed a good marriage of data mining with a (more or less) specific biological question. People would say that this is a systems biology paper, and in its data-driven approach it somewhat is, but I would argue that the cellular modeling they did was specific to apoptosis and to a specific biological question.
If you're interested in cellular modeling, I would highly recommend this paper.

Sydney Brenner, The Salk Institute
"What is a Gene"

Sydney Brenner has been called "the Thomas Jefferson of molecular biology," with good reason.

Much of his talk described the hurdles that the pioneers of genome sequencing had to overcome. It is hard to grasp the progress this field has made in merely 50 years since the double helix. In the beginning, when they met to discuss the sequencing of the human genome, they only had the phage lambda, which was ~40,000 bp. They were planning 3 billion! It was an enormous task. But they accomplished it by adhering to standards such as keeping no data in the database that's not CAP: complete, accurate, permanent.

He discussed genomics as a lesson in science. Now we see -omics as a copy of the approach to genome sequencing: getting massive, rich, digital data. Get things you can count. It fits the CAP rule!

In illustrating genomic complexity, he explained that there are on average 5 different gene instantiations for each locus. Additional complexity comes from different cell types, which are abundant in the brain.

Some controversy: he thinks the whole concept of systems biology is wrong. Instead, he supports the mantra “Least is Best,” i.e. quality, directed experiments are better. To paraphrase, "You don’t learn to play the drums by listening, you go in there and learn the notes."

He appeared wary of the biotechnology business, calling the two greatest evils 1) venture capitalists and 2) the CEOs of those venture capital companies. He went on to say, “The only thing worse than scientists doing business is businessmen doing science.”

J. Craig Venter, J. Craig Venter Institute
"From Reading to Writing the Genetic Code"

He talked about the Sargasso Sea sequencing at the beginning. It produced so many sequences that it makes GenBank look small in comparison!

They are trying to design the minimal cell based on Mycoplasma genitalium (smallest genome).
They have gotten as far as reconstructing viruses, using plasmid-based cassette synthesis. The paper was in PNAS.

But to synthesize anything bigger they need new technology. Inspiration comes from Deinococcus radiodurans ("Conan the Bacterium"), which can withstand massive doses of radiation and still live. Basically, the genome is blown to bits by the radiation, but an amazing homologous recombination process reassembles it. They are trying to use this type of approach to synthesize genomes from short fragments and assemble them using the same mechanisms. It's a really interesting method that I hope works. Has anyone seen any work done so far on this?

J. Craig Venter looking crafty

Tomorrow the bioinfblog team will be at the Venter meeting "Celebrating a decade of genome sequencing" here at UCSD. There are many notable speakers, including Lucy Shapiro, Tony Hunter, Sydney Brenner, and of course Craig Venter. Hopefully we will be able to get some pictures to share!
Check out the details at: http://pkr.sdsc.edu/venter/index.php

Manolis Kellis, MIT
"Regulatory network discovery and evolution"
The first part of the talk covered comparative genomics techniques for short regulatory motifs. Since motifs are small (~8 bp) and degenerate, this is hard without comparative techniques.
They do genome-wide motif discovery by looking for over-represented n-mers, using an apparently simple counting technique that seemed to work well.
Promoter motifs are conserved symmetrically on both the forward and reverse strands. This is not true for 3'-UTR motifs (which is consistent with their role in post-transcriptional regulation).
The 3'-UTR motifs are usually ~8 bases long and end in A, which hints at them being miRNA targets. Scanning the miRNA database confirms this.
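The talk did not spell out the counting statistic, so the following is only my guess at what a "simple counting technique" for over-represented n-mers could look like in Python. The ratio score, the pseudocount, the threshold, and the toy sequences are all assumptions.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enriched_kmers(foreground, background, k, min_ratio=2.0):
    """k-mers whose frequency in `foreground` is at least `min_ratio`
    times their (pseudocounted) frequency in `background`."""
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    result = {}
    for kmer, n in fg.items():
        fg_freq = n / fg_total
        bg_freq = (bg[kmer] + 1) / (bg_total + 1)  # pseudocount avoids 0
        ratio = fg_freq / bg_freq
        if ratio >= min_ratio:
            result[kmer] = ratio
    return result


# Made-up sequences: the foreground set shares an ACGT core.
FOREGROUND = ["TTACGTAA", "GGACGTCC", "ACGTACGT"]
BACKGROUND = ["TTTTGGGG", "CCCCAAAA", "GATTACAG"]

hits = enriched_kmers(FOREGROUND, BACKGROUND, k=4)
print(max(hits, key=hits.get))  # ACGT
```

A real version would presumably count only conserved instances across aligned genomes, which is where the comparative part comes in.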

They went back and looked for miRNAs that are complementary to the 3'-UTR motifs they found. They select for those that form stem loops; in total they find 258 candidates, of which 114 correspond to known miRNA genes. They estimate that 20% of human genes are targeted by miRNAs.

They use whole-genome duplication as a model for network evolution, using motif-based networks. Examples come from yeasts (K. waltii and S. cerevisiae), pufferfish, and even human (PLoS 2005).
Duplicated genes show accelerated divergence, and in 95% of these cases one copy evolves faster than the other.
Using a simple scheme for modeling interaction gain/loss, they can model the interaction motifs of duplicated genes.
He noted that paralogs are more likely to interact with each other. This may be due to proximity or redundancy.

Michael G. Rosenfeld, UCSD
"Coactivator and Corepressor Networks Integrated Transcriptional Response Programs"

Transcriptional regulation is carried out by the exchange of co-activators and co-repressors on the binding sites.
He uses a variation of ChIP-chip called ChIP-DSL-chip, in which they biotinylate the RNA and use 40-mer primers to ligate small fragments that can be detected on a chip (this avoids the random fragment sizes produced by sonication in traditional ChIP-chip).

The most profound point in this talk is the notion that there is some interplay between the transcriptional activation machinery and PARP1, which is known to be involved in the DNA-damage response. This led to the idea that a DNA double-strand nick caused by TopoIIb is associated with transcriptional activation.

Keith Elliston, Genstruct
"What is the significance of Systems Biology and Regulatory Genomics to industry?"
It's clear from Vioxx etc. that drug companies don't know the molecular action of their drugs; this is where systems biology can help.
There is a 'cognitive barrier': so much is known that people can't get their heads around it. Systems biology's job is to filter all that is known down to 'actionable' items that can be directly addressed.

For drug companies, novel target identification is the hardest task, but also the least interesting one. It takes between 800 million and a billion dollars to make a blockbuster drug, so they have to recoup costs before the patent runs out.

He agrees with Lee Hood that systems biology's biggest effect will be on biomarkers (personalized medicine). There is some talk that companies are afraid of personalized medicine because it lowers their revenue, since patients would get the right drug the first time. This was confirmed to some extent: only the really visionary companies will support it. It will also need a change of paradigm at the FDA, since drugs that work great for only a small percentage of the population will get rejected even though they are more appropriate for that small fraction.

Mike Levine, UC Berkeley
"Genome regulatory network controlling gastrulation of the Drosophila embryo"
His group found 10 noncoding genes regulated by the Dorsal gradient in fly, and half are miRNAs.
The orientation of the enhancer is important; if you reverse it you see disrupted effects.
The Ciona (sea squirt) notochord development network was mapped out using an electroporation transgene system.
Most of this talk was out of my field, but I found it interesting that when they expand the progenitor field of early heart development in sea squirts, instead of a bigger heart they get a more complex two-chambered heart. In theory, this supports a mechanism by which mutations can add complexity.

Robert Waterston, University of Washington
"Automated Gene Expression Profiling in C. elegans with Continuous Single Cell Resolution"
There are exactly 959 somatic cells in the adult C. elegans hermaphrodite. There is a complete embryonic lineage (the path of how the cells divided and came to be) from Sulston et al. 1983. They GFP-tag the nuclei to monitor the lineage of each cell using microscopy, and software automatically tracks the lineage of each cell.
The goal is to use this to monitor the expression of labeled genes during development on a cell-by-cell basis.
Using RNAi they can monitor changes in worm development via cell division and also via reporter gene expression. I would say they are doing some pretty cool stuff, but this was mostly a work-in-progress talk since the method isn't fine-tuned yet.

Stuart Kim, Stanford
"Systems Biology of Aging"
They derive coexpression from a compendium of gene expression profiles, using a comparative approach across orthologs in different organisms. They found a cluster in C. elegans that is regulated by aging, which started his group on the search for the regulatory program of aging.

He views aging as a process of the deterioration of robust gene networks.
There are some animals that barely seem to age: rockfish can live around 200 years.
They use the kidney as a model for aging, since kidneys get worse with age.
They used kidneys from 74 donors of different ages and found 400 differentially expressed genes.
They screened the donors' histories for 18 diseases and found that the diseases were not correlated with the changes in expression.
They gave the kidneys to nephrologists to see if expression correlated with kidney age, and found that how a kidney looked was correlated with expression intensities.
They then looked at age-regulated genes in muscle and brain.
Using gene set enrichment analysis, they find that single genes aren't differentially expressed across all three tissues, but pathways are.

He hints at a master regulator of aging in worms. Do you think this is possible? He says they've found a motif in worms that they are investigating. In all, I'd say this was the most thought-provoking talk so far, even if it wasn't backed up with concrete, well-formed data.

Eric Davidson, California Institute of Technology (Caltech)
"Functional properties of the gene regulatory network for early sea urchin development"
Uses BioTapestry software to visualize networks
From knockout and expression measurements on ~50 genes in sea urchin, they come up with a detailed regulatory network.

They looked at network evolution between starfish and sea urchin. The cell differentiation topology is there: the majority of connections in the ~8-gene system they looked at are the same, including loops.
However, there are a lot of differences elsewhere, making the conservation of that subsystem remarkable.

This suggests the presence of network 'kernels' that have driven basal development programs in organisms in this phylum since the Cambrian.

David Gifford, MIT
"Embryonic Stem Cell Regulatory Networks"

This is pretty much an explanation of their Cell paper from earlier this year. However, he shows how they use joint binding deconvolution (JBD) to discern binding at 50 bp resolution, which outperforms other methods. From this they get a positional prior that they use as a seed for motif searching, and they find that it works remarkably well.
They also look at binding of PolII in stem cells. They find that transcripts that are down-regulated and bound by PolII are more likely to be bound by miRNAs. They use Agilent arrays and haven't yet published their JBD algorithm.
They study three transcription factors: NANOG, OCT4, and SOX2. These are supposed master regulators of stem cells (they control differentiation).
They are moving toward studying groups of similarly differentiated sets of cells using genomic technologies.
He mentions that joint learning of motif and binding data would be a good idea,
and also that binding algorithms that account for steric hindrance might be of use, since the DNA is sonicated and loses structure.

Leroy Hood, Institute for Systems Biology
"An Analysis of Gene Regulatory and Protein Networks in Halobacteria"

Halobacterium is involved in bio-energy and bio-remediation. He discussed how the study of halobacteria drives the development of new computational tools at ISB.

The acidic nature of Halobacterium makes it easier to do ab initio protein folding (using Rosetta). Does anyone know why that is?

Used this for gene annotation: bakerlab.org

Also, because of the success of Rosetta protein folding, they started worldcommunitygrid.org to do protein folding @home in a collaboration with IBM.

They have a pipeline set up where they find co-regulated groups through biclustering using 'cMonkey' (to appear), and use 'Inferelator' (to appear) to identify the TFs responsible for the regulation.

Looks at disease progression in prostate/ovarian cancer by molecular fingerprints (biomarkers) in the blood. Using MPSS for sequencing.
He says that in the future, blood will act as a status reporter for the health of each organ system. He estimates that you need between 1,000 and 2,000 markers to get a good diagnostic. How do you read out this many markers? He's convinced that microfluidics/nanotechnology platforms hold the key.

David Dill, SRI International

"The Pathalyzer: a tool for visualization and analysis of signal transduction pathways"

He presented Petri nets as a method for monitoring signal transduction pathways. This group developed software to find active pathways in a network of reactions. The models are encoded in the Maude language and laid out using Dot. A query (can genes x, y, z be activated?) is known as the reachability problem, and can be solved using 'stubborn sets'; otherwise it would be an NP-hard problem. He mentions that Petri nets are used a lot in biology. I haven't seen anything like it; does anyone know of any papers using Petri nets?
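To make the reachability question concrete for myself, here is a minimal Python sketch of a toy Petri net with a brute-force breadth-first search over markings. It is not the Pathalyzer, and it does not use stubborn sets (a reduction technique that prunes this kind of search); the place and transition names are made up for illustration.

```python
from collections import deque


def enabled(marking, transition):
    """A transition can fire when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, transition):
    """Return the new marking after consuming inputs and producing outputs."""
    inputs, outputs = transition
    new = dict(marking)
    for place, n in inputs.items():
        new[place] -= n
    for place, n in outputs.items():
        new[place] = new.get(place, 0) + n
    return new


def freeze(marking):
    """Hashable form of a marking, ignoring empty places."""
    return frozenset((p, n) for p, n in marking.items() if n > 0)


def reachable(initial, transitions, goal, max_states=10_000):
    """Can some firing sequence put a token on every place in `goal`?
    Gives up (returns False) once max_states markings have been seen."""
    seen = {freeze(initial)}
    queue = deque([initial])
    while queue and len(seen) <= max_states:
        marking = queue.popleft()
        if all(marking.get(p, 0) > 0 for p in goal):
            return True
        for t in transitions.values():
            if enabled(marking, t):
                nxt = fire(marking, t)
                key = freeze(nxt)
                if key not in seen:
                    seen.add(key)
                    queue.append(nxt)
    return False


# Hypothetical two-step pathway: ligand binds receptor, complex activates kinase.
TRANSITIONS = {
    "bind": ({"ligand": 1, "receptor": 1}, {"complex": 1}),
    "activate": ({"complex": 1, "kinase": 1},
                 {"complex": 1, "kinase_active": 1}),
}

print(reachable({"ligand": 1, "receptor": 1, "kinase": 1},
                TRANSITIONS, ["kinase_active"]))  # True
print(reachable({"receptor": 1, "kinase": 1},
                TRANSITIONS, ["kinase_active"]))  # False
```

The search space grows very quickly with the number of places and tokens, which is presumably why a smarter technique is needed for real pathway models.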

Satoru Miyano, University of Tokyo
"Gene Networks for Drug Target Gene Discovery"

He uses knockouts/knockdowns and microarray data, with Bayesian networks and nonparametric regression, to establish gene links. Parameters are optimized to an approximating function using both the microarray data and the DAG of the Bayesian network.
The data set is 521 genes from 120 microarrays, each with about 1800 genes. Computing the gene links uses a heck of a lot of computing power! It looks like ~800 processors at around $10 million.
He mentions a lack of papers on the layout of biological networks (for interpretation of results, etc.).
He has also developed a generic method for incorporating other biological knowledge into network estimation. This is done by assigning a prior to links according to the Gibbs distribution. He also developed a method for incorporating protein-protein interactions into this framework.

When evaluating the optimal networks, a single network is not biologically relevant. By compiling many optimal networks together, you get the biological links reinforced. I actually like this, because I've heard that one of the big problems in network construction is that there are many network configurations that explain the phenotypes equally well.
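The simplest way I can picture "compiling many optimal networks together" is counting how often each link shows up across the candidate networks and keeping the frequent ones. The Python sketch below does just that; the 50% cutoff and the toy gene names are my assumptions, not anything from the talk.

```python
from collections import Counter


def consensus_edges(networks, min_fraction=0.5):
    """Keep directed edges present in at least `min_fraction` of the networks.
    Each network is an iterable of (source, target) pairs."""
    counts = Counter(edge for net in networks for edge in set(net))
    n = len(networks)
    return {edge: c / n for edge, c in counts.items() if c / n >= min_fraction}


# Three made-up near-optimal networks over genes A-D.
NETWORKS = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C"), ("C", "D")},
]

for edge, support in sorted(consensus_edges(NETWORKS).items()):
    print(edge, round(support, 2))
```

Here A->B appears in every network and B->C in two of three, so both survive, while A->C and C->D are dropped as one-offs.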

They did a huge experiment knocking down 270 human genes by siRNA to look at the hyperlipidemia drug fenofibrate. Many of the targets recovered were known drug targets. More are being investigated with GNI.

Edda Klipp, Max Planck Institute for Molecular Genetics
"Dynamic modeling of yeast cell stress response"
She uses differential equations to define the reactions in the system, and handles the lack of parameters using a combination of interaction data, text mining, and parameter estimation/optimization approaches. She models the osmotic stress response pathway. The complete model is very complex and includes the input flux of glucose; it follows directed experiments well.
There is also a model for the yeast cell cycle; incorporating phenotypic data such as cell size allows it to explain the critical cell size in cell division. It can also predict mutant phenotypes and population effects.
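For anyone who hasn't seen this style of modeling, here is about the smallest example I could write in Python: one kinase switched on by a constant signal and switched off at a fixed rate, integrated with forward Euler. It has nothing to do with her actual osmotic stress model; the equation, rate constants, and step size are all invented for illustration.

```python
def simulate(signal, k_act=1.0, k_deact=1.0, total=1.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of
        d[K*]/dt = k_act * signal * (total - [K*]) - k_deact * [K*]
    starting from [K*] = 0. Returns the list of [K*] values over time."""
    active = 0.0
    trajectory = [active]
    steps = int(round(t_end / dt))
    for _ in range(steps):
        rate = k_act * signal * (total - active) - k_deact * active
        active += dt * rate
        trajectory.append(active)
    return trajectory


# With these made-up constants the steady state is
# k_act*signal*total / (k_act*signal + k_deact) = 0.5.
print(round(simulate(signal=1.0)[-1], 3))  # 0.5
```

The testable part is exactly what the quote says: change a rate constant (a "mutant") and the model makes a prediction you can check against an experiment.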
I'm not sure about her results, but the gist is:
"mathematical models for cellular processes allow for a testable representation of experimental knowledge"

Aviv Regev, Harvard University
"Trees and Forests: How do molecular networks accommodate change?"
She studied the evolution of functional modules from gene expression data in various yeasts, using a definition of modules based on reciprocal best BLAST hits. Using co-expression from S. cerevisiae and S. pombe, they infer these modules in ~20 other related yeast species.
She also tries to reconstruct the evolutionary history of modules by looking at the character of the whole promoter region (duplications, deletions, inversions, etc.).
She is also working toward a general framework for module phylogeny. Module membership changes over time, as does module topology (the network). Module flux changes as the members change as well.
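Reciprocal best hits are a common way of pairing up genes between two species: gene a's best match in species B is gene b, and b's best match back in species A is a. A minimal Python sketch follows, assuming the similarity scores have already been computed (for example from BLAST output); the gene names and scores are made up.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.
    `scores` maps (query, subject) -> similarity score."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores between two species' genes.
A_VS_B = {("cerA", "pomX"): 200, ("cerA", "pomY"): 50, ("cerB", "pomX"): 120}
B_VS_A = {("pomX", "cerA"): 190, ("pomX", "cerB"): 110, ("pomY", "cerA"): 60}

print(reciprocal_best_hits(A_VS_B, B_VS_A))  # [('cerA', 'pomX')]
```

Note that cerB is left unpaired: its best hit is pomX, but pomX prefers cerA. Cases like this (often duplicates) are where a phylogeny-aware method should do better.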

Ilan Wapinski is developing a method for the resolution of orthologs using phylogenetic history. Look for this in the future.

Richard Karp, Berkeley
"An integrated approach to the reconstruction of molecular networks"
He mentioned the flow cytometry approach of Lauffenburger as an emerging way to measure abundances accurately in single cells. Currently it can only do up to 13 proteins at once. Using this data they are trying to reconstruct the networks.
At the end of the talk he showed evidence that the network topology (arrows) is more important than the parameters alone, using cilia and flagellar(?) data from previous papers. In particular, with the wrong topology you never predict the correct phenotype, but with the correct topology and random parameters you get the proper prediction 0.005% of the time.

Douglas Lauffenburger, MIT
"Multivariate Cue/Signal/Response Analysis of Cell Decision Processes"

He talked about interesting methods for surveying signal transduction. He looks at temporal (dynamic) activation over short time points, using protein abundance as well as kinase activity, all on a chip. They find that different phosphorylation sites are activated at different times. This is done using a novel mass spec labeling method. This means that it's the site-specific response that matters, not the abundance as traditionally assumed. He mentions that the data set is very rich and ripe for data mining using Bayesian approaches.

This is SB reporting from the 2005 RECOMB Satellite Workshop on Systems Biology and Regulatory Genomics in La Jolla, CA. This blog aims to expose everyone to the contents of talks and emerging papers in the field of bioinformatics and systems biology. Guest reports are always welcome and encouraged!
Anyway, the atmosphere is electric and bioinformatics is in the air! More to come!

Conference web page: http://research.calit2.net/recomb-workshop05/
Live webcast: http://rpvss.ucsd.edu:8080/ramgen/encoder/archive/RECOMB.rm

Friday, December 30, 2005

Friday, December 16, 2005

Tuesday, December 06, 2005

Monday, December 05, 2005

Sunday, December 04, 2005

Saturday, December 03, 2005

Friday, December 02, 2005