Some of the projects underway:
Sequence genomics projects have produced many methods for predicting protein function based on sequence motifs, pairwise sequence alignment, or multiple sequence alignment clustering, providing information on molecular function but not insight into biological mechanism. Structural genomics projects are beginning to supply the data necessary to make general observations about biological mechanism; however, such efforts are hampered by inadequate automated processes to assemble and to characterize such data across protein families and superfamilies. We are working to cross the gap from molecular function to biological mechanism by developing and using computational sequence, structure, bioinformatics and biophysical methods to characterize the molecular function sites of six superfamilies. We use these methods to classify these superfamilies, and compare our classifications to known biological features and mechanisms, allowing progressive improvement in computational approaches and our understanding of the underlying structure/function relationships. We aim to characterize electrostatics, structure, and sequence features of each functional site, which should correlate with mechanism better than current clustering approaches based on global sequence alignment. Our long term goal is to develop an integrated method for describing functional sites that uniquely combines physics, chemistry, and structural information and to use this approach to characterize the general principles that underlie biological mechanism. This detailed analysis will yield insights into biological mechanisms, yielding hypotheses that can be experimentally tested, and ultimately will enable better methods for identifying functional sites from sequence information, thus allowing more accurate identification and classification of many protein functions. 
The development of these general concepts will allow for the modification of enzymes (to improve or alter their activity) and the design of enzyme inhibitors (lead compounds), an early step in pharmaceutical drug discovery.
Predicting biological networks that underlie experimental data is a major, unsolved problem in modern biology. Constructing models from time course experimental data is particularly difficult, as the number of time points is usually fewer than the number of measured genes or proteins. We are developing computational algebra and Bayesian approaches to modeling such data. Although the number of modified proteins and measured biological endpoints that respond (i.e., the number of variables) exceeds the number of time points that can be collected (i.e., the number of equations), by considering the network under various conditions and by applying game theoretic methods to multiple discretizations of the data, consensus models can be constructed. These models represent aspects of the underlying biological network, identifying dependencies between protein modifications and biological responses. This collaboration among researchers in the departments of Biochemistry, Computer Science, Mathematics, and Physics at Wake Forest University aims to develop theory, algorithms, computational tools, and research methodologies for the network modeling of time course data.
The long-term goal of this project is to provide a better understanding of the basic cellular and molecular mechanisms driving joint tissue destruction during the development of osteoarthritis (OA). We are utilizing a systems and computational biology approach to map the transcriptional regulatory networks that underlie development of OA in a stage-specific, whole organ, manner. By integrating this transcriptional regulatory network with publicly available information on signaling pathways and protein-protein interaction networks, we are: 1) identifying key genes and proteins that could serve as novel targets for disease modifying therapy, as well as novel stage-specific biomarkers; and 2) identifying pathways that are involved in the disease process, which will enhance our understanding of mechanism. Our approach utilizes a recently developed mouse model of osteoarthritis (destabilization of the medial meniscus). Advantages of this model include: it is biomechanical; damage to the meniscus is a common feature of human OA; it mimics the joint pathology of human OA; and it allows for collection of time course data (early, middle, and late disease stages). Furthermore, the wide availability of transgenic animals permits the future manipulation of identified pathways to test the role of candidate genes and proteins in the network that underlies the development of OA. This project brings together a team of scientists with expertise in computational biology, basic molecular and translational research in OA, surgical models of OA, and the histological evaluation of OA. We aim to provide a comprehensive picture of the OA disease process, thus providing unprecedented insight into the mechanism of that process with the future promise of discovering novel pathways and drug targets responsible for the initiation and progression of the disease.
Dendritic cells (DC) are essential to the development of protective immunity to a number of infectious pathogens. These cells alert the adaptive immune system to the presence of pathogenic invaders and activate its cells to clear infections. To stimulate such activation, however, they must undergo a process termed maturation that increases their potency. DC maturation is a tightly regulated process involving changes in gene expression, intracellular trafficking, cytoskeletal modifications, and mobilization to lymphoid organs. The gene expression network, the dynamic process of interaction among gene expression, regulatory sequences, and trans-acting factors, underlying this process is extremely important for controlling many of the observed changes. Very few studies have examined this process over a comprehensive time course and none have attempted to derive network models of this process. Our long-term goal is to understand, at a systems level, the biology that underlies DC maturation following stimulation by infectious agents. We aim to identify novel, previously undefined components of the DC maturation network and to identify cause-and-effect relationships that explain how DC maturation is controlled upon exposure to various infectious stimuli. In this project, we are assessing the dynamics of DC maturation by identifying and clustering genes that are significantly expressed during DC maturation over a comprehensive time course following treatment of DC with poly I:C as a model of viral infection. We are also identifying relationships between significantly expressed genes, thus beginning to identify networks of interactions. Ultimately, we will demonstrate that we can identify groups of genes involved in subnetworks and model the resulting network neighborhoods, thus beginning to establish cause-and-effect versus correlative relationships within the gene expression network.
Because DC maturation is such a pivotal event for protective immunity, a broader understanding of the gene expression program and the comprehensive transcriptional regulatory network underlying their maturation is a key to the identification of new targets for the design and development of vaccines and therapies against infectious agents.
Phenylpropanoid biosynthesis is an important component of plant secondary metabolism that has been extremely well characterized at the genetic, biochemical, and molecular levels. Research interest has been spurred by the importance of the endproducts in such diverse functions as flower pigmentation, UV protection, signaling (including regulation of auxin transport), male fertility, and defense against pathogens as well as their anti-oxidant and anti-cancer properties in humans. The pathway also offers a highly tractable genetic system that is characterized by easily identifiable (i.e., flower, seed, or leaf color), non-lethal mutations that factored into Mendel’s elucidation of heritable traits, McClintock’s work on transposable elements, and the discovery of cosuppression. Extensive molecular, biochemical, and physiological characterization of this pathway and its many branches makes it an ideal system in which to begin to address fundamental questions about Arabidopsis systems biology. We are utilizing new methods for producing quantitative genomic, proteomic, and metabolomic data for identification of novel components and developing new tools for defining the relationships among those components. Recent insights into the physiological functions of the metabolic products of this pathway will allow us to place these molecular and biochemical events into a physiological context. This project is unique in attempting to collect time course gene expression, protein expression, and metabolite data and combining these comprehensive data sets to create integrated biological networks to aid in understanding of the relationships between components.
The project combines modeling, theory, and experimentation to produce a systems-level understanding of the phenylpropanoid biosynthetic, transcriptional, and regulatory pathways as exemplary networks, and of the biological consequences of hormonal control of this pathway. This metabolic network synthesizes molecules that are important regulators of plant growth, development, and defense, and that serve as important antioxidants in the human diet. Understanding the controls of this pathway will provide insights into how to engineer the synthetic, signaling, and regulatory pathways, both to improve plant growth and to facilitate production of these important compounds.
Sequence and structural genomics projects have identified and predicted molecular functions in proteins, yet researchers still cannot determine biological mechanisms of, for example, catalysis or substrate specificity or inhibitor binding, without detailed biochemical and biophysical analysis of a single protein. While structural genomics projects are providing the necessary data, these data are not being used to reveal the general principles underlying biological mechanism. We are using sequence, structure, bioinformatics, and biophysical methods to characterize the molecular function sites of protein superfamilies. Our tools include fuzzy functional forms (FFFs), active site profiling (DASP), PASSS, and MEAD for electrostatic analysis. The research program focuses on the following objectives: 1) characterizing the sequence and structure of functional-site features and using the results to develop methods for clustering the peroxiredoxin family; 2) analyzing the electrostatics, including ionizable residue pKas, residues affecting these pKas, and electrostatic potential, at peroxiredoxin functional sites and testing them experimentally; 3) integrating the electrostatic, sequence and structural information to create a robust profiling method that can identify peroxiredoxin subfamilies, then making it available; and 4) using it to create active-site signatures and profiles for a well-studied and important set of protein superfamilies and making these data available. Crossing the gap from molecular function to biological mechanism requires integrating sequence, structure, and physical-chemical data. The detailed functional site analysis of protein superfamilies is yielding insights into biological mechanisms, leading to hypotheses that can be experimentally tested. In the long term, the resulting methods will enable more accurate functional site identification from sequence.
The development of general concepts for identifying and classifying molecular functional-site features will advance the design of enzymes with improved, altered, or novel activity, and of inhibitors (or lead compounds), an early step in the pharmaceutical drug-discovery process.
Nonadditive effects (in which the sum of the free energy changes resulting from two single mutations does not equal the measured free energy change for the double mutant) are common in proteins. They are the basis of a crucial functional feature of proteins, allostery, and are also associated with site pairs that are not involved in allostery. The physical basis for nonadditive effects is poorly understood and our predictive capacity, in either qualitative or quantitative terms, is marginal at best. Current generalizations are based on the analysis of a modest number of site pairs and a small number of mutations at those sites. We will develop new generalizations about the interaction network in proteins by doing thermodynamic cycle measurements with several thousand mutant proteins. This will be accomplished using previously developed high throughput mutagenesis techniques and high precision stability measurements. Another objective of this project is to identify parameters extractable from conformational ensembles generated by molecular dynamics simulations that correlate qualitatively and quantitatively with the equilibrium thermodynamic measurements. Extensive correlations between thermodynamic measurements and computer simulation parameters will be a significant step towards a capacity to predict features of the interaction network.
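The nonadditivity described above can be illustrated with a minimal sketch of the arithmetic behind a thermodynamic (double-mutant) cycle. The function names, the units (kcal/mol), and the 0.3 kcal/mol tolerance are assumptions chosen for illustration; they are not taken from the project and should be replaced by the actual measurement uncertainty.

```python
def coupling_free_energy(ddg_a, ddg_b, ddg_ab):
    """Return the nonadditive (coupling) term of a double-mutant cycle.

    ddg_a, ddg_b: free energy changes of the two single mutants
    ddg_ab: measured free energy change of the double mutant
    All values are relative to wild type and share one unit.
    """
    return ddg_ab - (ddg_a + ddg_b)


def is_nonadditive(ddg_a, ddg_b, ddg_ab, tolerance=0.3):
    """Flag a site pair whose coupling term exceeds the tolerance.

    The default tolerance is a hypothetical stand-in for experimental error.
    """
    return abs(coupling_free_energy(ddg_a, ddg_b, ddg_ab)) > tolerance
```

A coupling term near zero indicates that the two mutations act independently; a value outside the tolerance marks the site pair as a candidate interaction.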
The regular secondary structures, alpha helices and beta strands, are easily recognizable in protein structures. The non-regular secondary structures, such as the various types of loops and turns, are less easily recognized, but no less important in the structure and function of proteins. Omega loops, a type of non-regular secondary structure first described in 1988, are segments of non-regular secondary protein structure that are six or more residues in length and are shaped so that the loop ends are close in three-dimensional space. Omega loops constitute approximately 20-23% of protein structure and have been recognized as playing a variety of roles in protein function. Additional loop types, including S-loops and strap loops, were described in the 1990s. These structures are almost always found at the protein surface and it is generally assumed that these non-regular structures are more flexible than other parts of the protein. However, in 1995, it was suggested that loops could be classified based on their roles in function, folding and stability. Recently, trigger loops were proposed to play a specific role in protein function. We hypothesize that loops playing these different roles will exhibit different dynamic characteristics. We are testing this hypothesis by performing simulations on proteins containing loops of various types that have been well-studied experimentally.
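The geometric definition of an omega loop given above (a non-regular segment of six or more residues whose ends are close in space) can be sketched as a simple filter. This is an illustrative sketch only: the one-letter secondary structure codes, the use of alpha-carbon coordinates, and the 10 Å end-to-end cutoff are assumptions, not the criteria used in this project.

```python
import math


def find_omega_loop_candidates(ss, ca_coords, min_len=6, max_end_dist=10.0):
    """Return (start, end, end_to_end_distance) for candidate omega loops.

    ss: string with 'H' for helix and 'E' for strand; any other character
        is treated as non-regular structure (assumed encoding).
    ca_coords: one (x, y, z) tuple per residue, in angstroms.
    max_end_dist: hypothetical cutoff for "ends close in space".
    """
    candidates = []
    start = None
    # A trailing sentinel 'H' closes a non-regular segment at the chain end.
    for i, state in enumerate(ss + "H"):
        regular = state in "HE"
        if not regular and start is None:
            start = i
        elif regular and start is not None:
            end = i - 1
            if end - start + 1 >= min_len:
                dist = math.dist(ca_coords[start], ca_coords[end])
                if dist <= max_end_dist:
                    candidates.append((start, end, dist))
            start = None
    return candidates
```

Segments that pass this filter are only candidates; a full classification would also need to exclude other loop types and apply whatever compactness criteria the chosen definition requires.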
Proteins are not static structures. Rather, they exhibit many different kinds of motions on a very wide range of time scales. To better understand these motions in a well-studied protein system, we have used NMR and EPR spectroscopy to study protein motion and dynamics in yeast iso-1-cytochrome c. Beginning with the first successful isotopic labeling of cytochrome c, we have studied the psec-nsec dynamics of the protein backbone using both hydrogen exchange and 15N relaxation measurements. Site-directed spin labeling of the cysteine in the C-terminal helix of iso-1-cytochrome c provides an indication of the flexibility of the C-terminus of the protein.
To better understand the role that omega loops play in proteins, we have developed methods of directed, random mutagenesis of yeast cytochrome c. Yeast cytochrome c can be analyzed for both function and structure in vivo, which makes it an ideal protein for analysis of structure-function relationships. Using directed, random mutagenesis, we mutagenized several pairs of residues in yeast cytochrome c to identify which residue pairs are consistent with structure and function of this protein in vivo.
Automated methods for identification of secondary structure are necessary for large-scale analysis of protein structure. We developed a method for identification and classification of protein secondary structure, without previous knowledge of the types of secondary structures. This method utilizes artificial neural networks to classify and cluster segments of protein structure based on their geometry. Clustering of six-residue segments in a large group of protein structures results in the identification of six classes of secondary structures, which we term structural building blocks (SBBs). Two of these are the canonical alpha helix and beta strand structures, while two other SBBs coincide with N- and C-terminal helix capping structures.
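As a rough illustration of the input to such a clustering step, the sketch below cuts a chain into overlapping six-residue segments and describes each by its pairwise alpha-carbon distances. The choice of descriptor is an assumption made for this example; the text says only that segments are classified by geometry, and the neural network itself is not shown.

```python
import math
from itertools import combinations


def segment_descriptors(ca_coords, window=6):
    """Return one geometric descriptor per overlapping segment.

    ca_coords: one (x, y, z) tuple per residue.
    Each descriptor lists the distances between every pair of residues
    in the segment (15 values for a six-residue window). This descriptor
    is a hypothetical stand-in for the geometry used by the real method.
    """
    descriptors = []
    for start in range(len(ca_coords) - window + 1):
        segment = ca_coords[start:start + window]
        descriptors.append(
            [math.dist(a, b) for a, b in combinations(segment, 2)]
        )
    return descriptors
```

Descriptors of this kind are invariant to rotation and translation, which is why internal distances are a common starting point when comparing segment shapes.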
The functional and structural elucidation of proteins of unknown function is an essential challenge that spans all areas of biology. We have recently shown that the integration of de novo structure prediction with multiple sources of weak functional information can result in large numbers of functional annotations, not available via sequence-based methods. We are developing a database that will present function derived from de novo structure prediction, with functional interpretation of the structures carried out automatically via the matching of the predicted structures to libraries of active site, functional site and fold descriptors. We are integrating these novel structure-derived data with several other sources of annotation (such as ontologies containing pathway, process or localization information) and systems-biology data (such as proteomics and microarray data) to present a comprehensive meta-database for exploring the function of proteins of unknown function/structure. From the perspective of modular implementation, the system will consist of three major parts: 1) the generation of structure predictions and domain organization annotations; 2) extraction of structure-derived function annotation and integration with preexisting process, localization and non-specific functional information; and 3) development of the database system with graphical access to the database via connections to other analysis tools.