Projects





Integrated functional-site feature analysis, with application to peroxiredoxin and other redoxin proteins

In collaboration with Professor Leslie Poole (WFUBMC, Biochemistry) and Professor Freddie R. Salsbury Jr. (WFU, Physics); Funded by the NSF

Sequence genomics projects have produced many methods for predicting protein function based on sequence motifs, pairwise sequence alignment, or multiple sequence alignment clustering, providing information on molecular function but not insight into biological mechanism. Structural genomics projects are beginning to supply the data necessary to make general observations about biological mechanism; however, such efforts are hampered by inadequate automated processes to assemble and to characterize such data across protein families and superfamilies. We are working to cross the gap from molecular function to biological mechanism by developing and using computational sequence, structure, bioinformatics and biophysical methods to characterize the molecular function sites of six superfamilies. We use these methods to classify these superfamilies, and compare our classifications to known biological features and mechanisms, allowing progressive improvement in computational approaches and our understanding of the underlying structure/function relationships. We aim to characterize electrostatics, structure, and sequence features of each functional site, which should correlate with mechanism better than current clustering approaches based on global sequence alignment. Our long-term goal is to develop an integrated method for describing functional sites that uniquely combines physics, chemistry, and structural information and to use this approach to characterize the general principles that underlie biological mechanism. This detailed analysis will yield insights into biological mechanisms and hypotheses that can be experimentally tested, and ultimately will enable better methods for identifying functional sites from sequence information, thus allowing more accurate identification and classification of many protein functions.
The development of these general concepts will allow for the modification of enzymes (to improve or alter their activity) and the design of enzyme inhibitors (lead compounds), an early step in pharmaceutical drug discovery.

Relevant references:

  • Fetrow, J.S. Active site profiling to identify protein functional sites in sequences and structures using the Deacon Active Site Profiler (DASP). Current Protocols in Bioinformatics 2006, Chapter 8: Unit 8.10.
  • Huff, R. G., Bayram, E., Tan, H., Knutson, S.T., Knaggs, M.H., Richon, A.B., Santago II, P., and Fetrow, J.S. Chemical and Structural Diversity in Cyclooxygenase Protein Active Sites. Chemistry and Biodiversity. 2005; 2:1533-1552.
  • Baxter, S.M., Rosenblum, J.S., Knutson, S.T., Nelson, M.R., Montimurro, J.S., Di Gennaro, J.A., Speir, J.A., Burbaum, J.J. and Fetrow, J.S. Synergistic computational and experimental proteomics approaches for more accurate detection of active serine hydrolases in yeast. Mol. Cell. Proteomics. 2004 Mar; 3(3):209-25.
  • Cammer, S.A., Hoffman, B.T., Speir, J.A., Canady, M., Nelson, M.R., Knutson, S.T., Gallina, M., Baxter, S.M., and Fetrow, J.S. Structure-based active site profiles for genome analysis and sub-family classification. J. Mol. Biol. 2003; 334(3):387-401.
  • Fetrow, J.S. and Skolnick, J. Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J. Mol. Biol. 1998 Sep 4; 281(5):949-968.



Development of computational algebra and Bayesian tools for biological modeling

In collaboration with Professors David John, Edward Allen, James Norris and William Turkett (WFU); and Leslie Poole, Larry Daniel and Richard Loeser (WFUBMC); Funded by the NIH

Predicting biological networks that underlie experimental data is a major, unsolved problem in modern biology. Constructing models from time course experimental data is particularly difficult, as the number of time points is usually fewer than the number of measured genes or proteins. We are developing computational algebra and Bayesian approaches to modeling such data. Although the number of modified proteins and measured biological endpoints that respond (i.e., the number of variables) exceeds the number of time points that can be collected (i.e., the number of equations), by considering the network under various conditions and by applying game theoretic methods to multiple discretizations of the data, consensus models can be constructed. These models represent aspects of the underlying biological network, identifying dependencies between protein modifications and biological responses. This collaboration among researchers in the departments of Biochemistry, Computer Science, Mathematics, and Physics at Wake Forest University aims to develop theory, algorithms, computational tools, and research methodologies for the network modeling of time course data.
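As a rough illustration of what "multiple discretizations of the data" can mean, the sketch below bins one measured time course into two and into three levels. The equal-width binning rule and the names `discretize` and `multiple_discretizations` are assumptions made for this example; the project description does not specify the discretization scheme actually used, and the sample values are invented.

```python
def discretize(values, levels):
    """Map each value to an integer level in 0..levels-1 using equal-width bins.

    Equal-width binning is an illustrative assumption, not the project's method.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat time course carries no level changes.
        return [0 for _ in values]
    width = (hi - lo) / levels
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), levels - 1) for v in values]


def multiple_discretizations(values, level_choices=(2, 3)):
    """Return one discretized copy of the time course per choice of level count."""
    return {k: discretize(values, k) for k in level_choices}


# Hypothetical measurements of one protein modification at five time points.
time_course = [0.0, 1.0, 2.0, 3.0, 4.0]
for k, states in multiple_discretizations(time_course).items():
    print(k, states)
```

Each discretization gives a different coarse view of the same measurements; a consensus procedure would then look for dependencies that persist across these views.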

Relevant references:

  • John, D.J., Fetrow, J.S. and J.L. Norris. Metropolis-Hastings Algorithm and Continuous Regression for finding Next-State Models of Protein Modification using Information Scores. Proceedings of the 7th International Symposium of IEEE Bioinformatics and Bioengineering. 2007. Jack Y. Yang and Mary Qu Yang and Michelle M. Zhu and Yanqing Zhang and Hamid R. Arabnia and Youping Deng and Nikolaos Bourbakis, eds. p. 35-41.
  • Allen, E.E., Diao, L., Fetrow, J.S., John, D.J., Loeser, R.F. Jr., and Poole, L.B. The shuffle index and evaluation of models of signal transduction pathways. Proceedings of the 45th ACM Southeast Regional Conference, Winston-Salem, NC. March 2007, p. 250-255.
  • Allen, E.E., Fetrow, J.S., John, D.J., Pecorella A. and Turkett, W. Re-constructing networks using co-temporal functions. Proceedings of the 44th ACM Southeast Conference, (Marius Silaghi, ed), Melbourne, Florida. March 2006, 417-422.
  • Allen, E.E., Fetrow, J.S., Daniel, L.W., Thomas, S.J., John, D.J. Algebraic dependency models of protein signal transduction networks from time-series data. J. Theor. Biol. 2006 Jan 21;238(2):317-30. [Epub 2005 Jul 5]
  • Allen, E.E., Fetrow, J.S., John, D.J., Thomas, S.J. Heuristic dependency conjectures in proteomic signaling pathways. Proceedings of the 43rd Annual Association for Computing Machinery Southeast Conference (Victor A. Clincy, ed.) Kennesaw, Georgia, March 2005.



Modeling signaling networks and transcriptional regulatory networks in osteoarthritis

In collaboration with Professors David John, Edward Allen, William Turkett, James Norris and C. Ferguson (WFU); Xiaoyan 'Iris' Leng and Richard Loeser (WFUBMC); and C. Carlson (University of Minnesota)

The long-term goal of this project is to provide a better understanding of the basic cellular and molecular mechanisms driving joint tissue destruction during the development of osteoarthritis (OA). We are utilizing a systems and computational biology approach to map the transcriptional regulatory networks that underlie development of OA in a stage-specific, whole-organ manner. By integrating this transcriptional regulatory network with publicly available information on signaling pathways and protein-protein interaction networks, we are: 1) identifying key genes and proteins that could serve as novel targets for disease-modifying therapy, as well as novel stage-specific biomarkers; and 2) identifying pathways that are involved in the disease process, which will enhance our understanding of mechanism. Our approach utilizes a recently developed mouse model of osteoarthritis (destabilization of the medial meniscus). Advantages of this model include: it is biomechanical; damage to the meniscus is a common feature of human OA; it mimics the joint pathology of human OA; and it allows for collection of time course data (early, middle, and late disease stages). Furthermore, the wide availability of transgenic animals permits the future manipulation of identified pathways to test the role of candidate genes and proteins in the network that underlies the development of OA. This project brings together a team of scientists with expertise in computational biology, basic molecular and translational research in OA, surgical models of OA, and the histological evaluation of OA. We aim to provide a comprehensive picture of the OA disease process, thus providing insight into the mechanism of that process with the future promise of discovering novel pathways and drug targets responsible for the initiation and progression of the disease.

Relevant references:

  • Loeser R.F., Olex A.L., McNulty M.A., Carlson C.S., Callahan M., Ferguson C., Chou J., Leng X. and Fetrow J.S. Microarray Analysis Reveals Age-related Differences in Gene Expression During the Development of Osteoarthritis in Mice. Arthritis & Rheumatism, accepted in Sept 2011.



Modeling the transcriptional regulation involved in dendritic cell maturation

In collaboration with Professors David John, Edward Allen, and William Turkett (WFU); and Elizabeth (Hiltbold) Schwartz (WFUBMC)

Dendritic cells (DC) are essential to the development of protective immunity to a number of infectious pathogens. These cells alert the adaptive immune system to the presence of pathogenic invaders and activate adaptive immune cells to clear infections. To stimulate such activation, however, they must undergo a process termed maturation that increases their potency. DC maturation is a tightly regulated process involving changes in gene expression, intracellular trafficking, cytoskeletal modifications, and mobilization to lymphoid organs. The gene expression network underlying this process (the dynamic interaction among gene expression, regulatory sequences, and trans-acting factors) is extremely important for controlling many of the observed changes. Very few studies have examined this process over a comprehensive time course and none have attempted to derive network models of this process. Our long-term goal is to understand, at a systems level, the biology that underlies DC maturation following stimulation by infectious agents. We aim to identify novel, previously undefined components of the DC maturation network and to identify cause-and-effect relationships that explain how DC maturation is controlled upon exposure to various infectious stimuli. In this project, we are assessing the dynamics of DC maturation by identifying and clustering genes that are significantly expressed during DC maturation over a comprehensive time course following treatment of DC with poly I:C as a model of viral infection. We are also identifying relationships between significantly expressed genes, thus beginning to identify networks of interactions. Ultimately, we will demonstrate that we can identify groups of genes involved in subnetworks and model the resulting network neighborhoods, thus beginning to establish cause-and-effect versus correlative relationships within the gene expression network.
Because DC maturation is such a pivotal event for protective immunity, a broader understanding of the gene expression program and the comprehensive transcriptional regulatory network underlying their maturation is a key to the identification of new targets for the design and development of vaccines and therapies against infectious agents.

Relevant references:

  • Olex, A.L., John, D.J., Hiltbold, E.M., and Fetrow, J.S. Additional limitations of the clustering validation method figure of merit. Proceedings of the 45th ACM Southeast Regional Conference, Winston-Salem, NC. March 2007.
  • Olex A.L., Hiltbold E.M., Leng X., and Fetrow J.S. Dynamics of dendritic cell maturation are identified through a novel filtering strategy applied to biological time-course microarray replicates. BMC Immunology 2010, Aug 8, 11:41.



Flavonoid signaling and pathway modeling in Arabidopsis

In collaboration with Professors Gloria Muday, Edward Allen, and William Turkett (WFU); and Brenda Winkel (Virginia Tech)

Phenylpropanoid biosynthesis is an important component of plant secondary metabolism that has been extremely well characterized at the genetic, biochemical, and molecular levels. Research interest has been spurred by the importance of the end products in such diverse functions as flower pigmentation, UV protection, signaling (including regulation of auxin transport), male fertility, and defense against pathogens as well as their anti-oxidant and anti-cancer properties in humans. The pathway also offers a highly tractable genetic system that is characterized by easily identifiable (i.e., flower, seed, or leaf color), non-lethal mutations that factored into Mendel’s elucidation of heritable traits, McClintock’s work on transposable elements, and the discovery of cosuppression. Extensive molecular, biochemical, and physiological characterization of this pathway and its many branches makes it an ideal system in which to begin to address fundamental questions about Arabidopsis systems biology. We are utilizing new methods for producing quantitative genomic, proteomic, and metabolomic data for identification of novel components and developing new tools for defining the relationships among those components. Recent insights into the physiological functions of the metabolic products of this pathway will allow us to place these molecular and biochemical events into a physiological context. This project is unique in attempting to collect time course gene expression, protein expression, and metabolite data and combining these comprehensive data sets to create integrated biological networks to aid in understanding the relationships between components.
The project combines modeling, theory, and experimentation to produce a systems-level understanding of the phenylpropanoid biosynthetic, transcriptional, and regulatory pathways as exemplary networks, and of the biological consequences of hormonal control of this pathway. The metabolic network under study synthesizes molecules that are important regulators of plant growth, development, and defense, and that serve as important antioxidants in the human diet. Understanding the controls of this pathway will provide insights into how to engineer the synthetic, signaling and regulatory pathways for both improving plant growth and facilitating production of these important compounds.

Relevant references:

  • Buer, CS, Sukumar, P, and Muday, GK (2006) Ethylene induced flavonoid synthesis modulates root gravitropism. Plant Physiology: 140: 1384-1396
  • Buer, CS, and Muday, GK (2004) The transparent testa4 mutation prevents flavonoid synthesis and alters auxin transport and the response of Arabidopsis roots to gravity and light. Plant Cell, 16: 1191-1205.
  • Brown, DE, Rashotte, AM, Murphy, AS, Normanly, J, Tague, BW, Peer, WS, Taiz, L, and Muday, GK (2001) Flavonoids act as negative regulators of auxin transport in vivo in Arabidopsis. Plant Physiol 126: 524-535



Functional site analysis and drug discovery

In collaboration with Professors William Turkett and Fred Salsbury (WFU); Leslie Poole (WFUBMC); and Jeffrey Skolnick (SUNY Buffalo)

Sequence and structural genomics projects have identified and predicted molecular functions in proteins, yet researchers still cannot determine biological mechanisms of, for example, catalysis or substrate specificity or inhibitor binding, without detailed biochemical and biophysical analysis of a single protein. While structural genomics projects are providing the necessary data, these data are not being used to reveal the general principles underlying biological mechanism. We are using sequence, structure, bioinformatics, and biophysical methods to characterize the molecular function sites of protein superfamilies. Our tools include fuzzy functional forms (FFFs), active site profiling (DASP), PASSS, and MEAD for electrostatic analysis. The research program focuses on the following objectives: 1) characterizing the sequence and structure of functional-site features and using the results to develop methods for clustering the peroxiredoxin family; 2) analyzing the electrostatics, including ionizable residue pKas, residues affecting these pKas, and electrostatic potential, at peroxiredoxin functional sites and testing them experimentally; 3) integrating the electrostatic, sequence and structural information to create a robust profiling method that can identify peroxiredoxin subfamilies, then making it available; and 4) using it to create active-site signatures and profiles for a well-studied and important set of protein superfamilies and making these data available. Crossing the gap from molecular function to biological mechanism requires integrating sequence, structure, and physical-chemical data. The detailed functional site analysis of protein superfamilies is yielding insights into biological mechanisms, leading to hypotheses that can be experimentally tested. In the long term, the resulting methods will enable more accurate functional site identification from sequence.
The development of general concepts for identifying and classifying molecular functional-site features will advance the design of enzymes with improved, altered, or novel activity, and of inhibitors (or lead compounds), an early step in the pharmaceutical drug-discovery process.

Relevant references:

  • Pryor, E.E., Jr. and Fetrow, J.S. PDBSQL: A Storage Engine for Macromolecular Data. Proceedings of the 45th ACM Southeast Regional Conference, Winston-Salem, NC. March 2007.
  • Huff, R. G., Bayram, E., Tan, H., Knutson, S.T., Knaggs, M.H., Richon, A.B., Santago II, P., and Fetrow, J.S. Chemical and Structural Diversity in Cyclooxygenase Protein Active Sites. Chemistry and Biodiversity. 2005. 2:1533-1552.
  • Baxter, S.M., Rosenblum, J.S., Knutson, S.T., Nelson, M.R., Montimurro, J.S., Di Gennaro, J.A., Speir, J.A., Burbaum, J.J. and Fetrow, J.S. Synergistic computational and experimental proteomics approaches for more accurate detection of active serine hydrolases in yeast. Mol Cell Proteomics. 2004 Mar;3(3):209-25.
  • Cammer, S.A., Hoffman, B.T., Speir, J.A., Canady, M., Nelson, M.R., Knutson, S.T., Gallina, M., Baxter, S.M., and Fetrow, J.S. Structure-based active site profiles for genome analysis and sub-family classification. J. Mol. Biol. 2003 Nov 28;334(3):387-401.
  • Di Gennaro, J.A., Siew, N., Hoffman, B.T., Zhang, L., Skolnick, J., Neilson, L.I., Fetrow, J.S. Enhanced functional annotation of protein sequences via the use of structural descriptors. J Struct Biol. 2001 May-Jun;134(2-3):232-245.
  • Fetrow, J.S., Siew, N., and Skolnick, J. Structure-based functional motif identifies a potential disulfide oxidoreductase active site in the serine-threonine protein phosphatase-1 subfamily. FASEB J. 1999 Oct;13(13):1866-1874.
  • Fetrow, J.S., Godzik, A. and Skolnick, J. Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: Identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J. Mol. Biol. 1998 Oct 2;282(4):703-711.
  • Fetrow, J.S. and Skolnick, J. Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J. Mol. Biol. 1998 Sep 4;281(5):949-968.



Experimental and computational analysis of the interaction networks in proteins

In collaboration with Professors Freddie R. Salsbury Jr. (WFU) and Marshall Hale Edgell (UNC-Chapel Hill)

Nonadditive effects (in which the sum of the free energy changes resulting from two single mutations does not equal the measured free energy change for the double mutant) are common in proteins. They are the basis of a crucial functional feature of proteins, allostery, and are also associated with site pairs that are not involved in allostery. The physical basis for nonadditive effects is poorly understood and our predictive capacity, in either qualitative or quantitative terms, is marginal at best. Current generalizations are based on the analysis of a modest number of site pairs and a small number of mutations at those sites. We will develop new generalizations about the interaction network in proteins by doing thermodynamic cycle measurements with several thousand mutant proteins. This will be accomplished using previously developed high-throughput mutagenesis techniques and high-precision stability measurements. Another objective of this project is to identify parameters extractable from conformational ensembles generated by molecular dynamics simulations that correlate qualitatively and quantitatively with the equilibrium thermodynamic measurements. Extensive correlations between thermodynamic measurements and computer simulation parameters will be a significant step towards a capacity to predict features of the interaction network.
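The nonadditivity defined above can be written as a coupling free energy from a double-mutant thermodynamic cycle. The sketch below is a minimal illustration, assuming four measured stabilities (wild type, each single mutant, and the double mutant) in the same units; the function names and the numerical values are hypothetical and are not taken from the project.

```python
def coupling_free_energy(dg_wt, dg_a, dg_b, dg_ab):
    """Return the double-mutant-cycle coupling energy.

    Inputs are stabilities (for example, unfolding free energies) of the wild
    type, single mutant A, single mutant B, and the double mutant AB.
    A result of zero means the two mutations are additive.
    """
    ddg_a = dg_a - dg_wt    # effect of mutation A alone
    ddg_b = dg_b - dg_wt    # effect of mutation B alone
    ddg_ab = dg_ab - dg_wt  # effect of both mutations together
    return ddg_ab - (ddg_a + ddg_b)


def is_nonadditive(coupling, tolerance):
    """Flag a site pair whose coupling exceeds the measurement tolerance."""
    return abs(coupling) > tolerance


# Hypothetical stabilities in kcal/mol, invented for illustration.
coupling = coupling_free_energy(dg_wt=5.0, dg_a=4.0, dg_b=3.5, dg_ab=3.0)
print(coupling, is_nonadditive(coupling, tolerance=0.2))
```

In practice the tolerance would be set from the precision of the stability measurements, which is why high-precision data matter for detecting small couplings.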

Relevant references:

  • Knaggs, M.H., Salsbury, F.R., Edgell, M.H., Fetrow, J.S. Insights into correlated motions and long-range interactions in CheY derived from molecular dynamics simulations. Biophys. J. [Epub ahead of print 2006 Dec 15]
  • Fetrow, J.S., Knutson, S.T. and Edgell, M.H. Mutations in α-helical solvent exposed sites of eglin c have long-range effects: evidence from molecular dynamics simulations. Proteins: Struct Funct Bioinform. 2006 May 1; 63(2):356-72. [Epub 2005 Dec 9]



Classification and dynamics simulations of omega loops and other protein loops

The regular secondary structures, alpha helices and beta strands, are easily recognizable in protein structures. The non-regular secondary structures, such as the various types of loops and turns, are less easily recognized, but no less important in the structure and function of proteins. Omega loops, a type of non-regular secondary structure first described in 1986, are segments that are six or more residues in length and are shaped so that the loop ends are close in three-dimensional space. Omega loops constitute approximately 20-23% of protein structure and have been recognized as playing a variety of roles in protein function. Additional loop types, including S-loops and strap loops, were described in the 1990s. These structures are almost always found at the protein surface and it is generally assumed that these non-regular structures are more flexible than other parts of the protein. However, in 1995, it was suggested that loops could be classified based on their roles in function, folding and stability. Recently, trigger loops were proposed to play a specific role in protein function. We hypothesize that loops playing these different roles will exhibit different dynamic characteristics. We are testing this hypothesis by performing simulations on proteins containing loops of various types that have been well-studied experimentally.
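The geometric part of the omega-loop description above (six or more residues, with the loop ends close in space) can be sketched as a simple check on alpha-carbon coordinates. This is a minimal illustration only: the 10 Å end-to-end cutoff is an assumed parameter, the function names are hypothetical, and the check does not test whether the segment lacks regular secondary structure.

```python
import math


def end_to_end_distance(ca_coords):
    """Distance between the first and last alpha-carbon of a segment."""
    return math.dist(ca_coords[0], ca_coords[-1])


def has_omega_loop_geometry(ca_coords, min_residues=6, max_end_distance=10.0):
    """Return True if the segment is long enough and its ends are close.

    max_end_distance (in angstroms) is an illustrative assumption.
    """
    if len(ca_coords) < min_residues:
        return False
    return end_to_end_distance(ca_coords) <= max_end_distance


# Invented coordinates: a segment that folds back on itself, and an extended one.
folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(6)]

print(has_omega_loop_geometry(folded))
print(has_omega_loop_geometry(extended))
```

A fuller classifier would first exclude residues assigned to helices and strands and then apply the geometric test to the remaining segments.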

Relevant references:

  • Fetrow, J.S., Schaak, D.L., Dreher, U., Wiland, D.J., and Boose, T.L. Mutagenesis of histidine 26 demonstrates the importance of loop-loop and loop-protein attachments for the function of iso-1-cytochrome c. Protein Sci. 1998; 7(4):994-1005.
  • Fetrow, J.S., Horner, S.R., Oehrl, W., Schaak, D.L., Boose, T.L., and Burton, R.E. Analysis of the structure and stability of omega loop A replacements in yeast iso-1-cytochrome c. Protein Sci. 1997 Jan; 6(1):197-210.
  • Mulligan-Pullyblank, P., Spitzer, J.S., Gilden, B.M., and Fetrow, J.S. Loop replacement and random mutagenesis of omega loop D, residues 70-84, in iso-1-cytochrome c. J. Biol. Chem. 1996 Apr 12; 271(15):8633-8645.
  • Fetrow, J.S. Omega Loops: Nonregular secondary structures significant in protein function and stability. FASEB J. 1995 Jun; 9(9):708-717.
  • Murphy, M.E.P., Fetrow, J.S., Burton, R.E. and Brayer, G.D. The structure and function of omega loop A replacements in cytochrome c. Protein Sci. 1993; 2(9):1429-1440.
  • Fetrow, J.S., Cardillo, T.S., and Sherman, F. Deletions and replacements of omega loops in yeast iso-1-cytochrome c. Proteins. 1989; 6(4):372-381.
  • Leszczynski (Fetrow), J.F. and Rose, G.D. Loops in globular proteins: Identification of a novel category of secondary structure. Science. 1986 Nov 14; 234(4778):849-55.



Motion and dynamics in yeast iso-1-cytochrome c

Proteins are not static structures. Rather, they exhibit many different kinds of motions on a very wide range of time scales. To better understand these motions in a well-studied protein system, we have used NMR and EPR spectroscopy to study protein motion and dynamics in yeast iso-1-cytochrome c. Beginning with the first successful isotopic labeling of cytochrome c, we have studied the psec-nsec dynamics of the protein backbone using both hydrogen exchange and 15N relaxation measurements. Site-directed spin labeling of the cysteine in the C-terminal helix of iso-1-cytochrome c provides an indication of the flexibility of the C-terminus of the protein.

Relevant references:

  • DeWeerd, K., Grigoryants, V., Sun, Y., Fetrow, J.S., Scholes, C.P. EPR-detected folding kinetics of externally located cysteine-directed spin-labeled mutants of iso-1-cytochrome c. Biochemistry. 2001 Dec 25; 40(51):15846-15855.
  • Fetrow, J.S. and Baxter, S.M. Assignment of 15N chemical shifts and 15N relaxation measurements for oxidized and reduced iso-1-cytochrome c. Biochemistry. 1999 Apr 6; 38(14):4480-4492.
  • Baxter, S.M. and Fetrow, J.S. Hydrogen exchange behavior of [U-15N]-labeled oxidized and reduced iso-1-cytochrome c. Biochemistry. 1999 Apr 6; 38(14):4493-4503.
  • Baxter, S.M., Boose, T.L., and Fetrow, J.S. 15N isotopic labeling and amide hydrogen exchange rates of oxidized iso-1-cytochrome c. J. Am. Chem. Soc. 1997; 119(41):9899-9900.
  • Qu, K., Vaughn, J.L., Sienkiewicz, A., Scholes, C.P., and Fetrow, J.S. Kinetics and motional dynamics of spin labeled yeast iso-1-cytochrome c: 1. Stopped-flow EPR as a probe for protein folding/unfolding of the C-terminal helix spin labeled at cysteine 102. Biochemistry. 1997; 36(10):2884-2897.



Cytochrome c structure/function relationships

To better understand the role that omega loops play in proteins, we have developed methods of directed, random mutagenesis of yeast cytochrome c. Yeast cytochrome c can be analyzed for both function and structure in vivo, which makes it an ideal protein for analysis of structure-function relationships. Using directed, random mutagenesis, we mutagenized several pairs of residues in yeast cytochrome c to identify which residue pairs are consistent with structure and function of this protein in vivo.

Relevant references:

  • Fetrow, J.S., Spitzer, J.S., Gilden, B.M., Mellender, S.J., Begley, T., Haas, B., and Boose, T.L. Structure, function, and temperature sensitivity analysis of directed, random mutants of proline 76 and glycine 77 in omega loop D of yeast iso-1-cytochrome c. Biochemistry. 1998; 37(8):2477-2487.
  • Fetrow, J.S., Schaak, D.L., Dreher, U., Wiland, D.J., and Boose, T.L. Mutagenesis of histidine 26 demonstrates the importance of loop-loop and loop-protein attachments for the function of iso-1-cytochrome c. Protein Sci. 1998; 7(4):994-1005.
  • Fumo, G. and Fetrow, J.S. A method of directed random mutagenesis of the yeast chromosome shows that iso-1-cytochrome c heme ligand His18 is essential. Gene. 1995 Oct 16; 164(1):33-39.



Structural Building Blocks (SBBs) and automatic identification of protein secondary structures

Automated methods for identification of secondary structure are necessary for large-scale analysis of protein structure. We developed a method for identification and classification of protein secondary structure, without previous knowledge of the types of secondary structures. This method utilizes artificial neural networks to classify and cluster segments of protein structure based on their geometry. Clustering of six-residue segments in a large group of protein structures results in the identification of six classes of secondary structures, which we term structural building blocks (SBBs). Two of these are the canonical alpha helix and beta strand structures, while two other SBBs coincide with N- and C-terminal helix capping structures.

Relevant references:

  • Fetrow, J.S., Palumbo, M.J., and Berg, G. Patterns, structures, and amino acid frequencies in structural building blocks, a protein secondary structure classification scheme. Proteins. 1997 Feb; 27(2):249-71.
  • Zhang, X., Fetrow, J.S., Rennie, W.A., Waltz, D.L., and Berg, G. Automatic derivation of substructures yields novel structural building blocks in globular proteins. (1993) Proceedings: First International Conference on Intelligent Systems for Molecular Biology. p. 438-446. L. Hunter, D. Searls, J. Shavlik, eds.



Integrated database of annotated protein structures

The functional and structural elucidation of proteins of unknown function is an essential challenge that spans all areas of biology. We have recently shown that the integration of de novo structure prediction with multiple sources of weak functional information can result in large numbers of functional annotations that are not available via sequence-based methods. We are developing a database that will present function derived from de novo structure prediction, with functional interpretation of the structures carried out automatically via the matching of the predicted structures to libraries of active site, functional site and fold descriptors. We are integrating these novel structure-derived data with several other sources of annotation (such as ontologies containing pathway, process or localization information) and systems-biology data (such as proteomics and microarray data) to present a comprehensive meta-database for exploring the function of proteins of unknown function/structure. From the perspective of modular implementation, the system will consist of three major parts: 1) the generation of structure predictions and domain organization annotations; 2) extraction of structure-derived function annotation and integration with preexisting process, localization and non-specific functional information; and 3) development of the database system with graphical access to the database via connections to other analysis tools.
