The worlds of genomics and proteomics
have a critical point of disconnection: for each gene encoded by a genome,
there can be many functionally different proteins produced. As a result,
estimates of the number of genes in, for example, the human genome (around 25,000)
diverge sharply from estimates of the number of proteins
produced (500,000+). Reconciling these vastly different numbers is a critical
challenge for both the genomics and proteomics fields. Building improved
models of the processes leading from gene to protein(s) will require understanding
how multiple proteins are encoded by a gene, how they are expressed by
the cellular machinery, and how they are differentially regulated. The
nascent field of "systems biology" will increasingly depend on establishing
an understanding of these processes.
We are addressing this challenge using mass-spectrometry data to measure proteins
expressed by a cell, and software that links those measurements back to
the genome. By connecting the observed protein directly to the
genome, we can begin to understand the series of steps used by the cell
to produce a protein. This may include the identification of post-translational
modifications on a protein (e.g. phosphorylation), or determination of
the mRNA splicing pathway used in producing the protein. A major component
of this work involves the development of new software and its integration
into a pipeline for analysis and mapping of proteomic data to the genome.
We are also developing software to integrate multiple mass-spectrometry measurements
of a protein into a coherent picture of its in vivo state and production
pathway. One current project is to re-annotate the complete
human genome by incorporating the results of extensive proteomics experiments
into the gene-finding process.
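As a rough illustration of what linking an observed peptide back to the genome can involve, the Python sketch below translates a DNA sequence in all six reading frames and reports where a peptide could be encoded. It is a minimal sketch, not the lab's pipeline: the function names (`reverse_complement`, `translate`, `map_peptide`) and the toy sequence are hypothetical, the standard genetic code is assumed, and only exact matches within a single contiguous stretch of DNA are found.

```python
# Standard genetic code, laid out in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Find every exact six-frame match of a peptide.

    Returns (strand, start, end) tuples in 0-based, end-exclusive
    coordinates on the forward strand.
    """
    genome = genome.upper()
    n = len(genome)
    span = 3 * len(peptide)
    hits = []
    for strand, dna in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(dna[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    start = n - (start + span)
                hits.append((strand, start, start + span))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy example: the peptide MAKG is encoded by ATG GCT AAA GGT.
print(map_peptide("CCATGGCTAAAGGTTAACC", "MAKG"))  # [('+', 2, 14)]
```

A peptide that spans an exon-exon junction, or that carries a post-translational modification or an amino-acid substitution, would not be found by this exact search. Handling those cases is where most of the real difficulty lies.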
Other projects in the lab include:
- Proteomic analysis of the changes that occur during a pathogen's adaptation
to antibiotic drugs. Specifically, we are using mass spectrometry to locate single
amino-acid substitutions in ribosomal proteins that confer
resistance to the aminoglycoside antibiotics.
- Using a new approach called "agent-based modeling" to develop test models
of signaling pathways. We are applying it to the chemotaxis pathway in E. coli.
This pathway lets bacteria detect nutrients in the environment and modify swimming
behavior accordingly. Our approach can represent all the major facets
of chemotactic behavior, and has begun to yield new insights into how components
of the pathway operate.
- With collaborators, development of a new database approach called "ultra-structure" to
provide greater flexibility in representing complex biological data sets.
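To give a flavor of the agent-based modeling mentioned in the chemotaxis project above, the Python sketch below treats each bacterium as an independent agent that follows a simple run-and-tumble rule in one dimension. It is a toy under stated assumptions, not the lab's model: the `Bacterium` class, the `simulate` function, and the tumble probabilities are all hypothetical, the agents here are whole cells rather than pathway components, and the signaling pathway itself is reduced to a single comparison of the current and previous nutrient concentration.

```python
import random
from dataclasses import dataclass


@dataclass
class Bacterium:
    """One agent: a cell swimming left (-1) or right (+1) on a line."""
    position: float = 0.0
    direction: int = 1
    last_concentration: float = 0.0

    def step(self, concentration, rng, p_tumble_up=0.1, p_tumble_down=0.5):
        # Run: swim one unit in the current direction.
        self.position += self.direction
        current = concentration(self.position)
        improving = current > self.last_concentration
        self.last_concentration = current
        # Tumble less often while conditions are improving
        # (the probabilities are illustrative assumptions).
        p_tumble = p_tumble_up if improving else p_tumble_down
        if rng.random() < p_tumble:
            self.direction = rng.choice((-1, 1))


def simulate(concentration, n_agents=200, n_steps=500, seed=0):
    """Return the mean final position of a population of agents."""
    rng = random.Random(seed)
    agents = [
        Bacterium(
            direction=rng.choice((-1, 1)),
            last_concentration=concentration(0.0),
        )
        for _ in range(n_agents)
    ]
    for _ in range(n_steps):
        for agent in agents:
            agent.step(concentration, rng)
    return sum(agent.position for agent in agents) / n_agents


# Nutrient concentration rising to the right versus a uniform medium.
print("gradient:", simulate(lambda x: x))
print("uniform: ", simulate(lambda x: 0.0))
```

With these settings the population should drift toward higher concentrations in the gradient case and stay near its starting point in the uniform case, even though no single agent knows where the nutrient source is. That population-level behavior arising from local rules is the kind of result an agent-based approach is designed to expose.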