Mass spectrometry-based proteomics is now a multidisciplinary scientific endeavor with extensive applications and far-reaching impact. This transition from a niche domain to one of the strongest stakeholders of post-genomics science has been propelled by technological advances in mass spectrometry instrumentation, complemented by experimental and bioinformatics innovations. A simple comparison of PubMed-indexed publications in proteomics and genomics over the last 15 years (starting from 1994, when the term “proteomics” was coined) reveals that proteomics is now as widely entrenched in contemporary biomedical research as genomics (Figure 10.1). This in itself is a testimony to the power and importance of this relatively young discipline in the current scientific ecosystem.

Figure 10.1 Number of publications in PubMed with title or abstract containing the term “Genomics” or “Proteomics” from 1 January 1994 to 30 September 2008. The values shown for the end of 2008 are extrapolated from the number of publications up to 30 September 2008. The trend illustrates the pervasiveness of proteomics in post-genomics biomedical research.

Like every emerging paradigm, proteomics has brought its own set of challenges of varying kinds: scientific, technological, experimental and computational. The nature, scale and novelty of these challenges have attracted serious attention from the stakeholders of the proteomics community. Concerted scientific and technical efforts are now underway to harness the untapped potential of proteomics.

Conclusions, challenges and perspective

Modern high-resolution mass spectrometry instruments can produce gigabytes of data per run, and a large proteomics project may employ several stages of upfront protein or peptide fractionation and consist of hundreds of runs. As proteomics endeavors become more ambitious and more comprehensive, the analytical challenges are further compounded. The high dimensionality and complexity of MS data pose novel computational, analytical and infrastructural challenges hitherto unseen by biomedical informatics researchers.

Proteomics in its current form requires extensive informatics support; computational proteomics and bioinformatics have therefore become key constituents of this field. While computational proteomics helps in extracting protein identity and quantity information from mass spectra, bioinformatics subsequently serves in discovering knowledge models, verifying hypotheses and providing biological insights. Throughout my PhD studies I have concentrated on this latter aspect: the analysis of proteomics datasets by bioinformatics. As proteomics progresses towards achieving the same kind of depth and comprehensiveness as genomics2,64, it cannot be overemphasized that the analysis of current proteomics datasets necessitates elaborate bioinformatics infrastructure and support. Proteomics has opened up new vistas for bioinformatics researchers and now drives a major part of current biomedical informatics research initiatives. In that context I have adopted two approaches towards proteomics data analysis: (1) adapting functional genomics databases, tools and algorithms to obtain insights into proteomics datasets, and (2) developing novel analysis algorithms, frameworks and workflows for proteomics datasets. Taking specific examples of typical datasets currently being generated in our laboratory (ranging from qualitative catalogues to quantitative multi-time-course datasets), I have tried to showcase the diversity of analyses needed in typical proteomics experimental scenarios. In this thesis I have discussed some of the novel analytical workflows and algorithms we have developed in our group for functional analysis of high-throughput proteomics data using bioinformatics algorithms, tools and databases. All of the projects discussed in this work also showcase the importance of collaborative and interdisciplinary science, wherein active dialogue and synergy are required between bioinformatics and experimental biological scientists.

Current trends in proteomics data analysis also indicate that the present bioinformatics resources and approaches will not be sufficient to mine proteomics datasets, and that novel algorithms and approaches will have to be developed to integrate these datasets with disparate “omics” datasets for knowledge discovery. To that end, novel data mining, analytical and visualization software needs to be developed to harness the uniqueness of such datasets. At the same time, one of the biggest challenges faced by current proteomics researchers is the relative scarcity of protein-centric annotation knowledgebases. Even today most annotation databases are “gene”-centric, and while they have been of immense value to researchers, they still do not meet the numerous and at times unique demands of the proteomics community. More scientific investment is therefore required to build a unified and comprehensive proteomics database along the lines of GenBank or Ensembl.

Modern proteomics technologies and their applications span a broad spectrum of biological explorations at various levels of cellular organization. These investigations cover nearly all aspects of cellular composition and architecture, including the elucidation of structural, spatial, temporal and relational constitution at the proteome level. The availability of such data types has in turn infused vigor into the ongoing bioinformatics efforts towards assimilating this important piece of information into the broader framework of systems biology59,362,363. Future bioinformatics activities in proteomics will focus more and more on integrative systems biology, as there are still many open-ended questions which can only be answered by adopting this approach. For instance, we still do not have a comprehensive understanding of how protein expression is controlled by regulatory mechanisms at the epigenetic, transcriptional, translational and post-translational levels215. Current debates in biomedical informatics research are replete with many such questions.

In the post-genomic era, proteomics along with other “omics” disciplines provides the foundations on which the future promises of systems biology will be realized and delivered. The next step in this direction is the consolidation and integration of datasets and information across different layers of the “omics” hierarchy364 (Figure 10.2), ultimately leading to physiologically exact and clinically relevant in silico models of biological processes and systems. Proteomics has the potential to make a huge impact in this endeavor by providing comprehensive and quantitative data on the constituent proteomes of the systems of interest.

Figure 10.2. The Wheel of Biological Understanding. Systems biology strives to understand all aspects of an organism and its environment through the combination of a variety of scientific fields (image adapted from an article by Joanne Fox, URL: http://bioinformatics.ubc.ca/about/what_is_bioinformatics/).

One of the ultimate litmus tests for proteomics is its ability to generate data of the nature and scale necessary for incorporation into multi-scale simulation and modeling frameworks365. Moreover, modeling languages need to be augmented to incorporate proteomics datasets and results into the framework of executable cell biology366. In that context, innovative experimental strategies and scalable instrumentation capabilities have to be developed so that fine-grained, comprehensive and quantitative datasets are generated in the future, which are amenable to in silico modeling and execution. Recent results indicate that proteomics is well equipped to handle this challenge and is poised to transform the landscape of systems biology, thereby engendering profound changes in translational research.

Bibliography
