by Thomas Lengauer
Computational Biology and Bioinformatics are terms for an interdisciplinary field joining information technology and biology that has skyrocketed in recent years. The field is located at the interface between two scientific and technological disciplines that can be argued to drive a significant, if not the dominant, part of contemporary innovation. In English, Computational Biology refers mostly to the scientific part of the field, whereas Bioinformatics refers more to the infrastructure part. In other languages (e.g., German), Bioinformatics covers both aspects of the field.
The goal of this field is to provide computer-based methods for coping with and interpreting the genomic data that are being uncovered in large volumes by the diverse genome sequencing projects and other new experimental technologies in molecular biology. The field presents one of the grand challenges of our times. It has a large basic research aspect, since we cannot claim to be close to understanding biological systems at the level of an organism or even a cell. At the same time, the field is faced with a strong demand for immediate solutions, because the genomic data that are being uncovered encode many biological insights whose deciphering can be the basis for dramatic scientific and economic success. With the pre-genomic era, which was characterized by the effort to sequence the human genome, just being completed, we are entering the post-genomic era, which concentrates on harvesting the fruits hidden in the genomic text. In contrast to the pre-genomic era, which lasted less than 15 years from the announcement of the quest to sequence the human genome to its completion, the post-genomic era can be expected to last much longer, probably extending over several generations.
At the basis of the scientific grand challenge in computational biology are problems such as identifying genes in DNA sequences and determining the three-dimensional structure of a protein given its sequence (the famed protein folding problem). Other unsolved problems include the computational estimation of free energies of biomolecules and molecular complexes in aqueous solution, as well as the modeling and simulation of molecular interaction networks inside the cell and between cells. Solving these problems is essential for an accurate and effective analysis of disease processes by computer.
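To give a flavour of the gene identification problem mentioned above, the following Python sketch scans a DNA string for open reading frames (ORFs), that is, stretches that begin with a start codon and end at the next in-frame stop codon. It is only a toy illustration: the function name `find_orfs` and the parameter `min_codons` are hypothetical, only the forward strand is examined, and practical gene finders rely on statistical models rather than on this simple scan.

```python
# Toy open reading frame (ORF) scan on the forward strand only.
# Assumption: start codon ATG and the standard stop codons TAA, TAG, TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of ORFs.

    `end` is exclusive and includes the stop codon. `min_codons` is the
    minimum number of codons before the stop codon (start codon included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGGG")` reports a single ORF spanning indices 2 to 14, which covers the codons ATG, AAA, TTT and the stop codon TAG.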
Besides these more timeless scientific problems, there is a significant part of computational biology that is driven by new experimental data provided through the dramatic progress in molecular biology techniques. Starting with genomic sequences, the past few years have provided gene expression data on the basis of ESTs (expressed sequence tags) and DNA microarrays (DNA chips). These data have given rise to a very active new subfield of computational biology called expression data analysis. They go beyond a generic view of the genome and are able to distinguish between the gene populations expressed in different tissues of the same organism and in different states of cells belonging to the same tissue. For the first time, this affords a cell-wide view of the metabolic and regulatory processes under different conditions. These data are therefore believed to be an effective basis for new diagnoses and therapies of diseases.
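One elementary step in expression data analysis, comparing the expression level of each gene between two conditions, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any method from the articles in this section: the function names, the pseudocount of 1.0 and the threshold of 1.0 (a two-fold change) are hypothetical choices, and real analyses add normalization and statistical testing.

```python
import math


def log2_fold_changes(expr_a, expr_b, pseudocount=1.0):
    """Map each gene measured in both samples to its log2 fold change (b vs. a).

    The pseudocount (an assumed value) avoids division by zero and
    damps the ratios of genes with very low expression.
    """
    return {
        gene: math.log2((expr_b[gene] + pseudocount) / (expr_a[gene] + pseudocount))
        for gene in expr_a
        if gene in expr_b
    }


def differentially_expressed(expr_a, expr_b, threshold=1.0):
    """Return the sorted names of genes whose absolute log2 fold change
    is at least `threshold` (1.0 corresponds to a two-fold change)."""
    changes = log2_fold_changes(expr_a, expr_b)
    return sorted(gene for gene, lfc in changes.items() if abs(lfc) >= threshold)
```

With the made-up measurements `{"g1": 7, "g2": 10, "g3": 3}` for one tissue and `{"g1": 31, "g2": 10, "g3": 0}` for another, `g1` comes out as up-regulated and `g3` as down-regulated, while `g2` is unchanged.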
Ultimately, the information in genes is translated into proteins inside the cell, and it is mostly the proteins that govern cellular processes. Proteins are often modified after their synthesis. Therefore, a cell-wide analysis of the population of mature proteins is expected to correlate much more closely with cellular processes than the expressed genes that are measured today. The emerging field of proteomics addresses the analysis of the protein population inside the cell. Technologies such as 2D gels and mass spectrometry offer glimpses into the world of mature proteins and their molecular interactions.
Finally, we are stepping beyond analyzing generic genomes and are asking which genetic differences between individuals of a species are the key to predisposition to certain diseases and to the effectiveness of particular drugs. These questions join the fields of molecular biology, genetics, and pharmacy in what is commonly named pharmacogenomics.
The pharmaceutical industry was the first branch of the economy to engage strongly in the new technology combining high-throughput experimentation with bioinformatics analysis. Medicine is following closely. Medical applications go beyond trying to find new drugs on the basis of genomic data. The aim here is to develop more effective diagnostic techniques and to optimize therapies. The first steps to engage computational biology in this quest have already been taken.
While driven by biological and medical demand, computational biology will also exert a strong impact on information technology. Since biological processes are too complex for us to simulate on the basis of first principles, we resort to statistical learning and data mining techniques, methods that are at the heart of modern information technology. The mysterious encoding that Nature has afforded for biological signals, as well as the enormous data volume, present large challenges and continue to have a large impact on the processes of information technology themselves.
In this theme section, we present 15 scientific progress reports on various aspects of computational biology. We begin with two review papers, one from the biological and one from the pharmaceutical perspective. In three further articles we present progress on solving classical grand challenge problems in computational biology. A section of five papers deals with projects addressing computational biology problems of current interest in the field. In a section with three papers we discuss medical applications. The last two papers concentrate on the role of information technology contributions, specifically algorithms and visualization.
This theme section bears witness to the activity and dynamics that the field of computational biology and bioinformatics enjoys, not only among biologists but also among computer scientists. It is the intensive interdisciplinary cooperation between these two scientific communities that is the motor of progress in this key technology for the 21st century.
Please contact:
Thomas Lengauer - GMD
Tel: +49 2241 14 2777
E-mail: Thomas.Lengauer@gmd.de