GenoStar: A Bioinformatics Platform for Exploratory Genomics
by François Rechenmann
As a direct consequence of the spectacular progress of experimental methods and devices, the diversity and volume of biological data are steadily increasing. In this context, the GenoStar consortium has developed an integrated, interactive and easy-to-use computer platform which helps biologists turn these data into knowledge.
The GenoStar consortium was created at the end of 1999. Partly supported by the French Ministry of Research, it brings together four partners: two biotech companies, Hybrigenics (Paris) and GENOME express (Grenoble), and two research institutes, the Pasteur Institute (Paris) in biology and INRIA Rhône-Alpes (Grenoble) in computer science. At the end of May, the consortium presented the first version of its platform, which will be made available to academic users by the end of this year.
The GenoStar platform has been designed according to a specific view of the genomic world as a complex network of biological entities and their relationships. As a very simple example, a gene and a protein can be seen as entities linked together through the 'is-coding-for' relationship. The gene is related to its chromosome, the chromosome to its species, and so on. More abstract entities can be described in the same way, such as a set of homologous genes from different species or the components of a eucaryotic gene, ie the (coding) exons and the (non-coding) introns.
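The view of entities linked by named relationships can be illustrated with a minimal Python sketch. It is not GenoStar code: the class names, the relation names other than 'is-coding-for', and the placeholder identifiers (geneA, proteinA, chr1, speciesX) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "gene", "protein", "chromosome", "species"
    name: str  # hypothetical identifier


@dataclass
class KnowledgeBase:
    # Each relation is stored as a (source, relation name, target) triple.
    relations: set = field(default_factory=set)

    def relate(self, source, relation, target):
        self.relations.add((source, relation, target))

    def targets(self, source, relation):
        """Return the entities reached from `source` through `relation`."""
        return {t for s, r, t in self.relations if s == source and r == relation}


kb = KnowledgeBase()
gene = Entity("gene", "geneA")
protein = Entity("protein", "proteinA")
chromosome = Entity("chromosome", "chr1")
species = Entity("species", "speciesX")

kb.relate(gene, "is-coding-for", protein)
# The two relation names below are assumed; the article only names 'is-coding-for'.
kb.relate(gene, "is-located-on", chromosome)
kb.relate(chromosome, "belongs-to", species)
```

Storing relations as triples keeps the sketch small; a real entity-relationship model would also give relationships their own classes and attributes.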
GenoStar is structured into four modules: GenoCore, GenoAnnot, GenoLink and GenoBool. As indicated by its name, GenoCore is the central module that provides the various services needed by the other three application modules. It is thus in charge of the management of the entities and their relationships within a knowledge base relying on an advanced entity-relationship model; it offers persistence, query, and editing facilities.
Data, described as values of the attributes of these entities, must however be analysed with the help of adequate methods, which are provided by the application modules. Each of the three existing application modules, GenoAnnot, GenoLink and GenoBool, plays a different role in accordance with the general view of the genomic world supported by GenoStar.
GenoAnnot allows the biologist to identify regions of interest in a genomic sequence. An important example of such a region is a gene, which can be seen as a portion of the genome which contains the information required by the cell machinery to make a protein. In the present version of GenoStar, GenoAnnot provides several methods for identifying genes in procaryotic (bacterial) genomes. Each time a gene is identified, a corresponding instance of the gene class is created and related to the other entities. The results of the identification methods can be displayed on one-dimensional maps of the genome (see Figure 1). The biologist can then easily compare the predictions of competing methods and ultimately decide which of them to validate.
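The article does not say which gene identification methods GenoAnnot implements. As a rough illustration of what such a method produces, the following sketch uses a textbook baseline instead: a naive scan for open reading frames (a start codon followed in the same frame by a stop codon) on the forward strand. The function name, the `min_codons` parameter and the toy sequence are assumptions, and real gene finders are considerably more elaborate.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) half-open positions of forward-strand ORFs.

    An ORF runs from the first ATG seen in a reading frame to the next
    in-frame stop codon; its length in codons includes both of them.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence, not real genomic data.
predictions = find_orfs("CCATGAAATAGGG")
```

Each predicted interval would then correspond to one candidate gene instance, to be compared with the predictions of other methods along the genome map.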
Figure 1: The results produced by several sequence analysis methods are simultaneously displayed along the genomic sequence.
Figure 2: A GenoLink query is a partial network of entities and relations which is matched against the whole knowledge base.
In GenoStar, a knowledge base is populated by entities and relations, which are either imported from external databases or produced by GenoAnnot. The GenoLink application module allows the biologist to explore this network of entities. Complex queries can be expressed as partial networks that are searched for in the network of the knowledge base (see Figure 2). Through this exploration process, the biologist may be able to infer new relations between previously unrelated entities. A typical example of such an inference is the prediction of the function of a gene (ie the function of the protein for which the gene codes), using information about other genes to which it is related through relevant relations.
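Matching a partial network against a knowledge base can be sketched as pattern matching over relation triples. The code below is an illustrative assumption, not the GenoLink query engine: terms starting with '?' are variables, and the relation names, gene names and the 'kinase' function are invented placeholders.

```python
def match(pattern, triples, binding=None):
    """Yield variable bindings under which every pattern triple occurs in `triples`.

    `pattern` is a list of (source, relation, target) triples in which any
    term starting with '?' is a variable to be bound consistently.
    """
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, triples, candidate)


# Hypothetical knowledge base content.
kb = {
    ("geneA", "is-coding-for", "proteinA"),
    ("geneB", "is-coding-for", "proteinB"),
    ("geneA", "is-homologous-to", "geneB"),
    ("proteinB", "has-function", "kinase"),
}

# Query: a gene ?g with a homologue ?h whose protein ?p has a known function ?f.
query = [
    ("?g", "is-homologous-to", "?h"),
    ("?h", "is-coding-for", "?p"),
    ("?p", "has-function", "?f"),
]

results = list(match(query, kb))
```

Here the single match binds `?g` to geneA and `?f` to kinase, which is the kind of evidence a biologist might use to propose a function for geneA. The exhaustive search is acceptable for a toy example but would need indexing on a large knowledge base.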
The use of GenoAnnot and GenoLink results in the addition of instances to existing classes: GenoAnnot creates instances of entity classes, while GenoLink creates instances of relationship classes. To complete the exploratory mission of the GenoStar platform, the GenoBool module provides the user with data analysis methods that can help identify new classes of relevant entities or relationships (Figure 3). GenoBool offers various ways to transform data before applying classical data analysis methods.
Figure 3: GenoBool is the 'data-mining' module of GenoStar. Properly encoded data are submitted to data analysis methods.
As stated in the introduction, progress in experimental methods and tools is leading to the emergence of new types of data, which require suitably adapted methods. GenoStar has been designed so that it is easy not only to add methods and strategies to the existing modules, but also to add new modules.
The next important step is to deliver a version of GenoAnnot dedicated to the analysis of eucaryotic genomes. Identifying genes in such genomes is still an open problem in bioinformatics, owing to the existence of very large intergenic regions and to the structure of the genes, which are made up of exons and introns. This version, together with important improvements and extensions in the other modules, will allow the consortium to deliver a commercial version of the GenoStar platform next year.
Link:
http://www.genostar.org
Please contact:
François Rechenmann, INRIA
Tel: +33 4 76 61 53 65
E-mail: Francois.Rechenmann@inria.fr