With a growing trend towards grid-based data repositories and data analysis services, scientific data analysis often involves accessing multiple data sources and analyzing the data with a variety of analysis programs. A closely related challenge is that data sources often hold the same type of data in a number of different formats; moreover, the formats expected and generated by different data analysis services are often distinct. In Bioinformatics, data are often stored in flat files, so retrieving the subset of records that satisfies a set of constraints is slower than with other approaches, such as a relational DBMS. We have developed a data Grid system, built on top of specific biological data sources in flat-file format, that ingests their contents into a relational DBMS for data integration, reducing the data redundancy present in the biological flat files. In this work, we describe the prototype for ingesting the Swiss-2D PAGE flat file into a relational DBMS.
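To make the flat-file-to-relational idea concrete, here is a minimal sketch, not the authors' prototype. It assumes a simplified Swiss-Prot-style layout (a two-letter line code, the value starting at column 6, and `//` terminating each record) and uses Python's built-in `sqlite3` as a stand-in for the relational DBMS. The schema, the function names `parse_records` and `ingest`, and the toy records are all hypothetical. The `organism` table shows how a value repeated across flat-file records can be stored once and referenced by key, and the final query shows a constrained subset retrieved with SQL instead of a scan of the file.

```python
import sqlite3
from typing import Dict, Iterator, List

# Hypothetical toy data in a simplified Swiss-Prot-like layout (NOT real
# SWISS-2DPAGE entries): 2-letter line code, value from column 6, "//" ends a record.
SAMPLE = """\
ID   PROT1_TOY
AC   X00001;
DE   Hypothetical protein one.
OS   Homo sapiens (Human).
//
ID   PROT2_TOY
AC   X00002;
DE   Hypothetical protein two.
OS   Homo sapiens (Human).
//
ID   PROT3_TOY
AC   X00003;
DE   Hypothetical protein three.
OS   Mus musculus (Mouse).
//
"""


def parse_records(text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per record, mapping each line code to its list of values."""
    record: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith("//"):
            if record:
                yield record
            record = {}
        elif len(line) >= 5:
            code, value = line[:2], line[5:].strip()
            record.setdefault(code, []).append(value)
    if record:  # tolerate a missing final "//" terminator
        yield record


def ingest(conn: sqlite3.Connection, text: str) -> None:
    """Load records into a normalized (hypothetical) schema: each distinct
    organism string is stored once instead of being repeated per record."""
    conn.executescript("""
        CREATE TABLE organism (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE entry (
            accession   TEXT PRIMARY KEY,
            entry_name  TEXT NOT NULL,
            description TEXT,
            organism_id INTEGER REFERENCES organism(id)
        );
    """)
    for rec in parse_records(text):
        organism = " ".join(rec.get("OS", []))
        # Deduplicate: insert the organism only if it is not already present.
        conn.execute("INSERT OR IGNORE INTO organism(name) VALUES (?)", (organism,))
        (org_id,) = conn.execute(
            "SELECT id FROM organism WHERE name = ?", (organism,)
        ).fetchone()
        accession = rec["AC"][0].split(";")[0].strip()
        conn.execute(
            "INSERT INTO entry VALUES (?, ?, ?, ?)",
            (accession, rec["ID"][0].split()[0], " ".join(rec.get("DE", [])), org_id),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ingest(conn, SAMPLE)
    # A constrained subset ("all human entries") as a declarative SQL query.
    rows = conn.execute("""
        SELECT e.accession FROM entry e JOIN organism o ON e.organism_id = o.id
        WHERE o.name LIKE 'Homo sapiens%' ORDER BY e.accession
    """).fetchall()
    print([r[0] for r in rows])  # ['X00001', 'X00002']
```

In this toy input, three records share two organism strings, so the `organism` table holds two rows. A real loader for the Swiss-2D PAGE flat file would also have to handle continuation lines, multi-valued fields, and any database-specific line types, none of which this sketch attempts.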
Index Terms:
Grid Computing, Bioinformatics, Data Management
Citation:
Maria Mirto, Sandro Fiore, Massimo Cafaro, Marco Passante, Giovanni Aloisio, "A Grid-Based Bioinformatics Wrapper for Biological Databases," 2008 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS), pp. 191-196, 2008.