There is a lot of research going on around the world, and that means a lot of data.
On a personal level, we've seen computer hard drives grow steadily in capacity to keep up with ever more information, bigger images, and so on. Many people own an external drive with a storage capacity of 1 or 2 terabytes (TB).
To show the scale of the problem, the European Bioinformatics Institute (EMBL-EBI) went from managing 40 petabytes to working with 250 petabytes in just six years. A petabyte is 1,024 TB, so 250 petabytes is the equivalent of 256,000 1 TB drives.
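The drive count follows from simple arithmetic. As a minimal sketch, assuming the binary convention used in the text (1 petabyte = 1,024 TB) and a hypothetical helper name, the conversion can be checked like this:

```python
TB_PER_PB = 1024  # binary convention used in the text: 1 PB = 1,024 TB


def drives_needed(petabytes: int, drive_size_tb: int = 1) -> int:
    """Return how many drives of `drive_size_tb` TB are needed to hold `petabytes` PB."""
    total_tb = petabytes * TB_PER_PB
    # Ceiling division, so a partially filled drive still counts as one drive.
    return -(-total_tb // drive_size_tb)


print(drives_needed(40))      # 40960 drives of 1 TB
print(drives_needed(250))     # 256000 drives of 1 TB
print(drives_needed(250, 2))  # 128000 drives of 2 TB
```

The 256,000 figure therefore refers to the 250 petabytes now handled by EMBL-EBI, not to a single petabyte.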
The rapid development of different disciplines in biological and biomedical research fields (such as genomics, proteomics and transcriptomics) in recent decades has led to an exponential growth in the amount of available biological data.
Scientists led by Patrick Aloy, ICREA Researcher and Head of the Laboratory of Structural Bioinformatics and Network Biology at IRB Barcelona, have developed a computational tool, called Bioteque, to align, integrate and simplify this data. The result is a knowledge graph that provides information on how different biological entities relate to each other, including over 30 million functional interactions.
Bioteque works by integrating different levels of biological complexity. Given two related genes, for example, it can report whether they interact physically, whether they are active in the same type of cell, and whether they are linked to the same disease. It can also predict a cell type's sensitivity or resistance to a particular drug.
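To make the idea of a knowledge graph concrete, here is a toy sketch of the kind of question described above. It is not Bioteque's actual data model or interface: the relation names, the placeholder entities (`GENE_A`, `CELL_X`, `DISEASE_1`) and the `compare_genes` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical typed edges (source, relation, target). Placeholder data only,
# not taken from Bioteque or any real database.
EDGES = [
    ("GENE_A", "interacts_with", "GENE_B"),
    ("GENE_A", "expressed_in", "CELL_X"),
    ("GENE_B", "expressed_in", "CELL_X"),
    ("GENE_A", "associated_with", "DISEASE_1"),
    ("GENE_B", "associated_with", "DISEASE_2"),
]


def build_index(edges):
    """Index the edges so that (source, relation) maps to its set of targets."""
    index = {}
    for source, relation, target in edges:
        index.setdefault((source, relation), set()).add(target)
    return index


def compare_genes(index, gene_1, gene_2):
    """Summarize how two genes relate across three relation types."""
    def targets(gene, relation):
        return index.get((gene, relation), set())

    # Physical interaction is treated as undirected, so check both directions.
    interacts = (gene_2 in targets(gene_1, "interacts_with")
                 or gene_1 in targets(gene_2, "interacts_with"))
    return {
        "physical_interaction": interacts,
        "shared_cell_types": sorted(targets(gene_1, "expressed_in")
                                    & targets(gene_2, "expressed_in")),
        "shared_diseases": sorted(targets(gene_1, "associated_with")
                                  & targets(gene_2, "associated_with")),
    }


index = build_index(EDGES)
print(compare_genes(index, "GENE_A", "GENE_B"))
```

In this toy graph the two genes interact and share a cell type but no disease; a real resource answers the same kind of question over millions of edges.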
“This computational resource that we have developed is one of the first sources aimed at standardizing biological information and the only one that addresses such diversity and amount of data. It allows access, in an easy and consistent way, to practically all biological knowledge currently available, and has tremendous potential to accelerate biomedical research,” said Aloy.
Nearly 1,000 descriptors for 12 biological entities
The information in Bioteque has been organized into 12 types of biological entities, such as genes, diseases, tissues and cells. For each of these entities, the tool takes into account a series of descriptors or characteristics: for a gene, for example, its pattern of mutations, the profile of physical interactions of the resulting proteins, its expression in different cell types, or its relationship to different diseases. Across the 12 biological entities, the system covers about 1,000 types of descriptors.
“We worked with information from 150 different databases, so first we had to harmonize it, i.e. put it all in the same ‘language.’ And then we transformed that knowledge into numerical descriptors that could be interpreted by algorithms. That way we could exploit these networks and connections computationally,” said Adria Fernandez, first author of the article and a doctoral student in the same lab.
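The two steps described in the quote, putting records in a common "language" and then turning them into numbers, can be illustrated with a deliberately simplified sketch. The article does not specify the algorithm used, so this example swaps in plain binary vectors and cosine similarity as a stand-in; the synonym table, the two mini "databases" and every identifier in them are hypothetical.

```python
import math

# Hypothetical synonym table: source-specific identifiers -> one shared vocabulary.
SYNONYMS = {"geneA": "GENE_A", "GENE-A": "GENE_A", "geneB": "GENE_B", "gene_c": "GENE_C"}

# Hypothetical gene-disease records from two sources with different naming styles.
DB_1 = [("geneA", "DISEASE_1"), ("geneB", "DISEASE_1"), ("geneB", "DISEASE_2")]
DB_2 = [("GENE-A", "DISEASE_2"), ("gene_c", "DISEASE_3")]


def harmonize(*databases):
    """Step 1: merge all records after mapping gene names to the shared vocabulary."""
    return {(SYNONYMS[gene], disease) for db in databases for gene, disease in db}


def to_descriptors(pairs):
    """Step 2: one binary vector per gene, with one position per disease."""
    diseases = sorted({disease for _, disease in pairs})
    genes = sorted({gene for gene, _ in pairs})
    return {g: [1.0 if (g, d) in pairs else 0.0 for d in diseases] for g in genes}


def cosine(u, v):
    """Cosine similarity between two numerical descriptors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = to_descriptors(harmonize(DB_1, DB_2))
print(vectors["GENE_A"])                                       # [1.0, 1.0, 0.0]
print(round(cosine(vectors["GENE_A"], vectors["GENE_B"]), 6))  # 1.0
print(cosine(vectors["GENE_A"], vectors["GENE_C"]))            # 0.0
```

Only after harmonization do the two records for `GENE_A` land on the same entity, which is what lets its descriptor combine evidence from both sources.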
Bioteque will be expanded periodically with new databases as they are published. The tool, databases and algorithms are open access.