A distributed digital repository for data integration in biomedical research

Speaker(s): Massimiliano Izzo

  • Language: English
  • Event type: Conference talk
  • Date: Thursday 11 July 2013
  • Time: 17:00
  • Duration: 20 minutes
  • Location: K.4.201
Tracks: Cloud, Open Data
Target audience: General public, Professionals


Biomedical research is evolving into international, multidisciplinary collaborations that rely on increasing data sharing among institutions worldwide. As collaborations move to a global scale, the heterogeneity of the collected data grows and no single standard is plausible. We have designed and developed a digital repository with a flexible and extensible data model in order to manage data heterogeneity and encourage information integration when different formats or platforms are used. The repository consists of three components: a Java EE web application; a MySQL database that stores information about patients, samples and system management; and a data Grid storage layer for managing a large number of potentially huge files (as may be the case in neuroimaging and genomics).
The data model is built on two entities: processes (corresponding to research studies) and events. A patient can be involved in one or more processes. Each data type is uniquely associated with a specific event type. A set of sequential events can be grouped into a process, building up a hierarchical structure. A data type is described by a set of user-defined metadata that are stored as a JavaScript Object Notation (JSON) schema. A metadata body is composed of one or more metadata groups. Each group contains attributes (non-recursive fields) and/or loops (recursive fields made up of one or more attributes). Attributes are defined by an extensible set of properties.
A user-friendly graphical interface has been developed to allow data type definition without dealing directly with JSON schemas. The schemas are converted into web forms using dform, a jQuery plugin. When a new data item is inserted, one or more files may be associated with it and saved in the data Grid storage managed by the iRODS middleware. Metadata are saved both in the local database and on the data Grid as attribute-value-unit (AVU) triples.
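To make the metadata structure more concrete, the following is a minimal Python sketch of a metadata body made of groups, attributes and loops, flattened into attribute-value-unit triples. It is only an illustration: the repository itself is a Java EE application, and the field names (`groups`, `attributes`, `loops`, `items`), the sample values and the `to_avus` helper are all assumptions rather than the actual schema or code.

```python
from typing import NamedTuple


class AVU(NamedTuple):
    """An attribute-value-unit triple (values kept as strings)."""
    attribute: str
    value: str
    unit: str


# Hypothetical metadata body: one group holding two plain attributes and
# one loop, i.e. a repeatable set of attributes. All names are illustrative.
metadata_body = {
    "groups": [
        {
            "name": "acquisition",
            "attributes": [
                {"name": "scanner", "value": "MRI-3T", "unit": ""},
                {"name": "slice_thickness", "value": 1.5, "unit": "mm"},
            ],
            "loops": [
                {
                    "name": "sequence",
                    "items": [
                        [{"name": "echo_time", "value": 30, "unit": "ms"}],
                        [{"name": "echo_time", "value": 60, "unit": "ms"}],
                    ],
                }
            ],
        }
    ]
}


def to_avus(body):
    """Flatten a metadata body into a list of AVU triples (assumed mapping)."""
    avus = []
    for group in body.get("groups", []):
        # Plain attributes map to one triple each.
        for attr in group.get("attributes", []):
            avus.append(AVU(attr["name"], str(attr["value"]), attr.get("unit", "")))
        # Each repetition of a loop contributes one triple per attribute.
        for loop in group.get("loops", []):
            for item in loop.get("items", []):
                for attr in item:
                    avus.append(AVU(attr["name"], str(attr["value"]), attr.get("unit", "")))
    return avus


triples = to_avus(metadata_body)
```

In this sketch, repeated loop items simply produce several triples with the same attribute name; how the real system distinguishes loop repetitions is not stated in the abstract.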
A flexible search interface allows users to compose queries based on metadata attributes and run them both on the database and on the Grid. Additional operations may be required depending on the event type. For this purpose, a set of custom actions, modeled with the 'command' design pattern and each associated with a specific data type, has been implemented. Each action consists of three methods: 'check' (verifies that the requirements to save the data are met), 'execute' (performs the operations required when the data is saved) and 'recovery' (restores the previous conditions if something goes wrong while saving the data).
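The check/execute/recovery contract can be sketched as follows. This is a hedged Python illustration of the command pattern, not the authors' Java EE code: the `DataAction` base class, the `RegisterSampleAction` example, the `save_with_action` driver and the order in which `execute` and the save step run are all assumptions.

```python
from abc import ABC, abstractmethod


class DataAction(ABC):
    """Command-style action tied to a data type (hypothetical interface)."""

    @abstractmethod
    def check(self, data) -> bool:
        """Return True if the requirements to save the data are met."""

    @abstractmethod
    def execute(self, data) -> None:
        """Perform the extra operations required when the data is saved."""

    @abstractmethod
    def recovery(self, data) -> None:
        """Restore the previous conditions after a failed save."""


class RegisterSampleAction(DataAction):
    """Illustrative action: track a sample in an in-memory inventory."""

    def __init__(self, inventory):
        self.inventory = inventory

    def check(self, data):
        # Require an identifier that is not already registered.
        return "sample_id" in data and data["sample_id"] not in self.inventory

    def execute(self, data):
        self.inventory[data["sample_id"]] = data

    def recovery(self, data):
        # Undo the registration if it happened.
        self.inventory.pop(data.get("sample_id"), None)


def save_with_action(action, data, save):
    """Run check, then execute and save; call recovery if either step fails."""
    if not action.check(data):
        return False
    try:
        action.execute(data)
        save(data)  # 'save' stands in for the real persistence step
    except Exception:
        action.recovery(data)
        return False
    return True
```

A caller would pick the action registered for the data type being saved and pass its own persistence function as `save`; if that function raises, `recovery` removes whatever `execute` added.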


Massimiliano Izzo is a research assistant at the Giannina Gaslini Children's Hospital in Genoa, Italy, and a PhD student at the University of Genoa. He earned a bachelor’s degree in Biomedical Engineering in 2003 and a master’s degree in Bioengineering in 2006. His field of research is the development of web-based repositories with distributed storage and data grids for integration in biobanking, genomics and neuroscience. The work is a collaboration between the Gaslini Institute (Dr. Luigi Varesio) and the Faculty of Engineering (Prof. Marco M. Fato).

Part of this work has already been presented at the NETTAB2012 conference in Como, Italy. See here for the proceedings: http://journal.embnet.org/index.php...