
Context of the activities

Origin of the research

A noteworthy trend in biomedical research is the increase in the scale at which studies are performed. This trend results from the emergence of high-throughput experimental approaches and from the growing volume of data made available in public or shared databanks. A complementary issue lies in the heterogeneity of both information sources (multiple and distant databases with heterogeneous formats and interfaces, files in various formats, etc.) and data (multiple scales: from population to molecule; multiple natures: quantitative or qualitative; multiple modes: text or image; multiple structuring levels: database fields, mark-up structuring, free text).

There is a growing need for computerised assistance throughout the experimental process (from experiment design, through data representation and management, to information exploitation). This assistance must be aimed at domain experts who are average users of information technology with limited computer science literacy, which calls for a high level of abstraction. This project fits within this context and explores multidisciplinary approaches that facilitate the acquisition and exploitation of scientific data from a knowledge acquisition perspective.

Schematically, scientific inquiry in domains such as biology or medicine, where the process is empirical, is often based on a hypothetico-deductive method that includes three main steps: identification of a problem to study, formulation of hypotheses within the scope of this problem, and corroboration, which leads to accepting or rejecting the hypotheses. Corroboration is generally based on experiments designed to test the hypothesis. The experimental data are then analysed to support decision making regarding the tested hypothesis.

Technological advances over the last twenty years have led to a transformation of the classical experimental process. The experimental step relies increasingly on robots, which both raises the level of automation and allows mass processing of ever more miniaturised samples. Examples include Tissue MicroArrays, a technology I studied as part of my PhD thesis; Protein MicroArrays, one of the application domains of my post-doctoral work; and AlphaScreen technology combined with an automated workstation, on which I am collaborating with INSERM Unit 889.

This automation and sample miniaturisation affect the steps surrounding the experiment itself. Experiment design must now be conceived as the parallelisation or multiplexing of experiments as they were previously conducted, with additional technical issues related to sample size. Moreover, these techniques increase the cost of experiments in both time and money. Reusing datasets from other teams, or from previous experiments, in a new context is therefore becoming common practice. Computerised storage of these datasets and their availability on the Internet in public databanks increase the amount of data available to test a given hypothesis. Analysing these datasets becomes more difficult and requires dedicated approaches, which are increasingly computerised.

Activities

In this context, the scientific community faces a set of issues (experimentation assistance, data integration, data selection, data exploitation). The goal of my research project is to provide solutions to these problems while also easing the acceptance of these solutions by researchers in the biomedical domain. The objective is to explore innovative approaches that assist the whole experimental process and are intended for scientists with no computer science background.

The projects in which I have been or am still involved target these problems:

These activities have led to a number of publications and communications, which are listed here.