Information access remains a critical issue for scientists because of the ever-rising number of publications, the multiplication of open-access data repositories and the growing number of technologies that permit mass data acquisition. A typical example is Tissue MicroArray (TMA) technology, which is increasingly used in oncology research and allows hundreds of micro-samples to be processed on a single histological slide. However, this kind of technology poses two problems:
In particular, the second issue, data exploitation, is becoming a classic topic in the experimental sciences, where the cost in time and materials of each experiment is exploding and where reusing data generated during previous experiments, or by other teams, in a new context is becoming common.
Reusing data, however, confronts researchers with a real problem of grasping the data set, because the data were produced by other teams or were acquired outside the context of validating a particular scientific hypothesis in a precisely delimited experimental area. Yet this preliminary understanding of the data set is a mandatory stage before any more advanced exploitation. Data mining tools have to be directed, and this cannot be done without a minimum knowledge of the data space. Likewise, the data set can be used to pursue studies following a more classical experimental approach, by validating hypotheses on an extract of the data set. The researcher then has to check whether the available information is sufficient to validate a given hypothesis. This, too, requires a preliminary grasp of the data.
This grasp of the data, in the perspective considered in my work, requires solving a set of complex problems:
Given the complexity of these problems, there is an increasing need for computerised assistance to help researchers solve them. The proposed solution is a notion of synthesis, which federates the activities that underlie the data grasping problem: information retrieval and extraction, aggregation, organisation and presentation of the data. Inspired by Information Retrieval principles, this synthesis is based on a model intermediate between classical Information Retrieval and an information behaviour point of view. By defining a task-oriented Information Retrieval, this model gives a central role to the goal of the mining or to the hypothesis to be tested.
In my thesis, the model underlying this synthesis concept allows information synthesis to be operationalised through a prototype. The prototype has been validated by case studies and a user study. It opens interesting prospects for extending the model or applying it to other domains.
The system has been illustrated in the medical field, and in particular in the field of Tissue Microarray technology. Tissue Microarray (TMA) technology is a new technique which is already frequently used in oncology research. Along with global molecular studies, it allows for quick in situ visualisation of molecular targets (DNA or RNA sequences, or proteins) in thousands of tissue samples.