Given the huge amount of unstructured data in bibliographic databases, and the development of open knowledge bases, accessing the knowledge they contain requires a global view of multiple heterogeneous sources of information. To this end, the MIAM project aims to propose methods that rely on Natural Language Processing and text mining, as well as on knowledge representation and modeling, in order to aggregate data and knowledge drawn from knowledge bases, Linked Open Data, scientific articles reporting research results, etc. To evaluate its results in a real use case, the MIAM project focuses on interactions between drugs and food that might lead to an adverse drug effect. Such information is currently fragmented and scattered over heterogeneous sources. Aggregating it will help to formalize and visualize the description of these interactions, and thus to avoid such adverse effects.

Overall objectives

With the accumulation of knowledge in bibliographic databases, an increasing amount of unstructured data, and the development of open knowledge bases (KBs), professionals in specialized domains, as well as lay people, struggle to access, assess and visualize the knowledge held in huge amounts of data in a reasonable time. Moreover, while knowledge artifacts such as ontologies, terminologies and KBs aim to record the knowledge of a given domain, they generally focus on specific types of information. In addition, neither the evolution nor the certainty of knowledge is recorded in such bases. Finding connections between KBs and the knowledge contained in unstructured data is therefore crucial for obtaining a more global and comprehensive view of the links between complementary pieces of knowledge. The Linked Open Data initiative is a first step towards resolving this issue. However, it focuses on providing links between data, without trying to relate the knowledge they constitute at a higher level. Accessing and merging knowledge from these heterogeneous sources thus requires sophisticated approaches from the Natural Language Processing (NLP) and text mining (TM) communities, as well as from knowledge representation and modeling. The MIAM project aims to propose approaches for transforming data (unstructured data, bibliographic databases or KBs) into knowledge. As an illustration, the knowledge extracted from scientific literature, after analysis and semantic processing, must be aggregated and compared to existing KBs in order to assess its relevance and to enrich these KBs. The project therefore tackles specific issues: the creation of big valuable data, large-scale knowledge extraction, semantic interpretation and modeling of extracted data, integration of heterogeneous and multi-source data, and the use of Linked Open Data.

The use case of the project focuses on the following interactions, which might exist between drugs and food and lead to adverse drug events (ADEs): (1) decrease or suppression of a drug effect due to food; (2) increase of a drug effect; (3) occurrence of new ADEs not yet known for a drug. Prescribed medicines depend on an initial marketing authorization intended to guarantee the safety of patients. Nevertheless, medicines can cause ADEs, which are sometimes discovered during clinical trials but more usually later, in a pharmacovigilance context, once the drugs are administered to patients. Food may interact with drugs, and these interactions can have harmful consequences for the patient’s health and well-being. Yet they are less well known and less studied: DrugBank records textual information about food/drug interactions for less than 10% of the drugs, mainly on the optimal drug intake time.

Thus, information on food/drug interactions and related ADEs is currently fragmented and scattered over heterogeneous sources. Moreover, this information is mainly available in English, while relevant knowledge can also be found in textual data written in other languages. Finally, unstructured data, such as scientific literature, provide another source of information that is under-used with respect to the objective of the MIAM project. Given these observations, the objective of the MIAM project is to use and mine bibliographical data and existing KBs in order to formalize the description of interactions that exist between drugs and food and may lead to an ADE. Several KBs concerning the studied elements (i.e., diseases, drugs and food) are currently available and can thus be exploited, although the information they record is fragmented and scattered. Aggregating this information will help to formalize and visualize the description of food/drug interactions. The MIAM project goes beyond current Linked Data projects by proposing approaches for mining unstructured data, and also for aggregating the available data and presenting it to both healthcare professionals and patients, to improve their knowledge of food/drug interactions.