Research
Research activities of the LaHDAK team are organized along three axes:
- Data and knowledge integration
- Automated Reasoning
- Web data management
Data and knowledge integration
Scientific coordinator: F. Saïs;
Permanent members: N. Pernelle, C. Reynaud, B. Safar, F. Saïs
Key words: data linking, knowledge extraction, ontology alignment, semantic annotation, ontology enrichment
We work on ontology-based approaches that aim to facilitate data integration.
Evaluating the quality of the results obtained by such approaches remains a challenging issue. We develop models and methods to evaluate the quality of declared (or automatically generated) identity links between data items. We also investigate how linking rules can be discovered and exploited when datasets contain erroneous data, or when data are poorly described. Much of this work is done in the setting of the ANR project Qualinca, in collaboration with the Graphik group of the LIRMM (Montpellier) and the HADAS group of the LIG (Grenoble).
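As a rough illustration of what evaluating identity links can mean in practice, the sketch below scores a set of declared links against a reference set using precision and recall. This is only one common way to measure link quality, not necessarily the models developed by the team; the function name `link_quality`, the identifiers such as `a:1`, and the sample links are all hypothetical.

```python
def link_quality(declared, reference):
    """Return (precision, recall) of declared identity links.

    Links are treated as unordered pairs, so ("a:1", "b:7") and
    ("b:7", "a:1") count as the same link. The reference set is an
    assumed gold standard, which is rarely complete in practice.
    """
    declared_set = {frozenset(link) for link in declared}
    reference_set = {frozenset(link) for link in reference}
    correct = declared_set & reference_set
    precision = len(correct) / len(declared_set) if declared_set else 0.0
    recall = len(correct) / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical data: three declared links, three reference links.
declared_links = [("a:1", "b:7"), ("a:2", "b:9"), ("b:4", "a:3")]
reference_links = [("a:1", "b:7"), ("a:3", "b:4"), ("a:5", "b:2")]

precision, recall = link_quality(declared_links, reference_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the three declared links appear in the reference set, so both measures are 2/3 here. With erroneous or poorly described data, the reference set itself may be unreliable, which is part of what makes this evaluation difficult.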
We also work on the problem of maintaining semantic correspondences between heterogeneous ontologies, in collaboration with the research center Henri Tudor (Luxembourg). The aim is to define appropriate adaptation strategies to apply to existing mappings so that they remain valid over time. Continuing our earlier work on semantic annotation, we investigate approaches that exploit semantically rich ontologies to enrich or refine the annotations proposed by existing tools.
A crucial point is that we are interested in developing data integration methods for real-world applications. We therefore work with data provided by industrial partners or institutions such as ABES (Agence Bibliographique de l'Enseignement Supérieur), INA (Institut National de l’Audiovisuel) or the startup WEPINGO. In addition, we have obtained a grant to promote the technologies developed in this activity in the setting of the IASI-Tools project. This project aims to develop a framework to integrate these tools and make them accessible.
Automated Reasoning
Scientific coordinator: P. Dague;
Permanent members: P. Chatalic, P. Dague, Yue Ma, N. Pernelle
Associate members: M. Bienvenu, F. Goasdoué;
Key words: Reasoning in description logic, reasoning in propositional logic, model-based reasoning, diagnosis, diagnosability, distributed reasoning
Within this activity, we work on automated reasoning in propositional logic and description logic.
In the propositional setting, our work concerns diagnosis and diagnosability, with an emphasis on distributed systems:
- from diagnosability to predictability analysis of distributed discrete-event systems;
- joint and complementary analysis of diagnosability and testability in distributed and concurrent systems, with emphasis on compositionality properties (STIC AmSud project with the VALS group);
- use of (parallel) SAT solvers for diagnosis and diagnosability analysis of (distributed) systems after propositionalization (co-supervised thesis with LaBRI);
- advanced methods in propositional logic (BDD, SAT solvers), together with abductive, explanation-based and qualitative reasoning, applied to pathway analysis in metabolic networks (co-supervised thesis with the BioInfo group and LaBRI).
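To give a concrete feel for diagnosis in a propositional setting, here is a minimal sketch of consistency-based diagnosis on a toy system of two inverters in series. It is an illustration under strong assumptions, not the team's method: the system, the component names `inv1` and `inv2`, and the weak fault model (a faulty component may output anything) are all hypothetical, and brute-force enumeration stands in for the SAT solver that a real encoding would use.

```python
from itertools import combinations

# Hypothetical system: in -> inv1 -> mid -> inv2 -> out
COMPONENTS = ("inv1", "inv2")


def consistent(faulty, observation):
    """Check whether the observation is possible if only `faulty` may misbehave.

    A healthy inverter negates its input; a faulty one is unconstrained.
    The internal wire `mid` is unobserved, so both values are tried.
    """
    for mid in (False, True):
        ok1 = "inv1" in faulty or mid == (not observation["in"])
        ok2 = "inv2" in faulty or observation["out"] == (not mid)
        if ok1 and ok2:
            return True
    return False


def minimal_diagnoses(observation):
    """Enumerate subset-minimal sets of faulty components, smallest first."""
    found = []
    for size in range(len(COMPONENTS) + 1):
        for candidate in combinations(COMPONENTS, size):
            candidate = set(candidate)
            # Skip supersets of diagnoses already found: they are not minimal.
            if any(diagnosis <= candidate for diagnosis in found):
                continue
            if consistent(candidate, observation):
                found.append(candidate)
    return found


# Two healthy inverters would map in=True to out=True, so this is abnormal.
faulty_obs = {"in": True, "out": False}
print(minimal_diagnoses(faulty_obs))
```

For the abnormal observation, either inverter alone explains the fault, so there are two minimal diagnoses. Enumeration is exponential in the number of components, which is one reason SAT solvers and compositional techniques matter for larger or distributed systems.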
Our research on description logics is mainly centered on the problem of ontology-based data access and seeks to address two important challenges: scalability of query answering algorithms and robustness to inconsistencies. Specifically, we intend to develop novel querying algorithms which demonstrate improved scalability and applicability, and to perform detailed complexity analyses to explain what makes reasoning hard (or easy) and to help guide the selection of algorithms for particular applications. Much of the work is carried out within the ANR JCJC project PAGODA and involves collaborations with other project participants (LIRMM, LIG, IRISA) as well as researchers from foreign universities (Univ. of Bremen, Univ. of Liverpool, Univ. of Rome La Sapienza, Technical Univ. of Vienna).
Web data management
Scientific coordinator: B. Cautis;
Permanent members: N. Bidoit, B. Cautis, B. Groz, S. Maniu, F. Saïs, E. Waller;
Associate members: D. Colazzo, F. Goasdoué
Key words: Uncertain data, information extraction, semi-structured data, crowdsourcing, social networking, query optimization, materialized views, top-k, personalisation, updates and dynamic constraints
In this axis, we work in several key areas.
An important part of Web content and applications is created and exploited in a social fashion. Users’ needs for truly personalized and socially aware search require richer data models, for instance models that capture semantic annotations over semi-structured or unstructured data, or that account for multiple important facets such as recency, geolocation, social context and textual relevance. We also intend to focus on discovering and continuously refining user profiles in user-centric applications (including crowdsourcing). Profiles are a cornerstone of successful applications, as they help better personalize the content provided to users (collaborations with Xerox RCE, LIG, and industrial partners such as Skyrock).
Semi-structured data is currently experiencing a comeback which cuts across many applications and is strongly connected to the “NoSQL” and “Big Data” trends. In such applications, data is heterogeneous and complex-structured, and may contain errors and missing information, and thus may require probabilistic models. Classic database problems such as query optimization and query answering using views must be revisited in this context. Our study will be broadened to richer models for probabilistic semi-structured data.
Concerning Semantic Web data, a historic gap of expressive power and efficiency persists between database-style query evaluation (as commonly supported by DBMSs) and query answering (as considered in the KM and AI communities). The problem of efficiently answering (as opposed to merely evaluating) queries on Semantic Web data is still open, and we work in this area. Other advanced languages and algorithms for semantically rich semi-structured data will be investigated as part of the joint Inria-UC San Diego team OakSaD (notably A. Deutsch).
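The difference between evaluating and answering a query can be shown on a tiny example. The sketch below assumes a hypothetical set of triples and subclass constraints; evaluation only looks at explicit facts, while answering also returns what the constraints entail. Saturating the data first is one well-known way to obtain the entailed answers and is used here purely for illustration; all names (`FACTS`, `SUBCLASS_OF`, `evaluate`, `saturate`, `answer`) are invented for the example.

```python
# Hypothetical explicit facts, as (subject, property, object) triples.
FACTS = {
    ("alice", "type", "Professor"),
    ("bob", "type", "Student"),
}

# Hypothetical subclass constraints, as (subclass, superclass) pairs.
SUBCLASS_OF = {
    ("Professor", "Researcher"),
    ("Researcher", "Person"),
    ("Student", "Person"),
}


def evaluate(facts, cls):
    """Database-style evaluation: instances of `cls` among explicit facts only."""
    return {s for (s, p, o) in facts if p == "type" and o == cls}


def saturate(facts, subclass_of):
    """Add every type fact entailed by the subclass constraints (fixpoint)."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closed):
            if p != "type":
                continue
            for (sub, sup) in subclass_of:
                if o == sub and (s, "type", sup) not in closed:
                    closed.add((s, "type", sup))
                    changed = True
    return closed


def answer(facts, subclass_of, cls):
    """Query answering: evaluation over the saturated data."""
    return evaluate(saturate(facts, subclass_of), cls)


print(evaluate(FACTS, "Person"))             # no explicit Person facts
print(answer(FACTS, SUBCLASS_OF, "Person"))  # alice and bob are entailed
```

Saturation trades storage and maintenance cost for simple evaluation at query time; rewriting the query instead of the data is the usual alternative, and the efficiency of either choice is part of the open problem described above.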