Query Optimization for Ontology-Mediated Query Answering by Wafaa El Husseini, Cheikh-Brahim El Vaigh, François Goasdoué and Hélène Jaudoin. The ACM Web Conference (WWW), 2024.
Ontology-mediated query answering (OMQA) consists in asking database queries on knowledge bases (KBs); a KB is a set of facts called the KB's database, which is described by domain knowledge called the KB's ontology. A widely-investigated OMQA technique is FO-rewriting: every query asked on a KB is reformulated w.r.t. the KB's ontology, so that its answers are computed by the relational evaluation of the query reformulation on the KB's database. Crucially, because FO-rewriting compiles the domain knowledge relevant to queries into their reformulations, query reformulations may be complex and their optimization is the crux of efficiency. We devise a novel optimization framework for a large set of OMQA settings that enjoy FO-rewriting: conjunctive queries, i.e., the core select-project-join queries, asked on KBs expressed in datalog+/- and existential rules, description logic and OWL, or RDF/S. We optimize the query reformulations produced by state-of-the-art FO-rewriting algorithms by computing rapidly, with the help of a KB's database summary, simpler (contained) queries with the same answers that can be evaluated faster by RDBMSs. We show on a well-established OMQA benchmark that time performance is significantly improved by our optimization framework in general, by up to three orders of magnitude.
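The gist of FO-rewriting can be illustrated with a minimal sketch (all names and data below are hypothetical, and this is not the paper's algorithm): a query over a class is reformulated into a union over that class and its subclasses, so that evaluating the reformulation on the explicit facts alone yields the complete answers.

```python
# Minimal FO-rewriting sketch (illustrative only): a query atom over a
# class is reformulated into the union of queries over all its subclasses,
# so evaluating the reformulation on the explicit facts returns complete
# answers without any reasoning at query-evaluation time.

# Hypothetical ontology: subclass axioms (sub, super)
ONTOLOGY = {("Professor", "Faculty"), ("Lecturer", "Faculty")}

# Hypothetical database: explicit class-membership facts (individual, class)
DATABASE = {("alice", "Professor"), ("bob", "Lecturer"), ("carol", "Faculty")}

def reformulate(cls):
    """Return cls together with all its (transitive) subclasses."""
    result, frontier = {cls}, {cls}
    while frontier:
        frontier = {sub for (sub, sup) in ONTOLOGY if sup in frontier} - result
        result |= frontier
    return result

def answer(cls):
    """Evaluate the reformulated query on the explicit facts only."""
    classes = reformulate(cls)
    return {ind for (ind, c) in DATABASE if c in classes}

print(sorted(answer("Faculty")))  # → ['alice', 'bob', 'carol']
```

Run over an RDBMS, the reformulation would be a SQL union over the three classes; the paper's contribution is to prune such unions, using a database summary, down to contained queries with the same answers.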
OptiRef: Query Optimization for Knowledge Bases by Wafaa El Husseini, Cheikh-Brahim El Vaigh, François Goasdoué and Hélène Jaudoin. Demonstration paper. ACM The Web Conference (WWW), 2023.
Ontology-mediated query answering (OMQA) consists in asking database queries on a knowledge base (KB); a KB is a set of facts, the KB's database, described by domain knowledge, the KB's ontology. FOL-rewritability is the main OMQA technique: it reformulates a query w.r.t. the KB's ontology so that the evaluation of the reformulated query on the KB's database computes the correct answers. However, because this technique embeds the domain knowledge relevant to the query into the reformulated query, a reformulated query may be complex and its optimization is the crux of efficiency. We showcase OptiRef that implements a novel, general optimization framework for efficient query answering on datalog+/-, description logic, existential rules, OWL and RDF/S KBs. OptiRef optimizes reformulated queries by rapidly computing, based on a KB's database summary, simpler (contained) queries with the same answers. We demonstrate OptiRef's effectiveness on well-established benchmarks: performance is significantly improved in general, up to several orders of magnitude in the best cases!
Efficient data management in knowledge bases (in French) by Wafaa El Husseini, Cheikh Brahim El Vaigh, François Goasdoué, Hélène Jaudoin. BDA "Gestion de Données – Principes, Technologies et Applications", 2022.
OptiRef: optimization for data management in knowledge bases (in French) by Wafaa El Husseini, Cheikh Brahim El Vaigh, François Goasdoué, Hélène Jaudoin. BDA "Gestion de Données – Principes, Technologies et Applications", 2022.
Towards Speeding Up Graph-Relational Queries in RDBMSs by Angelos Christos Anadiotis, François Goasdoué, Mhd Yamen Haddad, Ioana Manolescu. BDA "Gestion de Données – Principes, Technologies et Applications", 2022.
Towards Faster Reformulation-based Query Answering on RDF graphs with RDFS ontologies by Maxime Buron, Cheikh-Brahim El Vaigh and François Goasdoué. International Semantic Web Conference (ISWC), 2021.
Answering queries on RDF knowledge bases is a crucial data management task, usually performed through either graph saturation or query reformulation. In this short paper, we optimize our recent state-of-the-art query reformulation technique for RDF graphs with RDFS ontologies, and we report on preliminary encouraging experiments showing performance improvement by up to two orders of magnitude!
Ontology-mediated query answering: performance challenges and advances by Wafaa El Husseini, Cheikh-Brahim El Vaigh, François Goasdoué and Hélène Jaudoin. Demonstration paper. International Semantic Web Conference (ISWC), 2021.
Ontology-mediated query answering (OMQA) is a recent data management trend in the Artificial Intelligence, Database and Semantic Web areas, which aims at answering database queries on knowledge bases. Because it is an intricate combination of automated reasoning and database query evaluation, it raises major performance challenges. In this demonstration, we showcase a decade of OMQA optimization to understand "Where do we stand now and how did we get there?" and we highlight a promising new OMQA optimization that brings further significant performance improvement to discuss "What's next?".
A Well-founded Graph-based Summarization Framework for Description Logics by Cheikh-Brahim El Vaigh and François Goasdoué. International workshop on Description Logics (DL), 2021.
The quotient operation from graph theory offers an elegant graph summarization framework that has been widely investigated in the literature, notably for the exploration and efficient management of large graphs; it consists in fusing equivalent vertices according to an equivalence relation. In this paper, we study whether a similar operation may be used to summarize description logic (DL) databases, i.e., ABoxes. Towards this goal, we define and examine the quotient operation on an ABox: we establish that a quotient ABox is more specific than the ABox it summarizes, and characterize to which extent it is more specific. This preliminary investigation validates the interest of a quotient-based ABox summarization framework, and paves the way for further studies on it in the DL setting, e.g., to devise equivalence relations suited to the optimization of typical DL data management and reasoning tasks on large ABoxes or to the visualization of large ABoxes, and on its utilization in related settings, e.g., Semantic Web.
Special issue on Data Management - Principles, Technologies and Applications (BDA'20) - of Transactions on Large-Scale Data and Knowledge-Centered Systems (TLDKS journal) by Bernd Amann and François Goasdoué (guest editors), Springer, 2021.
RDF graph summarization for first-sight structure discovery by François Goasdoué, Pawel Guzewicz and Ioana Manolescu. VLDB Journal (VLDBJ), Volume 29, Number 5, pages 1191-1218, 2020.
To help users get familiar with large RDF graphs, RDF summarization techniques can be used. In this work, we study quotient summaries of RDF graphs, that is: graph summaries derived from a notion of equivalence among RDF graph nodes. We make the following contributions: (i) four novel summaries which are often small and easy-to-comprehend, in the style of E-R diagrams; (ii) efficient (amortized linear-time) algorithms for computing these summaries either from scratch, or incrementally, reflecting additions to the graph; (iii) the first formal study of the interplay between RDF graph saturation in the presence of an RDFS ontology, and summarization; we provide a sufficient condition for a highly efficient shortcut method to build the quotient summary of a graph without saturating it; (iv) formal results establishing the shortcut conditions for some of our summaries and others from the literature; (v) experimental validations of our claim within a tool available online.
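The quotient idea underlying these summaries can be sketched as follows (a toy equivalence relation, not one of the paper's four summaries): nodes with the same set of outgoing property labels are deemed equivalent and fused into a single summary node.

```python
# Quotient-summary sketch (toy equivalence relation, not the paper's):
# fuse all nodes having the same set of outgoing property labels, then
# rewrite every triple over the fused nodes.
from collections import defaultdict

# Toy RDF-like graph: (subject, property, object) triples
TRIPLES = {
    ("a1", "name", "n1"), ("a1", "email", "e1"),
    ("a2", "name", "n2"), ("a2", "email", "e2"),
    ("p1", "title", "t1"),
}

def summarize(triples):
    out_labels = defaultdict(frozenset)
    for s, p, _ in triples:
        out_labels[s] |= {p}

    # A node's equivalence class is its outgoing-label set; leaf nodes
    # all share the empty set, hence are fused together.
    def rep(n):
        return out_labels[n]

    return {(rep(s), p, rep(o)) for (s, p, o) in triples}

summary = summarize(TRIPLES)
print(len(summary))  # a1 and a2 are fused: 5 input triples collapse to 3
```

The summary is itself a graph over equivalence classes, typically far smaller than the input; the paper's amortized linear-time algorithms maintain such summaries incrementally under additions.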
Ontology-Based RDF Integration of Heterogeneous Data by Maxime Buron, François Goasdoué, Ioana Manolescu and Marie-Laure Mugnier. International Conference on Extending Database Technology (EDBT), 2020.
The proliferation of heterogeneous data sources in many application contexts brings an urgent need for expressive and efficient data integration mechanisms. There are strong advantages to using RDF graphs as the integration format: being schemaless, they allow for flexible integration of data from heterogeneous sources; RDF graphs can be interpreted with the help of an ontology, describing application semantics; last but not least, RDF enables joint querying of the data and the ontology. To address this need, we formalize RDF Integration Systems (RIS), Ontology-Based Data Access mediators that go beyond the state of the art in the ability to expose, integrate and flexibly query data from heterogeneous sources through GLAV (global-local-as-view) mappings. We devise several query answering strategies, based on an innovative integration of LAV view-based rewriting and a form of mapping saturation. Our experiments show that one of these strategies brings strong performance advantages, resulting from a balanced use of mapping saturation and query reformulation.
Obi-Wan: Ontology-Based RDF Integration of Heterogeneous Data by Maxime Buron, François Goasdoué, Ioana Manolescu and Marie-Laure Mugnier. Demonstration paper. Proceedings of the Very Large Data Bases Endowment (PVLDB), 2020.
The proliferation of digital data sources in many domains brings a new urgency to the need for tools that allow flexible querying of heterogeneous data (relational, JSON, key-value, graph, etc.). Traditional data integration systems fall into two classes: data warehousing, where all data source content is materialized in a single repository, and mediation, where data remains in its original stores and can be queried through a mediator. We propose to demonstrate Obi-Wan, a novel mediator following the Ontology-Based Data Access (OBDA) paradigm. Obi-Wan integrates data sources of many data models under an interface based on RDF graphs and ontologies (classes, properties, and relations between them). The novelty of Obi-Wan is to combine maximum integration power (GLAV mappings, see below) with the highest query answering power supported by an RDF mediator: RDF queries not only over the data but also over the integration ontologies. This makes it more flexible and powerful than comparable systems.
A Novel Path-based Entity Relatedness Measure for Efficient Collective Entity Linking by Cheikh Brahim El Vaigh, François Goasdoué, Guillaume Gravier and Pascale Sébillot. International Semantic Web Conference (ISWC), 2020.
Collective entity linking is a core natural language processing task, which consists in jointly identifying the entities of a knowledge base (KB) that are mentioned in a text, exploiting existing relations between entities within the KB. State-of-the-art methods typically combine local scores accounting for the similarity between mentions and entities, with a global score measuring the coherence of the set of selected entities. The latter relies on the structure of a KB: the hyperlink graph of Wikipedia in most cases or the graph of an RDF KB, e.g., BaseKB or Yago, to benefit from the precise semantics of relationships between entities. In this paper, we devise a novel RDF-based entity relatedness measure for global scores with important properties: (i) it has a clear semantics, (ii) it can be calculated at reasonable computational cost, and (iii) it accounts for the transitive aspects of entity relatedness through existing (bounded length) property paths between entities in an RDF KB. Further, we experimentally show on the TAC-KBP2017 dataset, both with BaseKB and Yago, that it provides significant improvement over state-of-the-art entity relatedness measures for the collective entity linking task.
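A simple bounded-length path relatedness score in this spirit can be sketched as follows (the graph, the discount factor and the measure itself are hypothetical illustrations, not the paper's measure): pairs of entities connected by many short property paths score higher than pairs connected only by long ones.

```python
# Bounded-length path relatedness sketch (hypothetical measure): score a
# pair of entities by the number of directed paths of length at most k
# between them in the KB graph, geometrically discounting longer paths.

# Toy KB graph: directed edges between entities
EDGES = {("e1", "e2"), ("e2", "e3"), ("e1", "e3")}

def relatedness(src, dst, k=3):
    score, frontier = 0.0, {src: 1}
    for length in range(1, k + 1):
        nxt = {}
        for node, count in frontier.items():
            for (a, b) in EDGES:
                if a == node:
                    nxt[b] = nxt.get(b, 0) + count
        score += nxt.get(dst, 0) / (2 ** length)  # discount longer paths
        frontier = nxt
    return score

print(relatedness("e1", "e3"))  # direct edge plus the path via e2 → 0.75
```

In a collective linking setting, such pairwise scores feed the global coherence term that is maximized jointly over all candidate entities.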
Revisiting RDF storage layouts for efficient query answering by Maxime Buron, François Goasdoué, Ioana Manolescu and Marie-Laure Mugnier. International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS), in conjunction with International Semantic Web Conference (ISWC), 2020.
The performance of query answering in an RDF database strongly depends on the data layout, that is, the way data is split in persistent data structures. We consider answering Basic Graph Pattern Queries (BGPQs), and in particular those with variables (also) in class and property positions, in the presence of RDFS ontologies, both through data saturation and query reformulation. We show that such demanding queries often lead to inefficient query answering on two popular storage layouts, so-called T and CP. We present novel query answering algorithms on the TCP layout, which combines T and CP. In exchange for occupying more storage space, e.g., on an inexpensive disk, TCP avoids the bad or even catastrophic performance that T and/or CP sometimes exhibit. We also introduce summary-based pruning, a novel technique based on existing RDF quotient summaries, which improves query answering performance on the T, CP and the more robust TCP layouts.
Summarizing Semantic Graphs: A Survey by Sejla Cebiric, François Goasdoué, Haridimos Kondylakis, Dimitris Kotzinos, Ioana Manolescu, Georgia Troullinou and Mussab Zneika. VLDB Journal (VLDBJ), Volume 28, Number 3, pages 295-327, 2019 (accepted in 2018).
The explosion in the amount of available RDF data has led to the need to explore, query and understand such data sources. Due to the complex structure of RDF graphs and their heterogeneity, the exploration and understanding tasks are significantly harder than in relational databases, where the schema can serve as a first step toward understanding the structure. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single concept of RDF summary, and not a single but many approaches to build such summaries; each is better suited for some uses, and each presents specific challenges with respect to its construction. This is the first comprehensive survey of summarization methods for semantic RDF graphs. We propose a taxonomy of existing works in this area, including also some closely related works developed prior to the adoption of RDF in the data management community; we present the concepts at the core of each approach and outline their main technical aspects and implementation. We hope the survey will help readers understand this scientifically rich area, and identify the most pertinent summarization methods for a variety of usage scenarios.
Computing and Explaining Query Answers over Inconsistent DL-Lite Knowledge Bases by Meghyn Bienvenu, Camille Bourgaux and François Goasdoué. Journal of Artificial Intelligence Research (JAIR), Volume 64, pages 563-644, 2019 (accepted in 2018).
Several inconsistency-tolerant semantics have been introduced for querying inconsistent description logic knowledge bases. The first contribution of this paper is a practical approach for computing the query answers under three well-known such semantics, namely the AR, IAR and brave semantics, in the lightweight description logic DL-liteR. We show that query answering under the intractable AR semantics can be performed efficiently by using IAR and brave semantics as tractable approximations and encoding the AR entailment problem as a propositional satisfiability (SAT) problem. The second issue tackled in this work is explaining why a tuple is a (non-)answer to a query under these semantics. We define explanations for positive and negative answers under the brave, AR and IAR semantics. We then study the computational properties of explanations in DL-liteR. For each type of explanation, we analyze the data complexity of recognizing (preferred) explanations and deciding if a given assertion is relevant or necessary. We establish tight connections between intractable explanation problems and variants of SAT, enabling us to generate explanations by exploiting solvers for Boolean satisfaction and optimization problems. Finally, we empirically study the efficiency of our query answering and explanation framework using a benchmark we built upon the well-established LUBM benchmark.
Reformulation-based query answering for RDF graphs with RDFS ontologies by Maxime Buron, François Goasdoué, Ioana Manolescu and Marie-Laure Mugnier. Extended Semantic Web Conference (ESWC), 2019. Also selected for presentation at Bases de Données Avancées (BDA), in the "Recently published papers" track, 2019.
Query answering in RDF knowledge bases has traditionally been performed either through graph saturation, i.e., adding all implicit triples to the graph, or through query reformulation, i.e., modifying the query to look for the explicit triples entailing precisely what the original query asks for. The most expressive fragment of RDF for which reformulation-based query answering exists is the so-called database fragment, in which implicit triples are restricted to those entailed using an RDFS ontology. Within this fragment, query answering was so far limited to the interrogation of data triples (non-RDFS ones); however, a powerful feature specific to RDF is the ability to query data and schema triples together. In this paper, we address the general query answering problem by reducing it, through a pre-query reformulation step, to that solved by a query reformulation technique from the literature. We also report on experiments demonstrating the low cost of our reformulation algorithm.
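The graph-saturation alternative mentioned in this abstract can be sketched with two classic RDFS entailment rules, rdfs9 (type propagation along subclasses) and rdfs11 (subclass transitivity); this toy fixpoint computation is an illustration of saturation, not the paper's reformulation technique.

```python
# RDFS saturation sketch (illustrative only): repeatedly apply two
# standard entailment rules until no new triple is entailed, so that
# plain query evaluation on the saturated graph yields complete answers.

GRAPH = {
    ("Student", "rdfs:subClassOf", "Person"),
    ("alice", "rdf:type", "Student"),
}

def saturate(graph):
    g = set(graph)
    while True:
        entailed = set()
        for (s, p, o) in g:
            if p == "rdf:type":
                # rdfs9: x type C and C subClassOf D entail x type D
                entailed |= {(s, "rdf:type", d)
                             for (c, q, d) in g
                             if q == "rdfs:subClassOf" and c == o}
            elif p == "rdfs:subClassOf":
                # rdfs11: subClassOf is transitive
                entailed |= {(s, "rdfs:subClassOf", d)
                             for (c, q, d) in g
                             if q == "rdfs:subClassOf" and c == o}
        if entailed <= g:  # fixpoint reached
            return g
        g |= entailed

sat = saturate(GRAPH)
print(("alice", "rdf:type", "Person") in sat)  # True: implicit triple added
```

Reformulation avoids materializing such implicit triples by instead expanding the query; the paper extends this to queries over data and schema triples together.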
Incremental structural summarization of RDF graphs by François Goasdoué, Pawel Guzewicz and Ioana Manolescu. Demonstration paper. International Conference on Extending Database Technology (EDBT), 2019.
Realizing the full potential of Linked Open Data sharing and reuse is currently limited by the difficulty users have when trying to understand the data modeled within an RDF graph, in order to determine whether or not it may be useful for their need. We propose to demonstrate our RDFQuotient tool, which builds compact summaries of heterogeneous RDF graphs for the purpose of first-sight visualizations. An RDFQuotient summary provides an overview of the complete structure of an RDF graph, while being typically many orders of magnitude smaller, thus can be easily grasped by new users. Our summarization algorithms run in time linear in the size of the input graph and are incremental: they update a summary upon addition of new data. For the demo, we plan to show the visualizations of our summaries obtained from well-known synthetic and real data sets. Further, attendees will be able to add data to the summarized RDF graphs and visually witness the incurred changes.
A Linked Data Model for Facts, Statements and Beliefs by Ludivine Duroyon, François Goasdoué and Ioana Manolescu. International Workshop on Misinformation, Computational Fact-Checking and Credible Web, in conjunction with The Web Conference (WWW), 2019.
A frequent journalistic fact-checking scenario is concerned with the analysis of statements made by individuals, whether in public or in private contexts, and the propagation of information and hearsay ("who said/knew what when"). Inspired by our collaboration with fact-checking journalists from Le Monde, France's leading newspaper, we describe here a Linked Data (RDF) model, endowed with formal foundations and semantics, for describing facts, statements, and beliefs. Our model combines temporal and belief dimensions to trace propagation of knowledge between agents along time, and can answer a large variety of interesting questions through RDF query evaluation. A preliminary feasibility study of our model incarnated in a corpus of tweets demonstrates its practical interest.
BeLink: Querying Networks of Facts, Statements and Beliefs by Tien-Duc Cao, Ludivine Duroyon, François Goasdoué, Ioana Manolescu and Xavier Tannier. Demonstration paper. ACM Conference on Information and Knowledge Management (CIKM), 2019. Also selected for demonstration at Bases de Données Avancées (BDA), 2019.
An important class of journalistic fact-checking scenarios involves verifying the claims and knowledge of different actors at different moments in time. Claims may be about facts, or about other claims, leading to chains of hearsay. We have recently proposed a data model for (time-anchored) facts, statements and beliefs. It builds upon the W3C's RDF standard for Linked Open Data to describe connections between agents and their statements, and to trace information propagation as agents communicate. We propose to demonstrate BeLink, a prototype capable of storing such interconnected corpora, and answer powerful queries over them relying on SPARQL 1.1. The demo will showcase the exploration of a rich real-data corpus built from Twitter and mainstream media, and interconnected through extraction of statements with their sources, time, and topics.
Using Knowledge Base Semantics in Context-Aware Entity Linking by Cheikh Brahim El Vaigh, François Goasdoué, Guillaume Gravier and Pascale Sébillot. ACM Symposium on Document Engineering (DocEng), 2019.
Entity linking is a core task in textual document processing, which consists in identifying the entities of a knowledge base (KB) that are mentioned in a text. Approaches in the literature consider either independent linking of individual mentions or collective linking of all mentions. Regardless of this distinction, most approaches rely on the Wikipedia encyclopedic KB in order to improve the linking quality, by exploiting its entity descriptions (web pages) or its entity interconnections (hyperlink graph of web pages). In this paper, we devise a novel collective linking technique which departs from most approaches in the literature by relying on a structured RDF KB. This allows exploiting the semantics of the interrelationships that candidate entities may have at disambiguation time rather than relying on raw structural approximation based on Wikipedia's hyperlink graph. The few approaches that also use an RDF KB simply rely on the existence of a relation between the candidate entities to which mentions may be linked. Instead, we weight such relations based on the RDF KB structure and propose an efficient decoding strategy for collective linking. Experiments on standard benchmarks show significant improvement over the state of the art.
SHAMAN: Symbolic and Human-centric view of dAta MANagement (in French) by François Goasdoué. Bulletin AFIA (the newsletter of the French Association for Artificial Intelligence), July 2019.
Browsing Linked Data Catalogs with LODAtlas by Emmanuel Pietriga, Hande Gözükan, Caroline Appert, Marie Destandau; Šejla Čebirić, François Goasdoué and Ioana Manolescu. International Semantic Web Conference (ISWC), 2018.
The Web of Data is growing fast, as exemplified by the evolution of the Linked Open Data (LOD) cloud over the last ten years. One of the consequences of this growth is that it is becoming increasingly difficult for application developers and end-users to find the datasets that would be relevant to them. Semantic Web search engines, open data catalogs, datasets and frameworks such as LODStats and LOD Laundromat, are all useful but only give partial, even if complementary, views on what datasets are available on the Web. We introduce LODAtlas, a portal that enables users to find datasets of interest. Users can make different types of queries about both the datasets' metadata and contents, aggregated from multiple sources. They can then quickly evaluate the matching datasets' relevance, thanks to LODAtlas' summary visualizations of their general metadata, connections and contents.
Learning Commonalities in SPARQL by Sara El Hassad, François Goasdoué and Hélène Jaudoin. International Semantic Web Conference (ISWC), 2017. Also selected for presentation at Bases de Données Avancées (BDA), in the "Recently published papers" track, 2017.
Finding the commonalities between descriptions of data or knowledge is a foundational reasoning problem of Machine Learning. It was formalized in the early 70's as computing a least general generalization (lgg) of such descriptions. We revisit this well-established problem in the SPARQL query language for RDF graphs. In particular, and by contrast to the literature, we address it for the entire class of conjunctive SPARQL queries, aka Basic Graph Pattern Queries (BGPQs), and crucially, when background knowledge is available as RDF Schema ontological constraints, we take advantage of it to devise much more precise lggs, as our experiments on the popular DBpedia dataset show.
A Framework for Efficient Representative Summarization of RDF Graphs by Sejla Cebiric, François Goasdoué and Ioana Manolescu. Poster paper. International Semantic Web Conference (ISWC), 2017.
RDF is the data model of choice for Semantic Web applications. RDF graphs are often large and have heterogeneous, complex structure. Graph summaries are compact structures computed from the input graph; they are typically used to simplify users' experience and to speed up graph processing. We introduce a formal RDF summarization framework, based on graph quotients and RDF node equivalence; our framework can be instantiated with many such equivalence relations. We show that our summaries represent the structure and semantics of the input graph, and establish a sufficient condition on the RDF equivalence relation which ensures that a graph can be summarized more efficiently, without materializing its implicit triples.
Learning Commonalities in RDF by Sara El Hassad, François Goasdoué and Hélène Jaudoin. Extended Semantic Web Conference (ESWC), 2017. Also selected for presentation at Inductive Logic Programming (ILP), in the "Recently published papers" track, 2017.
Finding the commonalities between descriptions of data or knowledge is a foundational reasoning problem of Machine Learning introduced in the 70's, which amounts to computing a least general generalization (lgg) of such descriptions. It has also started receiving consideration in Knowledge Representation from the 90's, and recently in the Semantic Web field. We revisit this problem in the popular Resource Description Framework (RDF) of W3C, where descriptions are RDF graphs, i.e., a mix of data and knowledge. Notably, and in contrast to the literature, our solution to this problem holds for the entire RDF standard, i.e., we do not restrict RDF graphs in any way (neither their structure nor their semantics based on RDF entailment, i.e., inference) and, further, our algorithms can compute lggs of small-to-huge RDF graphs.
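The lgg notion can be illustrated with a Plotkin-style anti-unification sketch over toy triples (an illustration only, not the paper's algorithm, which handles full RDF entailment): matching terms are kept, and each mismatching pair of terms is replaced by a variable, consistently reusing the same variable for the same pair.

```python
# Least-general-generalization sketch (Plotkin-style anti-unification on
# toy triples, not the paper's algorithm): generalize two triples by
# keeping equal terms and abstracting differing terms into variables,
# reusing one variable per pair of differing terms.

def lgg_triple(t1, t2, var_map):
    out = []
    for a, b in zip(t1, t2):
        if a == b:
            out.append(a)  # common term: keep it
        else:
            # differing terms: same pair always maps to the same variable
            out.append(var_map.setdefault((a, b), f"?v{len(var_map)}"))
    return tuple(out)

var_map = {}
g = [lgg_triple(("alice", "teaches", "db"), ("bob", "teaches", "db"), var_map),
     lgg_triple(("alice", "memberOf", "cs"), ("bob", "memberOf", "math"), var_map)]
print(g)  # alice/bob abstracted to one shared variable, cs/math to another
```

The shared variable for the alice/bob pair is what makes the generalization least general: it records that the same (unknown) individual both teaches db and is a member of something.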
Towards Learning Commonalities in SPARQL by Sara El Hassad, François Goasdoué and Hélène Jaudoin. Poster paper. Extended Semantic Web Conference (ESWC), 2017.
Finding the commonalities between descriptions of data or knowledge is a foundational reasoning problem of Machine Learning, which amounts to computing a least general generalization (lgg) of such descriptions. We revisit this old problem in the popular conjunctive fragment of SPARQL, aka Basic Graph Pattern Queries (BGPQs). In particular, we define this problem in all its generality by considering general BGPQs, while the literature considers unary tree-shaped BGPQs only. Further, when ontological knowledge is available as RDF Schema constraints, we take advantage of it to devise much more precise lggs.
An algebraic approach to data manipulation (in French) by François Goasdoué and Virginie Thion. Article in the book "Les Big Data à découvert", CNRS éditions, 2017.
Teaching an RDBMS about ontological constraints by Damian Bursztyn, François Goasdoué and Ioana Manolescu. Proceedings of the Very Large Data Bases Endowment (PVLDB), 2016. Also in Description Logics (DL), as an Extended Abstract, and in Journées Bases de Données Avancées (BDA), 2015.
In the presence of an ontology, query answers must reflect not only data explicitly present in the database, but also implicit data, which holds due to the ontology, even though it is not present in the database. A large and useful set of ontology languages enjoys FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula (obtained from the query and ontology) against only the explicit facts. We present a novel query optimization framework for ontology-based data access settings enjoying FOL reducibility. Our framework is based on searching within a set of alternative equivalent FOL queries, i.e., FOL reformulations, one with minimal evaluation cost when evaluated through a relational database system. We apply this framework to the DL-liteR Description Logic underpinning the W3C's OWL2 QL ontology language, and demonstrate through experiments its performance benefits when two leading SQL systems, one open-source and one commercial, are used for evaluating the FOL query reformulations.
Mixed-instance querying: a lightweight integration architecture for data journalism by Raphael Bonaque, Tien-Duc Cao, Bogdan Cautis, François Goasdoué, Javier Letelier, Ioana Manolescu, Oscar Mendoza, Swen Ribeiro, Xavier Tannier and Michael Thomazo. Demonstration paper. Proceedings of the Very Large Data Bases Endowment (PVLDB), 2016. Also in Journées Bases de Données Avancées (BDA), 2016.
As the world's affairs get increasingly more digital, timely production and consumption of news require the efficient and rapid exploitation of heterogeneous data sources. Discussions with journalists from major newspapers revealed that the content management tools currently at their disposal fall very short of expectations. We demonstrate Tatooine, a lightweight data integration prototype that allows one to quickly set up integration queries across (very) heterogeneous data sources, capitalizing on the many data links (joins) available in this application domain. Our demonstration is based on scenarios we currently study in collaboration with Le Monde, France's major newspaper.
Query-driven Repairing of Inconsistent DL-Lite Knowledge Bases by Meghyn Bienvenu, Camille Bourgaux and François Goasdoué. International Joint Conference on Artificial Intelligence (IJCAI), 2016. Also in Description Logics (DL), as an Extended Abstract, 2016.
We consider the problem of query-driven repairing of inconsistent DL-Lite knowledge bases: query answers are computed under inconsistency-tolerant semantics, and the user provides feedback about which answers are erroneous or missing. The aim is to find a set of ABox modifications (deletions and additions), called a repair plan, that addresses as many of the defects as possible. After formalizing this problem and introducing different notions of optimality, we investigate the computational complexity of reasoning about optimal repair plans and propose interactive algorithms for computing such plans. For deletion-only repair plans, we also present a prototype implementation of the core components of the algorithm.
Explaining Inconsistency-Tolerant Query Answering over Description Logic Knowledge Bases by Meghyn Bienvenu, Camille Bourgaux and François Goasdoué. AAAI Conference on Artificial Intelligence (AAAI), 2016. Also in Description Logics (DL), as an Extended Abstract, 2015.
Several inconsistency-tolerant semantics have been introduced for querying inconsistent description logic knowledge bases. This paper addresses the problem of explaining why a tuple is a (non-)answer to a query under such semantics. We define explanations for positive and negative answers under the brave, AR and IAR semantics. We then study the computational properties of explanations in the lightweight description logic DL-liteR. For each type of explanation, we analyze the data complexity of recognizing (preferred) explanations and deciding if a given assertion is relevant or necessary. We establish tight connections between intractable explanation problems and variants of propositional satisfiability (SAT), enabling us to generate explanations by exploiting solvers for Boolean satisfaction and optimization problems. Finally, we empirically study the efficiency of our explanation framework using the well-established LUBM benchmark.
Social, Structured and Semantic Search by Raphaël Bonaque, Bogdan Cautis, François Goasdoué and Ioana Manolescu. International Conference on Extending Database Technology (EDBT), 2016. Also in Journées Bases de Données Avancées (BDA), 2016.
Social content such as blogs, tweets and news articles is a rich source of interconnected information. We identify a set of requirements for the meaningful exploitation of such rich content, and present a new data model, called S3, which is the first to satisfy them. S3 captures social relationships between users, and between users and content, but also the structure present in rich social content, as well as its semantics. We provide the first top-k keyword search algorithm taking into account the social, structured, and semantic dimensions, and formally establish its termination and correctness. Experiments on real social networks demonstrate the efficiency and qualitative advantage of our algorithm through the joint exploitation of the social, structured, and semantic dimensions of S3.
Optimizing FOL reducible query answering: understanding performance challenges by Damian Bursztyn, François Goasdoué and Ioana Manolescu. Demonstration paper. International Semantic Web Conference (ISWC), 2016.
Semantic Web data management raises the challenge of answering queries under constraints (i.e., in the presence of implicit data). To bridge the gap between this extended setting and that of query evaluation provided by database engines, a reasoning step (w.r.t. the constraints) is necessary before query evaluation. A large and useful set of ontology languages enjoys FOL reducibility of query answering: queries can be answered by evaluating a SQLized first-order logic (FOL) formula (obtained from the query and the ontology) directly against the explicitly stored data (i.e., without considering the ontological constraints). Our demonstration showcases to the attendees, and analyzes, the performance of several reformulation-based query answering techniques applied to the lightweight description logic DL-liteR underpinning the W3C's OWL2 QL profile.
RDF Data Management: Reasoning on Web Data by Damian Bursztyn, François Goasdoué, Ioana Manolescu and Alexandra Roatiş. Tutorial. IEEE International Conference on Data Engineering (ICDE), 2015.
CliqueSquare: Flat Plans for Massively Parallel RDF Queries by François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge Quiané-Ruiz and Stamatis Zampetakis. IEEE International Conference on Data Engineering (ICDE), 2015.
As increasing volumes of RDF data are being produced and analyzed, many massively distributed architectures have been proposed for storing and querying this data. These architectures are characterized first, by their RDF partitioning and storage method, and second, by their approach for distributed query optimization, i.e., determining which operations to execute on each node in order to compute the query answers. We present CliqueSquare, a novel optimization approach for evaluating conjunctive RDF queries in a massively parallel environment. We focus on reducing query response time, and thus seek to build flat plans, where the number of joins encountered on a root-to-leaf path in the plan is minimized. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possible plans. We have deployed our algorithms in a MapReduce-based RDF platform and demonstrate experimentally the interest of the flat plans built by our best algorithms.
CliqueSquare in Action: Flat Plans for Massively Parallel RDF Queries by Benjamin Djahandideh, François Goasdoué, Zoi Kaoudi, Ioana Manolescu, Jorge Quiané-Ruiz and Stamatis Zampetakis. Demonstration paper. IEEE International Conference on Data Engineering (ICDE), 2015.
RDF is an increasingly popular data model for many practical applications, leading to large volumes of RDF data; efficient RDF data management methods are crucial to allow applications to scale. We propose to demonstrate CliqueSquare, an RDF data management system built on top of a MapReduce-like infrastructure. The main technical novelty of CliqueSquare resides in its logical query optimization algorithm, guaranteed to find a logical plan as flat as possible for a given query, meaning: a plan having the smallest possible number of join operators on top of each other. CliqueSquare's ability to build flat plans allows it to take advantage of a parallel processing framework in order to shorten response times. We demonstrate loading and querying the data, with a particular focus on query optimization, and on the performance benefits of CliqueSquare's flat plans.
Optimizing Reformulation-based Query Answering in RDF by Damian Bursztyn, François Goasdoué and Ioana Manolescu. International Conference on Extending Database Technology (EDBT), 2015. Also in Journées Bases de Données Avancées (BDA), 2015.
Reformulation-based query answering is a query processing technique aiming at answering queries under constraints. It consists of reformulating the query based on the constraints, so that evaluating the reformulated query directly against the data (i.e., without considering the constraints any further) produces the correct answer set. In this paper, we consider optimizing reformulation-based query answering in the setting of ontology-based data access, where SPARQL conjunctive queries are posed against RDF facts on which constraints expressed by an RDF Schema hold. The literature provides query reformulation algorithms for many fragments of RDF. However, reformulated queries may be complex, thus may not be efficiently processed by a query engine; well-established query engines even fail to process them in some cases. Our contribution is (i) to generalize prior query reformulation languages, leading to investigating a space of reformulated queries we call JUCQs (joins of unions of conjunctive queries), instead of a single reformulation; and (ii) an effective and efficient cost-based algorithm for selecting from this space the reformulated query with the lowest estimated cost. Our experiments show that our technique enables reformulation-based query answering where the state-of-the-art approaches are simply unfeasible, while it may decrease query answering cost by orders of magnitude in other cases.
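To make the reformulation idea concrete, here is a minimal Python sketch, not the paper's algorithm and ignoring JUCQs, of rewriting the class atoms of a conjunctive query w.r.t. RDF Schema subclass constraints into a union of conjunctive queries (UCQ) whose plain evaluation yields the certain answers. The function names and the toy `Student` hierarchy are illustrative assumptions.

```python
from itertools import product

def subclass_closure(direct):
    """Reflexive-transitive closure: class -> set of classes (itself included)
    whose instances entail membership in it. `direct` maps a class to its
    direct subclasses."""
    classes = set(direct) | {s for subs in direct.values() for s in subs}
    closure = {c: {c} for c in classes}
    changed = True
    while changed:
        changed = False
        for c in classes:
            for s in list(closure[c]):
                for deeper in direct.get(s, ()):
                    if deeper not in closure[c]:
                        closure[c].add(deeper)
                        changed = True
    return closure

def reformulate(query, direct_subclasses):
    """Rewrite each class atom of a conjunctive query (a list of triple
    patterns) w.r.t. the ontology; the result is a UCQ, evaluated as-is on
    the explicit data."""
    closure = subclass_closure(direct_subclasses)
    choices = []
    for (s, p, o) in query:
        if p == "rdf:type":
            choices.append([(s, p, c) for c in sorted(closure.get(o, {o}))])
        else:
            choices.append([(s, p, o)])
    return [list(cq) for cq in product(*choices)]

# Toy ontology: GradStudent and UndergradStudent are subclasses of Student.
onto = {"Student": ["GradStudent", "UndergradStudent"]}
ucq = reformulate([("?x", "rdf:type", "Student")], onto)
```

The single query atom is rewritten into three conjunctive queries, one per (sub)class; with several atoms the product of choices illustrates why reformulations grow large and why their optimization matters.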
Query-Oriented Summarization of RDF Graphs by Sejla Cebiric, François Goasdoué and Ioana Manolescu. Demonstration paper. Proceedings of the Very Large Data Bases Endowment (PVLDB), 2015. Also in Journées Bases de Données Avancées (BDA), 2015.
The Resource Description Framework (RDF) is the W3C's graph data model for Semantic Web applications. We study the problem of RDF graph summarization: given an input RDF graph G, find an RDF graph G' which summarizes G as accurately as possible, while being possibly orders of magnitude smaller than the original graph. Our summaries are intended to help with query formulation and optimization; in particular, querying a summary of a graph should reflect whether the query has some answers against this graph. We introduce two summaries: a baseline which is compact and simple and satisfies certain accuracy and representativeness properties, but may oversimplify the RDF graph, and a refined one which trades some of these properties for more accuracy in representing the structure. The demonstration will allow the audience to compute such summaries out of a large variety of datasets, and explore their usage for data exploration and query optimization.
Reformulation-based query answering in RDF: alternatives and performance by Damian Bursztyn, François Goasdoué and Ioana Manolescu. Demonstration paper. Proceedings of the Very Large Data Bases Endowment (PVLDB), 2015. Also in Journées Bases de Données Avancées (BDA), 2015.
Answering queries over Semantic Web data, RDF graphs, must account for both explicit data and implicit data, entailed by the explicit data and the semantic constraints holding on them. Two main query answering techniques have been devised, namely saturation-based (SAT), which precomputes and adds to the graph all implicit information, and reformulation-based (REF), which reformulates the query based on the graph constraints, so that evaluating the reformulated query directly against the explicit data (i.e., without considering the constraints) produces the query answer. While SAT is well known, REF has received less attention so far. In particular, reformulated queries often perform poorly if the query is complex. Our demonstration showcases a large set of REF techniques, including but not limited to one we proposed recently. The audience will be able to (i) test them against different datasets, constraints and queries, as well as different well-established systems, (ii) analyze and understand the performance challenges they raise, and (iii) alter the scenarios to visualize the impact on performance. In particular, we show how a cost-based REF approach allows avoiding reformulation performance pitfalls.
Query-Oriented Summarization of RDF Graphs by Sejla Cebiric, François Goasdoué and Ioana Manolescu. Work in progress paper. British International Conference on Databases (BICOD), 2015.
The Resource Description Framework (RDF) is the W3C's graph data model for Semantic Web applications. We study the problem of RDF graph summarization: given an input RDF graph G, find an RDF graph G' which summarizes G as accurately as possible, while being possibly orders of magnitude smaller than the original graph. Our approach is query-oriented, i.e., querying a summary of a graph should reflect whether the query has some answers against this graph. The summaries are intended to help with query formulation and optimization. We introduce two summaries: a baseline which is compact and simple and satisfies certain accuracy and representativeness properties, but may oversimplify the RDF graph, and a refined one which trades some of these properties for more accuracy in representing the structure.
Efficient OLAP operations for RDF analytics by Elham Akbari Azirani, François Goasdoué, Ioana Manolescu, Alexandra Roatis. IEEE ICDE Workshops, 2015.
RDF is the leading data model for the Semantic Web, and dedicated query languages such as SPARQL 1.1, featuring in particular aggregation, allow extracting information from RDF graphs. A framework for analytical processing of RDF data was introduced in the literature, where analytical schemas and analytical queries (cubes) are fully re-designed for heterogeneous, semantic-rich RDF graphs. In this novel analytical setting, we consider the following optimization problem: how to reuse the materialized result of a given RDF analytical query (cube) in order to compute the answer to another cube. We provide view-based rewriting algorithms for these cube transformations, and demonstrate experimentally their practical interest.
Querying Inconsistent Description Logic Knowledge Bases under Preferred Repair Semantics by Meghyn Bienvenu, Camille Bourgaux, and François Goasdoué. AAAI Conference on Artificial Intelligence (AAAI), 2014. Also in Description Logics (DL), as an Extended Abstract, 2014.
Recently several inconsistency-tolerant semantics have been introduced for querying inconsistent description logic knowledge bases. Most of these semantics rely on the notion of a repair, defined as an inclusion-maximal subset of the facts (ABox) which is consistent with the ontology (TBox). In this paper, we study variants of two popular inconsistency-tolerant semantics obtained by replacing classical repairs by various types of preferred repair. We analyze the complexity of query answering under the resulting semantics, focusing on the lightweight logic DL-Lite_R. Unsurprisingly, query answering is intractable in all cases, but we nonetheless identify one notion of preferred repair, based upon priority levels, whose data complexity is 'only' coNP-complete. This leads us to propose an approach combining incomplete tractable methods with calls to a SAT solver. An experimental evaluation of the approach shows good scalability on realistic cases.
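As an illustration of the repair notion underlying these semantics, the following brute-force Python sketch (for intuition only; the paper combines tractable incomplete methods with SAT solving rather than enumerating subsets) computes the inclusion-maximal subsets of an ABox consistent with the ontology, and the facts common to all repairs, which underlie the IAR semantics. The `consistent` oracle and the string encoding of facts are hypothetical stand-ins for a DL-Lite consistency check.

```python
from itertools import combinations

def repairs(abox, consistent):
    """All inclusion-maximal subsets of the ABox that are consistent with the
    TBox. `consistent` is a caller-supplied oracle on fact sets. Exponential
    brute force, for illustration only."""
    candidates = []
    facts = list(abox)
    # Enumerate subsets from largest to smallest so that maximality can be
    # checked against the candidates already found.
    for k in range(len(facts), -1, -1):
        for subset in combinations(facts, k):
            s = frozenset(subset)
            if consistent(s) and not any(s < r for r in candidates):
                candidates.append(s)
    return candidates

def iar_facts(abox, consistent):
    """Facts belonging to every repair; querying only these yields the
    (cautious) IAR answers."""
    rs = repairs(abox, consistent)
    return frozenset.intersection(*rs) if rs else frozenset()

# Toy ABox where A and B are disjoint classes, so A(a) and B(a) clash.
abox = frozenset({"A(a)", "B(a)", "C(b)"})
consistent = lambda s: not {"A(a)", "B(a)"} <= s
rs = repairs(abox, consistent)
```

Here there are two repairs, one keeping A(a) and one keeping B(a), and only C(b) survives in both; a tuple is an AR answer if entailed by every repair, and a brave answer if entailed by at least one.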
RDF Analytics: Lenses over Semantic Graphs by Dario Colazzo, François Goasdoué, Ioana Manolescu and Alexandra Roatiş. International World Wide Web conference (WWW), 2014.
The development of the Semantic Web (RDF) brings new requirements for data analytics tools and methods, going beyond querying to semantics-rich analytics through warehouse-style tools. In this work, we fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data, leading to the first complete formal framework for warehouse-style RDF analytics. Notably, we define (i) analytical schemas tailored to heterogeneous, semantics-rich RDF graphs, (ii) analytical queries which (beyond relational cubes) allow flexible querying of the data and the schema as well as powerful aggregation and (iii) OLAP-style operations. Experiments on a fully-implemented platform demonstrate the practical interest of our approach.
SPARQL Query Processing in the Cloud by Francesca Bugiotti, Jesús Camacho-Rodríguez, François Goasdoué, Zoi Kaoudi, Ioana Manolescu, and Stamatis Zampetakis. Book chapter in Linked Data Management, Chapman and Hall/CRC, Emerging Directions in Database Systems and Applications, 2014.
Cloud computing has been massively adopted recently in many applications for its elastic scaling and fault-tolerance. At the same time, given that the amount of available RDF data sources on the Web increases rapidly, there is a constant need for scalable RDF data management tools. In this chapter, we start by providing a compact survey of the existing works which have aimed at storing and querying large volumes of RDF data in a cloud. The second part of the chapter focuses on our specific work in this context, namely, an architecture for storing RDF data within the Amazon Web Services (AWS) cloud. Since in a cloud environment the total consumption of storage and computing resources translates into monetary costs, reducing these costs is an objective potentially just as important as reducing response time. We present the architecture we devised to store, index and query RDF data within the AWS cloud, by exploiting the various services it provides. At the core of our proposed architecture are RDF indexing strategies which make it possible to route queries directly to a (hopefully tight) superset of the RDF datasets which may provide answers to a given query, thus reducing the total work entailed by query execution. We provide a set of experiments validating the interest and performance of this architecture. This second part is based on our previous publication; however, the platform and write-up have undergone significant changes and extensions since.
Toward Social, Structured and Semantic Search by Raphaël Bonaque, Bogdan Cautis, François Goasdoué and Ioana Manolescu. Surfacing the Deep and the Social Web (SDSW), ISWC workshop, 2014.
Social content such as social network posts, tweets, news articles and more generally web page fragments is often structured. Such social content is also frequently enriched with annotations, most of which carry semantics, either by collaborative effort or from automatic tools. Searching for relevant information in this context is both a basic feature for the users and a challenging task. We present a data model and a preliminary approach for answering queries over such structured, social and semantic-rich content, taking into account all dimensions of the data in order to return the most meaningful results.
Fact Checking and Analyzing the Web with FactMinder by François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu and Stamatis Zampetakis. Computation + Journalism Symposium, 2014.
Fact checking and data journalism are currently strong trends. The sheer amount of data at hand makes it difficult even for trained professionals to spot biased, outdated or simply incorrect information. We present FactMinder, a fact checking and analysis assistance application targeted at journalists. FactMinder is built on top of XR, a platform for managing semantic annotations on semi-structured documents. It enables its users to analyze documents and experience how background knowledge and open data repositories help build insightful overviews of current topics.
Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents by François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, and Stamatis Zampetakis. Very Large Data Bases Journal (VLDBJ), 2013.
Since the beginning of the Semantic Web initiative, significant efforts have been invested in finding efficient ways to publish, store and query metadata on the Web. RDF and SPARQL have become the standard data model and query language, respectively, to describe resources on the Web. Large amounts of RDF data are now available either as stand-alone datasets or as metadata over semi-structured (typically XML) documents. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. We propose XR, a novel hybrid data model capturing the structural aspects of XML data and the semantics of RDF, also enabling us to reason about XML data. Our model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. This data model comes with the XRQ query language that combines features of both XQuery and SPARQL. To demonstrate the feasibility of this hybrid XML-RDF data management setting, and to validate its interest, we have developed an XR platform on top of well-known data management systems for XML and RDF. In particular, the platform features several XRQ query processing algorithms, whose performance is experimentally compared.
Robust Module-based Data Management by François Goasdoué and Marie-Christine Rousset. IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 25, num. 3, 2013.
The current trend for building an ontology-based data management system (DMS) is to capitalize on efforts made to design a preexisting well-established DMS (a reference system). The method amounts to extracting from the reference DMS a piece of schema relevant to the new application's needs (a module), possibly personalizing it with extra constraints w.r.t. the application under construction, and then managing a dataset using the resulting schema.
In this paper, we extend the existing definitions of modules and we introduce novel properties of robustness that provide means for easily checking that a robust module-based DMS evolves safely w.r.t. both the schema and the data of the reference DMS.
We carry out our investigations in the setting of description logics, which underlie modern ontology languages like RDFS, OWL, and OWL2 from W3C. Notably, we focus on the DL-lite_A dialect of the DL-lite family, which encompasses the foundations of the QL profile of OWL2 (i.e., DL-lite_R): the W3C recommendation for efficiently managing large datasets.
Efficient Query Answering against Dynamic RDF Databases by François Goasdoué, Ioana Manolescu, and Alexandra Roatiş. International Conference on Extending Database Technology (EDBT), 2013.
A promising method for efficiently querying RDF data consists of translating SPARQL queries into efficient RDBMS-style operations. However, answering SPARQL queries requires handling RDF reasoning, which must be implemented outside the relational engines that do not support it. We introduce the database (DB) fragment of RDF, going beyond the expressive power of previously studied RDF fragments. We devise novel sound and complete techniques for answering Basic Graph Pattern (BGP) queries within the DB fragment of RDF, exploring the two established approaches for handling RDF semantics, namely reformulation and saturation. In particular, we focus on handling database updates within each approach and propose a method for incrementally maintaining the saturation; updates raise specific difficulties due to the rich RDF semantics. Our techniques are designed to be deployed on top of any RDBMS(-style) engine, and we experimentally study their performance trade-offs.
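The two established approaches can be illustrated on the simplest RDF entailment rule. The Python sketch below is a toy under stated assumptions, not the paper's techniques: it handles only rdfs:subClassOf entailment and insertions, whereas the paper covers a richer RDF fragment and deletions as well. It saturates a triple set and then maintains the saturation incrementally when a triple is inserted.

```python
def saturate(triples, superclasses):
    """Saturate a set of triples under the rdfs9-style rule:
    (x rdf:type C) and (C subClassOf D) entail (x rdf:type D).
    `superclasses` maps a class to its direct superclasses."""
    graph = set(triples)
    frontier = set(graph)
    while frontier:  # fixpoint: stop when no new triple is derived
        new = set()
        for (s, p, o) in frontier:
            if p == "rdf:type":
                for sup in superclasses.get(o, ()):
                    t = (s, "rdf:type", sup)
                    if t not in graph:
                        new.add(t)
        graph |= new
        frontier = new
    return graph

def insert(graph, triple, superclasses):
    """Incremental maintenance on insertion: only the consequences of the
    new triple are derived, instead of re-saturating the whole graph."""
    return graph | saturate({triple}, superclasses)

# Toy schema: GradStudent subClassOf Student, Student subClassOf Person.
schema = {"GradStudent": ["Student"], "Student": ["Person"]}
g = saturate({("alice", "rdf:type", "GradStudent")}, schema)
g2 = insert(g, ("bob", "rdf:type", "Student"), schema)
```

Under reformulation, by contrast, the data stays as-is and the query is rewritten (e.g., a query for Person instances becomes a union over Person, Student and GradStudent); the trade-off between the two is exactly what the experiments of the paper study.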
Fact checking and analyzing the Web by François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, and Stamatis Zampetakis. Demonstration paper. ACM Special Interest Group on Management Of Data (SIGMOD), 2013.
Fact checking and data journalism are currently strong trends. The sheer amount of data at hand makes it difficult even for trained professionals to spot biased, outdated or simply incorrect information. We propose to demonstrate FactMinder, a fact checking and analysis assistance application. SIGMOD attendees will be able to analyze documents using FactMinder, and experience how background knowledge and open data repositories help build insightful overviews of current topics.
Getting more RDF support from relational databases by François Goasdoué, Ioana Manolescu, and Alexandra Roatiş (poster paper). International World Wide Web conference (WWW), 2012.
We introduce the database fragment of RDF, which extends the popular Description Logic fragment, notably with support for incomplete information. We then provide novel sound and complete saturation- and reformulation-based techniques for answering the Basic Graph Pattern queries of SPARQL in this fragment. We extend the state of the art on pushing RDF query processing within robust / efficient relational database management systems. Finally, we experimentally compare our query answering techniques using well-established datasets.
RDF Data Management in the Amazon Cloud by Francesca Bugiotti, François Goasdoué, Zoi Kaoudi, and Ioana Manolescu. EDBT/ICDT Workshop on Data analytics in the Cloud (DanaC), 2012.
Cloud computing has been massively adopted recently in many applications for its elastic scaling and fault-tolerance. At the same time, given that the amount of available RDF data sources on the Web increases rapidly, there is a constant need for scalable RDF data management tools. In this paper we propose a novel architecture for the distributed management of RDF data, exploiting an existing commercial cloud infrastructure, namely Amazon Web Services (AWS). We study the problem of indexing RDF data stored within AWS, by using SimpleDB, a key-value store provided by AWS for small data items. The goal of the index is to efficiently identify the RDF datasets which may have answers for a given query, and route the query only to those. We devised and experimented with several indexing strategies; we discuss experimental results and avenues for future work.
Répondre aux requêtes par reformulation dans les bases de données RDF by François Goasdoué, Ioana Manolescu, and Alexandra Roatiş. Reconnaissance des Formes et Intelligence Artificielle (RFIA), 2012.
In RDF, query answering relies either on data saturation or on query reformulation. The idea behind both techniques is to decouple entailment, the reasoning mechanism through which query answers are defined, from query evaluation. In this paper, we extend the state of the art with a reformulation-based query answering technique for a more significant fragment of RDF and a more expressive query language than those studied in the literature. We then experimentally compare this new technique with the standard technique based on data saturation.
AMADA: Web Data Repositories in the Amazon Cloud by Andrés Aranda-Andújar, Francesca Bugiotti, Jesús Camacho-Rodríguez, Dario Colazzo, François Goasdoué, Zoi Kaoudi, and Ioana Manolescu. Demonstration paper. ACM Conference on Information and Knowledge Management (CIKM), 2012.
We present AMADA, a platform for storing Web data (in particular, XML documents and RDF graphs) based on the Amazon Web Services (AWS) cloud infrastructure. AMADA operates in a Software as a Service (SaaS) approach, allowing users to upload, index, store, and query large volumes of Web data. The demonstration shows (i) the step-by-step procedure for building and exploiting the warehouse (storing, indexing, querying) and (ii) the monitoring tools enabling one to control the expenses (monetary costs) charged by AWS for the operations involved while running AMADA.
Des triplets sur des arbres. Un modèle hybride XML-RDF pour documents annotés by François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, and Stamatis Zampetakis. Ingénierie des Systèmes d'Information (ISI), 2012.
Considerable effort is devoted to the semantic enrichment of XML Web data through annotations, ranging from simple metadata to complex semantic relations between data. Although the idea of using annotations is increasingly widely shared, the architecture enabling it still remains to be defined. To this end, we present a framework for storing and querying annotated documents. We introduce (i) the XR data model, in which annotated documents are XML documents semantically described by RDF triples, and (ii) the XRQ query language for querying annotated documents by both their structure and their semantics. A first prototype platform for managing annotated documents, named XRP, has been developed to show the relevance of our approach through a series of experiments.
Knowledge Representation meets DataBases for the sake of ontology-based data management by François Goasdoué. HDR thesis (Habilitation à Diriger des Recherches), Univ. Paris-Sud, 2012.
This Habilitation thesis outlines my research activities carried out as an Associate Professor at Univ. Paris-Sud and Inria Saclay Île-de-France. During this period, from 2003 to early 2012, my work was - and still is - at the interface between Knowledge Representation and Databases. I have mainly focused on ontology-based data management using the Semantic Web data models promoted by W3C: the Resource Description Framework (RDF) and the Web Ontology Language (OWL). In particular, my work has covered (i) the design, (ii) the optimization, and (iii) the decentralization of ontology-based data management techniques in these data models. This thesis briefly reports on the results obtained along these lines of research.
View Selection in Semantic Web Databases by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. Proceedings of the VLDB Endowment (PVLDB), vol. 5, num. 2, 2011/2012.
We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The interest of our techniques is demonstrated through a set of experiments.
Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents by François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, and Stamatis Zampetakis. VLDB Workshop on Very Large Data Search (VLDS) and Journées Bases de Données Avancées (BDA), 2011.
Content on today's Web is typically document-structured and richly connected; XML is by now widely adopted to represent Web data. Moreover, the vision of a computer-understandable Web relies on Web (and real world) resources described by simple properties having names or values; URIs are the normative method of identifying resources and RDF (the Resource Description Framework) enjoys important traction as a way to encode such statements. We present XR, a carefully designed hybrid model between XML and RDF, for describing RDF-annotated XML documents. XR follows and combines the W3C's XML, URI and RDF standards by assigning URIs to all XML nodes and enabling these URIs to appear in RDF statements. The XR management platform thus provides the capabilities to create and handle interconnected XML and RDF content. We define the XR data model, its query language, and present preliminary results with a prototype implementation.
RDFViewS: A Storage Tuning Wizard for RDF Applications by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. Demonstration paper. Journées Bases de Données Avancées (BDA), 2011.
The emergence of the Semantic Web and the proliferation of related applications lead us to seek means of efficiently querying large volumes of RDF data. In this demonstration, we present RDFViewS, a system that automatically finds the best set of views to materialize for a given set of SPARQL queries. The solution must jointly minimize query evaluation time, view maintenance cost, and the storage space the views occupy. The algorithm underlying our system explores a state space using strategies and heuristics in search of an optimal configuration. In doing so, it takes into account any RDFS schema accompanying the data, in order to guarantee the completeness of query results.
RDFViewS: A Storage Tuning Wizard for RDF Applications by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. ACM Conference on Information and Knowledge Management (CIKM), 2010.
In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and storage space constraints. Our system employs practical algorithms and heuristics to navigate through the search space of potential view configurations, and exploits the possibly available semantic information, expressed via an RDF Schema, to ensure the completeness of query evaluation.
Traitement de requêtes RDF fondé sur des vues matérialisées by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. Journées Bases de Données Avancées (BDA), 2010.
Modules sémantiques robustes pour une réutilisation saine en DL-lite by François Goasdoué and Marie-Christine Rousset. Journées Bases de Données Avancées (BDA), 2010.
Module extraction from ontologies has recently been studied in the setting of description logics, which underlie modern ontology languages. In this paper, we define a new notion of semantic module that captures both the modules obtained by extracting a subset of a TBox and those obtained by forgetting concepts and roles of a TBox. We then define and study the safe reuse of a semantic module of a global TBox in order to build local ABoxes and to query them either independently or jointly with the global ABox. So that the local ABox (associated with the module) and the global ABox (associated with the initial TBox) can evolve independently yet consistently, we generalize the notion of conservative extension w.r.t. queries and extend it to consistency checking. Finally, we provide algorithms and complexity results for computing minimal robust semantic modules in DL-liteF and DL-liteR. These dialects are members of the DL-lite family, which was specifically designed for the efficient querying of large amounts of data.
Decentralized Data Management in DL-LITE by Nada Abdallah, François Goasdoué and Marie-Christine Rousset. Reconnaissance des Formes et Intelligence Artificielle (RFIA), 2010.
This article proposes a decentralized data model and the associated algorithms for building peer-to-peer data management systems (PDMS) based on the DL-liteR description logic. This logic is a fragment, with good theoretical and practical properties, of the upcoming W3C recommendation for the Semantic Web: OWL2. Our approach reduces query reformulation and data consistency checking w.r.t. an ontology to reasoning in propositional logic. This allows DL-liteR PDMSs to be easily deployed on top of SomeWhere, a peer-to-peer inference system for propositional logic that scales to a thousand peers. We also show how to answer queries using views (predefined queries) in DL-liteR in both the centralized and decentralized cases, by combining the DL-liteR reformulation algorithm with the MiniCon algorithm.
Non-conservative Extension of a Peer in a P2P Inference System by Nada Abdallah and François Goasdoué. AI Communications: The European Journal on Artificial Intelligence, volume 22, number 4, pages 211-233, 2009.
This paper points out that the notion of non-conservative extension of a knowledge base (KB) is important to the distributed logical setting of peer-to-peer inference systems (P2PIS), a.k.a. peer-to-peer semantic systems. It is useful to a peer in order to detect/prevent that a P2PIS corrupts (part of) its knowledge or to learn more about its own application domain from the P2PIS. That notion is all the more important since it has connections with the privacy of a peer within a P2PIS and with the quality of service provided by a P2PIS. We therefore study the following tightly related problems from both the theoretical and decentralized algorithmic perspectives: (i) deciding whether a P2PIS is a conservative extension of a given peer and (ii) computing the witnesses to the corruption of a given peer's KB within a P2PIS so that we can forbid it. We consider here scalable P2PISs that have already proved useful to Artificial Intelligence and DataBases.
DL-liteR in the Light of Propositional Logic for Decentralized Data Management by Nada Abdallah, François Goasdoué, and Marie-Christine Rousset. International Joint Conference on Artificial Intelligence (IJCAI), pages 2010-2015, 2009.
This paper provides a decentralized data model and associated algorithms for peer data management systems (PDMS) based on the DL-liteR description logic. Our approach relies on reducing query reformulation and consistency checking for DL-liteR into reasoning in propositional logic. This enables a straightforward deployment of DL-liteR PDMSs on top of SomeWhere, a scalable propositional peer-to-peer inference system. We also show how to use the state-of-the-art Minicon algorithm for rewriting queries using views in DL-liteR in the centralized and decentralized cases.
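To illustrate the flavor of the reformulation step this line of work relies on, here is a minimal, hedged Python sketch (the encoding and names are ours, not the paper's): atomic concept inclusions are backward-chained to collect every concept whose instances answer an atomic query, which corresponds to the union-of-queries reformulation that is then evaluated on the data.

```python
def reformulate(query_concept, inclusions):
    """Collect every concept C such that C is subsumed by query_concept
    according to the inclusion axioms, by backward chaining over them.
    Each axiom 'A is-a B' is given as a pair ('A', 'B')."""
    reachable = {query_concept}
    frontier = [query_concept]
    while frontier:
        target = frontier.pop()
        for sub, sup in inclusions:
            if sup == target and sub not in reachable:
                reachable.add(sub)
                frontier.append(sub)
    return reachable

# Hypothetical ontology: Student is-a Person, PhDStudent is-a Student,
# Professor is-a Person. Querying Person reformulates into the union of
# queries over Person, Student, PhDStudent and Professor.
axioms = [("Student", "Person"), ("PhDStudent", "Student"),
          ("Professor", "Person")]
print(sorted(reformulate("Person", axioms)))
```

In the actual DL-liteR setting the reduction goes through propositional logic and also handles roles and consistency checking; this sketch only conveys the saturation idea behind reformulation.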
Special issue on the Semantic Web of the journal Technique et Science Informatiques (TSI), edited by François Goasdoué and Alain Léger, Hermès-Lavoisier, volume 28, February 2009.
Semantic Web Take-Off in a European Industry Perspective by Alain Léger, Johannes Heinecke, Lyndon J.B. Nixon, Pavel Shvaiko, Jean Charlet, Paola Hobson, and François Goasdoué. Book chapter in Semantic Web for Business: Cases and Applications, Idea Group Inc., pages 1-29, 2008.
Semantic Web technology is being increasingly applied in a large spectrum of applications in which domain knowledge is conceptualized and formalized (e.g., by means of an ontology) in order to support diversified and automated knowledge processing (e.g., reasoning) performed by a machine. Moreover, through an optimal combination of (cognitive) human reasoning and (automated) machine processing (mimicking reasoning), it becomes possible for humans and machines to share more and more complementary tasks. The spectrum of applications is extremely large and to name a few: corporate portals and knowledge management, e-commerce, e-work, e-business, healthcare, e-government, natural language understanding and automated translation, information search, data and services integration, social networks and collaborative filtering, knowledge mining, business intelligence and so on. From a social and economic perspective, this emerging technology should contribute to growth in economic wealth, but it must also show clear-cut value for everyday activities through technological transparency and efficiency. The penetration of Semantic Web technology in industry and in services is progressing slowly but accelerating as new success stories are reported. In this chapter we present ongoing work in the cross-fertilization between industry and academia. In particular, we present a collection of application fields and use cases from enterprises which are interested in the promises of Semantic Web technology.
WebContent: Efficient P2P Warehousing of Web Data by Serge Abiteboul, Tristan Allard, Philippe Chatalic, Georges Gardarin, Anca Ghitescu, François Goasdoué, Ioana Manolescu, Benjamin Nguyen, Mohamed Ouazara, Aditya Somani, Nicolas Travers, Gabriel Vasile, and Spyros Zoupanos. Demonstration paper in Very Large Data Bases (VLDB), pages 1428-1431, 2008.
We present the WebContent platform for managing distributed repositories of XML and semantic Web data. The platform allows integrating various data processing building blocks (crawling, translation, semantic annotation, full-text search, structured XML querying, and semantic querying), presented as Web services, into a large-scale efficient platform. Calls to various services are combined inside ActiveXML documents, which are XML documents including service calls. An ActiveXML optimizer is used to: (i) efficiently distribute computations among sites; (ii) perform XQuery-specific optimizations by leveraging an algebraic XQuery optimizer; and (iii) given an XML query, choose the most appropriate among several distributed indices to answer the query.
Consequence Finding in a Propositional Peer-to-Peer Inference System (Revisited) by Nada Abdallah and François Goasdoué. Reconnaissance des Formes et Intelligence Artificielle (RFIA), 2008.
In this article, we study consequence finding in propositional peer-to-peer inference systems (P2PIS) with oriented mappings. In these systems, a mapping from one peer to another specifies a set of knowledge that the first peer must observe, together with the knowledge it must notify to the second peer when the observed knowledge holds. Since these new P2PIS can model many real applications, it is important to equip them with key AI inferences, in this case consequence finding. Our contributions are twofold. We first define the first logical framework for representing propositional P2PIS with oriented mappings. We then study consequence finding in this new framework; in particular, we propose a fully decentralized algorithm for this problem.
Semantic Peer-to-Peer Systems and (Non-)Conservative Extension of a Knowledge Base by Nada Abdallah and François Goasdoué. Journées de bases de données avancées (BDA), 2008.
This article shows why the notion of non-conservative extension of a knowledge base (KB) matters in peer-to-peer inference systems (P2PIS), also known as semantic peer-to-peer systems. This notion is useful to a peer for detecting whether (part of) its KB is corrupted by a P2PIS, or for learning new knowledge about its own application domain from the P2PIS. It is all the more important as it is tightly connected to the privacy of a peer's knowledge within a P2PIS and to the quality of service provided by a P2PIS. We study here, from both the theoretical and decentralized algorithmic viewpoints, the two following problems: (i) deciding whether a P2PIS is a conservative extension of a given peer and (ii) computing the witnesses to a possible corruption of a given peer's KB by a P2PIS, so that it can be prevented. We consider P2PIS that scale to a thousand peers and whose usefulness has already been demonstrated in Artificial Intelligence and Databases.
Consequence Finding for Conservative-Extension Testing in a Peer-to-Peer System by Nada Abdallah and François Goasdoué. Journées Francophones de Programmation par Contraintes (JFPC), 2008.
In a peer-to-peer inference system (P2PIS), a peer extends its knowledge base (KB) with those of the other peers in order to use their knowledge when answering the queries it is asked. However, the extension of a KB is not necessarily conservative. A conservative extension guarantees that the meaning of a KB is the same whether it is considered alone or together with its extension. In contrast, a non-conservative extension may radically change the meaning of a KB within the resulting theory. It is therefore crucial for a peer to know whether a P2PIS is a conservative extension of its KB.
SomeRDFS in the Semantic Web by Philippe Adjiman, François Goasdoué, and Marie-Christine Rousset. Journal on Data Semantics, No 8, pages 158-181, Springer Journal (LNCS 4380), 2007.
The Semantic Web envisions a world-wide distributed architecture where computational resources will easily inter-operate to coordinate complex tasks such as query answering. Semantic marking up of web resources using ontologies is expected to provide the necessary glue for making this vision work. Using ontology languages, (communities of) users will build their own ontologies in order to describe their own data. Adding semantic mappings between those ontologies, in order to semantically relate the data to share, gives rise to the Semantic Web: data on the web that are annotated by ontologies networked together by mappings. In this vision, the Semantic Web is a huge semantic peer data management system. In this paper, we describe the SomeRDFS peer data management systems that promote a "simple is beautiful" vision of the Semantic Web based on data annotated by RDFS ontologies.
Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. Journal of Artificial Intelligence Research, Vol. 25, pages 269-314, 2006.
In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The main contribution of this paper is to provide the first consequence finding algorithm in a peer-to-peer setting: it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Another important contribution is to apply this general distributed reasoning setting to the setting of the Semantic Web through the somewhere semantic peer-to-peer data management system. The last contribution of this paper is to provide an experimental analysis of the scalability of the peer-to-peer infrastructure that we propose, on large networks of 1000 peers.
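The distributed algorithm itself is beyond a short snippet, but the underlying inference, consequence finding over propositional clause sets, can be sketched in a centralized form. The following Python sketch is an illustrative assumption, not the paper's implementation: clauses are frozensets of signed integer literals (e.g., -1 for the negation of variable 1), and the clause set is saturated under resolution, keeping only subsumption-minimal clauses (the prime implicates).

```python
from itertools import combinations

def resolve(c1, c2):
    """Return all non-tautological resolvents of two clauses,
    where a clause is a frozenset of signed integer literals."""
    out = []
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in r for l in r):  # drop tautologies
                out.append(frozenset(r))
    return out

def consequences(clauses):
    """Saturate a clause set under resolution and return its
    subsumption-minimal clauses (prime implicates)."""
    theory = {frozenset(c) for c in clauses}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(theory), 2):
            for r in resolve(a, b):
                if r not in theory:
                    theory.add(r)
                    changed = True
    return {c for c in theory if not any(d < c for d in theory)}
```

For instance, from the clauses (not 1 or 2), (not 2 or 3) and (1), i.e. the implications 1 entails 2 and 2 entails 3 plus the fact 1, saturation derives the unit clauses (2) and (3). The peer-to-peer contribution of the paper is to compute such consequences incrementally across peers without ever materializing the global theory.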
SomeWhere: A Scalable Peer-to-Peer Infrastructure for Querying Distributed Ontologies by Marie-Christine Rousset, Philippe Adjiman, Philippe Chatalic, François Goasdoué, Laurent Simon. Invited talk paper (talk given by Marie-Christine Rousset) in OTM Conferences 2006, Lecture Notes in Computer Science, volume 4275, pages 698-703, 2006.
In this invited talk, we present the SomeWhere approach and infrastructure for building semantic peer-to-peer data management systems based on simple personalized ontologies distributed at a large scale. Somewhere is based on a simple class-based data model in which the data is a set of resource identifiers (e.g., URIs), the schemas are (simple) definitions of classes possibly constrained by inclusion, disjunction or equivalence statements, and mappings are inclusion, disjunction or equivalence statements between classes of different peer ontologies. In this setting, query answering over peers can be done by distributed query rewriting, which can be equivalently reduced to distributed consequence finding in propositional logic. It is done by using the message-passing distributed algorithm that we have implemented for consequence finding of a clause w.r.t. a set of distributed propositional theories. We summarize its main properties (soundness, completeness and termination), and we report experiments showing that it already scales up to a thousand peers. Finally, we mention ongoing work on extending the current data model to RDF(S) and on handling possible inconsistencies between the ontologies of different peers.
The Semantic Web from an Industrial Perspective by Alain Léger, Johannes Heinecke, Lyndon J.B. Nixon, Pavel Shvaiko, Jean Charlet, Paola Hobson, and François Goasdoué. Tutorial paper for Reasoning Web Summer School, LNCS 4126, pages 232-268, Springer-Verlag, 2006.
Semantic Web technology is being increasingly applied in a large spectrum of applications in which domain knowledge is conceptualized and formalized (e.g., by means of an ontology) in order to support diversified and automated knowledge processing (e.g., reasoning) performed by a machine. Moreover, through an optimal combination of (cognitive) human reasoning and (automated) machine reasoning and processing, it is possible for humans and machines to share complementary tasks. The spectrum of applications is extremely large and to name a few: corporate portals and knowledge management, e-commerce, e-work, e-business, healthcare, e-government, natural language understanding and automated translation, information search, data and services integration, social networks and collaborative filtering, knowledge mining, business intelligence and so on. From a social and economic perspective, this emerging technology should contribute to growth in economic wealth, but it must also show clear cut value for everyday activities through technological transparency and efficiency. The penetration of Semantic Web technology in industry and in services is progressing slowly but accelerating as new success stories are reported. In this paper and lecture we present ongoing work in the cross-fertilization between industry and academia. In particular, we present a collection of application fields and use cases from enterprises which are interested in the promises of Semantic Web technology. The use cases are detailed and focused on the key knowledge processing components that will unlock the deployment of the technology in the selected application field. The paper ends with the presentation of the current technology roadmap designed by a team of Academic and Industry researchers.
SomeWhere in the Semantic Web by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. International Workshop on Principles and Practice of Semantic Web Reasoning, Lecture Notes in Computer Science, volume 3703, pages 1-16, 2005. Also in SOFSEM'06: Theory and Practice of Computer Science (a.k.a. International Conference on Current Trends in Theory and Practice of Computer Science), as an invited talk paper (talk given by Marie-Christine Rousset).
In this paper, we describe the SomeWhere semantic peer-to-peer data management system that promotes a "small is beautiful" vision of the Semantic Web based on simple personalized ontologies (e.g., taxonomies of classes) but which are distributed at a large scale. In this vision of the Semantic Web, no user imposes to others his own ontology. Logical mappings between ontologies make possible the creation of a web of people in which personalized semantic marking up of data cohabits nicely with a collaborative exchange of data. In this view, the Web is a huge peer-to-peer data management system based on simple distributed ontologies and mappings.
Scalability Study of Peer-to-Peer Consequence Finding by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. International Joint Conference on Artificial Intelligence (IJCAI), 2005.
In peer-to-peer inference systems, each peer can reason locally but also solicit some of its acquaintances, sharing part of its vocabulary. This paper studies both theoretically and experimentally the problem of computing proper prime implicates for propositional peer-to-peer systems, the global theory (union of all peer theories) of which is not known (as opposed to partition-based reasoning).
Answering Queries using Views: a KRDB Perspective for the Semantic Web by François Goasdoué and Marie-Christine Rousset. ACM Journal - Transactions on Internet Technology (TOIT), Volume 4, Issue 3, pages 255-288, 2004.
In this paper, we investigate a first step towards the long-term vision of the Semantic Web by studying the problem of answering queries posed through a mediated ontology to multiple information sources whose content is described as views over the ontology relations. The contributions of this paper are twofold. We first offer a uniform logical setting which allows us to encompass and to relate the existing work on answering and rewriting queries using views. In particular, we make clearer the connection between the problem of rewriting queries using views and the problem of answering queries using extensions of views. Then we focus on an instance of the problem of rewriting conjunctive queries using views through an ontology expressed in a description logic, for which we exhibit a complete algorithm.
Distributed Reasoning in a Peer-to-peer Setting by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. European Conference on Artificial Intelligence (ECAI'04), pages 945-946 (accepted as a short paper), 2004.
In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The contribution of this paper is twofold. We provide the first consequence finding algorithm in a peer-to-peer setting: it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. We also present first experimental results that are promising.
Distributed Reasoning in a Peer-to-Peer Environment by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. Proceedings of the tenth Journées Nationales sur la résolution Pratique de Problèmes NP-Complets (JNPC'04), pages 11-22, 2004.
In a peer-to-peer inference system, each peer can reason locally but can also solicit its neighborhood, made up of the peers with which it shares part of its vocabulary. In this article, we focus on peer-to-peer inference systems in which the theory of each peer is a set of propositional clauses built from a local vocabulary. An important characteristic of peer-to-peer systems is that the global theory (the union of the theories of all peers) is not known (as opposed to partition-based reasoning systems). The contribution of this article is twofold. We present the first implicate-finding algorithm in a peer-to-peer environment: it is anytime and computes implicates gradually, from the queried peer to increasingly distant peers. We state a sufficient condition on the acquaintance graph of the peer-to-peer inference system that guarantees the completeness of our algorithm. We also present some promising experimental results.
Information Integration by Mediation by François Goasdoué and Marie-Christine Rousset. Plein Sud Spécial Recherche, Université Paris-Sud XI, 2004.
Information integration is a recent and essential discipline of computer science. Its goal is to ease users' access to information scattered over networks (the Internet, intranets, etc.). The information-integration applications best known to the general public are certainly search engines (Google, Voila, Yahoo, etc.). An integration method called mediation goes beyond the services provided by these engines, for instance by enabling e-commerce portals able to integrate the data of several content providers (Kelkoo, etc.). This is the method we discuss here, presenting the work on this topic carried out in the Artificial Intelligence and Inference Systems group of the Laboratoire de Recherche en Informatique at Université Paris-Sud XI.
Querying Distributed Data through Distributed Ontologies: a Simple but Scalable Approach by François Goasdoué and Marie-Christine Rousset. IEEE Intelligent Systems, Volume 18, Issue 5, pages 60-65, 2003. Also in the international workshop Information Integration on the Web (IIWeb'03) of the International Joint Conference on Artificial Intelligence (IJCAI'03).
In this paper, we define a simple but scalable framework for peer-to-peer data sharing systems, in which the problem of answering queries over a network of semantically related peers is always decidable. Our approach is characterized by a simple class-based language for defining peer schemas as hierarchies of atomic classes, and mappings as inclusions of logical combinations of atomic classes. We provide an anytime and incremental method for computing all the certain answers to a query posed to a given peer such that the answers are ordered from the ones involving peers close to the queried peer to the ones involving more distant peers.
Building Mediators to Integrate Multiple and Heterogeneous Information Sources: the PICSEL Project by Alain Bidault, Christine Froidevaux, Hélène Gagliardi, François Goasdoué, Chantal Reynaud, Marie-Christine Rousset and Brigitte Safar. Journal I3: Information - Interaction - Intelligence, Volume 2, Number 1, pages 9-58, 2002.
The growing amount of data accessible through networks (intranets, the Internet, etc.) raises the problem of integrating pre-existing, often remote and heterogeneous information sources, so as to make them easy to query by a broad public. One of the first approaches proposed for information integration advocates the construction of mediators. A mediator acts as a query interface between a user and data sources: it gives the user the illusion of querying a homogeneous, centralized system, sparing them from having to find the data sources relevant to their query, query them one by one, and combine the obtained information themselves. The goal of this article is to present the PICSEL project, which offers a declarative environment for building mediators. PICSEL differs from existing information integration systems in that the mediator's schema can be expressed in the CARIN language, which combines the expressive power of a rule-based formalism with that of a class-based formalism (the description logic ALN). PICSEL includes a query refinement module, the first building block of a cooperative dialogue module between a mediator and its users.
Compilation and Approximation of Conjunctive Queries by Concept Descriptions by François Goasdoué and Marie-Christine Rousset. European Conference on Artificial Intelligence (ECAI'02), pages 267-271, 2002. Also in the international workshops Description Logics (DL'02) and Knowledge Representation meets DataBases (KRDB'02).
In this paper, we characterize the logical correspondence between conjunctive queries and concept descriptions. We exhibit a necessary and sufficient condition for the compilation of a conjunctive query into an equivalent ALE concept description. We provide a necessary and sufficient condition for the approximation of conjunctive queries by maximally subsumed ALN concept descriptions.
Rewriting Queries using Views in CARIN and Information Integration by François Goasdoué. PhD thesis (Thèse de doctorat), Université Paris-Sud XI, 2001.
The Use of CARIN Language and Algorithms for Information Integration: The PICSEL Project by François Goasdoué, Véronique Lattes, and Marie-Christine Rousset. International Journal of Cooperative Information Systems (IJCIS), World Scientific Publishing Company, Volume 9, Number 4, pages 383-401, 2000.
PICSEL is an information integration system over sources that are distributed and possibly heterogeneous. The approach which has been chosen in PICSEL is to define an information server as a knowledge-based mediator in which CARIN is used as the core logical formalism to represent both the domain of application and the contents of information sources relevant to that domain. In this paper, we describe the way the expressive power of the CARIN language is exploited in the PICSEL information integration system, while maintaining the decidability of query answering. We illustrate it on examples coming from the tourism domain, which is the first real case that we have to consider in PICSEL, in collaboration with the travel agency Degriftour.
Rewriting Conjunctive Queries using Views in Description Logics with Existential Restrictions by François Goasdoué and Marie-Christine Rousset. Description Logics (DL'00), pages 113-122, 2000.
In databases, rewriting queries using views has received significant attention because of its relevance to several fields such as query optimization, data warehousing, and information integration. In those settings, the data used to answer a query are restricted to be extensions of a set of predefined queries (views). The information integration context is typical of the need for rewriting queries using views to answer queries: users of information integration systems do not pose queries directly to the (possibly remote) sources in which data are stored, but to a set of virtual relations designed to provide uniform and homogeneous access to a domain of interest. When the contents of the sources are described as views over the (virtual) domain relations, the problem of reformulating a user query into a query that refers directly to the relevant sources becomes a problem of rewriting queries using views.
In this paper, we study the problem of rewriting conjunctive queries over DL expressions into conjunctive queries using a set of views that are a set of distinguished DL expressions, for three DLs allowing existential restrictions: FLE, ALE and ALEN. For FLE, we present an algorithm that computes a representative set of all the rewritings of a query. In the full version of this paper (cf. technical report), we show how to adapt it to deal with the negation of atomic concepts in the queries and in the views, in order to obtain a rewriting algorithm for ALE. Finally, we show that obtaining a finite set representative of all the rewritings of a query is not guaranteed in ALEN.
A Knowledge Based Approach for Information Integration: The PICSEL System by François Goasdoué. Declarative Data Access on the Web, Dagstuhl-Seminar-Report 251, page 7, 1999.
Nowadays, a large amount of data is reachable on the web. Data are stored in information sources that can be heterogeneous and distributed. Information integration provides many interesting approaches, such as mediation, to allow users to access these data. Mediation aims at building a mediator which acts as an interface between users and information sources, giving users the illusion of querying a homogeneous and centralized system. To do this, a mediator provides users with a unique query language and a vocabulary drawn from a semantic description (ontology) of a particular application domain, which are used to formulate queries.
Here, we present our knowledge-based mediator: the PICSEL system. Its main characteristic is an integration of information sources fully driven by the semantic description of an application domain, and by a semantic description of the integrated sources consisting in (i) one-to-one mappings between sources and domain relations (semantic views), and (ii) semantic constraints over those views.
Moreover, since XML is emerging as a new standard for web documents, we show how easy it is to integrate such documents in PICSEL. First, we show how the semantics of an XML document schema (DTD) can be captured using the vocabulary of the application domain's semantic description. Then, we present a generic way to connect a mediator to an XML repository. In PICSEL, it consists in building a generic wrapper which translates a PICSEL query into an X-OQL query (X-OQL is an XML query language). This generic wrapper is not a traditional one, i.e., a fixed set of predefined queries; rather, it is a black box which dynamically generates, for any PICSEL query, the right X-OQL query.
Modeling Information Sources for Information Integration by François Goasdoué and Chantal Reynaud. International Conference on Knowledge Engineering and Knowledge Management (EKAW'99), pages 121-138, Lecture Notes in AI 1621, Springer-Verlag, 1999.
The aim of this paper is to present an approach and automated tools for designing knowledge bases describing the contents of information sources in PICSEL knowledge-based mediators. We address two problems: the abstraction problem and the representation problem, when information sources are relational databases. In a first part, we present an architectural overview of the PICSEL mediator showing its main knowledge components. Then, we describe our approach and tools that we have implemented (1) to identify, by means of an abstraction process, the main relevant concepts, called semantic concepts, in an Entity Relationship model and (2) to help representing these concepts using CARIN, a logical language combining description logics and Datalog Rules, and using specific terms in the application domain model.
Support for the Design of Knowledge Bases for the PICSEL Mediator by François Goasdoué. Research Master's (DEA) report, Université Paris-Sud XI, 1998.