Paper Title

Enhancing Virtual Ontology Based Access over Tabular Data with Morph-CSV

Authors

David Chaves-Fraga, Edna Ruckhaus, Freddy Priyatna, Maria-Esther Vidal, Oscar Corcho

Abstract

Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets, either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented; thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, the efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work focuses on applying implicit constraints to the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints, and that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with a benchmark using the GTFS dataset from the Madrid subway; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, with and without the incorporation of Morph-CSV. The observed results suggest that Morph-CSV is able to speed up the total query execution time by up to two orders of magnitude, while still producing all the query answers.
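
The contrast the abstract draws, between loading each CSV file as a bare table and loading it with constraints enforced, can be illustrated with a minimal sketch. This is not Morph-CSV's implementation: SQLite stands in for the RDBMS, and the GTFS-like tables, columns, data, and helper functions (`load_naive`, `load_with_constraints`, `run`) are hypothetical names invented for the example. It only shows that the constrained load gives the SQL engine keys and an index on the join column, which the naive load lacks.

```python
import csv
import io
import sqlite3

# Hypothetical GTFS-like CSV contents (invented for illustration).
ROUTES_CSV = "route_id,route_name\nR1,Line 1\nR2,Line 2\n"
TRIPS_CSV = "trip_id,route_id\nT1,R1\nT2,R1\nT3,R2\n"

# The kind of join a SPARQL-to-SQL engine might generate (assumed shape).
QUERY = (
    "SELECT t.trip_id FROM trips t "
    "JOIN routes r ON t.route_id = r.route_id "
    "WHERE r.route_name = 'Line 1' ORDER BY t.trip_id"
)


def read_rows(text):
    """Parse CSV text into a header list and a list of row tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, [tuple(row) for row in reader]


def load_naive(conn):
    """Load each CSV as an untyped table with no keys or indexes."""
    for name, text in (("routes", ROUTES_CSV), ("trips", TRIPS_CSV)):
        header, rows = read_rows(text)
        conn.execute(f"CREATE TABLE {name} ({', '.join(header)})")
        marks = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)


def load_with_constraints(conn):
    """Load the same data with types, keys, and an index on the join column."""
    conn.execute(
        "CREATE TABLE routes ("
        "route_id TEXT PRIMARY KEY, route_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE trips ("
        "trip_id TEXT PRIMARY KEY, "
        "route_id TEXT NOT NULL REFERENCES routes(route_id))"
    )
    conn.execute("CREATE INDEX idx_trips_route ON trips(route_id)")
    conn.executemany("INSERT INTO routes VALUES (?, ?)", read_rows(ROUTES_CSV)[1])
    conn.executemany("INSERT INTO trips VALUES (?, ?)", read_rows(TRIPS_CSV)[1])


def run(loader):
    """Load the data with the given loader; return query answers and index names."""
    conn = sqlite3.connect(":memory:")
    loader(conn)
    answers = [row[0] for row in conn.execute(QUERY)]
    indexes = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    ]
    conn.close()
    return answers, indexes


if __name__ == "__main__":
    print("naive:      ", run(load_naive))
    print("constrained:", run(load_with_constraints))
```

On data this small both loads return the same answers, so the sketch says nothing about the completeness or speed-up results reported in the abstract; it only makes concrete what "constraints are not represented" means for the tables the SQL engine sees.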
