A Mapping-based Method to Query MongoDB Documents with SPARQL

Exposing legacy data as RDF has been the object of much research in recent years, usually following two approaches: either by materialization, i.e. translation of all legacy data into an RDF graph at once, or based on on-the-fly translation of SPARQL queries into the target query language. Materialization is often difficult in practice for big datasets, and costly when data freshness is at stake. Several methods have been proposed to achieve SPARQL access to relational data, either in the context of RDB-backed RDF stores [8,21,11] or using arbitrary relational schemas [4,23,17,18]. R2RML [9], the W3C RDB-to-RDF mapping language recommendation, is now a well-accepted standard, and several SPARQL-to-SQL rewriting approaches hinge upon it [23,17,19]. Other solutions intend to map XML [3,2] or CSV data to RDF. RML [10] tackles the mapping of heterogeneous data formats such as CSV/TSV, XML and JSON. xR2RML [14] is an extension of R2RML and RML addressing the mapping of an extensible scope of databases to RDF. Regarding MongoDB specifically, Tomaszuk proposed a solution to use MongoDB as an RDF triple store [22]. The translation of SPARQL queries that he proposed is closely tied to the data schema and does not fit arbitrary documents. MongoGraph is an extension of the AllegroGraph triple store to query arbitrary MongoDB documents with SPARQL. Similarly to the Direct Mapping [1], the approach comes up with an ad-hoc ontology (e.g. each JSON field name is turned into a predicate) and hardly supports the reuse of existing ontologies. More in line with our work, Botoeva et al. recently proposed a generalization of the OBDA principles to MongoDB [6]. They describe a two-step rewriting process of SPARQL queries into the MongoDB aggregate query language. In the last section we analyse in further detail the relationship between their approach and ours.

Mapping-based SPARQL access to a MongoDB database

Related works. Much work has been done during the last decade to expose legacy data as RDF, in which two approaches generally apply: either the RDF graph is materialized by translating the data into RDF and loading it into a triple store (in an ETL – Extract, Transform, Load – manner), or the raw data is left unchanged and a query language such as SPARQL is used to access the virtual RDF graph through query rewriting techniques. While materializing the RDF graph can be needed in some contexts, it is often impossible in practice due to the size of the generated graphs, and not desirable when data freshness is at stake. Several methods have been proposed to achieve SPARQL access to relational data, either in the context of RDF stores backed by RDBs [5,18,8] or using arbitrary relational schemas [3,20,15,16]. R2RML [6], the W3C RDB-to-RDF mapping language recommendation, is now a well-accepted standard, and various SPARQL-to-SQL rewriting approaches rely on it [20,15,16]. Other solutions intend to map XML data to RDF [2,1], and the CSV on the Web W3C working group makes a recommendation for the description of and access to CSV data on the Web. RML [7] is an extension of R2RML that tackles the mapping of data sources with heterogeneous data formats such as CSV/TSV, XML or JSON. The xR2RML mapping language [12] is an extension of R2RML and RML addressing the mapping of a large and extensible scope of non-relational databases to RDF. Some works propose to use MongoDB as an RDF triple store and, in this context, design a method to translate SPARQL queries into MongoDB queries [22]. MongoGraph is an extension of AllegroGraph to query MongoDB documents with SPARQL queries. It follows an approach very similar to the Direct Mapping approach defined in the context of RDBs [19]: each field of a MongoDB JSON document is translated into an ad-hoc predicate, and a mapping links MongoDB document identifiers with URIs. SPARQL queries use the specific find predicate to tell the SPARQL engine to query MongoDB. Despite those approaches, to the best of our knowledge, no work has yet been proposed to translate a SPARQL query into the MongoDB query language and map arbitrary MongoDB documents to RDF.

A Generic Mapping-Based Query Translation from SPARQL to Various Target Database Query Languages

<#Mbox> must be projected, hence the Project part for tp1: {$.id AS ?x, $.emails.* AS ?mbox1}. Furthermore, the child and parent joined references of a referencing object map must be projected in order to fit databases that do not support joins. In the relational database case, those projected references (columns) are useless since the database can compute the join operation. Conversely, in MongoDB for instance, the join shall be processed by the query processing engine, therefore joined references are necessary.
- Where is a set of conditions about xR2RML references. They are entailed by matching each term of triple pattern tp with its corresponding term map in triples map TM: the subject of tp is matched with TM's subject map, the predicate with TM's predicate map and the object with TM's object map. Additional conditions are entailed from the SPARQL filter f. In (Michel et al., 2015b) we show that three types of condition may be created:
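To make this concrete, here is a minimal sketch, in Python with pymongo, of how the projected references and not-null conditions of tp1 might become a MongoDB query. The "people" collection, the trivial JSONPath handling and the jsonpath_to_field helper are illustrative assumptions, not the paper's actual translation algorithm.

```python
# A sketch of turning the Project and Where parts computed for tp1 into a
# MongoDB find() call; names and JSONPath handling are assumptions.
from pymongo import MongoClient

def jsonpath_to_field(ref):
    # Strip the leading "$." and a trailing ".*" from a simple JSONPath
    # reference; only this trivial subset is handled in the sketch.
    field = ref[2:] if ref.startswith("$.") else ref
    return field[:-2] if field.endswith(".*") else field

# Project part for tp1: {$.id AS ?x, $.emails.* AS ?mbox1}
project = {"?x": "$.id", "?mbox1": "$.emails.*"}

# Projection document for the projected references...
projection = {jsonpath_to_field(r): 1 for r in project.values()}
# ...and not-null conditions on the matched references (Where part):
where = {jsonpath_to_field(r): {"$exists": True, "$ne": None}
         for r in project.values()}

docs = MongoClient().mydb.people.find(where, projection)
```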

Bridging the Semantic Web and NoSQL Worlds: Generic SPARQL Query Translation and Application to MongoDB

terms of a SPARQL triple pattern and R2RML mappings, which enables managing variable predicates. Furthermore, to address the second limitation, they pre-check join constraints implied by shared variables in order to reduce the number of candidate mappings for each triple pattern. Yet again, two limitations can be noticed: (iii) References between R2RML mappings are not considered, hence joins implied by shared variables are dealt with but joins declared in the R2RML mapping graph are ignored. (iv) The rewriting process associates each part of a mapping with a set of columns, called a column group, which enables filter, join and data type compatibility checks. This leverages SQL capabilities (CASE, CAST, string concatenation, etc.), making it hardly applicable outside the scope of SQL-based systems. In the three aforementioned approaches, the optimization is dependent on the target database language, and can hardly be generalized. In our attempt to rewrite SPARQL queries in the general case, such optimizations are performed earlier, regardless of the target database capabilities.

Predicting SPARQL Query Performance and Explaining Linked Data

Recent work on predicting database query performance [2, 13, 14] has argued that the analytical cost models used by the current generation of query optimizers are good for comparing alternative query plans, but ineffective for predicting actual query performance metrics such as query execution time. Analytical cost models are unable to capture the complexities of modern database systems [2]. To address this, database researchers have experimented with machine learning techniques to learn query performance metrics. Ganapathi et al. [13] use Kernel Canonical Correlation Analysis (KCCA) to predict a set of performance metrics. For the individual query elapsed time metric, they were able to predict within 20% of the actual elapsed time for 85% of the test queries. Gupta et al. [14] use machine learning to predict query execution time ranges on a data warehouse and achieve an accuracy of 80%. Akdere et al. [2] study the effectiveness of machine learning techniques for predicting query latency of static and dynamic workload scenarios. They argue that query performance prediction using machine learning is both feasible and effective.
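As an illustration of this learning-based approach, the sketch below trains a regressor on plan-level features to predict elapsed time and applies the within-20% criterion used by Ganapathi et al. Note that it substitutes a generic gradient-boosting regressor for KCCA, and the features and all numbers are invented for the example.

```python
# A sketch of learning-based query time prediction; a generic regressor
# stands in for KCCA, and every value below is an assumption.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical features per query: [n_joins, n_scans, est_rows, n_filters]
X_train = np.array([[2, 3, 10_000, 1],
                    [0, 1, 500, 2],
                    [4, 5, 250_000, 0]])
y_train = np.array([1.8, 0.05, 12.4])  # observed elapsed times in seconds

model = GradientBoostingRegressor().fit(X_train, y_train)
predicted = model.predict(np.array([[1, 2, 2_000, 1]]))[0]

# Ganapathi et al.'s success criterion: the prediction falls within 20%
# of the actual elapsed time.
actual = 0.30  # hypothetical measured time
print(abs(predicted - actual) <= 0.2 * actual)
```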

SPARQL query processing with conventional relational database systems

Additionally, the literal hash values are modified by applying the bitwise exclusive-or operator to bit ranges of the literal hash and the integer identifier of the datatype and language. This ensures that the hash value for an integer literal "100" is distinct from the float, string, etc. literals with the same lexical form, "100". This stage is not strictly necessary, but it facilitates certain comparisons and joins we wish to do in the query language, and enables some optimisations.
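A minimal sketch of this scheme, assuming a 64-bit hash of the lexical form and an invented datatype identifier table; the paper's exact hash function and bit ranges may differ.

```python
# Perturb the hash of a literal's lexical form with its datatype identifier,
# so literals sharing a lexical form but differing in type hash differently.
import hashlib

DATATYPE_IDS = {"xsd:string": 1, "xsd:integer": 2, "xsd:float": 3}  # assumed

def literal_hash(lexical_form: str, datatype: str) -> int:
    # 64-bit hash of the lexical form alone...
    h = int.from_bytes(hashlib.sha1(lexical_form.encode()).digest()[:8], "big")
    # ...then XOR a bit range with the datatype identifier, so "100" as an
    # integer, a float and a string all hash to distinct values.
    return h ^ DATATYPE_IDS[datatype]

assert literal_hash("100", "xsd:integer") != literal_hash("100", "xsd:float")
```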


Query Recommender System using Fragments and Projection values Based F-measure and MongoDB

ABSTRACT: A key challenge in getting the intended output from a data warehouse is query preparation. Even for an expert user, searching the available data for information is a tough task. This project primarily aims to provide the easiest way to search for and recommend information according to users' needs. It creates recommendations for systems that are based on an SQL query interface and that assume users are unaware of the actual schema. It therefore provides them with information about the schema and its attributes for search, and it offers options for all conditions a user may want to apply to filter the available data. The search works effectively because every cell is identified with a specific value, which also enhances the accuracy and precision of the recommendation results. The system has been designed with attention to the time required to generate a query recommendation and the associated computing complexity. This enables users to search data with less effort, simply by selecting the available options in a drop-down list. The list contains all the possible options for each condition and ensures no result is missed. The system is implemented with the MongoDB database, which works on unstructured data, and its results are compared with a system implemented with an SQL database, a structured database system.

SPARQL Query Optimization for Structural Indexed RDF Data

RDF-3X [16] stores all triples in a single table with compressed indexes of clustered B+-trees. The table is maintained in all six possible permutations of subject (S), predicate (P) and object (O). With sophisticated join planning and fast merge joins, the RDF-3X approach can perform a single index scan and then start processing from any literal/URI position in the pattern. However, this approach creates redundant indexes, and when the size of the index is comparable to that of the data source, the increase in data storage requirements can be significant. The authors optimize join orderings and use an efficient query plan with a dedicated cost model, which improves the selectivity estimation accuracy for joins on very large RDF graphs. However, indexing and processing queries against a whole data source still requires many join operations. As the RDF data grows, join operations produce many duplicated and useless intermediate results, increasing query response time.
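The toy sketch below illustrates the exhaustive-permutation idea: each triple is stored under all six (S,P,O) orders, so any triple pattern is answered by one prefix range scan on the permutation matching its bound positions. Real RDF-3X uses compressed clustered B+-trees; the in-memory sorted lists here are only an illustration.

```python
# Keep every triple under all six (S,P,O) permutations as sorted lists and
# answer a triple pattern with a single prefix range scan.
from bisect import bisect_left

triples = {("alice", "knows", "bob"), ("bob", "knows", "carol")}
ORDERS = ("spo", "sop", "pso", "pos", "osp", "ops")
indexes = {o: sorted(tuple(t["spo".index(c)] for c in o) for t in triples)
           for o in ORDERS}

def scan(order, prefix):
    # Find the first entry >= prefix, then collect entries sharing it.
    idx = indexes[order]
    out = []
    for entry in idx[bisect_left(idx, prefix):]:
        if entry[:len(prefix)] != prefix:
            break
        out.append(entry)
    return out

# Pattern (?s, knows, ?o): one range scan on the P-S-O permutation.
print(scan("pso", ("knows",)))
```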

SPARQL++ for mapping between RDF vocabularies.

As already mentioned in the introduction, other approaches often allow only mappings at the ontology level or deploy their own rules language such as SWRL [19] or WRL [8]. A language more specific to ontology mapping is C-OWL [3], which extends OWL with bridge rules to relate ontological entities. C-OWL is a formalism close to distributed description logics [2]. These approaches partially cover aspects which we cannot handle, e.g., equating instances using owl:sameAs in SWRL or relating ontologies based on a local model semantics [17] in C-OWL. None of these approaches, though, offers aggregations, which are often useful in practical applications of

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints

This paper overviews the participation of Scalewelis in the QALD-3 open challenge. Scalewelis is a Faceted Search system. Faceted Search systems refine the result set at each navigation step. In Scalewelis, refinements are syntactic operations that modify the user query. Scalewelis uses the Semantic Web standards (URI, RDF, SPARQL) and connects to SPARQL endpoints.


SPARQL Query Rewriting with Paths [Master's Thesis]

To reach our purpose, we present in the rest of this document a case study of the possible SPARQL fragments, and we deal with each fragment separately. We finally combine our results into one global transformation function. We found out that not all fragments of SPARQL can be rewritten without non-distinguished variables while preserving the meaning of the query. For this reason, we define constraints on the queried dataset that allow our proposed transformation rules to become sound and complete when such constraints are met.


Towards a Visual SPARQL-DL Query Builder

Aiming at integrating SPARQL-DL in an ontology engineering environment, this paper introduces a graphical language that preserves similarities to UML for creating visual SPARQL-DL queries. Furthermore, it presents a modification of the crowd architecture that allows visual SPARQL-DL queries. Implementing a query language in an already developed tool for creating ontologies via visual languages is not only the next logical step, but also a relevant one, as it helps ontology engineers understand and study their ontologies and create the queries their users require to satisfy their information needs. Moreover, it is possible to provide more reasoning services, like anti-pattern-based searches, in order to look for modelling issues in the ontology by obtaining a subset of the input concepts and rules and displaying them visually. However, to the best of our knowledge, no available tool supports both scenarios, design and query, in a graphical way together with an implementation of a SPARQL-DL-based visual language.

A Framework to Process Range Aggregate Query Using MongoDB

ABSTRACT: Range searching is a fairly well-structured problem in computational geometry. Big Data deals with a class of problems called range aggregate query problems, where the aim is to handle composite queries involving range searching, in which one needs to do more than simple range reporting or counting. A range query applies an aggregate function over all selected cells of an OLAP data cube. The essential idea is to precompute some auxiliary information that is used to answer ad hoc queries at runtime. In order to analyze and process range aggregate queries, a framework called M-AQ is proposed in this paper. Existing approaches dealt only with ad hoc queries, and the results they yielded were not satisfactory. Here, M-AQ is implemented on a Linux platform and its performance is evaluated on very large park data records. M-AQ supports range queries and also runs multiple servers. When a range aggregate query arrives, it is split based on the Balanced Partitioning algorithm and distributed across multiple shards (a shard is nothing but a master with one or more slaves). Queries here return specific fields of documents and also include user-defined JavaScript functions. JavaScript is used in queries and aggregate functions (such as MapReduce) and is sent directly to MongoDB to be executed. M-AQ has O(1) time complexity for data updates, and a time complexity for range aggregate queries expressed in terms of N, the number of unique tuples, P, the partition number, and B, the number of buckets in each histogram. The M-AQ framework thereby reduces the cost of both network communication and local file scanning, and has better performance compared to Hive.
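For illustration, here is a minimal sketch of a range aggregate query issued to MongoDB through pymongo. The park collection and its fields are hypothetical, and balanced partitioning across shards is handled by MongoDB's own sharding once a shard key is configured rather than reproduced here.

```python
# A range aggregate query: match a range of cells, then aggregate over them.
from pymongo import MongoClient

coll = MongoClient().analytics.park
pipeline = [
    {"$match": {"zone": {"$gte": 10, "$lt": 20}}},              # range predicate
    {"$group": {"_id": None, "total": {"$sum": "$visitors"}}},  # aggregate
]
print(list(coll.aggregate(pipeline)))
```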

Natural language query interpretation into SPARQL using patterns

4.1 Justification. The main postulate directing our work states that, in real applications, the submitted queries are variations of a few typical query families. The authors of [18] analyse query logs from English-speaking users of a real Web search engine and discuss their results in light of previous similar studies [7,17]. Although first observations tend to contradict our hypothesis, their conclusions reinforce our decision to retain it. The authors first point out that the vocabulary (and so potentially, for what matters to us, the semantics) of queries is highly varied. On the query set they analysed, comprising 926,877 queries containing at least one keyword, of the 140,279 unique terms, some 57.1% were used only once, 14.5% twice, and 6.7% three times. This is a much higher rate of very rarely used terms compared to "classical" text resources. These figures must be moderated, since this phenomenon is partially caused by a high number of spelling errors, terms in languages other than English, and Web-specific terms, such as URLs.

Parallel SPARQL Query Execution using Apache Spark

Recently, several large-scale graph processing frameworks have emerged for graph analytics in a shared-nothing cluster (e.g., Pregel [12], GraphLab [13], GraphX [14], Flink [15]). These frameworks can handle graphs with billions of vertices and edges. Popular computations on such graphs (e.g., PageRank) tend to be iterative in nature. For example, in the PageRank algorithm, an iterative computation is performed on each vertex in a graph by gathering state from neighbors of the vertex. Popular data-parallel systems built around the MapReduce model [16] (e.g., Apache Hadoop [17]) tend to perform poorly on such large-scale graph computations. Apache Spark and GraphX combine the data-parallel and graph-parallel models of computation within a single ecosystem. Moreover, Spark has gained a lot of traction in the industry due to its in-memory data processing capability. Spark has been benchmarked against Hadoop's MapReduce and ran 100 times faster [18]. Spark's SQL engine also performed well against tools like Apache Hive, Impala, Amazon Redshift, and Hortonworks' Hive/Tez [19]. Therefore, we explore how Spark (and GraphX) can be used to enable parallel SPARQL query processing. Our goal is to handle massive RDF datasets with over a trillion RDF triples using an enterprise-grade cluster.
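The sketch below shows the core idea: a two-pattern SPARQL basic graph pattern evaluated as a join over triples loaded into Spark DataFrames. The sample data and names are assumptions; a real engine adds a query planner, GraphX-based operators and far larger inputs.

```python
# Evaluate the BGP { ?x knows ?z . ?z worksAt ?y } as a DataFrame join.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bgp-join").getOrCreate()
triples = spark.createDataFrame(
    [("alice", "knows", "bob"), ("bob", "worksAt", "acme")],
    ["s", "p", "o"])

# One filtered relation per triple pattern, renamed to the pattern variables.
tp1 = triples.filter(triples.p == "knows").selectExpr("s AS x", "o AS z")
tp2 = triples.filter(triples.p == "worksAt").selectExpr("s AS z", "o AS y")

# The shared variable ?z becomes the join key; Spark parallelizes the join.
tp1.join(tp2, on="z").select("x", "y").show()
```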

Scalable Query-based Faceted Search on top of SPARQL Endpoints for Guided and Expressive Semantic Search

Documents are generally searched by keywords. However, the Web of Data [12] formalizes data in order to let machines understand and retrieve precise information. Every concept in the Web of Data is identified by a URI. Information is structured as basic sentences called triples. Each triple contains a subject, a predicate and an object. SPARQL [9,1] is the standard query language for querying this structured information. SPARQL is expressive and its syntax is similar to SQL. However, SPARQL queries are not intended to be written by casual users. A number of works have adopted keyword search for the Web of Data [6,15], but the Web of Data can be explored in a more user-friendly way. On the one hand, the query language can be more natural but still formal. A usability study [14] compared several search systems from the least formal to the most formal. On the other hand, faceted search [21] allows a progressive search of information in which users explore the result set through successive refinements.
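As a small illustration of the triple model and a SPARQL query over it (using rdflib, with made-up URIs):

```python
# Build a one-triple graph and query it with SPARQL.
from rdflib import Graph, URIRef

g = Graph()
ex = "http://example.org/"
g.add((URIRef(ex + "alice"), URIRef(ex + "knows"), URIRef(ex + "bob")))

q = "SELECT ?who WHERE { <http://example.org/alice> <http://example.org/knows> ?who }"
for row in g.query(q):
    print(row.who)  # -> http://example.org/bob
```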

The SPARQL usage for mapping maintenance and reuse methodology

This lab-based experiment tests the expressivity of the SPARQL Centric Mapping Representation (SCMR) in two parts. The first part is concerned with the expression of vocabulary transformation category mappings in SPARQL, and the second part is concerned with the encoding of SPARQL-based vocabulary transformation category mappings in RDF. This involves testing the ability of SPARQL CONSTRUCT queries to represent vocabulary transformation category mappings and of the SPIN SPARQL syntax to encode those SPARQL CONSTRUCT queries in RDF, two technologies that the SCMR uses. While the SCMR also represents interlink category mappings, this experiment is not concerned with evaluating its expressivity for this category. Representing interlinks is straightforward, as an interlink typically consists of a single RDF triple. Vocabulary transformation category mappings, on the other hand, are more complex, as they are executable mappings that are used to transform data between the vocabularies of Linked Data datasets. Vocabulary transformation category mappings consist of two or more RDF triples (at least one triple each related to the source and target of the mapping) which specify how instance data is transformed from the vocabulary of one Linked Data dataset to another. Vocabulary transformation category mappings can also contain data transformations such as converting miles to kilometres or concatenating a first name and a last name. Therefore, since vocabulary transformation category mappings consist of two or more RDF triples, and interlinks consist of a single triple, if the SCMR can sufficiently represent vocabulary transformation category mappings then it can sufficiently represent interlink category mappings.
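To give a flavour of such a mapping, here is a minimal sketch, using rdflib and invented vocabularies, of a SPARQL CONSTRUCT query performing a name-concatenation transformation; the SCMR would additionally encode the query itself in RDF with the SPIN syntax, which is not shown.

```python
# A vocabulary transformation as a CONSTRUCT query with a data transformation.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix src: <http://example.org/src#> .
src:p1 src:firstName "Ada" ; src:lastName "Lovelace" .
""", format="turtle")

mapping = """
PREFIX src: <http://example.org/src#>
PREFIX tgt: <http://example.org/tgt#>
CONSTRUCT { ?p tgt:fullName ?name }
WHERE {
  ?p src:firstName ?f ; src:lastName ?l .
  BIND(CONCAT(?f, " ", ?l) AS ?name)
}
"""
target = g.query(mapping).graph  # the transformed, target-vocabulary triples
print(target.serialize(format="turtle"))
```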

Query Response TIME Comparison Nosqldb Mongodb with Sqldb Oracle

There are several steps (an execution plan) in select operations on Oracle. First, the select statement is executed. If the select is based on certain criteria (a where clause), Oracle accesses the selected table with a full table access. The execution plan for the select statement filters based on the criteria. Select operations on tables that have relationships to other tables are executed by accessing the table by index rowid and then scanning the unique index (primary key). A filter is also applied to foreign keys (primary keys from other tables) that have non-null values, and to the specific criteria. Select operations in MongoDB are more flexible due to the use of a dynamic schema and documents
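A small illustration of the contrast drawn above: the same filtered selection in SQL and as a MongoDB find(). The collection, table and field names are made up.

```python
# SQL equivalent (conceptually): SELECT name, city FROM customers WHERE age > 30
from pymongo import MongoClient

coll = MongoClient().shop.customers
cursor = coll.find(
    {"age": {"$gt": 30}},               # where-clause criteria
    {"name": 1, "city": 1, "_id": 0})   # projected fields
for doc in cursor:
    print(doc)  # documents may differ in shape: dynamic schema
```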

Multilingual SPARQL Query Generation Using Lexico-Syntactic Patterns

The creation of our QA system was done in three steps. First, the AMAL system was created, aimed at answering simple questions with the help of some lexical patterns and without real support for multiple languages, being able to answer questions only in French. Second, the AMAL system led to LAMA, a more complex QA system that incorporates multilingual support by allowing questions in both French and English and improves various components of the original system. Finally, LAMA was enhanced by adding a dependency parsing module and a pattern-based processing step to its pipeline, introducing lexico-syntactic patterns. The presented patterns are separated into two different groups: dependency-based and POS-based ones. Each pattern is presented with its corresponding generated SPARQL query. An evaluation was done over two different datasets for each pattern type separately, as well as for a combination of both pattern types. Results show that the impact of using patterns in our QA system is noticeable, especially on the SQA dataset, where an improvement of 0.28 in F-score can be observed (from 0.535 to 0.816). Experiments on the QALD dataset show a smaller impact of 0.061, mostly due to the type of questions in the dataset. In addition to this, an analysis was done to compute the relative frequency of each pattern in order to determine whether the patterns were really representative enough to be considered.

Evaluation of SPARQL query generation from natural language questions

they generated. No gold standard was prepared—the authors examined each query and determined whether or not it accurately represented the original natural language question. (Yahya et al., 2012) used two human judges to manually examine the output of their system at three points—disambiguation, SPARQL query construction, and the answers returned. If the judges disagreed, a third judge examined the output. (McCarthy et al., 2012) does not have a formal evaluation, but rather gives two examples of the output of the SPARQL Assist system. (This is not a system for query generation from natural language questions per se, but rather an application for assisting in query construction through methods like auto-completion suggestions.) (Unger et al., 2012) is evaluated on the basis of a gold standard of answers from a static data set. It is not clear how (Lopez et al., 2007) is evaluated, although they give a nice classification of error types. Reviewing this body of work, the trends that have characterized most past work are that either systems are not formally evaluated, or they are evaluated in a functional, black-box fashion, examining the mapping between inputs and one of two types of outputs—either the SPARQL queries themselves, or the answers returned by the SPARQL queries. The significance of the work reported here is that it attempts to develop a unified methodology for evaluating systems for SPARQL query generation from natural language questions that meets a variety of desiderata for such a methodology and that is generalizable to other systems besides our own. In the development of our system for SPARQL query generation from natural language questions, it became clear that we needed a robust approach to system evaluation. The approach needed to meet a number of desiderata:
