Much work has been achieved during the last decade to expose legacy data as RDF, and two approaches generally apply: either the RDF graph is materialized by translating the data into RDF and loading it into a triple store (in an ETL, Extract-Transform-Load, manner), or the raw data is left unchanged and a query language such as SPARQL is used to access the virtual RDF graph through query rewriting techniques. While materializing the RDF graph can be needed in some contexts, it is often impossible in practice due to the size of the generated graphs, and not desirable when data freshness is at stake. Several methods have been proposed to achieve SPARQL access to relational data, either in the context of RDF stores backed by RDBs [5,18,8] or using arbitrary relational schemas [3,20,15,16]. R2RML, the W3C RDB-to-RDF mapping language recommendation, is now a well-accepted standard, and various SPARQL-to-SQL rewriting approaches rely on it [20,15,16]. Other solutions intend to map XML data to RDF [2,1], and the CSV on the Web W3C working group makes a recommendation for the description of and access to CSV data on the Web. RML is an extension of R2RML that tackles the mapping of data sources with heterogeneous data formats such as CSV/TSV, XML or JSON. The xR2RML mapping language extends R2RML and RML to address the mapping of a large and extensible scope of non-relational databases to RDF. Some works have proposed to use MongoDB as an RDF triple store and, in this context, have designed a method to translate SPARQL queries into MongoDB queries. MongoGraph is an extension of AllegroGraph to query MongoDB documents with SPARQL queries. It follows an approach very similar to the Direct Mapping approach defined in the context of RDBs: each field of a MongoDB JSON document is translated into an ad-hoc predicate, and a mapping links MongoDB document identifiers with URIs.
SPARQL queries use the specific find predicate to tell the SPARQL engine to query MongoDB. Beyond these approaches, to the best of our knowledge, no work has yet been proposed to translate a SPARQL query into the MongoDB query language and map arbitrary MongoDB documents to RDF.
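A Direct-Mapping-style translation of MongoDB documents, in the spirit of the approach described above, can be sketched as follows. This is a minimal illustration, not MongoGraph's actual mechanism: the base URI and the field-to-predicate naming scheme are assumptions.

```python
# Sketch of a Direct-Mapping-style translation of a MongoDB JSON document
# into RDF-like triples: each field becomes an ad-hoc predicate, and the
# document identifier is linked to a URI. URI scheme is illustrative.

BASE = "http://example.org/"  # assumed base URI

def document_to_triples(doc, collection):
    """Translate one MongoDB document (a dict) into a list of triples."""
    subject = f"{BASE}{collection}/{doc['_id']}"
    triples = []
    for field, value in doc.items():
        if field == "_id":
            continue
        predicate = f"{BASE}predicate/{field}"  # one ad-hoc predicate per field
        triples.append((subject, predicate, value))
    return triples

doc = {"_id": "42", "name": "Alice", "age": 30}
triples = document_to_triples(doc, "users")
```

A nested document would require recursing into embedded fields, which the real approaches handle through their mapping languages.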
NoSQL datastores are emerging non-relational databases designed to provide high scalability for database operations over several servers. These platforms are getting increasing attention from companies and organizations for the ease and efficiency with which they handle high volumes of heterogeneous and even unstructured data. Although NoSQL datastores can handle high volumes of personal and sensitive information, up to now the majority of these systems provide poor privacy and security protection. Initial research contributions have started studying these issues, but they have mainly targeted security aspects. To the best of our knowledge, no work targets privacy-aware access control for NoSQL systems, but we believe that privacy policies can be enforced in NoSQL database systems similarly to what has been proposed for relational database management systems. With this work, we begin to address this issue by proposing an approach that integrates purpose-based access control capabilities into MongoDB, one of the most popular NoSQL datastores. However, unlike relational databases, where all existing systems refer to the same data model and query language, NoSQL datastores operate with various languages and data models. This variety makes the definition of a general approach to privacy-aware access control in NoSQL datastores a very important goal. We believe a stepwise approach is necessary to define such a general solution. As such, we focus here on: 1) a single datastore, and 2) selected rules for privacy policies. We approach the problem by focusing on MongoDB, which, according to the DB-Engines Ranking, ranks, by far, as the most popular NoSQL datastore. MongoDB uses a document-oriented data model: data are modelled as documents, namely records with a possibly complex inner structure, which are grouped into collections stored in a database.
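As a first intuition of how purpose-based policies could be enforced in a document store, a user's query can be rewritten so that it only matches documents whose declared purposes include the access purpose of the request. The sketch below is only an illustration of that idea: the `allowed_purposes` metadata field, the tiny filter evaluator, and the enforcement point are all assumptions, not the actual mechanism of MongoDB or of any published system.

```python
# Minimal sketch of purpose-based access control by query rewriting:
# the user's MongoDB-style filter is AND-ed with a condition on an assumed
# 'allowed_purposes' metadata field stored in each document.

def rewrite_with_purpose(user_filter, access_purpose):
    """Return a filter that additionally requires the declared purpose."""
    return {"$and": [user_filter, {"allowed_purposes": access_purpose}]}

def matches(doc, flt):
    """Tiny evaluator for the small subset of filters used in this sketch."""
    if not flt:
        return True  # empty filter matches every document
    if "$and" in flt:
        return all(matches(doc, f) for f in flt["$and"])
    field, cond = next(iter(flt.items()))
    value = doc.get(field)
    if isinstance(value, list):  # MongoDB-style array membership
        return cond in value
    return value == cond

docs = [
    {"name": "Alice", "allowed_purposes": ["research"]},
    {"name": "Bob", "allowed_purposes": ["marketing"]},
]
flt = rewrite_with_purpose({}, "research")
visible = [d["name"] for d in docs if matches(d, flt)]
```

A real enforcement point would intercept queries server-side; rewriting at the driver level, as here, only sketches the filtering effect of a purpose rule.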
Because of these discrepancies, bridging the gap between those two worlds is a challenging endeavor.
Two strategies generally apply when it comes to accessing non-RDF data as RDF. In the graph materialization strategy, the transformation is applied exhaustively to the database content, and the resulting RDF graph is loaded into a triple store and accessed through a SPARQL query engine [18] or by dereferencing URIs (as Linked Data). On the one hand, this strategy easily supports further processing or analysis, since the graph is made available at once. On the other hand, the materialized RDF graph may rapidly become outdated if the pace of database updates is high. Running the transformation process periodically is a common workaround, but in the context of large data sets, the cost (in time, memory and CPU) of materializing and reloading the graph may become prohibitive. To address this issue, the query rewriting strategy aims to access heterogeneous databases as virtual RDF graphs. A query processor rewrites a SPARQL query into the query language of the target database. The target database query is evaluated at run-time such that only relevant data are fetched from the database and translated into RDF triples. This strategy scales better to big data sets and guarantees data freshness, but it entails overheads that may penalize performance if complex analysis is needed.
For the first set of templates, SUMMR provides five templates designed for the five mapping maintenance and reuse tasks previously derived (from the state-of-the-art review), in a one-to-one relationship. These five task templates rely on standard SPARQL queries and are designed to be performed over an RDF-based mapping representation. Table 5-1 shows the SUMMR task templates and their purpose. While there are five task templates, each one is flexible, with the exception of Task Template 3 (as seen in the examples below). They can vary based on factors such as whether vocabulary transformation and interlink category mappings are to be considered at the same time or separately, how general or specific the mapping search criteria have to be, or whether a federated SPARQL call is needed to access a remote dataset. This flexibility allows the templates to be useful in a range of scenarios where the needs of users differ. Note also that task templates can be combined to form complex operations and are used in combination to implement the maintenance and reuse use cases.
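To illustrate how such a task template can be parameterised (the actual SUMMR templates are not reproduced here), the sketch below builds a hypothetical mapping-search SPARQL query from a template, with an optional federated variant; the vocabulary terms and endpoint are invented for illustration.

```python
# Hypothetical parameterisable SPARQL task template, in the spirit of
# SUMMR's template-based mapping search. All vocabulary terms are invented.

def build_query(source_term, remote_endpoint=None):
    """Instantiate a mapping-search template for a given source term;
    if a remote endpoint is given, wrap the pattern in a SERVICE call."""
    pattern = ("?mapping a <http://example.org/summr#Mapping> ; "
               f"<http://example.org/summr#sourceTerm> <{source_term}> .")
    if remote_endpoint:  # federated variant for remote datasets
        pattern = f"SERVICE <{remote_endpoint}> {{ {pattern} }}"
    return f"SELECT ?mapping WHERE {{ {pattern} }}"

q_local = build_query("http://example.org/vocab#Person")
q_fed = build_query("http://example.org/vocab#Person",
                    remote_endpoint="http://remote.example.org/sparql")
```

Varying the pattern string is how one template can serve several search scenarios, from broad to specific criteria.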
Chapter 8. SPARQL-to-SQL Translation
mode, that is, ontology instances are only created to answer queries (only the data instances needed to answer a given query are transformed to ontology instances).
This approach requires a phase of query translation, where queries submitted over ontologies (using an ontology query language) are translated into corresponding queries over local information sources (using the native query languages of these sources, such as SQL for relational databases and XQuery for XML data sources). We use SPARQL as the ontology query language in OWSCIS. An introductory presentation of this query language is given in Appendix F. In Section 3.5, we gave an overview of query processing in the OWSCIS system. We saw that the querying web service decomposes a user query into a set of sub-queries and sends them to the different data providers. Each data provider locally processes its sub-query and returns the results to the querying web service.
The storage method for production data is an important factor in oilfield informatization. It impacts many aspects of efficiency, such as data querying, large-scale data analysis, data mining, and real-time computation. Most smart oilfield constructions in PetroChina's Huabei Oilfield are based on SCADA or DCS systems. This approach raises several problems in data storage and usage, such as inconsistent data formats, lack of storage flexibility and extensibility, and low data security and utilization rates. Traditional approaches are not suitable for large-scale production data storage, application, and big data analysis. In this paper, a petroleum production storage platform based on MongoDB and a relational database is designed. The platform converts, transmits, and saves uniform data from SCADA and DCS systems, and provides a unified RESTful API for third-party data access. On top of it, a Production Command System (PCS) is developed, providing real-time monitoring, automatic computation of various reports, production data management, production schedule management, etc.
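The conversion step of such a platform might look like the following sketch, which normalises heterogeneous SCADA/DCS readings into one uniform document format before storage in MongoDB. All field names and record layouts are invented for illustration, not the platform's actual schema.

```python
# Sketch: converting heterogeneous SCADA/DCS readings into one uniform
# document format suitable for a MongoDB collection. Layouts are invented.

def normalise(record):
    """Map a raw record from an assumed SCADA or DCS layout
    to one uniform document."""
    if "tag" in record:          # assumed SCADA layout
        return {"well_id": record["tag"],
                "value": float(record["val"]),
                "ts": record["time"],
                "source": "SCADA"}
    else:                        # assumed DCS layout
        return {"well_id": record["point_name"],
                "value": float(record["pv"]),
                "ts": record["timestamp"],
                "source": "DCS"}

raw = [{"tag": "W-101", "val": "57.3", "time": "2021-06-01T08:00:00"},
       {"point_name": "W-102", "pv": "61.0", "timestamp": "2021-06-01T08:00:05"}]
uniform = [normalise(r) for r in raw]
```

With a uniform shape, a single collection (and a single RESTful endpoint over it) can serve readings regardless of their origin.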
sented in Section 4.4. We then discuss the proposed approach for automated conflict resolution in a CDSS in Chapter 5.
4.2 Data Integration
A conventional integrative information system aims to combine heterogeneous (and possibly autonomous) data sources or schemas to provide users with a single unified (and usually reconciled) view of the data, which is known as a global or mediated schema (Hull, 1997; Ullman, 1997; Halevy, 2001). The heterogeneous data sources may have different data models, schemas, and data representations (Motro and Anokhin, 2006). The global schema provides a single representation for any real-world object that might have multiple representations in different data sources. In other words, when users submit queries against the global schema, they should not be aware of the multiple and heterogeneous data sources behind this global schema, and the query result should contain a consistent answer with respect to all the heterogeneous data sources. The most common integration scenario for integrating multiple and heterogeneous sources into a unified view is composed of three steps (Naumann et al., 2006): schema matching and mapping, duplicate detection, and data fusion. Before the data integration process starts, we should have access to the remote data sources, which is currently solved by many technologies, like ODBC and JDBC connections, Web services, and many others. In the following, we provide a brief overview of the three steps of the data integration process.
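A toy rendering of these three steps may help fix the idea; the schemas, the attribute mapping, and the duplicate rule below are all invented for illustration.

```python
# Toy sketch of the three integration steps: (1) schema matching/mapping,
# (2) duplicate detection, (3) data fusion. All data and rules are invented.

source_a = [{"name": "Alice", "phone": "555-1234"}]
source_b = [{"full_name": "Alice", "tel": "555-1234", "city": "Nice"}]

# 1. Schema matching/mapping: align source B's attributes to the global schema.
MAPPING_B = {"full_name": "name", "tel": "phone", "city": "city"}
mapped_b = [{MAPPING_B[k]: v for k, v in rec.items()} for rec in source_b]

# 2. Duplicate detection: here, two records describe the same real-world
# entity when their names match (a deliberately naive rule).
def same_entity(r1, r2):
    return r1["name"] == r2["name"]

# 3. Data fusion: merge duplicate records into one consistent answer,
# keeping the first-seen value on conflicts.
fused = []
for rec in source_a + mapped_b:
    for f in fused:
        if same_entity(f, rec):
            f.update({k: v for k, v in rec.items() if k not in f})
            break
    else:
        fused.append(dict(rec))
```

The fused view contains one record per real-world entity, combining attributes that no single source held on its own.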
Among the most prominent automatic tools, we mention BootOX and IncMap. BootOX is based on direct mapping: every table in the database is mapped to a class of the ontology, data attributes are mapped to data properties, and foreign keys to object properties. IncMap, instead, runs in two phases: first, it uses lexical and structural matching; second, it represents the ontology and the schema of the input dataset with a meta-graph in order to preserve their structure. In the field of more general-purpose tools, we highlight MIRROR and D2RQ. Neither tool necessarily exploits an existing domain ontology; both can generate an ontology on-the-fly based on the input data schema. In detail, MIRROR produces R2RML direct mappings exploiting the morph-RDB engine.
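The direct mapping that BootOX bootstraps from can be sketched as follows. This is a simplification: real systems emit R2RML mappings and OWL axioms, whereas here we just emit triple-like tuples over a made-up relational schema.

```python
# Sketch of direct mapping: tables -> classes, columns -> data properties,
# foreign keys -> object properties. The schema and URIs are invented.

BASE = "http://example.org/onto#"

schema = {
    "person": {"columns": ["name", "age"], "fks": {"works_for": "company"}},
    "company": {"columns": ["label"], "fks": {}},
}

def bootstrap_ontology(schema):
    """Generate class/property axioms (as tuples) from a relational schema."""
    axioms = []
    for table, desc in schema.items():
        axioms.append((BASE + table.capitalize(), "rdf:type", "owl:Class"))
        for col in desc["columns"]:
            axioms.append((BASE + col, "rdf:type", "owl:DatatypeProperty"))
        for fk, target in desc["fks"].items():
            axioms.append((BASE + fk, "rdf:type", "owl:ObjectProperty"))
            axioms.append((BASE + fk, "rdfs:range", BASE + target.capitalize()))
    return axioms

axioms = bootstrap_ontology(schema)
```

Tools like IncMap then refine such a bootstrapped ontology against an existing domain ontology, which pure direct mapping cannot do.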
We exploit the edge insertion algorithm in . Ordinarily, a lub or glb may or may not exist in a role hierarchy. Thus, before configuring RT relations, we construct the global role graph by mapping all the maximal nodes of each domain (including the application hierarchy) to the (global) MaxRole and all the minimal nodes to the MinRole [Fig. 3]. Then we apply Algorithm 1 to insert an RT relation between the application hierarchy and the local domain. The algorithm detects inconsistency. If the global role graph falls into an inconsistent state after a certain RT relation (edge) insertion – for example, inserting (2) after (1) or vice versa in Figure 3 – the effective privileges of conflicting roles become the same, as in the aforementioned analysis.
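The inconsistency check on edge insertion can be pictured as a cycle test on the directed role graph: an insertion that closes a cycle would make the conflicting roles' effective privileges identical. The code below is only a simplified stand-in (plain DFS cycle detection) for the cited algorithm, with invented role names.

```python
# Simplified stand-in for inconsistency detection on RT-relation (edge)
# insertion: adding an edge that creates a cycle in the directed role graph
# would equalize the effective privileges of conflicting roles, so such an
# insertion should be rejected. Role names are invented.

def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (senior, junior) creates a cycle."""
    graph = {}
    for s, j in edges + [new_edge]:
        graph.setdefault(s, []).append(j)
    start, target = new_edge[1], new_edge[0]
    stack, seen = [start], set()
    while stack:  # DFS: can 'senior' be reached back from 'junior'?
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

edges = [("MaxRole", "manager"), ("manager", "clerk"), ("clerk", "MinRole")]
ok = creates_cycle(edges, ("MaxRole", "clerk"))    # still a DAG
bad = creates_cycle(edges, ("clerk", "manager"))   # closes a cycle
```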
of the target database to RDF, a SPARQL query is translated into a pivot abstract query by matching SPARQL graph patterns with relevant mappings. This step can be made generic if the mapping language used is generic enough to apply to an extensible set of databases. In a second step, the abstract query is translated into the target database query language, taking into account the specific database capabilities. Our focus, in this paper, is on the first step. Leveraging previous works on R2RML-based SPARQL-to-SQL methods, we define a pivot abstract query language and a method to translate a SPARQL query into an abstract query, utilizing xR2RML to describe the mapping of a target database to RDF. The method determines a reduced set of mappings matching each SPARQL graph pattern, and takes into account join constraints implied by shared variables, cross-references denoted in the mappings, and SPARQL filters. Lastly, common query optimization techniques are applied to the abstract query in order to alleviate the work required in the second step.
MapReduce operations are defined so as to reduce the data size, and execution time depends only on the number of documents that are effectively processed. The security level of each user's data varies with the policy rule; the considered rule selectivity range accounts for the filtering effect of policy enforcement. A general approach to privacy-aware access control in NoSQL datastores remains a very important goal. Users are only allowed to execute operations for access purposes for which they hold a proper authorization; purpose authorizations are granted to users as well as to roles. The data storage and network transfer format for documents is simple and fast. For index recommendation, frequent itemsets are used to derive an ordering of combined indexes from the fields of each frequent query, and the query optimizer then selects the final recommended indexes; creating virtual indexes removes any need to modify the database, and the approach is applied to a document-based NoSQL database. A typical setting involves two users: one requests information from the other, who is willing to share (only) the requested information. Consequently, there is a tension between information sharing and privacy: on the one hand, sensitive data needs to be kept confidential; on the other hand, data owners may be willing, or forced, to share information. Integrity and authentication are also necessary: while it is clear that safety-critical applications require authentication, it is still wise to use it even for the rest of applications. However, authentication alone does not solve the problem.
3 Translating SPARQL Queries into Abstract Queries under xR2RML Mappings
Various methods have been defined to translate SPARQL queries into another query language; they are generally tailored to the expressiveness of the target query language. Notably, the rich expressiveness of SQL and XQuery makes it possible to define semantics-preserving SPARQL rewriting methods [8,2]. By contrast, NoSQL databases typically trade off expressiveness for scalability and fast retrieval of denormalised data. For instance, many of them hardly support joins. Therefore, to envisage the translation of SPARQL queries in the general case, we propose a two-step method. First, a SPARQL query is rewritten into a pivot abstract query under xR2RML mappings, independently of any target database (illustrated by step 1 in Figure 1). Second, the pivot query is translated into concrete database queries based on the specific target database capabilities and constraints. In this paper we focus on the application of the second step to the specific case of MongoDB. The rest of this section summarizes the first step to provide the reader with appropriate background. A complete description is provided in .
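To give a flavour of the second step, the sketch below rewrites a single SPARQL triple pattern into a MongoDB find() filter under a predicate-to-field mapping. Everything here is an invented toy, far simpler than the xR2RML translation itself: real mappings also handle URI templates, joins, and nested documents.

```python
# Sketch: rewriting one SPARQL triple pattern into a MongoDB find() filter
# under a (much simplified) predicate-to-field mapping. The mapping and the
# naming are invented illustrations, not the xR2RML translation.

PREDICATE_TO_FIELD = {   # assumed mapping: RDF predicate -> JSON field
    "foaf:name": "name",
    "foaf:age": "age",
}

def triple_pattern_to_mongo(subject, predicate, obj):
    """('?s', 'foaf:name', 'Alice') -> {'name': 'Alice'};
    an unbound object variable (starting with '?') yields an $exists test."""
    field = PREDICATE_TO_FIELD[predicate]
    if obj.startswith("?"):
        return {field: {"$exists": True}}
    return {field: obj}

q1 = triple_pattern_to_mongo("?s", "foaf:name", "Alice")
q2 = triple_pattern_to_mongo("?s", "foaf:age", "?a")
```

Joins implied by shared variables across several triple patterns are precisely what MongoDB's query language handles poorly, which is why the abstract query layer keeps them explicit for later evaluation.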
NoSQL databases group together heterogeneous data. They make it possible to store large volumes of structured, semi-structured, and unstructured data. Moreover, they provide high-speed access to the stored data and are very flexible. There are four types of NoSQL databases, namely key/value stores, column family stores, document-oriented databases, and graph databases. We are convinced that document-oriented databases are the best ones to initiate the ontology learning process, for many reasons. They are very flexible and may handle very large amounts of structured and unstructured data. Moreover, they are schema-less, which reduces complexity. Our choice is oriented particularly to MongoDB as the document-oriented database to which we will wrap all data sources, for several reasons. First, it is the fastest-growing new database in the world and provides a rich document-oriented structure with dynamic queries. Second, it allows data to be compartmentalized into collections in order to divide it logically. Thus, the speed of queries can increase dramatically by querying a subset of the data instead of all of it. Collections are analogous to tables in a relational database. Each collection contains documents that can be nested in complex hierarchies but are still easy to query and index. A document is seen as a set of fields, each one being a key-value pair. A key is a string, and the associated value may be a basic type, an embedded document, or an array of values. MongoDB can manage data of any structure without expensive data warehouse loads, no matter how often it changes. Thus, we can add new functionality cheaply without redesigning the database.
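The document model described above can be illustrated with a tiny in-memory stand-in for a MongoDB collection. Real code would use a driver such as PyMongo; here everything is plain Python so the example is self-contained, and the dotted-path lookup mimics MongoDB's dot notation for embedded documents.

```python
# Tiny in-memory illustration of MongoDB's document model: a collection
# holds documents (dicts) whose values may be basic types, embedded
# documents, or arrays; queries address nested fields with dot notation.

def get_path(doc, dotted):
    """Resolve an 'address.city'-style path inside an embedded document."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

def find(collection, flt):
    """Return documents matching every (possibly dotted) key in flt."""
    return [d for d in collection
            if all(get_path(d, k) == v for k, v in flt.items())]

people = [  # one collection: documents with embedded docs and arrays
    {"name": "Alice", "address": {"city": "Nice"}, "tags": ["a", "b"]},
    {"name": "Bob", "address": {"city": "Lyon"}},
]
in_nice = find(people, {"address.city": "Nice"})
```

Querying a small collection rather than one large table is exactly the compartmentalization argument made above.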
Results: We present SAFE: a query federation engine that enables policy-aware access to sensitive statistical datasets represented as RDF data cubes. SAFE is designed specifically to query statistical RDF data cubes in a distributed setting, where access control is coupled with source selection, user profiles and their access rights. SAFE proposes a join-aware source selection method that avoids wasteful requests to irrelevant and unauthorised data sources. In order to preserve anonymity and enforce stricter access control, SAFE’s indexing system does not hold any data instances—it stores only predicates and endpoints. The resulting data summary has a significantly lower index generation time and size compared to existing engines, which allows for faster updates when sources change. Conclusions: We validate the performance of the system with experiments over real-world datasets provided by three clinical organisations as well as legacy linked datasets. We show that SAFE enables granular graph-level access control over distributed clinical RDF data cubes and efficiently reduces the source selection and overall query execution time when compared with general-purpose SPARQL query federation engines in the targeted setting.
To conclude, in this paper we build feature vectors for SPARQL queries by exploiting the syntactic and structural characteristics of the queries. We observe that KNN performs better than SVR at predicting the elapsed time of real-world SPARQL queries. The proposed two-step prediction performs better than one-step prediction because it considers the broad range of observed elapsed times. Prediction in the warm stage is generally better than in the cold stage. We identify that the reason lies in identically structured queries: many queries are issued by programmatic users, who tend to issue queries using query templates. Our work is on static data; we will consider dynamic workloads in the future. Techniques that can incorporate new training data into an existing model will also be considered.
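KNN prediction over query feature vectors can be sketched in a few lines. Below is a hand-rolled k-nearest-neighbours regressor over invented feature vectors (e.g. counts of triple patterns, joins, filters); the paper's actual features and its two-step scheme are not reproduced.

```python
# Hand-rolled KNN regression over toy SPARQL-query feature vectors.
# Features might count triple patterns, joins, filters; data is invented.

def knn_predict(train, query_vec, k=2):
    """Predict elapsed time as the mean over the k nearest training
    queries (squared Euclidean distance on feature vectors)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query_vec)), t)
        for vec, t in train)
    nearest = [t for _, t in dists[:k]]
    return sum(nearest) / len(nearest)

# (feature_vector, elapsed_time_ms) pairs for already-observed queries
train = [((1, 0, 0), 10.0), ((2, 1, 0), 40.0), ((5, 3, 2), 300.0)]
pred = knn_predict(train, (2, 1, 1), k=2)
```

Template-issued queries produce near-identical feature vectors, so their nearest neighbours are almost exact matches, which is consistent with the better warm-stage accuracy reported above.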
special case of RDF data holds a very special potential: RDF data encoding other formal languages. In computer science, formal languages have been used, for instance, to define programming languages, query languages, data models and formats, knowledge formalisms, inference rules, etc. Among them, in the early 2000s, XML gained the status of a meta-language or syntax enabling the definition of so-called XML languages. In the same way, we are now witnessing the advent of RDF, which will likely be used more and more as a syntax to represent other languages. For instance, in the domain of the Semantic Web alone, this is the case for three W3C standards: OWL 2 (Peter F. Patel-Schneider and Motik, 2012) is provided with several syntaxes, among which the Functional syntax, the Manchester syntax used in several ontology editors, and RDF/XML and RDF/Turtle; the Rule Interchange Format (RIF) (Sandro Hawke, 2012) is provided with several syntaxes, among which a verbose XML syntax, two compact syntaxes for RIF-BLD and RIF-PRD, and an RDF syntax; the SPARQL Inference Notation (SPIN) is a W3C member submission (Knublauch, 2011) to represent SPARQL rules in RDF, to facilitate storage and maintenance. Many other languages can (and will) be "serialized" into RDF. For instance, (Follenfant et al., 2012) is an attempt to represent SQL expressions in RDF.
There are the following different types of data providers included in ADO.Net:
The .Net Framework data provider for SQL Server - provides access to Microsoft SQL Server.
The .Net Framework data provider for OLE DB - provides access to data sources exposed by using OLE DB.
1.4.1 Background and Properties
Data is increasing at such a fast rate that a schema cannot be designed in advance, so a schema-less database is needed that can be scaled easily and follows the BASE properties. MongoDB is such a document-oriented store, with many useful properties. It was developed by the company 10gen, and its name originates from the word "humongous". It is open-source, free software, has cheaper storage, and is used for web-based applications, particularly those that need frequent updating and insertion. Joins and complex operations are performed simply, and it has dynamic schemas. It is written in C++ and supports ad hoc queries. It supports load balancing, meaning that scaling can be done very easily with MongoDB. As a document store, it contains JSON (JavaScript Object Notation) documents holding semi-structured data. It has APIs for many programming languages, which is why it can easily be used with any programming language.
Snapshot-type Recordset— a static copy of a set of records that you can use to find data or generate reports. A snapshot-type Recordset object can contain fields from one or more tables in a database but can't be updated. This type corresponds to an ODBC static cursor.
Forward-only-type Recordset— identical to a snapshot except that no cursor is provided. You can only scroll forward through records. This improves performance in situations where you only need to make a single pass through a result set. This type corresponds to an ODBC forward-only cursor.
5. Running the application. When your application compiles correctly, run it by typing java ATM sun.jdbc.odbc.JdbcOdbcDriver jdbc:odbc:ATM . The first command-line argument sun.jdbc.odbc.JdbcOdbcDriver specifies the database driver for the Access databases. The second command-line argument jdbc:odbc:ATM specifies the database location. If you use a name other than ATM in Fig. 17.6, say test , then the second command-line argument should be jdbc:odbc:test . Figure 17.8 shows the application running. Test the application as in Chapter 26.