• No results found

INSY 5337 DATA WAREHOUSE AND BUSINESS INTELLIGENCE

N/A
N/A
Protected

Academic year: 2021

Share "INSY 5337 DATA WAREHOUSE AND BUSINESS INTELLIGENCE"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

OLAP TOOLS: A COMPARITIVE STUDY

Pradeep Medagiri

1001757740

INSY 5337

DATA WAREHOUSE AND BUSINESS INTELLIGENCE

RESEARCH PAPER

(2)

I.

INTRODUCTION

It is important for businesses to analyze the data available to them to make important decisions for the organization. Businesses often use analytical methods to reduce expenditure and increase efficiency of operations, leading to successful performance of the firm. It is understood that the more data a firm has available regarding a certain department or activity, the more refined the analysis would be as there would be a higher level of detail. Most businesses gather data using different systems and interfaces but managing such volumes of data to create reliable and accurate information whenever required remains a challenge. Such challenges can be overcome by using OLAP tools. Online Analytical Processing or OLAP, as it is commonly abbreviated are tools used for data analytics and it enables the user to analyze data from multiple database systems simultaneously and efficiently. It is the tool behind many Business Intelligence (BI) applications.

OLAP technology has been attributed the ability to aggregate and make underlying calculations in data sets; these actions help in making well-informed business decisions. The OLAP technology implemented may be business and industry specific and certain level of customization is required. The OLAP system may be different for a finance firm in comparison to a medical institution, as their requirements will be vastly different. The user must have a clear objective about the required solution system, and the freedom lies in their hands as to which type of tool they implement for their business.

The source data for OLAP is Online Transactional Processing (OLTP) databases that are commonly stored in data warehouses. OLAP data is extrapolated from historical data and aggregated into dynamic systems that enable advanced analysis. Data in OLAP systems is organized hierarchically and stored in information cubes instead of the traditional tabular column

(3)

format. It is an advanced technology which consists of multidimensional structures to deliver fast access to data for desired analysis.

OLAP databases consist of two basic types of data, numeric data which are also called measures. The quantities and general numeric data that is used to make informed business decisions falls under this category. Categories that you use to organize these measures are called dimensions. OLAP databases help organize data by many levels of detail. A comparative analysis between four types of OLAP tools has been done, namely IBM Cognos, MicroStrategy, Mondrian and Apache Kylin.

II.

OLAP TOOLS

1. IBM Cognos

IBM Cognos Analytics is a cloud and on-premises-based business intelligence solution that offers users access to a wide range of business analytics functionality to help them make well-informed decisions, faster. It offers a full complement of essential analytics functions, including advanced dash boarding, data integration, reporting, exploration, and data modelling

Cognos Analytics was recently upgraded completely by adding AI assistant to it which allows user to ask questions and get the answer in natural language. Interface is also changed to whole new level making it easier for the user in business field. With the all these new features Cognos Analytics is accessible to businesses of every size.

Features:

1. An AI Assistant that learns clients' information and proposes new information joins and representations

(4)

2. Normal language age that permits clients to pose inquiries and find solutions in plain English

3. Simplified capacities to make new dashboards and reports 4. Information purging and upgraded information demonstrating 5. Information reconciliation from two to numerous sources

6. The capacity to redo reports with organization hues, text styles and logos. Burst capacity that lets users appropriate reports by means of email or Slack

7. An interface that is basic enough for amateurs, however amazing enough for information researchers

Target Market:

It services over 26,000 companies from a broad cross-section of industries including aerospace, defense, banking, education, healthcare and many more. Below are some of their clients:

Nike, GKN Land System, Spain’s Ministry of Defense, British Airways, Chemring, Quinte Health Care, Troy Corporation, Michigan State University, Lufthansa Cargo, Jabil

Customer Service & Support:

1. IBM has a thorough client care framework that incorporates preparing and affirmation courses, and client support accessible for clients of their Business Intelligence programs, for example, Cognos.

2. They offer preparing that can be individualized dependent on explicit jobs inside an organization and can be actualized both on the web and face to face. They additionally offer private courses, and concentrated confirmations to reflect ability that guide in professional success.

(5)

3. Online technical support furnishes clients with access to IBM pros that can address questions or potential issues with programming.

Pricing:

IBM Cognos has a free preliminary where clients get the full item limit with regards to 30 days. At that point, there are two estimating plans: Premium and Enterprise.

Premium:

This arrangement permits the solution to be sent in a multi-occupant cloud design. It's valued at $70 per client every month and incorporates:

1. Transferring information to cloud sources 2. Joining information from various sources 3. Sharing content with users

4. Capacity to make information stories utilizing website pages, pictures, content and other media

5. Data discovering using artificial intelligence 6. Advanced pattern detection

Enterprise:

This arrangement incorporates the full ability of Cognos Analytics, for example, 500 GB client stockpiling and reinforcement. Evaluating isn't openly accessible, so you'll need to contact IBM straightforwardly for a statement. Clients can alternatively buy cloud throughput limit and an extra 250 GB client stockpiling

(6)

Pros:

1. Simulated intelligence Assistant that learns clients' information and proposes new information joins and perceptions; intuitive abilities to make new dashboards and reports; information mix from two to numerous sources

Cons:

1. Steep learning curve as software can be robust.

IBM Cognos Analytics offers users access to a wide range of business analytics functionality to help them make well-informed decisions faster (1) (2).

2. Mondrian

Mondrian is open source software to actualize an OLAP server using the Java programming language. Mondrian embraced the widely favored Microsoft query language MDX also known as multi-dimensional expressions query language from Analysis Services offered by Microsoft. It also implements XML for analysis [3] [4]. The fundamental Application Programming Interface (API) of Mondrian is the Mondrian proprietary. It examines from SQL and supplementary data sources and accumulates in memory cache [3] [4].

The Mondrian has a segmental architecture which facilitates users to create a reporting tool capable of visualizing the Mondrian OLAP data in Java or other language by means of the XML interface of Mondrian. Mondrian also offers JPivot, an open source web-based reporting tool for visualizing OLAP data for users that do not want to create a tool. Mondrian is capable to run on the most popular databases including MySQL, Oracle and Microsoft SQL Server etc.

(7)

Features:

1. Data Handling - Mondrian is a very interactive tool with exceptional features including its ability to work with categorical data, enormous data and geographical data. It comprises of interconnected plots and queries. It supports geographical data by employing highly interactive maps. Mondrian also works with standard ASCII files and is capable of loading data from R workspaces [6].

2. Flexibility –It can be embedded in diverse environments and incorporated with the third-party tools. Certain limitations of the Mondrian schema language can be overcome by adding custom plug-ins to the application [6].

3. Performance & scalability - Mondrian employs in-memory storage for calculations to increase speed capability. Set up, Maintenance and restructuring of OLAP cubes is time intensive which is often a more expensive resource than power and memory for the developer [6]. By employing the right configurations, user can scale Mondrian for use by thousands of users. Additionally, in collaboration with R, Mondrian also offers exceptional statistical functions like density estimation, principal component analysis (PCA) and multi-dimensional scaling (MDS) etc.

4. Security - Mondrian employs role-based access control. The access-control profile can be set in the xml schema which defines the OLAP model or can be controlled through programming [3]. To restrict access to the specific sections of the OLAP cube, user can define roles in the schema definition by implementing role mapping of the server [6].

(8)

Deployment:

There are a number of ways that Mondrian can be deployed.

1. Mondrian can be embedded in a standalone application. This approach is beneficial for custom applications with a definitive purpose, but it is not useful for supporting an extensive variety of uses of Mondrian unless user has multiple applications [5].

2. Another way to set up would be in an application server to run on its own and offer services. This approach is not very user friendly due to lack of a good graphical user interface [5].

3. Finally, Mondrian can be employed as a segment of a larger web application that provides tools for users. This method is the most flexible, but it also demands the most organization and configuration. Fortunately, with the availability of appropriate solutions it can be easily configured and deployed [5].

Pros:

1. Mondrian is free and open source. It is licensed under the ETL.

2. It is a good choice for single relational database as storage layer; those which do not fully meet the needs for fast fetching search results through queries.

3. Mondrian is highly suitable for real-time systems since it can run multi-dimensional queries for a continuously changing database [7].

4. It enables users to scrutinize business data by drilling and cross-tabulating information [7].

5. Since it is completely written in Java it is easy to employ and integrate to existing Java applications [3].

(9)

6. The official Pentaho Forum is a beneficial community for getting help related to the functionality of Mondrian [7].

Cons:

1. Scripting Mondrian schemas demands a higher learning curve and can be a tedious process.

2. It requires an advanced front-end tool for displaying and analyzing the data from cubes. 3. It is an outdated and less popular tool due to the competition of open source tools in the

IT market.

4. Documentation available for Mondrian is comprehensive but not easy to browse easily. 5. In spite of the efficiency, if the SQL queries are not optimized or the schema is

inadequately designed it could slow down the user application.

3. Micro-Strategy

MicroStrategy web is for interactive analysis with user friendly environment where as MicroStrategy developer is built with business intelligence environment to meet BI requirements.

Grids, Graphs, Grids & Graphs combined, and SQL view are different types of reports can be generated in MicroStrategy depending on type of the work or requirements[11].Objects used on reports are Attributes, Metrics, Filters, Prompts and all of these objects created in MicroStrategy fall into one of three groups: Schema objects (created by project designer), Application objects (created by report designer) and building block objects (created by report designer).

Sharing sets of data among reports is by intelligent cubes, In order to implement Intelligent cube technology on MicroStrategy BI platform there are two unique methods namely: Personal

(10)

intelligent cubes (OLAP service features are processed on in-memory copy of data instead of processing in data warehouse) and Intelligent cubes (returned sets of data from data warehouse can be directly saved in Intelligence server memory rather than returning data from data warehouse for single report).

MicroStrategy’s OLAP Services is an expansion of MicroStrategy Intelligence Server[8]. MicroStrategy OLAP services allows user to make use of features that slice and dice data in reports without re-executing SQL against the data source. This improves performance by resulting in quicker data display within a report as users analyse and manipulate the data. MicroStrategy is one of the examples of Relational OLAP servers.

Devices that support MicroStrategy: web based also IOS & Android supported. Recommended customer types are: Medium business and Enterprise. Both phone and online customer supports are available. Pricing plans varies depending on device/package starting with $600 and free trial available. According to “comparecamp”, MicroStrategy tool scored 80%.

In generating reports, Advanced Analytics and self-service, MicroStrategy tops in rating compared with SAP Business Objects. Undoubtedly Oracle BI is best at self-service when compared with MicroStrategy [Based on user reviews collected by G2 Crowd][9].

The first drawback of MicroStrategy would be pricing, at first it seems to be competitive but then its quick to find that the addition of all required units quickly raises the price levels close to Cognos and business objects. Also, it uses different interface types because it needs separate applications. “EMANIO” suggests that, Companies under $1B in revenues would be wise to consider alternatives to MicroStrategy business intelligence [10].

(11)

4. Apache Kylin

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis which support large data sets. Using this algorithm, we can increase the capability of hadoop Map Reduce. It takes care of Sorting, grouping and shuffling, developers focus on cubing logic. In addition to that, it also allows you to query large data set with very little delay in steps. The initial step is to identify star/snowflake schema on Hadoop, then, build a cube from analyzed table and finally, query with ANSI-SQL gets results in a quick time via ODBC or JDBC.

Apache kylin, pre - computes various combinations of dimensions to achieve speed. It also measures aggregates via hive queries and the results will populate the Hbase. Kylin’s user friendly UI gives access to kylin query engine via an API or JDBC. The strategy for fast lookup is achieved by the features of Hbase and apache calcite query processor. Moreover, because of the use of tree data structure for the dictionary of the data value, the Hbase rowkeys are of perfect size.

Apache kylin is one of the fastest OLAP engine at present. It is designed to minimize the query delay in Hadoop/spark for billions of data. ANSI SQL interface on Hadoop/spark supports most of the query functions. Kylin has an interactive query capability by which users can interact with Hadoop data very quickly compared to hive query for the same dataset. MOLAP cube helps the user to explain a model and prebuild in apache kylin with billions of raw data. With the help of Business Intelligence tools like tableau and Power BI, kylin is offering integration capability.

Apache kylin has an eco-system; the core is the fundamental framework of OLAP engines, which consists of job engine, metadata engine, storage engine and the query engine. For sake of service claim request, it also consists of rest server. Plugins are used to give access to addition

(12)

function and features. Alerting system, Job scheduler, monitoring and ETL are integrated and managed by lifecycle management support. The user interface allows third party users to build and customize the kylin core. Different tools and product such as tableau are supported by ODBC and JDBC drivers.

Apache kylin is been used by top corporate companies like ebay, Cisco, Samsung, etc. because of its pros. There are many advantages with this tool; like, the algorithm is very stable, failure of mapper and reducer are rare. Even if your hadoop cluster is small or busy, it can finish at the end. It also has a premium API support to build and manage cubes. Most importantly, it is integrated with most Business Intelligence tools. On the other hand, there are few disadvantages as well, the most important disadvantage is, it only with hive table as an input source. Kylin supports only star scheme, which means we are limited to a single fact table for every cube. For many dimensional cubes, many MR jobs are needed, which in turn increases hadoop’s overhead cost. Thus, the performance is not good when the cube has many dimensions [12][13].

III. CONCLUSION

OLAP Tools is of immense value in terms of Business Intelligence and can be utilized in a number of applications such as the analysis of sales forecast, financial reporting etc. So it is essential to choose an appropriate OLAP tool for one’s business considering its features.

When an enterprise commits to acquire an OLAP-based BI system, it is critical that the system adheres to the current and potential future requirements of the enterprise. With the extensive variety of OLAP technologies available, it has become critical to know the distinctions between them.

The previous sections discussed in more detail the four popular OLAP tools in the market based on their features, deployment options, availability, pros and cons. Table 1 shown below depicts a comparison

(13)

of the OLAP tools based on some of the preliminary features to give an overview of what was discussed in much more detail in the previous section.

Although each OLAP tool is different, the type employed by the user depends on the needs of their business and functionality of it. But some of the major features that are most commonly and fundamentally considered by business are front-end flexibility, performance and security features.

Among its competitors the IBM Cognos tool offers the best option in terms of a well-balanced criterion in terms of ease of use, efficiency, data integrity, performance and security which make it a widely popular and preferred tool in the IT market.

Table1. Comparison of OLAP tools

Tool Visualization Operating Systems Availability Type Mondrian Available Windows, Linux, UNIX, z/OS Open Source Multidimensional OLAP IBM Cognos Available Windows, Linux,

UNIX Proprietary License Multidimensional OLAP MicroStrategy Dossier, Dashboard Reports Windows, Linux, UNIX

Licensed Relational OLAP

Apache Kylin Superset, Zeppelin, Tableau, Qlik, Redash Linux, UNIX Open Source Multidimensional OLAP

(14)

References

1. IBM Cognos Analytics, https://en.wikipedia.org/wiki/IBM_Cognos_Analytics 2. What is IBM Analytics? https://www.ibm.com/products/cognos-analytics 3. Rudolf, Strijkers. A Survey on On-Line Analytical Processing,”. 16 June 2004.

4. “Chapter 2. Mondrian: A First Look · Mondrian in Action.” Livebook.Manning.Com, livebook.manning.com/book/mondrian-in-action/chapter-2/6.

5. . “Mondrian vs Elasticsearch: What to Choose | Flexmonster.” Www.Flexmonster.Com, www.flexmonster.com/blog/mondrian-vs-elasticsearch-what-to-choose-for-your-project/. 6. “What Are Online Analytical Processing Tools? In 2020 - Reviews, Features, Pricing,

Comparison.” PAT RESEARCH: B2B Reviews, Buying Guides & Best Practices, 1 Aug. 2019, www.predictiveanalyticstoday.com/online-analytical-processing-tools/.

7. “Why Developers like Mondrian.” StackShare, stackshare.io/mondrian. 8. MicroStrategy, Basic Reporting Guide 2015

9. MicroStrategy Review, https://comparecamp.com/microstrategy-review-pricing-pros-cons-features/

10. MicroStrategy vs Tableau: Which BI Software Is The Winner?, https://www.selecthub.com/business-intelligence/tableau-vs-microstrategy/

11. MicroStrategy Report Types,

https://doc-archives.microstrategy.com/producthelp/10.8/BasicReporting/WebHelp/Lang_1033/Cont ent/BasicReporting/Report_types.htm

12. Apache Kylin Technical Blog, http://kylin.apache.org/blog/2015/08/15/fast-cubing/ 13. Apache Kylin, https://en.m.wikipedia.org/wiki/Apache_Kylin

References

Related documents

Bachelor of Technology, Biomedical Engineering, Chemical & Biomolecular Engineering, Civil & Environmental Engineering, Electrical & Computer Engineering, Engineering

Wall sawmill at the mouth of Park (Pass) Creek with the Nemeyer/Triplett (Mountain Home) homestead in the background. In the early 1890’s August Sasse had the home- stead and it

Data collection included pre-operative de- mographic data (sex, age, smoking history, body mass index [BMI] – kg/m 2 ), comorbidities, radiological assessment (diameter of

Repeat-visit tourists have had from repeat-visit tourists; (2) to ascertain the type of the experiences visiting the travel destination and they are destination brand image

Naloxone is a potent opioid antagonist and is used in reversal of opioid- induced respiratory depression, either in overdose or in those patients who have suffered exaggerated

Hammer-ons and pull-offs are indicated by slurs (the context tells

Authentication protocols for the IoT may be improved in terms of (1) addressing both the authentication and privacy problem, (2) developing efficient IDSs, (3) improving the

Some of these are illustrated in the four case studies presented here (Food Research Collaboration, Brighton and Sussex Universities Food Network, Middlesbrough Environment