Actian and the Big Data

The Actian Big Data Information Architecture

Originally founded in 2005 to acquire the Ingres database business from CA, Actian has gradually built a portfolio of products, most of which squarely target the “Big Data” market. It includes several types of database products and a whole suite of data integration and analytics software, all of which culminate in the Actian Analytics Platform.

The Actian Analytics Platform

The Actian Analytics Platform is rich in database capability that can and surely will be used in Big Data projects and to build Big Data architectures. Its core functionalities can be briefly described as follows:

• It includes a very high performance column-store database that was built to deliver extreme scale-up performance on a single server. It is one of the few technologies that have been engineered to take maximum advantage of on-chip vector instructions of x86 chips. It has been proven in database implementations up to tens of terabytes. It currently benchmarks as the world’s fastest query-oriented database by a wide margin.

• Rather than choosing to build a more conventional MPP version of its column-store database, Actian preferred to implement it over Hadoop’s HDFS and name the product Hadoop SQL Edition. While this product is in its first release at the time of writing, it has, nevertheless, benchmarked as considerably faster (by multiples) on the TPC-DS benchmark. Currently it can scale to 30 Hadoop nodes, but this will likely increase in future releases.

• It offers a scale-out analytical database which can be deployed across hundreds of server nodes to process extremely large collections of data at and beyond the petabyte level. It has many built-in analytical functions in the engine and thus parallelizes both queries and analytical calculations.

• It offers an object database which is often deployed as a graph database for traversing data networks rather than tables. That type of workload would be its most likely role within a Big Data environment, although it could also be used as a document database.
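To make the column-store idea in the list above concrete, here is a minimal sketch in plain Python. It is a generic illustration only, not Actian's storage format or API; the sample records and the helper names `to_columns` and `total_units` are hypothetical. It shows why a query that aggregates one attribute only needs to scan one contiguous column, which is the layout that vector instructions can then exploit.

```python
from array import array

# Hypothetical sales records, stored row by row (illustrative data only).
rows = [
    {"region": "EU", "units": 3, "price": 10.0},
    {"region": "US", "units": 5, "price": 12.0},
    {"region": "EU", "units": 2, "price": 10.0},
]


def to_columns(records):
    """Pivot row records into one contiguous sequence per column."""
    return {
        "region": [r["region"] for r in records],
        "units": array("q", (r["units"] for r in records)),
        "price": array("d", (r["price"] for r in records)),
    }


def total_units(columns):
    """Aggregate by scanning a single column; other columns are never read."""
    return sum(columns["units"])


columns = to_columns(rows)
```

In a row layout the same aggregate would have to walk every record and skip over the fields it does not need.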

Data in Motion

The Actian Analytics Platform includes a software building and execution capability that processes data in flight. It can be used to build data workflows where data is processed as it is piped from one source to another. In terms of Big Data architecture, it is a key feature for Actian as it complements Actian’s variety of database products. The important aspects of this capability are:

• It processes data in parallel using both pipeline parallelism and data segmentation parallelism. As such, it is extremely fast, and when used with HDFS, it is far faster than Hadoop’s native MapReduce framework.

• The underlying parallelization engine auto-configures to make optimal use of the available computer resources on which it is deployed.

• For users and software developers, it provides a codeless drag-and-drop prototyping environment for building data workflows.

• It scales out across multiple server nodes, and it can span Hadoop and non-Hadoop environments. It can also interface to data streams.

• With respect to analytics, it is directly integrated with the open source KNIME suite of machine learning software and can execute routines written in the R language.
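The two forms of parallelism named in the list above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Actian's engine or API: the functions `stage`, `run_pipeline`, `run_partitioned`, `clean` and `upper` are hypothetical names, and standard CPython threads are used only to show the structure, not to demonstrate real speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of a stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item as it arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(records, funcs):
    """Pipeline parallelism: each stage runs in its own thread, linked by queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for record in records:
        queues[0].put(record)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def run_partitioned(records, funcs, partitions=2):
    """Data segmentation: split the input and run one pipeline per segment."""
    chunks = [records[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        parts = list(pool.map(lambda chunk: run_pipeline(chunk, funcs), chunks))
    return [item for part in parts for item in part]


def clean(text):
    """Hypothetical cleansing step: trim surrounding whitespace."""
    return text.strip()


def upper(text):
    """Hypothetical transformation step: normalize to upper case."""
    return text.upper()
```

Calling `run_pipeline([" a ", "b "], [clean, upper])` passes each record through both stages in order, while `run_partitioned` runs several such pipelines side by side. Because the segments are processed independently, the combined output is not guaranteed to follow the input order.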

If one considers the broad field of business intelligence (BI) and data analytics, which will be the primary application area for Big Data, it is clear that many activities (data access, metadata capture, data cleansing, data transformation and organization prior to ingest into a database) are not database applications. They are, however, suitable applications for the workflow development and data processing features built into Actian’s platform.

Clearly the Actian Analytics Platform can also be used to carry out analytical processing and to query Hadoop directly (using SQL via Hive or, of course, Actian’s own Hadoop SQL Edition). Thus, in many scenarios, the platform is an alternative as well as a complement to an analytical database.

Actian and Big Data Architecture

In our research paper entitled The Big Data Information Architecture (June 2014) we describe an event-driven architecture that we expect to supersede the traditional data warehouse architecture that has dominated the IT industry for almost two decades. The Actian Analytics Platform fits the described architecture very well.

We illustrate this in Figure 1 on the following page, which depicts what we refer to in our research paper as a Data Refinery and Processing Hub. This Hub is responsible for both ingesting data into an organization’s data layer and providing a processing service that may involve data queries and analytical calculations on collections of data. The Data Hub is an arrangement of hardware and software that replaces the collection of ETL jobs, data staging areas, data warehouse and operational data stores that constitute the traditional BI environment. Additionally, it exceeds the capability of the traditional BI environment in being able to handle data streams and unstructured data, as well as large data volumes.

If we consider the Actian Analytics Platform from the database perspective, it is clearly well-equipped to provide a comprehensive database capability for the Data Hub. The platform’s support for SQL workloads and its Hadoop SQL Edition for larger data volumes can deliver excellent performance, and its analytical database can handle analytical queries. Its object-based database is equipped to store data in the form of connected graphs or documents and can process the associated workloads.

The Actian Analytics Platform can be deployed to provide a continuous data flow service from Hadoop to any of Actian’s data stores, including Hadoop SQL Edition. As the data hub gradually expands over time, the ETL capabilities can be maintained and augmented.

Figure 1: Actian Analytics Platform Deployed in a Data Refinery and Processing Hub

Just as the Actian Analytics Platform would be deployed for data flow within the Hub, it would also be used for data pulled from external data sources or received directly as data streams. Similarly, it will be used for data export from The Hub, directly from Hadoop or any database within The Hub to feed data marts and export data to other environments.

By employing the Actian Analytics Platform in this manner, all data movements to, from and within The Hub can execute in parallel.

A fundamental idea of The Data Hub is that, as far as possible, all SQL queries that run on corporate data would execute there. There may be pragmatic reasons for exporting data from The Hub to data marts to feed other databases (for example, supplying data to an IBM mainframe environment), but these would be minimized. Because The Data Hub is built to be a fully scalable environment, as workloads grow, more commodity servers are configured into the environment to handle the expanding demand.

BI and analytics applications that simply wish to access data would do so directly, connecting to one or another of the databases within The Hub to launch SQL queries, or possibly, directly harvesting the data.

Because the Actian Analytics Platform offers a development environment, it can also be used to develop and execute other activities that may take place within The Data Hub, such as data cleansing and metadata discovery. It can also orchestrate the activities of other software tools that might be used within The Hub.

The Actian Analytics Platform is an extraordinarily versatile solution, and organizations that select Actian to provide the foundation of their Big Data information architecture will no doubt make extensive use of it.

Actian in Summary

As far as we are aware, Actian is the only vendor that currently provides a broad line of software capabilities that include both a suite of database products that cater for multiple query types (SQL queries, analytical queries, graph queries, document queries) and also a data flow development environment and engine.

As such, it has all the requisite components for building a Data Hub of the type described in our research report, and hence provides the foundation for a Big Data environment to initially supplement and ultimately replace the traditional data warehouse environment and support an extensive analytical capability.

About The Bloor Group

The Bloor Group is a consulting, research and technology analysis firm that focuses on open research and the use of modern media to gather knowledge and disseminate it to IT users. Visit both www.TheBloorGroup.com and www.InsideAnalysis.com for more information.
