In September 2011, SAP announced its intention to partner with EMC and VMware to enable a HANA-based application infrastructure cloud.[29] This platform as a service (PaaS) offering includes HANA DB-as-a-service in conjunction with a choice of either a Java-based or an ABAP-based stack. Applications built for either stack will have access to HANA DB through a variety of APIs. The Java-based approach, codenamed Project River, is based on the NetWeaver 7.3.1 Java application server. The ABAP-based approach is designed more for SAP's existing user base, for example in the SAP Business ByDesign suite of business applications, which includes ERP, CRM and supply chain management.[30]
On October 16, 2012, SAP announced general availability of two SAP HANA options delivered in the cloud:
• SAP NetWeaver Cloud (now called SAP HANA Cloud) – an open standards-based application service
• SAP HANA One – a deployment of SAP HANA on the Amazon Web Services cloud, billed on an hourly basis. Only a 60 GB option is available, and a 24/7 instance costs $30,572/year, though an upfront commitment with Amazon can substantially reduce the hardware portion of the cost.
Technology
Architecture
At its most basic, the architecture of the HANA database system has the following components:
• Four management services
  • The Connection and Session Management component manages sessions and connections for database clients. Clients can use a variety of languages to communicate with the HANA database.
  • The Transaction Manager component helps with ACID compliance by coordinating transactions, controlling transactional isolation and tracking running and closed transactions.
  • The Authorization Manager component handles all security and credentialing (see Security below).
  • The Metadata Manager component manages all metadata, such as table definitions, views, indexes and the definitions of SQLScript functions. All metadata, even of different types, is stored in a common catalog.
• Three database engine components
  • The Calculation Engine component executes calculation models received from SQLScript (and other) compilers.
  • The Optimizer and Plan Generator component parses and optimizes client requests.
  • The Execution Engine component invokes the various In-Memory Processing Engines and routes intermediate results between consecutive execution steps based on the optimized execution plan.
• Three in-memory storage engines
  • The Relational Engine (see Column and row store below)
  • The Graph Engine
  • The Persistency Layer (see Storage below)
Column and row store
The Relational Engine supports both row- and column-oriented physical representations of relational tables. A system administrator specifies at definition time whether a new table is to be stored in a row- or in a column-oriented format. Row- and column-oriented tables can be seamlessly combined in a single SQL statement, and tables can later be moved from one representation to the other.
The row store is optimized for concurrent write and read operations. It keeps all index structures in memory rather than persisting them on disk, and it uses a technology that is optimized for concurrency and scalability on multi-core systems. Typically, metadata or rarely accessed data is stored in a row-oriented format.
By contrast, the column store is optimized for the performance of read operations. Column-oriented data is stored in a highly compressed format in order to use memory more efficiently and to speed up data transfer from storage to memory and from memory to CPU. Because of this compression, larger amounts of data can be held and accessed in main memory. Typically, user and application data is stored in a column-oriented format to benefit from the high compression rate and from the highly optimized access for selection and aggregation queries.
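The difference between the two layouts can be illustrated with a small Python sketch. It is not HANA code: the sample table, the function names, and the use of dictionary encoding as the compression scheme are assumptions made for illustration, since the text above does not say which compression techniques the column store applies.

```python
# Hypothetical sample table with columns (order_id, region, amount).
rows = [
    (1, "EMEA", 120),
    (2, "APJ", 80),
    (3, "EMEA", 200),
    (4, "AMER", 50),
    (5, "EMEA", 30),
]

# Row-oriented layout: the fields of one record sit next to each other,
# so inserting or reading a whole record touches a single place.
row_store = list(rows)

# Column-oriented layout: all values of one attribute sit next to each other,
# so a query that needs two columns never reads the third.
column_store = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def dictionary_encode(values):
    """Replace each value by a small integer code into a sorted dictionary.

    Dictionary encoding is an assumed, commonly used column compression
    scheme; repeated values are stored once and referenced by code.
    """
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


region_dict, region_codes = dictionary_encode(column_store["region"])


def sum_amount_for_region(region):
    """Aggregate 'amount' for one region by scanning only two columns."""
    if region not in region_dict:
        return 0
    code = region_dict.index(region)
    amounts = column_store["amount"]
    return sum(a for c, a in zip(region_codes, amounts) if c == code)


emea_total = sum_amount_for_region("EMEA")
```

In this sketch the selection compares small integer codes rather than strings and the aggregation reads a single contiguous list, which is the kind of access pattern that favours a column layout for analytical queries.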
Business Function Library
The Business Function Library is a reusable library (similar to stored procedures) for business applications embedded in the HANA calculation engine. This eliminates the need for developing such calculations from scratch. Some of the functions offered are:
• Annual depreciation
• Internal rate of return
• Net present value
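To make the listed calculations concrete, the following Python sketch shows the textbook definitions of the three functions. It does not reproduce the Business Function Library's interface: the function names, the parameters, the choice of straight-line depreciation, and the bisection search for the internal rate of return are all assumptions made for illustration.

```python
def net_present_value(rate, cash_flows):
    """Discount cash flows to time 0; cash_flows[0] occurs at time 0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def internal_rate_of_return(cash_flows, low=-0.99, high=10.0, tol=1e-9):
    """Find the rate at which the net present value is zero, by bisection.

    Assumes the NPV changes sign exactly once between `low` and `high`.
    """
    f_low = net_present_value(low, cash_flows)
    f_high = net_present_value(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign in the search interval")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = net_present_value(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid                 # the root lies in [low, mid]
        else:
            low, f_low = mid, f_mid    # the root lies in [mid, high]
    return (low + high) / 2.0


def straight_line_depreciation(cost, salvage_value, useful_life_years):
    """Annual depreciation under the straight-line method (an assumed method)."""
    return (cost - salvage_value) / useful_life_years


# Example: invest 1000 now and receive 1100 one year later.
example_irr = internal_rate_of_return([-1000.0, 1100.0])
```

For the example cash flows the search converges to a rate of about 10%, the rate at which the discounted inflow equals the initial investment.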
Predictive Analysis Library
Similar to the Business Function Library, the Predictive Analysis Library is a collection of compiled analytic functions for predictive analytics. Among the algorithms supported are:
• K-means clustering
• ABC analysis
• C4.5 algorithm
• Linear regression
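As an illustration of one of the listed algorithms, the sketch below implements a basic ABC analysis in Python. It is not the Predictive Analysis Library's implementation or interface: the function name, the input format, and the 80%/15% class boundaries are assumptions, and other conventions for the boundaries exist.

```python
def abc_analysis(values, a_share=0.80, b_share=0.15):
    """Classify items into classes A, B and C by cumulative share of value.

    `values` maps an item name to a non-negative value (for example revenue).
    Items are ranked by value; those whose cumulative share stays within
    `a_share` are class A, the next `b_share` are class B, the rest class C.
    The default boundaries are an assumed convention.
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total value must be positive")
    classes = {}
    cumulative = 0.0
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    for item, value in ranked:
        cumulative += value
        share = cumulative / total
        if share <= a_share:
            classes[item] = "A"
        elif share <= a_share + b_share:
            classes[item] = "B"
        else:
            classes[item] = "C"
    return classes


# Hypothetical revenue per product.
revenue = {"p1": 500, "p2": 250, "p3": 120, "p4": 70, "p5": 60}
product_classes = abc_analysis(revenue)
```

With the sample data, the two highest-value products account for 75% of revenue and fall into class A, the next two bring the cumulative share to 94% and fall into class B, and the last product is class C.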
R integration
R is a programming language designed for statistical analysis and an open source initiative under the GNU Project. It is integrated with HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange, to incorporate R’s vertical data structure. HANA also introduces R scripts as equivalents of native database operations such as join or aggregation.[31] HANA developers can write R scripts in SQL, and the types are automatically converted in HANA. R scripts can be invoked from SQLScript with HANA tables as both input and output. An R environment must be deployed before R can be used within SQLScript.[32]
Storage
The Persistency Layer is responsible for the durability and atomicity of transactions. It manages data and log volumes on disk and provides interfaces for writing and reading data that are leveraged by all storage engines. This layer is based on the proven persistency layer of MaxDB, SAP’s commercialized disk-centric relational database. The persistency layer ensures that the database is restored to the most recent committed state after a restart and that transactions are either completely executed or completely undone. To achieve this efficiently, it uses a combination of write-ahead logs, shadow paging, and savepoints.
Logging and transactions
HANA's persistence layer manages the logging of all transactions in order to provide standard backup and restore functions. The same persistence layer manages both the row and column stores. It offers regular savepoints and logging of all database transactions since the last savepoint.
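The interplay of savepoints and logging during a restart can be sketched in a few lines of Python. This is a toy model rather than HANA's persistence layer: the class and method names are hypothetical, the "disk" is simulated with in-memory structures, and shadow paging is not modelled.

```python
class TinyPersistence:
    """Toy model of a log volume plus savepoints (all names hypothetical)."""

    def __init__(self):
        self.log = []             # stands in for the log volume on disk
        self.savepoint = ({}, 0)  # (committed data snapshot, log position)
        self.data = {}            # volatile in-memory committed state
        self.pending = {}         # txid -> buffered writes of open transactions

    def write(self, txid, key, value):
        # Write-ahead: the log record is appended before any state changes.
        self.log.append(("write", txid, key, value))
        self.pending.setdefault(txid, {})[key] = value

    def commit(self, txid):
        self.log.append(("commit", txid))
        self.data.update(self.pending.pop(txid, {}))

    def take_savepoint(self):
        # Persist the committed state together with the current log position.
        self.savepoint = (dict(self.data), len(self.log))

    def crash(self):
        # Everything held only in memory is lost.
        self.data = {}
        self.pending = {}

    def recover(self):
        # Start from the last savepoint, then redo transactions that
        # committed after it; uncommitted writes are never applied.
        snapshot, position = self.savepoint
        self.data = dict(snapshot)
        writes = {}
        for index, record in enumerate(self.log):
            if record[0] == "write":
                _, txid, key, value = record
                writes.setdefault(txid, {})[key] = value
            elif index >= position:  # a commit logged after the savepoint
                self.data.update(writes.get(record[1], {}))


db = TinyPersistence()
db.write(1, "a", 1)
db.commit(1)
db.take_savepoint()
db.write(2, "b", 2)
db.commit(2)          # committed after the savepoint: must survive
db.write(3, "c", 3)   # never committed: must be undone
db.crash()
db.recover()
```

After recovery the store contains the savepoint contents plus the effects of transaction 2, while the uncommitted write of transaction 3 is absent, which corresponds to restoring the most recent committed state.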
Concurrency and locking
HANA DB uses multiversion concurrency control (MVCC). This enables long-running read transactions without blocking update transactions. MVCC, in combination with a time-travel mechanism, allows temporal queries inside the Relational Engine.
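The principle can be illustrated with a minimal Python sketch of a versioned key-value store. It is a simplification under stated assumptions, not HANA's implementation: the class names, the integer commit timestamps, and the `read_as_of` method are hypothetical, and write conflicts and garbage collection of old versions are ignored.

```python
import itertools


class VersionedStore:
    """Keeps every committed version of a value, tagged with a commit timestamp."""

    def __init__(self):
        self._clock = itertools.count(1)  # assumed logical commit clock
        self._versions = {}               # key -> [(commit_ts, value), ...] ascending
        self.last_commit_ts = 0

    def commit(self, writes):
        """Install new versions; existing versions are never overwritten."""
        ts = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((ts, value))
        self.last_commit_ts = ts
        return ts

    def read_as_of(self, key, ts):
        """Time travel: return the newest version committed at or before ts."""
        result = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts > ts:
                break
            result = value
        return result

    def begin_read(self):
        """Start a read transaction pinned to the current commit timestamp."""
        return Snapshot(self, self.last_commit_ts)


class Snapshot:
    """A reader that sees the database as of its start time and takes no locks."""

    def __init__(self, store, ts):
        self._store = store
        self.ts = ts

    def get(self, key):
        return self._store.read_as_of(key, self.ts)


store = VersionedStore()
store.commit({"x": 1})
reader = store.begin_read()  # long-running reader starts here
store.commit({"x": 2})       # the update proceeds without waiting for the reader
```

In this sketch the earlier reader still sees the value 1 for `x`, a reader started afterwards sees 2, and `read_as_of` answers a temporal query against any past commit timestamp.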