14.2 Data Access Layer

The layer is responsible for implementing and supporting various data access technologies and features, from general to specific and from batch services to real-time services. An indicative list of such data access technologies and features follows:

i) It should support a scalable data processing platform to run SQL queries on the data stored in HDFS.

ii) It should provide scripting capabilities for Hadoop, through a high-level language that boosts parallelization in the Hadoop ecosystem, which in turn helps in handling very large data sets.

iii) It should facilitate real-time data processing in Hadoop. This is typically used in conditions such as cyber security analytics, threat detection and data monetization, where speed is of paramount importance.

iv) A column-oriented NoSQL database should be allowed to be integrated with Hadoop. It should support hosting very large tables to store multi-structured or sparse data.

v) A low-latency storage and retrieval system which provides cell-level security should be provisioned, which in turn enables the intermingling of different data sets with access control policies to allow or disallow users to view data.

vi) It should be able to store the metadata and information about the location of the data stored in Hadoop, which helps in understanding the structure of the stored data.

vii) A scalable, fast and fault-tolerant publish-subscribe messaging system should be made available. It is to be used for website activity tracking, metrics collection, monitoring etc. It should be compatible with other services present in the Hadoop ecosystem such as Storm, HBase and Spark.

viii) It should provide a library of auto learning algorithms. The auto learning algorithms may be based on artificial intelligence focused on enabling machines to learn without being explicitly programmed. It is to be used to improve future performance based on previous outcomes.

ix) A framework for deployment and management of long-running data access applications in Hadoop should be made available. The framework should allow users to create and run different versions of heterogeneous long-running applications in Hadoop with YARN. It should be able to expand or shrink application instances while they are running.

x) A highly reliable, scalable and fault-tolerant search engine should be made available. Its features include advanced full-text search, near real-time indexing and support for interfaces like XML, JSON, HTTP, HTML etc.

xi) Support for fast, in-memory processing with development APIs in Scala, Java and Python, which in turn would help in implementing quick algorithms for advanced data analytics.
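Requirement v) above can be illustrated with a minimal sketch of cell-level security. It assumes a hypothetical in-memory store in which every cell carries a set of visibility labels, and a scan returns only the cells whose labels are all held by the reader. The class names, labels and sample data are illustrative assumptions and do not represent the interface of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    labels: frozenset  # the reader must hold every label listed here


class CellStore:
    """Hypothetical in-memory store used only to illustrate the idea."""

    def __init__(self):
        self._cells = []

    def put(self, row, column, value, labels=()):
        self._cells.append(Cell(row, column, value, frozenset(labels)))

    def scan(self, authorizations):
        # A cell is visible only if its labels are a subset of the
        # authorizations presented by the reader.
        auths = frozenset(authorizations)
        return [c for c in self._cells if c.labels <= auths]


store = CellStore()
store.put("citizen-1", "name", "A. Kumar", labels=["public"])
store.put("citizen-1", "tax_id", "XXXX1234", labels=["finance"])

visible_to_clerk = [(c.row, c.column) for c in store.scan(["public"])]
visible_to_auditor = [(c.row, c.column) for c in store.scan(["public", "finance"])]
```

In this sketch, two data sets with different sensitivity share one table, and the access decision is taken per cell rather than per table or per row.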

14.3 Integration and Management Layer

The layer is responsible for quickly and easily loading data, and managing it according to the configured policy. An indicative list of services and features present in this layer follows:

i) Provision for importing and exporting data to and from external structured databases should be made available. It should support popular databases like Oracle, MySQL, Microsoft SQL Server, PostgreSQL, Teradata etc.

ii) Provision for efficiently collecting, aggregating and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS).

iii) Data management framework integrated with YARN to be used for centrally managing the cluster’s data and enabling pipeline processing should be made available.
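The import and export requirement in i) above can be sketched as a bulk transfer of a relational table into delimited text. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module as a stand-in for the external database and an in-memory buffer as a stand-in for a file in HDFS. The function name export_table and the sample table are hypothetical.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Write every row of `table` to `out` as CSV and return the row count."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    # The header row is taken from the column names reported by the cursor.
    writer.writerow([description[0] for description in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Stand-in source database with a small sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", [(1, 250.0), (2, 99.5)])

buffer = io.StringIO()  # stand-in for the target file
rows_exported = export_table(conn, "payments", buffer)
```

A production tool would additionally split the table across parallel tasks and write directly to the distributed file system; the sketch shows only the row-by-row transfer.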

14.4 Security Layer

The layer is responsible for addressing requirements of Authentication, Authorization, Accounting and Data Protection. Typical features include:

i) Provision for a single point of access and authentication for Hadoop services in the cluster. It should simplify Hadoop security for the users who access the cluster data and execute queries. It would help in maintaining compliance with enterprise security policies.

ii) Provision for a comprehensive approach to security for the Hadoop cluster. It can be used for audit tracking and policy analysis. It should also provide an option to delegate administration of certain data to other group owners, helping in secure decentralization of data ownership.

14.5 Operations/Configuration Layer

The layer is responsible for provisioning, managing, monitoring and operating Hadoop clusters at scale. The main features are as follows:

i) An Operational framework for provisioning, managing and monitoring Apache Hadoop clusters should be made available. It should be able to provide web interface to control the lifecycle of Hadoop services and components, modify configurations and manage the ongoing growth of the cluster.

ii) Provision for a web application to schedule Hadoop jobs. It should combine multiple jobs sequentially into one logical unit of work. It should be integrated with the Hadoop stack, and support Hadoop jobs for MapReduce, Pig, Hive and Sqoop.

iii) Provision for a distributed configuration service, a synchronization service and a naming registry for distributed systems. Distributed applications should use these services to store and mediate updates to important configuration information.
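Requirement ii) above, combining multiple jobs sequentially into one logical unit of work, can be sketched as follows. This is a minimal illustration under stated assumptions: each job is modelled as a plain Python callable returning True on success, and the job names are hypothetical placeholders rather than real cluster jobs.

```python
from typing import Callable, List, Tuple

# A job is a (name, action) pair; the action returns True on success.
Job = Tuple[str, Callable[[], bool]]


def run_workflow(jobs: List[Job]) -> Tuple[bool, List[str]]:
    """Run jobs in order and stop at the first failure.

    Returns (succeeded, names of the jobs that completed).
    """
    completed = []
    for name, action in jobs:
        if not action():
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical three-step workflow; the lambdas stand in for real jobs.
workflow = [
    ("sqoop-import", lambda: True),
    ("pig-clean", lambda: True),
    ("hive-aggregate", lambda: True),
]
ok, done = run_workflow(workflow)

# The same workflow with a failing middle step never reaches the last job.
failing = [workflow[0], ("pig-clean", lambda: False), workflow[2]]
failed_ok, failed_done = run_workflow(failing)
```

Treating the sequence as a single unit means a downstream job never runs on the output of a failed upstream job.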

14.6 Research Platform

i) The research platform would be used for pre-production analysis of Hadoop data to be fed to the advanced analytics layer supplied by SAS.

ii) RISL would use the existing SAS for analytics; therefore the research platform should be easily accessible from SAS. The research platform should also provide connectors and an interface with Hadoop; this will enable RISL SAS users to access Hadoop from the research platform without needing to learn the technical details of Hadoop.

iii) The Platform should have capabilities that allow the entire research process (data acquisition, data preparation, data analysis, and data visualization) to be completed within one nested query.

iv) The platform should help in creating different analytical models quickly so that the same can be productionized for the benefit of the state.

v) The Platform should discover new patterns and trends of interest and any potential correlations or triggering events leading to Fraud Investigation and other analysis.

vi) The Platform should allow creating multiple applications with pre built functions or code into an App that can be parameterized and run repetitively, thus simplifying and accelerating the app building, configuring, running and sharing experience.

vii) The Platform should be designed for rapid, on the fly discovery of insights that allows for iterations that keep enriching previously gained insights.

viii) The Platform should support multiple functions to be utilized in a single query without the need of juggling through multiple engines and should be able to integrate with the various open source libraries like R, Fuzzylogix etc.

ix) The Discovery Platform should support different kinds of engines including
o Graph Analysis
o Path and Pattern Discovery
o Statistical and Machine learning
o Text and Sentiment Analysis

x) Temporal and time series Analysis

xi) The Platform should have the capability to build networks based on defined specific relationships like household, common promoter, shareholder, common director, transactions etc.

xii) The Platform should provide visualization for navigation & drilling into networks and chronological view as well as scaling from 10s to 100s to lakhs and crores of entities and their relationships and should be able to store those Graphs for doing further analysis.

xiii) The Platform should have the capability to score the network based on risk rules and predictive models.

xiv) The Platform should use Natural Language Processing (NLP) for parsing and to identify subject matter of interest.

xv) The solution should provide an SQL interface to access or query data stored in the HDFS system. The interface should support industry-standard SQL commands.
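Requirements xi) and xiii) above, building networks from defined relationships and scoring them against risk rules, can be sketched as follows. The sketch makes several assumptions: the relationship records, the set of flagged entities and the scoring rule (the share of flagged entities in a network) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical relationship records: (entity_a, entity_b, relationship_type)
relationships = [
    ("Firm A", "Firm B", "common director"),
    ("Firm B", "Firm C", "shareholder"),
    ("Firm D", "Firm E", "household"),
]
flagged = {"Firm C"}  # assumed output of an upstream risk rule


def build_graph(edges, allowed_types):
    """Build an undirected graph from the selected relationship types."""
    graph = defaultdict(set)
    for a, b, relationship in edges:
        if relationship in allowed_types:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def networks(graph):
    """Return the connected components of the graph using breadth-first search."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(component)
    return result


def risk_score(component, flagged_entities):
    """Illustrative rule: fraction of the network's entities that are flagged."""
    return len(component & flagged_entities) / len(component)


graph = build_graph(relationships, {"common director", "shareholder", "household"})
components = networks(graph)
scores = [risk_score(component, flagged) for component in components]
```

Here the first network links three firms through a common director and a shareholding, so a flag on one firm raises the score of the whole network, while the unrelated household network scores zero.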

ANNEXURE- 15: Configuration of SAS advanced analytics platform deployed at RISL

S. No.  Product Description                                         Configuration
1       SAS® Enterprise Content Categorization                      8 cores
2       Additional SAS® Text Data Language Pack - Hindi
3       SAS® Sentiment Analysis                                     8 cores
4       Additional SAS® Sentiment Analysis Language Pack - Hindi
5       SAS Enterprise Miner - Desktop                              Desktop
6       SAS Text Miner - Desktop                                    Desktop
7       SAS Analytics Pro - Desktop                                 Desktop
8       Platform Suite for SAS                                      8 cores
9       SAS Office Analytics                                        8 cores
10      SAS add-on / Access Interface to Oracle                     8 cores
11      SAS add-on / Access Engine (ODBC)                           8 cores
