
(1)

19th January 2015

BDA Technologies & Selected Case Studies

Ettikan Kandasamy Karuppiah (Ph.D),

Principal Researcher & Director of Accelerative Technologies Lab, MIMOS Berhad

SEMINAR INTERNET COMPUTING TECHNOLOGY “Theme: Delivering Values From Hyperconnectivities”

(2)

Big data is defined by the high volume, velocity, variety, veracity and value of data generated every second, minute, hour and day by devices, humans, etc.

Turning Big Data into Value

ECONOMIC BENEFITS | GOVERNMENT BENEFITS | SOCIETAL BENEFITS

VOLUME (growing data): 90% of the world’s data was generated over the last 2 years

VELOCITY (increasing data): 175,000 tweets per second

VARIETY (broadening data): 80% of the world’s data is unstructured (text, geospatial, audio, video)

VERACITY (establishing the … of big data sources): Big Data technology allows us to establish quality and accuracy, especially in unstructured data

(3)

Big Data Computing in the ICT Sector

The Malaysian ICT services sub-sector has huge growth potential, with a projected 35% share of the nation’s Digital Economy in 2020...

Requires a Transformative Platform

Source: MDeC, as taken from the APeJ Big Data MaturityScape Assessment 2013 by IDC

Software Solutions and Support is the Key GDP Contributor

Business Value

(4)

MIMOS BigData Technologies R&D

• Data Modeling & Visualization for PDRM Workforce Planning
• GPGPU Data Security Library
• Established work on General Purpose Graphics Processing Unit (GPGPU) for text manipulation
• Hadoop trainings
• MultiCore Java Compiler
• Acquire, Train
• Conducted workshop and Hadoop programming training for the Malaysian research community
• Collaboration R&D
• MiAccLib Cleansing
• MiAccLib Finance
• Data Cleansing Engine for PERKESO & Data Warehouse for PERKESO
• MiAccLib Algo/Map
• nVidia COE for GPGPU established
• MiAccLib Crypto
• Sentiment Analysis Model & Data Modeling & Data Warehouse for PIK MOH
• GPGPU Video Data Analytics Library R&D
• Data Encryption/Decryption for National Data Protection
• MiAccLib Video
• GPU Accelerated Libraries for Data Cleansing & Financial Risk Modeling
• MiAccLib BigData
• Accelerated Libraries for Database Accelerator Library (Galactica)

2014 MIMOS Berhad. All Rights Reserved.

• GE’13 Electoral Roll Analysis with Hadoop & GPU
• MiAccLib Cleansing
• ESRI Inc/US MoU established
• Acquire, Train: MoUs (/US, /US/Europe) with Intel Malaysia and AMD Malaysia
• High Risk Profiling, Illicit, Taxable & Drugs Detection (PoC)
• MiAccLib Image

RM10 (10th Malaysia Plan) -> Foundation & Early Adaptation for Heterogeneous Computing

RM11 (11th Malaysia Plan) -> Maturation & Progressive Deployment of Scalable Heterogeneous Computing

(5)

Assisting Both Government & Private Sector Needs

Private Sector to Go Global

National Public Sector

Source: MDeC

DECISIONS REQUESTED

FCC is requested to:

1. Take note of data science upskilling for civil servants

2. Take note of MAMPU developing the Government Open Data framework by 2015

3. Endorse the DG Lab on BDA to identify use cases and pilot projects that address societal wellbeing

4. Take note of MIMOS defining and developing the Big Data technology platform for Government by 2015.

5. Mandate opening up of all relevant data (Open/Non-Open) to the DG Lab on BDA for the pilot projects

Rahsia Besar (Top Secret) | Rahsia (Secret) | Sulit (Confidential) | Terhad (Restricted) | Terbuka (Open)

Opening Up Non-Sensitive Government Data

Policy for all government agencies to open up data categorised as Terbuka (Open)

o E.g. non-sensitive data such as meteorology, transport timetables and pricing of essential goods, based on Open Data criteria


(6)

Developing BDA Open Innovation Platform

An open-innovation platform between the Government, businesses and the Rakyat (citizens) to improve e-participation and user satisfaction, prioritised through the development of high-impact, low-cost, demand-driven life-event solutions

POCs, pilots & apps

Secure environment (sandbox) for Government Data

BDA DG (Digital Government) Lab: Expertise | Community Data | Government Data | Project Sponsor

Sector-specific use cases / life events: e.g. Welfare, Education, Healthcare, Transportation

BDA Technology Platform

DATA OUTCOMES

Open Data

(7)

DATA | Community | Government

Research & Development on KEY Data Extraction, Processing & Analytics Components

i. National Data Sovereignty

ii. Trusted Data

iii. Secured Data

Localized Entity (i.e. MIMOS, Cybersecurity)

Key Values

Data Visualization | Data Staging (Cleansing, Harmonisation, Anonymisation) | Data Model & Analytics | Security | Infrastructure Management | Data DB Store | Data Extraction | Traceability | Machine Learning, Malaysian Context (BM, English, Chinese, Tamil) | Accelerated Computing | Secured Cloud Services | Visualization, Malaysian Perspective

(8)

Mi-Cloud | Mi-Harmony | Mi-UAP | Mi-Mobile | Mi-MOCHA | Mi-Helio | Mi-Morphe | Mi-Harvester | Mi-CLIP | Mi-Doc | Mi-Scrambler | Mi-Portal | Mi-BIS | Mi-ARMC | Mi-Trust | Mi-SP (Video Analytics) | Mi-STP | Mi-Target | Mi-HPDW | Mi-AccLytics | Mi-DSS | Mi-AccLib | Mi-Trace | Mi-ROSS | Mi-DW | Mi-Market | Galactica | Customization

3rd Party Systems & Hardware

Data Security | Data Extraction | Data Staging | Data DB Store | Data Visualization | Data Model & Analytics

Security | Data Management | Infrastructure Management | Traceability | Cleansing | Harmonisation | Anonymisation

Data Source: Structured + Open Linked Data + Unstructured

Applications

(9)

Extracting Value from Data

Structured & Unstructured Data Sources -> Data Harvesting -> Staging Data -> Data Cleansing (Data Correction) -> Data Harmonisation (Harmonisation Terminologies) -> Data Anonymisation -> Granular Primary Database, Scrambled Database & Datamarts, Published Data Marts -> Data Visualization -> Data Sharing

• Virtualized Platform & Integrity Manager: Mi-CLOUD + Mi-Mocha
• Unstructured Data Collector: Mi-Clip
• Data Harmonisation: Mi-Harmony + Mi-Semantics
• Detect / Correction / Exception: Mi-Morphe + Mi-AccLib
• Data Anonymisation: Mi-Scramble + Mi-Crypto + MiAccLib
• Authentication & Authorization: Mi-UAP, Mi-ARMC
• Data Warehouse Platform: Mi-Galactica, Mi-AccConnect, Mi-HPDW
• Data Modeling


• Data Statistics: Mi-AccStat
• Sentiment Analytics: Mi-Intelligence; Mi-NLP
• Data Visualization: Mi-HELIO; Mi-BIS
• Social Network Analytics: Mi-Visualitic
• Knowledge Harvester (LOD): Mi-Harvester
• Data Analytics: Mi-Portal; Mi-HPDW; Mi-Target
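Data harmonisation, as listed in the pipeline above, typically means mapping the different spellings and terminologies used by source systems onto one agreed vocabulary before data is published to data marts. The Python sketch below illustrates the idea with a small lookup table. The mapping entries and the function names (`normalise`, `harmonise`) are hypothetical examples chosen for illustration; they are not part of Mi-Harmony or any other MIMOS product.

```python
"""Sketch of terminology harmonisation (hypothetical example, not Mi-Harmony)."""
from collections import Counter

# Hypothetical mapping from normalised source spellings to one canonical term.
CANONICAL_TERMS = {
    "kl": "Kuala Lumpur",
    "kuala lumpur": "Kuala Lumpur",
    "w.p. kuala lumpur": "Kuala Lumpur",
    "penang": "Pulau Pinang",
    "pulau pinang": "Pulau Pinang",
}


def normalise(value: str) -> str:
    """Lower-case and collapse internal whitespace so variants compare equal."""
    return " ".join(value.lower().split())


def harmonise(values: list[str]) -> tuple[list[str], list[str]]:
    """Return (harmonised values, unmapped originals for manual review)."""
    harmonised: list[str] = []
    unmapped: list[str] = []
    for value in values:
        canonical = CANONICAL_TERMS.get(normalise(value))
        if canonical is None:
            # Keep the original text but flag it instead of guessing a mapping.
            unmapped.append(value)
            harmonised.append(value.strip())
        else:
            harmonised.append(canonical)
    return harmonised, unmapped


if __name__ == "__main__":
    raw = ["KL", "Kuala  Lumpur", "Penang", "Pulau Pinang", "Johor"]
    clean, review = harmonise(raw)
    print(Counter(clean))
    print("needs review:", review)
```

In this example the variant spellings of Kuala Lumpur and of Pulau Pinang each collapse to a single term, while "Johor", which is not in the table, is passed through unchanged and flagged for review. Flagging rather than guessing keeps the harmonisation step auditable.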

(10)

Mi-Cloud | Mi-Harmony | Mi-UAP | Mi-Mobile | Mi-MOCHA | Mi-Helio | Mi-Morphe | Mi-Harvester | Mi-CLIP | Mi-Doc | Mi-Scrambler | Mi-Portal | Mi-BIS | Mi-ARMC | Mi-Trust | Mi-SP (Video Analytics) | Mi-STP | Mi-Target | Mi-HPDW | Mi-AccLytics | Mi-DSS | Mi-AccLib | Mi-Trace | Mi-ROSS | Mi-DW | Mi-Market | Galactica

New Platforms & Revisions

Technology Challenges Ahead (11th Malaysia Plan)

NEWER Channels of Consumption (e.g. omni-channel data market)

NEWER Sources of Data (e.g. high-speed streams)

NEWER Methods of Visualization (e.g. multi-dimensional view)

NEWER Paradigms in Computing (e.g. Docker)

Technology Pull | Technology Push

(11)

IoA: Internet of Anything | II: Industrial Internet | IoE: Internet of Everything | IoT: Internet of Things

(12)

IoA: Internet of Anything | II: Industrial Internet | IoE: Internet of Everything | IoT: Internet of Things | Software Defined Network | Big Data Processing | Mobile Systems | Wearables | Cloud Computing | Cyber-biological systems | Cyber-physical systems | Internet of Humans

(13)

Open Platform & BDA Middleware Architecture

Data Sources: Structured, Semi-structured & Unstructured Data Sources | Open Linked Data | Web & Social Media | RDBMS | Files

Data Extraction: Flume | Sqoop | Kafka | Mi-Clip | Mi-Harvester | Mi-Morphe

Data Staging: Data Cleansing (Mi-Morphe, Mi-AccLib) | Data Anonymisation (Mi-Scramble, Mi-Crypto, Mi-AccLib) | Data Harmonization (Mi-Harmony)

Data Model: Mi-HPDW

Data Storage: RDBMS | Galactica FS | HDFS, NoSQL | Hadoop Data Warehouse / Data Mart (Galactica, Mi-HPDW)

Storage Infrastructure: Mi-Cloud | Mi-Mocha | Galactica | YARN | Mi-AccConnect

Pig | Hive | Impala | Shark

Galactica Connector

Data Analytics Tools (Machine Learning): R | Mahout | MLlib (Spark) | Mi-NLP | Mi-AccStat | Mi-Intelligence | Mi-HPDW | Mi-Target | GIS

Data Visualisation: Mi-Helio | Mi-BIS | Mi-Portal | Mi-Visualitics

Apache Drill | Spark/Shark | Hue (Cloudera) | Cloudera Search & Solr | RDF Graph DB

Data Security: Mi-UAP | Mi-Trust

Data Management: Cloudera Manager / Falcon | ZooKeeper | Oozie | Sentry

Legend: MIMOS Solution | 3rd Party Solution

(14)

MIMOS BigData Stack With Reference to Hadoop Stack

Data Sources (Data Type): RDBMS | Streaming (Twitter, logs, etc.) | NoSQL

Stream: Spark | Kafka | Spring XD & Storm

Search: Cloudera Search & Solr

Application Program Interface: Thrift | REST | Java API | Avro

Management: YARN (resource management) | Big Data Orchestration Engine/Layer | ZooKeeper (configuration and synchronization) | Oozie (workflow scheduler) | Cloudera Manager | Management for Lustre

Storage: HDFS | HPDW-Storage | Galactica FS | NoSQL (HBase) | Distributed Database (Cassandra) | RDBMS (Postgres, MySQL)

Visualization: Mi-Helio | Mi-Portal | Mi-BIS (Mi-AccConnect) | 3rd Party Apps

Batch Query: MapReduce v2 | Pig | Hive

Real Time Query: Mi-BIS with Impala through Mi-AccConnect | Hue | Galactica | Apache Drill | Spark/Shark | HPDW-BigData DB

Machine Learning: Mi-BIS (Weka) | Accstats (R and Cloudera C++) | MLlib (Spark) | Revolution R, Weka

Processing: Mi-Morphe | Morphlines | Mi-AccLib MapReduce v2 (Accelerated ETL) | HPDW Data Model Plugin (for Mi-Morphe v3/Pentaho)

Analytics: Simulator | Planning Tool | Predictive | Prescriptive | Prediction Algorithm | Mi-BIS (Mi-Accstats) | Mi-BIS (Data Mining) | Revolution R (3rd Party) | GIS (3rd Party)

Security and Authentication: Sentry | Mi-UAP | Mi-ARMC | Mi-Trust

Data Management: Sqoop | Flume

Multi & Many Core Processors (CPU + GPU)

Legend: Complete 3rd Party | 3rd Party & MIMOS Offering | MIMOS Technologies
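The Batch Query row lists MapReduce v2 alongside Pig and Hive. As a reminder of the programming model those tools execute on the cluster, the sketch below runs the three MapReduce phases (map, shuffle/group by key, reduce) in plain single-process Python, counting words in short text records such as tweets. It only illustrates the data flow: the function names are hypothetical, it is not Hadoop's Java API, and a real job would distribute each phase across many nodes.

```python
"""Single-process illustration of the MapReduce model (map -> shuffle -> reduce).

This is not the Hadoop API; it only shows the data flow that a MapReduce v2
job (or a Pig/Hive query compiled to one) follows across a cluster.
"""
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine the values for one key; here, a simple sum."""
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    """Chain the three phases over an in-memory list of records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    tweets = ["big data for government", "open data for rakyat", "big data platform"]
    # Print words ordered by descending count, then alphabetically.
    print(sorted(run_job(tweets).items(), key=lambda kv: (-kv[1], kv[0])))
```

Because map and reduce only see one record or one key at a time, the framework is free to run them in parallel on different machines; the shuffle is the only step that moves data between them, which is why it is usually the bottleneck worth accelerating.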

(15)


Proofs of Concept

Selected Use Cases

(16)


Proofs of Concept

(Mixed Scenario)

(17)


Challenges to be Addressed During Initial Roll-Outs

(18)

Data Challenges (Stage 1)

• Data is stored partially and in distributed locations

• Data exists in both digital and non-digital formats, with some still paper-based

• Incomplete data sets (Q issues)

• Cleanliness of the data
– Missing values (random, non-random), CR, noise
– Cleaning while maintaining integrity & ‘value’

• Extracting the ‘features’

• Data in multiple languages (at least English & Malay)

• Structured data holds longer historical value to be acquired
– Data storage media & format for extraction and usage

• How to authenticate the key values? Where is the reference point?

• For unstructured data (e.g. social media), current technology is adequate to support pre-processing, analytics, etc.
– With some local challenges

• Who are the data owners? How to ensure the right security level of the data for sharing? Confusion over PDP (Personal Data Protection) compliance…
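For the ‘missing values’ and ‘cleaning while maintaining integrity & value’ points above, one common practice is to repair gaps only where a rule is defensible and to record, next to each value, whether it was observed or filled in, so that later analysis can still separate original from repaired data. The sketch below does this for a single numeric column using the median of the observed values. The missing-value tokens, the `CleanValue` type and the choice of median imputation are assumptions for illustration, not a method taken from these slides.

```python
"""Sketch: cleanse a numeric column while keeping an audit trail of repairs.

Assumptions (for illustration only): values arrive as strings, tokens such as
blank or 'N/A' mean missing, and median imputation is acceptable for this column.
"""
from dataclasses import dataclass
from statistics import median
from typing import Optional

MISSING_TOKENS = {"", "n/a", "na", "null", "-"}


@dataclass
class CleanValue:
    value: float
    imputed: bool  # True when the value was filled in rather than observed


def parse(raw: str) -> Optional[float]:
    """Return a float, or None if the raw text denotes a missing value."""
    text = raw.strip().lower()
    if text in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None  # unparseable text is treated as missing, not guessed


def cleanse_column(raw_values: list[str]) -> list[CleanValue]:
    """Fill missing entries with the median of observed ones and flag them."""
    parsed = [parse(v) for v in raw_values]
    observed = [v for v in parsed if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed)
    return [CleanValue(fill, True) if v is None else CleanValue(v, False) for v in parsed]


if __name__ == "__main__":
    rows = ["1200", "", "950", "N/A", "1100", "abc"]
    for c in cleanse_column(rows):
        print(c.value, "imputed" if c.imputed else "observed")
```

Median imputation is only defensible when values are missing at random; for non-random gaps the `imputed` flag lets analysts exclude the repaired rows instead of letting them distort a trend.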

(19)

Analytics Challenges (Stage 2)

• Tools are available, but the right approach is still critical for evaluation

• Which are the best/right algorithms to use?

• Can you identify the right ‘domain expert’ within the organization?

• Who are the local ‘domain experts’ to be consulted for the selection of methods/algorithms?
– You may not have a data scientist in a specific government organization, but how can one be formed (external + internal)? -> an ‘analytics team’

• What exactly are the data owners’ ‘business needs’?
– Why do they need to do this?
– A headache for them… best to leave the data to ‘rest in peace’!

• Which data to include, which to exclude, and what to ‘anonymize’?
– Concern over ‘meaning/trend’ extraction

• Plurality of languages & interpretation accuracy
– ‘Semantification’ of the language-specific analytics

• Bottlenecks must be identified, and an accelerated approach is required for the specific processing
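On the question of what to anonymise without losing ‘meaning/trend’ extraction: one widely used technique is keyed pseudonymisation, where a direct identifier is replaced by an HMAC of that identifier under a secret key held by the data owner. The same person always maps to the same token, so counts and trends per individual survive, but the token cannot be reversed or recomputed without the key. The sketch below uses Python's standard `hmac` and `hashlib` modules; the field names (`id`, `visit_year`, `person_token`) and the in-code key are hypothetical. Pseudonymised records can still be re-identifiable through the remaining attributes, so this is one step of anonymisation rather than the whole of it.

```python
"""Sketch: keyed pseudonymisation that preserves per-person trends.

Field names ('id', 'visit_year', 'person_token') and the in-code key are
hypothetical; in practice the key stays with the data owner, never with the
shared data.
"""
import hashlib
import hmac
from collections import Counter


def pseudonymise(identifier: str, key: bytes) -> str:
    """Deterministic, non-reversible token for an identifier (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:16]


def share_records(records: list[dict], key: bytes) -> list[dict]:
    """Replace the direct identifier; keep the analytical fields unchanged."""
    shared = []
    for rec in records:
        out = {k: v for k, v in rec.items() if k != "id"}
        out["person_token"] = pseudonymise(rec["id"], key)
        shared.append(out)
    return shared


if __name__ == "__main__":
    key = b"example-key-held-by-data-owner"  # hypothetical; load from a key store in practice
    visits = [
        {"id": "A001", "visit_year": 2013},
        {"id": "A001", "visit_year": 2014},
        {"id": "A002", "visit_year": 2014},
    ]
    shared = share_records(visits, key)
    # Visits per pseudonymous person remain countable without revealing who they are.
    print(Counter(r["person_token"] for r in shared))
```

A keyed HMAC is used instead of a plain hash because identifiers such as ID numbers come from a small, guessable space: anyone could hash every possible value and reverse an unkeyed hash, whereas without the key the tokens cannot be regenerated.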

(20)

Results Challenges (Stage 3)

• Visualization of the results in a simple, ‘actionable’ and ‘communicable’ form

• How to handle continuously changing analytics (and results) due to:
– New data inclusion
– New ‘domain expert’ inclusion
– New additional factors to be considered

• Who validates the results?

• How to translate results into value for the (government) organization?

• How to translate the ‘value’ into actions?

(21)

Benefiting Humanity Through Technology
