Embedded inside the database. No need for Hadoop or custom code. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP


Operates more like a search engine than a database

Scoring and ranking IP allows for fuzzy searching

Best-result candidate sets returned

Contextual analytics to correctly disambiguate entities
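
To make "fuzzy searching" with a ranked candidate set concrete, here is a minimal sketch. It is not Finch's scoring and ranking IP, which the slides do not describe; it assumes plain string similarity from Python's standard `difflib` module as a stand-in, and the function name, threshold and sample names are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_candidates(query, names, top_k=3, min_score=0.5):
    """Return up to top_k (name, score) pairs, best match first.

    Illustrative stand-in only: string similarity replaces whatever
    scoring a real engine would apply.
    """
    scored = []
    for name in names:
        # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if score >= min_score:
            scored.append((name, score))
    # Rank the candidate set by score, highest first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical data: a misspelled query against a small name list.
names = ["Jon Smith", "John Smith", "Joan Smythe", "Acme Corp"]
candidates = fuzzy_candidates("Jhon Smith", names)
for name, score in candidates:
    print(f"{name}: {score:.2f}")
```

The point of the sketch is the shape of the result: a misspelled query still returns a scored, ordered set of plausible matches rather than an exact hit or nothing.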

Embedded inside the database

No need for Hadoop or custom-code analytics

True real-time analytics – done per transaction and in aggregate

On-the-fly linking IP

A new kind of in-memory platform, built for in-memory applications

Proprietary compression enables in-memory at scale

Datasets reduced to 16% of original size

(4)

1M documents to petabyte scale; streaming, constantly changing data, or more of same type of data

Questions are unique to users; analytics driven by the information that comes through on the query

Looking for the “best” answer, not a definitive one. Consider how/if/to what extent data changes.

Need flexibility in the query formation and fuzzy search; DBMS must perform like a search engine as well as a database

Finch = up to 16% of original size

Need sub-second response times; enabling analytics per transaction.

Need embedded models.

Need storage costs reduced; must run on commodity hardware

(5)

Fraud Detection: Monitoring financial transactions to identify patterns that could indicate fraud

Internet of Things: Collecting high-volume, high-velocity sensor and telemetry data to improve performance, meet customer needs or support new product development

Digital Communication/Message Traffic: Monitoring streaming feeds of message traffic to identify patterns, risks, trends

CRM/Customer Service Engagement: Aggregating customer information from multiple sources with different data models to improve the customer experience

Personalization: Ingesting clickstream data at high throughput rates to create and refine visitor profiles, serving up relevant content upon each return site visit

Real-Time Big Data: Ingesting a streaming feed of data to perform real-time analytics that inform business-critical decisions

Cyber Security: Protecting data from breaches, theft or misuse

(7)

[Diagram labels: Query; Candidate Set; Best Answer (derived from analytic processing); Aggregate Analytics (optional); Answer]

Compression IP: Makes in-memory feasible at scale

On-the-Fly Linking IP: Enables true real-time analytics inside Finch

(8)

Analytics Outside the Database

Batch Processing

(Look Up Known, Precomputed Info)

*Predetermined answers to predetermined questions… about things you know you want to know

(9)

Search Today:

(HP Autonomy, Solr, and even commercial search engines)

Query

Candidate Set

Ranked Results

Not in-memory

But FinchDB is.

No analytics

But FinchDB does.

(10)

A question we often encounter is how FinchDB handles streaming data – in addition to static data – and how it differs from the popular Apache Spark product. The primary difference is our ability to apply transactional, predictive analytics on the fly, inside the database – using all available data. Below is a side-by-side comparison.

Source: https://spark.apache.org/docs/latest/streaming-programming-guide.html

• Apply predictive models

• Analyze on the fly

• Compute answers

• Go beyond look-up

Models inside the database

(11)

[Diagram labels: KB Inserts; Wires; Original Content; Corporate Blogs; Online Media; Stream Processing; Entity Extraction; Queries]

(12)

Running on a four-node cluster in AWS

Processing a streaming feed of news with 800,000 documents per day

Disambiguating roughly 10 entities per document

Leveraging a Person-KB of 500M features describing 3M unique people

A Geo-KB with more than 30M unique places in the world

And an Org-KB of more than 380M features describing more than 1.3 million unique companies, non-profits, governments and criminal organizations.
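
As a rough back-of-the-envelope check on what these figures imply, the sketch below derives the average disambiguation rate. The inputs come from the slide; the assumptions of a uniform arrival rate over 24 hours and an even split across the four nodes are ours, not stated in the deck.

```python
DOCS_PER_DAY = 800_000       # from the slide
ENTITIES_PER_DOC = 10        # "roughly 10", from the slide
NODES = 4                    # four-node cluster, from the slide
SECONDS_PER_DAY = 24 * 60 * 60

disambiguations_per_day = DOCS_PER_DAY * ENTITIES_PER_DOC
# Assumption: documents arrive at a uniform rate around the clock.
per_second = disambiguations_per_day / SECONDS_PER_DAY
# Assumption: work is split evenly across the nodes.
per_node_per_second = per_second / NODES

print(f"{disambiguations_per_day:,} disambiguations per day")
print(f"{per_second:.1f} per second on average")
print(f"{per_node_per_second:.1f} per node per second on average")
```

Real news feeds are bursty, so peak rates would be higher than these averages.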


Every query has search specifications and scoring/ranking specifications. We look at both to return a candidate set.

In an entity disambiguation use case, we do that by calculating a disambiguation score, based on:

Name Score

Topic Vector Score

Context Vector Score

Prominence Score

And we do that in less than a millisecond around every event. In this use case, an “event” is a new document coming into the system.

The same would be true in other use cases. In a cybersecurity use case, an “event” would be an attack. In this scenario, you could take what’s happening in your environment and include that data as part of the query.
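
One way to picture how the four component scores could yield a "best answer" from a candidate set is sketched below. The slides name the components but not how they are combined, so the weighted sum, the weights, the `Candidate` type and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the combination rule is not given in the slides,
# so a weighted sum is assumed here purely for illustration.
WEIGHTS = {"name": 0.4, "topic": 0.2, "context": 0.3, "prominence": 0.1}


@dataclass
class Candidate:
    entity_id: str
    name: float        # Name Score, assumed to lie in [0, 1]
    topic: float       # Topic Vector Score, assumed to lie in [0, 1]
    context: float     # Context Vector Score, assumed to lie in [0, 1]
    prominence: float  # Prominence Score, assumed to lie in [0, 1]


def disambiguation_score(c: Candidate) -> float:
    """Combine the four component scores into one number (assumed rule)."""
    return (WEIGHTS["name"] * c.name
            + WEIGHTS["topic"] * c.topic
            + WEIGHTS["context"] * c.context
            + WEIGHTS["prominence"] * c.prominence)


def rank(candidates):
    """Order the candidate set so the best answer comes first."""
    return sorted(candidates, key=disambiguation_score, reverse=True)


# Invented example: two entities share a name; the document's topic and
# context favor the less prominent one.
candidate_set = [
    Candidate("athlete", name=1.0, topic=0.1, context=0.2, prominence=0.9),
    Candidate("professor", name=1.0, topic=0.9, context=0.8, prominence=0.5),
]
ranked = rank(candidate_set)
best = ranked[0]
print(best.entity_id, round(disambiguation_score(best), 2))
```

In this made-up case the name match alone cannot separate the two candidates; the topic and context scores decide the outcome, which is the role the slides assign to contextual analytics.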

[Diagram labels: Query; Best Answer; Aggregate Analytics; Answer]

(20)

JSON-style, doc database

Not in-memory, no embedded analytics, open-source

In-memory, multiple deployment models, distributed architecture, no embedded analytics

In-memory, HTAP processing use cases

Only works on structured data

In-memory, handles unstructured text

As a “data fabric,” GridGain takes in SQL, NoSQL and Hadoop-analytic data. FinchDB does on-the-fly analytics inside the database – meaning the need for Hadoop could be eliminated altogether.

HTAP processing use cases

Only works on structured data. Not true in-memory: uses a built-in, on-demand caching scheme. All transactional operations are done on in-memory data.

Doc database. Open source, cannot be cloud deployed/DBaaS

JSON-style, doc database, distributed
