Operates more like a search engine than a database
Scoring and ranking IP allows for fuzzy searching
Best-result candidate sets returned
Contextual analytics to correctly disambiguate entities
Embedded inside the database
No need for Hadoop or custom-code analytics
True real-time analytics – done per transaction and in aggregate
On-the-fly linking IP
A new kind of in-memory platform, built for in-memory applications
Proprietary compression enables in-memory at scale
Datasets reduced to 16% of original size
1M documents to petabyte scale; streaming, constantly changing data, or more of same type of data
Questions are unique to users; analytics driven by the information that comes through on the query
Looking for the “best” answer, not a definitive one. Consider how/if/to what extent data changes.
Need flexibility in the query formation and fuzzy search; DBMS must perform like a search engine as well as a database
Finch = up to 16% of original size
Need sub-second response times; enabling analytics per transaction.
Need embedded models.
Need storage costs reduced; must run on commodity hardware
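The fuzzy-search requirement above (a query returns a ranked candidate set rather than exact matches only) can be sketched in a few lines of Python. This is a generic illustration built on the standard library's `difflib`, not FinchDB's scoring and ranking IP; the function name `fuzzy_search`, the `cutoff` value, and the sample names are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def fuzzy_search(query, records, cutoff=0.6):
    """Return (record, score) pairs scoring at least `cutoff`, best first.

    Hypothetical helper: the similarity measure is difflib's ratio,
    chosen only because it ships with Python.
    """
    q = query.lower()
    scored = [(r, SequenceMatcher(None, q, r.lower()).ratio()) for r in records]
    hits = [(r, s) for r, s in scored if s >= cutoff]
    # Rank the candidate set so the "best" answer comes first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up records; the query is deliberately misspelled.
names = ["John Smith", "Jon Smyth", "Joan Smithers", "Zebra Corp"]
results = fuzzy_search("Jon Smith", names)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

A misspelled query still returns close candidates in ranked order, while clearly unrelated records fall below the cutoff and are dropped.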
Fraud Detection: Monitoring financial transactions to identify patterns that could indicate fraud
Internet of Things: Collecting high-volume, high-velocity sensor and telemetry data to improve performance, meet customer needs or support new product development
Digital Communication/Message Traffic: Monitoring streaming feeds of message traffic to identify patterns, risks, trends
CRM/Customer Service Engagement: Aggregating customer information from multiple sources with different data models to improve the customer experience
Personalization: Ingesting clickstream data at high throughput rates to create and refine visitor profiles, serving up relevant content upon each return site visit
Real-Time Big Data: Ingesting a streaming feed of data to perform real-time analytics that inform business-critical decisions
Cyber Security: Protecting data from breaches, theft or misuse
[Diagram: Query, Candidate Set, Best Answer (derived from analytic processing), Aggregate Analytics (optional), Answer]
Compression IP: Makes in-memory feasible at scale
On-the-Fly Linking IP: Enables true real-time analytics inside Finch
Analytics Outside the Database
Batch Processing
(Look Up Known, Precomputed Info)
*Predetermined answers to predetermined questions… about things you know you want to know
Search Today:
(HP Autonomy, Solr, and even commercial search engines)
Query
Candidate Set
Ranked Results
Not in-memory
But FinchDB is.
No analytics
But FinchDB does.
A question we often encounter is how FinchDB handles streaming data – in
addition to static data – and how it differs from the popular Apache Spark product.
The primary difference is our ability to apply transactional, predictive analytics on
the fly, inside the database – using all available data.
Below is a side-by-side
comparison.
Source: https://spark.apache.org/docs/latest/streaming-programming-guide.html
• Apply predictive models
• Analyze on the fly
• Compute answers
• Go beyond look-up
Models inside the database
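The difference between looking up a precomputed answer and computing one on the fly can be sketched as follows. This is a minimal illustration, not FinchDB's API: the `Store` class, its toy scoring model (the ratio of a transaction to the account's running average), and the sample amounts are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Store:
    """Hypothetical in-memory store that scores each event as it arrives."""

    history: dict = field(default_factory=dict)  # account -> list of past amounts

    def score(self, account: str, amount: float) -> float:
        """Toy model: how large is this amount relative to the account's average?"""
        past = self.history.get(account, [])
        if not past:
            return 1.0  # no history yet, so treat the amount as typical
        return amount / mean(past)

    def insert(self, account: str, amount: float) -> float:
        """Score the transaction against all data seen so far, then store it."""
        s = self.score(account, amount)
        self.history.setdefault(account, []).append(amount)
        return s


store = Store()
for amt in (20.0, 30.0, 25.0):
    store.insert("acct-1", amt)

# The answer is computed at insert time from current data, not looked up.
suspicious = store.insert("acct-1", 500.0)
print(round(suspicious, 1))  # 20.0
```

The model runs in the same place the data lives, so each new transaction is evaluated against everything stored up to that moment rather than against a batch-computed table.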
[Diagram: Wires, Original Content, Corporate Blogs, Online Media; Stream Processing; Entity Extraction Queries; KB Inserts]
Running on a four-node cluster in AWS
Processing a streaming feed of news with 800,000 documents per day
Disambiguating roughly 10 entities per document
Leveraging a Person-KB of 500M features describing 3M unique people
A Geo-KB with more than 30M unique places in the world
And an Org-KB of more than 380M features describing more than 1.3 million unique companies, non-profits, governments and criminal organizations.
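To put the deployment figures above in perspective, the short calculation below converts them into approximate rates. It only does arithmetic on the stated numbers; the assumptions that documents arrive at a steady rate and that work is spread evenly across the four nodes are ours, not claims from the source.

```python
DOCS_PER_DAY = 800_000
ENTITIES_PER_DOC = 10          # "roughly 10", so every result is approximate
NODES = 4
SECONDS_PER_DAY = 24 * 60 * 60

disambiguations_per_day = DOCS_PER_DAY * ENTITIES_PER_DOC
per_second = disambiguations_per_day / SECONDS_PER_DAY   # assumes a steady feed
per_node_per_second = per_second / NODES                 # assumes even distribution

# Average features per entity in the Person-KB and Org-KB.
features_per_person = 500_000_000 / 3_000_000
features_per_org = 380_000_000 / 1_300_000

print(disambiguations_per_day)         # 8000000
print(round(per_second, 1))            # 92.6
print(round(per_node_per_second, 1))   # 23.1
print(round(features_per_person, 1))   # 166.7
print(round(features_per_org, 1))      # 292.3
```

Real feeds are bursty, so peak rates would be higher than these daily averages.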
Every query has search specifications and scoring/ranking specifications.
We look at both to return a candidate set.
In an entity disambiguation use case, we do that by calculating a disambiguation
score based on:
Name Score
Topic Vector Score
Context Vector Score
Prominence Score
And we do that in less than a millisecond around every event. In this use case,
an “event” is a new document coming into the system.
The same would be true in other use cases. In a cybersecurity use case, an
“event” would be an attack. In this scenario, you could take what’s happening in
your environment and include that data as part of the query.
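One way to picture the disambiguation score described above is as a combination of the four component scores, used to rank a candidate set. The sketch below uses a weighted sum, which is purely an assumption: the source does not say how the components are combined, and the weights, the `Candidate` type, and the sample entities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    entity_id: str
    name_score: float
    topic_score: float       # topic vector score
    context_score: float     # context vector score
    prominence_score: float


# Hypothetical weights; the real combination rule is not given in the source.
WEIGHTS = {"name": 0.4, "topic": 0.2, "context": 0.3, "prominence": 0.1}


def disambiguation_score(c: Candidate) -> float:
    """Weighted sum of the four component scores (illustrative only)."""
    return (
        WEIGHTS["name"] * c.name_score
        + WEIGHTS["topic"] * c.topic_score
        + WEIGHTS["context"] * c.context_score
        + WEIGHTS["prominence"] * c.prominence_score
    )


def rank(candidates):
    """Return the candidate set ordered from best to worst score."""
    return sorted(candidates, key=disambiguation_score, reverse=True)


# Two made-up people who share a name; the document context favors the chemist.
candidates = [
    Candidate("person:jane-smith-senator", 1.0, 0.2, 0.1, 0.9),
    Candidate("person:jane-smith-chemist", 1.0, 0.9, 0.8, 0.3),
]
best = rank(candidates)[0]
print(best.entity_id)  # person:jane-smith-chemist
```

Here the name alone cannot separate the two candidates, so the topic and context scores derived from the incoming document decide the ranking, even though the other candidate is more prominent.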
[Diagram: Query, Best Answer, Aggregate Analytics, Answer]
JSON-style, doc database
Not in-memory, no embedded analytics, open-source
In-memory, multiple deployment models, distributed architecture
No embedded analytics
In-memory, HTAP processing use cases
Only works on structured data
In-memory, handles unstructured text
As a “data fabric” GridGain takes in SQL, NoSQL and Hadoop-analytic data. FinchDB does on-the-fly analytics inside the database –
meaning the need for Hadoop could be eliminated altogether.
HTAP processing use cases
Only works on structured data. Not true in-memory: uses a built-in, on-demand caching scheme. All transactional operations are done on in-memory data.
Doc database
Open source, cannot be cloud deployed/DBaaS
JSON-style, doc database, distributed