Summary of Alma-OSF’s Evaluation of MongoDB for Monitoring Data

Heiko Sommer

June 13, 2013

Heavily based on the presentation by Tzu-Chiang Shen and Leonel Peña,
ALMA Integrated Computing Team Coordination & Planning Meeting #1

Monitoring Storage Requirement

• Expected data rate with 66 antennas:
  - 150,000 monitor points (“MP”s) in total.
  - MPs are archived once per minute.
    - ~1 minute of MP data is bucketed into a “clob”.
  - ~7,000 clobs/s, i.e. ~25-30 GB/day or ~10 TB/year.
    - 2,500 clobs/s + dependent MP demultiplexing + fluctuations.
  - Equivalent to ~310 KByte/s or ~2.485 Mbit/s.

• Monitoring data characteristics:
  - Simple data structure: [ID, timestamp, value]
  - But a huge amount of data
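The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above (150,000 MPs, one clob per MP per minute, ~310 KByte/s). The variable names, the decimal unit convention (1 GB = 10^9 bytes) and the 365-day year are assumptions made for illustration.

```javascript
// Cross-check of the quoted data-rate figures (decimal units assumed).
const monitorPoints = 150000;               // total MPs, from the slide
const baseClobsPerSec = monitorPoints / 60; // one clob per MP per minute
const bytesPerSec = 310 * 1000;             // ~310 KByte/s, from the slide

const mbitPerSec = (bytesPerSec * 8) / 1e6;
const gbPerDay = (bytesPerSec * 86400) / 1e9;
const tbPerYear = (gbPerDay * 365) / 1000;

console.log(`base clob rate: ${baseClobsPerSec} clobs/s`);
console.log(`link rate: ${mbitPerSec.toFixed(2)} Mbit/s`);
console.log(`volume: ${gbPerDay.toFixed(1)} GB/day, ${tbPerYear.toFixed(1)} TB/year`);
```

The base rate comes out at 2,500 clobs/s, and the sustained byte rate gives roughly 26.8 GB/day and 9.8 TB/year, which is consistent with the quoted ~25-30 GB/day and ~10 TB/year.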

Prior DB Investigations

• Oracle: See Alisdair’s slides.
• MySQL
  - Query problems, similar to Oracle DB
• HBase (2011-08)
  - Got stuck with Java client problems
  - Poor support from the community
• Cassandra (2011-10)
  - Keyspace / replicator issue resolved
  - Poor insert performance: only 270 inserts/minute (unclear what size)
  - Clients froze
• These experiments were done “only” with some help from archive operators,

Very Brief Introduction of MongoDB

• NoSQL and document-oriented.
• The storage format is BSON, a variation of JSON.
• Documents within a collection can differ in structure.
  - For monitor data we don’t really need this freedom.
• Other features: Sharding, Replication, Aggregation (Map/Reduce)

SQL        MongoDB
Database   Database
Table      Collection
Row        Document
Field      Field
Index      Index
Very Brief Introduction of MongoDB …

A document in MongoDB:

    {
        _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
        user_id: "abc123",
        age: 55,
        status: 'A'
    }

Schema Alternatives
1.) One MP value per doc

• One MP value per doc:

Schema Alternatives
2.) MP clob per doc

• A clob (~1 minute of flattened MP data):

Schema Alternatives

• One monitor point data structure per day
• Monthly database
• Shard key = antenna + MP, keeps matching docs on the same node.
• Updates of pre-allocated documents.
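To make the per-day, pre-allocated layout more concrete, here is a minimal sketch of how such a document and its updates might be built. The `metadata.*` field names and the `hourly.<hour>.<minute>.<second>` nesting follow the queries shown in these slides. The helper names, the `null` placeholder for unfilled slots and the unpadded numeric keys are assumptions, not the evaluated implementation.

```javascript
// Hypothetical helper: builds an empty per-day document for one monitor point.
// Every second of the day gets a reserved slot so later writes are in-place updates.
function preallocateDayDocument(antenna, component, monitorPoint, date) {
  const hourly = {};
  for (let h = 0; h < 24; h++) {
    hourly[h] = {};
    for (let m = 0; m < 60; m++) {
      hourly[h][m] = {};
      for (let s = 0; s < 60; s++) {
        hourly[h][m][s] = null; // slot reserved, filled in later by an update
      }
    }
  }
  return { metadata: { antenna, component, monitorPoint, date }, hourly };
}

// Hypothetical helper: the update document that stores one sample in its slot.
function buildSampleUpdate(hour, minute, second, value) {
  return { $set: { [`hourly.${hour}.${minute}.${second}`]: value } };
}

const doc = preallocateDayDocument("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-9-15");
const update = buildSampleUpdate(15, 29, 18, 1);
console.log(Object.keys(doc.hourly).length, JSON.stringify(update));
```

An update document of this shape could be passed to the collection's `update` method together with a filter on the `metadata` fields. Because the slot already exists, the document does not grow when a sample arrives.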

• Advantages of variant 3.):
  - Fewer documents within a collection
    - There will be ~150,000 documents per day.
    - The number of index entries will be lower as well.
  - No data fragmentation problem
  - Once a specific document has been identified through the index (O(log n)), access to a specific range or a single value can be done in O(1).
  - Smaller ratio of metadata / data
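The "fewer documents" argument can be put into numbers. The sketch below compares daily document counts for the three schema variants, using the 150,000 MPs and the one-clob-per-minute bucketing stated in the requirements. The variant 1 figure is only an upper bound and assumes, for illustration, that every MP is sampled once per second.

```javascript
// Daily document counts per schema variant (illustrative arithmetic only).
const monitorPoints = 150000;      // total MPs, from the requirements slide
const minutesPerDay = 24 * 60;
const secondsPerDay = 24 * 60 * 60;

const docsPerDay = {
  // Variant 1 upper bound, ASSUMING every MP were sampled once per second.
  oneValuePerDoc: monitorPoints * secondsPerDay,
  // Variant 2: one clob per MP per minute.
  clobPerDoc: monitorPoints * minutesPerDay,
  // Variant 3: one document per MP per day.
  dayPerDoc: monitorPoints,
};

console.log(docsPerDay);
```

Under these assumptions, variant 3 stores 1,440 times fewer documents per day than variant 2, and the index shrinks accordingly.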

What would a query look like?

• Query to retrieve a value with seconds-level granularity:
  - E.g., to get the value of FrontEnd/Cryostat/GATE_VALVE_STATE at 2012-09-15T15:29:18:

    db.monitorData_[MONTH].findOne(
        { "metadata.date": "2012-9-15",
          "metadata.monitorPoint": "GATE_VALVE_STATE",
          "metadata.antenna": "DV10",
          "metadata.component": "FrontEnd/Cryostat" },
        { 'hourly.15.29.18': 1 } );
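Client code would normally derive the filter and the projection path from a timestamp rather than write them by hand. The following sketch shows one way to do that. The helper name is hypothetical, and it assumes that timestamps are UTC and that dates and time components are stored without zero padding, as in the query above.

```javascript
// Hypothetical helper: turns a monitor point and a UTC timestamp into the
// filter and projection arguments of a findOne() call.
function buildPointQuery(antenna, component, monitorPoint, isoTimestamp) {
  const t = new Date(isoTimestamp);
  // The slide writes the date without zero padding ("2012-9-15").
  const date = `${t.getUTCFullYear()}-${t.getUTCMonth() + 1}-${t.getUTCDate()}`;
  const path = `hourly.${t.getUTCHours()}.${t.getUTCMinutes()}.${t.getUTCSeconds()}`;
  return {
    filter: {
      "metadata.date": date,
      "metadata.monitorPoint": monitorPoint,
      "metadata.antenna": antenna,
      "metadata.component": component,
    },
    projection: { [path]: 1 },
  };
}

const q = buildPointQuery("DV10", "FrontEnd/Cryostat", "GATE_VALVE_STATE", "2012-09-15T15:29:18Z");
console.log(JSON.stringify(q.filter), JSON.stringify(q.projection));
```

The two returned objects correspond to the first and second argument of `findOne`.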

What would a query look like?

• Query to retrieve a range of values:
  - E.g., to get the values of FrontEnd/Cryostat/GATE_VALVE_STATE during minute 29 (at 2012-09-15T15:29):

    db.monitorData_[MONTH].findOne(
        { "metadata.date": "2012-9-15",
          "metadata.monitorPoint": "GATE_VALVE_STATE",
          "metadata.antenna": "DV10",
          "metadata.component": "FrontEnd/Cryostat" },
        { 'hourly.15.29': 1 } );
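A projection on `hourly.15.29` returns the whole minute as a nested sub-document, so the caller still has to flatten it. The sketch below shows one possible way. The helper name and the sample values are invented for illustration, and it assumes that unfilled slots hold `null`.

```javascript
// Hypothetical helper: flattens the minute sub-document returned by a
// { 'hourly.<h>.<m>': 1 } projection into [second, value] pairs,
// skipping slots that hold no value.
function extractMinute(doc, hour, minute) {
  const hourDoc = doc.hourly || {};
  const minuteDoc = (hourDoc[hour] || {})[minute] || {};
  return Object.keys(minuteDoc)
    .map(Number)
    .sort((a, b) => a - b)
    .filter((s) => minuteDoc[s] !== null && minuteDoc[s] !== undefined)
    .map((s) => [s, minuteDoc[s]]);
}

// Example result shape (values invented for illustration).
const result = { hourly: { 15: { 29: { 0: 1, 1: 1, 18: 0, 30: null } } } };
const samples = extractMinute(result, 15, 29);
console.log(JSON.stringify(samples));
```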

Indexes

• A typical query is restricted by:
  - Antenna name
  - Component name
  - Monitor point
  - Date

    db.monitorData_[MONTH].ensureIndex(
        { "metadata.antenna": 1,
          "metadata.component": 1,
          "metadata.monitorPoint": 1,
          "metadata.date": 1 } );
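The key order of a compound index matters: MongoDB can use such an index for queries that constrain a leading prefix of its keys. The sketch below is a deliberately simplified model of that rule, not MongoDB's query planner, and the helper name is hypothetical. Note also that newer MongoDB releases replace `ensureIndex` with `createIndex`, which takes the same key specification.

```javascript
// Index key order as given on the slide.
const indexSpec = {
  "metadata.antenna": 1,
  "metadata.component": 1,
  "metadata.monitorPoint": 1,
  "metadata.date": 1,
};

// Simplified model (NOT the real query planner): counts how many leading
// index keys are constrained by the query filter.
function matchedPrefixLength(spec, filter) {
  let n = 0;
  for (const key of Object.keys(spec)) {
    if (!(key in filter)) break;
    n++;
  }
  return n;
}

const fullFilter = {
  "metadata.antenna": "DV10",
  "metadata.component": "FrontEnd/Cryostat",
  "metadata.monitorPoint": "GATE_VALVE_STATE",
  "metadata.date": "2012-9-15",
};
const dateOnly = { "metadata.date": "2012-9-15" };

console.log(matchedPrefixLength(indexSpec, fullFilter), matchedPrefixLength(indexSpec, dateOnly));
```

In this model the typical query matches all four keys, while a query by date alone matches no leading key and would need a different index.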

• A cluster of two nodes was created:
  - CPU: Intel Xeon quad-core X5410
  - RAM: 16 GByte
  - Swap: 16 GByte
• OS:
  - RHEL 6.0
  - Kernel 2.6.32-279.14.1.el6.x86_64
• MongoDB:
  - v2.2.1
• Real data from Sep-Nov 2012 was used initially, but:
• A tool to generate random data was implemented:
  - Month: 1 (February)
  - Number of days: 11
  - Number of antennas: 70
  - Number of components per antenna: 41
  - Monitoring points per component: 35
  - Total daily documents: 100,450
  - Total number of documents: 1,104,950
  - Average size per document: 1.3 MB
  - Size of the collection: 1,375.23 GB
  - Total index size: 193 MB
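The document counts follow directly from the generator parameters, which the short sketch below reproduces. The variable names are illustrative, and the average size is derived under the assumption that 1 GB means 1024 MB.

```javascript
// Generator parameters, from the list above.
const antennas = 70;
const componentsPerAntenna = 41;
const pointsPerComponent = 35;
const days = 11;

const docsPerDay = antennas * componentsPerAntenna * pointsPerComponent;
const totalDocs = docsPerDay * days;

// Average document size implied by the reported collection size
// (1,375.23 GB; 1 GB = 1024 MB is an assumption).
const avgDocMB = (1375.23 * 1024) / totalDocs;

console.log(docsPerDay, totalDocs, avgDocMB.toFixed(2));
```

This gives 100,450 documents per day and 1,104,950 in total. The implied average of about 1.27 MB per document agrees with the reported 1.3 MB.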

Schema 1: One Sample of Monitoring Data per Document

• For more tests, see https://adcwiki.alma.cl/bin/view/Software/HighVolumeDataTestingUsingMongoDB

• Test the performance of aggregations and combined queries.
• Use Map/Reduce to create statistics (max, min, avg, etc.) over ranges of data, to improve the performance of queries like:
  - E.g., search for monitoring points whose values are >= 10
• Test performance with a year's worth of data.
• Run stress tests with a large number of concurrent queries.
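As an illustration of the Map/Reduce idea, the sketch below computes min, max and average per monitor point from per-day documents. It assumes the `hourly.<h>.<m>.<s>` layout and `null` for unfilled slots. The function names, the tiny in-memory `emit` stand-in and the sample documents are invented for this example. It is written in modern JavaScript, and a server of the 2.2 era would likely need older syntax.

```javascript
// Map: called with `this` bound to one per-day document; emits one partial
// statistics record per document, keyed by monitor point.
function mapStats() {
  let min = Infinity, max = -Infinity, sum = 0, count = 0;
  for (const h in this.hourly) {
    for (const m in this.hourly[h]) {
      for (const s in this.hourly[h][m]) {
        const v = this.hourly[h][m][s];
        if (v === null) continue; // unfilled pre-allocated slot
        if (v < min) min = v;
        if (v > max) max = v;
        sum += v;
        count++;
      }
    }
  }
  if (count > 0) emit(this.metadata.monitorPoint, { min, max, sum, count });
}

// Reduce: merges partial records. The output has the same shape as the
// inputs, so it can safely be re-applied to its own results.
function reduceStats(key, values) {
  return values.reduce((a, b) => ({
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
    sum: a.sum + b.sum,
    count: a.count + b.count,
  }));
}

// Finalize: derives the average once all partial records are merged.
function finalizeStats(key, r) {
  return { min: r.min, max: r.max, avg: r.sum / r.count };
}

// Tiny in-memory stand-in for the server, for illustration only.
const emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

const docs = [
  { metadata: { monitorPoint: "TEMP" }, hourly: { 0: { 0: { 0: 4, 1: 10 } } } },
  { metadata: { monitorPoint: "TEMP" }, hourly: { 0: { 0: { 0: 16, 1: null } } } },
];
docs.forEach((d) => mapStats.call(d));

const stats = {};
for (const k in emitted) stats[k] = finalizeStats(k, reduceStats(k, emitted[k]));
console.log(JSON.stringify(stats));
```

With such precomputed statistics, a search for monitoring points whose values are >= 10 could first filter on the stored `max` and only then touch the raw samples.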

• MongoDB is suitable as an alternative for permanent storage of monitoring data.
• The tests reported an ingestion rate of 25,000 clobs/s.
• The schema and the indexes are fundamental to achieving millisecond-level response times.

• What are the requirements going to be like?
  - Only extraction by time interval and offline processing?
  - Or also “data mining” running on the DB?
  - All queries ad hoc and responsive, or also batch jobs?
  - Repair / flagging of bad data? Later reduction of redundancies?
• Can we hide the MP-to-document mapping from upserts/queries?
  - Currently, queries have to patch together results at the 24-hour and monthly breaks.
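One way to hide the document mapping from callers would be a thin layer that splits a requested time interval into one sub-query per day and picks the matching monthly collection. The sketch below illustrates the idea only. The helper name is hypothetical, timestamps are assumed to be UTC, and the collection suffix format (year and month number) is an assumption, since the slides only show the placeholder `monitorData_[MONTH]`.

```javascript
// Hypothetical helper: splits a UTC interval [start, end) into per-day parts,
// so callers do not have to handle the 24-hour and monthly breaks themselves.
function splitByDay(startIso, endIso) {
  const end = new Date(endIso);
  const parts = [];
  let cursor = new Date(startIso);
  while (cursor < end) {
    // Midnight (UTC) following the cursor's day; Date.UTC rolls over months.
    const nextDay = new Date(
      Date.UTC(cursor.getUTCFullYear(), cursor.getUTCMonth(), cursor.getUTCDate() + 1)
    );
    const stop = nextDay < end ? nextDay : end;
    parts.push({
      // ASSUMED naming scheme for the monthly collection.
      collection: `monitorData_${cursor.getUTCFullYear()}_${cursor.getUTCMonth() + 1}`,
      date: `${cursor.getUTCFullYear()}-${cursor.getUTCMonth() + 1}-${cursor.getUTCDate()}`,
      from: cursor.toISOString(),
      to: stop.toISOString(), // exclusive upper bound
    });
    cursor = stop;
  }
  return parts;
}

const parts = splitByDay("2012-09-30T23:59:00Z", "2012-10-01T00:01:00Z");
console.log(JSON.stringify(parts, null, 2));
```

Each part names the collection and the `metadata.date` value to query. The caller would then concatenate the per-day results in order.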
