visual interface dependency or relationship model of systems workflow engine comparator function poller

Academic year: 2021


(1)

[Figure: component overview showing agents, discovery, poller, metric store, dependency or relationship model of systems, comparator function, workflow engine, alert / event, and visual interface.]

(2)

[Figure: model of system with a web interface; web application interface; polling/threshold/state system; event management system; http (REST) interfaces in front of multiple poller state repositories (time series data); email notification and ticket creation.]

(3)

[Figure: model of system (logical and physical connectivity represented as relationships), reaching multiple poller state repositories (time series data) through http (REST) interfaces. Entities in the system model have addressable references to pollers and state data.]

Entities in the model carry multiple relationships, including depends-on, contained-with, etc. Entities also hold references to anything that we're interested in storing externally; typically that data will either be gathered by a poller (pull data) or simply stored (push data). The polling system should be able to create, delete, and update pollers dynamically as desired. Templates of pollers for a common configuration may also be invoked and set up on an entity in the system model.

The system model would include things like an operating system image, a Java application, Apache, MySQL, a network interface, a disk file system, etc. Ideally the model should be discoverable and buildable automatically. The system model is expected to be reasonably complex, but not to have significant needs for scale. It is the object database of all "interesting" elements in our computing system that we're paying attention to, from operating system metrics to application performance. Since other components of our overall system are expected to need to scale horizontally, we reference those components through a REST-style interface.
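As a rough illustration of the model described above, the following Python sketch stores entities, named relationships, and addressable references. The class names, the `depends_on` helper, and the example URL are assumptions made for illustration; the notes do not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One "interesting" element of the system model (hypothetical shape)."""
    name: str
    kind: str  # e.g. "operating system image", "mysql", "network interface"
    # relationship name -> names of related entities
    relationships: dict[str, set[str]] = field(default_factory=dict)
    # addressable references to externally stored data (poller / state URLs)
    references: list[str] = field(default_factory=list)


class SystemModel:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def add(self, name: str, kind: str) -> Entity:
        entity = Entity(name, kind)
        self.entities[name] = entity
        return entity

    def relate(self, source: str, relationship: str, target: str) -> None:
        self.entities[source].relationships.setdefault(relationship, set()).add(target)

    def depends_on(self, name: str) -> set[str]:
        """Return everything `name` depends on, directly or transitively."""
        seen: set[str] = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for dep in self.entities[current].relationships.get("depends-on", set()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen


model = SystemModel()
model.add("web01", "operating system image")
model.add("apache", "apache")
model.add("mysql", "mysql")
model.relate("apache", "contained-with", "web01")
model.relate("apache", "depends-on", "mysql")
model.relate("mysql", "depends-on", "web01")
# Hypothetical REST address of a poller attached to this entity.
model.entities["apache"].references.append("http://poller-1.example/pollers/apache-status")
```

Keeping references as plain URLs means the model itself stays small while the pollers and state repositories it points at can be spread over many hosts.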

(4)

[Figure: a system object (application, service, etc.) and an agent inside an operating system (container), connected through http (REST) interfaces to a pass-through state repository (time series data) and to poller state repositories (time series data). Entities in the system model have addressable references to pollers and state data.]

A poller and its associated state data should be an addressable entity (REST interface) that is referenced and resolved from a system that manages the architecture of the environment being monitored. The pollers may also have identified thresholds or other means of generating events that are expected to flow external to this monitoring system.

The polling system is expected to exhaust internal resources at some point and therefore needs to be expandable to multiple instances that can all be equally referenced. A bonus would be to have the polling systems know about each other and be able to forward references/data as needed, migrating pollers and associated state repositories to balance load across multiple systems.

event flow

The pollers may have identified thresholds or other calculations which trigger events - or perhaps every request triggers an event. In either case, there will be an expected event stream leaving the polling systems. Potential protocols for sending include XMPP or open source message queues.

The pass-through mechanism is a special case enabling the basic storage of values/state data without having this polling system request it.
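A minimal sketch of how the pull path and the pass-through (push) path could share one storage and threshold routine. The `Monitor` class, the "greater than" threshold rule, and the event dictionary layout are all assumptions for illustration, not something these notes specify.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Monitor:
    name: str
    threshold: Optional[float] = None  # None means "store only, never alert"
    samples: list[tuple[float, float]] = field(default_factory=list)

    def record(self, value: float, timestamp: Optional[float] = None) -> Optional[dict]:
        """Store a sample; return an event dict if the threshold is exceeded."""
        ts = time.time() if timestamp is None else timestamp
        self.samples.append((ts, value))
        if self.threshold is not None and value > self.threshold:
            return {"monitor": self.name, "value": value, "timestamp": ts}
        return None


def poll(monitor: Monitor, check: Callable[[], float]) -> Optional[dict]:
    """Pull path: the polling system requests the value itself."""
    return monitor.record(check())


def push(monitor: Monitor, value: float) -> Optional[dict]:
    """Pass-through path: an external agent supplies the value."""
    return monitor.record(value)
```

Any non-`None` return value would then be handed to whatever transport carries the event stream out of the polling system.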

(5)

[Figure: timeline (TIME axis) of monitor requests ("mon req") passing between the state/persistence elements and the pollers.]

The poller needs to inquire about the state of all "monitors" that need to be fulfilled, a set which will grow over time (e.g. the expectation that you update a given monitor every 5 minutes).

When the poller is complete with a given "monitor" request, it should return the state to the persistence element.

If the poller goes offline, or dies, then any elements that it's working on won't get fulfilled, and the number of outstanding items to be updated will grow.

If we populate a queue with all elements to be updated, then we'll need to manage that queue so that we don't end up with duplicate elements in it in the event of poller failure. We need something more like a common "ticket board" than a queue structure.

Most queue systems are a push model: beanstalkd, celery, gearmand. What's needed for this use case is a pull model. Celery, although the most complex, comes closest to providing that basic mechanism.
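One way to picture the "ticket board" idea is to derive the pending list from monitor state on every request instead of enqueueing work items, so that a failed poller cannot leave duplicates behind. The sketch below is an in-memory illustration only; `TicketBoard`, `MonitorState`, and their methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorState:
    interval: float                       # seconds between expected updates
    last_updated: Optional[float] = None  # None until the first result arrives
    state: Optional[str] = None


class TicketBoard:
    """Pending work is derived from monitor state rather than enqueued."""

    def __init__(self) -> None:
        self.monitors: dict[str, MonitorState] = {}

    def register(self, monitor_id: str, interval: float) -> None:
        self.monitors[monitor_id] = MonitorState(interval)

    def pending(self, now: float) -> list[str]:
        """Monitors whose last update is older than their interval."""
        return sorted(
            monitor_id
            for monitor_id, m in self.monitors.items()
            if m.last_updated is None or now - m.last_updated >= m.interval
        )

    def complete(self, monitor_id: str, state: str, now: float) -> None:
        """Called by a poller when it returns a result to persistence."""
        m = self.monitors[monitor_id]
        m.state, m.last_updated = state, now
```

Because a monitor appears on the board at most once no matter how often `pending` is called, an overdue monitor simply stays visible until some poller completes it.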

(6)

[Figure: a poller with SNMP, JMX, HTTP, and script interfaces.]

A poller (monitor) is a block of code that runs concurrently with other components within the system, iterating on an expected cycle time. Common poller interfaces should be expected to include SNMP and JMX. Taking advantage of Nagios plugins as an implementation component would speed adoption and provide a quick set of flexible plugins.

The poller should ideally be able to invoke and run any arbitrary script (Ruby, Python, Perl, Bash, Java, etc.) in order to get its results.
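A small sketch of a poller step that runs an arbitrary external script and interprets it the way Nagios plugins are interpreted: exit code 0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN, with the first line of standard output as the status text. The `run_check` function and the inline stand-in plugin are illustrative assumptions.

```python
import subprocess
import sys

# Exit-code convention used by Nagios plugins.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command: list[str], timeout: float = 30.0) -> tuple[str, str]:
    """Run an arbitrary script and return (state, first line of output)."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The script could not be started or did not finish in time.
        return "UNKNOWN", str(exc)
    state = NAGIOS_STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.strip().splitlines()
    return state, lines[0] if lines else ""


# Stand-in for a real plugin: a one-line Python script that reports CRITICAL.
fake_plugin = [sys.executable, "-c", "print('DISK CRITICAL - 97% used'); raise SystemExit(2)"]
state, message = run_check(fake_plugin)
```

Since only the exit code and standard output matter, the same function works for scripts written in any language the host can execute.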

state repository (time series data)

The state repository should be a fast dumping ground for the results of the poller, whether a result is an enumerated state or a value. Values should be very quick to update and kept local to the poller to minimize latency and potential failure. The overall interface of the polling systems should be expected to create, delete, and modify aspects of the poller as well as request data from the state repository, potentially including a graph of the state. Being able to export state data in bulk for a predefined period of time would allow export into a batch-processing environment (e.g. Hadoop/Hive/Pig) for further correlation and detailing, and potential correlation with other data sources (logging). Otherwise a polling system should be treated as a sharded database of time series information, with the requesting system asking all relevant polling systems for what it needs to know and fusing the resulting data appropriately.
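The "sharded database" view can be sketched as a fan-out query followed by a merge on timestamp. In the sketch below each `Shard` is an in-memory stand-in for one polling system's repository; in practice each query would presumably be an HTTP request, and all names here are assumptions.

```python
import heapq
from typing import Iterable

Sample = tuple[float, float]  # (timestamp, value)


class Shard:
    """One polling system's local state repository (in-memory stand-in)."""

    def __init__(self) -> None:
        self._series: dict[str, list[Sample]] = {}

    def append(self, key: str, timestamp: float, value: float) -> None:
        self._series.setdefault(key, []).append((timestamp, value))

    def query(self, key: str, start: float, end: float) -> list[Sample]:
        """Samples for `key` with start <= timestamp <= end, sorted by time."""
        return sorted(s for s in self._series.get(key, []) if start <= s[0] <= end)


def fused_query(shards: Iterable[Shard], key: str, start: float, end: float) -> list[Sample]:
    """Ask every shard, then merge the sorted partial results by timestamp."""
    return list(heapq.merge(*(shard.query(key, start, end) for shard in shards)))
```

The same `query` call with a wide time window would also serve as a simple bulk export for batch processing.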

The REST interface for the polling subsystem should include a basic web interface sufficient to interact with the system through a web browser, including functionally testing the components.

(7)

The event management system is the least fleshed out, but it is where the most opportunity exists from a value standpoint, beyond convenience and scale. In existing solutions, the event management system is where events are displayed with some priority, typically tagged in various ways to be more useful. Events other than polling/threshold results can be processed, including system logging (syslog).

Many event management systems include a mechanism to create external notifications (send email, page, etc.) and potentially manage escalation related to that event. If the system were expected to handle escalations, the event management system would also need to manage event state.

Advanced monitoring would enable the triggering of a bit of specialized code (specific and customized to any given instance) that checks the state of other related components (using the system model and the state referenced from it) to produce more actionable events, including suppressing duplicate events and generating tickets in an existing ticketing system for resolution.

[Figure: event flow from the event management system out to email notifications, ticket creation, and an unspecified further destination ("?").]

The potential exists to have the event management system either ride atop or simply be a big-data style blackboard (NoSQL: HBase/Hadoop, Voldemort, etc.), enabling interesting correlations and system analysis capabilities with another component external to the event management system. For basic purposes, simple external notification and basic deduplication/alert suppression based on system dependencies would be sufficient.
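The basic case of deduplication plus dependency-based suppression might look like the sketch below. It assumes events are dictionaries with `entity` and `state` keys and that only direct depends-on relationships are consulted; both the event shape and the `actionable_events` name are illustrative assumptions.

```python
def actionable_events(events: list[dict], depends_on: dict[str, set[str]]) -> list[dict]:
    """Drop duplicate events and events explained by a failed dependency."""
    down = {e["entity"] for e in events if e["state"] == "down"}
    seen: set[tuple[str, str]] = set()
    result = []
    for event in events:
        key = (event["entity"], event["state"])
        if key in seen:
            continue  # deduplication: same entity, same state
        seen.add(key)
        if event["state"] == "down" and depends_on.get(event["entity"], set()) & down:
            continue  # suppression: a direct dependency is already known to be down
        result.append(event)
    return result


events = [
    {"entity": "apache", "state": "down"},
    {"entity": "mysql", "state": "down"},
    {"entity": "mysql", "state": "down"},
    {"entity": "disk", "state": "down"},
]
remaining = actionable_events(events, {"apache": {"mysql"}})
```

Here the Apache alert is suppressed because MySQL, which it depends on, is already down, and the repeated MySQL alert is collapsed into one.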

(8)

[Figure: three variants of the event pipeline. All share agent, poller, raw data, log, comparison, alert / event, and escalation engine. The second adds a function stage with flap detection and simple dependency analysis. The third adds a function stage with event correlation, dependency analysis, flap detection, event deduplication, and environment.]

(9)

[Figure: basic event pipeline with agent, poller, raw data, log, comparison, alert / event, and escalation engine.]

(10)

[Figure: event pipeline with agent, poller, raw data, log, comparison, alert / event, escalation engine, and a function stage providing flap detection and simple dependency analysis.]

(11)

[Figure: event pipeline with agent, poller, raw data, log, comparison, alert / event, escalation engine, and a function stage providing event correlation, dependency analysis, flap detection, event deduplication, and environment.]

(12)

[Figure: a Django application backed by MySQL uses cron to get the monitors needing to be polled and places polling work requests on a queue (gearmand / beanstalkd). Pollers return polling result data to an event processor, which records monitor result state, writes performance information and time series data to RRD files, and sends event messages (email, SMS, etc.).]

This arrangement adds a bit more complication (the work and result queues) but enables the poller code to live anywhere and stay agnostic. The benefit is that scaling out pollers onto other systems requires no state infrastructure, so the pollers can be completely agnostic and exist only when needed. The queue mechanism allows the state to be stored off separately and dealt with as desired.
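The work-queue and result-queue arrangement can be illustrated in a single process, with `queue.Queue` from the Python standard library standing in for an external queue server such as gearmand or beanstalkd. The `poller` function, the `checks` registry, and the monitor names are hypothetical.

```python
import queue
import threading


def poller(work: queue.Queue, results: queue.Queue, checks: dict) -> None:
    """Stateless worker: take a request, run the check, hand back the result."""
    while True:
        request = work.get()
        if request is None:  # sentinel: shut down
            work.task_done()
            break
        monitor_id, check_name = request
        results.put((monitor_id, checks[check_name]()))
        work.task_done()


checks = {"always_ok": lambda: "ok"}  # hypothetical check registry
work_q: queue.Queue = queue.Queue()
result_q: queue.Queue = queue.Queue()

workers = [threading.Thread(target=poller, args=(work_q, result_q, checks)) for _ in range(2)]
for w in workers:
    w.start()
for monitor_id in ("web01", "db01", "cache01"):
    work_q.put((monitor_id, "always_ok"))
for _ in workers:
    work_q.put(None)  # one shutdown sentinel per worker
for w in workers:
    w.join()

# Results arrive in no particular order, so sort them for a stable view.
collected = sorted(result_q.get_nowait() for _ in range(3))
```

The workers hold no state of their own, so adding or removing pollers is only a matter of starting or stopping consumers of the work queue.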

The core application

A few potential optimizations:

1) Enable work request queues for specific/custom pollers. The first to come immediately to mind would be something like "fping", which deals with ping results in a far more efficient manner, but would need a different input & output structure than your stock "monitor/poller".

2) The event processor also enables a modular interface that can be distributed and can use whatever data state mechanisms are effective. You could store time series data in RRD files, shove everything into HBase, or stash some components in one data store and others in another. The interface is also perfect for accepting the raw values and doing additional computational logic: either predictive functions based on earlier state, simple (stateless) thresholds, or even event correlation with the addition of an

(13)

[Figure: a Django project backed by MySQL uses cron to get the monitors needing to be polled. An event processor writes performance information and time series data to RRD files and sends event messages (email, SMS, etc.).]

Another take on the application structure: setting up Django "applications" (subcomponents) as the elements and combining them into a single virtual machine that exports all the relevant APIs as RESTful services.

The main components include:

* Work "queue" board - a view into the monitors and state stored within the application that shows what's pending an update. The queue board is set up something like an amazon SQS server or ghetto queue where an item can be reserved for something external to process, hiding it from view until an expiration period has elapsed.

* Poller - the API is a very lightweight control and information system for the status of the poller, the main task of which is offline processing outside of the scope of user requests. The poller will check the work-queue board for anything pending, pull down a set and process away, returning the results through the event processor. Note that the poller itself is an independent process from the Django application.

* Event processor - the API by which events returned through the work-queue board can interact logically. The event processor can be a worker process for an internal queue from the work-queue board to process the results as they're received by a poller.
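The reservation behaviour of the work "queue" board (an item is hidden while a poller holds it and reappears if the reservation expires) can be sketched as below. This is an in-memory illustration under assumed names (`WorkBoard`, `reserve`, `complete`); the notes envision it as a Django application exposed over REST.

```python
class WorkBoard:
    """Pending monitors can be reserved; reservations expire if not completed."""

    def __init__(self, reservation_seconds: float = 120.0) -> None:
        self.reservation_seconds = reservation_seconds
        self.pending: set[str] = set()
        self.reserved_until: dict[str, float] = {}

    def mark_pending(self, monitor_id: str) -> None:
        self.pending.add(monitor_id)  # a set, so re-marking never duplicates

    def reserve(self, now: float, limit: int = 10) -> list[str]:
        """Hand out up to `limit` visible items and hide them until expiry."""
        visible = sorted(
            m for m in self.pending if self.reserved_until.get(m, 0.0) <= now
        )[:limit]
        for monitor_id in visible:
            self.reserved_until[monitor_id] = now + self.reservation_seconds
        return visible

    def complete(self, monitor_id: str) -> None:
        """Called when the poller's result has reached the event processor."""
        self.pending.discard(monitor_id)
        self.reserved_until.pop(monitor_id, None)
```

If a poller dies after reserving an item, nothing has to be cleaned up: the item becomes visible again once its reservation lapses and another poller picks it up.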

[Figure: application components: cache (memcache), work "queue" board, poller control, event control/API.]
