3.4 Applications
3.4.3 Cloud Fusion Center
The Cloud Fusion Center (CFC) performs the fusion-level hypothesis test defined in (3.2) and administers the network. In devising a system to serve as a logically central location, we evaluated building our own server network, using virtualized remote servers, colocating servers, and building our application to run on Google App Engine. We chose App Engine as the platform for the CFC for three reasons: easy scalability, built-in data security, and ease of maintenance.
The App Engine platform is designed from the ground up to scale to an arbitrary number of clients. As we expect to grow our sensor network to a very high sensor density, this element of the platform’s design is very important. The scalability is easy to use because incoming requests are automatically load-balanced across instances that are created and destroyed according to current demand. This reduces algorithmic complexity, as the load balancing of the network is handled by the platform rather than by code written for our CSN system.
A second consideration in our selection was data security. With the other solutions we had available to us, if the data we collected was stored on the server network we were using, then, without redundant servers in separate geographies, we risked losing all of our data to the very earthquakes we hoped to record. App Engine solves this problem for us by automatically replicating datastore writes to geographically separate data centers as the writes are processed. This achieves the level of data redundancy and geographical separation we require, without forcing us to update our algorithms. Other network storage solutions would have been possible as well, but having it built into the platform meant that latency for code accessing the storage would be lower.
A final compelling reason to select App Engine was its ease of maintenance. Rather than spending time building server images and designing them to coordinate with each other, we were able to immediately begin working on the algorithms that were most important to us. Server maintenance, security, and monitoring are all handled by the App Engine team and do not take away from the time of the research team members.
App Engine also includes a number of other benefits we considered. First, it utilizes the same front ends that drive Google’s search platform, and, consequently, greatly reduces latency to the platform from any point in the world. Since we plan to expand this project beyond Southern California, this is very useful. Second, the platform supports live updates to running applications. Rather than devising server shutdown, update, and restart mechanisms as is commonly required, we can simply redeploy the application that serves our sensors and all new requests to the CFC will see the new code instead of the old code with no loss of availability.
These features do not come without a price, however. We discuss what we perceive as the two largest drawbacks of the platform: loading requests and design implications.
Loading Requests. Because App Engine dynamically scales the number of instances available to serve a given application as the volume of requests per unit time changes, it creates a phenomenon known as a loading request. The name reflects the fact that it is the first request to a new instance of the application. That is, when App Engine allocates a new instance to serve increasing traffic demands, it sends an initial live request to that instance. In Java, this results in the initialization of the Java Virtual Machine, including loading all of the appropriate libraries.
Over the last three months, we experienced loading requests with a median frequency of 9.52% of all requests. While this means that 90.48% of requests did not experience increased latency as a result of the platform, the remaining requests experienced a median increased processing duration of 5,400 ms. Because of the extreme penalty paid by loading requests, when examining average request duration, their presence dominates the figures. This results in a unique property of App Engine, which is that the system performs much better at higher request loads.
Fig. 3.7(a) shows that, as the request volume increases, the average duration of each request decreases. This is a result of the reduced impact of loading requests. This data leads to the conclusion that if we avoid potential bottleneck points such as datastore writes, we can expect our performance to stay the same or improve under any increased future load imposed on the system (e.g., as the number of sensors scales up).
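The way loading requests dominate the average can be illustrated with a rough back-of-the-envelope model. The sketch below is a toy model, not a fit to our measurements: it treats the reported 5,400 ms median penalty as a typical value, and it assumes a hypothetical 100 ms duration for ordinary requests and a hypothetical fixed number of instance starts per hour. The names `BASE_MS`, `loading_fraction`, and `mean_duration_ms` are illustrative only.

```python
LOADING_PENALTY_MS = 5400.0  # median extra duration of a loading request (from the text)
BASE_MS = 100.0              # hypothetical duration of an ordinary request (assumption)


def loading_fraction(requests_per_hour, instance_starts_per_hour):
    """Fraction of requests that land on a freshly started instance (toy model)."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return min(1.0, instance_starts_per_hour / requests_per_hour)


def mean_duration_ms(fraction):
    """Mean request duration when `fraction` of requests pay the loading penalty."""
    return BASE_MS + fraction * LOADING_PENALTY_MS


# With the reported 9.52% loading fraction, the penalty alone adds roughly
# 0.0952 * 5400 ms (about 514 ms) to the mean in this model.
print(round(mean_duration_ms(0.0952), 2))

# Holding instance starts fixed (5 per hour, an assumption), the mean falls
# as the hourly request volume grows.
for volume in (50, 100, 200, 350):
    print(volume, round(mean_duration_ms(loading_fraction(volume, 5)), 1))
```

Under these assumptions the mean decreases monotonically with request volume, which is qualitatively the trend in Fig. 3.7(a); the actual rate at which App Engine starts instances is not modeled here.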
Design Implications. An algorithm designed to run on App Engine has to fit within the constraints imposed by the architecture. There are a few factors to consider. First, as a result of the automatic scaling done by App Engine, every request to the system exists in isolation. That is, the running requests maintain no shared state, nor do they have any inter-process communication channels. Additionally, since there are no long-running background processes, maintaining any form of state generated as a result of successive calls is more difficult. In order to accurately ascertain the number of incoming picks in a unit time over a specified geography, we had to surmount these hurdles.
[Figure 3.7 panels: (a) Pick duration, plotting average pick duration (s) against number of pick requests in an hour; (b) Associator; (c) Android client]
Figure 3.7: (a) Average duration of a pick request as a function of system load. (b) Model of dispersed sensors using a hash to a uniform grid to establish proximity. (c) A picture of the CSN Android client in debug mode, capturing picks.
The only common read/write data sources provided are memcache (a fast key-value store) and the datastore. The datastore is a persistent object store used for permanent data archiving for future analysis or retrieval. Long-term state that changes infrequently, such as the number of active sensors in a given region, is stored and updated in the datastore but cached in memcache for quick access. Because the datastore is slower, particularly in aggregating writes for immediate retrieval by other processes, it is unsuitable for short-term state aggregation.
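The division of labor for long-term state can be sketched as a read-through cache. The example below is a minimal illustration only: plain dictionaries stand in for the datastore and memcache, and the function and key names (`get_active_sensors`, `set_active_sensors`, `active_sensors:<region>`) are hypothetical rather than taken from the CSN code or the App Engine API.

```python
# Plain dictionaries stand in for the two App Engine services (assumption).
datastore = {"active_sensors:region-a": 17}  # persistent, slower store
memcache = {}                                # volatile, fast cache


def _key(region):
    """Build the shared key used in both stores (hypothetical naming scheme)."""
    return "active_sensors:" + region


def get_active_sensors(region):
    """Return the sensor count, reading the datastore only on a cache miss."""
    key = _key(region)
    if key in memcache:
        return memcache[key]
    value = datastore.get(key, 0)  # fall back to the persistent copy
    memcache[key] = value          # populate the cache for later readers
    return value


def set_active_sensors(region, count):
    """Write the authoritative value to the datastore and refresh the cache."""
    key = _key(region)
    datastore[key] = count
    memcache[key] = count


print(get_active_sensors("region-a"))  # first call misses the cache
print(get_active_sensors("region-a"))  # second call is served from the cache
```

Because cache entries can be evicted at any time, readers in this sketch always fall back to the datastore, which remains the authoritative copy.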
Short-term state, such as the number of picks arriving in an interval of time in a particular region, is stored in memcache. Memcache is not persistent, since objects can be evicted from the cache under memory pressure, but operations that use it are much faster. It is therefore well suited to computations that need to occur quickly and, because an expiry time can be set on each value, to data whose usefulness expires after a period of time. That is, once enough time has passed since a pick arrived, the pick can no longer be used in detecting an event; therefore, its contribution to memcache can be safely expired.
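The role of expiry for short-term state can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the memcache API: the class `ExpiringStore`, its methods, and the explicit `now` argument (used here so the behavior is deterministic) are all hypothetical.

```python
class ExpiringStore:
    """In-memory stand-in for a cache whose values expire (illustrative only)."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at)

    def increment(self, key, now, ttl_seconds):
        """Add one to the counter at `key`, creating it with a fresh expiry if needed."""
        value, expires_at = self._items.get(key, (0, now + ttl_seconds))
        if expires_at <= now:
            # The old counter has expired, so start a new one.
            value, expires_at = 0, now + ttl_seconds
        value += 1
        self._items[key] = (value, expires_at)
        return value

    def get(self, key, now):
        """Return the live value at `key`, or None if it is missing or expired."""
        entry = self._items.get(key)
        if entry is None or entry[1] <= now:
            self._items.pop(key, None)
            return None
        return entry[0]


store = ExpiringStore()
store.increment("picks", now=0.0, ttl_seconds=10.0)
store.increment("picks", now=5.0, ttl_seconds=10.0)
print(store.get("picks", now=9.0))   # still live
print(store.get("picks", now=10.0))  # expired
```

In this sketch the expiry is fixed when the counter is first created, so a stale pick count disappears without any explicit cleanup step.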
Memcache operates as a key-value store, effectively a distributed hash table. In order to determine how many sensors sent picks in a given period of time, we devised a system of keys which could be predictably queried to ascertain the number of reporting sensors. We used a geography hashing scheme to ascribe an integer value to every latitude/longitude pair, which generates a uniform grid of cells whose size we can control, with each sensor fitting into one cell in the grid (see Fig. 3.7(b)). Incoming picks then update the key corresponding to a string representation of the geographical hash and a time bucket derived by rounding the arrival time of the pick to the nearest second.
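One way such a key scheme could look is sketched below. The exact hash used in the CSN system is not specified here, so this example assumes a grid that is uniform in degrees, with a hypothetical `cells_per_degree` parameter controlling the cell size; the function names and the `cell:bucket` key format are likewise illustrative.

```python
import math


def cell_hash(lat, lon, cells_per_degree=10):
    """Map a latitude/longitude pair to an integer grid-cell id (assumed scheme)."""
    cols = 360 * cells_per_degree
    row = math.floor((lat + 90.0) * cells_per_degree)
    col = math.floor((lon + 180.0) * cells_per_degree) % cols
    return row * cols + col


def time_bucket(arrival_time_s):
    """Round a pick arrival time (in seconds) to the nearest whole second."""
    return int(round(arrival_time_s))


def pick_key(lat, lon, arrival_time_s, cells_per_degree=10):
    """Build the cache key that a pick updates: '<cell id>:<time bucket>'."""
    return "%d:%d" % (cell_hash(lat, lon, cells_per_degree), time_bucket(arrival_time_s))


# Two nearby sensors reporting within the same second map to the same key,
# so independent requests can aggregate their picks under it.
print(pick_key(34.1377, -118.1253, 1000.2))
print(pick_key(34.1399, -118.1201, 999.8))
```

Because the key is a pure function of location and arrival time, any request can compute which keys to query for a given cell and interval without coordinating with other requests.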
In this manner, independent processes aggregate their state, and each process runs the hypothesis testing algorithm of Section 3.2 in the cell whose state it updated to determine the value of Ebt. If Ebt = 0, then no action needs to be taken. If Ebt = 1, a task queue task is launched to initiate the alert process; the task is named using the hash values that generated the alert. Each named task creates a ‘tombstone’ (a marker in the system) on execution which prevents additional tasks with the same name from being created, so even if successive picks also arrive at the Ebt = 1 conclusion, we know that only one alert will be sent out for a given set of inputs.
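The de-duplication provided by named tasks can be sketched with a small in-memory queue. This is an illustration of the idea only, not the App Engine Task Queue API: the class `NamedTaskQueue`, the helper `alert_task_name`, and the `alert-<cell>-<bucket>` naming format are hypothetical.

```python
class NamedTaskQueue:
    """In-memory stand-in for a task queue with named, tombstoned tasks."""

    def __init__(self):
        self._pending = {}        # name -> callable waiting to run
        self._tombstones = set()  # names of tasks that have already executed

    def add(self, name, task):
        """Enqueue `task` under `name`; refuse names that are pending or tombstoned."""
        if name in self._pending or name in self._tombstones:
            return False
        self._pending[name] = task
        return True

    def run_all(self):
        """Execute every pending task and leave a tombstone for its name."""
        for name in list(self._pending):
            task = self._pending.pop(name)
            task()
            self._tombstones.add(name)


def alert_task_name(cell_id, bucket):
    """Derive a task name from the hash values that triggered the alert (assumed format)."""
    return "alert-%d-%d" % (cell_id, bucket)


alerts = []
queue = NamedTaskQueue()
name = alert_task_name(4468218, 1000)  # example values, chosen arbitrarily
queue.add(name, lambda: alerts.append(name))  # first positive decision
queue.add(name, lambda: alerts.append(name))  # duplicate, rejected
queue.run_all()
queue.add(name, lambda: alerts.append(name))  # rejected by the tombstone
queue.run_all()
print(len(alerts))
```

Naming the task after its inputs makes the enqueue operation idempotent, so independent requests that reach the same positive decision do not need to coordinate before launching the alert.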