Classification Based IP Geolocation Approach to Locate Data in the Cloud Datacenters

Biswajit Biswal College of Engineering Tennessee State University,

Nashville, TN, USA Email: [email protected]

Sachin Shetty College of Engineering Tennessee State University,

Nashville, TN, USA Email: [email protected]

Tamara Rogers Computer Science Tennessee State University,

Nashville, TN, USA Email: [email protected]

ABSTRACT

Cloud subscribers would like to verify the location of outsourced data in cloud datacenters to ensure that the availability of the data satisfies the Service Level Agreement. Cloud users may lose access to their outsourced data in the event of operational failures in datacenters, natural disasters, or power outages. Recently, IP geolocation techniques have been proposed to locate data files in cloud datacenters. However, these techniques exploit only the relationship between Internet delay and distance and are not extensible to other network measurements, which could be used along with Internet delay to improve accuracy. Also, most of the existing techniques have been validated with only one cloud provider (Amazon Web Services). In this paper, we propose a classification based IP geolocation algorithm, which incorporates multiple network measurements to improve the accuracy of geolocating data files in the datacenters of four commercial cloud providers. To demonstrate the accuracy of our approach, we evaluate its performance on Amazon Web Services, Microsoft Azure, Google App Engine and Rackspace. Our experimental results demonstrate that our approach geolocates data files accurately and close to their true location.

I INTRODUCTION

Cloud data owners wish to audit how their data is being handled in the cloud and, in particular, to ensure that their data is available at all times without being affected by cloud outages. No formal auditing tool is available to cloud users to verify the location of data in cloud datacenters. Recently, there have been numerous instances in which a cloud outage rendered business services inaccessible to cloud users for a significant period of time. For instance, on Dec 24, 2012, Netflix services, hosted on Amazon Web Services (AWS), were unavailable to customers for more than 12 hours [14]. Similar data outages were reported by Dropbox users [15] and Xbox Live users [16]. Cloud users' files on Dropbox are stored on Amazon's Simple Storage Service (S3) in multiple datacenters located across the United States. Xbox Live users' data and saved games are accessible through the Windows Azure storage service. Hence, there is an increasing interest among cloud users in an auditing tool that geolocates cloud data to ascertain the availability of outsourced data.

Typically, cloud users specify the QoS requirements of their outsourced data in a Service Level Agreement (SLA). In addition to common QoS requirements (bandwidth, delay, etc.), cloud SLAs also specify the geographic region of a service at various granularities (county, city, state, country). Cloud users can benefit from an auditing tool that ensures that the location restrictions on the services are not violated. Availability of outsourced data is one of the critical QoS requirements. The auditing tool can verify whether the availability of data is in accordance with the SLA, especially in situations like natural disasters and/or power outages. Moreover, most cloud datacenters are located in cities that are vulnerable to natural disasters. Availability can be verified by geolocating cloud user data in datacenters. Recently, a few approaches have been proposed for geolocation of cloud data. Gondree et al. [4] proposed the constraint-based data geolocation (CBDG) approach to geolocate data hosted on Amazon S3. Benson et al. [1] developed a model to verify how many copies of a user's data are present in cloud datacenters. CBDG extends the CBG geolocation technique, whose average accuracy is lower than that of the Enhanced Classifier approach [2]. Benson's model does not focus on geolocating data in multiple cloud datacenters and has been verified on only one AWS service.

In this paper, we present an approach based on a machine learning framework, which improves the average accuracy of geolocating datacenters as compared to prior measurement-based approaches. We verify the accuracy of the approach by evaluating the geolocation results of cloud user data for multiple services in datacenters across multiple commercial cloud providers. We extend our enhanced classifier [2], called Classification Based IP Geolocation (CBIG) and originally designed to geolocate Internet nodes, to geolocate cloud data. To the best of our knowledge, this is the first effort to verify the location of cloud data for storage and web services on multiple cloud datacenters: AWS [6], Microsoft Azure [7], Rackspace [9] and Google AppEngine [8].

The rest of the paper is organized as follows. In Section II we provide an overview of the state-of-the-art active cloud data geolocation schemes. In Section III we present our approach based on CBIG. In Section IV we present the experimental setup. In Section V we present the experimental results of verifying our approach on multiple cloud datacenters. We conclude in Section VI.

II BACKGROUND AND RELATED WORK

Several geolocation schemes have been studied for finding the location of a given IP address at various geographic granularities, such as latitude and longitude, street address, zipcode, and city or county. IP geolocation has been approached through either database queries (the passive method) or active network measurements. There are public databases such as ARIN [24] and proprietary databases such as MaxMind [25] and Neustar (formerly Quova) [26]. Information obtained from database queries is sometimes inaccurate and may not be suitable for security-sensitive applications.

Active network measurements have been adopted in many research geolocation algorithms. The topology-based geolocation scheme proposed by E. Katz-Bassett et al. uses traceroute measurements (i.e., hops and RTT between hops along the path to the destination). The accuracy of this solution is not reliable due to network dynamics. More accurate, real-time delay-based geolocation algorithms have been studied, such as CBG, Statistical and Learning-based. CBG [21] uses known landmarks to measure latency to a target, a bestline method to convert latency to distance, and a multilateration technique to geolocate the target. However, CBG has geolocation errors when the monitoring landmarks are far from the target. This method also fails to consider the statistical variations in network delay measurements. Statistical geolocation [20] employs a joint probability density function of network latency and geographic distance to geolocate the target. Learning-based geolocation [3] uses network delay and hop measurements as input to a Naive Bayes machine learning framework to geolocate Internet hosts. All the delay-based geolocation schemes assume that network delay is well correlated with geographic distance, but this assumption is violated when network traffic takes an indirect route between hosts, which results in low accuracy. Improved accuracy is obtained by combining delay with other network measurements.

Recently reported cloud outages affecting the availability of cloud services have increased the interest in cloud data file geolocation. For example, in the past year, there have been instances when cloud users were not able to access services offered by the online movie streaming provider Netflix and the storage provider Dropbox. Similar outages have been reported by several small and large businesses that use the cloud to offer services to their customers; these businesses were compensated by the cloud providers. K. Benson et al. [1] proposed a method to verify how many copies of data files are stored in the cloud. They observed that there exists a positive linear relationship between latency and the distance from a computer node to the location of the actual datacenter, whereas no positive linear relationship exists with other locations. In their model, PlanetLab [5] nodes are forced to download data from a particular cloud datacenter located in a city by changing the DNS server on all PlanetLab nodes to that of the city. Their prediction of the storage location is best if the distances between PlanetLab nodes and cloud datacenters are under 500 km (about 311 mi). However, to detect a new location of the data, the distance between PlanetLab nodes and the data origin should be more than 3000 km. Their focus is to establish diversity rather than the exact locations of the data.

M. Gondree et al. [4] proposed a framework based on the Constraint-Based Geolocation (CBG) algorithm to geolocate cloud data storage. Their framework uses a combination of a latency-based geolocation technique and a probabilistic proof of data possession (PDP) scheme, with an average error distance of 626 km. Unfortunately, the IP geolocation algorithms reported in the literature have larger average error distances than the CBIG approach [2]. Albeshri et al. [10] proposed GeoProof, a method for geolocation assurance. GeoProof combines a proof of storage (POS) scheme and a distance-bounding protocol [11]. In their architecture, a GPS-enabled device is placed at the provider to supply position information, but the idea is not feasible when data moves, and the position information can be easily manipulated. Moreover, this scheme was evaluated on a university network rather than a real-world cloud environment. Watson et al. [12] presented Location Based Storage (LoST) to geolocate files in the cloud. Their method includes a geolocation scheme and a proof of location scheme. The geolocation scheme is a combination of a time-distance function, which translates times to distances using best-fit lines, and geometric trilateration. The average geolocation error distance, however, is 1100 km.

None of the papers discussed above mention geolocating web services other than Amazon S3. Most of the papers report geolocation results on Amazon S3 only and never evaluated their geolocation scheme with other cloud providers.

Paladi et al. [19] presented a survey of IP geolocation techniques used for locating cloud storage nodes. Measurement-based and topology-based IP geolocation algorithms have been proposed to geolocate Internet nodes [20–23].

III APPROACH

We extend our CBIG approach [2] to geolocate data files in cloud datacenters for various cloud services. The CBIG approach trains a Naive Bayes classifier that uses network measurements as features to geolocate Internet nodes. We extend this approach by evaluating the accuracy of the classifier on network measurement features (average delay, standard deviation of delay, mode and median of delay, hop count) and a societal characteristic feature (population density) collected between cloud users (PlanetLab nodes) and datacenters of four commercial cloud providers. Fig. 1 illustrates the system model, which describes how cloud users access various services in multiple datacenters managed by cloud providers. Each cloud user uses accounts with multiple providers to upload data and access available services. In our system model, we have restricted the services to storage and content delivery network (CDN). PlanetLab nodes, distributed across multiple locations in the United States, act as landmarks and are involved in the collection of network measurements. To geolocate the storage service, network measurements between PlanetLab nodes and cloud storage datacenters are collected. To geolocate the CDN service, network measurements between PlanetLab nodes and cloud CDN servers are collected. The CBIG model is trained and tested with the network measurement data. The classifier's predicted city is compared with the true cloud datacenter location, and the error distance is calculated to verify the accuracy. We now present an overview of CBIG and its extension to geolocate data files in cloud datacenters.

1 OVERVIEW OF CLASSIFICATION BASED IP GEOLOCATION

The CBIG approach adopts a machine learning based technique by training a model using network measurements and societal characteristic features. The network measurements are collected between landmark and target nodes. The landmark nodes are selected PlanetLab computers in the United States. The target nodes are Internet routers discovered by conducting a mesh traceroute between groups of PlanetLab computers. CBIG employs a Naive Bayes framework to convert network measurements between the landmark and target nodes to distances.

Let the network measurements from the landmarks to a single target be the set $M = \{m_1, m_2, \ldots, m_J\}$, where $m_j = \{m_{jk}\}$ and $k = 1, 2, \ldots, 6$, so that the total number of measurements to the target IP address is $6J$. A set of non-negative estimation weights $\{\lambda_k\}$ is introduced, one per feature, to reflect the importance and contribution of that feature in the overall classification process [3]. Another set of estimation weights $\{\gamma_k\}$ is introduced to order the landmarks: a landmark with the smallest feature measurement values weighs most and informs the classifier more accurately than a landmark with the largest values. The weight parameter values are chosen by least-squares parameter estimation, which minimizes the sum of squared distance errors between the known locations of the training-set IP addresses and their estimated locations. Using the Naive Bayes framework, CBIG estimates the county ($\hat{c}$) of the target IP address as:

$$
\hat{c} = \arg\max_{c \in C} \left( \sum_{k=1}^{5} \lambda_k \sum_{j=1}^{J} \exp(-j \gamma_k) \log p(m_{jk}) + \lambda_6 \log p(c) \right)
$$

where J is the total number of measurements from landmarks to the target IP.
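As a hedged illustration of the decision rule above, the following Python sketch scores each candidate county and returns the one with the highest score. It assumes that p(m_jk) is read as the likelihood of the measurement given the candidate county (the usual Naive Bayes reading), that landmarks are already ordered so that j = 1 is the most informative, and that log-likelihoods and log-priors are supplied by the caller. The names `cbig_score` and `estimate_county` are hypothetical and do not come from the CBIG implementation.

```python
import math


def cbig_score(log_lik, log_prior, lam, gamma):
    """Score one candidate county with the weighted Naive Bayes rule.

    log_lik[j-1][k-1] holds log p(m_jk) for landmark j = 1..J and
    feature k = 1..5 (assumed to be conditional on the candidate county).
    log_prior is log p(c); lam has six weights, gamma has five.
    """
    score = lam[5] * log_prior  # lambda_6 * log p(c)
    for k in range(5):
        inner = 0.0
        for j, row in enumerate(log_lik, start=1):
            # exp(-j * gamma_k) down-weights landmarks later in the ordering
            inner += math.exp(-j * gamma[k]) * row[k]
        score += lam[k] * inner
    return score


def estimate_county(candidates, lam, gamma):
    """candidates maps county -> (log_lik, log_prior); returns the arg max."""
    return max(candidates, key=lambda c: cbig_score(*candidates[c], lam, gamma))
```

With all gamma values set to zero every landmark contributes equally, which makes the role of the ordering weights easy to see when experimenting with the sketch.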

2 CBIG TO GEOLOCATE CLOUD USER DATA IN DATA CENTERS

We present the implementation of the CBIG approach to geolocate cloud user data as the following steps.


Figure 1: Experimental setup

• Setup Cloud User Accounts: Multiple cloud user accounts were set up for two categories of services (storage and web services) with four cloud providers (AWS, Google AppEngine, Rackspace and Microsoft Azure). The storage service on each account allowed us to upload user data files of various sizes. Similarly, the web services allowed hosting web pages on each of the four accounts.

• Cloud Users: PlanetLab nodes in various cities in the United States were selected, and automated scripts were set up on them to download web pages and user data files from the four cloud providers. The scripts were set up to mimic the activity of cloud users who wish to access the two cloud services from each of the four cloud provider accounts. The scripts accessed the cloud services at random points in time throughout each day for 2 months.

• Landmark nodes: PlanetLab servers were chosen to serve as landmark nodes. We chose responsive PlanetLab nodes, since the landmark nodes need to be stationary and reachable from each other.

• Cloud Users to Datacenter Network Measurements: Network measurements between cloud users and cloud datacenters are generated when each cloud user downloads web pages or user data files while accessing either of the two cloud services. As the cloud user accesses the cloud services, we record the Internet Protocol (IP) addresses, average latency and hop count. The IP address changes on every subsequent download request from the cloud user.

• Landmark nodes to Datacenter Measurements: The datacenter IP addresses collected from the download activities of cloud users were used to collect similar network measurements between landmark nodes and datacenters.

• Training dataset: We identify the network characteristic features (average, standard deviation, median and mode of network delay, and hop count) and the societal characteristic feature (population density) from the MaxMind database. The training dataset is extracted from network measurements between landmark nodes and Internet routers, which were discovered by performing a full mesh traceroute between groups of PlanetLab nodes in the United States. As the locations of the Internet routers encompass all the cities where datacenters of the cloud providers reside, our training dataset is sufficient to estimate the location of user data in cloud datacenters.

• Classifier Training: The CBIG approach based on the Naive Bayes framework is implemented as described in Section III.1, and the classifier is trained using the training set generated in the previous step.

• Classifier Evaluation: Finally, we evaluate the accuracy of the classifier by using a 5-fold cross validation approach.
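The 5-fold cross validation in the final step can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper: the functions `k_fold_indices` and `cross_validate` are hypothetical, the shuffling seed is arbitrary, and `fit` and `predict` stand in for whatever training and prediction routines the classifier provides.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def cross_validate(samples, labels, fit, predict, k=5):
    """Return the mean accuracy over k folds (assumes len(samples) >= k)."""
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k
```

Each sample is used for testing in exactly one fold and for training in the other four, so the averaged accuracy is not biased by a single train/test split.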

IV EXPERIMENTAL SETUP

To assess the accuracy of the CBIG approach, we evaluate the classifier in a real-world environment involving commercial cloud providers (Amazon Web Services [6], Microsoft Azure [7], Google App Engine [8] and Rackspace [9]) and PlanetLab [5]. In our experiments, we only considered CDN services on AWS and Azure and storage services on AWS, Rackspace, and Google App Engine.

1 CLOUD SERVICE PROVIDERS

Cloud users have several commercial cloud providers to choose from depending on the type of service. In our experimental setup, we chose four popular commercial cloud providers: Amazon Web Services, Rackspace, Google App Engine and Microsoft Azure. To emulate various cloud services, we created cloud data stores on each of the cloud providers, uploaded data files of different sizes, and used content delivery networks to host data over the Internet.

2 LANDMARK NODES

To evaluate the accuracy of our approach from geographically disparate and diverse locations, we download cloud data files from reliable and suitable landmarks. We chose the landmark nodes from PlanetLab, a group of computers available in different parts of the world for computer networking and distributed systems research. PlanetLab is a live test-bed and currently has 1172 nodes available worldwide [5]. There are 450 PlanetLab nodes available in the United States. Out of the 450 nodes, we used 191 alive and responsive nodes as landmarks. We tested the liveness and responsiveness of the landmarks based on ICMP echo responses.

3 CLOUD DATACENTER IP DISCOVERY

Datacenter IP addresses are critical to our approach because all network measurements (i.e., those used as features) are carried out between PlanetLab nodes and datacenter IPs. We gather unique datacenter IP addresses by repeatedly downloading data files from the cloud datacenters. For example, one of the website URLs provided by the Amazon CDN service was d361j3gdmlvl6k.cloudfront.net. Every web page download from the website resulted in a different IP address associated with the URL. Datacenter location information can be obtained by sending ICMP echo (ping) requests to a datacenter IP. The reply reported by ping includes the server name, which carries the datacenter location: it contains the airport code of the city where the datacenter is located. For instance, a reply from Microsoft Azure has the server name cds119.iad9.msecn.net (65.54.81.122), where iad is the airport in Dulles, Virginia, and another from Amazon AWS has server-54-240-190-150.jfk6.r.cloudfront.net (54.240.190.150), where jfk is in New York, NY. To verify that the location information is correct, we not only employed domain name resolution [1] but also checked whether the provider has a datacenter in the same city.
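The airport-code lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `datacenter_city` and the table `AIRPORT_CITY` are hypothetical, the table contains only the two codes named in the text, and the rule "three letters optionally followed by digits" is inferred from the two example host names rather than from any provider documentation.

```python
import re

# Illustrative lookup table; only the two codes mentioned in the text.
AIRPORT_CITY = {"iad": "Dulles, Virginia", "jfk": "New York, NY"}


def datacenter_city(hostname, table=AIRPORT_CITY):
    """Return (airport_code, city) for the first DNS label that is a known
    airport code optionally followed by digits, or None if nothing matches."""
    for label in hostname.lower().split("."):
        match = re.fullmatch(r"([a-z]{3})\d*", label)
        if match and match.group(1) in table:
            return match.group(1), table[match.group(1)]
    return None
```

Checking candidate labels against a table of known codes avoids false matches on labels such as `cds119`, which has the same letters-then-digits shape but is not an airport code.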

4 GENERATING TRAINING DATASET

To generate the training dataset, we collect instantaneous delay and hop count measurements from each of the 23,843 routers to the 67 landmarks [2]. For the instantaneous delay data, we send 40 ICMP echo requests from each landmark to all the routers. Based on the instantaneous delay measurements, we calculate the average, standard deviation, mode and median of delay for each router from each landmark, which results in 67 × 23,843 × 4 = 6,389,924 measurements. Using traceroute to collect hop count measurements causes excessive overhead on the core routers. To avoid this overhead, we send a single ICMP echo request from each landmark to all targets. We then use the reply to this request to calculate the hop count of the reverse path [3, 18].
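The feature computation above can be sketched as follows. The sketch makes several assumptions that the text does not state: the sample (not population) standard deviation is used, delays are given in milliseconds, and the reverse-path hop count is inferred by taking the smallest common initial TTL that is not below the TTL observed in the echo reply, in the spirit of hop-count inference [18]. The set of initial TTL values and the function names are illustrative.

```python
import statistics


def delay_features(delays_ms):
    """Summarize the delay samples from one landmark to one router."""
    return {
        "avg": statistics.mean(delays_ms),
        "std": statistics.stdev(delays_ms),  # sample standard deviation (assumption)
        "mode": statistics.mode(delays_ms),
        "median": statistics.median(delays_ms),
    }


def reverse_hop_count(received_ttl, initial_ttls=(32, 64, 128, 255)):
    """Infer reverse-path hops from the TTL seen in a single echo reply.

    The sender's initial TTL is guessed as the smallest common default
    that is >= the received TTL (an assumed, simplified candidate set).
    """
    initial = min(t for t in initial_ttls if t >= received_ttl)
    return initial - received_ttl


# Size of the delay-feature set quoted in the text:
# 67 landmarks x 23,843 routers x 4 delay statistics.
TOTAL_DELAY_FEATURES = 67 * 23_843 * 4
```

For example, a reply arriving with TTL 52 is assumed to have started at 64, which gives 12 hops on the reverse path.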

V EXPERIMENTAL RESULTS

The trained CBIG was evaluated on test datasets generated from measurements obtained from different cloud providers. Prior knowledge of datacenter locations helps to verify more accurately the predicted location of the test cloud datacenter from which a test dataset (network measurements) is obtained. To show our model's flexibility, we extend our verification process to two cloud services: storage and content delivery network (CDN).

Figure 2: Geolocation accuracy of cloud CDN Servers. (a) Amazon Web Services and (b) Microsoft Azure. Each panel plots cumulative probability against error distance (mi) for the feature sets Avg and Avg,Std,Hop.

Figure 3: Geolocation accuracy of cloud Storage. (a) Amazon S3, (b) Rackspace and (c) AppEngine. Each panel plots cumulative probability against error distance (mi) for the feature sets Avg and Avg,Std,Hop.

1 GEOLOCATING CLOUD CDN SERVERS

Content delivery network (CDN) servers are usually a large distributed infrastructure deployed in multiple cloud datacenters. Cloud content delivery networks such as Amazon's CloudFront and Azure's CDN are used to distribute content (data) across the Internet. In a cloud CDN service, the client's data is distributed across the servers in the cloud network. As the locations of these servers span the world, there is a possibility of cloud users' data being stored in cities that are more prone to natural disasters and power outages.

Fig. 2 illustrates the cumulative distribution of the error distance. The cumulative distribution function (CDF) maps values to their percentile rank in a distribution. For each IP address, the error distance between the predicted location and the actual location is calculated using the haversine distance formula [17], with the latitude and longitude of the datacenter locations as input. The error distances of all IP addresses are first sorted and then assigned cumulative probability values. Table 1 shows the impact of the combination of features on classification accuracy for Amazon AWS and Microsoft Azure. It is clear that the combination of average (Avg) delay, standard deviation (Std) of delay and hop count (Hop) is more significant for classification than average delay alone. For Amazon AWS, out of 3500 IP addresses, 60% of IPs had an error distance of less than 20 miles with one feature (average delay), and 75% had an error distance of less than 20 miles with the combination of three features (average delay, standard deviation of delay and hop count). For Microsoft Azure, out of 43,000 IP addresses, 63% of IPs had an error distance of less than 20 miles with one feature (average delay), and 85% had an error distance of less than 20 miles with the combination of three features.

Table 1: Classification Accuracy (share of IP addresses with less than 20 miles error distance)

Cloud Service Provider    IP Addresses    Avg    Avg, Std, Hop
Amazon Web Services       3500            60%    75%
Microsoft Azure           43000           63%    85%

Figure 4: Distribution of PlanetLab Nodes in USA
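The error-distance and CDF computation described above can be sketched as follows. This is a minimal illustration rather than the evaluation code behind Fig. 2: the mean Earth radius of 3958.8 miles is an assumed constant, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles (assumption)


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_MI * math.asin(min(1.0, math.sqrt(a)))


def empirical_cdf(errors):
    """Sort error distances and pair each with its cumulative probability."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(e, (i + 1) / n) for i, e in enumerate(ordered)]


def fraction_within(errors, threshold_mi=20.0):
    """Share of IP addresses whose error distance is below the threshold."""
    return sum(e < threshold_mi for e in errors) / len(errors)
```

Applied to the per-IP error distances, `fraction_within` corresponds to the kind of "less than 20 miles" percentages reported in Table 1, while `empirical_cdf` yields the points of a curve like those in Fig. 2.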

2 GEOLOCATING CLOUD STORAGE

Online storage providers like Dropbox, SkyDrive and Google Drive use cloud storage to store files. To geolocate cloud storage, we considered Amazon S3, Rackspace and Google App Engine. In the verification process, for each cloud provider's storage IP address, the error distance is calculated between the predicted datacenter location and the nearest datacenter of that cloud provider using the haversine distance formula [17]. The total number of storage IPs verified is 900 for Amazon S3, 1900 for Rackspace and 7000 for Google App Engine. Fig. 3 compares the geolocation accuracy for Amazon S3, Rackspace and Google App Engine.

Table 2: IP Geolocation errors

Cloud Provider      Datacenter Location    Average Error (mi)
AWS                 Ashburn, VA            0
AWS                 New York, NY           0
AWS                 Dallas, TX             11
AWS                 Miami, FL              264
AWS                 Jacksonville, FL       5
AWS                 Los Angeles, CA        9
AWS                 San Jose, CA           12
AWS                 Newark, NJ             10
AWS                 Palo Alto, CA          64
AWS                 Seattle, WA            669
Rackspace           Grapevine, TX          10
Rackspace           Ashburn, VA            0
Rackspace           Chicago, IL            15
Azure               Los Angeles, CA        10
Azure               Fairfax, VA            0
Azure               San Antonio, TX        63
Azure               Chicago, IL            9
Google AppEngine    Lenoir, NC             381
Google AppEngine    Douglas County, GA     0
Google AppEngine    Mayes County, OK       14
Google AppEngine    The Dalles, OR         230
Google AppEngine    Council Bluffs, IA     375

3 GEOLOCATION ACCURACY OF CLOUD DATACENTER CITIES

Fig. 4 shows the distribution of PlanetLab nodes in the United States (source: http://ars.els-cdn.com/content/image/1-s2.0-S1389128611002490-gr5.jpg). To show the impact of the distribution of PlanetLab nodes on the accuracy of datacenter geolocation, we present the geolocation of cloud datacenters at city-level granularity. Table 2 presents the average error distance across different cloud datacenter cities. Cities with a high concentration of PlanetLab nodes show lower error distances than cities with very few PlanetLab nodes.


VI CONCLUSION

We presented CBIG, a classification based IP geolocation approach to locate cloud user data in cloud datacenters in order to verify availability. CBIG is based on a Naive Bayes machine learning framework and utilizes six features drawn from network measurements and a societal characteristic. CBIG appears promising in geolocating datacenters of the four commercial cloud providers. The experimental evaluations demonstrate that high classification accuracy is obtained by utilizing only three features (average delay, standard deviation of delay and hop count). The remaining three features do not provide a significant increase in geolocation accuracy. For future work, we would like to investigate features that can further improve the geolocation accuracy. We also plan to enhance CBIG to detect violations of location restrictions in SLAs. In this paper, we presented an approach to accurately geolocate cloud datacenters at city-level granularity. The approach assumes that the network measurements are not tampered with or manipulated, and the experiments were conducted only on available cloud datacenters located in the United States. For future work, we will evaluate the accuracy of the proposed approach in geolocating cloud datacenters in the presence of network measurements that have been deliberately modified by adversaries external or internal to the cloud provider.

ACKNOWLEDGMENT

This work was partially supported by National Science Foundation (NSF) Grant HRD-1137466, NSF HBCU-UP Targeted Infusion Grant HRD-1137544, AFOSR grant FA9550-09-1-0165, and Department of Homeland Security (DHS) SLA grants 2010-ST-062-0000041 and 2011-ST-062-0000046. It is also based upon work supported by the AFRL/RI Information Institute VFRP No. R730719.

References

[1] K. Benson, R. Dowsley and H. Shacham, “Do You Know Where Your Cloud Files Are?,” in Proceedings of CCSW 2011. ACM Press, Oct. 2011.

[2] H. Maziku, S. Shetty, K. Han and T. Rogers, “Enhancing the Classification Accuracy of IP Geolocation,” Military Communications Conference (MILCOM 2012), 2012.

[3] B. Eriksson, P. Barford, J. Sommers, and R. Nowak, “A learning-based approach for IP geolocation,” Passive and Active Measurement Workshop, 2010.

[4] M. Gondree and Z. N. J. Peterson, “Geolocation of data in cloud,” CODASPY ’13: Proceedings of the Third ACM Conference on Data and Application Security and Privacy, 2013.

[5] A. Bavier, M. Bowman, B. Chun, D. Culler, S. Karlin, S. Muir, L. Peterson, T. Roscoe, T. Spalink, and M. Wawrzoniak, “Operating System Support for Planetary-Scale Network Services,” in USENIX NSDI ’04, March 2004.

[6] Amazon Web Services, Cloud Computing (AWS), https://aws.amazon.com

[7] Microsoft Windows Azure, https://windowsazure.com

[8] Google App Engine, https://cloud.google.com

[9] Rackspace, https://www.rackspace.com

[10] A. A. Albeshri, C. Boyd, and J. M. Gonzalez Nieto, “GeoProof: proofs of geographic location for cloud computing environment,” 3rd International Workshop on Security and Privacy in Cloud Computing (part of the 32nd International Conference on Distributed Computing Systems Workshops, ICDCS 2012).

[11] G. Hancke and M. Kuhn, “A RFID distance bounding protocol,” in IEEE/Create-Net SecureComm, pages 67-73. IEEE Computer Society Press, 2005.

[12] G. J. Watson, R. Safavi-Naini, M. Alimomeni, M. E. Locasto, and S. Narayan, “LoST: Location Based Geolocation,” CCSW ’12: Proceedings of the 2012 ACM Workshop on Cloud Computing Security.

[13] U.S. Census Bureau, http://www.census.gov

[14] Online: http://www.forbes.com/sites/kellyclay/2012/12/24/amazon-aws-takes-down-netflix-on-christmas-eve/

[15] Online: http://www.zdnet.com/dropbox-users-experiencing-upload-problems-aws-questioned-again-7000009680/

[16] Online: http://techcrunch.com/2013/02/24/microsoft-to-refund-windows-azure-customers-hit-by-12-hour-outage-that-disrupted-xbox-live/

[17] Haversine Distance formula, http://www.movable-type.co.uk/scripts/gis-faq-5.1.html

[18] H. Wang, C. Jin, and K. Shin, “Defense against spoofed IP traffic using hop-count filtering,” IEEE/ACM Transactions on Networking, 2007.

[19] N. Paladi, C. Gehrmann and F. Morenius, “State of The Art and Hot Aspects in Cloud Data Storage Security,” SICS technical report, March 2013.

[20] I. Youn, B. Mark, and D. Richards, “Statistical geolocation of Internet hosts,” International Conference on Computer Communications and Networks, 2009.

[21] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida, “Constraint-based geolocation of Internet hosts,” IEEE/ACM Transactions on Networking, 2006.

[22] S. Laki, P. Matray, P. Haga, T. Sebok, I. Csabai, G. Vattay, “Spotter: a model based active geolocation service,” IEEE INFOCOM, 2011.

[23] P. Gill, Y. Ganjali, B. Wong, and D. Lie, “Dude, where’s that IP? Circumventing measurement-based IP geolocation,” USENIX Security Symposium, 2010.

[24] American Registry for Internet Numbers (ARIN), https://www.arin.net/

[25] MaxMind - IP Geolocation and Online Fraud Prevention, https://www.maxmind.com/en/home

[26] Neustar - IP Intelligence & IP Geolocation Service (formerly Quova)