3 METHODOLOGY
3.3 Exploratory Research based on a Single Case Study Strategy
3.3.4 Mobile Crowdsourced Primary Data Collection
Crowdsourcing refers to the strategy of solving large-scale problems by utilising existing resources from the masses. Crowdsourcing approaches are considered very effective when a solution relies on performing tasks on a large scale (Faggiani et al., 2013). There are currently three notable active mobile measurement applications that capture Internet structural data from an Internet periphery perspective (see section 2.3.1):
• NetRadar (2015), which focuses on mobile broadband operator coverage and the comparison of access devices.
• OpenSignal (2015), which aims to capture the signal strengths of mobile broadband operators.
• Portolan (2015), which aims to discover the topology and structure of the Internet using Paris traceroutes.
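Since Portolan's reliance on Paris traceroute affects the quality of the topology data, the toy simulation below sketches the idea behind it. It is an illustration, not Portolan's implementation: the hash-based load balancer `next_hop`, the `FlowId` type and all addresses are hypothetical. Classic UDP traceroute changes the destination port of every probe, so a per-flow load balancer that hashes the five-tuple may send successive probes down different branches and produce false links; Paris traceroute keeps the five-tuple fixed so that all probes of one trace follow a single path.

```python
import hashlib
from typing import NamedTuple


class FlowId(NamedTuple):
    """The five-tuple that per-flow load balancers typically hash on."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def next_hop(flow: FlowId, candidates: list[str]) -> str:
    """Hypothetical per-flow load balancer: hash the five-tuple to pick a branch.

    A cryptographic hash is used only so the result is deterministic across runs.
    """
    digest = hashlib.sha256(repr(tuple(flow)).encode()).digest()
    return candidates[digest[0] % len(candidates)]


def probe_flows(paris: bool, n_probes: int = 16) -> list[FlowId]:
    """Flow identifiers of successive probes in one trace.

    Classic UDP traceroute increments the destination port per probe (from 33434);
    Paris traceroute keeps the five-tuple constant and identifies probes another way.
    """
    base_port = 33434
    return [
        FlowId("192.0.2.10", "203.0.113.7", 40000,
               base_port if paris else base_port + i, "udp")
        for i in range(n_probes)
    ]


if __name__ == "__main__":
    branches = ["198.51.100.1", "198.51.100.2"]  # hypothetical parallel routers
    for paris in (False, True):
        seen = {next_hop(f, branches) for f in probe_flows(paris)}
        print("Paris" if paris else "Classic", sorted(seen))
```

With the fixed five-tuple, every probe reaches the same branch, whereas the classic probes can be spread over both branches.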
While traditional Internet topology mapping efforts often rely on a top-down, passive data collection approach, data collection using Portolan (2015) provides Internet measurements from a unique bottom-up, active Internet periphery (end-user) perspective (Faggiani et al., 2012). We chose the Portolan (2015) application over its competitors because of its focus on capturing traceroutes, which allows us to measure the upstream Internet market structure using Network Analytical Methods, the main interest of this dissertation (Portolan (2015) selects a randomly chosen destination for every traceroute). Moreover, its applicability has already been tested in a preliminary pilot experiment by Giovannetti and Sigloch (2015). The preparatory steps taken for collecting the traceroute data using Portolan (2015) are described under Data collection preparations below.
Figure 3-1 below illustrates the flow of the data collection in detail. First, the data collector (researcher) arranged the data collection campaign with the Portolan (2015) Network Tools Administrator (step (i) in Figure 3-1). The Network Tools Administrator then stored the data collection campaign in the software of the Portolan (2015) server. The server orchestrated the measurements (Faggiani et al., 2012), assigning so-called measurement campaigns to the specified Android smartphones, which automatically collected the traceroute data on the set campaign dates. The Portolan (2015) server then automatically gathered the traceroute data, merged them and stored them in a database from which the Network Tools Administrator was able to obtain the collected traceroute data (step (vi) in Figure 3-1). Next, the Network Tools Administrator sent the collected traceroute data to the data collector (step (vii) in Figure 3-1), together with information on how to classify the obtained files (see section 4.1, Figure 4-1).
Figure 3-1: Overview of data collection.
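To make the link between traceroutes and Network Analytical Methods more concrete, the sketch below shows one common first step: turning each traceroute's hop sequence into directed links between consecutive IP addresses and counting them. This is an illustration under stated assumptions, not the procedure used in this dissertation; the function names, the use of `None` for non-responding hops and the sample addresses are hypothetical, and mapping addresses to operators (for example via AS numbers) is omitted.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence

# A traceroute as an ordered list of hop IP addresses; None marks a non-responding hop.
Traceroute = Sequence[Optional[str]]


def hop_edges(traceroutes: Iterable[Traceroute]) -> Counter:
    """Count directed IP-level links between consecutive responding hops."""
    edges: Counter = Counter()
    for hops in traceroutes:
        for a, b in zip(hops, hops[1:]):
            # Skip pairs with an anonymous hop (true adjacency unknown) and self-loops.
            if a is not None and b is not None and a != b:
                edges[(a, b)] += 1
    return edges


def out_degree(edges: Counter) -> Counter:
    """Number of distinct downstream neighbours observed for each IP address."""
    degree: Counter = Counter()
    for (a, _b) in edges:
        degree[a] += 1
    return degree


if __name__ == "__main__":
    sample = [  # hypothetical traceroutes
        ["10.0.0.1", "198.51.100.1", "203.0.113.1"],
        ["10.0.0.1", "198.51.100.1", "203.0.113.9"],
        ["10.0.0.1", None, "203.0.113.1"],
    ]
    links = hop_edges(sample)
    print(links.most_common())
    print(out_degree(links).most_common())
```

A node with many distinct downstream neighbours is a simple first indicator of where upstream connectivity concentrates; richer network measures would build on the same edge list.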
Materials and Equipment
For the purpose of collecting the upstream connectivity traceroute raw data using a mobile crowdsourced data collection approach, the data collector (researcher) had to organise three Android smartphones, since the Portolan (2015) application was (at the time) available only for devices running Google Android version > 4.0. These smartphones varied in price; in decreasing order of cost, the brands were Sony, Micromax, Lava and Karbon. Unfortunately, only the first two (Sony and Micromax) were able to maintain a stable configuration of Portolan (2015). Furthermore, to collect the traceroute data for exploring the stated Working Hypotheses, the data collector (researcher) had to obtain SIM cards from Tamil Nadu mobile broadband operators. As mentioned in the Literature Review (section 2.2.2), Tamil Nadu is divided into two Mobile Service Areas covered by four Indian mobile broadband operators (Aircel, Bharti Airtel, BSNL, and Vodafone). The data collector was able to obtain local SIM cards for three of them, namely Aircel, Bharti Airtel, and Vodafone. A SIM card for the fourth operator, BSNL (Bharat Sanchar Nigam Ltd.), could not be obtained due to local regulations on issuing SIM cards to locals and foreigners. This choice of materials and equipment may introduce a selection bias, which has therefore been taken into consideration when reporting the results (see section 3.6.1). Some of the low-end smartphones in particular did not seem to collect traceroute observations properly at times, which potentially reflects real-world connectivity situations.
Data collection preparations
Once the Android smartphones and SIM cards of the local Tamil Nadu mobile broadband operators were obtained, the next preparation tasks for the researcher were:
• To collaboratively organise, by email, the study plan for 1–5 March 2015 with the Network Tools Administrator at the Istituto di Informatica e Telematica (IIT) at the University of Pisa in Italy. This included configuring the measurement campaign according to the anticipated plan and transmitting the information the Portolan Server needed to identify the correct smartphones. Lastly, we arranged for the Network Tools Administrator to transfer the collected traceroute data from the Portolan Server to the data collector.
• To organise travel and accommodation from Cambridge, UK to the Indian Institute of Technology Madras (IITM) campus in Chennai, Tamil Nadu, India.
• To collect the respective Android smartphones and SIM cards before being on-site in Chennai, Tamil Nadu.
• To organise a local driver for the purpose of travel assistance for each day of the chosen data collection period.
• To download and install the Portolan (2015) application from the Google Play Store.
• To prepare the settings of the Portolan (2015) Network Sensing Architecture Android application for our chosen traceroute data collection purposes.
Location and Data collection times
The data collection took place in the period of 01st March 2015 – 05th March 2015, while
of Chennai, through the rural areas of Tamil Nadu, 45 miles to the distant historical city of Kanchipuram (see Figure 3-2). The researcher hence collected traceroutes from both urban and more rural areas in Tamil Nadu, India. Before each of the planned daily commutes by car and on foot, the researcher made sure that the batteries of the three Android smartphones were charged overnight. This reflected normal end-user behaviour and prevented unforeseen smartphone shutdowns due to flat batteries. During the data collection commutes, it was important to mimic an end-user's usual smartphone usage behaviour. Hence, our case study covered the use cases of locals or tourists commuting or travelling to Chennai, the urban outskirts or the city of Kanchipuram (for religious events or ritual traditions such as the holy pilgrimage).
Figure 3-2: Traceroute hop observations as obtained through Portolan (2015), plotted on Google Maps.
While the researcher carried the smartphones during the travel commutes, the Portolan (2015) application automatically collected 57,122 unique traceroute observations. These traceroute observations contained a total of 731,200 Internet Protocol (IP) address hop observations, since each traceroute observation contains multiple IP addresses that are traversed from the measurement source to a randomly assigned destination. Table 3-1 below lists the distribution of collected traceroute hop observations per data collection day.
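The relationship between traceroute observations and hop observations can be sketched as follows, assuming that each IP address entry in a traceroute counts as one hop observation; the helper names and toy records are hypothetical. Applied to the totals reported above, it gives an average of roughly 12.8 hops per traceroute.

```python
from typing import Optional, Sequence

# One traceroute = the ordered hop IP addresses (None for a non-responding hop).
Traceroute = Sequence[Optional[str]]


def total_hop_observations(traceroutes: Sequence[Traceroute]) -> int:
    """Assumption: every hop entry of every traceroute is one hop observation."""
    return sum(len(hops) for hops in traceroutes)


def mean_hops_per_traceroute(total_hops: int, n_traceroutes: int) -> float:
    """Average number of hop observations per traceroute observation."""
    return total_hops / n_traceroutes


if __name__ == "__main__":
    toy = [["192.0.2.1", "198.51.100.1", "203.0.113.5"], ["192.0.2.1", None]]
    print(total_hop_observations(toy))  # 5
    # Totals reported for this data collection campaign
    print(round(mean_hops_per_traceroute(731_200, 57_122), 1))  # 12.8
```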
Date of data collection   Number of collected traceroute   Share of total collected IP
(YYYY-MM-DD)              IP address hop observations      address hop observations
2015-03-01                236,805                          32.39%
2015-03-02                113,431                          15.51%
2015-03-03                119,373                          16.33%
2015-03-04                134,621                          18.41%
2015-03-05                126,970                          17.36%
Total                     731,200                          100%

Key
IP: Internet Protocol.
YYYY-MM-DD: Year, Month, Day.
Table 3-1: Observations per data collection day.
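The percentages in Table 3-1 follow directly from the daily counts; the short check below recomputes them from the reported figures (the dictionary and function names are illustrative only).

```python
# Daily IP address hop observations as reported in Table 3-1
daily_hops: dict[str, int] = {
    "2015-03-01": 236_805,
    "2015-03-02": 113_431,
    "2015-03-03": 119_373,
    "2015-03-04": 134_621,
    "2015-03-05": 126_970,
}


def percentage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of each day in the total, in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {day: round(100 * n / total, 2) for day, n in counts.items()}


if __name__ == "__main__":
    print(sum(daily_hops.values()))  # 731200
    for day, share in percentage_shares(daily_hops).items():
        print(day, f"{share:.2f}%")
```

The recomputed shares match the table, confirming that the daily counts sum to the reported total of 731,200.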