6.2 WiFi Dataset
6.2.1 Environment Characterization
The dataset used in this study is composed of the access records of all Access Points (APs) of the eduroam Wi-Fi network of the Lisbon Polytechnic Institute (IPL) generated between January 1, 2005 and December 31, 2013. A total of 76479 devices and 45363 distinct users accessed the network during this time frame, producing about 43 million records, an average of 9.2 access requests per minute.
IPL is the 7th largest teaching institution in Portugal, with approximately 1300 teachers and 15000 students enrolled in one of the 88 bachelor's and master's degrees offered. IPL is distributed over 10 distinct sites in the Lisbon metropolitan area (see Fig. 6.1). The eduroam network is supported by 236 Cisco Systems APs, covering a total of 26 buildings and inter-building areas. Records originate from all the users accessing the network, thus also including visitors from other institutions.
Figure 6.2 depicts the evolution of the number of APs, distinct users and devices.
The growth in the number of APs is justified, in the vast majority of cases, by the need to increase network capacity in order to satisfy demand. The figure also shows a continuous growth in the number of users and devices, although at distinct rates, especially since 2010. This coincides with an increase in smartphone sales observed at the national level and confirms our expectation that the number of users accessing the network with more than one device has been increasing.
Figure 6.2: Devices, users and access points
The proportion of devices per manufacturer of the wireless network interface is depicted in Fig. 6.3. The manufacturer of each wireless network interface was obtained from the Organisationally Unique Identifier (OUI) component of its link layer address.
Results are approximate, as link layer addresses can be forged and the OUI may not be respected by some manufacturers. Figure 6.4 depicts the distribution of the devices across the most popular operating systems. These results are only estimates, given that: i) users are free to change the data sent by their DHCP client and ii) recent Apple devices do not send vendor information. Devices capable of running multiple operating systems are counted once per operating system detected.
The DHCP (Droms, 1997) message fields vendor, parameter request list and hostname were used to identify the device operating system. Unfortunately, no DHCP records could be found before 2009. Results for years between 2005 and 2008 include only the devices that connected at least once since 2009. This strategy allowed the identification of 65160 (85%) of the 76479 devices that connected to the network between 2005 and 2013.
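The OUI lookup described above can be illustrated with a minimal Python sketch. It is not the tooling used in this study: the function names and the `VENDORS` table are hypothetical, and the single table entry is a placeholder rather than real registry data. The sketch relies only on two facts about 48-bit link layer addresses: the OUI is the first three octets, and bit 0x02 of the first octet marks a locally administered address, for which a vendor lookup is meaningless.

```python
def parse_mac(mac: str) -> bytes:
    """Parse a MAC address written with ':', '-' or '.' separators (or none)."""
    digits = "".join(ch for ch in mac if ch not in ":-.")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return bytes.fromhex(digits)  # raises ValueError on non-hex characters


def oui_of(mac: str) -> str:
    """Return the OUI (first three octets) as upper-case hex, e.g. '00:11:22'."""
    return ":".join(f"{octet:02X}" for octet in parse_mac(mac)[:3])


def is_locally_administered(mac: str) -> bool:
    """True when the locally administered bit (0x02 of the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


# Hypothetical excerpt of an OUI registry; the entry is a placeholder.
VENDORS = {"00:11:22": "Vendor A (placeholder)"}


def manufacturer(mac: str, vendors=VENDORS) -> str:
    """Map a MAC address to a manufacturer name, or 'unknown'."""
    if is_locally_administered(mac):
        return "unknown"  # address not taken from a manufacturer's block
    return vendors.get(oui_of(mac), "unknown")
```

A lookup of this kind cannot detect an address forged to carry another manufacturer's OUI, which is one reason the proportions are only approximate.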
Figure 6.3: Wireless network interface manufacturer
Figure 6.4: Detected operating systems (by ascending order)
Our study arranges devices in two classes. The first class, Small Mobile Devices (SMDs), includes those that tend to be always on, are small and can be used on the move. Examples of SMDs are smartphones, PDAs and tablets, using Windows CE, iOS and Android.
The second class, Laptops, groups the larger devices, usually running a classical operating system (Linux, Windows or OS X). The class of each device was determined by its operating system. Figure 6.5 shows that the share of SMDs is increasing rapidly, reaching more than 20% in the last year of our study.
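The classification rule above amounts to a lookup from operating system to class. The following Python sketch illustrates it under stated assumptions: the function names are hypothetical, the operating system labels are assumed to be already normalised to the six names mentioned in the text, and each device is assumed to contribute a single operating system.

```python
from typing import Iterable, Optional

# Operating systems named in the text for each class.
SMD_SYSTEMS = {"windows ce", "ios", "android"}
LAPTOP_SYSTEMS = {"linux", "windows", "os x"}


def device_class(os_name: str) -> Optional[str]:
    """Return 'SMD', 'Laptop', or None when the system is not recognised."""
    key = os_name.strip().lower()
    if key in SMD_SYSTEMS:
        return "SMD"
    if key in LAPTOP_SYSTEMS:
        return "Laptop"
    return None


def smd_share(os_names: Iterable[str]) -> float:
    """Fraction of classified devices that are SMDs (0.0 if none classified)."""
    classes = [device_class(name) for name in os_names]
    known = [c for c in classes if c is not None]
    if not known:
        return 0.0
    return known.count("SMD") / len(known)
```

Devices whose operating system could not be identified are left out of the share rather than assigned to either class, which is an assumption of this sketch.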
Figure 6.5: Laptops vs small mobile devices (share of devices per year, 2005 to 2013; * 2005 to 2008: values extrapolated from devices that were observed after 2009)
Data analysis is centred on RADIUS (Rigney, 2000) logs. This service is responsible for the authentication of all eduroam network access requests. The RADIUS service generates a log entry every time a user associates to or disassociates from an AP, as well as when keep-alive messages are exchanged. Keep-alive messages confirm the presence of a user who is already associated. Log entries reproduce the RADIUS session concept, thus considering the association of each user to a single AP and therefore ignoring user mobility. We will use the term session to refer to these records. Log entries contain the device MAC address, AP, user name, session start and stop times, and total traffic sent and received during the session.
In ideal conditions, each session should represent the association of a device to an access point. However, the number of sessions observed is slightly amplified due to: i) automatic handover between APs, triggered by variations in signal strength; ii) incompatibilities between client drivers and protocol versions running in the AP; and iii) operating system energy-saving mechanisms that may turn off the radio interface when it is not in use. Interpretations of the results which rely on the number of sessions should therefore be made with some caution and take these factors into account. To mitigate some obvious anomalies, logs have been edited by:
• merging into a single record consecutive sessions between the same device and AP separated by an interval of less than 5 seconds. These sessions are attributed to network card or driver problems;
• removing concurrent sessions of the same device to distinct APs. This is an impossibility that can only be explained if the device did not disassociate correctly from one AP before associating to the next, and the former artificially set the session stop time upon a timeout. In this case, the stop time of the earliest session was corrected to fall immediately before the start time of the latest;
• removing sessions whose stop time equals the start time. Sessions with these characteristics are created when a user has some problem while connecting to the network, although the network considers the user authenticated (thus creating the RADIUS record).
Figure 6.6: Sessions (x-axis: years, 2005 to 2013; y-axis: sessions, ×10^6)
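The three clean-up rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually run on the logs. The `Session` type and function names are hypothetical, user name and traffic counters are omitted (on a merge the counters would presumably be summed), and the order in which the rules are applied is assumed. An overlapping session is clipped to end exactly at the start of the next one, as an approximation of "immediately before".

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from itertools import groupby

MERGE_GAP = timedelta(seconds=5)  # threshold stated in the text


@dataclass(frozen=True)
class Session:
    mac: str
    ap: str
    start: datetime
    stop: datetime


def clean_device(sessions):
    """Apply the three rules to the sessions of a single device."""
    # Rule 3: drop sessions whose stop time equals the start time.
    ordered = sorted((s for s in sessions if s.stop != s.start),
                     key=lambda s: (s.start, s.stop))
    # Rule 1: merge consecutive sessions on the same AP less than 5 s apart.
    merged = []
    for s in ordered:
        prev = merged[-1] if merged else None
        if prev is not None and prev.ap == s.ap and s.start - prev.stop < MERGE_GAP:
            merged[-1] = replace(prev, stop=max(prev.stop, s.stop))
        else:
            merged.append(s)
    # Rule 2: end an earlier session where a later one on another AP starts.
    clipped = []
    for cur, nxt in zip(merged, merged[1:] + [None]):
        if nxt is not None and cur.ap != nxt.ap and cur.stop > nxt.start:
            cur = replace(cur, stop=nxt.start)
        clipped.append(cur)
    # Clipping can leave a zero-length session; drop those as well.
    return [s for s in clipped if s.stop != s.start]


def clean(sessions):
    """Clean a whole log by processing each device (MAC address) separately."""
    by_mac = sorted(sessions, key=lambda s: s.mac)
    cleaned = []
    for _, group in groupby(by_mac, key=lambda s: s.mac):
        cleaned.extend(clean_device(group))
    return cleaned
```

Because each device's sessions are processed in order of start time, clipping adjacent pairs is enough to remove every overlap, including the case where one timed-out session spans several later ones.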
The evolution of the total number of sessions over time is presented in Fig. 6.6.
However, any reading of the absolute number of sessions over time must take into account the gradual capacity growth of the eduroam network (cf. Fig. 6.2): as APs are added, a user moving along the same path establishes more sessions in later years than in earlier ones.
Figure 6.7 depicts the number of distinct devices that appear in the RADIUS logs per day. As expected, the plot exhibits an irregular pattern consistent with the different activity levels found on workdays, weekends, and summer and winter breaks on campus.
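A daily count of distinct devices of this kind can be derived from the session records with a few lines of Python. The sketch below is illustrative only. The function name and the `(mac, start, stop)` tuple layout are hypothetical, and it assumes that a device counts towards every calendar day its session touches, which may differ from the counting rule used for the figure.

```python
from collections import defaultdict
from datetime import timedelta


def devices_per_day(sessions):
    """Count distinct MAC addresses per calendar day.

    `sessions` is an iterable of (mac, start, stop) tuples with datetime
    start and stop values. Returns a dict {date: count}, ordered by date.
    """
    seen = defaultdict(set)
    for mac, start, stop in sessions:
        day = start.date()
        # A session spanning midnight counts on each day it touches.
        while day <= stop.date():
            seen[day].add(mac)
            day += timedelta(days=1)
    return {day: len(macs) for day, macs in sorted(seen.items())}
```

Using a set per day means that a device with many short sessions on the same day, for example because of handovers between APs, is still counted once.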
Figure 6.7: Wifi Devices Connected Per Day
Figure 6.8: Network traffic (series: upload, download, total)