2.2 Individual-Based Approaches to Accessibility and Segregation
2.3.2 Acquiring Data
All the methods for building activity spaces discussed previously rely on data on people’s trajectories, activities, and time constraints at different levels of detail. This section presents an overview of the data collection methods found in the literature. A summary of those techniques is presented in Table 2.2.
Table 2.2: Summary of data sources used for defining activity spaces.
| Method | Level of detail | Scalability |
| --- | --- | --- |
| Travel diary surveys | High. Detailed socioeconomic and travel behaviour data on each participant can be collected, as much as needed. | Low. Usually small samples, due to manual processes of data collection and processing. |
| GPS | High. Detailed socioeconomic data on each participant can be collected. GPS data is a more reliable source of travel behaviour information than travel diaries. | Low/Medium. Providing GPS tracking devices to research subjects can be costly and limited to small-scale studies. Providing a GPS tracking app that participants can install on their own smartphones increases the scalability of the research. |
| Participatory GIS | Medium/High. High flexibility regarding the kind of information that can be collected. Potentially, detailed socioeconomic data on each participant can be obtained. | Low/Medium. Scalability depends on recruiting participants. |
| Social media | Low. Socioeconomic data on users are not available and may be inferred from census data when needed. | High. Social media APIs (such as Twitter’s) provide direct access to the activities of many users who decide to share their activity and location data. Users are self-selected and may not represent the whole population. |
| Mobile phone | Low. Socioeconomic data on users are not available or very limited and may be inferred from census data when needed. Spatial resolution may be limited. | High. Access to a significant proportion of the population can be achieved with this method. |
| Flow data (OD) | Medium. Socioeconomic and flow data on a large share (if not the whole) of the population can be obtained. However, there is no temporal information and there are no trajectories. | High. Access to the entire population, in the case of census OD data. |
Travel diaries are the most common approach for collecting individual activity data for time-geographic studies. With this technique, researchers can collect a wide array of socioeconomic and travel behaviour data on each participant, at a level of detail tailored to the research needs. However, such studies are difficult to replicate or to expand to a larger population beyond the original sample. Kamruzzaman and Hine (2012) reviewed the sample sizes of several travel diary surveys used in studies carried out between 1999 and 2011. They found significant variation in sample sizes, which ranged from 19 to 755 diaries collected. While most studies collected data for one to two days, or up to a week, one study by Schönfelder and Axhausen (2003) collected travel activity information for six weeks. Although not comprehensive, this evidence suggests that there is no consensus regarding the size requirements of travel diary surveys. New software tools (Safi et al. 2017; Prelipcean, Gidófalvi, and Susilo 2018) can help automate travel diary collection and reduce issues such as low response rates and incomplete trip declaration by participants, but they are still very new and rarely used, and their impact on the field is yet to be seen.
One alternative to travel diaries is the use of large-scale travel surveys carried out by government agencies and planning bodies. Examples are the U.S. National Household Travel Survey (Santos et al. 2011), Mobidrive for German cities (Schönfelder and Axhausen 2003), and the SIRS survey in France (Vallée and Chauvin 2012). A shortcoming of this approach is that the data may have been collected with different objectives in mind, so important information may be missing or the sample may not be representative of the research’s target population.
Origin-destination matrices (OD, also known as flow data) from censuses can be a useful data source in time-geographic studies. Farber and colleagues (Farber et al. 2013; Farber et al. 2015) use flow data to calculate potential opportunities for social interaction among different population groups in metropolitan regions. The advantage of this technique is that socioeconomic data and the origins and destinations of trips can be obtained for a large share of the population, in many cases from the census. However, OD matrices usually contain only the origins and destinations of trips, with neither temporal information nor the actual trajectories.
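To make the limitation concrete, the following is a minimal sketch of how individual trip records collapse into an OD matrix. It is not the procedure used by Farber and colleagues; the record fields (`origin`, `destination`, `group`, `departure`) and the function name `build_od_matrix` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_od_matrix(trips):
    """Aggregate trip records into counts per (origin, destination, group).

    `trips` is an iterable of dicts with the hypothetical keys 'origin',
    'destination' and 'group'. Any other key (for example a departure
    time) is ignored, which mirrors the loss of temporal and trajectory
    information in published OD data.
    """
    od = Counter()
    for trip in trips:
        od[(trip["origin"], trip["destination"], trip["group"])] += 1
    return od
```

Two trips between the same pair of zones by members of the same group become a single cell with a count of two, regardless of when or along which route each trip was made.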
Data collection can be partially automated with the use of GPS technologies, either in dedicated devices or in apps installed on smartphones. Yip, Forrest, and Xian (2016) used an Android app to track the location of participants and to identify places of activity (defined as places where the participant stayed longer than 30 minutes). Greenberg Raanan and Shoval (2014) used GPS tracking to compare perceived territorial boundaries, in the form of mental maps sketched by the participants, with the participants’ actual trajectories collected via GPS receivers, finding a strong relationship. This technique brings many advantages over travel diaries and interviews, such as removing the burden on participants of remembering past activities and filling in the diary, as their locations are recorded automatically. However, this technique is still limited to voluntary participation, and reaching out to possible participants can be time-consuming.
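The idea of identifying places of activity from a GPS trace can be sketched as follows. This is an illustrative stay-detection routine, not the algorithm used by Yip, Forrest, and Xian (2016): only the 30-minute threshold comes from the text above, while the 100 m radius, the rule of anchoring each stay on its first fix, and the function names are assumptions.

```python
from datetime import timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def detect_stays(fixes, radius_m=100.0, min_duration=timedelta(minutes=30)):
    """Return stays as (start, end, mean_lat, mean_lon) tuples.

    `fixes` is a list of (datetime, lat, lon) tuples sorted by time.
    A stay is a run of consecutive fixes that all lie within `radius_m`
    (an assumed value) of the first fix in the run and that spans more
    than `min_duration`.
    """
    stays = []
    i, n = 0, len(fixes)
    while i < n:
        t0, lat0, lon0 = fixes[i]
        j = i + 1
        # Extend the run while fixes remain close to the anchor fix.
        while j < n and haversine_m(lat0, lon0, fixes[j][1], fixes[j][2]) <= radius_m:
            j += 1
        t_end = fixes[j - 1][0]
        if t_end - t0 > min_duration:
            run = fixes[i:j]
            mean_lat = sum(f[1] for f in run) / len(run)
            mean_lon = sum(f[2] for f in run) / len(run)
            stays.append((t0, t_end, mean_lat, mean_lon))
            i = j  # continue after the stay
        else:
            i += 1  # too short: try the next fix as anchor
    return stays
```

The radius in particular is a design choice that depends on GPS accuracy and on how finely places need to be distinguished; a real study would calibrate it against the data.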
Participatory GIS (PGIS) techniques allow surveying individuals remotely through the combination of online mapping and questionnaires. Huck et al. (2019) collected data in Belfast, Northern Ireland, using an online PGIS platform for a study on religious segregation. Participants, identified as Catholic, Protestant, or Other, were asked to divide the city into areas perceived as used mainly by a single group or shared by more than one group. The authors used a spray-can technique so that users could demarcate regions with fuzzy limits rather than the hard boundaries of traditional areal units such as census tracts. The sample was limited to 33 recruited participants, but the technique has the potential to be scaled to larger samples depending on recruiting efforts.
Passive mobile positioning data, made available by mobile operators, allow the collection of data at very large scales. Silm and Ahas (2014a, 2014b), for example, were able to reach half the population of Estonia in their study of segregation between ethnic Russians and Estonians in the country. The information supplied by the operator can vary, and in their case the researchers were able to obtain, together with the call detail records, the sex, birth year, and the language in which the mobile phone user preferred to communicate with the service provider. With this information, the authors could identify the ethnicity of the users, assuming the language chosen is the individual’s first language. Among the shortcomings of this technique are the spatial resolution of the data, which is restricted to the location of the cellular antenna closest to the user, and the limited demographic information due to data privacy issues.
Another alternative for mass data collection is the use of social media feeds. Netto et al. (2018) collected data from Twitter users in the city of Rio de Janeiro, Brazil, generating a database of 20,029 geotagged tweets belonging to 2,543 users over the course of 56 hours. The place of residence of each user was estimated based on the location of their first tweet in the morning, and the income of the user was estimated from the characteristics of the census tract they live in. This technique allows gathering data on a large number of subjects when no other sources are available. The data collected are also more recent and frequently updated. As a shortcoming, the user base may not be representative of the whole population, and the socioeconomic characteristics of the users have to be estimated.
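The residence and income estimation described above can be sketched as follows. This is a simplified illustration, not the implementation of Netto et al. (2018): the morning window of 05:00 to 10:00, the assumption that each tweet has already been assigned to a census tract, and the names `estimate_home_tracts`, `attach_income`, and `tract_income` are all hypothetical.

```python
from datetime import time


def estimate_home_tracts(tweets, morning_start=time(5, 0), morning_end=time(10, 0)):
    """Map each user to the census tract of their earliest morning tweet.

    `tweets` is an iterable of (user_id, local_datetime, tract_id) tuples.
    The morning window is an assumed parameter; users with no tweet inside
    the window are left out of the result.
    """
    first = {}
    for user, ts, tract in tweets:
        if not (morning_start <= ts.time() < morning_end):
            continue
        if user not in first or ts < first[user][0]:
            first[user] = (ts, tract)
    return {user: tract for user, (_, tract) in first.items()}


def attach_income(home_tracts, tract_income):
    """Assign each user the income figure of their estimated home tract.

    `tract_income` is a hypothetical {tract_id: income} lookup built from
    census data; tracts missing from it yield None.
    """
    return {user: tract_income.get(tract) for user, tract in home_tracts.items()}
```

Because every user inherits the value of a whole tract, the resulting income is an area-level proxy rather than an individual measurement, which is the estimation issue noted above.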