PVA deployment density is surging because of the usefulness of PVAs and the convenience they bring to users. It is safe to say that we are always in range of at least one PVA in one of its possible forms (e.g., smart speakers, or intelligent assistants on smartphones or smart watches). Given this level of popularity, the natural intimacy of PVAs with users, and their critical role in the Internet of Things (IoT) ecosystem, security and privacy issues accompanying PVA usage arise.
People are probably not fully aware of the PVAs in their surroundings. Even if they realise that PVAs are present, they may not be able to influence the behaviour of these devices, let alone deactivate them (e.g., PVAs that belong to others may be activated and start recording). Privacy violation related to PVA usage has become the most prominent concern for users, as PVAs are capable of monitoring and understanding speech [78]. I believe that in a world observed by numerous PVAs it is important to understand how people could express privacy requirements about how their conversations will be handled if these conversations are recorded accidentally by surrounding PVAs. Privacy requirements could include, for example, the refusal of recording consent, recording locations and recording time stamps.
Besides the obvious capability of listening to conversations, PVA systems can also identify user activities such as laughing, crying and eating [21, 110], owing to the rich information embedded in speech signals. It might also be possible to identify room characteristics such as room size or shape just by analysing recordings. This information can be revealed by processing the audio samples recorded by the microphones on a PVA, which is categorised as PVA passive sensing in this thesis. In addition, PVA speakers can be used to emit (inaudible) sound whose reflections are received by the microphones [30], which is categorised as PVA active sensing in this thesis. I believe that, because of these PVA sensing capabilities, it is important to understand how much information about users’ daily routines and living environments is exposed, as this information is closely related to user information security and privacy.
As PVAs are an interface to computer systems and smart environments, it is necessary to police access. Malicious commands can be injected into PVAs without a user’s awareness in various ways: using malware to monitor the environment and seek a suitable moment to attack [42], transmitting inaudible commands [158], or exploiting psychoacoustic characteristics to hide abnormalities in a benign audio command [118]. Speaker identification can be employed to authenticate user interactions, so that a PVA can be trained to recognise an individual registered beforehand [14]. However, such identification mechanisms can be circumvented by replaying recordings [163] or by using synthesised speech [38]. I believe it is important to understand how PVA access control can be circumvented in practice, either with methods for breaking authentication mechanisms or with novel methods that exploit specific PVA software or hardware. On the other hand, it is also important to design robust user authentication methods that are immune to these bypassing attacks, as well as defence methods against malicious speech commands on PVAs.
Resilient PVA operation is necessary in many scenarios. However, it is possible to prevent a PVA from operating normally, or to stop it from operating at all, by using Denial of Service (DoS) methods and interfering with audio processing [31] or speech recognition [76]. I believe it is important to understand what kinds of DoS attack methods are possible against PVAs and what the security and privacy implications of these methods may be.
In addition to these security and privacy concerns from the security research perspective, some PVA market statistics are presented below. Industry and public views of security and privacy in the PVA domain are also summarised to give a more comprehensive picture.
1.1.1 PVA Markets and Cybersecurity
PVAs are becoming a main interface for digital environments. Reports show that 21% of the US population have at least one smart speaker, and that the total number of smart speakers has reached 118.5 million [97]. The 2019 Australia Smart Speaker Consumer Adoption Report [138] shows that 29.3% of the Australian adult population (i.e., 5.7 million Australians) owned smart speakers. Strategy Analytics [144] projects that the UK, Ireland, Canada, South Korea, Australia, Germany and France will reach the 50% adoption threshold within the next four years. However, concerns about PVA security and privacy arise as the PVA market grows [78]. The 2019 Voice Report [98] by Microsoft shows that 41% of PVA users are concerned about trust, privacy and passive listening. It is therefore necessary to advance our understanding of PVA cybersecurity.
PVAs use voice control as a natural and convenient interface. Although the PVA is only one amongst many possible interaction solutions, there is a trend towards it becoming the dominant interface for future systems, even for safety critical systems. In this thesis, I use the term safety critical systems to refer to systems whose failure may cause serious injury or even loss of life. PVAs have the potential to become a major interface for safety critical systems. For instance, Amazon and the UK National Health Service (NHS) have announced a partnership to enable users to obtain NHS advice via PVAs [44].
If PVAs are used on the large scale that is anticipated and are applied to safety critical applications, it is essential to improve our understanding of existing and potential risks.
1.1.2 Public PVA Security and Privacy Perception
Recently there have been many highly visible news reports and articles about PVA security and privacy, which have drawn great attention from the public. In April 2019 Amazon admitted that its workers regularly listen to user recordings collected from PVAs in order to improve services [18]. In July 2019 Google admitted that its contractors regularly listen to voice recordings obtained by its PVAs [135]. There have also been frequent reports of incidents in which PVAs record or trigger actions without user intent. A prominent case that made the headlines was when the UK MP Gavin Williamson was interrupted by Apple’s Siri while addressing the House of Commons [96]. Industry has started to tackle this growing public concern. Amazon Echo, Google Home and Siri have integrated speaker authentication/identification functions. However, their main focus (except for Siri) is to distinguish between multiple speakers in a household sharing a PVA. In 2019 Amazon introduced the command "Delete everything I say today" to provide users with more privacy control. Beyond the mainstream PVA manufacturers, third-party companies are proposing solutions as well. For instance, Project Alias [68] is a device that feeds a smart speaker constant white noise to disable it, providing users with an extra "safe belt" to control when to activate the PVA. Mycroft [119] is a privacy-prioritised PVA focusing on local speech command processing to avoid cloud analysis of recordings.
Legislators in some countries are investigating the legal context of PVA systems. Currently there are debates about whether new laws are necessary and what form they should take. In Germany, the Parliament investigated the legality of PVA data collection and concluded that questions remain about how third parties and minors can be excluded from data collection to comply with the law [145]. Furthermore, it was also unclear how third parties may use the collected data in the future. In California, Assembly Bill 1395 has been proposed, which would prohibit smart speaker operators from retaining or distributing voice recordings or transcriptions without the user’s consent [22].
The general public is concerned about PVA security and privacy, and industry and legislators are starting to react. However, as this thesis later shows in Chapter 3, research has already identified much more sophisticated and serious security challenges than the ones currently triggering public debate.
The goal of this thesis is to improve the understanding of acoustic-channel related security and privacy challenges of PVA usage from the four security research perspectives mentioned above. I first present my work that builds a taxonomy and surveys related state-of-the-art studies. Each main part of this taxonomy addresses one of these perspectives. I then present my empirical work, which contributes to each part.