Device study: WideNoise
6.3 Design: WideNoise transformed by testing
This section describes how evaluating WideNoise did not result in any physical changes to the app but emotionally transformed the way the researcher related to it, suggesting that calibration is largely a way of building and mediating relations.
The Red team had committed to evaluating WideNoise for the consortium with the idea that the test data would allow the calibration to be improved. I made an appointment to use an anechoic chamber and invited a technical member of the team to make sure we were carrying out the procedure correctly. We brought different smartphones to check how the app would behave on different hardware and operating systems. Before I arrived at the lab, I received a text message from the scientist who ran the facility, asking me to buy AA batteries to power the main reference meter. I had imagined the anechoic chamber would be a hygienic white space, but instead we were led into a small dusty chamber with a broken office chair hanging from netting. The speakers in the chamber didn’t work, so the scientist propped a single large speaker in the corner of the chamber and told us that we would have to generate our own calibration audio from our laptop. Using duct tape, we fixed the smartphones onto a wooden board that was precariously balanced on the office chair swaying in the netting. The door of the chamber could not be sealed since we had to leave the laptop outside and run an audio cable to the speakers in the chamber.
The setup of the chamber felt primitive, but we tried to follow the rigorous test procedure laid out by D’Hondt et al. (2013). The first problem was that there was no clear measurement standard that we could use to relate the app to the reference meter. The app claimed to measure decibels but didn’t specify any psychoacoustic weighting. We tried using unweighted decibels, dB(Z), but the discrepancy with the reference meter was huge. After some experimentation, it became clear that WideNoise had actually been calibrated against dB(A) weighting without this being stated in the app or documentation. Once we used this standard, the app and reference meter became relatable. The second issue was that the smartphone microphones were extremely directional, making the positioning of the hardware and speakers very tricky. Turning a phone a millimetre to one side or the other would radically alter the measurements. This directionality would make it hard to obtain accurate readings in a real-world context. We repeated the testing procedure, but each time a few devices fluctuated wildly and we had to average the readings. The graph we produced (Figure 6.5) shows that WideNoise responded very differently on the different smartphone hardware, with discrepancies as large as 20 dB(A) for quiet sounds. To put this in context, an increase of 3 dB(A) corresponds to a doubling of sound energy and an increase of about 10 dB(A) is commonly perceived as twice as loud, meaning that some hardware registered quiet sounds as roughly four times louder than others. Above 50 dB(A) the difference between the hardware was lower, but below that threshold the measurements fluctuated unpredictably. Crucially, the two phones with identical hardware had very similar readings. This suggested that the measured data was not entirely random and that it would be possible to create a hardware profile for each smartphone model and thus calibrate WideNoise. This multiple-profiles approach had already been successfully demonstrated by the NoiseTube app (D’Hondt et al. 2013), where it had allowed high-quality noise measurements. The evaluation in the chamber demonstrated that the WideNoise app used a crude calibration algorithm but also that it would be possible to make WideNoise more accurate by implementing hardware profiles.
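To make the idea of a hardware profile concrete, the following is a minimal sketch of how one could be built from chamber data of the kind we collected. It is an illustration under stated assumptions, not the NoiseTube implementation and not code from WideNoise: the device names, the readings, and the helper functions `build_profile` and `correct` are all hypothetical, and the correction is assumed to be a simple linear interpolation between averaged test points.

```python
from bisect import bisect_left
from statistics import mean

# Hypothetical chamber data: for each phone model, repeated raw app readings
# taken at known reference-meter levels in dB(A). All numbers are invented.
RAW_RUNS = {
    "phone_a": {40.0: [58.0, 62.0], 60.0: [66.0, 68.0], 80.0: [81.0, 83.0]},
    "phone_b": {40.0: [41.0, 43.0], 60.0: [59.0, 61.0], 80.0: [78.0, 80.0]},
}


def build_profile(runs):
    """Average the repeated runs at each reference level.

    Returns (averaged_app_reading, reference_level) pairs sorted by app
    reading. Assumes the averaged app readings are distinct and rise with
    the reference level.
    """
    return sorted((mean(readings), ref) for ref, readings in runs.items())


def correct(profile, app_reading):
    """Estimate the reference level for a raw app reading.

    Interpolates linearly between profile points and clamps readings that
    fall outside the profiled range to the nearest end point.
    """
    xs = [x for x, _ in profile]
    ys = [y for _, y in profile]
    if app_reading <= xs[0]:
        return ys[0]
    if app_reading >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, app_reading)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (app_reading - x0) / (x1 - x0)


# One profile per hardware model, looked up by model name at measurement time.
PROFILES = {model: build_profile(runs) for model, runs in RAW_RUNS.items()}
```

With these invented numbers, `correct(PROFILES["phone_a"], 63.5)` and `correct(PROFILES["phone_b"], 51.0)` both map to a reference level of 50 dB(A), although the raw readings differ by more than 12 dB. The sketch also makes the maintenance cost visible: every new phone model needs its own entry in `RAW_RUNS`, which means another round of chamber measurements.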
Figure 6.5: Test data comparing the WideNoise app running on a variety of smartphone platforms and hardware against a Class 1 reference meter (black bar). The eight sets of results (A-H) show the response at different sound pressure levels.
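To read the discrepancies in Figure 6.5 in physical terms, it can help to convert level differences in decibels into ratios. The sketch below uses the standard logarithmic definitions for sound intensity and sound pressure. The loudness function rests on an assumption: the common psychoacoustic rule of thumb that a 10 dB increase is perceived as roughly twice as loud, which was not something measured in this study. The function names are my own.

```python
def intensity_ratio(delta_db):
    """Ratio of sound intensity (energy) for a level difference in dB."""
    return 10 ** (delta_db / 10)


def pressure_ratio(delta_db):
    """Ratio of sound pressure for a level difference in dB."""
    return 10 ** (delta_db / 20)


def loudness_ratio(delta_db):
    """Approximate perceived loudness ratio.

    Assumes the rule of thumb that +10 dB sounds about twice as loud;
    real perception varies with frequency and level.
    """
    return 2 ** (delta_db / 10)


if __name__ == "__main__":
    for delta in (3, 10, 20):
        print(
            f"{delta:>2} dB: intensity x{intensity_ratio(delta):.1f}, "
            f"pressure x{pressure_ratio(delta):.1f}, "
            f"loudness about x{loudness_ratio(delta):.1f}"
        )
```

Under these definitions a 20 dB gap between two phones corresponds to a hundredfold difference in sound intensity, a tenfold difference in sound pressure and, by the rule of thumb, roughly a fourfold difference in perceived loudness.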
When we presented these results to the EveryAware consortium, we proposed that the app could be improved by adding hardware profiles. However, none of the consortium teams wanted to take charge. One of the team leaders explained that graphical changes to the visualisation of the data were easy but that calibration was difficult, since new smartphone hardware would keep being released and new profiles would continually have to be created. It is interesting to compare WideNoise with the AirProbe device. There the calibration process took thousands of person-hours spread across years of development and involved the whole consortium in detailed discussions about the minutiae of calibration (section 5.2). So why was calibrating WideNoise not deemed important? The key difference was that the Green team felt responsible for air pollution and AirProbe, while none of the consortium partners felt any ownership of WideNoise. As the AirProbe study showed, the Blue and Yellow teams were focused on behavioural data, which meant that calibrating WideNoise to comply with environmental noise standards was not crucial.
When I confronted one of the researchers from the Blue team, they suggested that calibration only mattered for participants insofar as it demonstrated that the researchers ‘care about this problem’, yet they themselves didn’t think it was important. During the consortium meeting at which the decision not to calibrate was taken, the question was framed as:
“We as a project need to make a decision whether it is worth the effort or whether we can take a more realistic approach, understand we have an error and communicate it”.
The researchers understood that calibration mattered to participants but also that it would take a lot of effort to implement. In this way the decision to communicate the level of error rather than fix it can be seen as a trade-off that indicated the consortium’s priorities. I suggest it was a choice between different environmental realities: one was a distant and abstract reality where WideNoise might be meaningful for some participants, and the other was an academic reality where the app was simply a research object and not so important. This made it easy for the consortium to choose the ‘realistic approach’ of least effort and not calibrate the app. While the consortium released additional versions of the app with minor changes, the calibration algorithm itself was never improved.
While the surreal ritual in the anechoic chamber did not result in any physical changes to the app, it had transformed the assemblage of the device. The consortium members had suggested that they were not surprised by the test data, saying, “we knew that. There is no calibration being done by WideNoise”. Nevertheless, showing and discussing the data with the consortium had an emotional effect on the way the teams related to WideNoise.
Some of the members seemed pleased when we presented the test data, suggesting that WideNoise had finally been calibrated. For them the procedure in the chamber had ‘calibrated’ the app even without making any actual improvements. Others, by contrast, were frustrated by the poor test results. During one informal chat I had with one of the researchers from the Green team, they likened WideNoise to a crude electronic birthday card with an inbuilt sound chip and proceeded to sing me a deliberately tuneless rendition of ‘Happy Birthday to you’. At the final consortium meeting, when the teams were preparing the microphone and speaker setup, there was some screeching feedback. When the harsh noise died down, one of the team members joked that the noise had been ‘WideNoise’, to which others responded with laughter. Showing the test results seemed to allow the consortium to talk more openly about the app. By evaluating it and forcing the consortium to take an explicit decision on the calibration issue, the Red team had made the priorities of the consortium explicit and made the app more transparent. In my field notes I described the evaluation as an active transformation of the app:
“We are actually building the device by adding a whole new level to WideNoise that from now on will not be removable. It now has an error margin attached to it, even if the technical testing procedure was ridiculous”.
The Red team felt a surprising sense of relief, since the evaluation had confirmed their concerns about the app and transformed it into a known and predictable entity. A member of the Red team argued that WideNoise is ‘arbitrary not random’, meaning that while the app data does not relate to any noise standard, it could still be used to indicate low, medium and high sound levels. The evaluation thus became a way of ontologically redesigning the device by setting new expectations amongst the consortium and the Red team about which sound realities the app might be able to sense.