4.4.1 Far-end Video Component
To improve the user experience, a video element was added so that users could monitor the remote audio processing hardware under their control. This would ultimately remove the need to organise a concurrent Skype connection between the server and a client computer.
YouTube Live-Video
An immediate focus was to use one of the popular social media platforms that support real-time video. Facebook and YouTube are two of the largest social media providers that allow their users to produce one-to-many live video feeds for remote audiences. During this stage of development, Facebook’s infrastructure for live video was primarily based on mobile devices. YouTube, however, showed potential as a viable video option, especially because it offers HTML code that allows videos from its platform to be embedded in an external webpage. To live stream a video feed from a desktop computer or other non-mobile device, additional encoding software was required to capture the camera feed and send it to YouTube. For this purpose, the open-source Open Broadcaster Software (OBS) Studio was used. The encoder software had to be given a stream key from an active YouTube account, after which the YouTube streaming server encodes the PC’s audio and video feed for playback. Once the desired encoding properties for a video stream are set and the media is encoded, OBS provides a URL where the streamed content can be observed, in addition to it being viewable from YouTube’s livestream dashboard.
Figure 4.14 YouTube live media stream test (https://youtu.be/qt1UXeNCdtE).
The video quality was sufficient for hardware monitoring; however, the latency of the stream was unsatisfactory. A delay of roughly 10 seconds was observed, which was deemed unsuitable for the real-time control applications in this research.
WebRTC
Another option for media streaming was a fairly new technology called WebRTC (Web Real-Time Communication), which enables real-time transfer of audio, video, and data through web browsers and the Internet (WebRTC, 2011a). One of WebRTC’s noteworthy qualities is that it enables Skype-like video conferencing with very low latency, self-contained within a web browser. WebRTC provides APIs that allow a user to capture media streams from input devices, built around the getUserMedia() method, and to share these streams with remote devices using RTCPeerConnection (Dutton, 2012). While WebRTC provides the functionality to access media inputs on a computer and share the media with a peer, the actual connection between peer devices is facilitated by a process called signalling, which is not specified within the WebRTC standards so that developers have the flexibility to implement their desired mechanisms for connecting peers.
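The capture-and-share step described above can be sketched in browser JavaScript as follows. This is a minimal illustration under stated assumptions, not the code used in this research: the helper name shareCamera is hypothetical, and signalling (the offer/answer exchange between peers) is deliberately omitted because WebRTC leaves it to the developer. The calls getUserMedia(), getTracks(), and addTrack() are the standard browser methods.

```javascript
// Hypothetical helper: capture the local camera and microphone, then attach
// every captured track to a peer connection so it can be sent to a remote peer.
// `mediaDevices` is expected to be navigator.mediaDevices and `peer` an
// RTCPeerConnection; both are passed in so the sketch has no hidden globals.
async function shareCamera(mediaDevices, peer) {
  // Prompts the user for permission and resolves with a MediaStream.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: true });

  // Add each captured track (typically one video and one audio) to the connection.
  const tracks = stream.getTracks();
  for (const track of tracks) {
    peer.addTrack(track, stream);
  }

  // Report how many tracks were attached.
  return tracks.length;
}

// In a browser this would be called as:
//   shareCamera(navigator.mediaDevices, new RTCPeerConnection());
```

Passing the two browser objects in as arguments keeps the capture logic separate from the page that hosts it, which also makes it possible to exercise the function outside a browser.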
Signalling sets parameters for peer computing devices to find and identify each other behind secured networks and establish connections to exchange data. This process “involves network discovery and NAT [Network Address Translation] traversal, session creation and management, communication security, media-capability metadata and coordination, and error handling” (Castrounis, 2015). Due to security protocols, most computers connected to large enterprise networks (such as school or business networks) sit behind a firewall, which regulates Internet traffic between the internal machine and external computers to help protect the internal network and computing devices from outside threats. To facilitate communication with the external network, a network address translation (NAT) device sets parameters allowing internal devices to be identified securely on the external network. Any device connected within a network is given a private IP address for identification purposes. NAT devices, however, help establish a separate public IP address for computing devices that is displayed outside of the network, allowing these devices to be identified to the outside world beyond the firewall (Castrounis, 2015). External devices can then use the public IP address to establish communication and send information to, or request it from, internal machines. These requests are again managed by the firewall and NAT device and, if allowed, delivered to the internal machine using the private IP address.
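The address translation described above can be illustrated with a toy model in JavaScript. This is a simplified sketch only: the class name ToyNat, the addresses, and the port numbers are illustrative assumptions, and a real NAT device also tracks protocols, timeouts, and firewall rules that are left out here.

```javascript
// Toy model of a NAT device: it rewrites private (address, port) pairs to
// ports on a single public address and remembers the mapping for replies.
// All addresses and port numbers below are illustrative assumptions.
class ToyNat {
  constructor(publicAddress, firstPort = 40000) {
    this.publicAddress = publicAddress;
    this.nextPort = firstPort;
    this.outbound = new Map(); // "privateAddress:port" -> public port
    this.inbound = new Map();  // public port -> { address, port }
  }

  // Outgoing traffic: allocate (or reuse) a public port for the internal host.
  translateOutbound(address, port) {
    const key = `${address}:${port}`;
    if (!this.outbound.has(key)) {
      this.outbound.set(key, this.nextPort);
      this.inbound.set(this.nextPort, { address, port });
      this.nextPort += 1;
    }
    return { address: this.publicAddress, port: this.outbound.get(key) };
  }

  // Incoming traffic: deliver only if an internal host opened the mapping.
  translateInbound(publicPort) {
    return this.inbound.get(publicPort) || null;
  }
}

const nat = new ToyNat('203.0.113.7');
const seenOutside = nat.translateOutbound('192.168.1.20', 5000);
console.log(seenOutside);                            // public address and mapped port
console.log(nat.translateInbound(seenOutside.port)); // back to the private host
console.log(nat.translateInbound(41000));            // null: no mapping exists
```

The last call shows why two peers behind separate NAT devices cannot simply address each other directly: until an internal host has opened a mapping, there is nothing for incoming traffic to be delivered to.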
For WebRTC communication to occur, devices rely on STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) servers. A STUN server lets a device discover the public address under which it appears to external devices, and a TURN server relays traffic when a direct connection between peers cannot be established. After identification, the signalling process occurs, creating the method to “negotiate and establish the network session connection with [the] peer” (Castrounis, 2015). As WebRTC does not itself supply a signalling mechanism or the servers needed for network traversal, the PubNub API was used, as it unified all the components of WebRTC and signalling into one package. PubNub offered tutorials on establishing a one-to-many WebRTC video stream with options for embedding the video into a live webpage (Gleason, 2015). The code was modified to develop a video feed for this research and is found in Appendix O and P.
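Independently of the PubNub package used in this research, the role of STUN and TURN servers can be seen in the configuration object that a browser's RTCPeerConnection constructor accepts. The sketch below is illustrative only: the helper name buildIceConfig, the host names, and the credentials are placeholders, and port 3478 is assumed because it is the default port for both protocols.

```javascript
// Builds the configuration object accepted by the RTCPeerConnection constructor.
// Host names and credentials are placeholders, not servers used in this research.
function buildIceConfig(stunHost, turnHost, turnUser, turnPassword) {
  return {
    iceServers: [
      // STUN: lets the browser discover its public address.
      { urls: `stun:${stunHost}:3478` },
      // TURN: relays media when a direct path cannot be established.
      { urls: `turn:${turnHost}:3478`, username: turnUser, credential: turnPassword },
    ],
  };
}

const config = buildIceConfig('stun.example.org', 'turn.example.org', 'user', 'secret');
console.log(JSON.stringify(config, null, 2));
// In a browser: const peer = new RTCPeerConnection(config);
```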
Figure 4.15 PubNub-enabled WebRTC live stream test (https://youtu.be/FB9caB-OtAk).
WebRTC proved to be a suitable option for real-time video feedback, showing negligible latency of less than a second when controlling a remote audio processor. Additionally, since the code was written in HTML and JavaScript, the video feed could be embedded into the web interface alongside the WebSocket controls. A separate WebRTC video broadcasting page was used to capture the video of a desired audio processor, and this video was broadcast to the web interface for viewing by the client.
4.4.2 Follow-up Evaluations to Real-time Audio Streaming
WebRTC offered an effective way to share media data over a webpage and had the potential to consolidate control, video, and audio streaming into a unified interface, a desirable trait not currently available with JackTrip, as it is a self-contained, external application. With this in mind, attention turned to evaluating and comparing JackTrip and WebRTC to determine which would be the better solution for network-based music transfer.
A full assessment of JackTrip and WebRTC used for real-time audio streaming is detailed in Chapter 5. In summary, the comparison of both systems concluded that WebRTC provided processes ideal for voice data and video conferencing applications, whereas JackTrip was optimal in supporting a wider spectrum of sound frequencies better suited to musical instruments and audio files. Although WebRTC’s native development in HTML5 and JavaScript was convenient for developing a centralised audio, video, and control user experience, JackTrip provided greater reliability in audio streaming and better accuracy in preserving musical sounds across the Internet.
4.4.3 Updating the Control Interface
A recurring issue within the design and build stage was the responsiveness of the control system; data was effectively transmitted from the web interface to remote audio devices, but the response of the virtual control system lacked precision. An original focus of the web interface was to model the controls after recognisable elements on a standard audio device, such as a rotary knob or dial. HTML5 accurately handles input data from many virtual objects, such as graphical buttons or sliders, but there was no simple solution for developing a knob, and this task ultimately required using external resources and adaptable, open-source JavaScript code.
With regard to solutions, Baskar (2017) states, “typically, an IoT solution needs to handle multiple data types from multiple devices on a user interface (UI) that flows seamlessly across interfaces… With such diversity at many levels, UX design becomes incredibly complex for IoT solutions.” One of the main issues with the web interface was that the virtual dial implemented in the earlier design produced linear data to manipulate logarithmic-taper potentiometers with undetermined log ratios. Additionally, the dial needed to be controlled from screens of different resolutions, resulting in issues with the tactile experience. While the dial could often be rotated as desired, the response did not feel natural (e.g. when the knob was rotated to 50% of its maximum, 50% of the processing effect should have occurred, but this was not the case with the logarithmic potentiometer). In addition, changes to the screen resolution, including zooming in or out of the web interface page, prevented the dial from responding accurately to user input. Baskar (2017) argues that choosing complex approaches to simple IoT solutions is often a problem, stating “once these multiple data types from multiple devices are together, the end user needs to access a simple yet informative visualization on any interface they want”. The initial tests, in which serial keyboard inputs controlled a motorised potentiometer, rotated it reliably, so the rotary dial was scaled back to virtual push buttons that determined the direction of rotation for the motors. The push buttons provided better feedback than the rotary dial and more accurate movement of the motor.
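The mismatch between a linear virtual dial and a logarithmic-taper potentiometer can be illustrated with a small JavaScript sketch. It is not part of the implemented system. It assumes a common approximation of a log taper, y = (b^x - 1) / (b - 1), where x is the rotation fraction, y is the output fraction, and b is chosen so that the output at half rotation equals an assumed midpoint of 10%. The function names and the 10% midpoint are hypothetical, since the log ratios of the potentiometers used here were undetermined.

```javascript
// Assumed taper model: output fraction y for rotation fraction x (both 0..1)
//   y = (b^x - 1) / (b - 1), with b chosen so that y(0.5) = midpoint.
// The midpoint must lie strictly between 0 and 0.5; the 10% default is an
// assumption, as the real potentiometers' log ratios were undetermined.
function taperBase(midpoint) {
  return Math.pow(1 / midpoint - 1, 2);
}

// Output fraction produced by a log-taper potentiometer at a given rotation.
function taperOutput(rotation, midpoint = 0.1) {
  const b = taperBase(midpoint);
  return (Math.pow(b, rotation) - 1) / (b - 1);
}

// Rotation the motor must reach so that the output equals a desired fraction.
function rotationForOutput(output, midpoint = 0.1) {
  const b = taperBase(midpoint);
  return Math.log(1 + output * (b - 1)) / Math.log(b);
}

console.log(taperOutput(0.5));       // half rotation yields roughly 10% output
console.log(rotationForOutput(0.5)); // 50% output needs roughly 85% rotation
```

Under these assumptions, a dial set to 50% would have to drive the motor to about 85% of its travel to produce 50% of the effect, which is consistent with the unnatural response observed when linear dial data was applied directly.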