SRIDesk: A Streaming based Remote Interactivity Architecture
for Desktop Virtualization System
Jiewei Wu, Jiajun Wang, Zhengwei Qi, Haibing Guan Shanghai Key Laboratory of Scalable Computing and Systems School of Software, Shanghai Jiao Tong University, Shanghai, China
{zachary21,aeris.j,qizhwei,hbguan}@sjtu.edu.cn
Abstract: In recent years, desktop virtualization has emerged as a new extension of the virtualization framework. Existing desktop virtualization systems suffer performance degradation in terms of response time and video quality. Previous remote access approaches address these problems, but they are designed for standalone architectures and require semantic information that is not transparent to the OS, so they are not feasible in desktop virtualization systems. In this paper, we propose SRIDesk, a Streaming based Remote Interactivity architecture for Desktop virtualization system. SRIDesk resides in the host and intercepts the virtual display device, which makes it transparent to the guest OS and its applications. SRIDesk integrates a server-push streaming mechanism with an H.264 encoder into the virtualization system, which provides a high quality display with low bandwidth consumption and low interaction latency.
We have implemented the SRIDesk prototype in a KVM system. Experimental results show that SRIDesk has a low CPU load, low bandwidth consumption and good scalability. We compared SRIDesk with other popular platforms, including X, VNC, RDP and THiNC. SRIDesk required the least bandwidth, no more than 2Mbps, while delivering 94% video quality. SRIDesk also achieved the lowest latency in the WAN environment among all systems.
Keywords-Desktop Virtualization; Remote Display; Virtualization System
I. INTRODUCTION
Nowadays, IT organizations are struggling with the widening gap between available resources and the demand for Information Technology resources. Personal desktop computers are ubiquitous in large enterprises, educational institutions and government organizations, while the cost of maintenance and upgrades turns out to be enormous and unmanageable. Under these circumstances, virtualization technology has become a feasible solution for resource consolidation and has produced demonstrated cost savings.
By applying a similar framework, desktop virtualization is rising as an alternative to classical desktop delivery [1]. In a desktop virtualization environment, all applications and operating system code are executed on a server in a remote data center. The end user only needs a thin client that handles display, keyboard and mouse, combined with adequate processing power for graphical rendering and network communication. The client no longer has to keep user state and communicates with the server through a remote protocol. The protocol allows graphical displays to be virtualized and transmits user input from the client to the server. Desktop virtualization offers a cost-efficient paradigm shift that eases management complexity, because operating systems, applications and data are kept in a data center. Moreover, since the thin client is stateless, it is easy to troubleshoot and replace.
However, desktop virtualization still faces technical challenges before it can be widely accepted. A key challenge is to provide high fidelity display and a good interactive experience for end users, especially for multimedia applications, which are commonly used in desktop computing. Current remote display protocols such as the Remote Framebuffer protocol (RFB) [2] and RDP [3] are widely used in desktop virtualization systems [4]. They are mainly designed for low-motion graphical applications, such as text editors, in which display changes are minor and infrequent. Those protocols cannot effectively support high-motion scenarios such as video playback and real-time interaction, because the transport of multimedia data over them is inefficient and requires high bandwidth to deliver all frames to the client in time [5]. Previous works [6], [7] try to support high fidelity displays or improve user experience in thin-client computing architectures. These architectures depend on the underlying hardware or have to implement a specific hardware driver. However, virtualization technologies abstract both the operating system and application execution from the underlying hardware, which means these works cannot be applied to desktop virtualization directly.
We propose SRIDesk, a Streaming based Remote Interactivity architecture for Desktop virtualization system. By combining a server-push streaming architecture with virtualization technology, SRIDesk provides high quality display and a good interactive experience without modifying the Guest Operating System or requiring specific hardware. Its server can also scale the screen size to suit small-screen devices. The SRIDesk prototype is implemented in KVM [8], a virtualization system. Experimental results show that the system reaches 94% video quality at a 1024x768 display resolution for video playback in LAN and WAN environments, while classic remote systems achieve no more than 20%. In a highly interactive scenario, SRIDesk achieved the shortest response time in the WAN environment compared with other systems.
Figure 1: Overall architecture of SRIDesk (components shown: client with interactive input, visual interface, event manager and video stream; host server with virtual machines running Guest OSes, virtual display device, virtual loopback, event manager and video stream; connections: RTP video stream and TCP control connection)
The remainder of this paper is structured as follows. Section II explores the related work on remote display techniques. Section III elaborates on our architecture. Section IV describes the implementation. Section V presents experimental results that measure our performance and compare it against other popular desktop virtualization systems. A brief conclusion is given in Section VI.
II. RELATED WORK
Desktop virtualization is the combination of a virtualization system and thin-client computing [1]. Virtualization allows multiple isolated user instances or desktops to run on one physical server. Currently, desktop virtualization is built on a virtualization system such as Xen [9] or KVM [8]. KVM has recently become a mainstream Linux virtualization solution. It consists of a hypervisor and a modified QEMU emulator.
In thin-client computing, many alternative designs have been proposed. The X [10] system simply forwards application-level display commands to the client for remote display functionality. It is cost-efficient on the server side, but leads to complexity on the client. Software on the client needs to be updated frequently, and the maintenance cost is inconsistent with the zero-maintenance goal of desktop virtualization. In industry, Microsoft Remote Desktop [3] and Citrix XenDesktop [11] are two well-known products used in desktop virtualization. Although they have improved in both performance and functionality, they share the same maintenance cost shortcoming because their architectures are similar to that of the X system. VNC [2] uses a virtual driver to maintain a local copy of the framebuffer state, which is used to refresh its display, and forwards user input directly to the server.
Many approaches leverage semantic information of the desktop to improve the performance of remote interaction. THiNC [6] intercepts low-level video driver commands and adopts a push mode to interact with the client. Although it supports native multimedia playback, it suffers from performance degradation caused by multimedia content encoding. Muse [12] uses a window-aware updating mechanism to reduce data traffic and response latency. Since virtual desktops can run various Guest OSes and the semantic information is lost in the Virtual Machine Monitor, those approaches are not appropriate for desktop virtualization.
De Winter et al. [7] developed a thin-client system in which the graphical output is captured through a hardware framegrabber. The dedicated framegrabber device can only support one user at a time, so the architecture is neither scalable nor flexible enough to support multiple users.
Classic remote access systems suffer performance degradation due to inefficient mechanisms, while the aforementioned remote access approaches improve performance and interaction experience by implementing standalone architectures that impose a high CPU load and are not transparent to the OS. They are therefore not feasible in desktop virtualization systems.
III. DESIGN OF ARCHITECTURE
In order to provide high fidelity display and good user interaction in a desktop virtualization system, we propose SRIDesk, which combines virtualization technology with a video streaming architecture. Figure 1 shows an overview of this architecture. Thin clients are connected to the desktop virtualization server through an Ethernet or wireless network. Users’ applications are executed in a Guest OS virtualized by the server.
Transparency: On the server side, the display of a Guest OS is generated by the Guest OS’s display driver and then rendered by a virtual display device. To be transparent to the Guest OS, SRIDesk modifies the virtual display device that sits below the Guest OS proper, so SRIDesk requires no modification of the drawing functionality in the Guest OS. The result is a simpler system that works seamlessly with existing virtualization systems. SRIDesk intercepts the display rendered by the virtual display device and redirects it to a virtual loopback instead of sending it to the framebuffer. The virtual loopback is a module that creates virtual video devices. With the virtual loopback, a process can read these devices as if they were ordinary video devices, which makes a multimedia architecture easier to integrate into the virtualization system.
Push Mode Streaming: The multimedia architecture is organized as a videostream pipeline. The videostream on the server is mainly responsible for encoding the original display. Encoders such as H.264 and WebM achieve high compression, so network traffic can be reduced to suit low-bandwidth network environments. To improve the response time of interactions in desktop virtualization systems, SRIDesk uses a low-latency server-push display update mode to minimize synchronization costs between client and server. The traditional client-pull mode requires the client to send an update request to the server for each update, which leads to higher latency when video frames are generated faster than the rate at which the client can send requests to the server.
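The latency argument above can be made concrete with a back-of-the-envelope model. The sketch below is not part of SRIDesk. It assumes a simplified client-pull loop in which only one update request is outstanding at a time and each update costs one full round trip, and it ignores encoding time and bandwidth limits. The function names are hypothetical, and the 66ms RTT and 24fps values are borrowed from the evaluation setup in Section V.

```python
def max_pull_fps(rtt_s):
    """Upper bound on updates per second when every update needs one request round trip.

    Assumption: a single outstanding request, no processing or transmission delay.
    """
    return 1.0 / rtt_s


def delivered_fps(source_fps, rtt_s, push):
    """Frame rate that reaches the client under the simplified model.

    In push mode the server sends frames as they are produced, so the
    round trip does not limit the rate. In pull mode the rate is capped
    by how fast requests can be issued.
    """
    if push:
        return source_fps
    return min(source_fps, max_pull_fps(rtt_s))


if __name__ == "__main__":
    wan_rtt = 0.066   # seconds, WAN RTT used in the evaluation
    video_fps = 24    # frame rate of the test clip
    print("pull:", delivered_fps(video_fps, wan_rtt, push=False))
    print("push:", delivered_fps(video_fps, wan_rtt, push=True))
```

Under these assumptions, a pull-mode client on a 66ms RTT link cannot receive more than about 15 updates per second, which is below the 24fps source rate, whereas push mode is not bounded by the round trip.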
Diversified Displays: Providing ubiquitous computing access is important in a cloud environment. Clients may have different screen sizes and computing capabilities. To deliver on this promise, SRIDesk decouples the original framebuffer size from the display size that the client supports. Display resizing is handled entirely by the server, which resizes the display automatically in the videostream pipeline whenever a client reports a specific size to it.
IV. IMPLEMENTATION
We have implemented a prototype server based on QEMU-KVM (version 1.2.0) and a client by extending a simple VNC Viewer. Since SRIDesk only hooks the virtual display device under the Guest OS, no modifications are required to applications and operating systems. The rendered images are written into a virtual loopback through the standard Video4Linux interface. To support the virtual loopback, we use v4l2loopback [13] as a kernel module to create virtual video devices. The SRIDesk virtual loopback utilizes v4l2loopback’s interfaces and multiplexes its resources.
Figure 2 shows the pipeline details of the videostream. The server uses a display scaler to resize the image output of the virtual loopback to the size requested by the client. The pipeline then proceeds to encoding. The codec we chose is x264 [14], an open-source H.264 encoder. Because encoding takes place in a live streaming environment, we have to minimize the encoder delay. To reduce the latency of x264, the zerolatency tune is enabled, which disables rc-lookahead and B-frames. These two features are mainly used in offline encoding and add considerable latency. After the encoder encodes a frame, the pipeline encapsulates it into RTP packets, which are then sent to the client over UDP.
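The paper does not name the multimedia framework behind this pipeline. As an illustration only, the sketch below assumes GStreamer and assembles a gst-launch-1.0 command line with the same stages: capture from the loopback device, scale, encode with x264 using the zerolatency tune, packetize as RTP, and send over UDP. The device path, target size, host and port are hypothetical placeholders, and the function name is introduced here for illustration.

```python
import shlex


def build_server_pipeline(device, width, height, client_host, client_port):
    """Return a gst-launch-1.0 argument list for a scale/encode/RTP/UDP pipeline.

    Assumption: GStreamer 1.x element names. SRIDesk's actual pipeline
    construction is not described at this level in the paper.
    """
    description = (
        f"v4l2src device={device} ! videoconvert ! videoscale ! "
        f"video/x-raw,width={width},height={height} ! "
        "x264enc tune=zerolatency ! rtph264pay ! "
        f"udpsink host={client_host} port={client_port}"
    )
    return ["gst-launch-1.0"] + shlex.split(description)


if __name__ == "__main__":
    # Hypothetical values: loopback device, client-requested size, client address.
    args = build_server_pipeline("/dev/video0", 640, 480, "192.0.2.10", 5004)
    print(" ".join(args))
```

The resulting list could be passed to subprocess.run on a machine with GStreamer and v4l2loopback installed. This has not been verified against the SRIDesk prototype.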
The client depacketizes the received RTP packets and feeds them to the decoder. Since video decoding can be an extremely CPU-intensive task, especially at higher resolutions, we use hardware-accelerated video decoding so that the CPU can concentrate on other tasks.
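For reference, the fixed RTP header that the packetize and depacketize stages work with is 12 bytes long (RFC 3550). The following sketch packs and parses that header with Python's struct module. It is a simplified illustration rather than SRIDesk's code: it writes no CSRC entries, ignores header extensions and padding, does not fragment large H.264 frames across packets, and uses payload type 96, a dynamic value chosen here as an assumption.

```python
import struct

RTP_VERSION = 2
# Byte 0: V/P/X/CC, byte 1: M/PT, then sequence number, timestamp, SSRC.
_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prepend a fixed 12-byte RTP header to an encoded payload."""
    byte0 = RTP_VERSION << 6  # no padding, no extension, CSRC count 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = _HEADER.pack(byte0, byte1, seq & 0xFFFF,
                          timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet):
    """Split an RTP packet into header fields and payload.

    Header extensions and padding are not handled in this sketch.
    """
    byte0, byte1, seq, timestamp, ssrc = _HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = byte0 & 0x0F
    offset = _HEADER.size + 4 * csrc_count  # skip any CSRC identifiers
    return {
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }
```

The marker bit is conventionally set on the last packet of a video frame, which lets a depacketizer know when a complete frame can be handed to the decoder.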
Figure 2: Pipeline of videostream (stages shown: display scale, encode, RTP packetize, network, RTP depacketize, decode, framebuffer)
Control communication between the server and the client runs over TCP connections. The event manager on the client handles interaction inputs such as keystrokes and mouse clicks and sends them to the server. The event manager on the server then handles each request, issuing commands to the videostream and forwarding interactions to the Guest OS.
The remote display protocol is a modified version of RFB. RFB uses a client-pull update mode: the display screen is updated and sent each time the client sends a FramebufferUpdateRequest. Since SRIDesk adopts a server-push mode, the client no longer sends FramebufferUpdateRequest messages to the server. To support diversified displays on the server, an extension message, SetScreenSize, is added to the protocol. Whenever the client needs to change the display size, it sends SetScreenSize to the server.
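The paper does not give the wire format of SetScreenSize. The sketch below shows what such client-to-server messages could look like in the RFB style. PointerEvent follows the standard RFB layout (message type 5, button mask, x, y), on the assumption that SRIDesk keeps RFB's input messages unchanged. The SetScreenSize message type value and field layout are hypothetical placeholders, not taken from the paper.

```python
import struct

POINTER_EVENT = 5       # standard RFB client-to-server message type
SET_SCREEN_SIZE = 200   # hypothetical type value, not taken from the paper


def pointer_event(button_mask, x, y):
    """Encode an RFB PointerEvent: type (u8), button mask (u8), x (u16), y (u16)."""
    return struct.pack("!BBHH", POINTER_EVENT, button_mask, x, y)


def set_screen_size(width, height):
    """Encode a SetScreenSize request.

    Hypothetical layout: type (u8), one padding byte, width (u16), height (u16),
    all in network byte order as in other RFB messages.
    """
    return struct.pack("!BxHH", SET_SCREEN_SIZE, width, height)


def parse_set_screen_size(message):
    """Decode the hypothetical SetScreenSize layout and return (width, height)."""
    msg_type, width, height = struct.unpack("!BxHH", message)
    if msg_type != SET_SCREEN_SIZE:
        raise ValueError("unexpected message type")
    return width, height
```

On receiving such a message, the server-side event manager would pass the new size to the display scaler in the videostream pipeline.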
V. EVALUATION
To demonstrate the effectiveness of SRIDesk, we first tested different scenarios on it and evaluated the overall performance. Then we conducted a direct comparison with a number of widely used platforms: X, VNC, Microsoft Remote Desktop and THiNC.
Figure 3: Experimental Testbed (client laptop, Raspberry Pi, SRIDesk server, web server)
A. Experimental Setup
We conducted the experiments on an isolated network. Figure 3 shows our testbed. The server machine had a 2.66GHz Intel Core i7-920 processor and 6Gbyte of RAM. The clients were a 2.0GHz Intel Core II laptop with 1Gbyte of RAM and a Raspberry Pi, a credit-card-sized single-board computer with a 700MHz ARM11 processor and 512Mbyte of memory. We considered two network environments: LAN and WAN. The LAN is a 100Mbps network with ideal latency. The WAN is a 10Mbps network with a 66ms RTT emulated by ns-2.
Figure 4: Resource requirements in server and client in four scenarios and two operating systems. (a) CPU utilization (%) per scenario for the server, the PC client and the Raspberry Pi client. (b) Bandwidth (Mbps) per scenario.
We used four distinct scenarios at a 1024x768 display resolution to represent low-motion, high-motion, low-interaction and high-interaction workloads. Since SRIDesk is transparent to the Guest OS, we ran both Linux and Windows Guest OSes on the SRIDesk server in order to provide a fair comparison with other systems. The four scenarios are listed below:
• Office: We performed a sequence of actions in OpenOffice on the Linux platform and Microsoft Office on the Windows platform. Actions included typing, creating objects, editing tables, etc.
• Browsing Web: We browsed a sequence of 30 web pages containing a mix of text and images. The browser we chose was Mozilla Firefox, since both platforms support it. The web pages were saved on a local web server.
• Photo Editing: Photo editing used a sequence of filters such as blur, red-eye removal, sharpen, etc. The GNU Image Manipulation Program was used on the Linux platform and Adobe Photoshop on the Windows platform.
• Video Playback: We played an H.264-encoded movie clip at 24fps in fullscreen. The video player was MPlayer 1.0rc4 on both the Linux and the Windows platforms. The clip’s original size is 853x480.
All scenarios were recorded with Xnee on the client to ensure that clients performed the same inputs each time. Every scenario lasted at least 5 minutes and was tested 3 times. As Guest OSes, we chose Ubuntu 12.04 for the Linux platform and Windows 7 for the Windows platform.
To minimize application environment differences, we used common configuration options whenever possible. Display was set to 32-bit color. RDP was set to LAN settings when we tested in LAN environment and WAN settings in WAN environment. Any remaining settings were set to defaults.
Figure 5: CPU utilization in server with 2 and 4 running scenarios (x-axis: time in seconds; y-axis: CPU utilization in %)
B. Overall Performance
We first ran one scenario at a time. Figure 4 shows the overall performance results of SRIDesk. Figure 4a shows the average CPU utilization on both the server side and the client side. On the server side, the Office scenario has the lowest CPU load, at 14.69% in Linux and 15.23% in Windows, because it consists largely of low-motion display updates. The Video Playback scenario, which represents high-motion display, has the highest CPU load among all scenarios, around 10% higher than the low-motion scenarios. Browsing Web and Photo Editing include some intensive actions, but the pace of human interaction keeps the CPU load lower. It is also critical to keep the CPU load as low as possible on the client device. Figure 4a also shows that the CPU load on the client is independent of the scenario, and the average CPU load of the client is about 14.5%. By leveraging hardware-accelerated video decoding, the Raspberry Pi’s CPU load stays below 50% on average, which means that a thin client with limited capabilities can be used in the SRIDesk architecture.
Figure 6: Latency in photo editing scenario (latency in ms per platform for LAN and WAN; platforms: baseline on Windows and Linux, X11, VNC, RDP, THiNC, SRIDesk on Windows and Linux)
Figure 4b shows the average bandwidth cost for each scenario and demonstrates that SRIDesk requires no more than 2Mbps for control and image transmission. Even though the high-motion scenario needs more screen updates, the Video Playback scenario requires only 1.56Mbps. Since Video Playback is the most bandwidth-consuming scenario, a comparison of its bandwidth with other systems is given in Section V-D.
To test the scalability of the SRIDesk system, we ran 2 and 4 scenarios together on one server. We selected Linux for the Browsing Web and Video Playback scenarios, and Windows for the other two. Photo Editing and Video Playback were used in the 2-scenario run, and all four scenarios in the 4-scenario run. Figure 5 shows the CPU utilization of the server over 5 minutes. The CPU load of the server is 46.38% and 75.79% for 2 and 4 scenarios respectively, so running 4 scenarios costs roughly 1.6 times the CPU load of running 2 (75.79/46.38), which is less than double and shows that SRIDesk scales well. Figure 5 demonstrates that the SRIDesk server can serve at least four clients together in various scenarios without performance degradation in our testbed, since the server’s CPU load does not reach 100% at any time.
C. Interaction Response Analysis
Good response time is the key to overall satisfaction in user experience, especially in highly interactive scenarios, so we mainly focused on the Photo Editing scenario. We recorded the time from a mouse click or keystroke to the complete screen update of the corresponding reaction. A full response time consists of the processing time of the server and client, the transmission time of the remote display protocol and the application execution time. Because some actions in photo editing have a relatively long application execution time, we also ran the Photo Editing scenario on a local PC as a baseline in order to exclude the impact of application execution time. Figure 6 shows the average latency of a sequence of actions in the Photo Editing scenario in the LAN and WAN environments. VNC has the slowest response time in LAN at 635ms, and X11 is the worst in WAN at 1935ms. They are much slower than the others due to the client-pull mode or bandwidth limitations. Figure 6 shows that SRIDesk has a latency of 300ms in the LAN environment and does not suffer much performance degradation in the WAN environment. Although RDP achieves nearly native performance in the LAN environment, it degrades greatly in the WAN environment and is 1.8 times slower than SRIDesk. SRIDesk provides the fastest response time in the WAN environment among all platforms. THiNC achieves a result similar to SRIDesk, because it also adopts a server-push mode.
Figure 7: Video quality in video playback scenario using formula 1 (quality in % per platform for LAN and WAN; platforms: X11, VNC, RDP, THiNC, SRIDesk on Windows and Linux)
Figure 8: Total data transferred during video playback (transferred data size in MB, logarithmic scale, per platform for LAN and WAN)
D. Video Quality Analysis
Multimedia performance is measured using a benchmark based on video quality [15], which takes both playback delays and frame drops into consideration. The video quality is calculated according to formula 1. A video quality of 100% is optimal and means that all video frames are played at real-time speed.
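$$
VQ(P) = \frac{\dfrac{\mathrm{Data}(P)/\mathrm{PlaybackTime}(P)}{\mathrm{IdealFPS}(P)}}{\dfrac{\mathrm{Data}(\mathrm{slowmo})/\mathrm{PlaybackTime}(\mathrm{slowmo})}{\mathrm{IdealFPS}(\mathrm{slowmo})}} \qquad (1)
$$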
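Formula 1 can be written as a small function. The sketch below is an illustration with hypothetical argument names and made-up example numbers, not the measurement code used in the paper. It normalizes the data rate per ideal frame at playback speed P by the same quantity measured in a slow-motion run, in which every frame is assumed to be delivered.

```python
def video_quality(data, playback_time, ideal_fps,
                  slow_data, slow_playback_time, slow_ideal_fps):
    """Video quality of formula 1 as a fraction (1.0 corresponds to 100%).

    data / playback_time / ideal_fps is the amount of data delivered per
    ideal frame at the tested speed. The slow-motion run supplies the
    reference value for the same quantity.
    """
    achieved = data / playback_time / ideal_fps
    reference = slow_data / slow_playback_time / slow_ideal_fps
    return achieved / reference


if __name__ == "__main__":
    # Made-up example: a 10 s clip at 24 fps, replayed at 1 fps for the
    # slow-motion reference (240 s), transferring 100 MB when complete.
    print(video_quality(100, 10, 24, 100, 240, 1))  # all data, on time
    print(video_quality(50, 10, 24, 100, 240, 1))   # half of the data dropped
    print(video_quality(100, 20, 24, 100, 240, 1))  # playback takes twice as long
```

Both dropping half of the data and taking twice as long to play halve the measure, which is how the metric accounts for frame drops and playback delays at the same time.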
Figure 7 shows the video quality results in the LAN and WAN environments. X11, VNC and RDP deliver very poor video quality. RDP has the worst quality in LAN at only 14.83%, and X11 has the worst quality in WAN at no more than 2%. These systems are limited by their mechanisms and algorithms, which are unable to keep up with the speed of screen updates, leading to frame dropping or longer playback time. VNC loses a third of its video quality in the WAN environment, because it uses a client-pull mode in which the client sends screen update requests to the server. In a higher-latency WAN environment, the rate of update requests is slower than the video playback rate, so some frames are dropped. In contrast, THiNC and SRIDesk achieve almost ideal video quality in the LAN environment. THiNC reaches 97.62% in LAN but drops to 71.42% in WAN. SRIDesk reaches 95% in LAN and 92.94% in WAN, the highest WAN result, which benefits from its server-push mode.
Figure 8 shows the total data transferred during the video playback for each system. SRIDesk is transparent to the Guest OS, so the video performance and data size are almost the same in the two operating systems. Among all systems, SRIDesk is the most bandwidth-efficient for video playback. It sends 6.09MB and 6.08MB of data in the LAN and WAN environments, with 94% video quality on average. Although THiNC’s video quality is slightly better than SRIDesk’s in the LAN environment, it needs more bandwidth. Since THiNC intercepts drawing commands in the driver of the Guest OS, it has to carry more semantic information about the display, which increases the size of the data. Due to less efficient algorithms, the other systems need to send much more data to the client.
VI. CONCLUSION
We presented SRIDesk, a Streaming based Remote Interactivity architecture for Desktop virtualization system. By intercepting the virtual display device under the Guest OS, SRIDesk works seamlessly with unmodified applications, desktops and operating systems. SRIDesk leverages a videostream architecture and an efficient video codec to provide high quality display updates with low latency and bandwidth consumption. With server-side display scaling, SRIDesk can also support small-screen devices.
We have measured SRIDesk’s overall performance on various scenarios in different network environments and compared it with widely used remote desktop systems. Experimental results show that SRIDesk can deliver a good user experience with a low CPU load on both client and server. Moreover, SRIDesk shows good scalability: the server can serve at least 4 clients without performance degradation. SRIDesk also provides good response time in both LAN and WAN environments and outperforms other systems in the WAN environment. SRIDesk matches the top video quality of the THiNC system while using less bandwidth.
VII. ACKNOWLEDGEMENTS
This work is supported by the Program for PCSIRT and NCET of MOE, NSFC (No. 61073151, 61272101), 863 Program (No. 2011AA01A202, 2012AA010905), 973 Program (No. 2012CB723401), the key program (No. 313035) of MOE, and International Cooperation Program (No. 11530700500, 2011DFA10850).
REFERENCES
[1] G. Lai, H. Song, and X. Lin, “A service based lightweight desktop virtualization system,” in Service Sciences (ICSS), 2010 International Conference on. IEEE, 2010, pp. 277–282.
[2] T. Richardson, Q. Stafford-Fraser, K. Wood, and A. Hopper, “Virtual network computing,” Internet Computing, IEEE, vol. 2, no. 1, pp. 33–38, 1998.
[3] “Windows Remote Desktop Protocol (RDP),” http://msdn2.microsoft.com/en-us/library/aa383015.aspx.
[4] C. Border, “The development and deployment of a multi-user, remote access virtualization system for networking, security, and system administration classes,” 2007.
[5] L. Deboosere, J. De Wachter, P. Simoens, F. De Turck, B. Dhoedt, and P. Demeester, “Thin client computing solutions in low-and high-motion scenarios,” in Networking and Services, 2007. ICNS. IEEE, 2007, pp. 38–38.
[6] R. A. Baratto, L. N. Kim, and J. Nieh, “Thinc: a virtual display architecture for thin-client computing,” in Proceedings of the 20th ACM Symposium on Operating Systems Principles 2005(SOSP), 2005, pp. 277–290.
[7] D. De Winter, P. Simoens, L. Deboosere, F. De Turck, J. Moreau, B. Dhoedt, and P. Demeester, “A hybrid thin-client protocol for multimedia streaming and interactive gaming applications,” in Proceedings of the 2006 international workshop on Network and operating systems support for digital audio and video. ACM, 2006, p. 15.
[8] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, “kvm: the linux virtual machine monitor,” in Proceedings of the Linux Symposium, vol. 1, 2007, pp. 225–230.
[9] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” in ACM SIGOPS Operating Systems Review, vol. 37, no. 5. ACM, 2003, pp. 164–177.
[10] R. Scheifler and J. Gettys, “The x window system,” ACM Transactions on Graphics (TOG), vol. 5, no. 2, pp. 79–109, 1986.
[11] “Citrix XenDesktop Desktop Virtualization, Virtual Desktops,” http://www.citrix.com/xendesktop.
[12] W. Yu, J. Li, C. Hu, and L. Zhong, “Muse: a multimedia streaming enabled remote interactivity system for mobile devices,” in Proceedings of the 10th International Conference on Mobile and Ubiquitous Multimedia. ACM, 2011, pp. 216–225.
[13] “v4l2loopback - video for linux 2(v4l2) loopback device,” http://code.google.com/p/v4l2loopback.
[14] L. Aimar, L. Merritt, E. Petit, M. Chen, J. Clay, M. Rullgård, C. Heine, and A. Izvorski, “x264-a free h264/avc encoder,” Online, 2005.
[15] J. Nieh, S. Yang, and N. Novik, “Measuring thin-client performance using slow-motion benchmarking,” ACM Transactions on Computer Systems (TOCS), vol. 21, no. 1, pp. 87–115,