Content Domain Artefacts - Developing a high definition video quality of experience model based

    v. Synchronization (Lip Sync)

(b) Display
    i. Resolution
    ii. Refresh Rate
    iii. Number of bits
    iv. Display Technology
    v. Form Factor

(c) Environment
    i. Ambient Light
    ii. Ambient Noise
    iii. Vibration
    iv. Wind

(d) Navigation and user behaviour
    i. Structure of Navigation
    ii. Multimodal inputs

Looking at these factors, we realized that the end-to-end video communication chain can be divided into three groups, or domains. Figure 2.2 shows these domains.

2.4 Content Domain Artefacts

The content domain consists of acquisition and coding. Acquisition artefacts can be avoided if skilled professionals use high quality equipment. Coding artefacts are usually introduced as a compromise forced by limited bandwidth and storage. The following discussion highlights the important factors.

2.4.1 Acquisition

There is an upper bound on the quality of the content, determined by the source of the original content. However, viewers may accept poorer quality given the circumstances of the originating environment. In news coverage, for example, viewers would accept poor quality video if sophisticated equipment cannot be used to obtain the footage. Content of high commercial value, such as sports coverage, needs high performance optical systems and low noise image sensors for acquisition. Charge Coupled Devices (CCDs), which are frequently used in digital cameras, introduce several kinds of noise, such as electronic fixed-pattern noise and “dark” noise. To compensate, a digital camera performs a dark reading before acquisition and later subtracts it from the exposure signal.
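The dark-reading correction described above can be sketched in a few lines. This is only an illustration under simple assumptions: frames are small lists of integer pixel values, the function name `subtract_dark_frame` and the sample values are hypothetical, and real cameras perform this step in hardware or firmware with more elaborate calibration.

```python
def subtract_dark_frame(exposure, dark_frame):
    """Subtract a dark reading from an exposure, pixel by pixel.

    Both arguments are lists of rows of integer pixel values with the
    same shape. Results are clamped at zero so that noise in the dark
    reading cannot produce negative pixel values.
    """
    if len(exposure) != len(dark_frame):
        raise ValueError("frames must have the same number of rows")
    corrected = []
    for exp_row, dark_row in zip(exposure, dark_frame):
        if len(exp_row) != len(dark_row):
            raise ValueError("rows must have the same length")
        corrected.append([max(e - d, 0) for e, d in zip(exp_row, dark_row)])
    return corrected


# Hypothetical 2x2 frames: the bottom-right pixel is a "hot" pixel that
# reads high even with the shutter closed.
exposure = [[120, 130], [125, 260]]
dark_frame = [[4, 6], [5, 140]]
corrected = subtract_dark_frame(exposure, dark_frame)
print(corrected)
```

After the subtraction, the hot pixel falls back in line with its neighbours, which is the effect the dark reading is meant to achieve.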

Noise is also affected by the fill factor of the pixels, that is, the percentage of each pixel's area that is actually light sensitive. Smaller elements have inherently more noise relative to the signal they collect. Noise is also introduced during the analogue to digital conversion of image data. Noise is most noticeable in low light, where the signal to noise ratio is lowest. Blooming, or light spill-over, is caused by photons spilling from one sensor element to another, creating what can be a whole region of over-fill and resulting in blown-out highlights and/or false colours in these areas. When a relatively small sensor array is used to create an image, pixelation becomes very apparent. Larger sensor arrays are more expensive but supply enough information to produce a more lifelike picture. “Christmas tree lights” is a variation of colour aliasing artefacts. On many sensors, the colour filter array has twice as many green pixels as red or blue ones, which is done to emulate human vision. When the image is enlarged, this results in an unreal mosaic of colours, especially along diagonal lines.
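The two-to-one ratio of green to red or blue filter elements can be made concrete with a small sketch. It assumes the common RGGB arrangement of a Bayer-type colour filter array; the text does not name a specific layout, so the tile and the function name `bayer_pattern` are illustrative assumptions.

```python
from collections import Counter


def bayer_pattern(rows, cols):
    """Return an RGGB colour filter layout as a list of strings.

    The 2x2 tile (R G / G B) is repeated across the sensor, so every
    tile contains two green elements, one red and one blue.
    """
    tile = ("RG", "GB")
    return [
        "".join(tile[r % 2][c % 2] for c in range(cols))
        for r in range(rows)
    ]


layout = bayer_pattern(4, 4)
for row in layout:
    print(row)

# Count how many sensor elements sit behind each filter colour.
counts = Counter("".join(layout))
print(counts)
```

Because each sensor element records only one colour, the two missing colours at every pixel have to be interpolated from neighbours, which is where the colour aliasing described above originates.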

There are many different ways to expand an image’s size, such as linear, bilinear and bicubic methods. Bicubic interpolation is widely regarded as the best of these. Camera manufacturers create their own task-specific interpolation systems, which can introduce undesired effects. Some loss of perceived sharpness occurs at the capture stage with almost all sensing devices, and digital techniques are used for compensation. 24-bit colour information is inadequate in some cases. Photographing a rose with vivid red colours would require more bits for proper representation, as the 24-bit palette has, in effect, just 256 levels of pure red (8 bits per channel) to represent the flower’s colours. That is why a capture system that uses a 30-bit or 36-bit representation is better: it offers more colour and tone choices. We conclude this section by noting that the selection of hardware is very important for end-to-end quality, and these factors must be considered when measuring quality.
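The arithmetic behind the bit-depth argument is short enough to check directly. The sketch below assumes the total bit depth is split evenly over three colour channels, so 24 bits gives 8 bits per channel with code values 0 to 255; the function name `levels_per_channel` is hypothetical.

```python
def levels_per_channel(total_bits, channels=3):
    """Number of distinct code values per colour channel.

    Assumes the total bit depth is divided evenly between the channels,
    which holds for the 24, 30 and 36 bit formats discussed here.
    """
    if total_bits % channels != 0:
        raise ValueError("bit depth must divide evenly between channels")
    bits_per_channel = total_bits // channels
    return 2 ** bits_per_channel


for total_bits in (24, 30, 36):
    print(total_bits, "bits:", levels_per_channel(total_bits), "levels per channel")
```

Moving from 24 to 30 bits therefore quadruples the number of available shades of pure red, and 36 bits multiplies it by sixteen.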

2.4.2 Coding

Coding in this context refers to both source coding and channel coding. To achieve bandwidth efficiency, higher levels of compression are needed. This leads to compression artefacts such as blockiness, blur, ringing, mosquito noise and jitter. Error correction coding comes at the cost of a reduced effective data rate, because higher levels of error detection and correction need more redundant bits. Advanced video coding techniques based on MPEG-4 Part 10 have built-in error resilience features. Recent advances in scalable video coding offer the capability to code only once to meet a wide range of viewer display requirements. A number of research papers have reported on the evaluation of compression artefacts such as blockiness, blur, ringing, mosquito noise [30] [31] and jitter [32] [33].
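The trade-off between redundancy and effective data rate can be illustrated with the code rate k/n of a block code, where k data bits are carried in every n transmitted bits. This is a minimal sketch; the function name, the 7 Mbit/s channel and the choice of example codes are assumptions made for illustration and are not taken from the cited work.

```python
def effective_data_rate(channel_rate_bps, data_bits, codeword_bits):
    """Payload rate left after channel coding.

    A block code that carries `data_bits` of payload in every
    `codeword_bits` transmitted bits has code rate k/n, and the
    payload rate is the channel rate scaled by that factor.
    """
    if not 0 < data_bits <= codeword_bits:
        raise ValueError("need 0 < data_bits <= codeword_bits")
    return channel_rate_bps * data_bits / codeword_bits


channel = 7_000_000  # hypothetical 7 Mbit/s channel

# Hamming(7,4): 3 parity bits per 4 data bits, corrects one bit error.
hamming_rate = effective_data_rate(channel, 4, 7)

# Threefold repetition code: more redundancy, lower payload rate.
repetition_rate = effective_data_rate(channel, 1, 3)

print(hamming_rate, repetition_rate)
```

The more redundant bits a code spends on protection, the smaller the share of the channel left for the compressed video itself.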


need for online measurement systems. This avoids the usual subjective techniques, which are costly in terms of time and money and are also a non-repeatable way of collecting scores. Janowski et al. [34] used mapping models constructed using the Generalized Linear Model (GLZ), a generalization of least squares regression that can handle ordinal data. They were able to compute overall qualitative image distortions based on partial quantitative distortions from component algorithms operating on specified image features.

According to Wolf et al. [35], current objective video quality measures achieve good prediction of subjective ratings. Cerqueira et al. [36] proposed a metric for measuring video artefacts; their results show that although correlation with subjective scores is quite high overall, the blockiness metric has a lower correlation. Maalouf et al. [37] proposed a Reduced Reference (RR) perceptual Image Quality Measure (IQM) based on the grouplet transform, and they claim that the method performs well and is consistent with subjective quality assessment. They performed rational sensitivity thresholding to obtain the sensitivity coefficients of both images based on human visual system properties. Narvekar et al. [38] presented a no-reference objective sharpness metric based on a cumulative probability of blur detection. Their work also takes into account the Human Vision System (HVS) response to blur distortions, and they claim better performance for images whose background and foreground blur distortions differ. Zhu et al. [39] worked on a sharpness metric that detects both blur and noise based on image gradients. Their metric behaves as an indicator of the signal to noise ratio, but it needs a prior estimate of the noise variance. Mosquito noise is a compression noise with temporal aspects, for which Mantel et al. [30] presented a spatio-temporal, compression-independent removal method.

Ninassi et al. [40] developed a perceptual full reference video quality assessment metric that focuses on the temporal evolution of spatial distortions. Their technique assimilates temporal variations of spatial distortions at the eye fixation level and over the whole video sequence into short-term and long-term temporal pooling. Rahayu et al. [41] compared Peak Signal to Noise Ratio (PSNR), Multi Scale Structural Similarity (MS-SSIM) and Single Scale Structural Similarity (SS-SSIM), and concluded that, from a statistical point of view, no objective model emerges as the best performer for high quality data. For such data, SSIM did not outperform PSNR as it does for standard or low quality images. Staelens et al. [42] evaluated full length movies through subjective testing. Their aim was to study the effects of scalability on user perception when using the scalable video coding extensions of MPEG-4. The study reveals that users favor temporal scalability over quality scalability. Reiter et al. [43] conducted a study to evaluate the effectiveness of PSNR in estimating relative subjective quality levels for different types of quality distortions. The results show that PSNR is not a reliable metric for such tasks. It cannot be relied upon for assessing the combined effects of compression and transmission artefacts.
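One reason PSNR can mislead is visible from its definition, PSNR = 10 log10(peak^2 / MSE): it depends only on the mean squared error, not on where or how the error appears. The sketch below is an illustration of that property on tiny hypothetical frames, not a reproduction of any of the cited experiments; it assumes 8-bit pixel values with a peak of 255.

```python
import math


def psnr(reference, distorted, peak=255):
    """Peak Signal to Noise Ratio in dB between two equally sized frames.

    Frames are lists of rows of pixel values. Returns infinity when the
    frames are identical, because the mean squared error is zero.
    """
    ref = [p for row in reference for p in row]
    dis = [p for row in distorted for p in row]
    if not ref or len(ref) != len(dis):
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, dis)) / len(ref)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


reference = [[100, 100], [100, 100]]

# Distortion A: a uniform brightness shift of one level on every pixel.
shifted = [[101, 101], [101, 101]]

# Distortion B: one pixel is off by two levels, the rest are untouched.
spiked = [[102, 100], [100, 100]]

print(psnr(reference, shifted), psnr(reference, spiked))
```

Both distortions have a mean squared error of 1 and therefore the same PSNR, even though a viewer may judge a uniform shift and a localised error quite differently.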

The most notorious artefacts caused by the different compression techniques must be analyzed from an end-to-end perspective. Blur and blockiness are introduced by low pass filtering and block processing respectively. Ghosting appears due to multipath effects caused by a mismatch between connections, either electrical or optical. Colour bleeding occurs due to colour subsampling and quantization of colour information. Mosquito noise appears due to the removal of high frequency components in the temporal video data, i.e. between frames, whereas ringing occurs due to the removal of high frequency components in the spatial domain, i.e. within a single frame.
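How block processing produces blockiness can be shown on a one-dimensional signal. The sketch below is a deliberately crude stand-in for a block-based coder: it assumes that very coarse quantization leaves only the average (the DC term) of each block, and the function name `dc_only_blocks` and the ramp input are hypothetical.

```python
def dc_only_blocks(signal, block_size):
    """Replace every block of samples by its mean value.

    This mimics a block coder whose quantization is so coarse that only
    the DC term of each block survives.
    """
    out = []
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


# A smooth ramp has no edges at all before coding.
ramp = list(range(16))
coded = dc_only_blocks(ramp, 4)

# Differences between neighbouring samples: non-zero only where two
# blocks meet, which is what the eye perceives as block edges.
steps = [coded[i + 1] - coded[i] for i in range(len(coded) - 1)]
edge_positions = [i for i, s in enumerate(steps) if s != 0]

print(coded)
print(edge_positions)
```

The original ramp changes by the same small amount everywhere, while the coded version is flat inside each block and jumps at every block boundary, so the artefact follows the block grid rather than the picture content.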

In document Developing a high definition video quality of experience model based on influential parameters : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North (Page 33-36)