7.8 Generalized scalable decoding
7.8.1 Temporal scalability
Temporal scalability involves two layers, a lower layer and an enhancement layer. Both layers process the same spatial resolution. The enhancement layer enhances the temporal resolution of the lower layer and, when temporally remultiplexed with the lower layer, provides the full temporal rate.
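As an informal illustration of temporal remultiplexing, the sketch below interleaves the decoded VOPs of the two layers by display time. The tuple representation of a VOP and the names remultiplex, base and enh are hypothetical and are not part of this specification.

```python
from heapq import merge

def remultiplex(lower_vops, enh_vops):
    """Interleave two display-ordered VOP lists into one full-rate sequence.

    Each VOP is modelled as a (display_time, layer_name) tuple; this
    representation is an assumption made only for the example.
    """
    # Both inputs are assumed to be sorted by display time already.
    return list(merge(lower_vops, enh_vops, key=lambda vop: vop[0]))

base = [(0, "lower"), (2, "lower"), (4, "lower")]
enh = [(1, "enhancement"), (3, "enhancement")]
full_rate = remultiplex(base, enh)
```

In this example the lower layer alone yields three VOPs over the interval, and the remultiplexed sequence yields five.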
7.8.1.1 Base layer and enhancement layer
In the case of temporal scalability, the decoded VOPs of the enhancement layer are used to increase the frame rate of the base layer. Figure 7-31 shows a simplified diagram of the motion compensation process for the enhancement layer using temporal scalability.
[Figure 7-31: block diagram. Blocks: Vector Decoding, Vector Predictors, Scaling for Colour Components, Framestore Addressing, Framestores, Half-pel Prediction Filtering, Saturation, Lower Layer Decoder. Signals: f[y][x], p[y][x], d[y][x], vector[r][s][t], vector'[r][s][t], Half-Pel Info. Inputs: From Bitstream, Lower Layer Bitstream. Output: Decoded samples.]
Figure 7-31 Simplified motion compensation process for temporal scalability.
Predicted samples p[y][x] are formed either from frame stores of base layer or from frame stores of enhancement layer. The difference data samples f[y][x] are added to p[y][x] to form the decoded samples d[y][x].
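The sample reconstruction described above can be sketched as follows. The clamping step mirrors the block labelled Saturation in Figure 7-31; the 8-bit sample range and the function name reconstruct are assumptions made for the example and are not stated in this subclause.

```python
def reconstruct(f, p, bit_depth=8):
    """Form decoded samples d[y][x] = f[y][x] + p[y][x], then saturate.

    f: difference data samples, p: predicted samples (lists of rows).
    bit_depth=8 is an assumption; the clamp range is [0, 2**bit_depth - 1].
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(fv + pv, 0), max_val) for fv, pv in zip(f_row, p_row)]
        for f_row, p_row in zip(f, p)
    ]

f = [[10, -20], [300, 0]]
p = [[100, 5], [0, 255]]
d = reconstruct(f, p)
```

Whether p comes from the frame stores of the base layer or of the enhancement layer does not change this step; only the source of the prediction differs.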
There are two types of enhancement structure, indicated by the “enhancement_type” flag. When the value of enhancement_type is “1”, the enhancement layer increases the temporal resolution of a partial region of the base layer. When the value of enhancement_type is “0”, the enhancement layer increases the temporal resolution of the entire region of the base layer.
7.8.1.2 Base layer
The decoding process of the base layer is the same as the non-scalable decoding process.
7.8.1.3 Enhancement layer
The VOP of the enhancement layer is decoded as either an I-VOP, a P-VOP or a B-VOP. The shape of the VOP is either rectangular (video_object_layer_shape is “00”) or arbitrary (video_object_layer_shape is “01”).
7.8.1.3.1 Decoding of I-VOPs
The decoding process of I-VOPs in the enhancement layer is the same as the non-scalable decoding process.
7.8.1.3.2 Decoding of P-VOPs
The reference layer is indicated by ref_layer_id in the Video Object Layer class. The rest of the decoding process is the same as for non-scalable P-VOPs, except for the processes specified in 7.8.1.3.4 and 7.8.1.3.5.
For P-VOPs, the ref_select_code is either “00”, “01” or “10”.
When the value of ref_select_code is “00”, the prediction reference is set by the most recently decoded VOP belonging to the same layer.
When the value of ref_select_code is “01”, the prediction reference is set by the previous VOP in display order belonging to the reference layer.
When the value of ref_select_code is “10”, the prediction reference is set by the next VOP in display order belonging to the reference layer.
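The three P-VOP cases above amount to a lookup on ref_select_code. The sketch below is illustrative only; the function name and the way candidate VOPs are passed in are hypothetical.

```python
def p_vop_reference(ref_select_code, same_layer_last_decoded,
                    ref_layer_prev, ref_layer_next):
    """Return the prediction reference for an enhancement-layer P-VOP.

    same_layer_last_decoded: most recently decoded VOP of the same layer.
    ref_layer_prev / ref_layer_next: previous / next VOP in display order
    belonging to the reference layer.
    """
    table = {
        "00": same_layer_last_decoded,
        "01": ref_layer_prev,
        "10": ref_layer_next,
    }
    if ref_select_code not in table:
        # Only "00", "01" and "10" are listed for P-VOPs in this subclause.
        raise ValueError(f"ref_select_code {ref_select_code!r} not listed for P-VOPs")
    return table[ref_select_code]

ref = p_vop_reference("10", "E3", "B2", "B4")
```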
7.8.1.3.3 Decoding of B-VOPs
The reference layer is indicated by ref_layer_id in the Video Object Layer class. The rest of the decoding process is the same as for non-scalable B-VOPs, except for the processes specified in 7.8.1.3.4 and 7.8.1.3.5.
For B-VOPs, the ref_select_code is either “01”, “10” or “11”.
When the value of ref_select_code is “01”, the forward prediction reference is set by the most recently decoded VOP belonging to the same layer and the backward prediction reference is set by the previous VOP in display order belonging to the reference layer.
When the value of ref_select_code is “10”, the forward prediction reference is set by the most recently decoded VOP belonging to the same layer, and the backward prediction reference is set by the next VOP in display order belonging to the reference layer.
When the value of ref_select_code is “11”, the forward prediction reference is set by the previous VOP in display order belonging to the reference layer and the backward prediction reference is set by the next VOP in display order belonging to the reference layer. The picture type of the reference VOP shall be either I or P (VOP_coding_type = “00” or “01”).
When the value of ref_select_code is “01” or “10”, direct mode is not allowed. MODB shall always exist in each macroblock, i.e. the macroblock is not skipped even if the co-located macroblock is skipped.
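The B-VOP rules above can be summarised as a mapping from ref_select_code to a (forward, backward) reference pair, together with the direct mode restriction. The sketch is illustrative only; the function names and argument layout are hypothetical.

```python
def b_vop_references(ref_select_code, same_layer_last_decoded,
                     ref_layer_prev, ref_layer_next):
    """Return (forward_reference, backward_reference) for a B-VOP."""
    table = {
        "01": (same_layer_last_decoded, ref_layer_prev),
        "10": (same_layer_last_decoded, ref_layer_next),
        "11": (ref_layer_prev, ref_layer_next),
    }
    if ref_select_code not in table:
        # Only "01", "10" and "11" are listed for B-VOPs in this subclause.
        raise ValueError(f"ref_select_code {ref_select_code!r} not listed for B-VOPs")
    return table[ref_select_code]

def direct_mode_forbidden(ref_select_code):
    """True for the codes for which direct mode is not allowed."""
    return ref_select_code in ("01", "10")

fwd, bwd = b_vop_references("11", "E3", "B2", "B4")
```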
7.8.1.3.4 Decoding of arbitrary shaped VOPs
Prediction for arbitrary shape in P-VOPs or in B-VOPs is formed from a forward reference VOP defined by the value of ref_select_code.
For arbitrary shaped VOPs with the value of enhancement_type being “1”, the shape of the reference VOP is defined as an all-opaque rectangle whose size is the same as that of the reference layer when the shape of the reference layer is rectangular (video_object_layer_shape = “00”).
When the value of ref_select_code is “11” and the value of enhancement_type is “1”, MODB shall always exist in each macroblock, i.e. the macroblock is not skipped even if the co-located macroblock is skipped.
7.8.1.3.5 Decoding of backward and forward shape
Backward shape and forward shape are used in the background composition process specified in section 8.1. The backward shape is the shape of the enhanced object at the next VOP in display order belonging to the reference layer. The forward shape is the shape of the enhanced object at the previous VOP in display order belonging to the reference layer.
For the VOPs with the value of enhancement_type being “1”, backward shape is decoded when the load_backward_shape is “1” and forward shape is decoded when load_forward_shape is “1”.
When the value of load_backward_shape is “1” and the value of load_forward_shape is “0”, the backward shape of the previous VOP is copied to the forward shape for the current VOP. When the value of load_backward_shape is “0”, the backward shape of the previous VOP is copied to the backward shape for the current VOP and the forward shape of the previous VOP is copied to the forward shape for the current VOP.
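The load and copy rules for the backward and forward shapes can be sketched as follows. The function name, the callables standing in for the shape decoder, and the treatment of load_forward_shape as ignored when load_backward_shape is “0” are assumptions made for the example.

```python
def update_shapes(load_backward_shape, load_forward_shape,
                  prev_backward, prev_forward,
                  decode_backward, decode_forward):
    """Return (backward_shape, forward_shape) for the current VOP.

    prev_backward / prev_forward: shapes held from the previous VOP.
    decode_backward / decode_forward: hypothetical callables that decode
    a shape from the bitstream; they are called only when needed.
    """
    if load_backward_shape == 1:
        backward = decode_backward()
        if load_forward_shape == 1:
            forward = decode_forward()
        else:
            # The previous backward shape becomes the current forward shape.
            forward = prev_backward
    else:
        # Assumption: load_forward_shape is not consulted in this case.
        backward = prev_backward
        forward = prev_forward
    return backward, forward

shapes = update_shapes(1, 0, "Bprev", "Fprev", lambda: "Bnew", lambda: "Fnew")
```

Passing callables rather than decoded shapes keeps the sketch from reading shape data that the flags say is not loaded.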
The decoding process of the backward and forward shapes is the same as the decoding process for the shape of an I-VOP in binary only mode (video_object_layer_shape = “10”).