Implementation Of Cross Search Algorithm For Motion Estimation Using Matlab


IMPLEMENTATION OF CROSS SEARCH (CS) ALGORITHM FOR MOTION ESTIMATION USING MATLAB

RAUDZATUL ADAWIAH BINTI YUNOS

This report is submitted in partial fulfillment of the requirements for the award of Bachelor of Electronic Engineering (Telecommunication Electronics) With Honours


DECLARATION

I hereby declare that this thesis entitled “Implementation of Cross Search (CS) Algorithm for Motion Estimation using MATLAB” is the result of my own research except as cited in the references.

Signature : ……….

Author’s Name : RAUDZATUL ADAWIAH BINTI YUNOS


ACKNOWLEDGEMENTS


ABSTRACT


ABSTRAK


CONTENTS

CHAPTER TITLE

PROJECT TITLE

DECLARATION

DEDICATION

ACKNOWLEDGEMENTS

ABSTRACT

ABSTRAK

CONTENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ACRONYMS

1 INTRODUCTION

1.1 Project Background

1.2 Objective Project

1.3 Problem Statement

1.4 Scope of Project

2 LITERATURE REVIEW

2.1 Video Compression and Coding Technique

2.1.1 Introduction on Video Compression

2.1.2 Coding Technique

2.1.3 Video

2.2 Motion Estimation

2.2.1 Identifies the True Motion

2.2.2 Removing Temporal Redundancy

2.3 Block Matching Algorithm

2.4 Searching Method

2.4.1 Full Search Algorithm

2.4.2 New Three Steps Search (NTSS)

2.4.3 Diamond Search (DS)

2.4.4 Cross Diamond Search (CDS)

2.4.5 Four Step Search (FSS)

3 METHODOLOGY

3.1 Project Planning

3.1.2 Data Acquisition on Literature Review

3.1.3 Development and Implementation

3.1.4 Performance Analysis

3.1.5 Presentation and Seminar Matlab

3.1.6 Thesis Writing Submission

3.2 Project Flow Chart

4 CROSS SEARCH ALGORITHM (CS)

4.1 Introduction to Cross Search Algorithm

4.2 CS Steps and Method of Search

4.4 CS Flowchart

5 RESULTS AND DISCUSSIONS

5.1 Performance of CS for single frame sequence

5.1.1 Akiyo sequence for frame no. 1 to no. 2

5.1.2 Claire sequence for frame no. 1 to no. 2

5.1.3 Coastguard sequence for frame no. 1 to no. 2

5.1.4 Foreman sequence for frame no. 1 to no. 2

5.1.5 News sequence for frame no. 1 to no. 2

5.1.6 Salesman sequence for frame no. 1 to no. 2

5.1.7 Tennis sequence for frame no. 1 to no. 2

5.2 Average Search Points and PSNR for 1 frame sequence

5.3 Comparison of CS Against all Algorithms

5.3.1 Average Search Points for all Algorithms

5.3.2 Average PSNR for all Algorithms

5.3.3 Elapse Time for all Algorithms

5.3.4 Search Points Speed

6 CONCLUSION

7 REFERENCES


LIST OF TABLES

NO TITLE

5.1 Average PSNR and Search Points of CSA for 1 Frame

5.2 Average Search Points for 1st to 30th frame

5.3 Average PSNR for all Algorithms

5.4 Elapse Time for 1-30 frames simulation (s)


LIST OF FIGURES

NO TITLE

2.1 Video Coding Layer

2.2 Predictive sources coding with motion compensation

2.3 Macro Block

2.4 NTSS Flowchart

2.5 Steps of DS

2.6 DS Flowchart

2.7 CDS steps

2.8 CDS Flowchart

2.9 Search patterns of the 4SS

2.10 Two different search paths of 4SS

2.11 4SS Flowchart

3.1 Flow of the Project

4.1 Illustration 1 for CSA steps

4.2 Illustration 2 for CSA steps

4.3 Illustration 3 for CSA steps

4.4 CSA Flowchart

5.1

(a) Original Image

(b) Predicted Image

5.2


5.3

5.4

5.5

5.6

5.7

5.8 Average Search Points for Akiyo (1-30)

5.9 Average Search Points for Claire (1-30)

5.10 Average Search Points for Coastguard (1-30)

5.11 Average Search Points for Foreman (1-30)

5.12 Average Search Points for News (1-30)

5.13 Average Search Points for Salesman (1-30)

5.14 Average Search Points for Tennis (1-30)

5.15 Average PSNR (dB) for Akiyo sequence

5.16 Average PSNR (dB) for Claire sequence

5.17 Average PSNR (dB) for Coastguard sequence

5.18 Average PSNR (dB) for Foreman sequence

5.19 Average PSNR (dB) for Salesman sequence


LIST OF ACRONYMS

AVI – Audio Video Interleave

WMV – Windows Media Video

MPEG – Moving Picture Experts Group

BDM – Block Distortion Measure

BMA – Block Matching Algorithm

CCB – Cross Centre Biased

CCITT – International Telegraph & Telephone Consultative Committee

CDS – Cross Diamond Search

CS – Cross Search

DCT – Discrete Cosine Transform

DS – Diamond Search

FS – Full Search

FSS – Four Step Search

GOP – Group of Pictures

IDCT – Inverse Discrete Cosine Transform

JPEG – Joint Photographic Experts Group

LDSP – Large Diamond Search Pattern

LSI – Large Scale Integration

MAE – Mean Absolute Error

MBD – Minimum Block Distortion

ME – Motion Estimation

MSE – Mean Square Error

MV – Motion Vector

NTSS – New Three Step Search

PC – Personal Computer

PSNR – Peak Signal to Noise Ratio

SDSP – Small Diamond Search Pattern

VLC – Video LAN Client


CHAPTER 1

INTRODUCTION

1.1 Background


1.2 Objectives

The main aim of this project is to implement the Cross Search (CS) algorithm, which can overcome the problem faced when using the Full Search (FS) algorithm to achieve a high compression ratio in video coding. To achieve this aim, the objectives of this project are as follows:

1. To study how the Block Matching Algorithm (BMA), the FS algorithm and the CS algorithm work when implemented in MATLAB.

2. To understand and observe the differences between FS and CS in terms of their process, the time taken and the quality of the output produced for various types of video.

3. To understand the basic functions of the other fast BMAs and compare their performance with that of CS in different aspects.

4. To identify and justify the best algorithm based on several assessment criteria.

1.3 Problem Statement


1.4 Scope


CHAPTER 2

LITERATURE REVIEW

This chapter reviews the background of the project. The important features of the project, such as video and the details of the algorithms, are described further.

2.1 Video Compression and Coding Technique

This subchapter covers the need for video compression, the coding technique and some explanation of the selected videos.

2.1.1 Introduction on Video Compression


Image and video data compression has been found to be necessary in several important applications such as visual transmission and storage. This is because the huge amount of data involved in these and other applications usually far exceeds the capability of existing hardware, even though the technologies in the related industries continue to advance.

Data represent the information carried, and the quantity of data can be measured exactly. In the context of digital image and video, data are usually measured by the number of binary units, or bits. The bit rate, also known as the coding rate, is an important parameter in image and video compression and is frequently expressed in bits per pixel (bpp). The term pixel is an abbreviation of picture element and is sometimes referred to as pel. In information source coding, the bit rate is sometimes expressed in bits per symbol.
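The bit rate in bpp described above can be illustrated with a short calculation. The following is only a sketch in Python, not part of the MATLAB implementation of this project; the function name `bits_per_pixel` is hypothetical, and the 352 x 288 frame size is the usual CIF luminance resolution, used here as an assumed example.

```python
def bits_per_pixel(total_bits, width, height, frames=1):
    """Coding rate in bits per pixel (bpp): coded bits divided by pixel count."""
    return total_bits / (width * height * frames)


# Assumed example: one 352 x 288 frame with 8 bits per luminance sample.
raw_bits = 352 * 288 * 8
raw_bpp = bits_per_pixel(raw_bits, 352, 288)          # uncompressed rate
coded_bpp = bits_per_pixel(raw_bits // 8, 352, 288)   # after 8:1 compression
```

In this assumed example an 8:1 compression ratio lowers the rate from 8 bpp to 1 bpp.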

2.1.2 Coding Technique


Figure 2.1 Video Coding Layer [1]

2.1.3 Video

Many video formats have been developed. Some commonly used types are as follows:

i. Audio Video Interleave (AVI) format. Videos stored in the AVI format have the extension .avi.

ii. Windows Media Video (WMV) format. Videos stored in the WMV format have the extension .wmv.

iii. Moving Picture Experts Group (MPEG) format. Videos stored in the MPEG format have the extension .mpg or .mpeg.

iv. QuickTime format. Videos stored in this format have the extension .mov.


For this project, the videos chosen for implementation are in AVI format. The standard Common Intermediate Format (CIF) video sequences used in this kind of project are Akiyo.avi, Claire.avi, Coastguard.avi, Foreman.avi, Salesman.avi and Tennis.avi. All of these videos are used as standard reference sequences in ME research.

2.2 Motion Estimation

ME is the process of estimating the pixels (pels) of the current frame from one or more reference frames. The temporal prediction technique used in video is based on ME. The basic premise of ME is that in most cases, consecutive video frames will be similar except for changes induced by objects moving within the frames.

ME techniques use block matching, which exploits different search patterns and search strategies to find the optimum MV with a reduced number of search points. The BMA efficiently removes the temporal redundancy between successive frames.

Block-based ME is the most practical approach to obtaining motion-compensated prediction frames. It divides each frame into equally sized rectangular blocks and, within a search window, finds the displacement of the best-matched block in the previous frame. This displacement is the MV of the block in the current frame.
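The block-matching idea described above can be sketched as an exhaustive search over a small window. This is a minimal illustration in Python, not the MATLAB code of this project. The names `sad` and `full_search`, the block size `n`, the search range `p` and the use of the sum of absolute differences as the matching cost are all assumptions made for the example; frames are plain lists of rows of pixel intensities.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, p=2):
    """Test every displacement in [-p, p] x [-p, p] and return the
    best motion vector (dx, dy) together with its matching cost."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidate blocks that fall outside the reference frame.
            inside_x = 0 <= bx + dx and bx + dx + n <= width
            inside_y = 0 <= by + dy and by + dy + n <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

This exhaustive version checks (2p + 1)^2 candidates per block, which is the cost that fast search patterns try to reduce.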


In the early 1980s, several conventional fast algorithms were proposed, such as the Three Step Search (TSS) and the 2D logarithmic search [4]. Among these algorithms, TSS became the most popular for low bit-rate video applications, owing to its simplicity and effectiveness. However, TSS uses a uniformly allocated search pattern in its first step, which is not very efficient at catching the small motion that appears in stationary or quasi-stationary blocks.

To remedy this problem, several adaptive techniques have been suggested to make the search more adaptable to motion scale and uncertainty. The uncertainty is estimated from the difference in the block distortion measure (BDM) among the checked points. A smaller difference indicates a larger uncertainty, and hence the search scope is increased in the next step.
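The weakness of the uniformly allocated first step of TSS can be made concrete with a small sketch. It assumes the commonly used configuration of a search range of ±7 pixels with an initial step size of 4, which is not stated in the text above; the function names are hypothetical and the code is in Python for illustration only.

```python
def tss_first_step_points(step=4):
    """Nine candidate displacements (dx, dy) of a first TSS step:
    the window centre plus eight points `step` pixels away."""
    return [(dx, dy) for dy in (-step, 0, step) for dx in (-step, 0, step)]


def nearest_checked_distance(points):
    """Smallest non-zero displacement (largest of |dx|, |dy|) that is checked."""
    return min(max(abs(dx), abs(dy)) for dx, dy in points if (dx, dy) != (0, 0))
```

Under this assumption the closest non-centre candidate lies 4 pixels from the centre, so a true displacement of 1 or 2 pixels is not examined directly in the first step.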

2.2.1 Identifies the True Motion

The first type of ME algorithm aims to track accurately the true motion of objects or features in video sequences. Video sequences are generated by projecting a 3D real world onto a series of 2D images. When objects in the 3D real world move, the brightness or pixel intensity of the 2D images changes correspondingly. The 2D motion projected from the movement of a point in the 3D real world is referred to as the “true motion” [5]. One of the many potential applications of true motion is in computer vision, the goal of which is to identify an unknown environment via a moving camera.

2.2.2 Removing Temporal Redundancy


frame t determines predicted frame t from the frame (t-Δt) or from the frame (t+Δt). Motion estimation and compensation are used to predict the frame t to be coded from successive frames. Motion compensation relies on estimating the motion between two image frames, and the motion is described by a motion field of motion vectors. Consequently, the prediction error is transmitted instead of the frame itself, as shown in Figure 2.2. Along with the prediction error, the motion information is also transmitted to the decoder so that it can reproduce the motion-compensated prediction. Block-based motion representation offers a very good trade-off between motion overhead and prediction error. It uses one MV per macroblock.

Figure 2.2 Predictive sources coding with motion compensation [6]
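The predictive coding loop described in this subsection can be sketched as follows. This is an illustrative Python example, not the MATLAB implementation of this project. The function names, the dictionary `mvs` that maps each block origin (bx, by) to its MV, and the assumptions that the frame dimensions are multiples of the block size `n` and that every MV points inside the reference frame are all made for the example.

```python
def predict_frame(ref, mvs, n):
    """Build the motion-compensated prediction: every n x n block is copied
    from `ref` at the position displaced by that block's motion vector."""
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for by in range(0, height, n):
        for bx in range(0, width, n):
            dx, dy = mvs[(bx, by)]  # one MV per block
            for y in range(n):
                for x in range(n):
                    pred[by + y][bx + x] = ref[by + dy + y][bx + dx + x]
    return pred


def residual(cur, pred):
    """Prediction error that the encoder transmits instead of the frame."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def reconstruct(pred, res):
    """Decoder side: prediction plus the received prediction error."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, res)]
```

Because the decoder receives both the MVs and the prediction error, it can rebuild the same prediction from its copy of the reference frame and add the error to recover the current frame.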