
Efficient implementations of machine vision algorithms using a dynamically typed programming language


Sheffield Hallam University Learning and Information Services

Adsetts Centre, City Campus, Sheffield S1 1WD

ProQuest Number: 10701158

All rights reserved

INFORMATION TO ALL USERS

The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

ProQuest 10701158

Published by ProQuest LLC (2017). Copyright of the Dissertation is held by the Author.

All rights reserved.

This work is protected against unauthorized copying under Title 17, United States Code.

Microform Edition © ProQuest LLC.

ProQuest LLC.

789 East Eisenhower Parkway P.O. Box 1346


Efficient Implementations of

Machine Vision Algorithms

using a Dynamically Typed

Programming Language

Jan Wedekind

A thesis submitted in partial fulfilment of the requirements of Sheffield Hallam University for the degree of Doctor of Philosophy

February 2012


Abstract

Current machine vision systems (or at least their performance-critical parts) are predominantly implemented using statically typed programming languages such as C, C++, or Java. Statically typed languages, however, are unsuitable for the development and maintenance of large-scale systems.

When choosing a programming language, dynamically typed languages are usually not considered because they lack support for high-performance array operations. This thesis presents efficient implementations of machine vision algorithms in the (dynamically typed) Ruby programming language. Ruby was chosen because it has the best support for meta-programming among the currently popular programming languages. The approach presented in this thesis could, however, be applied to any programming language with equal or stronger support for meta-programming (e.g. Racket, formerly PLT Scheme).

A Ruby library for performing I/O and array operations was developed as part of this thesis. It is demonstrated how the library facilitates concise implementations of machine vision algorithms commonly used in industrial automation. In short, this thesis is about a different way of implementing machine vision systems. The work could be used to prototype, and in some cases to implement, machine vision systems in industrial automation and robotics.

The development of real-time machine vision software is facilitated as follows:

1. A just-in-time compiler is used to achieve real-time performance. It is demonstrated that the Ruby syntax is sufficient to integrate the just-in-time compiler transparently.
2. Various I/O devices are integrated for seamless acquisition, display, and storage of video and audio data.

In combination, these two developments preserve the expressiveness of the Ruby programming language while providing good run-time performance in the resulting implementation.
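The minimal sketch below shows one way Ruby syntax alone can hide a compilation step. It is not the library's API, and it is not the thesis's JIT compiler. The names `Expr`, `Var`, `Const`, `BinOp`, `compile_kernel` and `map_expr` are hypothetical. A `Kernel#eval` of generated Ruby source stands in for a real JIT back end such as libJIT, so no machine code is produced. The mechanism works like this: overloaded operators record an expression tree instead of computing values, and the tree is turned into source once, compiled, and cached.

```ruby
# Hypothetical sketch, not the Hornetseye API: arithmetic on Expr objects
# records an expression tree instead of computing a value.
class Expr
  def +(other)
    BinOp.new(:+, self, Expr.lift(other))
  end

  def *(other)
    BinOp.new(:*, self, Expr.lift(other))
  end

  # Wrap plain Ruby values (e.g. 2) so they can appear in the tree.
  def self.lift(value)
    value.is_a?(Expr) ? value : Const.new(value)
  end
end

# A named placeholder for the array element being processed.
class Var < Expr
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def to_src
    name.to_s
  end
end

# A literal constant inside an expression.
class Const < Expr
  def initialize(value)
    @value = value
  end

  def to_src
    @value.inspect
  end
end

# A binary operation node such as (left + right).
class BinOp < Expr
  def initialize(op, left, right)
    @op, @left, @right = op, left, right
  end

  def to_src
    "(#{@left.to_src} #{@op} #{@right.to_src})"
  end
end

# Cache of compiled kernels, keyed by the generated source text.
KERNELS = {}

# "Compile" an expression into a one-argument lambda, once per distinct
# source text. eval stands in for a real JIT back end in this sketch.
def compile_kernel(expr, var)
  code = "lambda { |#{var.name}| #{expr.to_src} }"
  KERNELS[code] ||= eval(code)
end

# Apply the compiled kernel element-wise to a plain Ruby array.
def map_expr(array, var, expr)
  kernel = compile_kernel(expr, var)
  array.map { |value| kernel.call(value) }
end

x = Var.new(:x)
e = x * 2 + 1
p e.to_src                   # prints "((x * 2) + 1)"
p map_expr([1, 2, 3], x, e)  # prints [3, 5, 7]
```

The design choice is that the user writes ordinary Ruby arithmetic and never calls the compiler explicitly. Rebuilding the same expression hits the cache instead of compiling again. A fuller version would also define `coerce` so that expressions such as `1 + x` work.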


Publications

Refereed Journal Articles

• M. Boissenin, J. Wedekind, A. N. Selvan, B. P. Amavasai, F. Caparrelli, and J. R. Travis. Computer vision methods for optical microscopes. Image and Vision Computing, 25(7):1107-16, 07/01 2007 (Boissenin et al., 2007)

• A. J. Lockwood, J. Wedekind, R. S. Gay, M. S. Bobji, B. P. Amavasai, M. Howarth, G. Mobus, and B. J. Inkson. Advanced transmission electron microscope triboprobe with automated closed-loop nanopositioning. Measurement Science and Technology, 21(7):075901, 2010 (Lockwood et al., 2010)

Refereed Conference Publications

• Jan Wedekind, Manuel Boissenin, Balasundram P. Amavasai, Fabio Caparrelli, and Jon R. Travis. Object Recognition and Real-Time Tracking in Microscope Imaging. Proceedings of the 2006 Irish Machine Vision and Image Processing Conference (IMVIP 2006), pages 164-171, Dublin City University, 2006 (Wedekind et al., 2006)

• J. Wedekind, B. P. Amavasai, and K. Dutton. Steerable filters generated with the hypercomplex dual-tree wavelet transform. In 2007 IEEE International Conference on Signal Processing and Communications, pages 1291-4, Piscataway, NJ, USA, 24-27 Nov. 2007, 2008. Mater. & Eng. Res. Inst., Sheffield Hallam Univ., Sheffield, UK, IEEE (Wedekind et al., a)

• J. Wedekind, B. P. Amavasai, K. Dutton, and M. Boissenin. A machine vision extension for the Ruby programming language. In 2008 International Conference on Information and Automation (ICIA), pages 991-6, Piscataway, NJ, USA, 20-23 June 2008, 2008. Microsyst. & Machine Vision Lab., Sheffield Hallam Univ., Sheffield, UK, IEEE (Wedekind et al., b)

Formal Presentations


• Jan Wedekind. Computer Vision Using Ruby and libJIT. (RubyConf), San Francisco, California, USA, Nov. 19th 2009 (Wedekind, 2009)

• Jan Wedekind, Jacques Penders, Hussein Abdul-Rahman, Martin Howarth, Ken Dutton, and Aiden Lockwood. Implementing Machine Vision Systems with a Dynamically Typed Language. 25th European Conference on Object-Oriented Programming, Lancaster, United Kingdom, July 28th 2011 (Wedekind et al., 2011)

Published Software Packages

• malloc: http://github.com/wedesoft/malloc/
• multiarray: http://github.com/wedesoft/multiarray/
• hornetseye-alsa: http://github.com/wedesoft/hornetseye-alsa/
• hornetseye-dc1394: http://github.com/wedesoft/hornetseye-dc1394/
• hornetseye-ffmpeg: http://github.com/wedesoft/hornetseye-ffmpeg/
• hornetseye-fftw3: http://github.com/wedesoft/hornetseye-fftw3/
• hornetseye-frame: http://github.com/wedesoft/hornetseye-frame/
• hornetseye-kinect: http://github.com/wedesoft/hornetseye-kinect/
• hornetseye-linalg: http://github.com/wedesoft/hornetseye-linalg/
• hornetseye-narray: http://github.com/wedesoft/hornetseye-narray/
• hornetseye-opencv: http://github.com/wedesoft/hornetseye-opencv/
• hornetseye-openexr: http://github.com/wedesoft/hornetseye-openexr/
• hornetseye-qt4: http://github.com/wedesoft/hornetseye-qt4/
• hornetseye-rmagick
• hornetseye-v4l: http://github.com/wedesoft/hornetseye-v4l/
• hornetseye-v4l2: http://github.com/wedesoft/hornetseye-v4l2/
• hornetseye-xorg


Acknowledgements

First I would like to thank Bala Amavasai for his supervision, support, and his unshakable optimism. He developed a large part of the Mimas C++ computer vision library and organised the Nanorobotics grant. Without him I would not have been able to do this work. I am also very indebted to Jon Travis, who has been a valuable source of help and advice when I came to the UK and while working at university.

I would also like to thank Ken Dutton, Jacques Penders, and Martin Howarth for their continuing supervision of the PhD, for their advice and support, and for giving me room to do research work.

I would also like to thank Arul Nirai Selvan, Manuel Boissenin, Kim Chuan Lim, Kang Song Tan, Amir Othman, Stephen, Shuja Ahmed and others for being good colleagues and for creating a friendly working environment. In particular I would like to express my gratitude to Georgios Chliveros for his advice and moral support.

Thanks to Julien Faucher, who introduced 3D camera calibration to the research group. A special thanks to Koichi Sasada for his research visit and for the many interesting and motivating discussions.

Thanks to Aiden Lockwood, Jing Jing Wang, Ralph Gay, Xiaojing Xu, Zineb Saghi, Gunter Mobus, and Beverly Inkson for their valuable help in applying the work to transmission electron microscopy in the context of the Nanorobotics project.

Finally I would like to thank my parents, who sacrificed a lot so that I could achieve the best in my life. Without their support I would not have made it this far.

A seven-year part-time PhD is a long time to work and make friends. My apologies, but there is just not enough room to mention you all.

The work presented in this thesis was partially funded by the EPSRC Nanorobotics project. I also received a student bursary from the Materials and Engineering Research Institute.


Declaration

Sheffield Hallam University

Materials and Engineering Research Institute

Mobile Machines and Vision Laboratory

The undersigned hereby certify that they have read and recommend to the Faculty of Arts, Computing, Engineering and Sciences for acceptance a thesis entitled “Efficient Implementations of Machine Vision Algorithms using a Dynamically Typed Programming Language” by Jan Wedekind in partial fulfilment of the requirements for the degree of Doctor of Philosophy.

Date:

February 2012

Director of Studies:

______________________________

Dr. Martin Howarth

Research Supervisor:

______________________________

Dr. Jacques Penders

Research Supervisor:

______________________________

Dr. Jon Travis

Research Advisor:

______________________________

Dr. Ken Dutton

Research Advisor:


Disclaimer

Sheffield Hallam University

Author: Jan Wedekind

Title: Efficient Implementations of Machine Vision Algorithms using a Dynamically Typed Programming Language

Department: Materials and Engineering Research Institute

Degree: PhD

Year: 2012

Permission is herewith granted to Sheffield Hallam University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions.

THE AUTHOR ATTESTS THAT PERMISSION HAS BEEN OBTAINED FOR THE USE OF ANY COPYRIGHTED MATERIAL APPEARING IN THIS THESIS (OTHER THAN BRIEF EXCERPTS REQUIRING ONLY PROPER ACKNOWLEDGEMENT IN SCHOLARLY WRITING) AND THAT ALL SUCH USE IS CLEARLY ACKNOWLEDGED.


Contents

Contents
Symbols
Acronyms
List of Figures
List of Tables
Listings

1 Introduction
1.1 Interpreted Languages
1.2 Dynamically Typed Languages
1.3 Contributions of this Thesis
1.4 Thesis Outline

2 State of the Art
2.1 Object Localisation
2.2 Existing FOSS for Machine Vision
2.2.1 Statically Typed Libraries
2.2.2 Statically Typed Extensions
2.2.3 Dynamically Typed Libraries
2.3 Ruby Programming Language
2.3.1 Interactive Ruby
2.3.2 Object-Oriented, Single-Dispatch
2.3.3 Dynamic Typing
2.3.4 Exception Handling
2.3.5 Garbage Collector
2.3.6 Control Structures
2.3.7 Mixins
2.3.8 Closures
2.3.9 Continuations
2.3.10 Introspection
2.3.12 Reification
2.3.13 Ruby Extensions
2.3.14 Unit Testing
2.4 JIT Compilers
2.4.1 Choosing a JIT Compiler
2.4.2 libJIT API
2.5 Summary

3 Handling Images in Ruby
3.1 Transparent JIT Integration
3.2 Malloc Objects
3.3 Basic Types
3.3.1 Booleans
3.3.2 Integers
3.3.3 Floating-Point Numbers
3.3.4 Composite Numbers
3.3.5 Pointers
3.3.6 Ruby Objects
3.4 Uniform Arrays
3.4.1 Variable Substitution
3.4.2 Lambda Terms
3.4.3 Lookup Objects
3.4.4 Multi-Dimensional Arrays
3.4.5 Array Views
3.5 Operations
3.5.1 Constant Arrays
3.5.2 Index Arrays
3.5.3 Type Matching
3.5.4 Element-Wise Unary Operations
3.5.5 Element-Wise Binary Operations
3.5.6 LUTs and Warps
3.5.6.1 LUTs
3.5.6.2 Warps
3.5.7 Injections
3.5.8 Tensor Operations
3.5.9 Argmax and Argmin
3.5.10 Convolution
3.5.11 Integral
3.5.12 Masking/Unmasking
3.6 JIT Compiler
3.6.1 Stripping Terms
3.6.2 Compilation and Caching
3.7 Unit Testing
3.8 Summary

4 Input/Output
4.1 Colour Spaces
4.1.1 sRGB
4.1.2 YCbCr
4.2 Image Files
4.3 HDR Image Files
4.4 Video Files
4.5 Camera Input
4.6 Image Display
4.7 RGBD Sensor
4.8 GUI Integration
4.9 Summary

5 Machine Vision
5.1 Preprocessing
5.1.1 Normalisation and Clipping
5.1.2 Morphology
5.1.2.1 Erosion and Dilation
5.1.2.2 Binary Morphology
5.1.3 Otsu Thresholding
5.1.4 Gamma Correction
5.1.5 Convolution Filters
5.1.5.1 Gaussian Blur
5.1.5.2 Gaussian Gradient
5.1.6 Fast Fourier Transform
5.2 Feature Locations
5.2.1 Edge Detectors
5.2.1.1 Roberts’ Cross Edge-Detector
5.2.1.2 Sobel Edge-Detector
5.2.1.3 Non-Maxima Suppression for Edges
5.2.2 Corner Detectors
5.2.2.1 Corner Strength by Yang et al.
5.2.2.2 Shi-Tomasi Corner Detector
5.2.2.3 Harris-Stephens Corner- and Edge-Detector
5.2.2.4 Non-Maxima Suppression for Corners
5.3 Feature Descriptors
5.3.1 Restricting Feature Density
5.3.2 Local Texture Patches
5.3.3 SVD Matching
5.4 Summary

6 Evaluation
6.1 Software Modules
6.2 Assessment of Functionality
6.2.1 Fast Normalised Cross-Correlation
6.2.2 Lucas-Kanade Tracker
6.2.3 Hough Transform
6.2.4 Microscopy Software
6.2.5 Depth from Focus
6.2.6 Gesture-based Mouse Control
6.2.7 Slide Presenter
6.2.8 Camera Calibration
6.2.8.1 Corners of Calibration Grid
6.2.8.2 Camera Intrinsic Matrix
6.2.8.3 3D Pose of Calibration Grid
6.2.9 Augmented Reality
6.3 Performance
6.3.1 Comparison with NArray and C++
6.3.2 Breakdown of Processing Time
6.4 Code Size
6.4.1 Code Size of Programs
6.4.2 Code Size of Library
6.5 Summary

7 Conclusions & Future Work
7.1 Conclusions
7.2 Future Work

A Appendix
A.1 Connascence
A.2 Linear Least Squares
A.3 Pinhole Camera Model
A.4 Planar Homography
A.5 “malloc” gem
A.6 “multiarray” gem
A.7.1 JIT Example
A.7.2 Video Player
A.7.3 Normalised Cross-Correlation
A.7.4 Camera Calibration
A.7.5 Recognition of a rectangular marker
A.7.6 Constraining Feature Density
A.7.7 SVD Matching

Bibliography

Symbols

:= “is defined to be”
=: “defines”
≡ “is logically equivalent to”
∈ “is an element of”
$\stackrel{!}{=}$ “must be”
↦ “maps to”
→ “from … to …”
× product (or Cartesian product)
∧ “and”
⊖ erosion
⊕ dilation
⊛ convolution
→β β-reduction
∀ “for all”
𝔹 Boolean set
B blue
C clipping
Cb chroma blue
Cr chroma red
∃ “there exists”
G green
Kb weight of blue component in luma channel
Kr weight of red component in luma channel
N normalisation
ℕ0 set of all natural numbers including zero
Pb chroma blue
ℝ set of all real numbers
R red
S structure tensor
U chroma blue
V chroma red
Y luma
ℤ set of all integers

Acronyms

1D one-dimensional
2D two-dimensional
3D three-dimensional
AAC Advanced Audio Coding
ALSA Advanced Linux Sound Architecture
AOT ahead-of-time
API application programming interface
ASF Advanced Systems Format
AVC Advanced Video Coding
AVI Audio Video Interleave
BLUE best linear unbiased estimator
BMP Bitmap Image File
CIE Commission internationale de l’éclairage
CPU central processing unit
CRT cathode ray tube
DFT discrete Fourier transform
DICOM Digital Imaging and Communications in Medicine
DLL dynamic-link library
FFT Fast Fourier Transform
FFTW Fastest Fourier Transform in the West
FFTW3 FFTW version 3
FLV Flash Video
foldl fold-left
foldr fold-right
FOSS free and open source software
GCC GNU Compiler Collection

GNU “GNU’s Not Unix!”
GPL GNU General Public License
GPGPU general purpose GPU
GPU graphics processing unit
GUI graphical user interface
H.264 MPEG-4 AVC standard
HDR high dynamic range
HSV hue, saturation, and value
I input
IIR infinite impulse response
IR Infrared
IRB Interactive Ruby Shell
JIT just-in-time
JPEG Joint Photographic Experts Group
LDR low dynamic range
LLVM Low Level Virtual Machine
LUT lookup table
MP3 MPEG Audio Layer 3
MPEG Moving Picture Experts Group
MPEG-4 MPEG standard version 4
MOV Apple Quicktime Movie
MR magnetic resonance
O output
OCR optical character recognition
Ogg Xiph.Org container format
PBM portable bitmap
PGM portable graymap
PNG Portable Network Graphics
PPM portable pixmap
RGB red, green, blue
RGBD RGB and depth
RANSAC Random Sample Consensus

SLAM Simultaneous Localisation and Mapping
SO shared object
sRGB standard RGB colour space
SVD singular value decomposition
TEM transmission electron microscopy
Theora Xiph.Org video codec
TIFF Tagged Image File Format
Vorbis Xiph.Org audio codec
VP6 On2 Truemotion VP6 codec
VM virtual machine
V4L Video for Linux
V4L2 V4L version 2
WMA Windows Media Audio

List of Figures

1.1 Optical parking system for a car
1.2 Feedback cycle in a compiled and in an interpreted language
1.3 ARM Gumstix boards
1.4 Early vs. late method binding
1.5 Static typing vs. dynamic typing. Comment lines (preceded with “//”) show the output of the compiler
1.6 Static typing and numeric overflow. Comment lines (preceded with “//”) show the output of the program
1.7 Ariane 5 disaster caused by numerical overflow
1.8 Software architecture of machine vision system

2.1 Overview of a typical object localisation algorithm
2.2 Binary operations for different element types (Wedekind et al., b)
2.3 Low resolution image of a circle
2.4 Processing time comparison for creating an index array with GCC compiled code vs. with the Ruby VM
2.5 Interactive Ruby Shell
2.6 Mark & Sweep garbage collector
2.7 Conditional statements in Ruby (Fulton, 2006)
2.8 Loop constructs in Ruby (Fulton, 2006)

3.1 Pointer operations (compare Listing 3.5)
3.2 Abstract data type for 16-bit unsigned integer
3.3 Abstract data types for single-precision and double-precision floating point numbers
3.4 Composite types for unsigned byte RGB, unsigned short int RGB, and single-precision floating point RGB values
3.5 Abstract data type for pointer to double precision floating point number
3.6 Shape and strides for a 3D array
3.7 Extracting array views of a 2D array
3.8 Type matching
3.9 Avoiding intermediate results by using lazy evaluation
3.10 Pseudo colour palette
3.11 Thermal image displayed with pseudo colours (source: NASA Visible Earth)

3.12 Visualised components of warp vectors
3.13 Warping a satellite image (source: NASA Visible Earth)
3.14 Recursive implementation of injection (here: sum)
3.15 Recursive implementation of argument maximum
3.16 Diagonal injection
3.17 Applying a moving average filter to an image
3.18 Recursive implementation of integral image
3.19 Computing a moving average filter using an integral image
3.20 Histogram segmentation example
3.21 Histograms of the red, green, and blue colour channel of the reference image
3.22 3D histogram

4.1 Input/output integration
4.2 Colour image and corresponding grey scale image according to sensitivities of the human eye
4.3 Colour space conversions (Wilson, 2007)
4.4 YV12 colour space (Wilson, 2007)
4.5 YUY2 colour space (Wilson, 2007)
4.6 UYVY colour space (Wilson, 2007)
4.7 Artefacts caused by colour space compression
4.8 Low resolution colour image using lossless PNG and (extremely) lossy

5.9 2D Gauss gradient filter
5.10 Gauss gradient filter (σ = 3, ε = 1/256) applied to a colour image
5.11 Spectral image of a piece of fabric
5.12 Example image and corresponding Roberts’ Cross edges
5.13 Example image and corresponding Sobel edges
5.14 Non-maxima suppression for edges
5.15 Corner detection by Yang et al.
5.16 Shi-Tomasi corner-detector
5.17 Harris-Stephens response function
5.18 Harris-Stephens corner- and edge-detector (negative values (edges) are black and positive values (corners) are white)
5.19 Non-maxima suppression for corners
5.20 Restricting feature density
5.21 Computing feature locations and descriptors
5.22 SVD tracking

6.1 Normalised cross-correlation example
6.2 Comparison of template and warped image
6.3 Gradient boundaries of template
6.4 Warp without and with interpolation
6.5 Example of Lucas-Kanade tracker in action
6.6 Line detection with the Hough transform
6.7 Configuration GUI
6.8 Closed-loop control of a nano manipulator in a TEM
6.9 Part of focus stack showing glass fibres
6.10 Results of Depth from Focus
6.11 Human computer interface for controlling a mouse cursor
6.12 Software for vision-based changing of slides
6.13 Quick gesture for displaying the next slide
6.14 Slow gesture for choosing a slide from a menu
6.15 Unibrain Fire-I (a DC1394-compatible Firewire camera)
6.16 Custom algorithm for labelling the corners of a calibration grid
6.17 Result of labelling the corners
6.18 Estimating the pose of the calibration grid
6.19 Custom algorithm for estimating the 3D pose of a marker
6.20 Augmented reality demonstration
6.21 Performance comparison of different array operations
6.22 Processing time of running “m + 1” one-hundred times for different array sizes
6.23 Processing time increasing with length of expression
6.24 Breakdown of processing time for computing “-s” where “s” is an array with one million elements

7.1 The main requirements when designing a programming language or sys-

List of Tables

2.1 Processing steps performed by a typical machine vision system
2.2 Existing FOSS libraries for Machine Vision I/II
2.3 Existing FOSS libraries for Machine Vision II/II
2.4 Ruby notation
2.5 Just-in-time compilers
3.1 Directives for conversion to/from native representation
3.2 Methods for raw memory manipulation
3.3 Generic set of array operations
4.1 Different types of images
4.2 Methods for loading and saving images
4.3 Methods for loading and saving images
5.1 Non-maxima suppression for edges depending on the orientation
6.1 Processing times measured for tasks related to computing “-s” for an array
6.2 Size of OpenCV code for filtering images
6.3 Size of Hornetseye code for all array operations

Listings

2.1 Multi-dimensional “+” operator implemented in C++. Comment lines (preceded with “//”) show the output of the program
2.2 Integrating RMagick and NArray in Ruby. Comment lines (preceded with “#”) show the output of the program
2.3 Using OpenCV in Ruby. Comment lines (preceded with “#”) show the output of the program
2.4 Using NArray in Ruby. Comment lines (preceded with “#”) show the output of the program
2.5 Array operations in Python using NumPy. Comment lines (preceded with “#”) show the output of the program
2.6 Tensor operation with the FTensor C++ library
2.7 Multi-dimensional “+” operator implemented in Ruby. Comment lines (preceded with “#”) show the output of the program
2.8 Arrays in Ruby. Comment lines (preceded with “#”) show the output of the program
2.9 Arrays in GNU Common Lisp. Comment lines (preceded with “;”) show the output of the program
2.10 Lush programming language. Comment lines (preceded with “;”) show

2.26 Array operation compiled with libJIT
3.1 Reflection using missing methods
3.4 Converting arrays to binary data and back
3.5 Manipulating raw data with Malloc objects
3.6 Boxing booleans
3.7 Constructor short cut
3.8 Template classes for integer types
3.9 Boxing floating point numbers
3.10 Composite numbers
3.11 Boxing composite numbers
3.12 Pointer objects
3.13 Boxing arbitrary Ruby objects
3.14 Variable objects and substitution
3.15 Lambda abstraction and application
3.16 Implementing arrays as lazy lookup
3.17 Uniform arrays
3.18 Multi-dimensional uniform arrays
3.19 Array views
3.20 Constant arrays
3.21 Index arrays
3.22 Type matching
3.23 Element-wise unary operations using “Array#collect”
3.24 Short notation for element-wise operations
3.25 Internal representation of unary operations
3.26 Element-wise binary operations using “Array#collect” and “Array#zip”
3.27 Internal representation of binary operations
3.28 Element-wise application of a LUT
3.29 Creating a pseudo colour image
3.30 Warp from equirectangular to azimuthal projection
3.31 Left-associative fold operation in Ruby
3.32 Internal representation of injections
3.33 Various cumulative operations based on injections
3.34 Concise notation for sums of elements
3.35 Tensor operations in Ruby (equivalent to Listing 2.6)
3.36 Argument maximum
3.37 One-dimensional convolutions in Ruby
3.38 Two-dimensional convolutions in Ruby
3.39 Moving average filter implemented using convolutions
3.40 Moving average filter implemented using an integral image
3.41 Conditional selection as element-wise operation

6.5 Lookup table for re-labelling
6.6 Vision-based changing of slides
6.7 Custom algorithm for labelling the corners of a calibration grid
6.8 Webcam viewer implemented using Python and OpenCV
6.9 Webcam viewer implemented using Ruby and Hornetseye
6.10 Sobel gradient viewer implemented using Python and OpenCV
6.11 Sobel gradient viewer implemented using Ruby and Hornetseye

“Plan to throw one away; you will anyhow.”

Fred Brooks - The Mythical Man-Month

“If you plan to throw one away, you will throw away two.”

Craig Zerouni

“How sad it is that our PCs ship without programming languages. Every computer shipped should be programmable - as shipped.”

Ward Cunningham

“I didn’t go to university. Didn’t even finish A-levels. But I have sympathy for those who did.”

Terry Pratchett

1 Introduction

Machine vision is a broad field, and in many cases there are several independent approaches to solving a particular problem. Also, it is often difficult to predict which approach will yield the best results. It is therefore important to preserve the agility of the software so that necessary changes can still be implemented in the final stages of a project.

A traditional application of computer vision is industrial automation, where the cost of implementing a machine vision system eventually needs to be recovered through savings in labour cost, increased productivity, and/or better quality in manufacturing. Most machine vision systems, however, are still implemented using a statically typed programming language such as C, C++, or Java (see Section 2.2). Development and maintenance of large-scale systems using a statically typed language is much more expensive than when using a dynamically typed language (Nierstrasz et al., 2005).

This thesis shows how the dynamically typed programming language Ruby can be used to reduce the cost of implementing machine vision algorithms. A Ruby library is introduced which facilitates rapid prototyping and development of machine vision systems. This chapter is organised as follows:

• Section 1.1 discusses interpreted programming languages
• Section 1.2 introduces the notion of dynamically typed programming languages
• Section 1.3 states the contribution of this thesis
• Section 1.4 gives an outline of the thesis

1.1 Interpreted Languages

Historically, software for machine vision systems was predominantly implemented in compiled languages such as assembler or C/C++. Most compiled languages map efficiently to machine code and do not use a run-time environment for managing variables and data types. Concise and efficient code is a requirement especially for embedded systems with limited processing power and memory (see Figure 1.1 for an example of an embedded system involving computer vision).

Figure 1.1: Optical parking system for a car

The downside of using a compiled language is that a developer has to make changes to the source code, save them in a file, compile that file to create a binary file, and then re-run that binary file. In contrast, interpreted languages offer considerable savings in development time: the developer can enter code and have it run straight away. Figure 1.2 shows that the feedback cycle in an interpreted language is much shorter than that of a compiled language.

Figure 1.2: Feedback cycle in a compiled and in an interpreted language (panels: compiled language, interpreted language)


Even though interpreted languages have been applied to machine vision as early as 1987 (see Mundy (1987)), machine vision systems are still predominantly implemented using compiled languages. The reason is that if an embedded system is produced in large quantities, it is possible to offset the considerable software development cost against small per-unit savings in hardware cost. However this trade-off might become less important with the advent of modern embedded hardware (Figure 1.3, for example, shows the Gumstix board, which is an embedded computer capable of running an operating system).

Figure 1.3: ARM Gumstix boards

It can be argued that the widespread adoption of compiled languages is currently hampering innovation (Nierstrasz et al., 2005). The publication by Roman et al. (2007) demonstrates that robotic projects can greatly benefit from the properties of the interpreted programming language Ruby. Interpreted languages not only allow for concise code, they also make interactive manipulation of data possible, where one can confirm the results immediately.

1.2 Dynamically Typed Languages


Ruby however it is in general impossible to determine whether the value of “x” will always be an integer. For example the value might be a floating point number or a rational number.

C++ (early method binding):

    int test(int x)
    {
      return x + 1;
    }
    // ...
    int y = test(42);
    // ...

Ruby (late method binding):

    def test(x)
      x + 1
    end
    # ...
    y = test 42
    z = test Complex::I

Figure 1.4: Early vs. late m ethod binding
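To make the late binding in Figure 1.4 more concrete, the following minimal sketch calls the Ruby version of "test" with arguments of several of Ruby's core numeric classes. Only the function itself is taken from the figure; the list of arguments is an illustrative assumption. Which implementation of "+" is executed is decided at run-time from the class of "x".

```ruby
# The Ruby function from Figure 1.4: the parameter x has no declared type.
def test(x)
  x + 1 # "+" is looked up on the class of x when the call happens
end

# The same method body works for integers, floats, rationals and complex numbers.
results = [42, 0.5, Rational(1, 2), Complex::I].map { |x| test x }
results.each { |r| puts "#{r.inspect} (#{r.class})" }
```

A C++ compiler has to resolve the addition for the declared type "int" ahead of time, whereas here the method lookup is repeated for every call, which is the source of both the flexibility and the run-time overhead discussed later in this chapter.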

Type safety is a term to describe the fact that static typing prevents certain programming errors such as type mismatches or misspelled method names from entering production code. With static typing it is possible to reject these kinds of errors at compile time. Statically typed languages are ingrained in safety critical systems such as nuclear power plants, airplanes, and industrial robots because of their increased type safety. Figure 1.5 gives an example where the bug in the C++ program is rejected by the compiler. The equivalent Ruby program however discovers the error only at run-time and only for certain input.

C++ (static typing):

    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
      int x = atoi(argv[1]);
      if (x == 0) x += "test";
      // error: invalid conversion from
      // 'const char*' to 'int'
      return 0;
    }

Ruby (dynamic typing):

    x = ARGV[0].to_i
    x += "test" if x == 0

Figure 1.5: Static typing vs. dynam ic typing. Com m ent lines (preceded w ith “//”) show the output of the com piler
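The claim that the Ruby program fails "only at run-time and only for certain input" can be checked with a small sketch. The Ruby code of Figure 1.5 is wrapped in a method called "figure_1_5" (a hypothetical name introduced here) so that it can be called with different inputs instead of reading "ARGV".

```ruby
# Ruby code from Figure 1.5, wrapped in a hypothetical helper method.
def figure_1_5(arg)
  x = arg.to_i
  x += "test" if x == 0 # faulty line: adds a String to an Integer
  x
end

puts figure_1_5("5") # the faulty branch is skipped, so no error occurs

begin
  figure_1_5("0")    # only this input reaches the faulty line
rescue TypeError => e
  puts "run-time error: #{e.class}"
end
```

Unless a test happens to supply an input which converts to zero, the error goes unnoticed, which is why unit tests play a larger role when using a dynamically typed language.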

However statically typed implementations tend to become inflexible. That is, when a developer wants to modify one aspect of the system, the static typing can force numerous rewrites in unrelated parts of the source code (Tratt and Wuyts, 2007). Development and maintenance of large scale systems using a statically typed language is much more expensive than when using a dynamically typed language (Nierstrasz et al., 2005). Heavy users of statically typed languages tend to introduce custom mechanisms to deal with the absence of support for reflection and meta-programming in their language (see for example CERN's C++ framework (Antcheva et al., 2009)).


Though offering some safety, static typing does not prevent programming errors such as numerical overflow or buffer overflow (Tratt and Wuyts, 2007). That is, the efficiency gained by using C or C++ comes at the cost of security (Wolczko et al., 1999). Figure 1.6 shows how numerical overflow occurs if a native integer type of insufficient size is chosen. A well known example is the failure of the first Ariane 5 (shown in Figure 1.7) due to an arithmetic overflow (see talk by Fenwick, 2008). That is, even when using static typing, it is still necessary to use techniques such as software assertions or unit tests to prevent runtime errors from happening.

32-bit integer:

    #include <iostream>
    using namespace std;
    int main(void)
    {
      int x = 2147483648;
      x += 1;
      cout << x << endl; // -2147483647
      return 0;
    }

64-bit integer:

    #include <iostream>
    using namespace std;
    int main(void)
    {
      long x = 2147483648;
      x += 1;
      cout << x << endl; // 2147483649
      return 0;
    }

Figure 1.6: Static typing and numeric overflow. Comment lines (preceded with “//”) show the output of the program

Figure 1.7: Ariane 5 disaster caused by numerical overflow
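For comparison, a minimal Ruby counterpart of the programs in Figure 1.6 is shown below. Ruby integers are not restricted to a native word size, so the addition does not wrap around. This does not make Ruby immune to every numerical problem (floating point values still have finite precision), and the arbitrary-precision arithmetic is slower than native integer arithmetic.

```ruby
# Ruby counterpart of Figure 1.6: no native integer type has to be chosen.
x = 2147483648 # 2**31, too large for a signed 32-bit integer
x += 1
puts x         # no wrap-around occurs

y = 2**64 + 1  # exceeds 64 bits as well; Ruby uses arbitrary precision
puts y
```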

Dynamic typing on the other hand makes it possible to combine integers, rational numbers, complex numbers, vectors, and matrices in a seamless way. The Ruby core library makes use of dynamic typing so that integers, big numbers, floating point numbers, complex numbers, and vectors work together seamlessly (see Section 2.3.3 for more details).
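As a small illustration of this seamless combination, the sketch below mixes several of Ruby's core numeric classes in arithmetic expressions. It only uses classes from the Ruby core; the particular values are arbitrary. Each binary operation coerces its operands at run-time, and the result type depends on the operand types rather than on a declaration.

```ruby
# Mixing core numeric classes; coercion happens when each operation runs.
a = Rational(1, 3) + 1             # Rational and Integer
b = Complex(1, 2) * 2              # Complex and Integer
c = Rational(1, 2) + 0.25          # Rational and Float
d = Complex(0, 1) * Rational(1, 2) # Complex with rational components
[a, b, c, d].each { |v| puts "#{v.inspect} (#{v.class})" }
```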


and the boundaries of Ruby arrays are resized dynamically. Furthermore dynamic typing requires late binding of method calls, which is computationally expensive on current hardware (Paulson, 2007). This thesis tries to address these problems by defining representations of native types in a Ruby extension¹ (see Section 1.3).

1.3 Contributions of this Thesis

The title of this thesis is “Efficient Implementations of Machine Vision Algorithms using a Dynamically Typed Programming Language”. The Ruby extension implemented in the context of this thesis makes it possible for researchers and developers working in the field of image processing and computer vision to take advantage of the benefits offered by this dynamically typed language. The phrase “efficient implementation” was intentionally used in an ambiguous way. It can mean:

• machine efficiency: The run-time performance of the system is sufficient to implement real-time machine vision systems.

• developer efficiency: The programming language facilitates concise and flexible implementations, which means that developers can achieve high productivity.

The contribution of this thesis is a set of computer vision extensions for the existing Ruby programming language. The extensions bring together performance and productivity in an unprecedented way. The Ruby extensions provide

• extensive input/output (I/O) integration for image and video data

• generic array operations for uniform multi-dimensional arrays

  - a set of objects to represent arrays, array views, and lazy evaluations in a modular fashion

  - optimal type coercions for all combinations of operations and data types

The work presented in this thesis brings together several concepts which previously have not been integrated in a single computer vision system:

expressiveness: A library for manipulating uniform arrays is introduced. A generic set of basic operations is used to build computer vision algorithms from the ground up.

lazy evaluation: Lazy evaluation of array operations makes it possible to reduce memory I/O. This facilitates the use of general-purpose computing on graphics processing units (GPGPU, not done as part of this thesis), where memory I/O is the performance bottleneck.

¹ Ruby libraries are generally called “Ruby extensions”.

array views: Shared references make it possible to extract sub-arrays without making a “deep copy” of the array.

transparent just-in-time (JIT) compilation: A JIT compiler and a cache are integrated transparently to achieve real-time performance.

I/O integration: The implementation also provides integration for image and video I/O (see Figure 1.8) as well as the necessary colour space conversions.
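The idea behind lazy evaluation of array operations can be illustrated with a minimal sketch. The classes "Expr", "Vec", "Scalar", and "BinOp" are hypothetical and are not the API of the library developed in this thesis. Operators only build an expression tree, and the tree is evaluated element by element in a single pass, so an expression such as "2 * a + b" does not store "2 * a" as an intermediate array.

```ruby
# Hypothetical classes illustrating lazy element-wise evaluation.
class Expr
  # Turn plain Ruby values (e.g. numbers) into expression nodes.
  def self.wrap(value)
    value.is_a?(Expr) ? value : Scalar.new(value)
  end

  def +(other)
    BinOp.new(:+, self, Expr.wrap(other))
  end

  def *(other)
    BinOp.new(:*, self, Expr.wrap(other))
  end

  # Called by Integer#* etc. for expressions such as "2 * a".
  def coerce(other)
    [Scalar.new(other), self]
  end

  # Evaluate the whole tree in one loop; no intermediate arrays are created.
  def to_a
    Array.new(size) { |i| at(i) }
  end
end

class Vec < Expr
  def initialize(data)
    @data = data
  end

  def size
    @data.size
  end

  def at(i)
    @data[i]
  end
end

class Scalar < Expr
  def initialize(value)
    @value = value
  end

  def size
    nil # a scalar has no size of its own and is broadcast to every element
  end

  def at(_i)
    @value
  end
end

class BinOp < Expr
  def initialize(op, left, right)
    @op, @left, @right = op, left, right
  end

  def size
    @left.size || @right.size
  end

  def at(i)
    @left.at(i).public_send(@op, @right.at(i))
  end
end

a = Vec.new([1, 2, 3])
b = Vec.new([10, 20, 30])
expr = 2 * a + b # builds a tree, nothing is computed yet
p expr.to_a      # => [12, 24, 36]
```

A real implementation would operate on native memory rather than Ruby arrays and would need shape checks; the sketch only shows why deferring evaluation avoids writing and re-reading intermediate results.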

Layers (top to bottom): Application | IRB/FXRI/NaturalDocs/RDoc | Ruby | GNU+Linux operating system

Figure 1.8: Software architecture of machine vision system

The functionality was implemented in a modular way (see Section 3.4). The result is a comprehensive approach to implementing computer vision algorithms.

The type system and the expressions presented in Chapter 3 constitute the library which was developed as part of this thesis. If some of the expressions appear to be part of the Ruby syntax at first sight, it is due to the dynamic nature of the programming language. Although the Ruby programming language was used, this approach could be applied to other dynamically typed languages with sufficient meta-programming support. The approach presented in this thesis could also be used to provide transparent integration of graphics processing units (GPUs) for parallel processing. Finally, the facilitation of succinct implementations of various computer vision algorithms allows for a more formal understanding of computer vision.

1.4 Thesis Outline


implementing machine vision algorithms. Apart from offering productivity gains, dynamically typed languages also make it possible to combine various types and operations seamlessly.

Chapter 2 gives an overview of the state of the art in machine vision software, illustrating the difficulty of achieving performance and productivity at the same time. It will be shown that the performance of the Ruby virtual machine (VM) is significantly lower than the performance achieved with GNU C. But it will also be argued that ahead-of-time (AOT) compilation is incompatible with the goal of achieving productivity.

Chapter 3 is about the core of the work presented in this thesis. Starting with memory objects and native data types, a library for describing computer vision algorithms is introduced. It is demonstrated how this approach facilitates succinct implementations of basic image processing operations. JIT compilation is used to address the issue of performance.

Chapter 4 covers key issues in implementing interfaces for input and output of image data. Image I/O involving cameras, image files, video files, and video displays is discussed. The key issues are colour space compression, image and video compression, low dynamic range (LDR) versus high dynamic range (HDR) imaging, and graphical user interface (GUI) integration.

In Chapter 5 it is shown how different algorithms which are common in the field of computer vision can be implemented using the concepts introduced in Chapter 3 and Chapter 4.

Chapter 6 shows some examples of complete applications implemented using the Hornetseye Ruby extension which was developed as part of this thesis (see page iii). Furthermore a performance comparison is given.

At the end of the thesis Chapter 7 offers conclusions and future work.


“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

Sir Charles Antony Richard Hoare

“I am a historian and a computer programmer, but primarily I am a lawyer. My research, ongoing for a decade, follows a purely experimental paradigm:

1. Try to create freedom by destroying illegitimate power sheltered behind intellectual property law.

2. See what happens.

Early results are encouraging.”

Eben Moglen

State of the Art

This chapter gives an overview of the state of the art in machine vision systems, discusses the features of the Ruby programming language, and points out available JIT compilers:

• Section 2.1 shows the typical structure of an object localisation system

• Section 2.2 gives an overview of a typical object localisation algorithm and how it is implemented

• Section 2.3 characterises the Ruby programming language by describing the paradigms it supports

• Section 2.4 points out different JIT compilers and their properties

• Section 2.5 gives a summary of this chapter

2.1 Object Localisation

The task of an object localisation algorithm is to determine the pose of known objects given a camera image as input. Figure 2.1 shows an overview of a typical object localisation algorithm. The processing steps are explained in Table 2.1.

Figure 2.1 (diagram): Sensor Data → Preprocessing → Key-Point Localisation → Feature Description → Recognition/Tracking → Updated World Model

Table 2.1: Processing steps performed by a typical machine vision system

Processing step | Details
preprocessing | basic operations such as filtering, thresholding, morphology and the like are applied to the image
key-point localisation | a feature extraction method defines feature locations in the image
feature description | the descriptors for the local feature context are computed
recognition/tracking | the features are used to recognise and track known objects in the scene

The processing steps are not mandatory. Some algorithms do not use feature descriptors (e.g. Geometric Hashing (Lamdan and Wolfson, 1988)). Some algorithms for two-dimensional (2D) object localisation do not even use features at all (e.g. Fast Normalised Cross-Correlation (Lewis, 1995)).
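The structure of Table 2.1 can be sketched as a chain of stages, each consuming the output of the previous one. All stages below are hypothetical toy versions chosen for illustration only (a fixed threshold, foreground pixels as "key points", a centroid as "descriptor", and the centroid as a translation-only "pose"); they are not the feature extraction or recognition methods cited above.

```ruby
# Toy stages for the processing steps of Table 2.1 (hypothetical, illustrative).
# An image is represented as an Array of rows of grey values.
preprocess = lambda do |img| # thresholding
  img.map { |row| row.map { |v| v >= 128 ? 1 : 0 } }
end

localise = lambda do |bin| # key points: coordinates of foreground pixels
  points = []
  bin.each_with_index do |row, y|
    row.each_with_index { |v, x| points << [x, y] if v == 1 }
  end
  points
end

describe = lambda do |points| # descriptor: number of points and centroid
  n = points.size
  centroid = n.zero? ? nil : [points.sum { |x, _| x }.fdiv(n), points.sum { |_, y| y }.fdiv(n)]
  { count: n, centroid: centroid }
end

recognise = lambda do |desc| # "pose": here only the position of the object
  desc[:centroid]
end

pipeline = [preprocess, localise, describe, recognise]
image = [[0,   0,   0, 0],
         [0, 200, 200, 0],
         [0, 200, 200, 0],
         [0,   0,   0, 0]]
pose = pipeline.reduce(image) { |data, stage| stage.call(data) }
p pose # => [1.5, 1.5]
```

Because each stage is just an object with a "call" method, stages can be omitted or replaced, which mirrors the remark that not all processing steps are mandatory.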

Current three-dimensional (3D) object recognition and tracking algorithms however are predominantly based on feature extraction and feature matching (e.g. spin image features by Johnson and Hebert (1999), Geometric Hashing (Lamdan and Wolfson, 1988), Bounded Hough Transform (Greenspan et al., 2004), Random Sample Consensus (RANSAC) (Shan et al., 2004)). Approaches based on feature matching are furthermore used to deal with related problems such as real-time Simultaneous Localisation and Mapping (SLAM) (e.g. Davison, 2003; Pupilli, 2006) and 3D modelling (e.g. Pan et al., 2009; Pollefeys et al., 2004; Tomasi and Kanade, 1992; Yan and Pollefeys, 2006).

There are more unconventional techniques (e.g. tensor factorisation (Vasilescu and Terzopoulos, 2007), integral images (Viola and Jones, 2001)) but they are mostly applied to object detection. That is, the algorithms detect the presence of an object but do not estimate its pose.

2.2 Existing FOSS for Machine Vision

A survey of existing free and open source software (FOSS) for machine vision has been conducted in order to find out about commonalities of the algorithms currently in use and about how current computer vision systems are implemented.

Table 2.2 and Table 2.3 give an overview of notable computer vision libraries. The libraries were checked against a set of features. Each check mark signifies a feature being supported by a particular library. One can see that no library completely covers all the features which are typically required to develop an object recognition and tracking system as shown in Figure 2.1.

Table 2.2: Existing FOSS libraries for Machine Vision I/II

Libraries (columns): Blepo, Camellia, CMVision, libCVD, EasyVision, Filters, Framewave, Gamera, Gandalf
Features (rows): Camera Input, Image Files, Video Files, Display, Scripting, Warps, Histograms, Custom Filters, Fourier Transforms, Feature Extraction, Feature Matching, GPL compatible

Table 2.3: Existing FOSS libraries for Machine Vision II/II

Libraries (columns): ITK/VTK, IVT, LTIlib, Lush, Mimas, NASA V.W., OpenCV, SceneLib, VIGRA
Features (rows): Camera Input, Image Files, Video Files, Display, Scripting, Warps, Histograms, Custom Filters, Fourier Transforms, Feature Extraction, Feature Matching, GPL compatible

One can distinguish three different kinds of libraries: statically typed libraries, statically typed extensions for a dynamically typed language, and dynamically typed libraries.

2.2.1 Statically Typed Libraries

Most computer vision libraries are implemented in the statically typed C/C++ language. However C++ has a split type system. There are primitive types which directly correspond to registers of the hardware and there are class types which support inheritance and dynamic dispatch. In C++ not only integers and floating point numbers but also arrays are primitive types. However these are the most relevant data types for image processing. To implement a basic operation such as adding two values so that it will work on different types, one needs to make extensive use of template meta-programming. That is, all combinations of operations, element-type(s), and number of dimensions have to be instantiated separately. For example the Framewave¹ C-library has 42 explicitly instantiated different methods for multiplying arrays.

For this reason most libraries do not support all possible combinations of element-types and operations. Assume a library supports the following 10 binary operations:

• addition (“+”)
• subtraction (“-”)
• division (“/”)
• multiplication (“*”)
• exponent (“**”)
• greater or equal (“>=”)
• greater than (“>”)
• less or equal (“<=”)
• less than (“<”)
• equal to (“==”)

Furthermore assume that it supports the following types as scalars and array elements:

• 6 integer types: 8-, 16-, and 32-bit, signed/unsigned
• 2 floating-point types: single/double precision

Finally for every binary operation there are the following variations:

• scalar-array operation
• array-scalar operation
• array-array operation

¹ http://framewave.sourceforge.net/

This results in 10 · 8 · 8 · 3 = 1920 possible combinations of operations and element-types. That is, to fully support the 10 binary operations on these element-types requires 1920 methods to be defined either directly or by means of C++ template programming. That is, static typing and ahead-of-time compilation lead to an explosion of combinations of basic types and operations. Listing 2.1 shows how much code is required when using C++ templates to implement an element-wise “+” operator (array-array operation only) for the “boost::multi_array” data types provided by the Boost library. The implementation works on arrays of arbitrary dimension and arbitrary element-type.
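The count of 1920 can be reproduced by enumerating the combinations explicitly. The type names used below are hypothetical labels for the six integer and two floating point types listed above.

```ruby
# 10 binary operations, 8 element types for each of the two operands,
# and 3 variants (scalar-array, array-scalar, array-array).
operations = %i[+ - / * ** >= > <= < ==]
types = %w[int8 uint8 int16 uint16 int32 uint32 float32 float64]
variants = %i[scalar_array array_scalar array_array]

combinations = operations.product(types, types, variants)
puts combinations.size # 10 * 8 * 8 * 3
```

Each of these tuples corresponds to one method which a statically typed library would have to provide, either written by hand or instantiated from a template.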

Static typing not only leads to an explosion of methods to instantiate. A related problem caused by static typing is that when a developer wants to modify one aspect of the system, the static typing can force numerous rewrites in unrelated parts of the source code (Tratt and Wuyts, 2007). Static typing enforces unnecessary “connascence” (a technical term introduced by Weirich (2009), also see Appendix A.1) which interferes with the modularity of the software. In practice this causes problems when implementing operations involving scalars, complex numbers, and RGB-triplets (Wedekind et al., b). Figure 2.2 shows that binary operations are not defined for some combinations of the argument types involved. That is, it is not sufficient to simply use C++ templates to instantiate all combinations of operations and argument types. One also has to address the problem that binary operations usually are meaningful only for some combinations of element-types.

Figure 2.2: Binary operations for different element types (Wedekind et al., b). Panels: x + y, x − y; x · y, x / y; x < y, x > y; each panel covers combinations of the argument types Scalar, Complex, and RGB.

Listing 2.1: Multi-dimensional “+” operator implemented in C++. Comment lines (preceded with “//”) show the output of the program

    #include <algorithm>
    #include <boost/multi_array.hpp> // 3726 lines of code
    #include <iostream>

    using namespace boost;

    template< typename T >
    T &multi_plus(T &a, const T &b, const T &c) {
      a = b + c;
      return a;
    }

    template< template< typename, size_t, typename > class Arr,
              typename Alloc, typename T, size_t N >
    detail::multi_array::sub_array< T, N > multi_plus
      (detail::multi_array::sub_array< T, N > a,
       const Arr< T, N, Alloc > &b, const Arr< T, N, Alloc > &c) {
      typename Arr< T, N, Alloc >::const_iterator j = b.begin(), k = c.begin();
      for (typename detail::multi_array::sub_array< T, N >::iterator i =
             a.begin(); i != a.end(); i++, j++, k++)
        multi_plus(*i, *j, *k);
      return a;
    }

    template< template< typename, size_t, typename > class Arr,
              typename Alloc, typename T, size_t N >
    Arr< T, N, Alloc > &multi_plus
      (Arr< T, N, Alloc > &a,
       const Arr< T, N, Alloc > &b, const Arr< T, N, Alloc > &c) {
      typename Arr< T, N, Alloc >::const_iterator j = b.begin(), k = c.begin();
      for (typename Arr< T, N, Alloc >::iterator i = a.begin();
           i != a.end(); i++, j++, k++)
        multi_plus(*i, *j, *k);
      return a;
    }

    template< template< typename, size_t, typename > class Arr,
              typename Alloc, typename T, size_t N >
    multi_array< T, N > operator+
      (const Arr< T, N, Alloc > &a, const Arr< T, N, Alloc > &b) {
      array< size_t, N > shape;
      std::copy(a.shape(), a.shape() + N, shape.begin());
      multi_array< T, N > retVal(shape);
      multi_plus(retVal, a, b);
      return retVal;
    }

    int main(void) {
      multi_array< int, 2 > a(extents[2][2]);
      a[0][0] = 1; a[0][1] = 2; a[1][0] = 3; a[1][1] = 4;
      multi_array< int, 2 > b(extents[2][2]);
      b[0][0] = 5; b[0][1] = 4; b[1][0] = 3; b[1][1] = 2;
      multi_array< int, 2 > r(a + b);
      std::cout << "[[" << r[0][0] << ", " << r[0][1] << "], ["
                << r[1][0] << ", " << r[1][1] << "]]" << std::endl;
      // [[6, 6], [6, 6]]
      return 0;
    }

Finally using a com bination of m ultiple libraries is hard, because each library usually comes with its own set of data types for representing images, arrays, m atrices, and other elements of signal processing.
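For contrast with Listing 2.1, the following minimal sketch implements an element-wise "+" for nested Ruby arrays of arbitrary dimension. The helper name "multi_plus" merely mirrors the listing; it is hypothetical and not part of any library. Because the innermost "+" is dispatched at run-time, the same code accepts any pair of element types supporting "+". Operations which are not meaningful for an element type, such as comparing complex numbers (cf. Figure 2.2), are simply not defined by Ruby's Complex class. The sketch is far slower than compiled code, which is the performance problem addressed later in this thesis.

```ruby
# Element-wise "+" for nested arrays (hypothetical helper, for illustration).
def multi_plus(a, b)
  if a.is_a?(Array)
    unless b.is_a?(Array) && a.size == b.size
      raise ArgumentError, 'shape mismatch'
    end
    a.zip(b).map { |x, y| multi_plus(x, y) } # recurse into the next dimension
  else
    a + b # element-wise operation, resolved at run-time
  end
end

p multi_plus([[1, 2], [3, 4]], [[5, 4], [3, 2]]) # same data as Listing 2.1
p multi_plus([[1, 2], [3, 4]], [[0.5, Rational(1, 2)], [Complex::I, 1]])
p Complex(1, 2).respond_to?(:<) # comparison is undefined for complex numbers
```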

2.2.2 Statically Typed Extensions

Some computer vision libraries come with bindings in order to use them as an extension to a dynamically typed language. For example for the OpenCV² library there are Python bindings (PyCV³) as well as Ruby bindings (opencv.gem⁴). Some projects (e.g. the Gamera optical character recognition (OCR) software (Droettboom et al., 2003) and the Camellia⁵ Ruby extension) use the Simplified Wrapper and Interface Generator (SWIG⁶) to generate bindings from C/C++ header files. This allows one to use a statically typed extension in an interpreted language, and it becomes possible to develop machine vision software interactively without sacrificing performance.

Open classes and dynamic typing make it possible to seamlessly integrate the functionality of one library into the application programming interface (API) of another. For example Listing 2.2 shows how one can extend the NArray⁷ class to use the RMagick⁸ library for loading images. The class method “NArray.read” reads an image using the RMagick extension. The image is exported to a Ruby string which in turn is imported into an object of type “NArray”. The image used in this example is shown in Figure 2.3.

Figure 2.3: Low resolution im age of a circle

However supporting all possible combinations of types and operations with a statically typed library is hard (see Section 2.2.1). In practice most computer vision extensions only provide a subset of all combinations. Listing 2.3 shows that the OpenCV library, for example, supports element-wise addition of 2D arrays of 8-bit unsigned integers (line 3).

² http://opencv.willowgarage.com/
³ http://pycv.sharkdolphin.com/
⁴ http://rubyforge.org/projects/opencv/
⁵ http://camellia.sourceforge.net
⁶ http://swig.org/
⁷ http://narray.rubyforge.org/

Listing 2.2: Integrating RMagick and NArray in Ruby. Comment lines (preceded with “#”) show the output of the program

    require 'narray'
    require 'RMagick'
    class NArray
      def NArray.read(filename)
        img = Magick::Image.read(filename)[0]
        str = img.export_pixels_to_str 0, 0, img.columns, img.rows, "I",
                                       Magick::CharPixel
        to_na str, NArray::BYTE, img.columns, img.rows
      end
    end
    arr = NArray.read 'circle.png'
    arr / 128
    # NArray.byte(20,20):
    # [ [ 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1 ],
    #   [ 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 ],
    #   [ 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1 ],
    #   [ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 ],
    #   [ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 ],
    #   [ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ],
    #   [ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ],
    #   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
    #   [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
    #   ...

But trying to add elements of 8-bit unsigned and 16-bit unsigned integers will cause an exception (line 5).

Listing 2.3: Using OpenCV in Ruby. Comment lines (preceded with “#”) show the output of the program

    1 require 'opencv'
    2 include OpenCV
    3 CvMat.new(6, 2, CV_8U) + CvMat.new(6, 2, CV_8U)
    4 # <OpenCV::CvMat:2x6,depth=cv8u,channel=3>
    5 CvMat.new(6, 2, CV_8U) + CvMat.new(6, 2, CV_16U)
    6 # (irb):4: warning: OpenCV error code (-205) : cvAdd (840 in cxarithm.cpp)
    7 # OpenCV::CvStatusUnmatchedFormats:
    8 #     from (irb):4:in `+'
    9 #     from (irb):4

Other libraries such as EasyVision⁹ (an extension for Haskell) even have different method names depending on the types of arguments involved, for example “absDiff8u” to compute the element-wise absolute difference of arrays of 8-bit unsigned integers or “sqrt32f” to compute the element-wise square root of arrays of 32-bit floating point values.

In contrast to the previously mentioned libraries, the NArray¹⁰ (Tanaka, 2010a,b) Ruby extension supports adding arrays with different element-types (see Listing 2.4). The library also does optimal return type coercions. For example adding an array with single precision complex numbers (“NArray::SCOMPLEX”) and an array with double precision floating point numbers (“NArray::DFLOAT”) will result in an array of double precision complex numbers (“NArray::DCOMPLEX”). In contrast to OpenCV however, the NArray library does not support unsigned integers.

⁹ http://perception.inf.um.es/easyVision/
¹⁰ http://narray.rubyforge.org/

Listing 2.4: Using NArray in Ruby. Comment lines (preceded with “#”) show the output of the program

    require 'narray'
    a = NArray.byte 6, 2
    # NArray.byte(6,2):
    # [ [ 0, 0, 0, 0, 0, 0 ],
    #   [ 0, 0, 0, 0, 0, 0 ] ]
    b = NArray.sint 6, 2
    # NArray.sint(6,2):
    # [ [ 0, 0, 0, 0, 0, 0 ],
    #   [ 0, 0, 0, 0, 0, 0 ] ]
    a + b
    # NArray.sint(6,2):
    # [ [ 0, 0, 0, 0, 0, 0 ],
    #   [ 0, 0, 0, 0, 0, 0 ] ]
    2 * a + b
    # NArray.sint(6,2):
    # [ [ 0, 0, 0, 0, 0, 0 ],
    #   [ 0, 0, 0, 0, 0, 0 ] ]

Listing 2.5: Array operations in Python using NumPy. Comment lines (preceded with “#”) show the output of the program

    1 from numpy import *
    2 a = array([[1, 2], [3, 4]], dtype = int8)
    3 b = array([[1, 2], [3, 4]], dtype = uint16)
    4 a + b
    5 # array([[2, 4],
    6 #        [6, 8]], dtype=int32)
    7 2 * a + b
    8 # array([[ 3,  6],
    9 #        [ 9, 12]], dtype=int32)

A similar but more sophisticated library is NumPy¹¹ for Python (also see Oliphant, 2006). NumPy also offers a C-API which makes it possible to define custom element-types. Listing 2.5 shows that NumPy supports unsigned integer as well as signed integer types. In contrast to NArray, the result of the type coercion is an array of 32-bit integers (line 4). Similar to the NArray library, NumPy is implemented in C and uses tables of function pointers to perform operations on combinations of element-types.
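The kind of result-type coercion discussed for NArray can be modelled with a small lookup function. The sketch below is hypothetical: the type symbols mirror the NArray constants named in the text, but the rule is an assumption chosen to reproduce the two cases stated above (8-bit and 16-bit integers giving 16-bit integers as in Listing 2.4, and SCOMPLEX combined with DFLOAT giving DCOMPLEX). It is not the implementation used by NArray or NumPy.

```ruby
# Hypothetical element-type descriptions: [kind, bits, complex?]
ELEMENT_TYPES = {
  byte:     [:int, 8, false],
  sint:     [:int, 16, false],
  int:      [:int, 32, false],
  sfloat:   [:float, 32, false],
  dfloat:   [:float, 64, false],
  scomplex: [:float, 32, true],
  dcomplex: [:float, 64, true]
}.freeze

# Result type of a binary operation on two element types (assumed rule).
def coerce_types(a, b)
  kind_a, bits_a, complex_a = ELEMENT_TYPES.fetch(a)
  kind_b, bits_b, complex_b = ELEMENT_TYPES.fetch(b)
  # Two integer types: the wider one wins (signedness is ignored here).
  if kind_a == :int && kind_b == :int
    return bits_a >= bits_b ? a : b
  end

  # Otherwise keep the highest floating point precision of the operands ...
  float_bits = [[kind_a, bits_a], [kind_b, bits_b]]
               .select { |kind, _| kind == :float }
               .map(&:last)
               .max
  # ... and produce a complex result if either operand is complex.
  if complex_a || complex_b
    float_bits == 64 ? :dcomplex : :scomplex
  else
    float_bits == 64 ? :dfloat : :sfloat
  end
end

p coerce_types(:byte, :sint)       # case shown in Listing 2.4
p coerce_types(:scomplex, :dfloat) # case described in the text
```

Encoding the rule as a function of type properties, instead of a hand-written table with an entry per pair, is one way to keep the number of cases manageable as element types are added.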

The problem with this approach is that the last operation shown in Listing 2.4 as well as in Listing 2.5 (multiplying an array by two and adding another array) creates an array as intermediate result. That is, the result of the scalar-array multiplication is written to memory and then read back again when the array-array addition is performed. Since the


Listing 2.6: Tensor operation with the FTensor C + + library

    Index< 'i', 3 > i;
    Index< 'j', 3 > j;
    Index< 'k', 3 > k;
    Tensor2< double, 3, 3 > r, a, b;
    r(i, k) = a(i
