On Simultaneous Localization and Mapping inside the Human Body (Body-SLAM)

2014-04-28

Guanqun Bao

Worcester Polytechnic Institute

Repository Citation

Bao, G. (2014). On Simultaneous Localization and Mapping inside the Human Body (Body-SLAM). Retrieved from

On Simultaneous Localization and Mapping inside the Human Body (Body-SLAM)

by

Guanqun Bao

A Dissertation

Submitted to the Faculty

of the

WORCESTER POLYTECHNIC INSTITUTE

In partial fulfillment of the requirements for the

Degree of Doctor of Philosophy

in

Electrical and Computer Engineering

April 2014

APPROVED:

Professor Kaveh Pahlavan, Major Thesis Advisor

Professor Yehia Massoud, Head of Department

Professor Lifeng Lai, ECE Dept., WPI

Professor Emmanuel Agu, CS Dept., WPI

Abstract

Doctor of Philosophy

Body-SLAM: Simultaneous Localization and Mapping inside Human Body by Guanqun Bao

Wireless capsule endoscopy (WCE) offers a patient-friendly, non-invasive and painless investigation of the entire small intestine, which conventional wired endoscopic instruments can barely reach. As a critical component of the capsule endoscopic examination, physicians need to know the precise position of the endoscopic capsule in order to identify the position of an intestinal disease after it is detected by the video source. To define the position of the endoscopic capsule, we need a map of the inside of the human body. However, since the shape of the small intestine is extremely complex and the RF signal propagates differently in the non-homogeneous body tissues, accurate mapping and localization inside the small intestine is very challenging. In this dissertation, we present an in-body simultaneous localization and mapping technique (Body-SLAM) to enhance the positioning accuracy of the WCE inside the small intestine and reconstruct the trajectory the capsule has traveled. In this way, the positions of intestinal diseases can be accurately located on the map of the inside of the human body, which facilitates the follow-up therapeutic operations. The proposed approach takes advantage of data fusion from two sources that come with the WCE: image sequences captured by the WCE's embedded camera and the RF signal emitted by the capsule. The approach estimates the speed and orientation of the endoscopic capsule by analyzing displacements of feature points between consecutive images. It then integrates this motion information with the RF measurements by employing a Kalman filter to smooth the localization results and generate the route that the WCE has traveled. The performance of the proposed motion tracking algorithm is validated using empirical data from patients, and this motion model is later imported into a virtual testbed to test the performance of alternative Body-SLAM algorithms.
Experimental results show that the proposed Body-SLAM technique is able to provide accurate tracking of the WCE with an average error of less than 2.3 cm.

Acknowledgements

First and foremost, I would like to express my deepest appreciation to my advisor, Professor Kaveh Pahlavan, not only for his guidance and support in academics and research, but also for enlightening me by sharing his insights and life experience. I cannot remember how many times Professor Pahlavan provided creative discussions and advice when I was stuck with my research. He inspired me with his short didactic stories, which contain the true philosophy of life. My work and accomplishments were only possible because of his help and encouragement.

I would also like to thank the members of my PhD committee: Prof. Allen H. Levesque for always being kind and supportive and for bringing me into this prestigious research lab, Professor Massoud for sharing his insights when I was hesitating between life choices, Professor Agu for his excellent lectures on Computer Graphics, which laid down the fundamentals of my thesis, Professor Lai for his valuable suggestions and comments on my thesis, and Professor Kamran Sayrafian for being my external, off-campus committee member. I would also like to extend my thanks to Dr. David Cave from UMass Medical Center for introducing us to the field of localization for wireless capsule endoscopy and for sharing his expertise with us.

I would also like to show my appreciation to the former and current members of the CWINS lab: Dr. Yunxing Ye, Dr. Jie He, Ruijun Fu, Shen Li, Jin Chen, Xin Zheng, Zhuoran Liu, Yishuang Geng, Bader Alkandari and Fardad Askarzadeh, for their direct or indirect help in preparing this thesis. Special thanks to Liang Mi for working with me on the emulation testbed and for being so hardworking and productive.

Finally, and most importantly, I would like to thank my wife, Zhijiao Wang, for her longstanding support, encouragement and unwavering love during the past six years. I thank my parents, Yuling Zhao and Lichun Bao, for their faith in me. Without their support I would not have had the chance to achieve what I have today.

Contents

Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Abbreviations
Symbols

1 Introduction
1.1 Evolution of Wireless Capsule Endoscopy (WCE)
1.2 Motivation
1.3 Contributions
1.4 Outline of the Dissertation

2 Challenges in WCE Localization
2.1 Overview of Wireless Capsule Endoscopy (WCE)
2.2 Literature Review
2.3 RF Localization Techniques
2.3.1 RSS based techniques
2.3.2 ToA based techniques
2.3.3 Localization Algorithms
2.3.3.1 least square algorithm
2.3.3.2 maximum likelihood algorithm
2.3.4 Cramer-Rao Lower Bound (CRLB)
2.4 Challenges of Localization inside Small Intestine

3 Design of Algorithms for Body-SLAM
3.1 Formulation of Body-SLAM
3.2 Motion Tracking using Endoscopic Images
3.2.1 Analyzing the Content of Endoscopic Images
3.2.1.1 Image segmentation using SRM
3.2.1.2 Image classification using SVM classifier
3.2.2 Feature Points Matching
3.2.3 Image Unrolling
3.2.4 Speed Estimation
3.2.5 Direction of Moving Estimation
3.3 Data Fusion of Visual and RF Information
3.3.1 Kalman Filter
3.3.2 Relative Position Predictions using Images
3.3.3 Absolute Position Measurements by RF Localization

4 Performance Evaluation of Body-SLAM
4.1 Empirical Results of Motion Tracking
4.1.1 Speed Estimation using PillCam COLON 2
4.1.2 Statistical Speed Modeling
4.2 Design of Testbed for Performance Evaluation
4.2.1 Visual Component
4.2.1.1 Physical testbed
4.2.1.2 Virtual testbed
4.2.2 RF Component
4.2.2.1 RF propagation emulation using FDTD
4.2.2.2 RSS vs ToA
4.3 Performance Evaluation for Body-SLAM

5 Conclusion and Future Direction
5.1 Conclusion
5.2 Future Direction

A Appendix: Full Publication List
A.1 Related to the Thesis
A.2 Not Related to the Thesis

B Appendix Tutorial

List of Figures

2.1 The architecture of WCE
2.2 Wireless Capsule Endoscopy
2.3 A typical RF localization system
2.4 A typical 3D pattern of body mounted sensors used as reference points of the performance evaluation scenario for localization of the WCE
3.1 Overall flow chart of Body-SLAM
3.2 Two basic categories of endoscopic images
3.3 Two sequences of segmented images with Q = 16. Top 2 rows: FT; bottom 2 rows: FL
3.4 Illustration of feature mapping using Kernel function
3.5 A WCE moving inside the small intestine
3.6 Feature matching between two consecutive images using A-SIFT
3.7 Image acquisition system of WCE
3.8 The process of "unrolling" the cylindrical image
3.9 Speed estimation
3.10 Direction of moving of the capsule
3.11 Direction of moving estimation
3.12 A typical RF localization system
3.13 A complete flowchart of data fusion of images and RF measurements using a Kalman filter
4.1 Some typical landmarks for the WCE
4.2 PillCam COLON 2 with double cameras
4.3 Speed estimation results from PillCam COLON 2 double cameras
4.4 Statistics of speed estimation using PillCam COLON 2 double cameras
4.5 PDF of the speed estimation from different individuals
4.6 CDF of the speed estimation from different individuals
4.7 Speed estimation results of a sequence of real endoscopic images
4.8 A typical speed pattern of moving fast
4.9 A typical speed pattern of moving slow
4.10 Design of emulation testbed for quantitative performance evaluation
4.11 A physical visual model for the small intestine (a) wired endoscopic camera (b) appearance of the physical model (c) pictures taken from inside the physical model
4.12 Mapping the physical testbed into virtual 3D space
4.13 3D testbed
4.14 Emulated endoscopic images from virtual visual testbed
4.15 3D path generation from a 3D GI tract model
4.16 Emulation testbed set up
4.17 RF propagation setup using SEMCAD X
4.18 RSS versus distance (left) and Time-of-Arrival (TOA) versus distance (right) inside human body
4.20 Result of the motion tracking compared with ground truth
4.21 Mean square error (MSE) in the motion tracking process
4.22 Localization results of different algorithms and performance evaluation
4.23 Error distributions of different algorithms
4.24 Performance evaluation by CDF plot of different algorithms
B.1 Generating the path inside human body

List of Tables

2.1 FDA-approved wireless capsule systems and specifications
2.2 Parameters for the statistical implant to body surface pathloss model
4.1 Motion tracking performance for each step

Abbreviations

ASIFT Affine Scale Invariant Feature Transform
BAN Body Area Network
CDF Cumulative Distribution Function
CT Computed Tomography
CRLB Cramer-Rao Lower Bound
FDA Food and Drug Administration
FCC Federal Communications Commission
FP Feature Points
GI Gastrointestinal
GPS Global Positioning System
HOG Histogram of Oriented Gradients
IPS Indoor Positioning System
ISM Industrial, Scientific and Medical
IMU Inertial Measurement Unit
LBP Local Binary Pattern
LS Least Square
KF Kalman Filter
MLE Maximum Likelihood Estimation
MSE Mean Square Error
MRI Magnetic Resonance Imaging
NIST National Institute of Standards and Technology
PDF Probability Density Function
RSS Received Signal Strength
SNR Signal to Noise Ratio
SLAM Simultaneous Localization And Mapping
SRM Statistical Region Merging
SIFT Scale Invariant Feature Transform
ToA Time of Arrival

Symbols

$I_n$ n×n identity matrix
$(\cdot)^T$ matrix transpose
$(\cdot)^H$ matrix conjugate transpose
$|S|_i$ ith column of matrix S
$|S|_{i,j}$ (i, j)th element of matrix S
$\mathrm{sign}(\cdot)$ signum function
$\mathrm{diag}(A)$ vector resulting from extraction of diagonal elements of A
$\mathrm{tr}(\cdot)$ matrix trace
$\nabla(f)$ gradient with respect to f
$E(\cdot)$ expectation
$\delta(\cdot)$ discrete Kronecker delta function
$Bel^{-}(\cdot)$ prior belief
$Bel^{+}(\cdot)$ posterior belief

Introduction

Many of the profound innovations in science and engineering start with metaphors presented in science fiction. The wireless information networking industry was motivated by Captain Kirk's communicator in the 1960s science fiction series “Star Trek”. The idea was formed in the early 1980s; the Federal Communications Commission (FCC) released the Industrial, Scientific and Medical (ISM) bands; the IEEE 802.11 standardization committee created the WLAN standard in 1997 [1]. After almost half a century, modern smart phones are what the evolution of the “Star Trek” communicator fantasy brought to us. Recently, another 1960s science fiction story, “Fantastic Voyage”, in which a spacecraft and its crew were shrunk to become a micro-device capable of traveling inside the human body to remove a brain clot, has stimulated a new wave of innovative science and engineering for the Body Area Network (BAN) [2–5]. That spacecraft lost its navigation capabilities and went through an unguided, dramatic journey within the human body before exiting through tears from the eye of the human subject. Today, wireless endoscopic capsules travel inside the digestive system in the same way as the spacecraft in “Fantastic Voyage”, and one can envision the emergence of a number of other similar applications for micro-robots inside the human body.

1.1 Evolution of Wireless Capsule Endoscopy (WCE)

Endoscopy [6] is a medical procedure used to examine the interior wall of the digestive system. According to a study conducted in 2002 [7], approximately 19 million people in the United States were estimated to be affected by disorders of the small intestine. This statistic indicates that advancements in endoscopy technology are well worth investigating. With the conventional endoscopic instrument, a long flexible tube with a miniature camera needs to be inserted through the mouth or the anus in order to get into the gastrointestinal (GI) tract. Owing to its rigidity and large size, it causes much discomfort to the patient undergoing the procedure. This generally limits the willingness of patients to have their GI tract examined regularly. Furthermore, the inability to reach the entire small intestine is a significant shortcoming of the current wired endoscope.

Wireless Capsule Endoscopy (WCE) [8–11], a significant step in the effort to develop a more effective endoscopy technique, was invented to overcome the above limitations. The first WCE prototype for the small intestine was approved by the Food and Drug Administration (FDA) in 2001. Over subsequent years, this technology has evolved into one of the most popular non-invasive imaging tools for the diagnosis of intestinal disease. A WCE is a pill-shaped device which consists of a short focal length CMOS camera, light source, battery and radio transmitter [12,13]. After the endoscopic capsule is swallowed by a patient, this miniature device, propelled by peristalsis of the GI tract, begins to work […] 000 images) while moving along the GI tract. At the same time, images are sent out wirelessly in the Ultra High Frequency (UHF) band at 432 MHz to a small portable recorder attached to the waist [14]. The images are subsequently downloaded from the portable recorder to a workstation for off-line analysis. The whole examination takes about 8 h; during this period, the patient does not need to be confined to a hospital or clinic environment and is free to continue their daily routine. Up to now, WCE has been used to detect the following diseases [15–17]: small intestinal bleeding, Crohn's disease, ulcers, tumors, vascular lesions and colon cancers.

1.2 Motivation

Although WCE provides a non-invasive wireless imaging technology for observing the entire GI tract, one significant drawback of this technology is that the capsule cannot localize itself during its journey of several hours. Therefore, when an abnormality is detected by the video source, the physicians have limited idea of where the abnormality is located, which prevents follow-up therapeutic operations from being executed immediately. Having a precise localization system for the endoscopic capsule would therefore greatly enhance the benefits of WCE.

However, localization of the WCE inside the human body is not trivial. Some fundamental technical challenges make accurate localization inside the human body a difficult task.

• First, we do not have a clear map of the inside of the human body. A map of the digestive system plays a very important role in refining the localization results [18–20] since […] computed tomography (CT) and magnetic resonance imaging (MRI) tools are not able to provide enough resolution to extract the path of the small intestine.

• Second, conventional single-source localization techniques, for example RF localization techniques, cannot provide satisfactory localization results due to the non-homogeneity and severe attenuation of body tissues [21]. We need to design more sophisticated hybrid localization algorithms that integrate all possible data sources to enhance the localization accuracy. This requires researchers with a multidisciplinary background including wireless localization, robotics and image processing.

• Third, validation of existing localization algorithms is challenging [22]. After the capsule is swallowed by the patient, we have limited control of the endoscopic capsule. Exploratory clinical procedures such as planar X-ray imaging and ultrasound cannot easily be used to verify the position and motion status of the capsule due to their high cost and potential risk to the patient's health.

• Last but most importantly, conducting experiments inside the human body is extremely difficult. As mentioned previously, there are practical challenges in verifying the performance of any localization algorithm. Moreover, because human subjects differ from one another, we need a uniform platform for comparative performance evaluation of different algorithms.

These challenges have made the design of an accurate localization system for the WCE inside the human body an unsolved engineering problem for more than 13 years. This became the motivation of my research: to design a localization system that is able to precisely localize the endoscopic capsule as it travels along the digestive system and meanwhile […]

1.3 Contributions

To meet the challenges introduced above, in this dissertation we present an in-body simultaneous localization and mapping technique (Body-SLAM) to enhance the positioning accuracy of the WCE inside the small intestine and meanwhile reconstruct the trajectory the capsule has traveled. The contributions of this multi-disciplinary dissertation are:

• Design and performance evaluation of a Body-SLAM algorithm to accurately localize the position of the WCE and reconstruct the 3D map of the path the capsule has traveled. The proposed Body-SLAM technique estimates the speed and orientation of the endoscopic capsule by analyzing displacements of feature points between consecutive images, and this motion information is integrated with the RF measurements by employing a Kalman filter to smooth the localization results and the generated 3D map.

• To achieve this objective, we modeled the motion of the endoscopic capsule using empirical data obtained from actual patients. This motion model is further imported into an emulation testbed for performance evaluation.

• We designed a testbed for performance evaluation of hybrid localization algorithms that benefit from the content of the endoscopic images as well as the features of the RF signal emitted from the video capsule. We used this testbed to demonstrate the effectiveness of hybrid localization algorithms for Body-SLAM inside the small intestine.
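The fusion idea in the first contribution can be illustrated with a minimal sketch. It is not the filter designed in Chapter 3: it assumes a position-only state that is filtered independently on each axis, and the function name `kalman_fuse` and the variances `q`, `r` and `p0` are hypothetical placeholders rather than values from this dissertation. The image-based displacement drives the prediction step and the RF position fix drives the correction step.

```python
def kalman_fuse(initial, displacements, rf_fixes, q=1e-4, r=2.5e-3, p0=1e-2):
    """Fuse image-derived displacements with RF position fixes.

    initial       -- starting position, e.g. (x, y, z) in meters
    displacements -- per-frame motion estimated from the images
    rf_fixes      -- per-frame absolute positions from RF localization
    q, r, p0      -- process, measurement and initial variances
                     (illustrative values, not taken from the dissertation)
    Each axis is filtered on its own, which assumes diagonal covariances.
    """
    position = list(initial)
    variance = [p0] * len(position)
    track = []
    for step, fix in zip(displacements, rf_fixes):
        for k in range(len(position)):
            # Predict: dead-reckon with the image-based motion estimate.
            position[k] += step[k]
            variance[k] += q
            # Update: pull the prediction toward the RF measurement.
            gain = variance[k] / (variance[k] + r)
            position[k] += gain * (fix[k] - position[k])
            variance[k] *= 1.0 - gain
        track.append(tuple(position))
    return track
```

With a small `q` relative to `r`, the filter trusts the image-based motion more than any single RF fix, which is what smooths scattered RF estimates into a continuous route.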

1. K. Pahlavan, G. Bao, Y. Ye, S. Makarov, U. Khan ... K. Sayrafian, “RF localization for wireless video capsule endoscopy”, International Journal of Wireless Information Networks, Vol. 19 (4), pp. 326-340, 2012.

2. G. Bao, Y. Ye, U. Khan, X. Zheng and K. Pahlavan, “Modeling of the Movement of the Endoscopy Capsule inside GI Tract based on the Captured Endoscopic Images”, The 2012 International Conference on Modeling, Simulation and Visualization Methods (MSV), Las Vegas, USA, July, 2012.

3. G. Bao and K. Pahlavan, “Motion Estimation of the Endoscopy Capsule using Region-based Kernel SVM Classifier”, 2013 IEEE International Conference on Electro/Information Technology (EIT), Rapid City, SD, May 9-11, 2013.

4. G. Bao, L. Mi and K. Pahlavan, “Emulation on Motion Tracking of Endoscopic Capsule inside Small Intestine”, 2013 World Congress in Computer Science, Computer Engineering, and Applied Computing (WORLDCOMP'13), Las Vegas, USA, 2013.

5. G. Bao, L. Mi and K. Pahlavan, “A Video Aided RF Localization Technique for the Wireless Capsule Endoscope (WCE) inside Small Intestine”, 8th International Conference on Body Area Networks, Boston, Massachusetts, United States, September 30 - October 2, 2013.

6. L. Mi, G. Bao and K. Pahlavan, “Design and Validation of a Virtual Environment for Experimentation inside the Small Intestine”, 8th International Conference on Body Area Networks, Boston, Massachusetts, United States, September 30 - October 2, 2013.

7. R. Fu, G. Bao and K. Pahlavan, “Activity Classification with Empirical RF Propagation Modeling”, 8th International Conference on Body Area Networks, Boston, […]

8. L. Mi, G. Bao and K. Pahlavan, “Geometric Estimation of Intestinal Contraction for Motion Tracking of Video Capsule Endoscope”, SPIE Medical Imaging: Image-Guided Procedures, Robotic Interventions, and Modeling, San Diego, California, February 15-20, 2014.

9. G. Bao, L. Mi, Y. Geng and K. Pahlavan, “A Computer Vision based Speed Estimation Technique for Localizing the Wireless Capsule Endoscope inside Small Intestine”, submitted to Signal Processing Letters, IEEE, April, 2014.

10. G. Bao, L. Mi, Y. Geng, M. Zhou and K. Pahlavan, “A Video-based Speed Estimation Technique for Localizing the Wireless Capsule Endoscope inside Gastrointestinal Tract”, submitted to IEEE Engineering in Medicine and Biology Society (EMBC 14), March, 2014.

11. M. Zhou, G. Bao and K. Pahlavan, “Mutual Information based Motion Tracking Technique for the WCE inside Large Intestine”, submitted to IEEE Engineering in Medicine and Biology Society (EMBC 14), March, 2014.

12. G. Bao, L. Mi and K. Pahlavan, “Hybrid Localization of Micro-robotic Endoscopic Capsule inside Small Intestine by Data Fusion of Vision and RF Sensors”, submitted to Sensor Journal, IEEE, March, 2014.

A full publication list can be found in Appendix A.

1.4 Outline of the Dissertation

This dissertation focuses on hybrid localization, which we call “Body-SLAM”, for wireless capsule endoscopy and on testbed design for comparative performance evaluation. The dissertation is organized as follows: in Chapter 2, we give an overview of the existing localization technologies for WCE and address the technical challenges in this field. In Chapter 3, we present a hybrid localization technique, which we call “Body-SLAM”, that uses endoscopic images for motion tracking and combines the motion information with the RF signal radiated from the capsule to enhance the localization accuracy and meanwhile reconstruct the trajectory the capsule has traveled. In Chapter 4, the performance of the proposed localization algorithm is evaluated using empirical data and an emulation testbed. Finally, we conclude the dissertation in Chapter 5 and give future directions.

Challenges in WCE Localization

While physicians can receive clear images of the interior of the entire digestive system using WCE, they have little idea of the exact location of the capsule when an abnormality is found by the video source. To localize intestinal abnormalities, physicians have to administer successive radiological, endoscopic or surgical operations, which are invasive and potentially harmful to the patient's health. If we could develop a wireless localization system to localize these devices, physicians could not only diagnose the diseases but also learn where they are located. However, designing such a localization system is a very challenging task. In this chapter, we review the existing localization techniques, especially the RF localization techniques, discuss their limitations and address the challenges in designing a localization system for use inside the human body.

2.1 Overview of Wireless Capsule Endoscopy (WCE)

Wireless Capsule Endoscopy (WCE) is a pill-shaped device which consists of a short focal length CMOS camera, light source, battery and radio transmitter [12,13], as shown in Figure 2.1. After the endoscopic capsule is swallowed by a patient, this miniature device begins to work and records images at a rate of at least 2 frames per second while moving along the GI tract. At the same time, images are sent out wirelessly to a data recorder attached to the patient's waist. The whole process takes about 8 h, after which all the image data are downloaded to a workstation where physicians can inspect the whole video and diagnose diseases in the GI tract. As an innovative technique without a cable connection, WCE offers a patient-friendly, non-invasive and painless investigation of the entire GI tract, especially the small intestine, which other conventional endoscopic instruments can barely reach. Up to now, WCE has been used to detect the following diseases: small intestinal bleeding, Crohn's disease, ulcers, tumors, vascular lesions and colon cancers [15–17].

A typical capsule endoscopy system consists of three components, shown in Figure 2.2 [12,13]:

1. A wireless capsule endoscope

All capsule endoscopes have similar components: a disposable plastic capsule, a complementary metal oxide semiconductor (CMOS) or high-resolution charge-coupled device (CCD) image capture system, a compact lens, white-light emitting diode illumination sources, and an internal battery source.

2. A sensing system with sensing pads or a sensing belt to attach to the patient, a […]

The mode of data transmission is either via ultra-high frequency band radio telemetry (PillCam, EndoCapsule) or human body communications (MiroCam). The latter technology uses the capsule itself to generate an electrical field that uses human tissue as the conductor for data transmission. Currently PillCam SB2 and MiroCam are available with extended battery life, which may be beneficial in patients with delayed small-bowel transit.

3. A personal computer workstation with proprietary software for image review and interpretation.

Major visualization systems are RAPID Reader from Given Imaging, WS-1 EndoCapsule from Olympus America and MiroView from IntroMedic.

Figure 2.1: The architecture of WCE. Labeled parts: (1) optical dome, (2) lens holder, (3) lens, (4) illuminating LEDs, (5) CMOS imager, (6) battery, (7) RF transmission model, (8) antenna.

Figure 2.2: Wireless Capsule Endoscopy, panels (a), (b) and (c)

Table 2.1: FDA-approved wireless capsule systems and specifications

WCE            Company          Size (mm)  Weight  View angle  Frame rate  Battery life  Resolution
EndoCapsule    Olympus America  11×26      3.5 g   145°        2 /sec      8 hours       512×512
PillCam SB2    Given Imaging    11×26      2.8 g   156°        2 /sec      8 hours       256×256
PillCam SB3    Given Imaging    11×26      2.8 g   156°        2-6 /sec    12 hours      320×320
PillCam SB2EX  Given Imaging    11×26      3.3 g   156°        2 /sec      12 hours      256×256
MiroCam        […]

2.2 Literature Review

WCE provides a noninvasive way to inspect the entire small intestine. As a critical component of the capsule endoscopic examination, physicians need to know the precise position of the endoscopic capsule in order to identify the position of an intestinal disease after it is found by the video source [23–25]. The follow-up therapeutic operations and the effect of drug administration depend heavily on the accuracy of the capsule's position information [26]. Therefore, a precise and reliable localization system plays an important role in enhancing the benefits of WCE. During the past few years, many attempts have been made to develop accurate and reliable localization systems for the WCE. A good review of existing localization techniques is given in [27]. These technologies can be divided into those using magnetic fields [28–32] or inertial systems [33], those using image processing techniques [34], and those using RF signals [4,35–37].

In magnetic sensing based techniques, a magnet is inserted into the WCE and the WCE is located by measuring the magnetic field [38,39]. This technique increases the weight and size of the WCE, and the magnetic field used for localization is subject to interference from external magnetic fields used for other applications, such as Magnetic Resonance Imaging (MRI) systems. One can also insert radiation-opaque material into the WCE and trace its location using X-ray or Computed Tomography (CT) scans. However, continuous imaging using X-ray or CT is very expensive and carries health risks for the patient [27,40].

In [33], Ciuti and his colleagues presented a magnetic inertial sensing based localization system. They inserted a three-axis accelerometer (LIS331DL) into the capsule. This inertial sensing not only provides the approximate location and orientation of the capsule in […] magnetic link between the external permanent magnet and the capsule. However, it would be difficult to make a capsule compact enough to be swallowed while integrating such an inertial sensing subsystem and four cylindrical magnets. Also, this localization technique only offers rough spatial information (an average error of 3 cm) without data in the vertical direction.

Besides the magnetic field based and inertial sensing based techniques, computer vision based techniques for localizing the WCE are being investigated [34,41–44]. Because the capsule endoscope changes position and direction very slowly, two successive endoscopic images share some identical areas, so we can find corresponding point pairs in the two images. Using these image correspondences, we can determine the motion (rotation and translation) parameters of the capsule endoscope with an appropriate algorithm. This approach can complement the magnetic localization and orientation method.
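As a rough illustration of recovering rotation and translation from matched point pairs, the sketch below fits a 2D rigid motion in the image plane by least squares. This is a simplification chosen for brevity: the cited works do not prescribe this particular estimator, it ignores camera geometry and outlier matches, and the names `estimate_rigid_2d` and `apply_rigid_2d` are hypothetical.

```python
import cmath
import math


def apply_rigid_2d(points, angle, shift):
    """Rotate points by `angle` (radians) about the origin, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


def estimate_rigid_2d(points_a, points_b):
    """Least-squares rotation and translation mapping points_a onto points_b.

    Points are treated as complex numbers, so a rotation is a multiplication
    by exp(1j * angle). Assumes the pairs are already matched and outlier-free.
    """
    if len(points_a) != len(points_b) or len(points_a) < 2:
        raise ValueError("need at least two matched point pairs")
    a = [complex(x, y) for x, y in points_a]
    b = [complex(x, y) for x, y in points_b]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # The optimal angle is the phase of the correlation of the centred sets.
    corr = sum((pb - mean_b) * (pa - mean_a).conjugate()
               for pa, pb in zip(a, b))
    angle = cmath.phase(corr)
    shift = mean_b - cmath.rect(1.0, angle) * mean_a
    return angle, (shift.real, shift.imag)
```

Feeding the function feature points from one frame and their matches in the next frame yields the in-plane rotation and shift between the two frames; in practice a robust wrapper would be needed to reject mismatched pairs.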

Using the RF signal that already carries the image transmission of the WCE to also locate the capsule offers a natural and low-cost solution that adds no extra complexity or payload to the capsule [4,45–49]. Therefore, it has been chosen for use with the SmartPill capsule in the USA and the M2A capsule in Israel. RF signals have been widely used for locating objects in both outdoor and indoor environments, with achieved accuracy of up to hundreds of millimeters [19,50]. Nevertheless, applying radio frequency techniques to track an object moving inside a special environment such as the GI tract is a challenge. This is because high-frequency signals suffer significant attenuation, at different levels, when they pass through different living tissues, whereas low-frequency signals, due to their long wavelengths, are not able to deliver the desired precision of several millimeters. The most commonly used RF techniques are Received Signal Strength […] of using RF signal for localization and address their limitations and challenges when applied inside the human body.

2.3 RF Localization Techniques

The wireless localization industry was initiated by the Global Positioning System (GPS) for outdoor navigation in the early 1970s and later evolved into the Indoor Positioning System (IPS) in the 1990s [51–57]. Soon after, with the release of the Body Area Network (BAN) IEEE 802.15.6 standard and the rise of implantable micro-robots, the future trend of this localization technique is moving inside the human body [4,58–60]. The first major application of this localization technology is wireless capsule endoscopy [7,29,61–63].

A commonly used RF localization infrastructure attaches many calibrated external RF
sensors to the anterior abdominal wall to detect the RF signal emitted by the wireless
capsule, as shown in Figure 2.3. By converting a characteristic of the received signal
(RSS or ToA) into the distance between the capsule and the body-mounted sensor array,
the position of the capsule can be estimated by pattern matching algorithms such as the
least square algorithm and the maximum likelihood algorithm [28,36].

However, RF localization of micro-robots inside humans is not trivial. Compared to
outdoor and indoor environments, the inside of the human body is a complex environment,
which makes engineering design and visualization a formidable task [64]. It is an
extremely complex medium for RF propagation because it is a non-homogeneous,
liquid-like environment with irregularly shaped boundaries and severe path loss.
Things become more complex when it comes inside human body since the

(32)

used as references for localization are also in motion. More importantly, reliable designs
require testing of the hardware implementation, but devices cannot easily be tested
inside the human body. As a result, existing RF localization systems sometimes provide
discontinuous and scattered estimates with large errors.

Figure 2.3: A typical RF localization system

2.3.1 RSS based techniques

The name “wireless” capsule endoscope indicates its capability to transmit images by RF
signal. The transmitter embedded in the capsule sends the endoscopic images captured
during its travel along the GI tract to several receivers placed uniformly on the exterior
of the patient's abdomen, as shown in Figure 3.12. Taking advantage of this built-in
function, one can measure the strength of the received RF signal at each sensor and use
each sensor as a reference node to localize the capsule (the mobile node). The tracking
algorithm is based on the observation that the closer the

(33)

Table 2.2: Parameters for the statistical implant to body surface pathloss model

Implant to body surface   LP(d0) (dB)   α      σ (dB)
Deep tissue               47.14         4.26   7.85
Near surface              49.81         4.22   6.81

the RSS reading and the distance from the transmitter to the receiver can be expressed

by a pathloss model as given below [45,46,65,66]:

\[ RSS(d) = P_t - PL(d_0) - 10\,\alpha \log_{10}\!\left(\frac{d}{d_0}\right) + S, \qquad d > d_0 \qquad (2.1) \]

where $d$ is the distance between transmitter and receiver, $P_t$ is the transmit power,
$PL(d_0)$ is the path loss at a reference distance $d_0$ (i.e. 50 mm), and $\alpha$ is the
path loss gradient, which is determined by the propagation environment. For example, in
free space $\alpha$ equals 2. Since human body tissue strongly absorbs RF signals, a much
higher path loss gradient is expected inside the human body. $S$ is a Gaussian random
variable that models shadow fading. From Eq. 2.1, the distance between the capsule and
each sensor can be roughly estimated from the RSS readings. The capsule's location can
then be calculated by trilateration.
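The ranging step implied by Eq. 2.1 can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it ignores the shadow fading term S (so it returns the distance implied by the mean path loss), it takes the parameters from Table 2.2, and the function names and the transmit power used in the usage note are hypothetical.

```python
import math

# Parameters from Table 2.2 (implant to body surface model).
PATHLOSS_PARAMS = {
    "deep_tissue": {"pl_d0_db": 47.14, "alpha": 4.26, "sigma_db": 7.85},
    "near_surface": {"pl_d0_db": 49.81, "alpha": 4.22, "sigma_db": 6.81},
}
D0_MM = 50.0  # reference distance d0 stated in the text


def expected_rss_db(d_mm, pt_db, params):
    """Mean RSS of Eq. 2.1 with the shadow fading term S set to zero."""
    if d_mm <= D0_MM:
        raise ValueError("Eq. 2.1 is only stated for d > d0")
    return pt_db - params["pl_d0_db"] - 10.0 * params["alpha"] * math.log10(d_mm / D0_MM)


def distance_from_rss(rss_db, pt_db, params):
    """Invert Eq. 2.1 (again with S = 0) to get a rough range estimate in mm."""
    exponent = (pt_db - params["pl_d0_db"] - rss_db) / (10.0 * params["alpha"])
    return D0_MM * 10.0 ** exponent
```

For example, with a hypothetical transmit power of -16 dB, `distance_from_rss(expected_rss_db(120.0, -16.0, PATHLOSS_PARAMS["deep_tissue"]), -16.0, PATHLOSS_PARAMS["deep_tissue"])` recovers 120 mm. With real readings the shadow fading term, whose standard deviation is about 7 to 8 dB in Table 2.2, makes the estimate much rougher.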

A propagation attenuation model plays a vital role in the RSS technique. To reduce the
positioning error, an appropriate implant to body surface path loss model is needed.
Table 2.2 summarizes the parameters of one of the most cited signal attenuation models,
developed by the National Institute of Standards and Technology (NIST) for the MICS
band.

The empirical model mentioned previously is not accurate enough for the complex
environment of the GI tract. The model was developed by National Institute of Standards

(34)

Instead of using a signal propagation model, another RSS-based localization scheme uses
the “fingerprinting” technique [47]. Fingerprinting first creates a lookup table for
position estimation. An offline measurement survey is carried out in advance: at each
capsule position, the signal strength measured by each sensor is recorded in the table
together with the position. During the experiment, online data are compared with the
entries in the lookup table to find the closest match and thus select the most
appropriate position. However, since no map of the inside of the body is available for
the survey and body shape differs from person to person, this method has little
practical value.
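The lookup step of fingerprinting can be sketched as a nearest-neighbour search over the offline table. The sketch below assumes Euclidean distance in signal space as the matching criterion, which is one common choice and not necessarily the one used in [47]; the function name and the table layout are hypothetical.

```python
import math


def nearest_fingerprint(radio_map, online_rss):
    """Return the surveyed position whose stored RSS vector is closest to the online one.

    radio_map: list of (position, rss_vector) pairs recorded during the offline survey.
    online_rss: RSS vector measured by the same sensors during the experiment.
    """
    best_position = None
    best_distance = math.inf
    for position, stored_rss in radio_map:
        distance = math.dist(stored_rss, online_rss)  # Euclidean distance in signal space
        if distance < best_distance:
            best_position, best_distance = position, distance
    return best_position
```

With a table such as `[((0, 0, 0), [-60, -70, -80]), ((10, 0, 0), [-65, -66, -79])]`, an online reading of `[-64, -67, -80]` is matched to the second entry.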

2.3.2 ToA based techniques

For RF-based localization, a widely known benefit of ToA-based techniques is their high
accuracy compared to RSS-based techniques [60,67]. The ToA-based technique relies on
measuring the travel time of signals between the known reference nodes and the unknown
mobile node. The ranging distance is calculated by multiplying the propagation velocity
of the RF signal by the measured ToA value [60,68]:

\[ d_i = c \times \tau_i \qquad (2.2) \]

However, since the human body is made up of tissues with different conductivity and
relative permittivity, the RF signal propagates at different speeds through different
organs [69]. These variations in speed are the dominant source of error for ToA-based
RF localization inside the human body. Also, in near-field application, time-based
methods are difficult because radio waves travel with a very high

(35)

required in order to obtain the position resolution of 0.3 m. Another geometric location
method, time difference of arrival (TDoA), does not have these disadvantages. All it
needs is a transmission with a recognizable, unambiguous starting point. The location
calculation uses the differences in the reception time of that starting point at the
several reference nodes, not the actual time of flight of the signal from the target to
the fixed sensors. However, to have sufficient data to find the mobile's coordinates,
TDoA requires one more reference node than ToA.
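The two ToA issues discussed in this subsection, the timing precision needed at radio speeds and the ranging bias caused by tissue-dependent propagation speed, can be illustrated numerically. The sketch below makes a simplifying assumption that is not taken from this work: the speed in tissue is approximated as c divided by the square root of the relative permittivity, which ignores conductivity. The permittivity value in the usage note is purely illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, free space


def propagation_speed(relative_permittivity):
    """Approximate speed in a low-loss dielectric (assumption: conductivity ignored)."""
    return SPEED_OF_LIGHT / math.sqrt(relative_permittivity)


def toa_range(tau_s, speed=SPEED_OF_LIGHT):
    """Eq. 2.2: ranging distance from a measured time of arrival in seconds."""
    return speed * tau_s


def ranging_error(true_distance_m, relative_permittivity):
    """Bias obtained when the free-space speed is assumed for a path through tissue."""
    tau = true_distance_m / propagation_speed(relative_permittivity)
    return toa_range(tau) - true_distance_m
```

`toa_range(1e-9)` is about 0.3 m, so a 1 ns timing uncertainty already corresponds to the 0.3 m position resolution mentioned above. With an illustrative relative permittivity of 4, `ranging_error(0.1, 4.0)` is 0.1 m, that is, a 10 cm path is overestimated by a factor of two.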

2.3.3 Localization Algorithms

In every ranging-based localization scheme, the position of the mobile node is determined
as the intersection of spheres [70] whose centers are the coordinates of the reference
nodes $[x_i\ y_i\ z_i]^T$ and whose radii are the ranging distances $m_i$ between the
reference nodes and the target node $[x\ y\ z]^T$, where

\[ m_i^2 = (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 \qquad (2.3) \]

Since the inside of the human body is a non-homogeneous environment, the ranging
distance obtained from ToA differs from the true distance. Therefore, the spheres do
not always intersect at a single point. The goal of the localization algorithm is to find
the best estimate of the target's actual position from the noisy measurements. The two
most commonly used optimal estimation algorithms are the least square algorithm and the
maximum likelihood algorithm. In the following subsections, we explained how these

(36)

2.3.3.1 least square algorithm

In the least square (LS) algorithm [71,72], at least three reference nodes are needed to
solve the least square problem. Substituting

\[ x' = x - x_1, \quad y' = y - y_1, \quad z' = z - z_1 \qquad (2.4) \]

and

\[ x'_i = x_i - x_1 \quad (i = 2, 3) \qquad (2.5) \]

into Eq. 2.3 and subtracting the first equation ($i = 1$) from the equations for
$i = 2, 3$ results in a set of equations in matrix form:

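\[
\begin{bmatrix}
x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\
x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \\
\vdots & \vdots & \vdots \\
x_n - x_1 & y_n - y_1 & z_n - z_1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix}
m_1^2 - m_2^2 + k_2 - k_1 \\
m_1^2 - m_3^2 + k_3 - k_1 \\
\vdots \\
m_1^2 - m_n^2 + k_n - k_1
\end{bmatrix}
\qquad (2.6)
\]

where

\[ k_i = x_i^2 + y_i^2 + z_i^2 \qquad (2.7) \]

It can be denoted as

\[ 2At = b \qquad (2.8) \]

where

\[ t = \begin{bmatrix} x & y & z \end{bmatrix}^T \qquad (2.9) \]

\[
A = \begin{bmatrix}
x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\
x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \\
\vdots & \vdots & \vdots \\
x_n - x_1 & y_n - y_1 & z_n - z_1
\end{bmatrix}
\qquad (2.10)
\]

\[
b = \begin{bmatrix}
m_1^2 - m_2^2 + k_2 - k_1 \\
m_1^2 - m_3^2 + k_3 - k_1 \\
\vdots \\
m_1^2 - m_n^2 + k_n - k_1
\end{bmatrix}
\qquad (2.11)
\]

The solution can be obtained by using the least square method [72,73]:

\[ t = \frac{1}{2}(A^T A)^{-1} A^T b \qquad (2.12) \]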
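The closed-form solution of Eq. 2.12 can be sketched in a few lines. The sketch below builds A and b as in Eqs. 2.10 and 2.11 and solves the normal equations with a small Gauss-Jordan routine; the function names are hypothetical. One caveat of this particular sketch: for the 3x3 matrix A^T A to be invertible in three dimensions, it needs at least four reference nodes that do not all lie in one plane.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: reference node geometry is degenerate")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def ls_position(refs, ranges):
    """Least square position estimate t = 0.5 * (A^T A)^-1 A^T b (Eq. 2.12).

    refs: reference node coordinates (x_i, y_i, z_i); refs[0] plays the role of node 1.
    ranges: ranging distances m_i to the same nodes, in the same order.
    """
    x1, y1, z1 = refs[0]
    k1 = x1 * x1 + y1 * y1 + z1 * z1
    a_rows, b_vec = [], []
    for (xi, yi, zi), mi in zip(refs[1:], ranges[1:]):
        ki = xi * xi + yi * yi + zi * zi
        a_rows.append([xi - x1, yi - y1, zi - z1])           # row of A, Eq. 2.10
        b_vec.append(ranges[0] ** 2 - mi ** 2 + ki - k1)     # entry of b, Eq. 2.11
    ata = [[sum(row[i] * row[j] for row in a_rows) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * bi for row, bi in zip(a_rows, b_vec)) for i in range(3)]
    return [0.5 * value for value in solve_linear(ata, atb)]
```

With exact ranges the estimate coincides with the true position; with noisy ToA ranges it returns the least square compromise between the non-intersecting spheres.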

2.3.3.2 maximum likelihood algorithm

This section describes how to use the maximum likelihood (ML) algorithm [74,75] for
localization. Assume the RSS measurement intensity at each sensor is

\[ R_i = \gamma_i \sum_{k=1}^{K} \frac{C_k}{|\rho_k - r_i|^{\alpha}} + \omega_i \qquad (2.13) \]

(38)

where $R_i$ is the $i$-th sample, $\gamma_i$ is the gain factor, $C_k$ is the intensity of the
$k$-th contaminant source, $\rho_k$ is the position of the $k$-th source, $r_i$ is the
position of the mobile node, and $\omega_i$ is the background noise.

Eq. 2.13 can also be expressed as

\[ R_i = \gamma_i \frac{C}{m_i^2} + \omega_i \qquad (2.14) \]

where $m_i$, given by Eq. 2.3, is the Euclidean distance between the mobile node and the
$i$-th sensor node.

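Setting $\xi_i = (\omega_i - \mu_i)/\sigma_i \sim \mathcal{N}(0, 1)$, so that
$(R_i - \mu_i)/\sigma_i \sim \mathcal{N}\!\left(\frac{\gamma_i}{\sigma_i}\frac{C}{m_i^2}, 1\right)$,
we can define the following matrix notation:

\[ Z = \left[ \frac{R_1 - \mu_1}{\sigma_1}, \frac{R_2 - \mu_2}{\sigma_2}, \dots, \frac{R_N - \mu_N}{\sigma_N} \right]^T \qquad (2.15) \]

\[ G = \mathrm{diag}\left[ \frac{\gamma_1}{\sigma_1}, \frac{\gamma_2}{\sigma_2}, \dots, \frac{\gamma_N}{\sigma_N} \right] \qquad (2.16) \]

\[ D = \left[ \frac{1}{m_1^2}, \frac{1}{m_2^2}, \dots, \frac{1}{m_N^2} \right]^T \qquad (2.17) \]

\[ \xi = \left[ \xi_1, \xi_2, \dots, \xi_N \right]^T \qquad (2.18) \]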

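We use the maximum likelihood estimation (MLE) method to estimate the location. The
joint probability density function is

\[ f(Z \mid \theta) = (2\pi)^{-N/2} \exp\left\{ -\frac{1}{2} (Z - GDC)^T (Z - GDC) \right\} \qquad (2.19) \]

and its log-likelihood function is

\[ L(\theta) \sim -\frac{1}{2} \sum_{i=1}^{N} \left( Z_i - \frac{\gamma_i}{\sigma_i} \frac{C}{m_i^2} \right)^2 = -\frac{1}{2} \sum_{i=1}^{N} \left( \frac{R_i - \mu_i}{\sigma_i} - \frac{\gamma_i}{\sigma_i} \frac{C}{m_i^2} \right)^2 \qquad (2.20) \]

where $\theta$ is the estimated mobile position. The maximum likelihood mobile position is
therefore obtained by maximizing this function, that is, by minimizing the sum of squared
residuals [76].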
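One simple way to carry out this optimization is an exhaustive search over a grid of candidate positions. The sketch below is only an illustration under stated assumptions: the single-source model of Eq. 2.14 with a known intensity C, known gains and noise statistics, and a brute-force grid in place of whatever optimizer is used in [76]. All function and parameter names are hypothetical.

```python
import itertools


def neg_log_likelihood(theta, sensors, readings, gains, mus, sigmas, intensity):
    """Negative of Eq. 2.20: half the sum of squared normalized residuals at theta."""
    total = 0.0
    for sensor, r_i, gamma_i, mu_i, sigma_i in zip(sensors, readings, gains, mus, sigmas):
        m_sq = sum((a - b) ** 2 for a, b in zip(theta, sensor))  # m_i^2 from Eq. 2.3
        if m_sq == 0:
            return float("inf")  # candidate coincides with a sensor; model undefined
        residual = (r_i - mu_i) / sigma_i - gamma_i * intensity / (sigma_i * m_sq)
        total += residual ** 2
    return 0.5 * total


def ml_grid_search(sensors, readings, gains, mus, sigmas, intensity, axis_values):
    """Return the grid point (x, y, z) with the highest likelihood."""
    candidates = itertools.product(axis_values, repeat=3)
    return min(
        candidates,
        key=lambda theta: neg_log_likelihood(
            theta, sensors, readings, gains, mus, sigmas, intensity
        ),
    )
```

The grid resolution bounds the accuracy of this sketch; a practical implementation would refine the best grid cell with a local optimizer.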

2.3.4 Cramer-Rao Lower Bound (CRLB)

The Cramer-Rao lower bound (CRLB), named in honor of Harald Cramer [77] and Calyampudi
Radhakrishna Rao [78], who were among the first to derive it, expresses a lower bound on
the variance of estimators of a deterministic parameter. In the localization literature
[79,80], the CRLB defines the lower bound on the precision that a localization algorithm
can reach. To calculate the CRLB for localization inside the human body, we define a
performance evaluation scenario and models for the behavior of the localization metrics
mentioned above, RSS and ToA, for RF signaling between the GI tract and the body-mounted
sensors used for localization. In this section, we introduce a general scenario for
comparative performance evaluation of RSS- and ToA-based localization for the capsule
endoscopy application. The scenario is designed to reflect the performance in different
organs, the path of movement of the WCE inside the small intestine, and the number and
installation pattern of body-mounted sensors on the torso. Since the

(40)

caused by the refraction at the boundary of organs and tissues inside the human body,

models for behavior of the RSS and ToA are fairly complicated.

Figure 2.4: A typical 3D pattern of body mounted sensors used as reference points of the performance evaluation scenario for localization of the WCE

Consider the WCE, whose location is indexed as 1, and $m$ body-mounted receiver sensors
with indexes $2, \dots, m+1$, as shown in Figure 2.4. Each receiver sensor $i$ is capable
of measuring the ToA $\tau_i$ or the RSS $r_i$ from the WCE. The observation vector is
$X = [\tau_2, \dots, \tau_{m+1}]$ for the ToA case or $X = [r_2, \dots, r_{m+1}]$ for the
RSS case. Assume the coordinate of the WCE is $\theta_1 = [x_1, y_1, z_1]$; our objective
is to estimate the location of the WCE, $\hat{\theta}_1$. The $\tau_i$ observations are
modeled as normal random variables,
$f_{\tau_i \mid \theta_1, \theta_i} \sim \mathcal{N}(d_{i,1}/\bar{v}, \sigma_T^2)$, where
$d_{i,1}$ is the distance between the WCE and receiver sensor $i$, $\bar{v}$ is the average
propagation speed of the RF signal inside the human GI tract, and $\sigma_T$ is the
parameter describing the ToA ranging error caused by human tissue non-homogeneity. The
$r_i$ measurements are log-normally distributed

f_ridB|θ₁,θi N(Pr(dB), σ 2

(41)

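at the reference distance from the WCE. $\alpha$ is the pathloss gradient and
$\sigma_{sh}^2$ is the variance of the log-normal shadowing.

The CRLB of $\hat{\theta}_1$ is $\mathrm{cov}(\hat{\theta}_1) \geq I(\theta_1)^{-1}$, where
$I(\theta_1)$ is the Fisher information matrix (FIM)

\[ I(\theta_1) = -E\left[\nabla_{\theta_1}\left(\nabla_{\theta_1} \ln \iota(X \mid \theta_1, \theta)\right)\right] = \begin{bmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{xy} & I_{yy} & I_{yz} \\ I_{xz} & I_{yz} & I_{zz} \end{bmatrix} \qquad (2.21) \]

where $\iota(X \mid \theta_1, \theta)$ is the logarithm of the joint conditional probability
density function:

Similar expressions can be extended to $I_{yy}$, $I_{zz}$, $I_{xy}$, $I_{xz}$ and $I_{yz}$.
The CRLB on the variance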

(42)

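\[ \sigma_1^2 = \mathrm{tr}\{\mathrm{cov}_{\theta}(\hat{x}_1, \hat{y}_1, \hat{z}_1)\} = \min \mathrm{tr}(\mathrm{cov}(\hat{\theta}_1)) = \mathrm{tr}(I(\theta_1)^{-1}) = \frac{-I_{xx}(I_{yy} + I_{zz}) + I_{xy}I_{xy} + I_{xz}I_{xz} \dots}{-I_{xx}I_{yy}I_{zz} + I_{xx}I_{yz}I_{yz} \dots + I_{xz}I_{yy}I_{zz}} \qquad (2.26) \]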
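The bound of Eq. 2.26 can be evaluated numerically for the ToA case. The sketch below is an illustration under assumptions rather than a reproduction of this work's derivation: it uses the standard Fisher information of independent Gaussian measurements whose mean depends on position, which for the ToA model above gives a sum of outer products of the unit vectors pointing from each sensor to the capsule, scaled by the range error variance. The parameter `sigma_range` (the product of the average speed and the ToA standard deviation) and the function names are hypothetical.

```python
import math


def toa_fim(capsule, sensors, sigma_range):
    """3x3 Fisher information matrix for Gaussian range errors with std sigma_range."""
    fim = [[0.0] * 3 for _ in range(3)]
    for sensor in sensors:
        diff = [c - s for c, s in zip(capsule, sensor)]
        dist = math.sqrt(sum(v * v for v in diff))
        unit = [v / dist for v in diff]  # direction from the sensor to the capsule
        for a in range(3):
            for b in range(3):
                fim[a][b] += unit[a] * unit[b] / sigma_range ** 2
    return fim


def crlb_trace(fim):
    """tr(I^-1) of a symmetric 3x3 FIM: sum of diagonal cofactors over the determinant."""
    ixx, ixy, ixz = fim[0]
    iyy, iyz = fim[1][1], fim[1][2]
    izz = fim[2][2]
    det = (ixx * (iyy * izz - iyz * iyz)
           - ixy * (ixy * izz - iyz * ixz)
           + ixz * (ixy * iyz - iyy * ixz))
    if abs(det) < 1e-15:
        raise ValueError("FIM is singular: sensor geometry cannot resolve all three axes")
    numerator = ixx * iyy + ixx * izz + iyy * izz - ixy * ixy - ixz * ixz - iyz * iyz
    return numerator / det
```

The result is a lower bound on the sum of the variances of the three coordinate estimates, in the squared units of `sigma_range`. Adding sensors or spreading them around the capsule increases the Fisher information and lowers the bound.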

2.4 Challenges of Localization inside Small Intestine

There are a number of fundamental multi-disciplinary scientific and technological
challenges facing RF localization of WCEs inside the human body. To design an accurate
localization system for use inside the human body, we need to consider the following [4]:

• Modeling of the Movements of WCE inside the GI Tract

The first challenge for meaningful analysis of RF localization inside the human body is
to use clinical databases and clinical procedures performed by GI specialists to model
the movements of the endoscopy capsule inside the GI tract [81]. Previously acquired and
stored patient databases, with approximately 55,000 images per patient, could be
examined to detect landmarks or fixed points such as the pylorus and the ileocecal valve
[29,82]. Using the location of these landmarks, the number of images in which each
landmark is observed, and the fact that the images are taken at a rate of two frames/sec
(recently released WCEs can take up to six frames/sec), we should design a model for the
movements of the capsule in the GI tract to be mapped into the hardware and
visualization platform. In the future, inertial sensing units that are small enough to
be embedded in a pill size

(43)

the endoscopic capsule. This information could be used to enhance the movement model
obtained by examining the images reported by the capsule. An improved model of the
WCE's movements based on inertial sensors would enhance the RF localization result.
Feedback-controlled inertial sensors have already been used to monitor a robotic
endoluminal system that uses a magnetic field to efficiently perform diagnostic and
surgical medical procedures [33].

• In every localization technique, the map plays a very important role in refining the
localization results [18,19]. Existing literature [20] reported that a clear street map
can reduce the GPS localization error in urban areas from tens of meters to several
meters. For localization inside the human body, the “map” is even more important, since
everything that goes through the GI tract follows the same route. Knowing a clear
pattern of the intestinal tract will greatly enhance the localization accuracy.
Therefore, tracing the path of the intestinal tract is essential to accurate capsule
localization.

• Modeling of the Wideband RF Propagation from Inside the Human Body

The second challenge is to model the wideband characteristics of the RF propagation
channel between an endoscopy capsule and body-mounted sensors [83,84]. We could use
measurements inside phantoms and on the surface of the human subject's body to calibrate
existing software simulation tools for direct solution of Maxwell's equations inside the
human body. We could then use the software to determine the waveforms observed by a
body-mounted sensor used as a reference point for localization, or by another endoscopy
capsule inside the tract that could be used for cooperative localization. Finally, it
should be possible to design models

(44)

localization techniques) as capsules travel along the GI tract, to be used by the
CPS for performance evaluation and visualization.

• Design of Complicated Algorithms for Localization inside the GI Tract

The third challenge is the design and comparative performance evaluation of alternative
localization algorithms and the discovery of methods for visualizing the results. This
requires considering channel models for the spatial and temporal variation of the
signal, a model for the track of the capsule's physical movement inside the digestive
system, and landmarks detected from video frames of the endoscopy capsule camera
[41,63,85,86]. In addition to the RF localization features, we may expect these
algorithms to exploit knowledge of the pattern of movements and the visual data observed
by the camera inside the tract. The Cramer-Rao lower bounds (CRLB) on the performance of
basic RSS- and ToA-based localization algorithms for capsule endoscopy are already
available in the literature [87]. We can use these bounds as a guideline for the
expected performance of the designed algorithms [88].

• Security and Reliability Issues

One last challenge in RF localization for WCEs is to examine and, where possible,
quantify the security, reliability, and privacy of implantable WCEs in human bodies.
There is a pressing need to understand and analyze the radio propagation of signals from
WCEs outside the human body at larger distances, where they may (a) cause interference
(accidental or malicious) to the localization of WCEs and/or devices inside a human
body, or (b) be recovered by more powerful devices in order to identify the existence of
such WCEs in specific patients. The former

(45)

the latter impacts the privacy of patients and the medical procedures that may be

conducted on the patients.

(46)

Design of Algorithms for Body-SLAM

Since the RF signal suffers from the noisy characteristics of the wireless channel and
multipath distortions, it is natural to resort to other techniques to improve the
overall performance of the localization system. One way to enhance the performance of RF
localization is to incorporate the motion information of the capsule through a data
fusion algorithm such as the Kalman filter [89] or the particle filter [43,90,91]. In our
previous work [53], we used both filters to integrate RSS-based Wi-Fi localization with
movement models from inertial sensors, including accelerometers and gyroscopes, for
cooperative robotic localization in indoor areas. The results were promising: the method
showed the potential to enhance localization accuracy by combining data from various
sensor sources. However, as mentioned previously, inertial sensors that meet the
accuracy requirement of WCE applications are too large to be embedded inside a video
capsule, and even if they can be embedded as the assembly technology

(47)

improves, the cost of the capsule will increase dramatically. In the localization
literature, there has been a trend to extract motion parameters from consecutive image
sequences to improve the accuracy of RF localization, an approach known as visual
Simultaneous Localization and Mapping (V-SLAM) [92–94]. In the WCE application, since
the endoscopic capsule continuously takes pictures at very short time intervals (up to
6 frames/sec), the motion information of the capsule can be extracted by processing the
video stream captured by the embedded vision sensor. This motion information can be used
as an alternative to inertial sensors to smooth the RF localization results and, at the
same time, to reconstruct the trajectory that the capsule has traveled, in the same
manner as V-SLAM for indoor geolocation.

3.1 Formulation of Body-SLAM

For every location-aware application, higher positioning accuracy can be achieved by
employing hybrid techniques that take advantage of data fusion across different sensors
[89]. Since the only two data sources that come with the endoscopic capsule are the video
stream captured by the embedded vision sensor and the wireless signal received by the
body-mounted RF sensors, an intuitive way to enhance the localization accuracy of the
capsule is to combine the two [95]. As mentioned before, the endoscopic capsule
continuously takes pictures at short time intervals (2 to 6 frames/sec) as it travels.
It is therefore possible to obtain information such as how quickly the capsule moves and
in which direction, and to use it to track the position of the capsule. In this chapter,
we present a novel motion tracking algorithm for the endoscopic capsule that analyzes
the displacements of unique portions of the scene, referred to as feature points (FPs),
between

(48)

points matching, image unrolling and quantitative calculation of motion parameters.

Detailed procedures of each step are explained in the upcoming subsections.

[Figure: block diagram with the stages: RF propagation simulation using Semcad X; algorithm design for RF localization; modeling the speed of the WCE using endoscopic images; algorithm design of Body-SLAM and performance evaluation; map generation and visual testbed design]

(49)

3.2 Motion Tracking using Endoscopic Images

The movement of the endoscopic capsule is highly unpredictable. It may move fast, move
slowly, rotate, or combine any of these movements. This complicated movement of the
wireless capsule introduces large errors into the localization, since the Received
Signal Strength (RSS) varies considerably due to fast fading and sudden changes of
antenna gain caused by flipping and rotation. Knowing how the capsule moves therefore
helps us better understand the radio propagation channel inside the human body and thus
enhance the accuracy of existing localization methods.

3.2.1 Analyzing the Content of Endoscopic Images

3.2.1.1 Image segmentation using SRM

To model the pattern of movements of the endoscopic capsule, we first need to categorize
the endoscopic images to get a conceptual idea of how the capsule moves [85]. Based on
our observation, the received endoscopic images fall into two basic categories: “facing
the tunnel” (FT) and “facing the lumen” (FL). Two sets of typical FT and FL images are
shown in Figure 3.2. FT is the case in which the focal axis of the camera is parallel to
the center of the intestinal tube. The major feature of this set of images is that there
is always a black hole (which we call the “tunnel”) somewhere in the picture,
representing the vanishing line of the intestinal tube. Through a sequence of
consecutive FT images, we can clearly see the capsule moving, propelled by the
intestinal motility. In contrast, FL is the case in which the capsule tends to stop or
moves more slowly than in FT. We make this classification because we want to develop a
geometric model for the FT images to quantitatively calculate the speed of

(50)

Figure 3.2: Two basic categories of endoscopic images (top: FT, bottom: FL)

Distinguishing FT images from FL images seems a fairly easy task for the human eye;
however, it has proved to be extremely difficult for machines. Major sources of
difficulty include the highly complicated shape of the scene, varying lighting
conditions, and uncontrolled noise such as liquid and bubbles inside the GI tract [85].
Given a set of labeled images, finding what each set has in common and what differs
between sets can provide inductive clues for classifier design. Some commonly used
feature descriptors, such as Histogram of Oriented Gradients (HOG) and Local Binary
Pattern (LBP) [96], do not work well for our application since no distinguishing
difference can be found between the two image sets. Thus, in terms of image
representation, our approach is region-based. We used the Statistical Region Merging
(SRM) technique described in [97] to segment the original image into several
sub-regions, with each region representing an object. The basic idea of this technique
is to grow the major regions iteratively by combining smaller regions or pixels with
homogeneous properties. A typical example of a segmented FT image is shown in
Figure 3.3, with the segmented regions shown in their representative colors. From
Figure 3.3 we can clearly see that after the segmentation, the image preserved the
tunnel shape (the darkest

(51)

will effectively reduce the variance in the feature space.

Figure 3.3: Two sequences of segmented images with Q = 16 (top 2 rows: FT, bottom 2 rows: FL)

The region merging rule is as follows:

\[ P(R, R') = \begin{cases} \text{Merge} & \text{if } |\bar{R} - \bar{R}'| \leq \sqrt{s^2(R, Q) + s^2(R', Q)} \\ \text{Not merge} & \text{otherwise} \end{cases} \qquad (3.1) \]

where $\bar{R}$ is the average value of a certain color channel inside region $R$, and
$s(R, Q)$ is a threshold function whose value is controlled by $Q$. The detailed
expression of $s(\cdot)$ can be found in [97]. A good threshold balances preserving the
major components of the scene against the risk of over-merging. The choice of $Q$
controls the coarseness of the segmentation: a large $Q$ keeps more detailed regions,
while a small $Q$ tends to merge the small regions. From the experimental point of view,
we set the value of $Q$ to

(52)

After merging, pixels inside each isolated region share a common color expectation while

the expectations between adjacent regions are different for at least one color channel.
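The merging predicate of Eq. 3.1 can be sketched for a single color channel as follows. The exact threshold function s(R, Q) is given in [97] and is not reproduced here; `illustrative_threshold` below is a hypothetical stand-in chosen only because it shrinks as Q or the region size grows, which is the qualitative behavior described above. The function names and the constant g are assumptions.

```python
import math


def illustrative_threshold(region_size, q, g=256.0):
    """Hypothetical stand-in for s(R, Q); NOT the expression from [97].

    It decreases with both Q and the number of pixels in the region.
    """
    return g * math.sqrt(1.0 / (2.0 * q * region_size))


def should_merge(mean_a, size_a, mean_b, size_b, q, threshold=illustrative_threshold):
    """Eq. 3.1: merge when the channel means differ by no more than the combined bound."""
    bound = math.sqrt(threshold(size_a, q) ** 2 + threshold(size_b, q) ** 2)
    return abs(mean_a - mean_b) <= bound
```

For two regions of 500 pixels each with channel means 100 and 110, `should_merge(100, 500, 110, 500, q=16)` is false while `should_merge(100, 500, 110, 500, q=1)` is true, which mirrors the statement that a small Q tends to merge regions and a large Q preserves detail.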

Then, we can extract features from the segmented images. To reach good classification
performance, the feature selection must obey the following rule: choose features that
are more likely to appear in one set than in the other. This can be measured by
calculating the co-occurrence of similar instances from different sets with the same
label. Features that are more distinguishable may increase the precision of
classification.

Nine features are selected to classify the images. They are the size of the darkest region

of the segmented image, length of the darkest region, length of the darkest region, RGB

value for the darkest region and RGB value for the remaining regions.

3.2.1.2 Image classification using SVM classifier

After feature extraction, the segmented images are classified using a Kernel Support
Vector Machine (K-SVM). The training data are labeled in the format $(x_i, y_i)$, where
$x_i$ is an $n \times 1$ feature vector whose elements are the features extracted in the
previous section. Since we extracted 9 features to represent an image, here $n = 9$, and
$y_i \in \{+1, -1\}$ is the label of the image: if the endoscopic image is FT,
$y_i = +1$; otherwise, $y_i = -1$. Suppose we have a hyperplane that separates the
positive from the negative examples. The points $x$ that lie on the hyperplane must
satisfy

\[ w^T x + b = 0 \qquad (3.2) \]

where w is a weight vector with the same dimension of x and b is a bias term, which

(53)

called the “geometric margin”, can be expressed as follows:

\[ \frac{w^T x_i + b}{\|w\|} \qquad (3.3) \]

Since the hyperplane expressed by Eq. 3.2 is unchanged when $w$ and $b$ are scaled by a
common constant, we can add a normalizing restriction to this expression:

\[ \min_i |w^T x_i + b| = 1 \qquad (3.4) \]

The optimal solution is then the boundary that maximizes the minimum distance expressed
by Eq. 3.3. Under the restriction of Eq. 3.4, this reduces to the maximization of
$1/\|w\|$.

The above equations are only applicable to the linearly separable case. However, in our
application, since the content of endoscopic images from different sets sometimes shares
similar features, the two sets of images are not always linearly separable. In other
words, a hyperplane that perfectly classifies the two image sets does not exist. Thus,
we need an approach that can produce nonlinear boundaries. Kernel mapping [98] is a
technique for separating data sets that are not linearly separable. The basic concept of
the kernel method is to map the vector $x_i$ to a higher dimensional space (possibly
infinite dimensional) and run the SVM in this higher dimensional space. Figure 3.4 shows
a one dimensional example of kernel mapping. The transformed space should satisfy that
the distance is defined in the transformed space and the distance has a relationship to
the

(54)

Figure 3.4: Illustration of feature mapping using Kernel function

\[ x \in \mathbb{R}^n \xrightarrow{\text{mapping}} \phi(x) \in \mathbb{R}^m \qquad (3.5) \]

where $\phi$ is the mapping function. Since $\phi(x)$ is very high dimensional, it is not
easy to work with $\phi(x)$ explicitly. If we instead measure the margin through the
kernel function and perform the optimization, a nonlinear boundary can be obtained.
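How a kernel replaces the explicit mapping can be sketched with the decision function of a trained kernel SVM. The sketch assumes a Gaussian (RBF) kernel of the form exp(-||x - x'||^2 / sigma^2) and takes the support vectors, their weights and the bias as given, since training is not shown. All names and the numerical values in the usage note are hypothetical.

```python
import math


def rbf_kernel(x, x_prime, sigma=1.0):
    """Gaussian kernel; stands in for the inner product phi(x)^T phi(x') (assumption)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-squared_distance / sigma ** 2)


def decision_value(x, support_vectors, alphas, labels, bias, sigma=1.0):
    """Sum of alpha_i * y_i * K(x_i, x) plus the bias; its sign gives the class."""
    return sum(
        alpha * label * rbf_kernel(sv, x, sigma)
        for sv, alpha, label in zip(support_vectors, alphas, labels)
    ) + bias


def classify(x, support_vectors, alphas, labels, bias, sigma=1.0):
    """Return +1 (FT) or -1 (FL) according to the sign of the decision value."""
    return 1 if decision_value(x, support_vectors, alphas, labels, bias, sigma) >= 0 else -1
```

With two hypothetical support vectors `(0, 0)` labeled +1 and `(2, 2)` labeled -1, unit weights and zero bias, a point near the first is classified as +1 and a point near the second as -1. The feature vectors in this work would have nine elements rather than two.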

3.2.2 Feature Points Matching

For the FT images, the translation of the endoscopic capsule inside the small intestine
can be modeled as a tiny camera passing through an elastic cylindrical tube, as shown in
Fig. 3.5. Since the WCE continuously takes pictures at a rate of up to 6 frames/sec,
common portions of the scene may be present in consecutive images [34]. These portions
of the images are called “feature points” (FPs). The pattern and magnitudes of the
displacements of these feature points can be used as a hint to reveal the speed of the
endoscopic capsule.

To make an accurate estimation of the capsule’s speed, it is very important that the

FPs extracted from the reference (first) frame can be accurately located in the following

(55)

The Affine Scale-invariant Feature Transform (ASIFT) [99–101], defined by the affine
camera model in Eq. 3.6, is a well-suited matching tool for WCE images because it is
robust to viewpoint changes, blur, noise and spatial deformations.

\[ A = H_\lambda R_1(\Psi) T_t R_2(\Phi) = \lambda \begin{bmatrix} \cos\Psi & -\sin\Psi \\ \sin\Psi & \cos\Psi \end{bmatrix} \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\Phi & -\sin\Phi \\ \sin\Phi & \cos\Phi \end{bmatrix} \qquad (3.6) \]

where $R$ represents rotation and $T$ represents tilt. $\Psi$ is the rotation angle of the
camera around the optical axis, $\Phi$ is the longitude angle between the optical axis and
a fixed vertical plane, and $\lambda$ is the zoom parameter. The detailed procedure of FP
matching using ASIFT can be found in [100]. An example of feature point matching is
given in Fig. 3.6(a), in which the blue “o” markers represent the coordinates of
detected FPs in the reference frame and the red “o” markers represent the coordinates of
the matched FPs in the second frame. If we connect the corresponding FP pairs on the
same frame (as shown in Fig. 3.6(b)), a set of motion vectors is generated that
represents the displacements of the FPs between frames.
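The affine camera model of Eq. 3.6 can be composed numerically to see how zoom, tilt and the two rotations act together. This is only a sketch of the matrix product, not of the ASIFT matching procedure in [100]; the function names are hypothetical.

```python
import math


def rotation(angle):
    """2x2 rotation matrix for an angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def affine_camera_matrix(zoom, psi, tilt, phi):
    """Eq. 3.6: A = zoom * R1(psi) * T_tilt * R2(phi), with T_tilt = diag(tilt, 1)."""
    tilt_matrix = [[tilt, 0.0], [0.0, 1.0]]
    product = matmul2(matmul2(rotation(psi), tilt_matrix), rotation(phi))
    return [[zoom * value for value in row] for row in product]
```

With zero rotation angles and unit zoom and tilt the result is the identity. Because rotations have unit determinant, the determinant of A equals the squared zoom times the tilt for any pair of angles, which is a quick sanity check on the composition.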

[Figure labels: intestinal wall; wireless capsule endoscope (WCE); LED light; lens; field of view]

(56)

(a) Corresponding feature points between two consecutive frames

(b) Formation of motion vectors

Figure 3.6: Feature matching between two consecutive images using ASIFT

3.2.3 Image Unrolling

To standardize the displacement of each FP pair and facilitate the quantitative
calculation of the motion parameters that are useful for localization, we perform an
inverse cylindrical projection [102] (also referred to as “image unrolling” in [63,103])
to project the original cylindrical image onto a flattened-view coordinate system, which
we call the “unrolled” image domain. As shown in Fig. 3.7, given a point $P$ at distance
$d$ away from the camera, the angular depth of $P$ is defined as:

\[ \theta = \tan^{-1}\left(\frac{R}{d}\right) \qquad (3.7) \]

Figure 3.7: Image acquisition system of WCE.

(57)

where $R$ represents the radius of the intestinal tube. It can be seen from Eq. 3.7 that
a smaller angular depth indicates a larger distance from the camera. To facilitate the
derivation of the angular depth, we map the coordinate $(x, y)$ of any point on the
cylindrical image plane to the unrolled image plane $(x', y')$ by:

\[ x' = \frac{L\phi}{2\pi}, \qquad y' = r \qquad (3.8) \]

where $\phi$ is the angle between point $P$ and the horizontal axis in the cylindrical
image plane (shown in Fig. 3.8(a)):

\[ \phi = \tan^{-1}\left(\frac{y - y_0}{x - x_0}\right) \qquad (3.9) \]

and $r$ is the radius of the circular ring associated with point $P$, which can be
calculated by:

\[ r = \sqrt{(x - x_0)^2 + (y - y_0)^2} \qquad (3.10) \]

$L$ and $H$ are the length and height of the unrolled image plane, respectively. Fig. 3.8
illustrates the procedure of image unrolling.
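The coordinate mapping of Eqs. 3.8 to 3.10 can be sketched for a single point as follows. One implementation choice here is an assumption rather than part of the text: the angle is computed with the two-argument arctangent and wrapped into [0, 2π), so that x' covers the whole range from 0 to L, whereas a plain inverse tangent of the ratio would only cover half of the circle. The function name is hypothetical.

```python
import math


def unroll_point(x, y, x0, y0, length):
    """Map a point of the cylindrical image plane to the unrolled plane (x', y').

    (x0, y0) is the reference point that Eqs. 3.9 and 3.10 measure from, and
    length is the width L of the unrolled image plane.
    """
    phi = math.atan2(y - y0, x - x0) % (2.0 * math.pi)  # Eq. 3.9, wrapped to [0, 2*pi)
    r = math.hypot(x - x0, y - y0)                       # Eq. 3.10
    x_unrolled = length * phi / (2.0 * math.pi)          # Eq. 3.8
    y_unrolled = r                                       # Eq. 3.8
    return x_unrolled, y_unrolled
```

For example, with the reference point at (128, 128) and L = 360, the point (138, 128) maps to (0, 10) and the point (128, 133) maps to approximately (90, 5). Applying the function to both ends of a motion vector gives its displacement in the unrolled image domain.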

In this unrolled image plane, the $x'$ axis represents the angle $\phi$ in radians, whose
value ranges from 0 (when $x' = 0$) to $2\pi$ (when $x' = L$). The $y'$ axis represents
the angular depth, which reflects the distance from the camera: $y' = 0$ represents zero
angular depth and $y' = H$ gives the maximum field of view $\eta$ of the camera. As can
be seen in Fig. 3.8(a), after the mapping, the circular rings in the cylindrical image
plane are stacked up vertically in the unrolled image plane. Under this new coordinate
system, the angular depth of

(58)

\[ \theta \cong \frac{y'}{H}\,\eta \qquad (3.11) \]

The angular depth obtained from Eq. 3.11 would facilitate the