Computational and Design Techniques for a Semi-Autonomous Computerized Dog-Training System with Timing and Accuracy Performance Comparable to a Professional Dog Trainer.


ABSTRACT

MAJIKES, JOHN J. Computational and Design Techniques for a Semi-Autonomous Computerized Dog-Training System with Timing and Accuracy Performance Comparable to a Professional Dog Trainer. (Under the direction of David L. Roberts.)

Humans spend a great deal of time and effort to train dogs to perform specific tasks when

given a command. Areas of research, such as canine cognition and Applied Behavior Analysis, have been developed to aid in this dog training. From this body of knowledge and the

canine-human relationship, professional dog trainers have come to rely on timing, accuracy, and

repetition to place the desired task under stimulus control such that a dog provides the desired behavior only when the stimulus is given.

For example, after observing a dog raising its paw possibly to scratch, a trainer might

recognize this paw motion, exploit the dog’s innate ability to recognize human gestures, and quickly offer positive reinforcement. The trainer continues shaping the behavior until it resembles a

hand shake, associates the behavior with a cue of “shake”, and places the hand-shake behavior

under stimulus control such that it’s only offered when the trainer says “shake”.

My research relies on the canine-human relationship, three fundamental animal training

concepts of timing, accuracy, and rate of reinforcement, and codifies the dog training

techniques of capturing, shaping, cue association, and stimulus control into a semi-autonomous computerized canine training system with timing comparable to, but more consistent than, that of a

professional dog trainer. The difficulties in the system design are threefold: First, the system

must balance timely behavior recognition with sensor noise reduction. Second, the system must consistently recognize and reinforce the behavior when it is offered. Third, the system must

employ a reinforcement schedule that encourages the animal to repeatably provide the

behavior when requested. Training success requires the system to identify when the dog is attentive, offer the cue, and reinforce the behavior at a rate of reinforcement comparable to a professional

dog trainer.

The success of the system will be shown in three experiments and one pilot by using existing analytical techniques for noise/latency trade-off, for behavior recognition accuracy, and for

achieving a rate of reinforcement comparable to a professional dog trainer. Experiment 1 will examine accuracy and latency of the system. Experiment 2 will compare a professional dog

trainer to the system using a novel and an existing posture classification system. Experiment

3 will show some evidence of canine learning when using the system. And the Discriminative Stimulus Pilot will combine the experience from the first three experiments to demonstrate a

system that can begin to do discriminative stimulus training.


professional dog trainer. Professional dog trainers rely on years of experience and understanding

of behavioral signs to effectively communicate human requests and canine responses. A system that codifies a professional trainer’s knowledge and facilitates human-canine communications

has the potential to positively affect the lives of both canines and humans. Therefore the goal


Computational and Design Techniques for a Semi-Autonomous Computerized Dog-Training System with Timing and Accuracy Performance Comparable to a

Professional Dog Trainer

by John J. Majikes

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

Computer Science

Raleigh, North Carolina

2018

APPROVED BY:

Robert St. Amant Alper Bozkurt

Barbara Sherman David L. Roberts


DEDICATION

This work is dedicated to my family.

First, this work is dedicated to my parents and grandparents who were my first teachers. Their lives didn’t afford them the opportunity of an advanced education. They gave up so much

so that we could have so much more.

I couldn’t be where I’m at in life without my brothers, who were always there for me. Thanks for being my enforcers, my protectors, and my friends. You guys were always there

when I needed you and sometimes when I thought I didn’t. Thanks for always having my back.

To my best friend and my world, Kim, you are my reason for living. This work is not so much dedicated to you; I am and everything I do is. Without you, life would be empty. Thanks

so much for putting up with so many things, not the least of which is this PhD.

To Jacob and Keri, you are the stars in the sky I look up to. You give the direction forward on all that I do. I am so in awe of the people you’ve become.

Thanks to Fran, MaryLou, Tim, Judy, Wendy, Pat, Joe, Kim, Mary, Annie, and Linda for


BIOGRAPHY

John was salutatorian of Hanover Area class of 1979 and graduated from Pennsylvania State University in 1983 with a B.S. in Computer Science. Although John always wanted to earn a

PhD, John decided to start a career with a large software company instead. While working full time, John received a Masters in Advanced Technology from Binghamton University on May

24, 1986, the same day Kim and John got married.


ACKNOWLEDGEMENTS

I’d like to thank Dr. Laurie Williams and Dr. Donald Bitzer for coming to Research Triangle Park University Day in 2007 to encourage people to go back to graduate school. It planted the

seed to finish the dream.

I’d like to thank Dr. Jerry Johnson, one of my undergraduate advisers and the original PhD

seed giver. After 35 years being out of Penn State, he wrote my letter of recommendation to

North Carolina State University encouraging me to continue the dream. Jerry has offered me sage career advice over multiple decades.

I’d like to thank my first PhD adviser, Dr. Tao Xie, for taking me in, getting me to the U.S.

Food and Drug Administration Artificial Pancreas conference, and for getting me through my first defense. A special thanks to Drs. Boris Kovatchev, Marc Breton, Patrick Keith-Hynes, and

the team at the University of Virginia school of medicine for allowing me access to the Artificial

Pancreas software.

I’d like to thank Les Short, Jim Duncan, and Tony Bhe from my previous employer for

facilitating my graduate work within the constraints of the company. I’d like to thank Bob

Garrell and Paul Wentworth of Oracle Corporation for the flexibility to continue working while pursuing the PhD. Additionally, I’d like to thank Michele Carlo and Renè Cherny for technical

writing help throughout my graduate process.

I’d like to give a special thanks to my second adviser, Dr. Emerson Murphy-Hill. I would have certainly dropped out had he not kept me as an advisee until a path opened with Dr.

Roberts and the CIIGAR team. I’d also like to thank Dr. Vincent Freeh for his counsel during

this transition period.

Thanks to Dr. Alper Bozkurt, Dr. Barbara Sherman, Dr. Rob St. Amant, Dr. Rita

Brugarolas, Dr. Pu Jerry Yang, Sherrie Yuschak, Sean Mealin, Katie Walker, and Marc Foster of the

CIIGAR team for the advice, input, and guidance through the whole process. A special thanks to Katie Walker, Rita Brugarolas, and Marc Foster for doing their best to see that I didn’t let

the magic blue smoke escape from the equipment nor electrocute others or myself while playing with hardware.

Sherrie, thanks for all the time spent explaining canine behavior to a left-brainer. I hope I’ve picked up a percentage of what you explained to me. I’m especially thankful for your making the extra effort to complete the last experiment. Traveling a few days after just moving your

household to Ohio and immediately coming back to North Carolina mainly to perform my

last experiment is humbling to me. Your faith and effort to complete the project is very much appreciated.


and effort to participate in the experiments. Of course, Paws4Ever hosted us for a week and

gave us full access to their dogs, staff, and facilities. Doing long running experiments without institutions like Paws4Ever would be impossible. Also several dogs from the NC State College of

Veterinary Medicine (CVM) staff practiced with the equipment to help me test my code. From

that group, a special thanks goes out to Marlin and his human Dr. Jeffrey Applegate. Marlin could pick your pockets and take your keys if you’d let him. Don’t let him, he’d probably drive

your car away. He’s that smart.

Of course, many thanks to my adviser, Dr. David Roberts, for stepping in and taking me as an advisee. When I met Dave I had already spent nearly 6 years and two advisers on this

PhD dream. After taking me on, I still had a few bouts of leaving the program. Dave offered


TABLE OF CONTENTS

List of Tables . . . viii

List of Figures . . . ix

Chapter 1 Introduction . . . 1

1.1 Background . . . 3

1.2 A Platform for Computer-Assisted Training . . . 4

1.3 Related Work . . . 6

1.3.1 Canine Cognition . . . 7

1.3.2 Animal Training Used In This Research . . . 7

1.3.3 Dog monitoring devices . . . 8

1.3.4 Automated dog training . . . 9

1.4 Experiment Outline . . . 10

Chapter 2 Canine considerations . . . 11

Chapter 3 Training Experiments: Timing, Accuracy, and Training Equipment . . . 12

3.1 Hardware . . . 12

3.2 Harness Ergonomics . . . 13

3.3 Harness Accelerometer Position . . . 14

3.4 Classification Algorithms . . . 15

Chapter 4 Timing and Accuracy Criteria Experiments . . . 18

4.1 Experiment 1: Data Preprocessing . . . 19

4.1.1 Experiment 1: Materials and Methods . . . 19

4.1.2 Experiment 1: Data Preprocessing Setup . . . 19

4.1.3 Experiment 1: Data Preprocessing Results . . . 22

4.2 Experiment 2: Timing and Accuracy . . . 22

4.2.2 Experiment 2: Timing and Accuracy Setup . . . 24

4.2.3 Experiment 2: Timing and Accuracy Results . . . 26

4.3 Experiment 3: Training . . . 36

4.3.2 Experiment 3: Training Setup . . . 37

4.3.3 Experiment 3: Training Results . . . 38

Chapter 5 Discriminative Stimulus Control Pilot . . . 42

5.0.1 Discriminative Stimulus Pilot: Materials and Methods . . . 43

5.1 Discriminative Stimulus Pilot: Equipment . . . 44

5.2 Discriminative Stimulus Pilot: Protocol . . . 45

5.3 Discriminative Stimulus Pilot: Protocol Steps . . . 47


Chapter 6 Discriminative Stimulus Control Results . . . 50

6.1 Discriminative Stimulus Pilot: Maintaining Rate of Reinforcement with Shaping Protocol . . . 51

6.2 Discriminative Stimulus Pilot: TTCs as Shape Levels Change . . . 54

6.3 Discriminative Stimulus Pilot: Incorrect Button Pushes as Shape Levels Change . . . 56

6.4 Areas of Improvement . . . 57

Chapter 7 Broader Impact of This Dissertation . . . 59

References. . . 62

APPENDICES . . . 68

Appendix A Glossary . . . 69

Glossary . . . 70


LIST OF TABLES

Table 4.1 Number of sit, stand, and eat data instances along with their duration for the dogs used in Section 4.1’s data preprocessing and tuning of the moving average window size. . . 20

Table 4.2 Matrix of number sit and non-sit instances, FP and FN, and percentage accuracy for the 16 dogs using each classifier in Experiment 2. . . 27

Table 4.3 The mean and standard deviation response latency for the RF and VT algorithms and trainer in comparison to the mean of the three experimenters. Note that the entry for the trainer ∆t indicates the trainer’s timing relative to the mean of the three experimenters during the evaluation period where the algorithm indicated in that row was also being evaluated. To compare the trainer’s timing to the algorithm’s, compare the values in the columns of that row. . . 30

Table 4.4 The number of sits, FP, FN, PR, and SE for each of the 16 dogs in Experiment 2. . . 31

Table 4.5 A comparison of the VT algorithm and RF∗, replaying the data at 0.125 s intervals (8 Hz). . . 34

Table 4.6 The number of sits, FP, FN, PR, and SE for each of the three dogs in Experiment 3. The mean and standard deviation response latency for the RF and VT algorithms in comparison to the mean of the video analysis. . . 38

Table 5.1 Block’s success criteria. . . 46

Table 5.2 Random stimulus delay ranges for different shaping levels . . . 49

Table 6.1 Paired t-test P value comparison of average response latency between shaping levels . . . 52

Table 6.2 Number of TTCs per minute block per shape level . . . 54

Table 6.3 Percentage incorrect button pushes per TTCs, minute, and shape level . . . 56


LIST OF FIGURES

Figure 1.1 The Computer-Assisted Training platform, including a smart harness with IMUs, a laptop with algorithms for posture detection, and a computer-controlled treat dispenser to reinforce desirable behavior. . . 5

Figure 3.1 A plot of x-axis acceleration from two different IMUs for five different postures. . . 15

Figure 4.1 Number of FN over varying moving average window sizes. Arrows show where the lowest FN occurred with the smallest window size. . . 21

Figure 4.2 Percent accuracy instance-based classification over varying moving average window sizes. Arrows show where the highest percent instance-classification occurred with the smallest moving average window size. . . 21

Figure 4.3 The NC State Canine Facility where Experiment 2 took place. Shown on left is the exercise pen in foreground, video recorder at top of the picture, and the table for experimenters. The dog owner area is in an adjacent room. Shown on right is a schematic of the facility. . . 25

Figure 4.4 Bar chart showing the mean time difference and one standard deviation for the 16 dogs for the RF classifier, VT classifier, and the trainer with respect to the three experimenters. Negative values for VT and Trainer indicate detection of sit prior to the experimenters, and the narrower standard deviation for the VT algorithm shows a consistency improvement from the trainer. . . 28

Figure 4.5 Posture-based classification sensitivity, the percentage of sit postures correctly classified, TP ÷ (TP + FN). . . 32

Figure 4.6 Posture-based classification precision, the percentage of sit classifications that are correct, TP ÷ (TP + FP). . . 32

Figure 4.7 Posture-based classification sensitivity improvement by requiring two consecutive sit instances for sit posture classification sensitivity. Note that there was no improvement in VT classification sensitivity since it was already at 100. The improvement was with respect to RF* (with data replayed at 8 Hz). . . 35

Figure 4.8 Example time line of instances where a 2Sit classification would reduce FN. . . 35

Figure 4.9 Posture-based classification PR improvement by requiring two consecutive sit instances for sit posture classification. The improvement was with respect to RF* (with data replayed at 8 Hz). . . 35

Figure 4.10 Bar chart showing the mean time difference and one standard deviation for the 3 dogs for the RF and VT Classifiers with respect to the video analysis. . . 39

Figure 4.11 The cumulative number of sits offered by each dog over time. . . 40

Figure 4.12 Orientation of Dogα to experimenter vs. treat dispenser grouped by thirds of the experimental trial. . . 40


Figure 6.1 Average response latency per shaping level . . . 52

Figure 6.2 Dog17 Session 1 average response latency . . . 53

Figure 6.3 TTCs per minute for each shaping level . . . 55

Figure 6.4 Percentage incorrect button pushes per minute per shape level . . . 57

Figure B.1 Left to right Dogα and Dogβ. . . 79

Figure B.2 Left to right Dogγ and Dogδ. . . 79

Figure B.3 Left to right Dog01 and Dog02. . . 79


Chapter 1

Introduction

As long as dogs have been domesticated, humans have spent time and effort to train them to perform specific tasks when given a command. Canine cognition research has shown that dogs,

possibly through this domestication, have an innate ability to recognize human behavior and

social cues [29]. Research in Applied Behavior Analysis, the study of how behavior changes in animals, suggests the most effective approach to animal training emphasizes the use of

positive reinforcement [32], which involves giving something the animal desires such as food to

reinforce the sought-after behavior. Based on this research and thousands of years of cohabitation, dog training is the process by which dogs learn to perform specific, desired behaviors. One

common training procedure is to allow a dog to produce a desired natural behavior, such as

sit. Then, the dog is immediately given a food treat as a primary reinforcer, delivered by the human trainer. Over time, the trainer may use a clicker sound immediately after the

sit that will act as a secondary reinforcer or promise of the primary reinforcement that will

be forthcoming. With repeated training, the reinforced behavior (sit) occurs with increasing frequency. Then, the trainer gives the auditory cue, “sit”, immediately before the behavior, and an association between the cue and the behavior is established. A more complex, formal,

real-world, professional animal training example might be a gorilla sticking its arm out of its cage: the trainer might recognize the behavior, provide a treat as a reinforcer, shape the

behavior so that the arm is extended palm up, associate the behavior with a cue of “arm”, and place the desired behavior under stimulus control so that when a veterinarian needs to draw blood she says “arm” and the gorilla voluntarily and quickly extends its arm through the bars. The cue of “arm” becomes the stimulus or antecedent for the animal to present the behavior in anticipation of the consequence or reinforcement. Placing a behavior under stimulus control is complex because it requires not only that the gorilla stick its arm out when the stimulus “arm” is given, but also that the gorilla’s arm not come out when the stimulus is not given.


research suggests that dogs have the ability to understand human intent [8]. Respect for canine

cognition enhances dog training by leveraging the dog’s ability to learn. For example, training a guide dog to avoid an obstacle is useless unless the dog’s spatial understanding can be leveraged

to include the space needed for the human at its side. Cognitive research has shown that inexpert

dogs have the ability to navigate a human around an obstacle course [49]. Dog trainers must not only understand behavior training but also respect the cognitive learning of the animal.

Training is often slowed due to inadequate knowledge or skill on the part of the trainer.

Fundamental concepts that affect the animal’s performance are canine cognition, timing, consistency, and rate of reinforcement of the desired behavior [4]. Human trainers, especially novices, fail to recognize non-verbal canine communication and often struggle to provide timely and

consistent reinforcement when teaching animals. Novices may reduce or inhibit the learning process by conducting training when the dog is stressed or distracted. Indiscriminately reinforcing

multiple behaviors or failing to recognize correct behaviors are examples of inadequate training

criteria that can reduce the rate of reinforcement and can slow down the learning process. Reinforcing the wrong behavior is an example of inadequate training criteria that can teach the

animal to do an undesirable action. Timing is critical to both reinforcement and stimulus

control. Reinforcement is most effective when delivered as close as possible to the desired behavior; a delay greater than 0.5 seconds can retard a dog’s ability to create the association between the

antecedent, the behavior, and the consequence [69]. Stimulus control requires timely recognition of when the dog is attentive to the stimulus, or cue, so that a high rate of reinforcement can be

achieved.

Computers, as compared to human trainers, excel at extremely accurate timing and at maintaining consistent criteria. They could be used to enhance the process of training animals and,

by reducing human error, might reduce ambiguity and improve the human-canine relationship.

Thus, human-animal-computer interactions could enhance our inter-species relationships with the animals we live or work with [40]; a computer training system would improve

communication and benefit the dog. In addition, a semi-autonomous training system could be used in

animal shelters to train dogs to provide beneficial stimulation [41] and to enhance adoptability. When introducing shelter dogs to potential owners, canine cognition research has shown that

dogs are more likely to be adopted if they sit or stand in response to and show interest from

the potential adopter [66].

My research relies on the canine-human relationship, three fundamental animal training

concepts of timing, accuracy, and rate of reinforcement, and codifies the dog training techniques of

capturing, shaping, and stimulus control into a semi-autonomous computerized canine training system with timing comparable to, but more consistent than, that of a professional dog trainer. The

difficulties in the system design are threefold: First, the system must balance timely behavior

recognition with sensor noise reduction. Second, the system must consistently recognize and reinforce the behavior when it is offered. Third, the system must employ a reinforcement schedule

that encourages the animal to repeatably provide the behavior when requested. Training success requires the system to identify when the dog is attentive, offer the stimulus, and reinforce the

behavior at a rate of reinforcement comparable to a professional dog trainer.

The success of the system will be shown in three experiments and a pilot by using existing analytical techniques for noise/latency trade-off, for behavior recognition accuracy, and for

achieving a rate of reinforcement comparable to a professional dog trainer. Experiment 1 will

examine accuracy and latency of the system. Experiment 2 will compare a professional dog trainer to the system using a novel and an existing posture classification system. Experiment

3 will show some evidence of canine learning when using the system. And the Discriminative

Stimulus Pilot will combine the experience from the first three experiments to demonstrate a system that can begin to do discriminative stimulus training.

Since training relies on the canine-human relationship, a computer system cannot replace a

professional dog trainer. Professional dog trainers rely on years of experience and understanding of behavioral signs to effectively communicate human requests and canine responses. A system

that codifies a professional trainer’s knowledge and facilitates human-canine communications

has the potential to positively affect the lives of both canines and humans. Therefore the goal of this work is a system that facilitates canine training, algorithmically encapsulates training

techniques, and leverages the advantages of computers: timing and consistency.

Our research improves the human-canine relationship by codifying the Applied Behavior

Analysis research and canine training expertise into a semi-autonomous computerized canine

training system of hardware and software. This document describes the comparison of a professional dog trainer to a system that respects canine cognition and uses timing, accuracy, and

rate of reinforcement to begin placing a behavior under stimulus control.

1.1 Background

Since this research involves multiple, technical disciplines and pets are commonly found in a

majority of households in the United States [3], a short description of three technical terms along

with some of their more common usage will be presented to facilitate a common understanding of this research by a diverse audience.

Stimulus control occurs when an animal perceives a stimulus, produces a behavior, and

receives reinforcement. B.F. Skinner’s research, among others, famously used light stimulus and pigeons pecking at buttons to describe stimulus control [61]. Behavior research also uses

the terms antecedent, behavior, and consequence. Novice dog trainers might use the terms cue, behavior, and reward. This research and others use Three-Term Contingency (TTC), to


In addition to the terminology used, an important point to understand is that there are two

parts to stimulus control. The first part is that the behavior is presented by the animal when the stimulus is given. The second part is that the animal should not present the behavior when

the stimulus is absent. Due to the difficulty of training stimulus control with a large number

of dogs, this research presents a pilot which uses a small number of dogs and shows automated training of the first part of stimulus control, which is that the animal presents the behavior

when it perceives the stimulus.

The Matching Law of Behavior is a guiding principle of animal training that dictates that behaviors are influenced by how often they are reinforced. Studies have shown that if an animal

is presented with two behavior choices, such as two buttons to press, the animal presents

the behaviors in direct proportion to the number of reinforcements given for each button [5]. If the rates of reinforcement for the behaviors change, the animal will change its behaviors

proportionally [31]. Given the Matching Law of Behavior, a desired behavior, and extraneous

behaviors, it is important for this research to accurately detect and reinforce only the desired behavior. If our system incorrectly detects and reinforces extraneous behaviors, the animal will

not be trained to present the desired behavior.
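
In its simplest two-choice form, the proportionality described above is commonly written as the strict matching equation. The symbols are introduced here for illustration and do not appear elsewhere in this document: $B_1$ and $B_2$ are the rates at which the animal offers each behavior, and $R_1$ and $R_2$ are the rates of reinforcement earned by each.

```latex
\[
  \frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
\]
```

As a hypothetical worked case, if pressing the first button earns 30 reinforcements per hour and the second earns 10, the equation predicts that $30 / (30 + 10) = 0.75$ of the presses go to the first button. The same reasoning suggests why false detections matter: each reinforcement delivered for an extraneous behavior raises that behavior's predicted share.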

Although the goal of this research is semi-autonomous dog training, safety is the utmost concern. To ensure the safety of both canines and humans, review and approval of all

experiments was completed by a university animal oversight committee. During all interactions with client-owned dogs, a Veterinary Behavior Technician (VBT) and the client were present.

Throughout this research the term VBT is used to describe the professional dog trainer

primarily responsible for observing the dog’s behavior for signs of stress, discomfort, or behavioral inhibition. For all experiments, the VBT or the client, but not the researchers, handled the

dogs. The VBT and the client were present and helped ensure the safety of everyone involved

in these experiments.

1.2 A Platform for Computer-Assisted Training

To realize computer-assisted training, we’ve developed both hardware and software systems

that provide a real-time view of canine postures and the capability to reinforce those postures with minimal human input. The system consists of a smart harness, which is a custom harness

outfitted with Inertial Measurement Units (IMUs) on the back and chest area of the dog, a base

unit running classification algorithms, and a remotely operated treat dispenser. To close the loop and provide reinforcements for the desirable postures, remotely-operated treat dispensers

are triggered based on the output of the classification algorithms. At this point in time the system is not fully-autonomous since human input is needed to train the posture recognition


Figure 1.1: The Computer-Assisted Training platform, including a smart harness with IMUs, a laptop with algorithms for posture detection, and a computer-controlled treat dispenser to reinforce desirable behavior.

is not meant to replace the human trainer but facilitate canine training, the human will be

present during all experiments. Figure 1.1 is an overview of our system used for recognizing and reinforcing postures.
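
The closed loop described above (sensor data in, posture classification, treat out) can be sketched in a few lines. The sketch is illustrative only and is not the implementation used in this research: the single-axis threshold classifier, every constant, and all function names are assumptions. Only the moving-average smoothing and the idea of requiring two consecutive sit instances are taken from this document.

```python
from collections import deque
from statistics import mean

WINDOW = 4             # hypothetical moving-average window, in samples
SIT_THRESHOLD = 0.5    # hypothetical cut-off on smoothed x-axis acceleration (g)
CONSECUTIVE_SITS = 2   # sit instances required before reinforcing


def moving_average(samples, window):
    """Yield the mean of the most recent `window` samples for each new sample."""
    recent = deque(maxlen=window)
    for value in samples:
        recent.append(value)
        yield mean(recent)


def classify(smoothed_value):
    """Toy stand-in for a posture classifier: label one instance as sit or other."""
    return "sit" if smoothed_value >= SIT_THRESHOLD else "other"


def reinforcement_indices(samples, window=WINDOW, needed=CONSECUTIVE_SITS):
    """Return the sample indices at which the treat dispenser would be triggered."""
    triggers = []
    run = 0
    for index, smoothed in enumerate(moving_average(samples, window)):
        if classify(smoothed) == "sit":
            run += 1
            if run == needed:  # fire once per sit, on the `needed`-th instance
                triggers.append(index)
        else:
            run = 0
    return triggers


if __name__ == "__main__":
    # Synthetic trace: standing (near 0 g), one noise spike, then a sustained sit.
    trace = [0.0, 0.1, 0.9, 0.0, 0.1, 0.8, 0.9, 0.9, 0.9, 0.1, 0.0]
    print("dispenser triggered at samples:", reinforcement_indices(trace))
```

In this toy trace the isolated spike is smoothed away and never reinforced, while the sustained sit is reinforced a few samples after it begins. A larger window or a stricter confirmation count would reject more noise at the cost of a longer delay before reinforcement.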

Drawing on knowledge of Applied Behavior Analysis and canine cognition, we identified the

following design criteria for our system:

1. Canine Learning: Research has also shown that animals pick up on many unseen human cues [24], and these cues may be a fundamental reason why the dog presents the desired

behaviors. For example, canine cognition research has shown that dogs recognize human

social cues such as gazing and staring [14][28][29][33]. Therefore, during all experiments there will be a fundamental respect for canine cognition by ensuring that the dogs can get

acclimated to the test environment and that a human trainer is always present to recognize

any signs of stress, discomfort, or behavioral inhibition. Respecting the possibility that the dogs may pick up unseen human cues, even when the system is controlling an experiment,

the human trainer will be present with the same posture, position, and equipment used

with trainer controlled experiments. All experiments will be designed to fundamentally respect and utilize canine cognition.

2. Timing: Research in Applied Behavior Analysis has demonstrated the significance of the

delay between a behavior (posture change in our case) and the delivery of reinforcement

(conditioned or primary) [15]. As a general rule, it is known that the shorter the latency between the desired behavior and the delivery of the reinforcement to the animal, the


significantly hinders learning in dogs [69]. Therefore, our goal is a 0.2 second delay in our

system. Our efforts are detailed in Experiment 2 and Experiment 3 in Sections 4.2.3 & 4.3.3, respectively.

3. Consistency: The Matching Law of Behavior [18] is a guiding principle of animal training

that dictates behaviors are influenced by how often they are reinforced. Studies have shown

that whether behaviors are desired or undesired, they are offered at a rate similar to the rate of reinforcement [5][31]. It follows that it is critically important for our system to

be as accurate as possible, so as to maintain consistent criteria for what is reinforced.

Operationally, this translated to a goal of having as few erroneous (either additions or deletions of) posture recognitions as possible.

4. Reinforcement: When and how a reinforcement is delivered is called the reinforcement

schedule. A continuous reinforcement (CRF) schedule, where the desired behavior is reinforced each time it occurs, is best used during the initial stages of learning [42]. A

CRF schedule combined with accurate recognition of behaviors can increase the rate of

reinforcement which is one indication of learning. Although rate of reinforcement is dependent on things other than just the reinforcement schedule, for example consistency

and the animal’s behaviors, our goal during the Discriminative Stimulus Pilot, detailed

in Chapter 5, is to use a CRF schedule and have an increased rate of reinforcement as one indication of learning.

In general, all four of these criteria are related. Association of behavior and reinforcement will not occur when the dog is not attentive. Speed and timing of detection of a posture is

influenced by the tolerance of the algorithms to data noise which is dependent on sensor location

and ergonomic harness considerations for the dog. Consistency and accuracy are affected by the detection speed which influences the response strength of stimulus-to-response latency. A

decreased response latency with consistent and accurate reinforcement provides a high rate of

reinforcement. Providing a high rate of reinforcement that accurately reinforces only the desired behavior is an indication of learning a behavior.
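
The quantities related in the paragraph above can be made concrete with a small sketch. The record layout (`Trial`) and the function names are hypothetical and chosen only for illustration. The sensitivity and precision formulas follow the TP ÷ (TP + FN) and TP ÷ (TP + FP) definitions given in the captions of Figures 4.5 and 4.6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One stimulus presentation (hypothetical record layout)."""
    cue_time: float                 # seconds since session start
    behavior_time: Optional[float]  # None if the behavior was never offered
    reinforced: bool                # True if the treat dispenser fired


def response_latencies(trials):
    """Stimulus-to-response latency for each trial in which the behavior occurred."""
    return [t.behavior_time - t.cue_time
            for t in trials if t.behavior_time is not None]


def rate_of_reinforcement(trials, session_seconds):
    """Reinforcements delivered per minute of session time."""
    return sum(t.reinforced for t in trials) / (session_seconds / 60.0)


def sensitivity(tp, fn):
    """Share of true sits that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of sit detections that were real sits: TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Synthetic session, not experimental data.
    session = [
        Trial(cue_time=0.0, behavior_time=1.5, reinforced=True),
        Trial(cue_time=10.0, behavior_time=12.0, reinforced=True),
        Trial(cue_time=20.0, behavior_time=None, reinforced=False),
    ]
    print("latencies (s):", response_latencies(session))
    print("reinforcements per minute:", rate_of_reinforcement(session, 60.0))
```

Keeping latency, rate of reinforcement, and detection accuracy as separate functions mirrors the point of the paragraph: the measures are related, but each can degrade independently and is worth tracking on its own.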

Throughout the remainder of this paper, we will detail the hardware, ergonomic, and

software designs we created in order to realize a computer-assisted training system that meets the needs of dogs as defined by these criteria and begins placing a behavior under stimulus control.

1.3 Related Work

Since the human-canine relationship has existed for centuries [59], much research exists on

dog training. Only dog training techniques and devices related to training automation will be


1.3.1 Canine Cognition

In the last century, animal training attempted to show intelligence similar to humans. Clever

Hans was a horse trained by his mathematician owner to count, perform arithmetic, and tell

time [24]. Clever Hans was advertised as a “thinking” horse that would tap out answers to various questions posed by his owner and others. Years later it was discovered that Clever Hans

was actually reading the face of the humans to understand when to stop tapping. By attempting

to show intelligence similar to humans, Clever Hans actually showed animal social intelligence. Counting intelligence tests using animals such as chimpanzees [6] and canines [38] have

continued even into this century. But cognition research has diverged into the effects of

domestication skills such as recognizing, remembering, and applying information [28, 29]. Over the last few years, there has been an increase in the amount of research involving domestic dogs’

cognition versus wolves, foxes, and other canidae [46].

For example, wolves raised by humans are not as good as domesticated dogs at using human gestures to find food [28]. Domesticated dogs are not only more likely to follow human gestures than apes, who rely on hand gestures for conspecific communication [8], but they also follow human gaze when making decisions [14]. In fact, domesticated dogs first infer from human social gestures and, if those are not available, fall back on their own inference [20].

The design of these experiments assumes that the dogs have an innate understanding of human gestures and gazes.

1.3.2 Animal Training Used In This Research

Training animal behaviors using operant conditioning divides the consequences into four categories: positive reinforcement, negative punishment, negative reinforcement, and positive punishment [42]. Positive reinforcement is based on the theory that a behavior is strengthened when it is followed by a pleasant stimulus; for example, after a dog rolls over he is offered a treat. Negative punishment is based on the theory that a behavior is weakened if followed by the removal of a pleasant stimulus; for example, when a dog barks for attention the human turns around and ignores the undesired behavior. Negative reinforcement is based on the theory that a behavior is strengthened if followed by the removal of an unpleasant stimulus; for example, the pull of a leash stops when a dog yields to it. Positive punishment is based on the theory that a behavior is weakened if followed by an unpleasant stimulus; for example, a barking dog receives a shock from a shock collar. This research focuses on positive reinforcement, the most effective and humane method for dog training [32].

Positive reinforcement dog training derives from the fact that humans, as supposedly the

more intelligent species, should be able to get dogs to voluntarily offer the desired behavior [47], recognize the behavior, and immediately offer a reinforcement. Punishment, such as shock


positive punishment are used as sparingly as possible. In essence, positive reinforcement dog

training uses pleasant stimuli, or reinforcers, to facilitate the translation of human requests into canine action [43].

A clicker is a popular device used to communicate that a primary reinforcement is imminent [21, 54, 55]. After the click sound is repeatedly paired with the primary reinforcer, such as a treat, the dog comes to treat the click as a secondary reinforcer. Clicker training as a form of positive reinforcement has been used to train dogs to find electronic storage devices [58] and detect hypoglycemic and hyperglycemic attacks in humans [57]. For dogs that are motivated by play, a primary reinforcer of throwing a tennis ball has also been used to train bomb sniffing dogs [37], drug detection dogs [1], and colorectal cancer detection dogs [63].

1.3.3 Dog monitoring devices

Commercial devices are available that provide the first step of positive reinforcement training,

recognizing or monitoring a behavior. With accelerometers and gyroscopes, the TailTalk

Fitbit-type band placed on a dog’s tail translates tail wag speed and movement into emotions that are sent to your smart phone [72]. The PetPace monitors canine health by sending heart rate,

temperature, and respiration in real time to your smart phone [34]. The Nikon Heartography also

monitors canine heart rate but instead of sending data to your smart phone, the Heartography camera takes pictures when your dog’s heart rate peaks [2]. The Heartography communicates

to you what excites your dog.

Research studies have also created devices that extend canine physiology monitoring beyond that of commercial devices. One study demonstrated that cardiovascular activity monitoring similar to photoplethysmogram (PPG) and electrocardiogram (ECG) could be provided in a non-invasive way [9]. Extending these health and activity monitors wirelessly provides handlers with continuous access to the dog’s welfare [10]. Wireless access to health monitors would be helpful for search and rescue dog handlers while the dogs work in potentially hazardous environments such as disaster sites.

To progress beyond monitoring to post-processing classification, a research study developed a network of devices demonstrating the ability to do offline classification [12]. The researchers also justified key locations for sensors to provide classification [13] and provided key analysis of the sensor data’s impact on algorithmic classification [11]. Other research has generalized classification algorithms so that they do not require a preprocessing classification training data set. This research allows the system to identify common postures or behaviors without having the dog model the behavior [67].


treat dispenser. For lengthy research studies that require multiple trainers, automated treat dispensers would provide consistency and mitigate the reinforcement timing differences between the trainers [70]. For studies that require the removal of a trainer due to the possibility of distraction or influence, an automated dispenser can be initiated remotely after a correct behavior [27]. Other studies that required analysis of dogs’ behavior when humans are not present, such as the separation anxiety behaviors of barking, chewing, or urinating, have used automated treat dispensers [53].

All of these advances in dog monitoring devices, algorithms, and research provide a foundation for automated dog training.

1.3.4 Automated dog training

The use of wearable IMUs has become increasingly popular. Studies of these units on humans use a wide variety of machine learning algorithms to classify postures [52]. There has been work

in applying similar classification techniques using sensors on animals [17, 48, 68]; however,

these studies have primarily focused on monitoring activity levels of animals, not identifying specific activities [36]. Our research differs from this work in that the system is designed for

computer-assisted training, and therefore closes the loop from posture recognition to feedback.

Additionally, our system meets a multi-criteria performance objective: accuracy is as important as response latency.

Posture recognition uses inertial sensors to classify the activity the dog is currently performing. In one study, recognition was done in real time; however, the sensors were positioned inefficiently for posture recognition [56]. Researchers identified positions for sensors on a dog in order to optimize for posture classification accuracy (without attention to latency) [12]. In other work, accelerometer data was used to identify when and for how long canines exhibited a total of seven static postures and dynamic behaviors; however, the recognition was not done in real time due to the reliance on data that needed to be manually extracted from video, and the need to connect to the sensors to get the data [23]. Previous work on the evaluation of a machine learning algorithm based on a two-stage cascade classifier used raw sensor data [13] to accurately recognize five static postures and three dynamic behaviors in near real time [11].

Automation for training discriminative stimulus has been done previously, most famously by B.F. Skinner [44]. But this training was done using small animals in a very confined and

controlled environment called a “Skinner box”. Some automation of training discriminative stimulus for canines also has been done. After Diabetic Alert Dogs (DAD) have been trained to detect hypoglycemic and hyperglycemic samples, testing of the hypoglycemic discriminative


as a conditioned reinforcer and a toy as a primary reinforcer whenever the proper odorant was

selected [26]. These canine training systems were also not automated as a human actively gave commands to retry when a False Positive (FP) detection was made. As will be demonstrated

later, our system requires some human involvement in data labeling, but our research differs

from this type of automation in that we believe we are the first to automate the discriminative stimulus training.

1.4 Experiment Outline

The system will use three training experiments described in Chapter 4 and one pilot described in Chapter 5 to validate that our system can codify canine training expertise into a semi-autonomous computerized canine training system, compare it to a professional dog trainer, and begin placing a behavior under stimulus control. Experiment 1 uses a small number of dogs to compare our novel classifier with an existing classifier and uses data smoothing to empirically optimize the smart harness sensor data, minimizing classification latency while maximizing classification accuracy. Experiment 2 builds on Experiment 1 by using a larger number of dogs to compare a professional dog trainer to the two posture classification algorithms. Experiment 3 uses both classifiers to demonstrate some evidence of canine learning. The Discriminative Stimulus Pilot in Chapter 5 consolidates the previous experiments, uses the novel posture classification algorithm, and demonstrates that the system begins to place a behavior under stimulus control.


Chapter 2

Canine considerations

All experimental procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of North Carolina State University (NC State). The IACUC is a federally mandated committee, qualified through the experience and expertise of its members, that oversees its institution’s animal program, facilities, and procedures. An IACUC-approved protocol ensures that research procedures with animals will avoid or minimize discomfort, distress, and pain to the animals, consistent with sound research design, and that the animals used in a specific research protocol will be treated humanely, according to mandated guidelines. Additionally, in accordance with Mancini’s Animal-Computer Interaction (ACI) Manifesto [40], the dog’s comfort and psychological well-being were the most important design criteria. Any system we developed first and foremost was designed to “Protect both human and nonhuman participants from physiological or psychological harm at all times....” [40]. Accordingly, our design process included a careful evaluation of canine cognition and of the ergonomics of the hardware platform, including weight, position, and size. We iteratively refined the design by working with a veterinary technician specialized in animal behavior to identify any signs of stress, discomfort, or behavioral inhibition.

Section 3.2 describes the iterative steps taken to make the harness and sensors as comfortable as possible for the timing, accuracy, and training experiments. Section 5.4 describes the safety protocol


Chapter 3

Training Experiments: Timing,

Accuracy, and Training Equipment

©2016 ScienceDirect. Adapted with minor modifications and with permission from J. Majikes, R. Brugarolas, M. Winters, S. Yuschak, S. Mealin, K. Walker, P. Yang, B. Sherman, A. Bozkurt, and D. L. Roberts, “Balancing Noise Sensitivity, Response Latency, and Posture Accuracy for a Computer-Assisted Canine Posture Training System” Int. J. Hum. Comput. Stud., 2016.

Figure 1.1 shows the Computer-Assisted Training platform used in the timing, accuracy,

and training experiments described in Sections 4.1, 4.2, and 4.3. The remainder of this chapter describes the canine harness and the sensors attached to it, the smart harness ergonomic

considerations, and the base unit laptop that implements our posture classification algorithm.

3.1 Hardware

The main part of our system is the smart harness, the development of which has been detailed

in our prior work [7]. Based on those results, we made several improvements to the previous

smart harness that are described in Section 3.2. For continuity, we provide a brief description of the previous harness before describing the improvements.

To handle sensor information and communications, the harness is equipped with a small BeagleBone Black (BBB) computer. The BBB includes a 1-GHz processor from Texas Instruments (TI; AM3358BZCZ100), 2 Gbytes of on-board flash storage, and 512 Mbytes of DDR3 RAM. It also includes up to 65 general-purpose input/output accessible pins, eight pulse-width modulation (PWM) channels, eight channels of 12-bit analog-to-digital converters, and two I2C serial buses, which are used to interface with the two IMUs. Each IMU included a three-axis accelerometer (LSM303) and a three-axis gyroscope (L3GD20H) and was configured to produce readings at 10 Hz. The BBB runs Ubuntu GNU/Linux, giving access to


communication link is IEEE 802.11, and most of the communication to and from the base station is done

using User Datagram Protocol (UDP) to increase speed. We included a watchdog program that is responsible for monitoring the wireless connection and other services, and restarting them if

they encounter an error or fail to respond.
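As a rough illustration of why UDP suits this kind of sensor stream, the sketch below packs one reading into a fixed-size datagram and sends it without waiting for an acknowledgement. The packet layout (sequence number, millisecond timestamp, six acceleration values) and all function names are assumptions made for this example; the source does not describe the actual wire format.

```python
import socket
import struct

# Hypothetical layout (an assumption, not the thesis format): network byte
# order, uint32 sequence number, uint64 timestamp in ms, six float32 values
# (x, y, z acceleration for each of the two IMUs).
PACKET_FORMAT = "!IQ6f"

def pack_sample(seq, timestamp_ms, accel):
    """Pack one six-axis reading into a fixed-size datagram payload."""
    return struct.pack(PACKET_FORMAT, seq, timestamp_ms, *accel)

def unpack_sample(payload):
    """Inverse of pack_sample; returns (seq, timestamp_ms, list of six floats)."""
    seq, timestamp_ms, *accel = struct.unpack(PACKET_FORMAT, payload)
    return seq, timestamp_ms, accel

def send_sample(sock, address, seq, timestamp_ms, accel):
    """Fire-and-forget send: UDP neither retransmits nor waits for an ACK."""
    sock.sendto(pack_sample(seq, timestamp_ms, accel), address)

if __name__ == "__main__":
    # The address below is a placeholder for the base station.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        send_sample(sock, ("127.0.0.1", 9999), 1, 100, [0.0, 0.5, -1.0, 0.25, 0.0, 1.0])
```

The sequence number lets a receiver notice dropped or reordered datagrams, which matters because UDP trades delivery guarantees for lower latency.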

The base unit in Figure 1.1 is used to collect all sensor data, classify the postures, and record all experiment information. The unit is a Lenovo ThinkPad W530 with an Intel Core i7-3740QM 2.7 GHz processor and 8 GB RAM, running 64-bit Windows 7. (The base unit was upgraded to Windows 10 before starting the Discriminative Stimulus Pilot discussed in Chapter 5.) The base unit processed the sensor data, took human input for labeling sensor data, ran algorithms for determining posture (described below), and provided data logging capabilities. We integrated a laser pointer (with the laser disabled) to function as a hand-held remote control to communicate a trainer’s interpretation of posture changes. The laser pointer allowed our trainer to operate much like they would using a clicker (a behavior marker and also a conditioned reinforcer [54]). Using the remote, we can log the trainer’s timing data for comparison to our algorithms.

3.2 Harness Ergonomics

In prior work we modified a Lift Load Carry Harness from Ray Allen Manufacturing (SKU RA36MHL-P) to include hardware for sensing and communications. The smart harness has three straps: one around the neck, one around the chest, and one around the belly. It also contains Velcro for easy attachment of accessories. Before beginning the experiments, the smart harness from the prior research was modified to be more compatible with and comfortable for the range of dogs used in the testing. Considering that many dogs of varying age, weight, and physical ability should be able to benefit from our efforts, we modified the harness to be more comfortable and less bulky. The fitting of the harness for each dog included a slow observation and introduction period supervised by a professional dog trainer with more than 15 years of experience and the following credentials: a veterinary technician specialized in animal behavior, credentialed as a Certified Professional Dog Trainer-Knowledge Assessed (CPDT-KA), a Veterinary Behavior Technician (VBT), a Registered Veterinary Technician (RVT), and a Veterinary Technician Specialist (VTS) in behavior. All dogs became comfortable within several minutes using this introduction method.

One important consideration was the weight of the harness. During discussions with the

professional dog trainer we learned that untrained dogs can safely carry between 5% and 10% of their body weight. The harness from the prior research contained two large batteries (0.3 kg

each), one on each side of the harness. The original intent was to have the batteries weigh down both sides of the harness to maintain a more stable position for the IMUs. This harness


of the algorithm used for the timing, accuracy, and training experiments, we found that the

classification algorithm could tolerate small movement of the harness such that a single 0.1 kg battery could be centered on the harness and still provide sufficient stability for the experiments.

The finalized harness weighed 0.7 kg.

In addition to the weight and size, in previous research a double-pass steel buckle attached the harness around the dog’s neck. Adjusting the buckle was slow and the process made some

dogs uncomfortable, so we fixed the buckle in place. To wear the smart harness the neck strap

was simply placed over the dog’s head. Using some pilot dogs owned by students and faculty at NC State, additional ergonomic testing of the harness showed that the pilot dogs backed away

when the harness was above their head. To accommodate this perceived discomfort, the original harness double-pass buckle was modified to use a Velcro fastener around the neck that opened quickly and secured more easily than the original. Using the Velcro fastening neck strap, the dogs showed no backward stepping in any of the experiments. Providing a more comfortable

smart harness could only facilitate the canine cognition process.

3.3 Harness Accelerometer Position

When developing wearable technologies for canines, there are a limited number of practical sensor sites. For example, most dogs will not tolerate sensors strapped to their paws, tail, or

around their snout. Because comfort for the dogs was a primary concern for this work, we

limited our investigation to areas of the body the professional dog trainer indicated would be least likely to cause discomfort.

Previous research [11, 12] showed how we optimized inertial sensor sites by considering the

kinetics of the canine to identify independently moving locations on the body. Four locations were tested: the chest of the animal, the abdomen, and two locations at the back; one close to

the head (around the withers) and the other close to the tail (around the rump) of the animal.

Since accelerometers measure not only the dynamic acceleration due to actual motion, but also the static acceleration, which corresponds to the projection of gravity over the axes of the

sensor, different postures may result in similar data depending on how the IMUs are aligned

with respect to the dog’s body and gravity, thereby requiring a multi-sensor measurement to assess the posture of the dog accurately.

In order to select the locations, we looked at the angle of change along three axes of accelerometer data for different postures relative to the baseline standing posture. For example, as suggested by others [56], the two sensors on the back provide very similar sensor data because both sites move together during most postures.
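One way to picture the angle comparison described above is to treat each static accelerometer reading as a measurement of the gravity vector and compute the angle between the reading for a posture and the reading for baseline standing. The following is a minimal sketch under that assumption; the function name and the sample readings are made up for illustration and are not taken from the thesis data.

```python
import math

def angle_between_deg(a, b):
    """Angle in degrees between two three-axis accelerometer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))

# Made-up static readings in units of g for one sensor site.
standing = (0.0, 0.0, 1.0)                 # baseline: gravity along z
sitting = (0.5, 0.0, math.sqrt(3) / 2)     # sensor tilted by 30 degrees

change = angle_between_deg(standing, sitting)
print(f"angle change from standing: {change:.1f} degrees")
```

Under this view, a sensor site is more useful for classification when different postures produce larger angle changes from the baseline, and two sites that move together produce nearly identical angles and add little information.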

The optimal locations for IMUs leading to larger angle changes between postures were


Figure 3.1: A plot of x-axis acceleration from two different IMUs for five different postures.

the rump. These locations fit our ergonomic design goals.

3.4 Classification Algorithms

The job of the classification algorithms is to turn the IMU readings coming from the harness at 10 Hz into labels of postures as quickly and accurately as possible. After experimentation on posture detection in earlier work, it was determined that a Random Forest (RF) classifier produced extremely accurate results [11, 12, 13]; however, that work focused solely on accuracy and did not account for latency. We chose to use the RF classifier in this work because it has been highly accurate in the past and, due to its simplicity (relative to other machine learning algorithms), it is efficient.

The design criteria of classification timing and accuracy must be balanced. Noise inherent in the sensor data affects the accuracy of classification. The use of noise filtering techniques affects the timing of classifications. To better understand the relationship between timing and accuracy, we employed two techniques: 1) data smoothing, and 2) a very simple threshold-based classification scheme. Note that we make no claim about these techniques being optimal for training, which would require much more extensive empirical testing; however, because of their varying performance characteristics, which we discuss below, they provide a good counterpoint


remains an open question as to whether there is an “optimal” classification scheme.

To understand these concepts, consider Figure 3.1. The figure contains a plot from earlier work [12] where five different postures were considered: standing, lying down, sitting, standing on two legs, and eating off the floor. The x-axis in the plot contains the IMU reading from the dog’s chest parallel to gravity, and the y-axis of the plot contains the IMU reading from the dog’s back near the base of its tail perpendicular to gravity, aligned with its spine. In all five postures, the y-axis values are relatively tightly grouped; however, especially for standing and eating off the floor, the x-axis values have higher variance due to the vertical movement of the head while in these postures. This example illustrates one type of noise in the data, which may come from multiple sources. One approach to addressing noise is to filter the data using a moving average filter [62]. The more aggressively the data are filtered, the more noise gets removed; however, as a side effect, true changes in IMU readings are slower to be detected, which means higher latency. We will present data to illustrate the effects of this trade-off below.
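The trade-off between smoothing and latency can be seen on a synthetic signal. The sketch below applies a trailing moving average to an ideal step (a stand-in for a posture change) and counts how many samples pass before the smoothed value crosses a detection threshold. The trailing-window form, the threshold of 0.75, and the function names are assumptions for illustration; the thesis does not specify its filter implementation.

```python
def moving_average(samples, window):
    """Trailing moving average; window=1 returns the data unchanged.

    Early samples average over however many readings are available.
    """
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def first_crossing(samples, threshold):
    """Index of the first sample at or above threshold, or None."""
    for i, value in enumerate(samples):
        if value >= threshold:
            return i
    return None

# Synthetic step: the "posture change" happens at sample 5.
step = [0.0] * 5 + [1.0] * 10

for window in (1, 4, 10):
    index = first_crossing(moving_average(step, window), 0.75)
    delay_ms = (index - 5) * 100  # 10 Hz sampling, so 100 ms per sample
    print(f"window={window:2d}: detected at sample {index}, delay {delay_ms} ms")
```

With these made-up values, a window of 1 detects the change immediately, a window of 4 adds two samples of delay, and a window of 10 adds seven, which is the latency cost the text describes.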

The second approach we investigated for addressing the latency vs. accuracy trade-off was to implement a very efficient threshold classification scheme. “Variance-based threshold” (VT) classification works by identifying the means and standard deviations of each of the six IMU axes, identifying boundaries between those values that differentiate the data associated with different postures, and constructing a set of decision rules accordingly. For example, examining the data in Figure 3.1, a value of the x-acceleration on the front IMU (x-axis in the figure) below 0.4 is very highly correlated with the eating off the floor posture. One way to interpret the VT technique is as computing probabilistic bounding boxes around the IMU readings associated with different postures. For all measurements, we use VT classification to mean the classification of values within a range of the mean plus or minus a multiple of the standard deviation.
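The bounding-box interpretation can be sketched as a small rule table: each posture is described by one or more bands of the form mean ± k·σ on chosen axes, and a reading is assigned to the first posture whose bands all contain it. The rule structure, names, and numeric values below are hypothetical and chosen only to illustrate the idea; they are not the thresholds used in the thesis.

```python
def within_band(value, mean, stdev, k):
    """True when value lies inside mean ± k standard deviations."""
    return mean - k * stdev <= value <= mean + k * stdev

def classify_vt(reading, rules, default="unknown"):
    """Return the first posture whose bands all contain the reading.

    reading: dict mapping axis name to value.
    rules:   dict mapping posture to a list of (axis, mean, stdev, k) bands.
    """
    for posture, bands in rules.items():
        if all(within_band(reading[axis], m, s, k) for axis, m, s, k in bands):
            return posture
    return default

# Made-up rules for illustration only.
rules = {
    "eat": [("x1", 0.2, 0.05, 3.0)],
    "sit": [("x1", 0.8, 0.05, 3.0), ("x2", -0.5, 0.1, 2.0)],
}

print(classify_vt({"x1": 0.25, "x2": 0.0}, rules))    # inside the eat band
print(classify_vt({"x1": 0.80, "x2": -0.45}, rules))  # inside both sit bands
print(classify_vt({"x1": 0.50, "x2": 0.0}, rules))    # matches no posture
```

Because classification is only a handful of comparisons per reading, a scheme like this is cheap to evaluate, which is consistent with the efficiency argument made in the text.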

If the sensor values for the five postures were normally distributed, we could predict the likelihood of future values falling within certain ranges. For example, if the data ranges between µ ± 3.0σ for one posture (say eating) and no other posture results in data in that range, then we could be 99.7% confident that a sensor reading in that range reflects an eating posture. Given that the sensor readings from multiple sources are not independent, normally distributed sources of data, we cannot use these estimates formally; however, this does give us a general preference for selecting as large a σ range as possible while maintaining strict separation between the ranges associated with each axis for each posture. Algorithm 1 presents the process of finding the largest range of values expressed as a multiple of σ from the µ. These values are used to


Algorithm 1 VT Threshold Calculation

P = {sit, stand, eat, ...}            # Set of postures
S = {x1, y1, z1, x2, y2, z2}          # Set of sensors
Let L = {x1, y1, z1, x2, y2, z2, p}   # Labeled data is sensor data and observed posture (p ∈ P)
Let L[s,p] | p ∈ P, s ∈ S             # Sensor values for a specific sensor and posture

# Calculate the mean and standard deviation for each set of sensor and posture pairings
µ[s,p] = mean(L[s,p])
σ[s,p] = stdev(L[s,p])

# Calculate µDiff, the difference between the means of two postures for a given sensor
# Calculate σDistance, the µDiff expressed in units of standard deviation
# A σDistance[s,p1,p2] = 3 implies a strict separating range out to three standard deviations.
for all p′ ∈ P do
    for all p″ ∈ P, p′ ≠ p″ do
        for all s ∈ {x1, y1, z1, x2, y2, z2} do
            µDiff[s,p′,p″] = µ[s,p′] − µ[s,p″]
            σDistance[s,p′,p″] = µDiff[s,p′,p″] / (σ[s,p′] + σ[s,p″])
        end for
    end for
end for

# Find the minimum σDistance for each sensor and posture.
Let µMin[s,p] = min(σDistance[s,p,p′]) | p′ ∈ P, p ≠ p′

# For each posture, find the two largest µMin that will be used for classification.
for all p ∈ P do
    largestMuMin[p], secondLargestMuMin[p] = 0
    largestMuMinSensor[p], secondLargestMuMinSensor[p] = 0
    for all s′ ∈ {x1, y1, z1, x2, y2, z2} do
        if µMin[s′,p] > largestMuMin[p] then
            # Remember the new largest and second largest ranges
            secondLargestMuMin[p] = largestMuMin[p]
            largestMuMin[p] = µMin[s′,p]

            # Remember which sensors provide the ranges
            secondLargestMuMinSensor[p] = largestMuMinSensor[p]
            largestMuMinSensor[p] = s′
        else
            if µMin[s′,p] > secondLargestMuMin[p] then
                # Remember range and sensor that’s second largest
                secondLargestMuMin[p] = µMin[s′,p]
                secondLargestMuMinSensor[p] = s′
            end if
        end if
    end for
end for
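The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the function name and data layout are hypothetical, the sample standard deviation is used, and the difference of means is taken as an absolute value (an assumption, so that the result does not depend on which posture has the larger mean).

```python
from statistics import mean, stdev

def vt_thresholds(labeled, sensors):
    """For each posture, return its two best-separating sensors.

    labeled: list of (reading, posture), where reading maps sensor -> value.
    Returns: dict posture -> [(sensor, sigma_distance), ...] (top two).
    """
    postures = sorted({posture for _, posture in labeled})

    # Mean and standard deviation for every (sensor, posture) pairing.
    mu, sigma = {}, {}
    for p in postures:
        for s in sensors:
            values = [reading[s] for reading, label in labeled if label == p]
            mu[s, p] = mean(values)
            sigma[s, p] = stdev(values)

    result = {}
    for p in postures:
        # Smallest separation (in units of summed standard deviations)
        # between posture p and every other posture, per sensor.
        mu_min = {
            s: min(
                abs(mu[s, p] - mu[s, q]) / (sigma[s, p] + sigma[s, q])
                for q in postures if q != p
            )
            for s in sensors
        }
        ranked = sorted(sensors, key=lambda s: mu_min[s], reverse=True)
        result[p] = [(s, mu_min[s]) for s in ranked[:2]]
    return result

# Made-up labeled data with two sensors and two postures.
data = (
    [({"x1": v, "x2": v}, "sit") for v in (1.0, 2.0, 3.0)]
    + [({"x1": v + 8.0, "x2": v + 2.0}, "stand") for v in (1.0, 2.0, 3.0)]
)
print(vt_thresholds(data, ["x1", "x2"]))
```

In this toy data, sensor x1 separates the two postures by four summed standard deviations and x2 by only one, so x1 is ranked first for both postures. The sketch assumes at least two samples per posture and a nonzero spread on each sensor.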


Chapter 4

Timing and Accuracy Criteria

Experiments

©2016 ScienceDirect. Adapted with permission from J. Majikes, R. Brugarolas, M. Winters, S. Yuschak, S. Mealin, K. Walker, P. Yang, B. Sherman, A. Bozkurt, and D. L. Roberts, “Balancing Noise Sensitivity, Response Latency, and Posture Accuracy for a Computer-Assisted Canine Posture Training System,” Int. J. Hum. Comput. Stud., 2016.

We conducted three separate training experiments to understand and validate the performance of our system before moving on to the Discriminative Stimulus Pilot described in Chapter 5. In all experiments our professional dog trainer monitored canine cognition, watching for any signs of stress, including (but not limited to) panting, lip licking, “whale eye”, pacing, avoidance of the testing area, or a lack of interest in the task. Experiment 1, presented in Section 4.1, involved collecting posture data from two dogs to identify the optimal parameter settings for preprocessing the data. Experiment 2, presented in Section 4.2, involved using the classification algorithms with the parameters identified in Experiment 1 to compare against the timing and accuracy of a professional dog trainer. Lastly, in Section 4.3, we present Experiment 3, which further validates the system and our assumptions about response latency by comparing various parameter settings in the context of a semi-autonomous computer training system and evaluating them based on evidence of dogs learning. Research in Applied Behavior Analysis has demonstrated the significance of the delay between a behavior (posture change in our case) and the delivery of reinforcement (conditioned or primary) [15]. As a general rule, it is known that the shorter the latency between the desired behavior and the delivery


4.1 Experiment 1: Data Preprocessing

To address our design criteria of timing and consistency, we experimented with different ways of preprocessing our data. We used a simple moving average filter with varying window sizes to balance noise reduction against increases in latency. None of the dogs used in this experiment were used in the timing experiment in Section 4.2, but the results from this experiment set the moving average window size for the timing experiment.

The moving average window size is a knob that enables us to make a trade-off between noise

sensitivity and latency. A small window size allows true changes in sensor values to be reflected

more quickly in the data, but the influence of high-frequency noise in the data remains high. A classification algorithm applied to data with a small moving average window size will likely

have low latency in detecting postures, but suffer a lower accuracy resulting from the presence of more noise in the data. On the other hand, applying a large moving average window size

will result in very clean data, free from noise. This reduction in noise will come at the cost

of response latency. By aggressively averaging the data, true changes in sensor readings will also be smoothed, thereby increasing the latency before the posture change gets reflected in the

data. The experiments in this section are designed to empirically identify parameter settings

that result in the best trade-off between response latency and accuracy over the RF and VT classifiers.

4.1.1 Experiment 1: Materials and Methods

Dogs: Two dogs performed sit, stand, and eat postures. The sensor data was replayed with

different window sizes to maximize accuracy and minimize sit false negatives. A false negative occurred when the dog was sitting but the system classification was either a stand or eat.

Personnel: Experimenters performed the manual labeling of sensor data. One experimenter called out the posture of the dog. The other experimenter entered the labels into the base unit.

Goal: Since during training any non-classified sit could not be reinforced, thereby inhibiting learning, minimizing sit false negatives and maximizing accuracy were the primary goals of the data smoothing.
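The two quantities named in the goal can be computed directly from aligned lists of true and predicted labels. The sketch below follows the definition given above (a sit false negative is an instance where the dog was sitting but the classifier reported stand or eat); the function names and sample labels are made up for illustration.

```python
def sit_false_negatives(truth, predicted):
    """Count instances labeled sit that the classifier called something else."""
    return sum(1 for t, p in zip(truth, predicted) if t == "sit" and p != "sit")

def instance_accuracy(truth, predicted):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return correct / len(truth)

# Made-up labels, one per 10 Hz instance.
truth     = ["sit", "sit",   "stand", "eat",   "sit"]
predicted = ["sit", "stand", "stand", "stand", "eat"]

print(sit_false_negatives(truth, predicted))  # 2
print(instance_accuracy(truth, predicted))    # 0.4
```

Tracking both numbers matters because a classifier can score well on overall accuracy while still missing sits, and every missed sit is a lost chance to reinforce.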

4.1.2 Experiment 1: Data Preprocessing Setup

The experimental process involved collecting data from two dogs, Dog α and Dog β. See Table B.1 in the Appendix for more information on the dogs used, including age, breed, sex, and weight.


Table 4.1: Number of sit, stand, and eat data instances along with their duration for the dogs used in Section 4.1’s data preprocessing and tuning of the moving average window size.

Dog      Measure              Sits      Eats      Stands    Total
Dog α    Instances (10 Hz)    920       57        1881      2858
         Time (mm:ss)         01:32.0   00:05.7   03:08.1   04:45.8
Dog β    Instances (10 Hz)    1093      0         510       1603
         Time (mm:ss)         01:49.3   00:00.0   00:51.0   02:40.3

three postures. The dogs were given food rewards after maintaining each posture for between two and ten seconds. We collected between three and five minutes of data from each dog. Table 4.1 shows, for each dog, the number of sit, stand, and eat instances collected.

The x-, y-, and z-axis components of the two accelerometers were collected at 10 Hz and labeled in real time by a pair of experimenters. For example, in Table 4.1 Dog α sat for a total of 920 instances, or a combined 92.0 seconds of the 285.8-second experimental period. Both experimenters were responsible for watching the dog. One experimenter was responsible for calling out the posture changes and the other for entering the label on the keyboard of the base unit. Using two experimenters reduced the chances of error and distraction causing latency in applying the labels.
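The relationship between instance counts and durations in Table 4.1 follows directly from the 10 Hz sampling rate: each instance covers 0.1 seconds. A small sketch of the conversion is shown below; the helper name is hypothetical.

```python
def instances_to_mmss(instances, rate_hz=10):
    """Convert a count of samples at rate_hz into an mm:ss.t string."""
    tenths = round(instances * 10 / rate_hz)   # duration in tenths of a second
    minutes, remainder = divmod(tenths, 600)
    return f"{minutes:02d}:{remainder // 10:02d}.{remainder % 10}"

# Instance counts for the first dog, as reported in Table 4.1.
for label, count in [("sits", 920), ("eats", 57), ("stands", 1881), ("total", 2858)]:
    print(f"{label:6s} {count:5d} instances = {instances_to_mmss(count)}")
```

For example, 920 sit instances correspond to 92.0 seconds, which formats as 01:32.0 and matches the table.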

The RF classifier used in this experiment is the scikit-learn RandomForestClassifier.

To characterize performance of different moving average window sizes, the raw data was

post-processed using different moving average window sizes ranging from 1 (no filtering) to 10 (a 1-second average). We performed 10-fold cross validation on each moving average size to

compare instance-based accuracy. Since the cross validation was done using instances of data

that were Independent and Identically Distributed (IID), instance-based accuracy used in the cross validation is a true test of instance-classification accuracy.
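The fold procedure itself is independent of the classifier. The sketch below shows a plain-Python version of shuffled k-fold cross validation that reports mean instance-based accuracy. It deliberately uses a toy midpoint-threshold classifier as a stand-in so the example stays self-contained; the thesis used scikit-learn's RandomForestClassifier, and all names and data here are assumptions for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled, roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cross_validated_accuracy(features, labels, fit, k=10):
    """Mean instance-based accuracy over k folds.

    fit(train_features, train_labels) must return a predict(feature) function.
    """
    scores = []
    for train, test in k_fold_indices(len(features), k):
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(1 for i in test if predict(features[i]) == labels[i])
        scores.append(hits / len(test))
    return sum(scores) / k

def fit_midpoint(xs, ys):
    """Toy stand-in classifier: threshold halfway between the two class means."""
    sit = [x for x, y in zip(xs, ys) if y == "sit"]
    other = [x for x, y in zip(xs, ys) if y != "sit"]
    cut = (sum(sit) / len(sit) + sum(other) / len(other)) / 2
    return lambda x: "sit" if x > cut else "not sit"

# Made-up, perfectly separable one-dimensional data.
features = [0.0] * 20 + [1.0] * 20
labels = ["not sit"] * 20 + ["sit"] * 20
accuracy = cross_validated_accuracy(features, labels, fit_midpoint)
print(accuracy)
```

To compare window sizes in the manner described above, the same routine would be run once per moving average window on the correspondingly smoothed data.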

Ultimately, the goal of this research is not to detect sit instances, as in the 10-fold cross validation of Experiment 1, but to recognize a sit posture, so the distinction between stand instances and eat instances is superfluous. However, to ensure our data more closely represented the distribution of postures we will encounter in later experiments, eating was included in the data set as well, since taking a treat from the computer-controlled treat dispenser is the same posture as eating off the floor. Results, however, are reported only for two labels: “sitting” and


Figure 4.1: Number of FN over varying moving average window sizes. Arrows show where the lowest FN occurred with the smallest window size.


4.1.3 Experiment 1: Data Preprocessing Results

Figure 4.1 shows the number of False Negatives (FN) for both the VT and RF instance-based classification schemes for Dog α and Dog β using moving average window sizes of 1 through 10. Figure 4.2 shows the instance-based accuracy for both classification algorithms for the same dogs and window sizes. The arrows in each figure indicate the smallest moving average window size at which the best algorithm performance occurred.

The arrows in Figure 4.1 show that the smallest window sizes with the fewest FN for Dog α VT, Dog α RF, Dog β VT, and Dog β RF instance-based classification are six, three, three, and two, respectively. Similarly, the arrows in Figure 4.2 show that the smallest window sizes with the highest instance-based classification accuracy for Dog α VT, Dog α RF, Dog β VT, and Dog β RF are four, three, three, and three, respectively.

For a window size of four, the accuracy was always as good as or better than that for a window size of three. In addition, except for Dog α VT, a window size of four had the lowest number of FN. Therefore, based solely on accuracy, a window size of four would seem to be the best choice for future experiments.
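The arrow placement described above follows a simple rule: among all window sizes that tie for the best score, pick the smallest. A minimal sketch of that rule follows. The function name is hypothetical, and the FN counts and accuracies are illustrative placeholders rather than the measured values from Figures 4.1 and 4.2.

```python
def smallest_best_window(scores, lower_is_better=True):
    """Return the smallest window size whose score ties the best score.

    scores maps window size -> metric (an FN count or an accuracy).
    """
    values = scores.values()
    best = min(values) if lower_is_better else max(values)
    return min(w for w, s in scores.items() if s == best)


# Illustrative placeholder values only (not the thesis data).
fn_by_window = {1: 9, 2: 6, 3: 4, 4: 4, 5: 5}
accuracy_by_window = {1: 0.95, 2: 0.97, 3: 0.98, 4: 0.98, 5: 0.97}

fn_choice = smallest_best_window(fn_by_window)
accuracy_choice = smallest_best_window(accuracy_by_window, lower_is_better=False)
```

Preferring the smallest tied window keeps the filter delay as short as possible for a given level of performance.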

4.2 Experiment 2: Timing and Accuracy

Experiment 1 empirically balanced data smoothing against maximizing accuracy and minimizing FN. Experiment 2 had two goals. First, the parameters identified in Experiment 1 were used to compare the timing and accuracy of the classification algorithms. Second, outlier analysis was done to give some indication of the usefulness of the harness with dogs of varying age, weight, and physical ability. To validate the algorithms as being capable of detecting posture effectively, we chose to compare the latency and accuracy of our semi-autonomous system with those of the professional dog trainer. To check the usefulness of the harness with different dogs, Dixon’s [19] and Grubbs’ [25] tests were used to detect outliers in the dogs’ percentage accuracy.
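Both outlier tests reduce to a test statistic that is compared with a tabulated critical value. The sketch below computes only the statistics, under two assumptions that the text does not confirm: the two-sided form of Grubbs' statistic, and the simplest Dixon ratio (often written r10) applied to the lowest value. Dixon's test has other ratio variants for larger samples. The function names and the accuracy values are hypothetical.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """G = max |x_i - mean| / s, where s is the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return max(abs(v - m) for v in values) / s


def dixon_q_low(values):
    """Dixon's simplest ratio for the smallest value: gap / range."""
    ordered = sorted(values)
    return (ordered[1] - ordered[0]) / (ordered[-1] - ordered[0])


# Hypothetical per-dog percentage accuracies (not the thesis data).
accuracies = [96.0, 94.5, 97.2, 95.8, 81.0, 96.4]

g = grubbs_statistic(accuracies)
q = dixon_q_low(accuracies)
# Each statistic would then be compared with the critical value from the
# corresponding published table for the chosen sample size and significance.
```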

Sitting requires several anatomical movements by the dog during the transition to a sit posture. To provide a consistent point of comparison for labeling, the three experimenters noted when the dog’s rump hit the floor by pressing a button on a keyboard. The average time of the three experimenters’ inputs was used as the reference point for when the sit occurred. The timing accuracy reported here compares this point in time with the time of the professional dog trainer and with the time of the classification algorithms.

Recall that our overarching goal is to achieve computer-assisted training. Training a dog

¹ http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.


to sit requires recognizing the posture and reinforcing within a half second of when the sit occurs; this is the posture acquisition. Whereas instance-based classification data reports values for every IMU reading at 0.1-second intervals, posture-based classification data represents the detection of the single instance that marks the beginning of the new posture. If additional reinforcement is provided because of FP, learning may be slower. If the reinforcement is provided too late after the posture change occurs, the dog may not associate the reinforcement with the change in posture, and either no learning will occur or the dog will learn an association with a spurious behavior (sometimes referred to as superstitious behavior in the operant conditioning literature [60]).

In an effort to balance our design criteria of timing and accuracy, the system reduced the FP that arise from classifying two sits when the dog makes a slight movement while sitting. The semi-autonomous system assumed that a dog could not leave and return to a sit posture in less than 0.5 s. For example, assume that in a one-second interval at 10 Hz there are nine sit classifications and a single stand classification in the middle. The system will not treat the single stand instance classification (0.1 s) as an actual stand posture but will assume it was simply due to noise in the sensor data. The system requires five non-sit instance classifications (0.5 s) before assuming the dog is no longer in the sit posture. More complicated handling of spurious posture classifications could have been implemented, but analysis of empirical test data indicated that eliminating a single, spurious interval classification of a non-sit met the timing and accuracy criteria. To meet the operationally inspired threshold of 0.5 s response latency for sit posture classification, whenever a dog is in a stand or eat posture, a single sit instance will immediately trigger a sit posture classification.
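The rule described above is asymmetric: entering the sit posture takes one sit instance, while leaving it takes 0.5 s of non-sit instances. A minimal sketch of that logic follows. It assumes the five non-sit instances must be consecutive, which the text implies but does not state outright. The function name and label strings are hypothetical.

```python
def detect_sit_postures(labels, exit_count=5):
    """Return the indices at which a new sit posture is declared.

    labels: per-instance classifications at 10 Hz ("sit", "stand", "eat").
    exit_count: consecutive non-sit instances needed to leave the sit
    posture (5 instances = 0.5 s at 10 Hz).
    """
    sit_starts = []
    in_sit = False
    non_sit_run = 0
    for i, label in enumerate(labels):
        if label == "sit":
            non_sit_run = 0  # any sit instance cancels a pending exit
            if not in_sit:
                in_sit = True  # a single sit instance triggers immediately
                sit_starts.append(i)
        elif in_sit:
            non_sit_run += 1
            if non_sit_run >= exit_count:
                in_sit = False
                non_sit_run = 0
    return sit_starts


# Nine sit classifications with one spurious stand in the middle:
# only one sit posture is reported, so no extra reinforcement is given.
one_second = ["sit"] * 5 + ["stand"] + ["sit"] * 4
print(detect_sit_postures(one_second))  # [0]
```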

Therefore, in addition to timing, both instance-based accuracy and posture-based classification accuracy have to be evaluated. Our evaluation of spurious sit posture classifications will be presented in Section 4.2.3. Instance-based accuracy uses the same 10-fold cross validation used in Section 4.1. Posture-based accuracy is measured in sensitivity (SE) and precision (PR). SE is the ratio of correct sit classifications, or True Positives (TP), over the number of TP and FN, $\frac{TP}{TP+FN}$. In other words, SE is the percentage of the time the algorithm classified a sit relative to how often it should have. PR is the ratio of TP over the number of TP and False Positives (FP), $\frac{TP}{TP+FP}$. In other words, PR is the percentage of the time the algorithm was correct when it classified a sit.
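The two posture-based measures can be computed directly from the counts. The sketch below uses hypothetical function names and illustrative counts that are not taken from the experiments.

```python
def sensitivity(tp: int, fn: int) -> float:
    """SE = TP / (TP + FN): share of actual sits that were detected."""
    if tp + fn == 0:
        raise ValueError("SE is undefined when there are no actual sits")
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """PR = TP / (TP + FP): share of detected sits that were real."""
    if tp + fp == 0:
        raise ValueError("PR is undefined when no sits were detected")
    return tp / (tp + fp)


# Illustrative counts only: 45 detected sits, 5 missed, 15 spurious.
se = sensitivity(tp=45, fn=5)   # 0.9
pr = precision(tp=45, fp=15)    # 0.75
```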

4.2.1 Experiment 2: Materials and Methods

Dogs: 16 dogs performed up to five minutes of sit, stand, and eat postures for each of the two