Programming Robots

PROGRAMMING ROBOTS

Useful robot algorithms in both pseudocode and source code. Programming is a huge subject, and countless books and tutorials on how to program have already been written, so all I plan to cover is what is specifically important to programming robots and not covered in common literature.

ATMEGA BOOTLOADER TUTORIAL

Before starting this tutorial, please read my bootloading tutorial (coming soon!) and my UART tutorial! You must also already have a UART connection set up on your robot for this to work: $50 Robot UART tutorial.

As my other bootloading tutorial will explain once it is finished, a bootloader is software that can replace your hardware programmer. Instead of hooking up a programmer, you can program your microcontroller over a serial connection. You will need a programmer to upload the bootloader itself, but you won't ever need the programmer again, except perhaps for programming fuses or lock bits.

Now that you understand what a bootloader is and the benefits of one, I will demonstrate how to install a bootloader onto your $50 Robot or any other robot with an ATmega microcontroller.

We will be using a bootloader that has an auto-baud feature: your microcontroller will attempt to adjust its internal baud settings to match the serial connection. This does not mean your other hardware will auto-configure, so . . . Important: Make sure that all your external hardware is set to exactly the same baud rate, or this will not work!

Just for reference, the bootloader I selected is the open source fast tiny & mega UART bootloader. This bootloader is a bit buggy, comes with zero documentation, and has few comments in the source code . . . but it's the best I've found that can handle a wide range of ATmega types. I've made some small config changes to it for use on the $50 Robot, and those files can be downloaded later in this tutorial.

Configure Baud

If you haven't already done so, click:

Start->Settings->Control Panel->System

A new window will come up called 'System Properties'. Open the Hardware tab and click Device Manager. You should see this:

Expand Ports, select the port you are using, and right-click it.

Now configure the settings as you like, as described in the UART tutorial.

Upload Bootloader

You have two options here. You can either use one of my precompiled bootloaders:

ATmega8 Bootloader hex file
ATmega168 Bootloader hex file
Axon USB Bootloader hex file
beta v2.1 Bootloader files

and then upload it using AVR Studio.

Or you can modify it for your specific setup/ATmega. In the following steps I'll explain both.

update: new bootloader software is available

Note: If you do not plan to modify the bootloader code, you may skip this step. Open up AVR Studio, and in the Project Wizard start a new project called bootloader:

Make sure you select 'Atmel AVR Assembler', since we will be programming in Assembly. Don't worry, it's mostly done for you already. You do not need to program in Assembly to write a bootloader, but the particular bootloader we are using is written in that language, so we must compile it as an Assembly project.

Click Finish, and the new project should load up.

Install Files

Now, download this zip file, and unzip it into your bootloader directory: Bootloader Source Files (v1.9)

Bootloader Source Files (v2.1)

note: this tutorial teaches only v1.9, but v2.1 is better

Now you must also put your own robot program's .hex file into the bootloader folder. For example, suppose you just modified your own custom photovore code and compiled it. Take that compiled .hex and place it into your bootloader folder. Don't forget to do this every time you change your custom code!

Optional: Compile Code

Look for a file that matches the microcontroller you are using. For example, if you are using the ATmega168, look for the file M168.asm. Open that file up, and copy/paste the contents into your bootloader.asm that is already open in AVR Studio.

Now, looking at the datasheet of your microcontroller (pin out section), verify that the Tx and Rx pins are correct in bootloader.asm. This is an important step: in rare cases, skipping it can break something! Make any changes as needed.

For example, this is what it should look like for both the ATmega8 and ATmega168:

.equ STX_PORT = PORTD
.equ STX_DDR = DDRD
.equ STX = PD1
.equ SRX_PIN = PIND
.equ SRX_PORT = PORTD
.equ SRX = PD0

Now compile it by pressing build:

Upload Code to ATmega

Now that you have your new custom bootloader .hex file, you simply need to upload it to your microcontroller. Use your hardware programmer like you always have:

And finally, you need to program a fuse to tell it to use the bootloader.

IMPORTANT: If you change the wrong fuse you can possibly destroy your ATmega! Don't change any other fuses unless you know what you are doing!

Your Bootloader is Uploaded and Ready!

Now disconnect your programmer cable. You won't be needing that again!

You will need to power cycle your microcontroller (turn it off then on again) after uploading your bootloader for the settings to take effect.

Upload YOUR Program Through UART

update 2010: A GUI version of the bootloader can be found on the Axon II setup tutorial.

Now open up a command prompt by going to Start->Run... and typing in 'cmd' and pushing OK:

A new command prompt should open up. Using the command 'cd', go into the directory of your bootloader files. See below image for an example.

With your robot turned off and UART ready to go, type in this command:

fboot17.exe -b38400 -c1 -pfile.hex -vfile.hex

Here, 38400 is your desired baud rate (9600, 38400, 115200, etc.) and c1 is your com port (c1, c2, c3, etc.).

'file' is the name of the program you want to upload. The filename MUST be 8 characters or less or it will not work (a bug in the software), and the file must be located in the same folder as fboot.exe. For example, if photovore.hex was your file, do this: -pphotovore.hex -vphotovore.hex

(yes, you need to say it twice, with p for the first time and v for the second time)

Press enter, and now you will see a / symbol spinning. Turn on your robot, and it should now upload.

For some unexplained reason, I occasionally get an error that says:

Bootloader VFFFFFFFF.FF Error, wrong device informations

If you get this error, just repeat this step again and it should work.

note: after typing in a command once into the command prompt, you do not need to type it again. Just push the up arrow key to cycle through previously typed commands.

The Bootloader Didn't Work?!

What if I did the tutorial but it's still not working/connecting?

Chances are you missed a step. Go back to the beginning and make sure you did everything correctly.

Try power cycling your microcontroller.

Make sure the hardware programmer is unplugged.

Make sure baud is configured properly on ALL of your hardware and ALL of your involved software.

Make sure no other device is trying to use the same com port at the same time, such as AVR Studio, HyperTerminal, etc.

Some mistakes that you can make will cause your command prompt window to freeze up. Just open up a new window and try again.

Some users have noticed that too many unused global variables in your source code will cause problems. See this forum post for more info.

And a side note . . . this bootloader can only connect with com ports 1 to 4. The developer of the bootloader, for some odd reason, saw nothing wrong with this decision . . . If your device is on a different port, go to the com port settings and change it to one of ports 1 to 4.

Also note that some UART hardware might not be fast enough, as the software doesn't wait for the hardware to keep up. The Easy Radio module, for example, will not work. A direct serial/USB connection will work without a problem.

PROGRAMMING - COMPUTER VISION TUTORIAL

Introduction to Computer Vision

Computer vision is an immense subject, more than any single tutorial can cover. In the following tutorials I will cover the basics of computer vision in four parts, each focused on need-to-know practical knowledge.

Part 1: Vision in Biology

Part 1 will talk about vision in biology, such as the human eye, vision in insects, etc. By understanding how biology processes visual images, you may then be able to apply what you learned towards your own creations. This will help you turn the 'magic' into an understanding of how vision really works.

Part 2: Computer Image Processing

Part 2 will go into computer image processing. I will talk about how a camera captures an image, how it is stored in a computer, and how you can do basic alterations of an image. Basic machine vision tricks such as heuristics, thresholding, and greyscaling will be covered.

Part 3: Computer Vision Algorithms

Part 3 covers the typical computer vision algorithms, where I talk about how to do some higher level processing of what your robot sees. Edge detection, blob counting, middle mass, image correlation, facial recognition, and stereo vision will be covered.

Part 4: Computer Vision Algorithms for Motion

Part 4 covers computer vision algorithms for motion. Motion detection, tracking, optical flow, background subtraction, and feature tracking will be explained. There is also a problem set to test you on what you have learned in this computer vision tutorial series.

PROGRAMMING - COMPUTER VISION TUTORIAL Part 1: Vision in Biology

Vision in Biology

So why vision in biology? What does biology have to do with robots? Well, biomimetics is the study of biology to aid in the design of new technology - such as robots. The purpose of this tutorial is to help you understand how biology approaches the vision problem. As we progress through parts 2, 3, and 4, you will start to draw parallels between how a robot can see the world and how you and I see the world. I will assume you have a basic understanding of biology, so I will try to build upon what you already know with a bottom-up approach, and hopefully not bore you with what you already know.

The Eye

Light enters the eye through the pupil, an opening whose size is controlled by the iris. The iris adjusts the amount of light entering the eye - an auto-brightness adjuster. No matter how much light the eye sees, the iris tries to let in roughly the same amount. Note that if the light is still too bright, you will feel naturally compelled to cover your eyes with your hands.

Light then passes to the lens, which is stretched and compressed by muscles to focus the image. This is similar to auto-focus on a digital camera. Notice how the lens inverts the image upside-down?

Having two eyes creates stereo vision, as the eyes do not look along parallel straight lines. For example, look at your finger, then place your finger on your nose while still looking at it - see how you automatically become cross-eyed? The angle of your eyes to each other generates ranging information, which is then sent to your brain. Note, however, that this is not the only method the eyes use to generate range data.

Cones and Rods

The light then comes into contact with special neurons in the eye (cones for color and rods for brightness) that convert light energy into chemical energy. This process is complicated, but the end result is neurons that fire in special patterns that are sent to the brain by way of the optic nerve. Cones and rods are the biological versions of pixels. But unlike a camera, where every pixel is equal, the receptors in the human eye are not evenly distributed.

What the above chart shows is the number of rods and cones in the eye versus location in the eye. At the very center of the eye (fovea = 0) you will notice a huge number of cones and zero rods. Further out from the center, the number of cones sharply decreases, with a gradual increase in rods. What does this mean? It means color vision is concentrated in the center of your eye, while everywhere else the rods dominate, so most of the information going to your brain comes from the rods!

Note the section labeled optic disk. This is where the optic nerve attaches to your eye, leaving no space for light receptors. It is also called your blind spot.

Compound Eyes

Compound eyes work in much the same way as the human eye above. But instead of rods and cones being the pixels, each individual facet of the compound eye acts as a pixel. Contrary to popular folklore, the insect doesn't actually see hundreds of images. Instead it sees one image made of hundreds of pixels combined.

A robot example of a compound eye would be to take a hundred photoresistors and arrange them in a matrix to form a single greyscale image.

What advantage does a compound eye have over a human eye? If you poke out one human eye, that person's ability to see (total pixels gathered) drops to 50%. If you poke out one facet of an insect's compound eye, the insect will still have 99% of its visual capability. It can also simply regrow an eye.

Optic Nerve 'Image Processing'

Most people don't realize how jumbled the information from the human eye really is. The image is inverted by the lens, rods and cones are not equally distributed, and the two eyes don't see exactly the same image!

This is where the optic nerve comes into play. By reorganizing neurons physically, it can reassemble an image to something more useful.

Notice how the criss-crossing reorganizes the information from the eyes: what is seen on the left is processed in the right side of the brain, and what is seen on the right is processed in the left side. The problem of two eyes seeing two different images is partially solved. Also interesting to note, there are significantly fewer neurons in the optic nerve than there are cones and rods in the eye. The theory goes that 'pixels' in close proximity in the eye are summed and averaged.

What happens after this is still not fully understood, but significant progress has been made.

Brain Processing

This is where your brain 'magically' assembles the image into something comprehensible. Although the details are fuzzy, it has been determined that different parts of your brain process different aspects of the image. One part may process color, another detect motion, yet another determine shape. This should give you a clue about how to program such a system: each of these tasks can be treated separately.

And yet more Brain Processing . . .

All of the basic visual information is gathered, and then processed again at an even higher level. This is where the brain asks, what is it that I really see? Again, science has not entirely solved this problem (yet), but we have really good theories on what probably happens. Supposedly the brain keeps a large database of reference information - such as what a mac-n-cheese dinner looks like. The brain 'observes' something, then goes through the reference library to draw conclusions about what is observed.

How could this happen? Well, the brain knows the color should be orange, it knows it should have a shiny texture, and that the shape should be tube-like. Somehow the brain makes this connection, and tells you 'this is mac-n-cheese, yo.' Your other senses work in a similar manner.

More specifically, the theory is about pattern recognition . . . it's sort of like me showing you an ink blot, then asking you 'what do you see?' Your brain will try to figure it out, despite the fact that it doesn't actually represent anything. It's a subconscious effort.

Your brain also uses its understanding of the physical world (how things connect together in 3D space) to understand what it sees. Don't believe me? Then tell me how many legs this elephant has.

I highly recommend doing a Google search on optical illusions. These are cases where the brain's image processing rules 'break,' and they are often used by scientists to figure out how we understand what we see.

Stereo Image Processing

What has baffled scientists for the longest time, and has only recently been solved (in my opinion), is what allows us to see a 2D image and yet picture it in 3D. Look at a painting of a scene, and you can immediately make a fairly accurate estimate of the size and distance of every object in the picture. Scientists at CMU have recently shown how a computer can accomplish this. Basically, a computer keeps a huge index of 1000 or so images, each with range data assigned (trained) to it. Then, by probability analysis, it can make connections with future images that need to be processed.

Here are examples of figuring out 3D from 2D.

Lines that are parallel in 3D (unless they are also parallel to the image plane) converge in 2D. This is a picture of a train track. Notice how the parallel rails converge to a single point? This is a method the brain uses to guesstimate range data.

The brain uses the relation of objects located on the 2D ground to determine 3D scenes. Here is a picture of a forest. By looking at where the trees are located on the ground, you can quickly figure out how far apart the trees are. Which tree is closest to the photographer? Why? How do you program that as an algorithm?

If I removed the ground reference, what would you then rely on to figure out how far apart the trees are? The next method would probably be size comparisons. You would assume trees that are closer would appear larger.

But this wouldn't work if you had a giant tree far away and a tiny tree close up, as both could appear the same size! So the brain has many more methods, such as comparisons of details (size of leaves, for example), shading and shadows, etc. The image below is just a circle, but it appears to be a sphere because of shading. An algorithm that can process shading can help recover 3D shape from a 2D image.

Now that you understand the basics of biological vision processing in our Computer Vision Tutorial Series, you may continue on to Part 2: Computer Image Processing.

PROGRAMMING - COMPUTER VISION TUTORIAL Part 2: Computer Image Processing

Pixels and Resolution
2D Matrices
Decreasing Resolution
Thresholding and Heuristics
Image Color Inversion
Image Brightness / Darkness
Addendum (1D -> 4D)

Computer Image Processing

In part 2 of the Computer Vision Tutorial Series we will talk about how images are stored in a computer, as well as basic image manipulation algorithms. Mona Lisa (original image above) will be our guiding example throughout this tutorial.

Image Collection

The very first step would be to capture an image. A camera captures data as a stream of information, reading from a single light receptor at a time and storing each complete 'scan' as one single file. Different cameras can work differently, so check the manual on how it sends out image data.

A CCD transports the charge across the chip and reads it at one corner of the array. An analog-to-digital converter (ADC) then turns each pixel's value into a digital value by measuring the amount of charge at each photosite and converting that measurement to binary form. CMOS devices use several transistors at each pixel to amplify and move the charge using more traditional wires, and CMOS sensors usually perform the analog-to-digital conversion on the chip itself, so they output a digital signal without a separate ADC. CCD sensors create high-quality, low-noise images. CMOS sensors are generally more susceptible to noise.

Because each pixel on a CMOS sensor has several transistors located next to it, the light sensitivity of a CMOS chip is lower. Many of the photons hit the transistors instead of the photodiode.

CMOS sensors traditionally consume little power. CCDs, on the other hand, use a process that consumes lots of power. CCDs consume as much as 100 times more power than an equivalent CMOS sensor.

CCD sensors have been mass produced for a longer period of time, so they are more mature. They tend to have higher quality pixels, and more of them. Below is how colored pixels are arranged on a CCD chip:

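When processing an image, make sure the image is uncompressed - meaning don't use JPGs, whose lossy compression permanently alters pixel values. BMPs are usually stored uncompressed, while GIFs and PNGs use lossless compression (GIFs are also limited to 256 colors). If you do store or send images in a compressed format (for example, to improve transmission speed), you will have to uncompress the image before processing. This matters because the algorithms below work directly on each pixel's value.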

Pixels and Resolution

In every image you have pixels. These are the tiny little dots of color you see on your screen, and the smallest unit an image can be divided into. When an image is stored, the image file contains information on every single pixel in that image.

This information includes two things: color, and pixel location.

Images also have a set number of pixels per unit of size, known as resolution. You might see terms such as dpi (dots per inch), meaning the number of pixels along one inch of the image. A higher resolution means there are more pixels in a set area, resulting in a higher quality image. The disadvantage of higher resolution is that it requires more processing power to analyze an image. When programming computer vision into a robot, use the lowest resolution that still shows the details your robot needs.

The Matrix (the math kind)

Images are stored in 2D matrices, which represent the locations of all pixels. All images have an X component and a Y component. At each point, a color value is stored. If the image is black and white (binary), either a 1 or a 0 is stored at each location. If the image is greyscale, it stores a range of values. If it is a color image (RGB), it stores a set of values (red, green, and blue) at each location. Obviously, the less color information involved, the faster the image can be processed. For many applications, binary images can achieve most of what you want. Here is a matrix example of a binary image of a triangle:

0 0 0 1 0 0 0
0 0 1 0 1 0 0
0 1 0 0 0 1 0
1 1 1 1 1 1 1
0 0 0 0 0 0 0

It has a resolution of 7 x 5, with a single bit stored in each location. Memory required is therefore 7 x 5 x 1 = 35 bits.

Here is a matrix example of a greyscale (8 bit) image of a triangle:

0 0 55 255 55 0 0
0 55 255 55 255 55 0
55 255 55 55 55 255 55
255 255 255 255 255 255 255
55 55 55 55 55 55 55
0 0 0 0 0 0 0

It has a resolution of 7 x 6, with 8 bits stored in each location. Memory required is therefore 7 x 6 x 8 = 336 bits.

As you can see, increasing resolution and information per pixel can significantly slow down your image processing speed.
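Converting a color image to greyscale is itself a simple per-pixel operation, and it cuts memory from three bytes per pixel to one. The tutorial doesn't say which formula produced the greyscale Mona Lisa, so this minimal sketch assumes the common ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B), done in integer math, on an interleaved RGB buffer. The function name `rgb_to_grey` is hypothetical.

```c
/* Convert an interleaved RGB image (R, G, B bytes per pixel) to 8-bit
 * greyscale. Assumes BT.601 luma weights, scaled by 1000 so that only
 * integer math is needed (handy on a microcontroller without an FPU).
 * The weights sum to 1000, so the result never exceeds 255. */
void rgb_to_grey(const unsigned char *rgb, unsigned char *grey, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        unsigned int r = rgb[3 * i];
        unsigned int g = rgb[3 * i + 1];
        unsigned int b = rgb[3 * i + 2];
        grey[i] = (unsigned char)((299 * r + 587 * g + 114 * b) / 1000);
    }
}
```

Green gets the largest weight because the eye is most sensitive to it. For a robot, a plain (R + G + B) / 3 average also works if you don't care about matching human brightness perception.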

After converting color data to generate greyscale, Mona Lisa looks like this:

Decreasing Resolution

The very first operation I will show you is how to decrease the resolution of an image. The basic concept in decreasing resolution is that you are selectively deleting data from the image. There are several ways you can do this:

The first method is simply to delete 1 pixel out of every group of pixels, in both the X and Y directions of the matrix.

For example, using our greyscale image of a triangle above, and deleting one out of every two pixels in the X direction, we would get:

0 55 55 0
0 255 255 0
55 55 55 55
255 255 255 255
55 55 55 55
0 0 0 0

and continuing with the Y direction:

0 55 55 0
55 55 55 55
55 55 55 55
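The pixel-deletion method above can be sketched in C. This is a minimal sketch that assumes an 8-bit greyscale image stored row by row in a flat array (`img[y * width + x]`). The function name `decimate` and its `step` parameter are illustrative and do not come from the original. It keeps every `step`-th pixel in both directions, starting from the first row and column, so with `step` = 2 it reproduces the 4 x 3 result above.

```c
/* Keep every step-th pixel in X and Y, starting at (0, 0).
 * src is w x h, stored row-major; dst must hold
 * ((w + step - 1) / step) * ((h + step - 1) / step) bytes.
 * The new width and height are written to *out_w and *out_h. */
void decimate(const unsigned char *src, int w, int h, int step,
              unsigned char *dst, int *out_w, int *out_h)
{
    int nw = (w + step - 1) / step;   /* columns kept: 0, step, 2*step, ... */
    int nh = (h + step - 1) / step;   /* rows kept:    0, step, 2*step, ... */

    for (int y = 0; y < nh; y++)
        for (int x = 0; x < nw; x++)
            dst[y * nw + x] = src[(y * step) * w + (x * step)];

    *out_w = nw;
    *out_h = nh;
}
```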

Another way of decreasing resolution is to choose a pixel, average its value with the values of all the pixels surrounding it, store that average in the chosen pixel's location, and then delete the surrounding pixels.

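For example, applying this to the greyscale triangle (choosing every second pixel in each direction, averaging it with its neighbors, and rounding down) gives:

13 112 112 13
145 166 166 145
103 103 103 103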

Using the latter method for resolution reduction, this is what Mona Lisa would look like (below). You can see how pixels are averaged along the edges of her hair.
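A sketch of the averaging method, using the same row-major 8-bit greyscale layout. Each kept pixel (every second one, as before) is replaced by the truncated mean of itself and its up to 8 neighbors, and neighbors that fall outside the image are skipped. That neighborhood is my reading of the numeric example above, and with the greyscale triangle it reproduces those values exactly. The name `downsample_avg` is hypothetical.

```c
/* Downsample by keeping every second pixel in X and Y, replacing each
 * kept pixel with the truncated mean of its 3x3 neighborhood.
 * Neighbors outside the image are skipped, so edge and corner pixels
 * are averaged over fewer values.
 * src is w x h, row-major; dst must hold ((w+1)/2) * ((h+1)/2) bytes. */
void downsample_avg(const unsigned char *src, int w, int h,
                    unsigned char *dst, int *out_w, int *out_h)
{
    int nw = (w + 1) / 2;
    int nh = (h + 1) / 2;

    for (int oy = 0; oy < nh; oy++) {
        for (int ox = 0; ox < nw; ox++) {
            int cx = ox * 2, cy = oy * 2;   /* center pixel in src */
            int sum = 0, count = 0;

            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int x = cx + dx, y = cy + dy;
                    if (x < 0 || x >= w || y < 0 || y >= h)
                        continue;           /* outside the image */
                    sum += src[y * w + x];
                    count++;
                }
            }
            dst[oy * nw + ox] = (unsigned char)(sum / count);
        }
    }
    *out_w = nw;
    *out_h = nh;
}
```

Averaging costs more than plain deletion, but it keeps some information from every pixel. That is why the edges of Mona Lisa's hair look smoothed rather than jagged.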

Thresholding and Heuristics

While the above method reduces image file size by resolution reduction, thresholding reduces file size by reducing color data in each pixel.

To do this, you first need to analyze your image using a method called heuristics. Heuristics means statistically looking at an image as a whole, such as determining the overall brightness of an image, or counting the total number of pixels that contain a certain color. As an example, here is my sample greyscale pixel histogram of Mona Lisa, and sample histogram generation code.
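The linked histogram code isn't reproduced here. As a stand-in, here is a minimal C sketch that builds a greyscale histogram from an 8-bit image stored in a flat array. The name `histogram` is hypothetical.

```c
/* Build a 256-bin histogram of an 8-bit greyscale image:
 * afterwards, hist[v] holds the number of pixels whose value is v. */
void histogram(const unsigned char *img, int num_pixels, unsigned long hist[256])
{
    for (int v = 0; v < 256; v++)
        hist[v] = 0;
    for (int i = 0; i < num_pixels; i++)
        hist[img[i]]++;
}
```

For the 7 x 6 greyscale triangle above, it counts 13 pixels of value 0, 17 of value 55, and 12 of value 255. Plotting `hist` against 0 to 255 gives a chart like the one below.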

An example image histogram plotting pixel count (Y-axis) versus pixel color intensity (0 to 255, X-axis):

Heuristics are often used to improve image contrast. The image is analyzed, and then bright pixels are made brighter and dark pixels are made darker. I'm not going to go into contrast details here as it is a little complicated, but this is what an improved contrast of Mona Lisa would look like (before and after):

In this particular thresholding example, we will convert all colors to binary. How do you decide which pixel is a 1 and which is a 0? The first thing you do is choose a threshold: all pixel values above the threshold become 1, and all others become 0. Your threshold can be chosen arbitrarily, or it can be based on your heuristic analysis. For example, converting our greyscale triangle to binary using 40 as our threshold, we get:

0 0 1 1 1 0 0
0 1 1 1 1 1 0
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
0 0 0 0 0 0 0

If the threshold was 100, we would get this better image:

0 0 0 1 0 0 0
0 0 1 0 1 0 0
0 1 0 0 0 1 0
1 1 1 1 1 1 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0

As you can see, setting a good threshold is very important. In the first example, you cannot see the triangle, yet in the second you can. Poor thresholds result in poor images. In the following example, I used heuristics to determine the average pixel value (add all pixels together, and then divide by the total number of pixels in the image). I then set this average as the threshold. Setting this threshold for Mona Lisa, we get this binary image:

Note that if the threshold was 1, almost every pixel would become 1, and if the threshold was 255, every pixel would become 0 (no 8-bit value can exceed 255). Either way, the image would end up nearly a single flat color. Thresholding really excels when the background colors are very different from the target colors, as this automatically removes the distracting background from your image. If your target is the color red, and there is little to no red in the background, your robot can easily locate any red object by simply thresholding the red value of the image.
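The average-threshold approach used for Mona Lisa can be sketched as two small functions. The sketch assumes an 8-bit greyscale image in a flat array, and the names `mean_value` and `threshold_image` are hypothetical. Pixels strictly above the threshold become 1, matching the rule described earlier.

```c
/* Mean pixel value of an 8-bit greyscale image, truncated to an integer. */
unsigned char mean_value(const unsigned char *img, int num_pixels)
{
    unsigned long sum = 0;
    for (int i = 0; i < num_pixels; i++)
        sum += img[i];
    return (unsigned char)(sum / num_pixels);
}

/* Binary threshold: pixels strictly above `threshold` become 1,
 * all others become 0. */
void threshold_image(const unsigned char *src, unsigned char *dst,
                     int num_pixels, unsigned char threshold)
{
    for (int i = 0; i < num_pixels; i++)
        dst[i] = (src[i] > threshold) ? 1 : 0;
}
```

For the greyscale triangle, the average is 3995 / 42 (about 95.1), truncated to 95. Thresholding at 95 gives the same binary triangle as the threshold of 100 above.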

Image Color Inversion

Color image inversion is a simple equation that inverts the colors of the image. I haven't found any use for this on a robot, but it does make a good example . . . The greyscale equation is simply:

255 - pixel_value = new_pixel_value
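In C, the equation becomes a one-line loop. This sketch assumes an 8-bit greyscale image in a flat array, and `invert_image` is a hypothetical name. For an interleaved RGB image, running the same loop over all 3 x width x height bytes inverts every color channel.

```c
/* Invert an 8-bit greyscale image in place: new = 255 - old. */
void invert_image(unsigned char *img, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++)
        img[i] = (unsigned char)(255 - img[i]);
}
```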

The greyscale triangle then becomes:

255 255 200 0 200 255 255
255 200 0 200 0 200 255
200 0 200 200 200 0 200
0 0 0 0 0 0 0
200 200 200 200 200 200 200
255 255 255 255 255 255 255

An RGB image of Mona Lisa becomes:

Brightness (and Darkness)

Increasing brightness is another simple algorithm. All you do is add (or subtract) some arbitrary value to each pixel:

new_pixel_value = pixel_value + 10

You must also make sure that no pixel exceeds the maximum value. With 8 bit greyscale, no value can exceed 255. A simple check can be added like this:

if (pixel_value + 10 > 255) { new_pixel_value = 255; }
else { new_pixel_value = pixel_value + 10; }
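The check above only guards the upper limit. When you subtract to darken an image, values can also drop below 0 and, in an unsigned 8-bit variable, wrap around to bright values. A sketch that handles both directions might look like this. It assumes 8-bit greyscale in a flat array, and `adjust_brightness` is a hypothetical name.

```c
/* Add delta (positive to brighten, negative to darken) to every pixel of
 * an 8-bit greyscale image, clamping results to the valid 0..255 range. */
void adjust_brightness(unsigned char *img, int num_pixels, int delta)
{
    for (int i = 0; i < num_pixels; i++) {
        int v = img[i] + delta;      /* compute in int to avoid wrap-around */
        if (v > 255) v = 255;
        if (v < 0)   v = 0;
        img[i] = (unsigned char)v;
    }
}
```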

And for our lovely and now radiant Mona Lisa:

The problem with increasing brightness too much is that it will result in whiteout. For example, if your arbitrarily added value was 255, every pixel would be white. It also does not improve a robot's ability to understand an image, so you probably will not find a use for this algorithm directly.

Addendum: 1D, 2D, 3D, 4D

A 1D image can be obtained from a 1 pixel sensor, such as a photoresistor. As mentioned in part 1 of this vision tutorial, if you put several photoresistors together, you can generate an image matrix.

You can also generate a 2D image matrix by scanning a 1 pixel sensor, such as with a scanning Sharp IR. If you use a ranging sensor, you can store 3D information (a distance at each X, Y position) in a 2D matrix, which is much easier to process.

4D images include time data. They are actually stored as a set of 2D matrix images, with each pixel containing range data, and a new 2D matrix being stored every X seconds. This makes processing simple, as you can just analyze each 2D matrix separately and then compare images to process changes over time. This is just like a movie film, which is actually just a set of 2D images changing so fast it appears to be moving. This is also quite similar to how a human processes temporal information, as we see about 25 images per second, each processed individually.
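Comparing two stored frames to detect change over time can be as simple as a per-pixel difference. This is a minimal sketch that assumes both frames are the same size and stored as flat 8-bit arrays. The function `count_changed_pixels` and its `tolerance` parameter, which ignores small sensor noise, are hypothetical and do not come from the original.

```c
/* Compare two frames of the same size and count how many pixels
 * changed by more than `tolerance` between them. */
int count_changed_pixels(const unsigned char *prev, const unsigned char *curr,
                         int num_pixels, int tolerance)
{
    int changed = 0;
    for (int i = 0; i < num_pixels; i++) {
        int diff = curr[i] - prev[i];   /* promoted to int, may be negative */
        if (diff < 0)
            diff = -diff;
        if (diff > tolerance)
            changed++;
    }
    return changed;
}
```

A robot could call this once per new frame and react when the count jumps, which is a very crude form of motion detection.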

Actually, biologically, it's a bit more complicated than this. Feel free to read an email I received from Mr Bill concerning biological fps. But for all intents and purposes, 25 fps is an appropriate benchmark.

Now that you understand the basics of computer image processing in our Computer Vision Tutorial Series, you may continue on to Part 3: Computer Vision Algorithms (coming soon!).

PROGRAMMING - COMPUTER VISION TUTORIAL Part 3: Computer Vision Algorithms

Edge Detection
Shape Detection
Middle Mass and Blobs
Pixel Classification
Image Correlation
Facial Recognition
Stereo Vision

Now that you have learned about biological vision and computer image processing, we continue on to the basic algorithms of computer vision.

Computer Vision vs Machine Vision

Computer vision and machine vision differ in how images are created and processed. Computer vision works with everyday, real-world video and photography. Machine vision works in deliberately simplified, controlled situations, which significantly increases reliability while decreasing the cost of equipment and the complexity of algorithms. As such, machine vision is used for robots in factories, while computer vision is more appropriate for robots that operate in human environments. Machine vision is more rudimentary yet more practical, while computer vision relates to AI. There is a lesson in this . . .

Edge Detection

Edge detection is a technique to locate the edges of objects in a scene. This can be useful for locating the horizon, finding the corner of an object, white line following, or determining the shape of an object. The algorithm is quite simple:

go through the image matrix pixel by pixel
for each pixel, analyze each of the 8 pixels surrounding it
    record the value of the darkest pixel and the lightest pixel
    if (lightest_pixel_value - darkest_pixel_value) > threshold
        then write a 1 for that pixel in a new output image
        else write a 0 for that pixel in the new output image

What the algorithm does is detect sudden changes in color or lighting, representing the edge of an object.
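Here is a C sketch of the pseudocode above, assuming a row-major 8-bit greyscale image. It writes into a separate output buffer, because overwriting pixels in place would change the neighbor values seen by later pixels. Pixels on the image border simply compare fewer neighbors. That border handling and the name `detect_edges` are my own assumptions.

```c
/* A pixel becomes 1 if the difference between the lightest and darkest
 * of its (up to 8) neighbors exceeds the threshold, otherwise 0.
 * src and dst are w x h, row-major, and must not overlap. */
void detect_edges(const unsigned char *src, unsigned char *dst,
                  int w, int h, int threshold)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int darkest = 255, lightest = 0;

            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (dx == 0 && dy == 0)
                        continue;               /* skip the pixel itself */
                    if (nx < 0 || nx >= w || ny < 0 || ny >= h)
                        continue;               /* skip out-of-bounds */
                    int v = src[ny * w + nx];
                    if (v < darkest)  darkest = v;
                    if (v > lightest) lightest = v;
                }
            }
            dst[y * w + x] = (lightest - darkest > threshold) ? 1 : 0;
        }
    }
}
```

On an image whose left half is 0 and right half is 255, the two columns on either side of the boundary become 1 and everything else becomes 0, so the edge comes out two pixels wide.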

A challenge you may have is choosing a good threshold. The left image has a threshold that's too low, and the right image has a threshold that's too high. You will need to run an image heuristics program for it to work properly.

You can also do other neat tricks with images, such as thresholding only a particular color like red.
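Thresholding a single color such as red needs a rule for what counts as "red". This sketch assumes an interleaved RGB buffer and an illustrative rule: the red channel must be above a minimum and clearly larger than both green and blue. The names `threshold_red`, `min_red`, and `min_margin` are hypothetical, and the values would need tuning for your camera and lighting.

```c
/* Color thresholding: mark a pixel 1 if it looks "red enough".
 * rgb is an interleaved buffer (R, G, B bytes per pixel). A pixel counts
 * as red when its red channel exceeds min_red and beats both green and
 * blue by at least min_margin (an assumed rule, not a standard one). */
void threshold_red(const unsigned char *rgb, unsigned char *mask,
                   int num_pixels, int min_red, int min_margin)
{
    for (int i = 0; i < num_pixels; i++) {
        int r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
        mask[i] = (r > min_red && r - g >= min_margin && r - b >= min_margin) ? 1 : 0;
    }
}
```

Comparing red against the other channels, rather than checking red alone, stops white and bright grey pixels (which also have high red values) from passing.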

Shape Detection and Pattern Recognition

Shape detection requires preprogramming a database of mathematical representations of the shapes you wish to detect. For example, suppose you are writing a program that can distinguish between a triangle, a square, and a circle. This is how you would do it:

run edge detection to find the border line of each shape
count the number of continuous edges
    a sharp change in line direction signifies a different line
if three lines are detected, then it's a triangle
if four lines, then it's a square
if one line, then it's a circle
by measuring the angles between lines you can determine more info (rhomboid, equilateral triangle, etc.)

The basic shapes are very easy, but as you get into more complex shapes (pattern recognition) you will have to use probability analysis. For example, suppose your algorithm needed to distinguish between 10 different fruits (only by shape), such as an apple, an orange, a pear, a cherry, etc. How would you do it? Well, all are roughly circular, but none are perfectly circular. And not all apples look the same, either.

By using probability, you can run an analysis that says 'oh, this fruit fits 90% of the characteristics of an apple, but only 60% of the characteristics of an orange, so it's more likely an apple.' It's the computational version of an 'educated guess.' You could also say 'if this particular feature is present, then it has a 20% higher probability of being an apple.' The feature could be a stem such as on an apple, fuzziness like on a coconut, or spikes like on a pineapple, etc. This method is known as feature detection.
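The "fits 90% of the characteristics" idea can be sketched with a deliberately simple stand-in for real probability analysis. Each template is scored by the fraction of yes/no features that match the observation, and the best score wins. The four features, the `FruitTemplate` struct, and the functions `match_score` and `best_match` are all hypothetical. A real recognizer would extract the features from the image and weight them by how reliable each one is.

```c
#include <stdbool.h>

#define NUM_FEATURES 4

/* Hypothetical yes/no shape features, in a fixed order:
 * [0] roughly round, [1] has a stem, [2] fuzzy surface, [3] spiky surface */
typedef struct {
    const char *name;
    bool features[NUM_FEATURES];
} FruitTemplate;

/* Fraction (0.0 to 1.0) of template features that the observation matches. */
double match_score(const bool observed[NUM_FEATURES], const FruitTemplate *t)
{
    int matches = 0;
    for (int i = 0; i < NUM_FEATURES; i++)
        if (observed[i] == t->features[i])
            matches++;
    return (double)matches / NUM_FEATURES;
}

/* Index of the template that best fits the observation
 * (the first one wins if there is a tie). */
int best_match(const bool observed[NUM_FEATURES],
               const FruitTemplate *templates, int count)
{
    int best = 0;
    for (int i = 1; i < count; i++)
        if (match_score(observed, &templates[i]) > match_score(observed, &templates[best]))
            best = i;
    return best;
}
```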

Middle Mass and Blob Detection

Blob detection is an algorithm used to determine whether a group of connected pixels are related to each other. This is useful for identifying separate objects in a scene, or counting the number of objects in a scene. Blob detection would be useful for counting people in an airport lobby, or fish passing by a camera. Middle mass would be useful for a baseball catching robot, or a line following robot.


To find a blob, you threshold the image by a specific color as shown below. The blue dot represents the middle mass, or the average location of all pixels of the selected color.
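Here is a minimal sketch of color thresholding followed by middle mass, assuming the frame is a list of rows of (r, g, b) tuples. The `is_red` cutoffs are hypothetical values you would tune for your own camera and lighting.

```python
def threshold_color(frame, is_target):
    """Binary mask: 1 where the pixel passes the color test, else 0."""
    return [[1 if is_target(pixel) else 0 for pixel in row] for row in frame]


def middle_mass(mask):
    """Average (x, y) of all 1-pixels, or None if there are none."""
    total_x = total_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total_x += x
                total_y += y
                count += 1
    if count == 0:
        return None
    return total_x / count, total_y / count


def is_red(pixel):
    r, g, b = pixel
    return r > 150 and g < 100 and b < 100   # hypothetical cutoffs


R, K = (255, 0, 0), (0, 0, 0)   # red and black pixels
frame = [
    [K, R, R, K],
    [K, R, R, K],
    [K, K, K, K],
]
mask = threshold_color(frame, is_red)
center = middle_mass(mask)
```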

If there is only one blob in a scene, the middle mass is located at the center of that object. But what if there are two or more blobs? This is where it fails, as the middle mass is then an average over all the blobs and may not be located on any object:

To solve this problem, your algorithm needs to label each blob as a separate entity. To do this, run this algorithm:

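go through each pixel in the array: if the pixel is a blob color, label it '1', otherwise label it '0'
go to the next pixel: if it is also a blob color and is adjacent to an already labeled blob pixel, give it that blob's label
if it is a blob color but is not adjacent to any labeled pixel, give it the next new label ('2', '3', ...)
repeat until all pixels are done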
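Here is one way to do the labeling in code. Instead of the single raster pass described above, this sketch swaps in a flood fill (breadth-first search): whenever it finds an unlabeled blob pixel it gives it a new number and spreads that number to every connected blob pixel, so each connected group ends up with exactly one label. It assumes a binary mask (1 = blob color) and 4-connectivity (up, down, left, right); both are choices, not requirements.

```python
from collections import deque


def label_blobs(mask):
    """Label 4-connected groups of 1s in a binary mask as 1, 2, 3, ..."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and not labels[y][x]:
                count += 1                      # found a new blob
                labels[y][x] = count
                queue = deque([(x, y)])
                while queue:                    # spread the label to connected pixels
                    cx, cy = queue.popleft()
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((nx, ny))
    return labels, count


def blob_middle_masses(labels, count):
    """Return {label: (x_center, y_center, pixel_count)} for every blob."""
    sums = {k: [0, 0, 0] for k in range(1, count + 1)}
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                sums[k][0] += x
                sums[k][1] += y
                sums[k][2] += 1
    return {k: (sx / n, sy / n, n) for k, (sx, sy, n) in sums.items()}


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
labels, count = label_blobs(mask)
blobs = blob_middle_masses(labels, count)
largest = max(blobs, key=lambda k: blobs[k][2])  # keep only the biggest blob
```

For this mask, the middle mass of all nine blob pixels together lands at about (2.2, 1.4), an empty pixel between the two blobs; labeling first gives each blob its own center instead.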

What the algorithm does is label each blob with a number, counting up for every new blob it encounters. Then to find the middle mass, you can just find it for each individual blob. In the video below, I ran a few algorithms in tandem. First, I removed all non-red objects. Next, I blurred the video a bit to make blobs more connected. Then, using blob detection, I only kept the blob that had the most pixels (the largest red object). This removed background objects such as the fire extinguisher. Lastly, I did center of mass to track the actual location of the object. I also ran a population threshold algorithm that made the object edges really sharp. It doesn't improve the algorithm in this case, but it does make it look nicer as a video.

Feel free to download my custom blob detection RoboRealm file that I used.

In this video, I programmed my ERP to do nothing but middle mass tracking:

Pixel Classification

Pixel Classification is when you assign each pixel in an image to an object class. For example, all greenish pixels would be grass, all blueish pixels would be sky or water, all greyish pixels would be road, and all yellow would be a road lane divider. There are other ways to classify each pixel, but color is typically the easiest.

This method is clearly useful for picking out the road for road following and obstacles for obstacle avoidance. It's also used in satellite image processing, such as this image of a city (yellow/red for buildings), forest (green), and river (blue):


If Greenpeace wanted to know how much forest has been cut down, a simple pixel density count can be done. To do this, simply count and compare the forest pixels from before and after the logging.

A major benefit of this bottom-up approach to image processing is its robustness to heavy image noise, since blobs do not need to be identified first. By finding the middle mass of these pixels, the center location of each object can be found.

Need an algorithm to identify roads for your driving robot? The video below (from my house's front door) is an example of me simply maximizing RGB (red, green, blue) colors. Pixels that are more blue than any other color become all blue, pixels more green than any other color become all green, and the same for red. What you get is the road being all blue, the grass being all green, and houses being red. It's not perfect, yet it still works amazingly well for a simple pixel classification algorithm. This algorithm would complement other algorithms well.
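Here is a plain-Python approximation of that idea, assuming frames are lists of rows of (r, g, b) tuples: every pixel is pushed to pure red, green, or blue depending on its strongest channel, and a small counter gives the per-class pixel totals you would need for the before/after forest comparison mentioned earlier. This is a sketch of the concept, not RoboRealm's Max_RGB_Channel module, and the tie-breaking order (red, then green, then blue) is an arbitrary choice.

```python
from collections import Counter

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)


def max_rgb_channel(frame):
    """Replace each pixel with pure red, green or blue, whichever channel is largest."""
    out = []
    for row in frame:
        new_row = []
        for r, g, b in row:
            if r >= g and r >= b:
                new_row.append(RED)
            elif g >= b:
                new_row.append(GREEN)
            else:
                new_row.append(BLUE)
        out.append(new_row)
    return out


def class_counts(classified):
    """Count how many pixels ended up in each class."""
    return Counter(pixel for row in classified for pixel in row)


frame = [
    [(120, 130, 200), (60, 140, 50)],   # road-ish blue, grass-ish green
    [(180, 90, 80), (110, 120, 190)],   # house-ish red, road-ish blue
]
classified = max_rgb_channel(frame)
counts = class_counts(classified)
```

For the forest example, you would run `class_counts` on the before and after images and compare the green totals.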

Feel free to download my custom pixel classification RoboRealm file that I used:

<head><version>1.7.3.3</version></head>
<Read_AVI>
  <loop_playback>1</loop_playback>
  <filename>C:\Documents and Settings\Pika\Desktop\snowpics\MOV03312 mpg.avi</filename>
  <running>TRUE</running>
</Read_AVI>
<Scale>
  <maintain_aspect>1</maintain_aspect>
  <percent_height>62</percent_height>
  <percent_width>62</percent_width>
  <pixel_width>400</pixel_width>
  <pixel_height>300</pixel_height>
</Scale>
<Max_RGB_Channel/>
<RGB_Filter>
  <max_value>20</max_value>
  <min_value>113</min_value>
  <channel>4</channel>
</RGB_Filter>
<Write_AVI>
  <limit_time_type>-1</limit_time_type>
  <image_to_save>Current</image_to_save>
  <codec>Indeo® video 5.10</codec>
  <filename>C:\Documents and Settings\Pika\Desktop\snowpics\RBGmaxinga.avi</filename>
  <real_time>1</real_time>


Image Correlation (Template Matching)

Image correlation is one of the many forms of template matching for simple object recognition. This method works by keeping a large database of various imaged features, and computing the 'intensity similarity' of an entire image or window with another.
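A minimal sketch of template matching on grayscale images (lists of rows of numbers). It slides the template over every position and uses the sum of squared differences as the similarity score, where lower means more similar. That is one simple choice; normalized cross-correlation is a common alternative that copes better with brightness changes. The function name and return format are illustrative.

```python
def match_template(image, template):
    """Slide `template` over `image`; return (score, x, y) of the best match.

    The score is the sum of squared differences, so 0 means a perfect match.
    """
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best = None
    for y in range(image_h - tmpl_h + 1):
        for x in range(image_w - tmpl_w + 1):
            score = 0
            for ty in range(tmpl_h):
                for tx in range(tmpl_w):
                    diff = image[y + ty][x + tx] - template[ty][tx]
                    score += diff * diff
            if best is None or score < best[0]:
                best = (score, x, y)
    return best


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
best = match_template(image, template)
```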

In this example, various features of an adorably cute squirrel (it's the species name) are obtained for comparison with other objects.

This method is also used for feature detection (mentioned earlier) and facial recognition . . .

Facial Recognition

Facial recognition is a more advanced type of pattern recognition. With shape recognition you only need a small database of mathematical representations of shapes. But while basic shapes like a triangle can be easily described, how do you mathematically represent a face?

Here is an exercise for you. Suppose you have a friend coming to your family's house and she/he wants to recognize every face by name before arriving. If you could only give a written list of facial features of each family member, what would you say about each face? You might describe hair color, length, or style. Maybe your sister has a beard. One person might have a more rounded face, while another person might have a very thin face. For a family of 4 people this exercise is really easy.

But what if you had to do it for everyone in your class? You might also analyze skin tone, eye color, wrinkles, mouth size . . . the list goes on. As the number of people that will be analyzed grows, so would the number of required descriptions for each face.

One popular way of digitizing faces is to measure the distance between the eyes, the size of the head, the distance between the eyes and mouth, and the length of the mouth. By keeping a database of these values, surprisingly, you can accurately identify thousands of different faces. Hint: notice how the features on Mona Lisa's face above are much easier to identify and locate after edge detection.
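Here is a sketch of the database idea: store a vector of measurements per person and report the closest stored vector (nearest neighbour by Euclidean distance). The names and numbers are made up, and in practice you would probably divide the measurements by something like the head size first, so the match doesn't depend on how far away the face is.

```python
import math

# Hypothetical database: (eye distance, head width, eye-to-mouth, mouth length) in pixels.
database = {
    "alice": (62, 150, 70, 50),
    "bob": (70, 165, 80, 55),
}


def identify(measurements, database):
    """Return the name whose stored measurements are closest, and that distance."""
    best_name, best_dist = None, math.inf
    for name, reference in database.items():
        dist = math.dist(measurements, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


name, dist = identify((63, 151, 71, 49), database)
```

A real system would also reject matches whose distance is above some threshold, so a stranger isn't labeled as the closest family member.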

Unfortunately for law enforcement, this method works poorly outside of the lab. This is because it requires facial images that are really close and clear for the measurements to be done accurately. It is also difficult to control which way a person is looking. For example, can you make out the facial measurements of the man in this security cam image?


Have a look at the image below. Despite these pictures also being tiny and blurry, you can somehow recognize many of them! The human brain obviously has other, yet undiscovered methods of facial recognition . . .

Stereo Vision

Stereo vision is a method of determining the 3D location of objects in a scene by comparing images from two separate cameras. Now suppose you have some robot on Mars and he sees an alien (at point P(X,Y)) with two video cameras. Where does the robot need to drive to run over this alien (for 20 kill points)?


First, let's analyze the robot camera itself. Although it is a simplification that results in minor error, the pinhole camera model will be used in the following examples:

The image plane is where the photo-receptors are located in the camera, and the lens is the lens of the camera. The focal distance is the distance between the lens and the photo-receptors (it can be found in the camera datasheet). Point P is the location of the alien, and point p is where the alien appears on the photo-receptors. The optical axis is the direction the camera is pointing. Redrawing the diagram to make it mathematically simpler to understand, we get this new diagram

with the following equations for a single camera:

x_camL = focal_length * X_actual / Z_actual
y_camL = focal_length * Y_actual / Z_actual

CASE 1: Parallel Cameras

Now moving on to two parallel facing cameras (L for left camera and R for right camera), we have this diagram:

The Z-axis is the optical axis (the direction the cameras are pointing). b is the distance between cameras, while f is still the focal length. The equations of stereo triangulation (because it looks like a triangle) are:

Z_actual = (b * focal_length) / (x_camL - x_camR)
X_actual = x_camL * Z_actual / focal_length
Y_actual = y_camL * Z_actual / focal_length
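Here is a quick numeric check of the parallel-camera equations, assuming the right camera sits a distance b along the +X axis from the left camera and points the same way, so x_camR = focal_length * (X_actual - b) / Z_actual. `project` and `triangulate_parallel` are hypothetical helper names; image coordinates must be in the same units as focal_length (here, metres on the sensor).

```python
def project(X_actual, Y_actual, Z_actual, focal_length, camera_x=0.0):
    """Pinhole projection for a camera at (camera_x, 0, 0) looking down +Z."""
    x_cam = focal_length * (X_actual - camera_x) / Z_actual
    y_cam = focal_length * Y_actual / Z_actual
    return x_cam, y_cam


def triangulate_parallel(x_camL, y_camL, x_camR, b, focal_length):
    """Recover (X, Y, Z) from two parallel cameras separated by baseline b."""
    disparity = x_camL - x_camR
    if disparity <= 0:
        raise ValueError("point must be in front of both cameras (positive disparity)")
    Z_actual = (b * focal_length) / disparity
    X_actual = x_camL * Z_actual / focal_length
    Y_actual = y_camL * Z_actual / focal_length
    return X_actual, Y_actual, Z_actual


# Alien 4 m ahead, 8 mm lenses, cameras 10 cm apart.
f, b = 0.008, 0.1
x_camL, y_camL = project(0.5, 0.2, 4.0, f)
x_camR, _ = project(0.5, 0.2, 4.0, f, camera_x=b)
alien = triangulate_parallel(x_camL, y_camL, x_camR, b, f)
```

The disparity in this example is only 0.2 mm on the sensor. Since Z_actual = b * focal_length / disparity, a small disparity error causes a large depth error for far-away aliens, which is why a wider baseline b helps.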


CASE 2a: Non-Parallel Cameras, Rotation About Y-axis

And lastly, what if the cameras are pointing in different, non-parallel directions? In the diagram below, the Z-axis is the optical axis of the left camera, while the Zo-axis is the optical axis of the right camera. Both cameras lie on the XZ plane, but the right camera is rotated by some angle phi. The point where both optical axes (plural of axis, pronounced AK-seez) intersect, (0,0,Zo), is called the fixation point. Note that the fixation point could also be behind the cameras when Zo < 0.

calculating for the alien location . . .

Zo = b / tan(phi)
Z_actual = (b * focal_length) / (x_camL - x_camR + focal_length * b / Zo)
X_actual = x_camL * Z_actual / focal_length
Y_actual = y_camL * Z_actual / focal_length

CASE 2b: Non-Parallel Cameras, Rotation About X-axis

calculating for the alien location . . .

Z_actual = (b * focal_length) / (x_camL - x_camR)
X_actual = x_camL * Z_actual / focal_length
Y_actual = y_camL * Z_actual / focal_length + tan(phi) * Z_actual

CASE 2c: Non-Parallel Cameras, Rotation About Z-axis

For simplicity, rotation around the optical axis is usually dealt with by rotating the image before applying matching and triangulation. Given the translation vector T and rotation matrix R describing the transformation from left camera to right camera coordinates, the equation to solve for stereo triangulation is:

where p and p' are the coordinates of P in the left and right camera coordinates respectively, and R^T is the transpose (or the inverse) of the rotation matrix R.

Please continue on in the Computer Vision Tutorial Series for Part 4: Computer Vision Algorithms for Motion.

PROGRAMMING - COMPUTER VISION TUTORIAL

Part 4: Computer Vision Algorithms for Motion

Motion Detection, Tracking, Optical Flow, Background Subtraction, Feature Tracking, Practice Problems, Download Software

In part 4 of the Computer Vision Tutorial Series we will continue with computer vision algorithms for motion.

Motion Detection (Bulk Motion)

Motion detection works on the basis of frame differencing, meaning comparing how pixels (usually blobs) change location after each frame. There are two ways you can do motion detection.

The first method just looks for a bulk change in the image:

calculate the average of a selected color in frame 1
wait X seconds
calculate the average of a selected color in frame 2
if (abs(avg_frame_1 - avg_frame_2) > threshold) then motion detected

The other method looks at the motion of the middle mass:

calculate the middle mass in frame 1
wait X seconds
calculate the middle mass in frame 2
if (abs(mm_frame_1 - mm_frame_2) > threshold) then motion detected
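Both methods fit in a few lines. This sketch assumes each frame is a single-channel image (a list of rows holding, say, the red value of each pixel) and that the middle masses have already been computed as (x, y) pixel coordinates; for the middle mass version, the "abs" of the difference becomes the straight-line distance between the two points. The thresholds are hypothetical tuning values.

```python
import math


def average_value(frame):
    """Average of one color channel over the whole frame."""
    return sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))


def bulk_motion(frame_1, frame_2, threshold):
    return abs(average_value(frame_1) - average_value(frame_2)) > threshold


def middle_mass_motion(mm_frame_1, mm_frame_2, threshold):
    """mm_frame_* are (x, y) middle masses in pixels."""
    return math.dist(mm_frame_1, mm_frame_2) > threshold


frame_1 = [[10, 10], [10, 10]]
frame_2 = [[10, 10], [10, 100]]   # one pixel got much brighter
```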

The problem with these motion detection methods is that neither detects very slow-moving objects; how slow is too slow depends on the sensitivity of the threshold. But if the threshold is too sensitive, it will detect things like shadows and changes in sunlight!

The algorithm also can't handle a rotating object: an object that moves, but whose middle mass does not change location.

Tracking

Once you do motion detection by calculating the motion of the middle mass, you can run more advanced algorithms such as tracking. By doing vector math, and knowing the pixel-to-distance ratio, one may calculate the displacement, velocity, and acceleration of a moving blob.

Here is an example on how to calculate speed of a car:

calculate the middle mass in frame 1
wait X seconds
calculate the middle mass in frame 2
speed = abs(mm_frame_2 - mm_frame_1) * distance_per_pixel / X
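The same calculation in code, assuming one fixed `distance_per_pixel` for the whole image (a camera looking straight down with negligible lens distortion); the paragraph below explains why that assumption often breaks. `blob_speed` is a hypothetical name.

```python
import math


def blob_speed(mm_frame_1, mm_frame_2, distance_per_pixel, seconds):
    """Speed of a blob from its middle mass in two frames taken `seconds` apart."""
    pixels_moved = math.dist(mm_frame_1, mm_frame_2)
    return pixels_moved * distance_per_pixel / seconds


# Car's middle mass moved from (100, 50) to (130, 90): 50 pixels in half a second,
# at 5 cm per pixel, which is about 5 metres per second.
speed = blob_speed((100, 50), (130, 90), distance_per_pixel=0.05, seconds=0.5)
```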


The major issue with this algorithm is determining the distance to pixel ratio. If your camera is at an angle to the horizon (not looking overhead and pointing straight down), or your camera experiences the lens effect (all cameras do, to some extent), then you need to write a separate algorithm that maps this ratio for a given pixel located at X and Y position.

The image below shows an exaggerated lens effect, with pixels further down the trail equaling a greater distance than the pixels closer to the camera.

This Mars Rover camera image is a good example of the lens effect:

Lens radial distortion can be modelled by the following equations:

x_actual = xd * (1 + distortion_constant * (xd^2 + yd^2))
y_actual = yd * (1 + distortion_constant * (xd^2 + yd^2))
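The model is easy to apply point by point. In this sketch xd and yd must be measured from the image center (not from the top-left pixel), in the same units the distortion constant was calibrated for; `undistort` and `to_centered` are hypothetical helper names.

```python
def to_centered(u, v, width, height):
    """Convert pixel (column u, row v) to coordinates relative to the image center."""
    return u - width / 2, v - height / 2


def undistort(xd, yd, distortion_constant):
    """Apply the radial model: scale each point by 1 + k * (xd^2 + yd^2)."""
    r_squared = xd * xd + yd * yd
    scale = 1 + distortion_constant * r_squared
    return xd * scale, yd * scale
```

Points near the center barely move, while points far from the center are scaled the most; the sign of the constant decides whether they move outward or inward.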

The variables xd and yd are the image coordinates of the distorted image, measured from the image center. The distortion_constant is a constant depending on the distortion of the lens. This constant can either be determined experimentally, or from data sheets of the lens or camera.

Cross over is the other major problem. This is when multiple objects cross over each other (i.e. one blob passes behind another blob) and the algorithm gets confused about which blob is which. For example, here is a video showing the problem. Notice how the algorithm gets confused as the man goes behind the tree, or crosses over another tracked object? The algorithm must remember a decent number of features of each tracked object for crossovers to work.

(video was taken from here)

Optical Flow

This computer vision method makes no attempt to identify observed objects. It works by analyzing the bulk/individual motion of pixels. It is useful for tracking, 3D analysis, altitude measurement, and velocity measurement. This method has the advantage that it can work with low resolution cameras, and the simpler algorithms require minimal processing power.

Optical flow is a vector field that shows the direction and magnitude of these intensity changes from one image to the other, as shown here:
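One simple way to get a single vector of that field is block matching: take a small block from the first frame and search nearby positions in the second frame for the best match. This is a swapped-in, illustrative technique (gradient-based methods such as Lucas-Kanade work differently); running it on a grid of blocks gives a rough vector field. `block`, `search`, and the function names are hypothetical.

```python
def block_flow(prev, curr, x, y, block=2, search=2):
    """Estimate (dx, dy) for the block whose top-left corner is (x, y) in `prev`."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or nx + block > width or ny + block > height:
                continue  # candidate block would fall off the image
            ssd = sum(
                (prev[y + j][x + i] - curr[ny + j][nx + i]) ** 2
                for j in range(block)
                for i in range(block)
            )
            if best is None or ssd < best[0]:
                best = (ssd, dx, dy)
    return best[1], best[2]


def frame_with_square(x, y, size=6):
    """A dark frame with a bright 2x2 square whose top-left corner is (x, y)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(2):
        for i in range(2):
            frame[y + j][x + i] = 200
    return frame


prev = frame_with_square(1, 1)
curr = frame_with_square(2, 2)
flow = block_flow(prev, curr, 1, 1)   # the square moved one pixel right and one down
```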

Applications for Optical Flow

Altitude Measurement (for constant speed)

Ever notice when traveling by plane, the higher you are the slower the ground below you seems to move? For aerial robots that have a known constant speed, the altitude can be calculated by analyzing pixel velocity from a downward facing camera. The slower the pixels travel, the higher the robot. A potential problem, however, is when your robot rotates in the air, but this can be accounted for by adding additional sensors like gyros and accelerometers.

Velocity Measurement (for constant altitude)

For a robot that is traveling at some known altitude, the robot velocity can be calculated by analyzing pixel velocity. This is the converse of the altitude measurement method. It is impossible to gather both altitude and velocity data simultaneously using only optical flow, so a second sensor (such as GPS or an altimeter) needs to be used. If, however, your robot was an RC car, the altitude is already known (probably an inch above the ground). Velocity can then be calculated using optical flow with no other sensors. Optical flow can also be used to directly compute time to impact for missiles, and it is a technique often used by insects to gauge flight speed and direction.
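Both measurements follow from the single-camera pinhole equation earlier in this tutorial (x_camL = focal_length * X_actual / Z_actual): for a camera pointing straight down from altitude h while moving at ground speed V, ground features slide across the image at focal_length * V / h pixels per second. This sketch assumes level flight, flat ground, no rotation, and a focal length expressed in pixels; the function names are hypothetical.

```python
def altitude_from_flow(pixel_speed, ground_speed, focal_length_px):
    """Known ground speed (m/s) + measured pixel speed (px/s) -> altitude (m)."""
    return focal_length_px * ground_speed / pixel_speed


def velocity_from_flow(pixel_speed, altitude, focal_length_px):
    """Known altitude (m) + measured pixel speed (px/s) -> ground speed (m/s)."""
    return pixel_speed * altitude / focal_length_px


# A camera with a 500 px focal length, flying at 10 m/s, sees the ground move at 100 px/s:
print(altitude_from_flow(100, 10, 500))  # 50.0 metres
```

Halving the pixel speed doubles the computed altitude, which is the "slower pixels, higher robot" effect described above.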

Tracking

Please see tracking above, and background subtraction below. The optical flow method of tracking combines both of those methods together. By removing the background, all that needs to be done is analyze the motion of the moving pixels.

3D Scene Analysis

By analyzing the motion of all pixels, it is possible to generate rough 3D measurements of the observed scene. For example, in the image below of a subway train, the pixels on the far left are moving fast, and they are both converging and slowing down towards the center of the image. With this information, 3D information about the train can be calculated (including the velocity of the train and the angle of the track).

Problems with optical flow . . .

Generally, optical flow corresponds to the motion field, but not always. For example, the motion field and optical flow of a rotating barber's pole are different:


Although it is only rotating about the z-axis, optical flow will say the red bars are moving upwards along the z-axis. Obviously, assumptions need to be made about the expected observed objects for this to work properly.

Accounting for multiple objects gets really complicated . . . especially if they cross each other . . .

And lastly, the equations get even more complicated when you track not just linear motion of pixels, but rotational motion as well. With optical flow, how do you tell if the center point of this Ferris wheel is connected to the outer half?

Background Subtraction

Background subtraction is the method of removing pixels that do not move, focusing only on objects that do. The method works like this:

capture two frames
compare each pixel in the second frame with the same pixel in the first frame
if the colors are the same, replace it with the color white; otherwise, keep the new pixel
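The steps above translate almost directly into code. This sketch assumes frames are lists of rows of (r, g, b) tuples and adds a hypothetical `tolerance`, because camera noise means a still background rarely gives exactly the same color twice.

```python
WHITE = (255, 255, 255)


def subtract_background(frame_1, frame_2, tolerance=10):
    """Keep pixels of frame_2 that changed since frame_1; paint the rest white."""
    result = []
    for row_1, row_2 in zip(frame_1, frame_2):
        result.append([
            WHITE if all(abs(a - b) <= tolerance for a, b in zip(p1, p2)) else p2
            for p1, p2 in zip(row_1, row_2)
        ])
    return result


frame_1 = [[(10, 10, 10), (50, 50, 50)]]
frame_2 = [[(12, 10, 9), (200, 0, 0)]]   # left pixel: noise only; right pixel: something moved
moving = subtract_background(frame_1, frame_2)
```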

Here is an example of a guy moving with a static background. Some pixels did not appear to change when he moved, resulting in error:

The problem with this method as described above is that if the object stops moving, then it becomes invisible. If my hand moves, but my body doesn't, all you see is a moving hand. There is also the chance that although something is moving, not all the individual pixels change color because the object is of a uniform color. To correct for this, this algorithm must be combined with other algorithms such as edge detection and blob finding, to make sure all pixels within a moving boundary aren't discarded.

There is one other form of background subtraction called blue-screening (or green-screening, or chroma-key). What you do is physically replace the background with a solid color; a big green curtain (called a chroma-key) typically works best. Then the computer replaces all pixels of that color with pixels from another scene. This technique is commonly used for weather anchor people, and is why they never wear green ties =P
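Chroma-keying is the same kind of per-pixel loop: wherever the foreground pixel looks like the curtain color, take the pixel from the new scene instead. The `is_green_screen` cutoffs are hypothetical and would need tuning for your curtain and lighting.

```python
def is_green_screen(pixel):
    """Hypothetical test for 'this pixel is the green curtain'."""
    r, g, b = pixel
    return g > 150 and r < 100 and b < 100


def chroma_key(foreground, background, is_key=is_green_screen):
    """Replace every key-colored foreground pixel with the background pixel."""
    return [
        [bg if is_key(fg) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


studio = [[(20, 200, 30), (180, 150, 140)]]   # curtain pixel, then a skin-tone pixel
weather_map = [[(1, 2, 3), (4, 5, 6)]]
composited = chroma_key(studio, weather_map)
```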

This blue-screening method is more a machine vision technique, as it will not work in everyday situations - only in studios with expert lighting.

Here is a video of my ERP that I made using chroma key. If you look carefully, you'll see various chroma key artifacts as I didn't put much effort into getting it perfect. I used Sony Vegas Movie Studio to make the video.

Feature Tracking

A feature is a specific identified point in the image that a tracking algorithm can lock onto and follow through multiple frames. Often features are selected because they are bright/dark spots, edges or corners, depending on the particular tracking algorithm. Template matching is also quite common. What is important is that each feature represents a specific point on the surface of a real object. As a feature is tracked it becomes a series of two-dimensional coordinates that represent the position of the feature across a series of frames. This series is referred to as a track. Once tracks have been created they can be used immediately for 2D motion tracking, or be used to calculate 3D information.


Visual Servoing

Visual servoing is a method of using video data to determine position data of your robot. For example, your robot sees a door and wants to go through it. Visual servoing will allow the front of your robot to align itself with the door and pass through. If your robot wanted to pick something up, it can use visual servoing to move the arm to that location. To drive down a road, visual servoing would track the road with respect to the robot's heading.

To do visual servoing, first you need to use the vision processing methods listed in this tutorial to locate the object. Then your robot needs to decide how to orient itself to reach that location using some type of PID loop, the error being the distance between where the robot wants to be and where it sees it is.
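Here is a minimal PID sketch for the door example, assuming the vision code already gives you the door's middle mass x coordinate in pixels. The error is how far that point is from the image center, and the output is a turn command whose units and sign convention (positive = turn right) are up to your motor code. The gains and image width below are made-up placeholders you would tune on the real robot.

```python
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for this error, dt seconds after the last call."""
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


IMAGE_CENTER_X = 160                     # assumes a 320-pixel-wide image
steering = PID(kp=0.005, ki=0.0, kd=0.001)

door_x = 220                             # door's middle mass, from the vision code
turn = steering.update(door_x - IMAGE_CENTER_X, dt=0.05)
```

Call `update` once per camera frame; as the robot turns toward the door, the error and therefore the turn command shrink toward zero.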

If you would like to learn more about robot arms for use in visual servoing, see my robot arms tutorial.

ROBOT ARM TUTORIAL

Degrees of Freedom, Robot Workspace, Mobile Manipulators, Force Calculations, Forward Kinematics, Inverse Kinematics, Motion Planning, Velocity Sensing


About this Robot Arm Tutorial

The robot arm is probably the most mathematically complex robot you could ever build. As such, this tutorial can't tell you everything you need to know. Instead, I will cut to the chase and talk about the bare minimum you need to know to build an effective robot arm. Enjoy!

To get you started, here is a video of a robot arm assignment I had when I took Robotic Manipulation back in college. My group programmed it to type the current time into the keyboard . . . (lesson learned, don't crash robot arms into your keyboard at full speed while testing in front of your professor)

You might also be interested in a robot arm I built that can shuffle, cut, and deal playing cards.

Degrees of Freedom (DOF)

The degrees of freedom, or DOF, is a very important term to understand. Each degree of freedom is a joint on the arm, a place where it can bend or rotate or translate. You can typically identify the number of degrees of freedom by the number of actuators on the robot arm. Now this is very important: when building a robot arm you want as few degrees of freedom as your application allows!!! Why? Because each degree requires a motor, often an encoder, and increasingly complicated algorithms and cost.

Denavit-Hartenberg (DH) Convention

The Robot Arm Free Body Diagram (FBD)

The Denavit-Hartenberg (DH) Convention is the accepted method of drawing robot arms in FBDs. There are only two motions a joint could make: translate and rotate. There are only three axes this could happen on: x, y, and z (out of plane). Below I will show a few robot arms, and then draw an FBD next to each, to demonstrate the DOF relationships and symbols. Note that I did not count the DOF on the gripper (otherwise known as the end effector). The gripper is often complex with multiple DOF, so for simplicity it is treated as separate in basic robot arm design.


3 DOF Robot Arm, with a translation joint:


Notice between each DOF there is a linkage of some particular length. Sometimes a joint can have multiple DOF in the same location. An example would be the human shoulder. The shoulder actually has three coincident DOF. If you were to mathematically represent this, you would just say link length = 0.

Also note that a DOF has its limitations, known as the configuration space. Not all joints can swivel 360 degrees! A joint has some max angle restriction. For example, no human joint can rotate more than about 200 degrees. Limitations could be from wire wrapping, actuator capabilities, servo max angle, etc. It is a good idea to label each link length and joint max angle on the FBD.


(image credit: Roble.info)

Your robot arm can also be on a mobile base, adding additional DOF. If the wheeled robot can rotate, that is a rotation joint, if it can move forward, then that is a translational joint. This mobile manipulator robot is an example of a 1 DOF arm on a 2 DOF robot (3 DOF total).

Robot Workspace

The robot workspace (sometimes known as reachable space) is all places that the end effector (gripper) can reach. The workspace is dependent on the DOF angle/translation limitations, the arm link lengths, the angle at which something must be picked up, etc. The workspace is highly dependent on the robot configuration.

Since there are many possible configurations for your robot arm, from now on we will only talk about the one shown below. I chose this 3 DOF configuration because it is simple, yet isn't limiting in ability.

Now let's assume that all joints rotate a maximum of 180 degrees, because most servo motors cannot exceed that amount. To determine the workspace, trace all locations that the end effector can reach, as in the image below.

Now rotating that by the base joint another 180 degrees to get 3D, we have this workspace image. Remember that because it uses servos, all joints are limited to a max of 180 degrees. This creates a workspace of a shelled semi-sphere (it's a shape because I said so).

If you change the link lengths you can get very different sizes of workspaces, but this would be the general shape. Any location outside of this space is a location the arm can't reach. If there are objects in the way of the arm, the workspace can get even more complicated.

Here are a few more robot workspace examples:


Cylindrical Robot Arm

Spherical Robot Arm

SCARA Robot Arm


Mobile Manipulators

A moving robot with a robot arm is a sub-class of robotic arms. They work just like other robotic arms, but the DOF of the vehicle is added to the DOF of the arm. If, say, you have a differential drive robot (2 DOF) with a robot arm (5 DOF) attached (see yellow robot below), that would give the robot arm a total of 7 DOF. What do you think the workspace on this type of robot would be?

Force Calculations of Joints

This is where this tutorial starts getting heavy with math. Before even continuing, I strongly recommend you read the mechanical engineering tutorials for statics and dynamics. This will give you a fundamental understanding of moment arm calculations. The point of doing force calculations is for motor selection. You must make sure that the motor you choose can support not only the weight of the robot arm, but also what the robot arm will carry (the blue ball in the image below).

The first step is to label your FBD, with the robot arm stretched out to its maximum length.


Choose these parameters:
weight of each linkage
weight of each joint
weight of object to lift
length of each linkage

Next you do a moment arm calculation, multiplying downward force times the linkage lengths. This calculation must be done for each lifting actuator. This particular design has just two DOF that require lifting, and the center of mass of each linkage is assumed to be at Length/2.

Torque About Joint 1:

M1 = L1/2 * W1 + L1 * W4 + (L1 + L2/2) * W2 + (L1 + L3) * W3

Torque About Joint 2:

M2 = L2/2 * W2 + L3 * W3
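The two torque equations as a small function. The variable meanings are a reading of the equations, since the FBD image isn't reproduced here: W1 and W2 are the weights of links 1 and 2, W4 is the weight of joint 2 (sitting at the end of link 1), W3 is the weight of the object, and L3 is the horizontal distance from joint 2 to the object. Use consistent units (for example newtons and metres, giving newton-metres); `lifting_torques` is a hypothetical name.

```python
def lifting_torques(L1, L2, L3, W1, W2, W3, W4):
    """Torques about joint 1 and joint 2 with the arm stretched out horizontally."""
    M1 = L1 / 2 * W1 + L1 * W4 + (L1 + L2 / 2) * W2 + (L1 + L3) * W3
    M2 = L2 / 2 * W2 + L3 * W3
    return M1, M2


# Example: 20 cm and 15 cm links, object 17 cm past joint 2, weights in newtons.
M1, M2 = lifting_torques(L1=0.20, L2=0.15, L3=0.17, W1=0.5, W2=0.4, W3=0.3, W4=0.6)
```

These are static torques only; the arm also needs extra torque to accelerate, which the velocity section below covers.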

As you can see, for each DOF you add the math gets more complicated, and the joint weights get heavier. You will also see that shorter arm lengths allow for smaller torque requirements.

Too lazy to calculate forces and torques yourself? Try my robot arm calculator to do the math for you.

Forward Kinematics

Forward kinematics is the method for determining the orientation and position of the end effector, given the joint angles and link lengths of the robot arm. To calculate forward kinematics, all you need is high school trig and algebra.


For our robot arm example, here we calculate end effector location with given joint angles and link lengths. To make visualization easier for you, I drew blue triangles and labeled the angles.

Assume that the base is located at x=0 and y=0. The first step would be to locate x and y of each joint.

Joint 0 (with x and y at base equaling 0):
x0 = 0
y0 = L0

Joint 1 (with x and y at J1 equaling 0):
cos(psi) = x1/L1 => x1 = L1*cos(psi)
sin(psi) = y1/L1 => y1 = L1*sin(psi)

Joint 2 (with x and y at J2 equaling 0):
sin(theta) = x2/L2 => x2 = L2*sin(theta)
cos(theta) = y2/L2 => y2 = L2*cos(theta)

End Effector Location (make sure your signs are correct):
x = x0 + x1 + x2, or 0 + L1*cos(psi) + L2*sin(theta)
y = y0 + y1 + y2, or L0 + L1*sin(psi) + L2*cos(theta)
z equals alpha, in cylindrical coordinates
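The end effector equations above as a function, using exactly the same angle conventions (psi for link 1 and theta for link 2, measured the way the blue triangles show, in radians). `forward_kinematics` is a hypothetical name, and returning alpha unchanged reflects the "z equals alpha, in cylindrical coordinates" line.

```python
import math


def forward_kinematics(L0, L1, L2, psi, theta, alpha=0.0):
    """End effector (x, y) in the arm's plane plus the base angle alpha (radians)."""
    x = 0 + L1 * math.cos(psi) + L2 * math.sin(theta)
    y = L0 + L1 * math.sin(psi) + L2 * math.cos(theta)
    return x, y, alpha


# Link 1 horizontal (psi = 0) and link 2 also horizontal (theta = 90 degrees):
x, y, alpha = forward_kinematics(L0=1.0, L1=2.0, L2=1.0, psi=0.0, theta=math.pi / 2)
```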

The angle of the end effector, in this example, is equal to theta + psi.

Too lazy to calculate forward kinematics yourself? Check out my robot arm calculator.

Inverse Kinematics

Inverse kinematics is the opposite of forward kinematics. This is when you have a desired end effector position, but need to know the joint angles required to achieve it. The robot sees a kitten and wants to grab it: what angles should each joint go to? Although way more useful than forward kinematics, this calculation is much more complicated too. As such, I will not show you how to derive the equation based on your robot arm configuration.

Instead, I will just give you the equations for our specific robot design:

psi = arccos((x^2 + y^2 - L1^2 - L2^2) / (2 * L1 * L2))
theta = arcsin((y * (L1 + L2 * c2) - x * L2 * s2) / (x^2 + y^2))

where c2 = (x^2 + y^2 - L1^2 - L2^2) / (2 * L1 * L2)
and s2 = sqrt(1 - c2^2)
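Here is a sketch that evaluates those equations and checks them with a round trip. It reads them as the standard two-link planar solution: (x, y) is measured from joint J1 (so the base height L0 is already subtracted), theta is the angle of link 1 from the x-axis, and psi is the angle of link 2 relative to link 1. Those angle conventions differ from the forward kinematics section above, so treat the mapping as an assumption to check against your own FBD. Note that arccos only returns the elbow-positive solution and arcsin only returns angles between -90 and +90 degrees, which is part of the multiple-solutions problem described below.

```python
import math


def inverse_kinematics(x, y, L1, L2):
    """Return (psi, theta) in radians for a target (x, y) measured from joint J1."""
    r_squared = x * x + y * y
    if r_squared == 0:
        raise ValueError("target is at the shoulder joint")
    c2 = (r_squared - L1 ** 2 - L2 ** 2) / (2 * L1 * L2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target is outside the workspace (no solution)")
    s2 = math.sqrt(1 - c2 ** 2)
    psi = math.acos(c2)
    theta = math.asin((y * (L1 + L2 * c2) - x * L2 * s2) / r_squared)
    return psi, theta


def standard_two_link_fk(theta, psi, L1, L2):
    """Standard planar forward kinematics, used here only to check the result."""
    x = L1 * math.cos(theta) + L2 * math.cos(theta + psi)
    y = L1 * math.sin(theta) + L2 * math.sin(theta + psi)
    return x, y


target = standard_two_link_fk(theta=0.3, psi=0.5, L1=1.0, L2=1.0)
psi, theta = inverse_kinematics(*target, L1=1.0, L2=1.0)
```

The `ValueError` for |c2| > 1 is the "zero solutions" case: the target is farther than L1 + L2 (or closer than |L1 - L2|) from the shoulder.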

So what makes inverse kinematics so hard? Well, other than the fact that it involves non-linear simultaneous equations, there are other reasons too.

First, there is the very likely possibility of multiple, sometimes infinite, solutions (as shown below). How would your arm choose which is optimal, based on torques, previous arm position, gripping angle, etc.?

There is the possibility of zero solutions. Maybe the location is outside the workspace, or maybe the point within the workspace must be gripped at an impossible angle.

Singularities, configurations where the required joint velocities and accelerations become infinite, can blow up equations and/or leave motors lagging behind (motors can't achieve infinite acceleration). And lastly, complex equations can take a long time to calculate on a microcontroller. There's no point in having advanced equations on a processor that can't keep up.

Too lazy to calculate inverse kinematics yourself? Try my robot arm calculator.

Motion Planning

Motion planning on a robot arm is fairly complex so I will just give you the basics.

Suppose your robot arm has objects within its workspace, how does the arm move through the workspace to reach a certain point? To do this, assume your robot arm is just a simple mobile robot navigating in 3D space. The end effector will traverse the space just like a mobile robot, except now it must also make sure the other joints and links do not collide with anything too. This is extremely difficult to do . . .

What if you want your robot end effector to draw straight lines with a pencil? Getting it to go from point A to point B in a straight line is relatively simple to solve. What your robot should do, by using inverse kinematics, is go to many points between point A and point B. The final motion will come out as a smooth straight line. This method works not only with straight lines, but with curved ones too. On expensive professional robotic arms all you need to do is program two points, and tell the robot how to go between the two points (straight line, fast as possible, etc.). For further reading, you could use the wavefront algorithm to plan this two point trajectory.
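The "many points between A and B" idea is just linear interpolation. This sketch only generates the waypoints; you would then feed each one to your inverse kinematics routine and send the resulting joint angles to the servos in order. The function name and the number of steps are illustrative.

```python
def straight_line_waypoints(a, b, steps):
    """Return steps + 1 evenly spaced points from a to b (inclusive)."""
    return [
        tuple(pa + (pb - pa) * i / steps for pa, pb in zip(a, b))
        for i in range(steps + 1)
    ]


waypoints = straight_line_waypoints((0.0, 0.0), (1.0, 2.0), steps=4)
```

More steps give a straighter line at the cost of more inverse kinematics calculations, which matters on a slow microcontroller.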

Velocity (and more Motion Planning)

Calculating end effector velocity is mathematically complex, so I will go only into the basics. The simplest way to do it is to assume your robot arm (held straight out) is a rotating wheel of radius L. The joint rotates at Y rpm, so therefore the velocity is

Velocity of end effector on straight arm = 2 * pi * radius * rpm (distance per minute; divide by 60 for distance per second)

However the end effector does not just rotate about the base, but can go in many directions. The end effector can follow a straight line, or curve, etc.

With robot arms, the quickest way between two points is often not a straight line. If two joints have two different motors, or carry different loads, then max velocity can vary between them. When you tell the end effector to go from one point to the next, you have two choices: have it follow a straight line between both points, or tell all the joints to go as fast as possible, leaving the end effector to possibly swing wildly between those points.

In the image below, the end effector of the robot arm is moving from the blue point to the red point. In the top example, the end effector travels a straight line. This is the only possible motion this arm can perform to travel a straight line. In the bottom example, the arm is told to get to the red point as fast as possible. Of the many possible trajectories, the arm takes the one that allows the joints to rotate the fastest.

Which method is better? There are many deciding factors. Usually you want straight lines when the object the arm moves is really heavy: a heavy load carries a lot of momentum (momentum = mass * velocity), and a predictable straight path avoids the large momentum changes of a wild swing. But for maximum speed (perhaps the arm isn't carrying anything, or only light objects), you would want maximum joint speeds. Now suppose you want your robot arm to operate at a certain rotational velocity. How much torque would a joint need? First, let's go back to our free body diagram (FBD):

Now let's suppose you want joint J0 to rotate 180 degrees in under 2 seconds. What torque does the J0 motor need? Well, J0 is not affected by gravity, so all we need to consider is momentum and inertia. In equation form:


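torque = moment_of_inertia * angular_acceleration

Breaking that equation into subcomponents, we get:

torque = (mass * distance^2) * (change_in_angular_velocity / change_in_time)

where

change_in_angular_velocity = angular_velocity1 - angular_velocity0
angular_velocity = change_in_angle / change_in_time

Assuming the joint starts at rest (angular_velocity0 = 0 at time 0), we get

torque = (mass * distance^2) * (angular_velocity / change_in_time)

Angles must be in radians (180 degrees = pi radians). Treat this as a rough estimate: it uses the average speed over the move as the speed reached at the end of it. A joint that starts from rest and accelerates uniformly needs twice that acceleration to cover the same angle in the same time (angular_acceleration = 2 * change_in_angle / change_in_time^2).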

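where distance is the distance from the rotation axis to the center of mass. For the arm itself:

center of mass of the arm = distance = 1/2 * arm_length (use arm mass)

This treats the arm as a point mass at its center. For a uniform arm pivoting about one end, the exact moment of inertia is 1/3 * arm_mass * arm_length^2, a bit larger than the 1/4 * arm_mass * arm_length^2 this shortcut gives, so leave some margin.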

But you also need to account for the object your arm holds:

center of mass of the object = distance = arm_length (use object mass)

So calculate the torque once for the arm and again for the object, then add the two together for the total:

torque(of_object) + torque(of_arm) = torque(for_motor)
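The steps above can be put together in a small C function. This is a sketch of the text's simplified estimate (point masses, joint starting from rest, average speed used as the target speed), not an exact dynamics model; the function and parameter names are hypothetical, and all units are SI with angles in radians.

```c
#define PI 3.14159265358979323846

/* Torque estimate for J0 following the simplified formulas in the text:
 *   angular_velocity     = change_in_angle / change_in_time
 *   angular_acceleration = angular_velocity / change_in_time  (from rest)
 *   torque               = mass * distance^2 * angular_acceleration
 * Arm mass is treated as a point at arm_length / 2, the object at arm_length.
 * Inputs in kg, m, radians, s; result in N*m. */
double j0_torque(double arm_mass, double object_mass, double arm_length,
                 double angle_rad, double time_s)
{
    double angular_velocity = angle_rad / time_s;
    double angular_accel = angular_velocity / time_s;
    double d_arm = 0.5 * arm_length;      /* arm center of mass */
    double d_obj = arm_length;            /* object held at the tip */

    double torque_arm = arm_mass * d_arm * d_arm * angular_accel;
    double torque_object = object_mass * d_obj * d_obj * angular_accel;
    return torque_object + torque_arm;    /* torque(for_motor) */
}
```

For a 0.5 kg, 0.3 m arm holding a 0.2 kg object and turning pi radians (180 degrees) in 2 s, `j0_torque(0.5, 0.2, 0.3, PI, 2.0)` gives about 0.023 N*m.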

And of course, if J0 were also affected by gravity, add the torque required to lift the arm to the torque required to reach the velocity you need. To avoid doing this by hand, just use the robot arm calculator.
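The gravity term can be sketched the same way. Assuming the worst case, where the arm sticks straight out horizontally so gravity has the longest lever arm, the holding torque is each weight times its distance from the joint. The function name is hypothetical and g is taken as 9.81 m/s^2.

```c
#define G 9.81  /* gravitational acceleration, m/s^2 */

/* Worst-case torque to hold a joint against gravity: arm horizontal.
 * Arm weight acts at arm_length / 2, object weight at arm_length.
 * Inputs in kg and m; result in N*m. */
double gravity_torque(double arm_mass, double object_mass, double arm_length)
{
    return G * (arm_mass * 0.5 * arm_length + object_mass * arm_length);
}
```

For a 0.5 kg, 0.3 m arm holding a 0.2 kg object, this gives 9.81 * (0.5 * 0.15 + 0.2 * 0.3), about 1.32 N*m, which you would add to the acceleration torque.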

But it gets harder . . . the above equation is for rotational motion, not straight-line motion. Look up the Jacobian if you enjoy mathematical pain =P

Another Video!

To better understand robot arm dynamics, we held a robot arm bowling competition using the same DENSO 6DOF robot arms as in the clocks video. Each team programmed an arm to do two tasks:


o Try to place all three of its pegs in the opponent's goal
o Block the opponent's pegs from going into its own goal

Enjoy! (Notice the different arm trajectories.)

Arm Sagging

Arm sagging is a common affliction of badly designed robot arms. It happens when an arm is too long and heavy, so it bends when stretched outward. When designing your arm, make sure it is reinforced and lightweight. Do a finite element analysis to determine bending deflection and stress, as I did on my ERP robot:
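Before running a full finite element analysis, a hand calculation can flag an arm that will obviously sag. This sketch models the arm as a cantilever beam fixed at the base with a point load at the tip, using the standard formula deflection = F * L^3 / (3 * E * I) and a solid rectangular cross section. It ignores the arm's own weight, joints, and hollow tubing, so treat it as a rough screen only. Function names are hypothetical.

```c
/* Second moment of area of a solid rectangle, I = b * h^3 / 12,
 * with h measured in the direction the arm bends. Metres in, m^4 out. */
double rect_second_moment(double width_m, double height_m)
{
    return width_m * height_m * height_m * height_m / 12.0;
}

/* Tip deflection of a cantilever with a point load at the free end:
 * F * L^3 / (3 * E * I). Inputs in N, m, Pa, m^4; result in metres. */
double cantilever_tip_deflection(double load_n, double length_m,
                                 double youngs_modulus_pa,
                                 double second_moment_m4)
{
    return load_n * length_m * length_m * length_m
           / (3.0 * youngs_modulus_pa * second_moment_m4);
}
```

For a 0.5 m solid 20 mm x 20 mm aluminium bar (E of roughly 69 GPa) with 2 kg at the tip (about 19.6 N), this gives roughly 0.89 mm. Note the L^3: doubling the arm length multiplies the sag by eight, which is why long arms sag so badly.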

Keep the heaviest components, such as motors, as close to the robot arm base as possible. It might be a good idea for the middle arm joint to be chain- or belt-driven by a motor located at the base (to keep the heavy motor on the base and off the arm).

The sagging problem is even worse when the arm wobbles during stop-start motions. To solve this, implement a PID controller so that the arm slows down before it comes to a full stop.
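A minimal PID sketch for one joint follows. As the joint nears its target the error shrinks, so the proportional term commands less effort, and the derivative term pushes back against a fast approach, which damps overshoot and wobble. The gains, the `Pid` struct, and the function names are hypothetical; real gains must be tuned on your arm.

```c
/* Minimal discrete PID controller for one joint. */
typedef struct {
    double kp, ki, kd;   /* proportional, integral, derivative gains */
    double integral;     /* running sum of error * dt */
    double prev_error;   /* error from the previous update */
    int first;           /* 1 until the first update, to skip the derivative */
} Pid;

void pid_init(Pid *pid, double kp, double ki, double kd)
{
    pid->kp = kp;
    pid->ki = ki;
    pid->kd = kd;
    pid->integral = 0.0;
    pid->prev_error = 0.0;
    pid->first = 1;
}

/* Returns the motor command for this step; dt is the loop period in seconds. */
double pid_update(Pid *pid, double setpoint, double measured, double dt)
{
    double error = setpoint - measured;
    double derivative = 0.0;

    pid->integral += error * dt;
    if (!pid->first)
        derivative = (error - pid->prev_error) / dt;
    pid->first = 0;
    pid->prev_error = error;

    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}
```

Call `pid_init` once, then `pid_update(&pid, target_angle, encoder_angle, dt)` every control loop (placeholder names) and send the result to the motor driver, clamped to its allowed range. Skipping the derivative on the first step keeps it from producing a large spike before a previous error exists.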