Samuel E. Buttrey, Naval Postgraduate School
Timothy H. Chung, Naval Postgraduate School
James N. Eagle, Naval Postgraduate School
Duncan Temple Lang, University of California, Davis
CONTENTS
4.1 Description
4.1.1 Computational Topics
4.2 The Data
4.2.1 Reading an Entire Log File
4.2.2 Exploring Log Files
4.2.3 Visualizing the Path
4.2.4 Exploring a “Look”
4.2.5 The Error Distribution for Range Values
4.3 Detecting a Circular Target
4.3.1 Connecting Segments Behind the Robot
4.3.2 Determining If a Segment Corresponds to a Circle
4.4 Detecting the Target with Streaming Data in Real Time
Bibliography
4.1 Description
In this case study, we explore robots searching for a circular target in a rectangular course that contains numerous obstacles (see Figure 4.1). The robots use a search strategy to move around the course, avoiding the obstacles and searching for the target in the shortest time possible. The robot continuously reports its location and also what it “sees” all around it.
The search ends when the robot determines that it has found the target, or after 30 minutes of searching. The robot can detect objects up to 2 meters away. In this chapter, we focus on processing these location and sight records and on developing a classifier to detect whether the robot is “looking at” the target. We use a statistical approach to determine whether the shape the robot currently “sees” is consistent with the circular shape of the target (with known radius).
We look at log files for 100 different experiments (or runs), each log file containing the entire path information for that robot and its search for the target. The data include the location of the robot as it moves and what it “sees” at each of these positions. We explore the characteristics of each of these experiments, e.g., whether they found the target, how long the experiment lasted (up to the 30-minute time limit), how fast the robot moved, the locations of the obstacles, and the variability in the measurements. We develop the classifier for detecting the target and explore its operating characteristics, e.g., type I and type II error rates. We then discuss how to read the log file line by line so that we can classify this streaming data in real time.
Figure 4.1: Example of the Course. This shows a sample path through the course. The robot starts in the lower left corner. The circular target can be seen at approximately (4.5, -6.5).
There are two rectangular obstacles and one triangular obstacle. The horizontal dimensions range from -15 to +15, and the vertical from -8 to +8.
4.1.1 Computational Topics
• Text processing of log files
• Visualization
• Non-linear least squares
• Numerical optimization
• Goodness-of-fit criteria
• Streaming data
4.2 The Data
We have numerous data files in the logs/ directory. We can find their names with the list.files() function:
ff <- list.files("logs", full.names = TRUE)
Looking at these file names, we see
[1] "logs/01groundTruth.log"
[2] "logs/JRSPdata_2010_03_10_12_12_31.log"
[3] "logs/JRSPdata_2010_03_10_12_12_50.log"
....
[102] "logs/LASER"
[103] "logs/README"
The files corresponding to the experiments (that is, the runs) start with JRSPdata and end with .log, so we can specify this pattern to get only these file names with
ff <- list.files("logs", full.names = TRUE, pattern = "JRSPdata.*\\.log")
The pattern is a regular expression. This call returns a vector of 100 file names.
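To check which names the regular expression keeps, we can apply the same pattern with grepl() to some of the names listed above. This small base-R sketch types those names in by hand rather than reading the directory.

```r
# A few of the file names shown above, typed in by hand for illustration
fnames <- c("01groundTruth.log",
            "JRSPdata_2010_03_10_12_12_31.log",
            "JRSPdata_2010_03_10_12_12_50.log",
            "LASER", "README")

# Same pattern as in the list.files() call:
# "JRSPdata", then any characters, then a literal ".log"
keep <- grepl("JRSPdata.*\\.log", fnames)
fnames[keep]
```

Only the two JRSPdata files survive: 01groundTruth.log ends in .log but does not contain JRSPdata, and LASER and README match neither part of the pattern.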
How large are these files? We can use file.info() to get the size of each file:
info <- file.info(ff)
info contains information about who created the files, who can modify them, etc. However, we are interested in the size element of this data frame, which gives the number of bytes in each file. We can convert this to megabytes (a megabyte contains 1024^2 bytes) and look at its distribution with
summary(info$size/1024^2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.6832 5.4060 7.1640 8.7560 9.3240 31.0500
We see that the smallest file is less than a megabyte and many of the files are between 7 and 9 megabytes. The largest file is about 31 megabytes. We can plot the distribution of the file size, shown in Figure 4.2, with
plot(density(info$size/1024^2), xlab = "megabytes")
We can also compute the upper quantiles with
quantile(info$size/1024^2, seq(.9, 1, by = .01))
  90%   91%   92%   93%   94%   95%   96%   97%   98%   99%  100%
15.68 16.84 23.92 25.19 25.41 26.19 27.25 29.88 29.93 30.48 31.05
So there are some reasonably large files. In total, there are sum(info$size)/1024^2, i.e., 875.6 megabytes of data. This isn’t enormous, but it is large enough that we have to consider processing it efficiently. This is especially true because we want to develop code that can process many more log files, and because we ultimately want to process the data in real time, i.e., as the robot is delivering the data to us and needs to know whether it has located the target.
Figure 4.2: Log File Size. This shows the distribution of the size of the 100 log files (a density plot of file size in megabytes).
The log files are text files. They are not in a simple rectangular format such as comma-separated values (CSV). Instead, they are structured in a standardized format defined and used by the Player Project, which develops and distributes software for robot and sensor applications. The file format is documented at http://playerstage.sourceforge.net/doc/Player-svn/player/group__tutorial__datalog.html. The important idea is that each line contains a record, but there are different types of records. Each record type contains different information and has a different structure for the values it contains. This is why the data are non-rectangular: each record type has a different number of values measuring different characteristics.
The first 12 lines of a particular log file (the fourth) are
## Player version 2.1.3
## File version 0.3.0
## Format:
## - Messages are newline-separated
## - Common header to each message is:
## time host robot interface index type subtype
## (double) (uint) (uint) (string) (uint) (uint) (uint)
## - Following the common header is the message payload
0000000000.100 16777343 6668 laser 00 004 001 +0.000 +0.000
    0.000 0.156 0.155
0000000000.200 16777343 6668 position2d 00 004 001 -00.040
    +00.000 +0.000 +00.440 +00.380
0000000000.200 16777343 6668 position2d 00 001 001 -14.000
    -07.000 +0.785 +00.000 +00.000 +00.000 0
0000000000.200 16777343 6668 laser 00 001 001 0001
    -3.1416 +3.1416 +0.01740495 +2.0000 0361 1.838 0 1.807
    0 1.778 0 1.749 0 1.723 0 1.697 0 1.673 0 1.650
(We have reformatted this to fit on the page. An indented line is actually a continuation of the previous line.)
Except for comment lines, which start with the pound sign (#) and can be ignored, every line of a log file starts with 7 common fields of metadata, separated by spaces. The names of these 7 fields are listed in the sixth line of the file above.
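Because the fields are separated by spaces, splitting a line on whitespace is enough to separate the 7 header fields from the payload. The sketch below uses base R only and one position2d record copied from the listing above; reading the first three payload values as x, y, and yaw is our assumption from the sample values (the robot starts near the lower-left corner).

```r
# One position2d record copied from the log file listing above
rec <- paste("0000000000.200 16777343 6668 position2d 00 001 001",
             "-14.000 -07.000 +0.785 +00.000 +00.000 +00.000 0")

# Split on runs of whitespace; the first 7 tokens are the common header
tokens <- strsplit(rec, "[[:space:]]+")[[1]]
header <- setNames(tokens[1:7],
                   c("time", "host", "robot", "interface",
                     "index", "type", "subtype"))
payload <- tokens[-(1:7)]          # everything after the header

header[c("time", "interface", "type")]
as.numeric(payload[1:3])           # assumed to be x, y, and yaw
```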
Of these 7 fields, the first, fourth, and sixth are of interest for our purposes: these give, respectively, the time, the “interface” (which describes the purpose of the record), and the type of the message. A number of different kinds of message are possible, but for our purposes only two combinations of interface and type are important. Lines with interface position2d and type value 001 give the current position, orientation, and yaw of the robot (this last measuring the angle clockwise from East to the direction the robot’s head is facing, in radians). Lines with interface laser and type 001 give the measurements made during the robot’s data collection, which we will call a “look.” The laser line is associated with the previous position2d line and so these form a natural pair.
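Selecting the two record types we need, and pairing each look with the position record that precedes it, takes only a few vectorized steps. This is a sketch on the four data records from the listing above (with the final laser record cut short); the variable names are our own.

```r
# The data records from the listing above (comment lines omitted);
# the last laser record is truncated for illustration
lines <- c(
  "0000000000.100 16777343 6668 laser 00 004 001 +0.000 +0.000 0.000 0.156 0.155",
  "0000000000.200 16777343 6668 position2d 00 004 001 -00.040 +00.000 +0.000 +00.440 +00.380",
  "0000000000.200 16777343 6668 position2d 00 001 001 -14.000 -07.000 +0.785 +00.000 +00.000 +00.000 0",
  "0000000000.200 16777343 6668 laser 00 001 001 0001 -3.1416 +3.1416 +0.01740495 +2.0000 0361 1.838 0 1.807 0"
)

tokens <- strsplit(lines, "[[:space:]]+")
interface <- sapply(tokens, `[`, 4)   # 4th header field
type <- sapply(tokens, `[`, 6)        # 6th header field

# Keep only positions (position2d, 001) and looks (laser, 001)
isPos <- interface == "position2d" & type == "001"
isLaser <- interface == "laser" & type == "001"

# For each laser record, find the most recent position2d record before it;
# findInterval() returns 0 when there is no earlier position record
posIdx <- which(isPos)
laserIdx <- which(isLaser)
j <- findInterval(laserIdx, posIdx)
pairedPos <- ifelse(j > 0, posIdx[pmax(j, 1L)], NA_integer_)
cbind(laser = laserIdx, position = pairedPos)
```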
Data are collected every few seconds, in two steps. First, the robot records its position and heading in a position2d record. It then looks all around itself, starting from the direction immediately behind it and continuing in one-degree increments, and records what it sees in a laser record. The last reading is, like the first, taken immediately behind the robot, so each data acquisition consists of 361 readings.
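The laser payload can be turned into a table of angles and ranges. The layout used here is inferred from the sample record above rather than taken from the Player documentation: we assume the payload holds an identifier, the minimum and maximum angles (-3.1416 and +3.1416), an angular resolution, the maximum range (+2.0000), and the number of readings (0361), followed by (range, intensity) pairs. parseLaser() is a hypothetical helper, and it derives the angles from the minimum, maximum, and count rather than from the resolution field.

```r
# Parse one laser record into a data frame of angles and ranges.
# Assumed payload layout: id, min angle, max angle, resolution,
# max range, count, then count (range, intensity) pairs.
parseLaser <- function(line) {
  tokens <- strsplit(line, "[[:space:]]+")[[1]]
  payload <- as.numeric(tokens[-(1:7)])   # drop the 7 header fields
  n <- payload[6]                         # number of readings, e.g. 361
  pairs <- payload[-(1:6)]
  ranges <- pairs[seq(1, by = 2, length.out = n)]  # odd positions are ranges
  # Angles relative to the heading: -pi and +pi are both directly behind
  angles <- seq(payload[2], payload[3], length.out = n)
  data.frame(angle = angles, range = ranges)
}

# A synthetic record in the same layout: 361 readings, all at the 2 m limit
fake <- paste("0000000000.200 16777343 6668 laser 00 001 001 0001",
              "-3.1416 +3.1416 +0.01740495 +2.0000 0361",
              paste(rep("2.000 0", 361), collapse = " "))
look <- parseLaser(fake)
dim(look)
```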
During a look, each reading produces a (range, intensity) pair. We ignore the intensity for our purposes. The range gives the extent of the robot’s view, i.e., the distance to something it can detect. The robot’s vision is limited to 2 meters, so if there is no object visible within that distance, the observed range will be 2m. Otherwise, the range will be smaller.
Distance readings may contain measurement error, whether they refer to an actual object or to the 2m limit. These errors are small (on the order of a couple of centimeters), but in principle some ranges could exceed 2m, and in some cases a range smaller than 2m will nonetheless correspond to the robot seeing no obstacle or target. We will want to familiarize ourselves with the distribution of times, locations, and ranges, and with the errors in the measurements.
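Because of these small errors, a rule that says the robot sees something whenever the range is below 2m would be fooled by noise. A simple alternative is to require the range to fall some tolerance below the limit. In this sketch, seesObject() is a hypothetical helper and the 5 cm tolerance is an illustrative guess based on the “couple of centimeters” mentioned above, not a value estimated from the data.

```r
# Flag readings that are clearly shorter than the 2 m limit.
# tol = 0.05 (5 cm) is an assumed, illustrative tolerance.
seesObject <- function(ranges, maxRange = 2, tol = 0.05) {
  ranges < maxRange - tol
}

r <- c(2.013, 1.998, 1.961, 1.838, 0.500)
seesObject(r)
```

Readings just above or just below 2m are treated as “nothing seen,” while 1.838 and 0.5 count as objects. The distribution of the errors, explored later, is what should guide the choice of tolerance.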
4.2.1 Reading an Entire Log File
Typically, we have to explore the actual data files in order to empirically discover and understand their structure. We have to identify the patterns and anomalies. In this case, the documentation for these files is quite explicit. While the structure of different lines is not the same, the data are very structured. As a result, with our understanding of the format of the log files, we can set about reading them into R [2]. We will write a function to do this so that we can reuse it for all 100 log files (and potentially others). In creating this function, we should try to keep in mind that in Section 4.4, we will want to sequentially read individual lines and not an entire file. If possible, we should try to structure the code so that we do not have to have separate functions for the off-line and the on-line processing.
However, since there is almost one gigabyte of text to process, we also want the function to be reasonably efficient.
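One way to meet both goals is to put all the per-line work in a single function and let two thin wrappers decide where lines come from: the whole file at once via readLines(), or one line at a time from a connection. This is a minimal sketch of that design under our own assumptions; processLine(), readLog(), and streamLog() are hypothetical names, and processLine() keeps only the two record types discussed above.

```r
# Process a single line; return NULL for lines we do not need
processLine <- function(line) {
  if (!nzchar(line) || startsWith(line, "#")) return(NULL)  # blank/comment
  tokens <- strsplit(line, "[[:space:]]+")[[1]]
  if (length(tokens) < 7) return(NULL)                      # malformed line
  if (!(tokens[6] == "001" && tokens[4] %in% c("position2d", "laser")))
    return(NULL)                                            # other records
  list(time = as.numeric(tokens[1]), interface = tokens[4],
       values = as.numeric(tokens[-(1:7)]))
}

# Off-line: read the whole file, process every line, drop the NULLs
readLog <- function(file) {
  recs <- lapply(readLines(file), processLine)
  Filter(Negate(is.null), recs)
}

# On-line: pull lines from an open connection one at a time
streamLog <- function(con, handler) {
  while (length(line <- readLines(con, n = 1)) > 0) {
    rec <- processLine(line)
    if (!is.null(rec)) handler(rec)
  }
}
```

For example, readLog(ff[1]) would return the position2d and laser records of the first run as a list. Calling processLine() once per line keeps the two paths identical but is not the fastest approach for nearly a gigabyte of text; a version that splits all lines at once would be quicker off-line. Also, for a live feed an empty result from readLines(con, n = 1) may only mean that no new line has arrived yet, so a real-time version would need to wait and retry rather than stop.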
We cannot use any of the common functions such as read.csv() or read.table() to read