str2num converts string values into numerical values. The input can be either vector or matrix valued.
>> strvcat('1','2','3')
ans =
1
2
3
>> str2num(strvcat('1','2','3'))
ans =
     1
     2
     3
>> str2num(['1 2 3';'4 5 6'])
ans =
     1     2     3
     4     5     6
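str2num also returns an optional second output that flags whether the conversion succeeded, which helps when the input may not be clean. The following is a minimal sketch with made-up strings; it assumes that no variable or function named three exists in the workspace. Because str2num evaluates its input, it should only be applied to trusted text.

```matlab
% The second output reports whether the conversion succeeded
[values, ok] = str2num('1 2 3');             % values = [1 2 3], ok is true

% Text that cannot be evaluated as numbers produces an empty result
% (assumes nothing named "three" exists in the workspace)
[badValues, badOk] = str2num('1 2 three');   % badValues is empty, badOk is false

if ~badOk
    disp('Conversion failed; the input was not purely numeric.');
end
```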
str2double
str2double converts string values into numerical values. Unlike str2num, it operates only on scalar (single-string) inputs or cell arrays, and when used on a cell array, each cell must contain only a single string to convert. str2double offers better performance when it is applicable.
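The two supported input types can be illustrated with a short sketch. The strings are made up for illustration; text that cannot be converted is returned as NaN rather than raising an error.

```matlab
% A single string converts to a scalar double
x = str2double('3.14');        % x = 3.14

% Each cell holds exactly one string; the output has the same shape as the cell array
c = {'1.5', '2', 'N/A'};
y = str2double(c);             % y = [1.5 2 NaN]

% Unconvertible entries can be located with isnan
missing = isnan(y);            % [false false true]
```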
num2str
num2str converts numerical values into strings. The input can be scalar, vector or matrix valued.
>> num2str([1;2;3])
ans =
1
2
3
>> num2str([1 2 3;4 5 6])
ans =
1  2  3
4  5  6
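num2str also accepts an optional second argument, either a number of significant digits or a format string, which controls how the value is rendered. The sketch below uses pi and 2.5 purely as illustrative inputs.

```matlab
% Default conversion keeps about 4 decimal places for non-integers
s1 = num2str(pi);              % '3.1416'

% A numeric second argument sets the number of significant digits
s2 = num2str(pi, 8);           % '3.1415927'

% A format string second argument gives full control over the layout
s3 = num2str(2.5, '%.3f');     % '2.500'
```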
sscanf
sscanf can be used to convert text to numbers, and is by far the fastest method to convert large text blocks to numbers. The generic form of sscanf is
sscanf(text,format)
where text is a numeric character string and format contains information about the format of the values in text. sscanf operates column-by-column, so lines must be stored in columns (or, if stored in rows, the input can be transposed). The space character is used to delimit the end of an entry, so it is essential that the input string is padded by a space.[2] The format string can handle a wide variety of cases, although the most important are %d, which converts a string to a base-10 (32-bit) integer, and %f, which converts a string to a floating point number. Consider the following example, which generates 10,000 random numeric strings using randi and then parses the text using sscanf, str2num and str2double.
>> text = char(47+randi(10,10000,6)); % Random numeric string
>> text = [text repmat(' ',10000,1)]; % Pad with space
>> tic; numericValues = sscanf(text','%d'); toc
Elapsed time is 0.005850 seconds.
>> tic; numericValues = str2num(text); toc
Elapsed time is 0.234914 seconds.
>> tic; for i=1:10000; numericValues(i) = str2double(text(i,:)); end; toc
Elapsed time is 0.597951 seconds.
In this example, sscanf is roughly 40 times faster than str2num and roughly 100 times faster than the str2double loop. Format strings can include multiple elements, in which case the formats are sequentially applied until the end of the text string is reached.
>> text = num2str([pi floor(exp(1)) (1+sqrt(5))/2])
text =
3.1416 2 1.618
>> sscanf(text','%f %d %f')
ans =
3.1416
2
1.618
Note that sscanf terminates without an error when an unexpected string is encountered.
>> text = [num2str([pi floor(exp(1))]) ' A ' num2str((1+sqrt(5))/2)]
text =
3.1416 2 A 1.618
>> sscanf(text','%f')
ans =
3.1416
2
In the example above, sscanf stops when it encounters the A and returns the first two values. It is important to verify that the strings contain only the expected data (e.g. only numeric characters, including .) prior to calling sscanf.
[2] Technically, sscanf operates on text(:) (which is a single column vector constructed by stacking the columns of the input text). This is why it is essential that lines are padded by a space.
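One way to guard against silently truncated output is to check the text before parsing and to compare the number of values read against the number expected. The following is a minimal sketch, not a complete validator: the whitelist allowed and the value expectedCount are assumptions made for illustration.

```matlab
text = '3.1416 2 A 1.618';

% Assumed whitelist of characters permitted in numeric text
allowed = '0123456789.+- ';
isClean = all(ismember(text, allowed));   % false here because of the A

% The second output of sscanf counts the values read successfully
[values, count] = sscanf(text, '%f');     % reads 2 values, then stops at the A

% Hypothetical check against the number of values that were expected
expectedCount = 3;
if ~isClean || count ~= expectedCount
    fprintf('Unexpected data: only %d of %d values were parsed.\n', ...
        count, expectedCount);
end
```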
fprintf
fprintf allows formatted text to be output to the screen or to files.
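A short sketch of both uses follows. The variable values and the file name output.txt are hypothetical. The format specifiers are the same as those used by sscanf: %s inserts a string, %d an integer and %.2f a floating point number with two decimal places.

```matlab
% Hypothetical values for illustration
ticker = 'ABC';
shares = 1500;
price  = 12.5;

% Print to the screen; \n ends the line
fprintf('%s: %d shares at %.2f\n', ticker, shares, price);

% sprintf accepts the same format strings but returns the text instead
msg = sprintf('%s: %d shares at %.2f', ticker, shares, price);

% Writing to a file requires a file identifier from fopen
% ('output.txt' is a hypothetical file name)
fid = fopen('output.txt', 'w');
fprintf(fid, '%s\n', msg);
fclose(fid);
```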
16.3 Exercises
1. Load the file hardtoparsetext.mat and inspect the variable string_data. The data in this file are ; delimited and contain stock name, date of observation, shares outstanding, and price. Write a program that will loop over the rows and parse the data into four variables: ticker, date, shares and price. Note: ticker should be a string, date should be a MATLAB serial date, and shares outstanding and price should be numerical. For values of 'N/A', use NaN. For help converting the dates to serial dates, see Chapter 15.
Chapter 17
Structures and Cell Arrays
Structures and cell arrays are advanced data storage formats that often provide useful scaffolding for working with mixed (i.e. string and numeric) or structured data.
17.1 Structures
Structures allow related pieces of data to be organized into a single variable. Structures are constructed using
variable_name.field_name
syntax where both variable_name and field_name must be valid variable names. One application of structures is to organize data. Consider the case of working with data that comes in triples which correspond to x-, y- and z-data. One alternative would be to store the data as a 3 by 1 vector. Alternatively, a structure could be used with field names x, y and z to provide added guidance on what is expected.
>> coord.x = 0.5
coord =
    x: 0.5000
>> coord.y = -1
coord =
    x: 0.5000
    y: -1
>> coord.z = 2
coord =
    x: 0.5000
    y: -1
    z: 2
Structures can also be used in arrays (arrays of structures), which can either be constructed using the command struct or lazily initialized by assigning to a new index, which extends the array. Continuing from the previous example,
>> coord(2).x = 3
coord =
1x2 struct array with fields:
    x
    y
    z
>> coord(2).y = 2
coord =
1x2 struct array with fields:
    x
    y
    z
>> coord(2).z = -1
coord =
1x2 struct array with fields:
    x
    y
    z
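The same array can be built in a single call to struct. This is a minimal sketch that reuses the values from the example above: when the field values are passed as cell arrays, struct creates one array element per cell.

```matlab
% Each cell array supplies one value per element of the struct array
coord = struct('x', {0.5, 3}, 'y', {-1, 2}, 'z', {2, -1});

dims = size(coord);       % [1 2], a 1 by 2 struct array
secondY = coord(2).y;     % 2
```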
The elements of the array of structures can be accessed like any other array, with the caveat that the result of the assignment will itself be a structure.
>> newCoord = coord(1)
newCoord =
    x: 0.5000
    y: -1
    z: 2
Structures can also be used to store mixed data.
>> contact.phoneNumber = 441865281165
contact =
    phoneNumber: 4.4187e+011
>> contact.name = 'Kevin Sheppard'
contact =
    phoneNumber: 4.4187e+011
    name: 'Kevin Sheppard'
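When the fields hold mixed data, it is often useful to inspect a structure programmatically. The sketch below uses fieldnames, isfield and dynamic field access on the contact structure from above; the field name email is hypothetical and is included only to show a failed lookup.

```matlab
contact.phoneNumber = 441865281165;
contact.name = 'Kevin Sheppard';

% List the fields in the order they were created
names = fieldnames(contact);             % {'phoneNumber'; 'name'}

% Test for a field before using it ('email' is a hypothetical field)
hasEmail = isfield(contact, 'email');    % false

% Dynamic field names allow looping over the fields
for i = 1:length(names)
    value = contact.(names{i});
    fprintf('%s is stored as %s\n', names{i}, class(value));
end
```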
17.1.1 The Problem with Structures
The fundamental problem with structures in MATLAB is that they are difficult to work with, and that operating on structures requires operating on the fields one at a time. Structures are also difficult to preallocate, so performance issues arise when they are used in large arrays. Structures are still commonly used (for example, in optimset), although they have been supplanted by a more useful object, the cell array. It is tempting to use structures to push large collections of data, parameters and other values into and out of functions. This is generally a bad practice and should be avoided.
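The field-at-a-time pattern and one possible approach to preallocation can be sketched as follows. This is an illustration under stated assumptions rather than a recommended design: the array length n and the template values are made up, and replicating a template with repmat is only one of several ways to reserve the array in advance.

```matlab
n = 1000;                                  % assumed array length

% Preallocate by replicating a template structure
template = struct('x', 0, 'y', 0, 'z', 0);
coord = repmat(template, 1, n);

% Fields must be filled element by element
for i = 1:n
    coord(i).x = i;
end

% Numerical work requires extracting one field at a time
allX = [coord.x];                          % 1 by n numeric vector
meanX = mean(allX);                        % 500.5
```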