Data Structures
11.6 String Buffers
Suppose you are building a string piecemeal, for instance reading a file line by line. Your typical code would look like this:
local buff = ""
for line in io.lines() do
  buff = buff .. line .. "\n"
end
Despite its innocent look, this code in Lua can cause a huge performance penalty for large files: for instance, it takes 1.5 minutes to read a 1 MB file on my old Pentium machine.1
Why is that? To understand what happens, let us assume that we are in the middle of the read loop; each line has 20 bytes and we have already read some 2500 lines, so buff is a string with 50 kB. When Lua concatenates buff..line.."\n", it allocates a new string with 50 020 bytes and copies the 50 000 bytes from buff into this new string. That is, for each new line, Lua moves around 50 kB of memory, and growing. More specifically, the algorithm is quadratic: the total amount of memory copied grows with the square of the number of lines. After reading 100 new lines (only 2 kB), Lua has already moved more than 5 MB of memory. When Lua finishes reading 350 kB, it has moved more than 50 GB. (This problem is not peculiar to Lua: other languages wherein strings are immutable values present a similar behavior, Java being the most famous example.)
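The figure for the 100 new lines can be checked with a small sketch. The helper name copiedbytes and its parameters are illustrative assumptions, not part of the original code; the sketch models each concatenation as writing a fresh string that holds the old buffer plus the new 20-byte line.

```lua
-- Hypothetical helper: total bytes written while appending `newlines`
-- lines of `linelen` bytes each to a buffer that already holds
-- `startlines` such lines, when every append builds a fresh string.
local function copiedbytes (startlines, newlines, linelen)
  local size = startlines * linelen   -- current buffer size in bytes
  local total = 0
  for _ = 1, newlines do
    size = size + linelen             -- size of the newly built string
    total = total + size              -- the whole new string is written
  end
  return total
end

print(copiedbytes(2500, 100, 20))     --> 5101000
```

The result, slightly above 5 MB, matches the estimate in the text for only 2 kB of new data.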
Before we continue, we should remark that, despite all I said, this situation is not a common problem. For small strings, the above loop is fine. To read an entire file, Lua provides the io.read("*a") option, which reads the file at once. However, sometimes we must face this problem. Java offers the structure StringBuffer to ameliorate the problem. In Lua, we can use a table as the string buffer. The key to this approach is the table.concat function, which returns the concatenation of all the strings of a given list. Using concat, we can write our previous loop as follows:
local t = {}
for line in io.lines() do
  t[#t + 1] = line .. "\n"
end
local s = table.concat(t)
This algorithm takes less than 0.5 seconds to read the same file that took more than a minute to read with the original code. (Nevertheless, for reading a whole file it is still better to use io.read with the “*a” option.)
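A rough way to compare the two approaches without a file is to build the same text from synthetic lines and time each loop with os.clock. The line count and the line contents below are arbitrary assumptions; the timings depend on the machine and the Lua version, so this sketch implies no particular numbers.

```lua
local N = 5000                        -- assumed number of lines
local line = string.rep("x", 19)      -- 19 bytes plus a newline = 20 bytes

-- repeated concatenation: each step copies the whole buffer
local start = os.clock()
local buff = ""
for _ = 1, N do
  buff = buff .. line .. "\n"
end
local slowtime = os.clock() - start

-- table as buffer: the pieces are joined once at the end
start = os.clock()
local t = {}
for _ = 1, N do
  t[#t + 1] = line .. "\n"
end
local s = table.concat(t)
local fasttime = os.clock() - start

assert(s == buff)                     -- both loops build the same string
print(string.format("concatenation: %.3f s, table.concat: %.3f s",
                    slowtime, fasttime))
```

Increasing N should widen the gap, because only the first loop does work proportional to the square of the number of lines.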
We can do even better. The concat function accepts an optional second argument, which is a separator to be inserted between the strings. Using this separator, we do not need to insert a newline after each line:
1 My “old Pentium machine” is a single-core 32-bit Pentium 3GHz. I measured all performance data in this book on that machine.
local t = {}
for line in io.lines() do
  t[#t + 1] = line
end
s = table.concat(t, "\n") .. "\n"
Function concat inserts the separator between the strings, but we still have to add the last newline. This last concatenation duplicates the resulting string, which can be quite long. There is no option to make concat insert this extra separator, but we can deceive it, inserting an extra empty string in t:
t[#t + 1] = ""
s = table.concat(t, "\n")
The extra newline that concat adds before this empty string is at the end of the resulting string, as we wanted.
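A minimal sketch of this trick on in-memory data (the three sample strings are assumptions chosen for illustration):

```lua
local t = {"first", "second", "third"}   -- sample lines (assumed data)
t[#t + 1] = ""                           -- extra empty string at the end
local s = table.concat(t, "\n")          -- separator also precedes ""
print(s == "first\nsecond\nthird\n")     --> true
```

Because the separator goes between every pair of elements, the one placed before the empty string becomes the final newline, with no second copy of the long result.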
11.7 Graphs
Like any reasonable language, Lua allows multiple implementations for graphs, each one better adapted to some particular algorithms. Here we will see a simple object-oriented implementation, where we represent nodes as objects (actually tables, of course) and arcs as references between nodes.
We will represent each node as a table with two fields: name, with the node’s name; and adj, the set of nodes adjacent to this one. Because we will read the graph from a text file, we need a way to find a node given its name. So, we will use an extra table mapping names to nodes. Given a name, function name2node returns the corresponding node:
local function name2node (graph, name)
  local node = graph[name]
  if not node then
    -- node does not exist; create a new one
    node = {name = name, adj = {}}
    graph[name] = node
  end
  return node
end
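As a quick check of this behavior, the following sketch calls name2node twice with the same name. The function is repeated here only so that the example runs on its own; the variable names a and again are illustrative.

```lua
local function name2node (graph, name)
  local node = graph[name]
  if not node then
    -- node does not exist; create a new one
    node = {name = name, adj = {}}
    graph[name] = node
  end
  return node
end

local graph = {}
local a = name2node(graph, "a")       -- first call creates node "a"
local again = name2node(graph, "a")   -- second call finds the same table
print(a == again)                     --> true
print(a.name)                         --> a
```

Returning the same table for the same name is what lets arcs be plain references between nodes.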
Listing 11.1 shows the function that builds a graph. It reads a file where each line has two node names, meaning that there is an arc from the first node to the second. For each line, it uses string.match to split the line into two names, finds the nodes corresponding to these names (creating the nodes if needed), and connects the nodes.
Listing 11.1. Reading a graph from a file:
function readgraph ()
  local graph = {}
  for line in io.lines() do
    -- split line in two names
    local namefrom, nameto = string.match(line, "(%S+)%s+(%S+)")
    -- find corresponding nodes
    local from = name2node(graph, namefrom)
    local to = name2node(graph, nameto)
    -- adds 'to' to the adjacent set of 'from'
    from.adj[to] = true
  end
  return graph
end
Listing 11.2 illustrates an algorithm using such graphs. Function findpath searches for a path between two nodes using a depth-first traversal. Its first parameter is the current node; the second is its goal; the third parameter keeps the path from the origin to the current node; the last parameter is a set with all the nodes already visited (to avoid loops). Note how the algorithm manipulates nodes directly, without using their names. For instance, visited is a set of nodes, not of node names. Similarly, path is a list of nodes.
Listing 11.2. Finding a path between two nodes:
function findpath (curr, to, path, visited)
  path = path or {}
  visited = visited or {}
  if visited[curr] then      -- node already visited?
    return nil               -- no path here
  end
  visited[curr] = true       -- mark node as visited
  path[#path + 1] = curr     -- add it to path
  if curr == to then         -- final node?
    return path
  end
  -- try all adjacent nodes
  for node in pairs(curr.adj) do
    local p = findpath(node, to, path, visited)
    if p then return p end
  end
  path[#path] = nil          -- remove node from path
end
To test this code, we add a function to print a path and some code to put it all to work:
function printpath (path)
  for i = 1, #path do
    print(path[i].name)
  end
end

g = readgraph()
a = name2node(g, "a")
b = name2node(g, "b")
p = findpath(a, b)
if p then printpath(p) end
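To try the search without preparing an input file, the sketch below builds the graph from a list of strings instead of standard input. The function graphfromlines is a hypothetical variant of readgraph introduced only for this illustration, and the four arcs are made-up sample data; name2node and findpath are repeated so that the block runs on its own.

```lua
local function name2node (graph, name)
  local node = graph[name]
  if not node then
    node = {name = name, adj = {}}    -- create a new node on demand
    graph[name] = node
  end
  return node
end

-- Hypothetical variant of readgraph: takes the lines from a list
local function graphfromlines (lines)
  local graph = {}
  for _, line in ipairs(lines) do
    local namefrom, nameto = string.match(line, "(%S+)%s+(%S+)")
    local from = name2node(graph, namefrom)
    local to = name2node(graph, nameto)
    from.adj[to] = true               -- arc from 'from' to 'to'
  end
  return graph
end

local function findpath (curr, to, path, visited)
  path = path or {}
  visited = visited or {}
  if visited[curr] then return nil end   -- already visited: no path here
  visited[curr] = true
  path[#path + 1] = curr
  if curr == to then return path end
  for node in pairs(curr.adj) do         -- try all adjacent nodes
    local p = findpath(node, to, path, visited)
    if p then return p end
  end
  path[#path] = nil                      -- dead end: remove node from path
end

-- sample arcs (assumed data); "d a" closes a cycle back to the origin
local g = graphfromlines{"a c", "c d", "d b", "d a"}
local p = findpath(name2node(g, "a"), name2node(g, "b"))

local names = {}
for i = 1, #p do names[i] = p[i].name end
print(table.concat(names, " -> "))       --> a -> c -> d -> b
```

The arc from d back to a shows why the visited set matters: without it, the traversal could follow the cycle forever. In this sample there is only one route from a to b, so the result does not depend on the order in which pairs visits the adjacent nodes.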
Exercises
Exercise 11.1: Modify the queue implementation so that both indices return to zero when the queue is empty.
Exercise 11.2: Repeat Exercise 10.3 but, instead of using length as the criteria