
String Buffers

In document Programming in Lua 3ed (Page 132-135)

Data Structures

11.6 String Buffers


Suppose you are building a string piecemeal, for instance, by reading a file line by line. Your typical code would look like this:

local buff = ""
for line in io.lines() do
  buff = buff .. line .. "\n"
end

Despite its innocent look, this code in Lua can cause a huge performance penalty for large files: for instance, it takes 1.5 minutes to read a 1 MB file on my old Pentium machine.¹

Why is that? To understand what happens, let us assume that we are in the middle of the read loop; each line has 20 bytes and we have already read some 2500 lines, so buff is a string with 50 kB. When Lua concatenates buff..line.."\n", it allocates a new string with 50 020 bytes and copies the 50 000 bytes from buff into this new string. That is, for each new line, Lua moves around 50 kB of memory, and this amount keeps growing. More specifically, the algorithm is quadratic: the total amount of memory copied grows with the square of the number of lines read. After reading 100 new lines (only 2 kB), Lua has already moved more than 5 MB of memory. When Lua finishes reading 350 kB (17 500 lines), it has moved more than 3 GB. (This problem is not peculiar to Lua: other languages wherein strings are immutable values present a similar behavior, Java being the most famous example.)
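A rough way to check these figures is a small cost model. The sketch below is not how Lua manages memory internally; it simply assumes that every line, newline included, has exactly 20 bytes and that each concatenation writes the complete new string once. The function name naivecopycost is hypothetical.

```lua
-- Hypothetical cost model for: buff = buff .. line .. "\n"
-- Assumes each line (newline included) has exactly 'linelen' bytes and
-- that every concatenation builds a brand-new string, writing all of it.
local function naivecopycost (nlines, linelen)
  local size, copied = 0, 0
  for _ = 1, nlines do
    size = size + linelen      -- length of the new buffer
    copied = copied + size     -- the whole new buffer is written again
  end
  return copied
end

-- Bytes moved while reading lines 2501 to 2600 (100 lines, about 2 kB)
local moved = naivecopycost(2600, 20) - naivecopycost(2500, 20)
print(moved)   --> 5101000
```

Because the total is a sum of ever-growing buffer sizes, doubling the number of lines roughly quadruples the bytes copied.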

Before we continue, we should remark that, despite all I said, this situation is not a common problem. For small strings, the above loop is fine. To read an entire file, Lua provides io.read("*a"), which reads the whole file at once. However, sometimes we must face this problem. Java offers the StringBuffer class to ameliorate the problem. In Lua, we can use a table as the string buffer. The key to this approach is the table.concat function, which returns the concatenation of all the strings of a given list. Using concat, we can write our previous loop as follows:

local t = {}
for line in io.lines() do
  t[#t + 1] = line .. "\n"
end
local s = table.concat(t)

This algorithm takes less than 0.5 seconds to read the same file that took 1.5 minutes to read with the original code. (Nevertheless, for reading a whole file it is still better to use io.read with the "*a" option.)
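To try the table-buffer technique without a file, the loop can take any line iterator instead of io.lines(). In this sketch, buildstring is a hypothetical helper, and string.gmatch with the pattern "([^\n]*)\n" stands in for io.lines(); it assumes the input text ends with a newline.

```lua
-- Table-buffer version of the loop, taking any iterator that yields lines
local function buildstring (lines)
  local t = {}
  for line in lines do
    t[#t + 1] = line .. "\n"   -- store small pieces; nothing big is copied yet
  end
  return table.concat(t)       -- a single final concatenation of all pieces
end

-- Simulated input: each match is one line, without its trailing newline
local text = "first line\nsecond line\n\nfourth line\n"
local s = buildstring(string.gmatch(text, "([^\n]*)\n"))
print(s == text)   --> true
```

Each line .. "\n" still creates a new string, but its cost depends only on the length of that line, not on everything read so far.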

We can do even better. The concat function accepts an optional second argument, which is a separator to be inserted between the strings. Using this separator, we do not need to insert a newline after each line:

¹ My “old Pentium machine” is a single-core 32-bit Pentium at 3 GHz. I measured all performance data in this book on that machine.

local t = {}
for line in io.lines() do
  t[#t + 1] = line
end
s = table.concat(t, "\n") .. "\n"

Function concat inserts the separator between the strings, but we still have to add the last newline. This last concatenation copies the entire resulting string once more, and that string can be quite long. There is no option to make concat insert this extra separator, but we can deceive it by inserting an extra empty string in t:

t[#t + 1] = ""
s = table.concat(t, "\n")

The extra newline that concat adds before this empty string is at the end of the resulting string, as we wanted.
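The empty-string trick can be wrapped in a small helper. The sketch below (the name joinlines is hypothetical) also removes the sentinel afterwards, assuming the caller wants the list left unchanged.

```lua
-- Joins the strings in list 't' with "\n" and ends the result with "\n",
-- avoiding a final 's .. "\n"' that would copy the whole result again
local function joinlines (t)
  t[#t + 1] = ""                    -- sentinel: concat puts "\n" before it
  local s = table.concat(t, "\n")
  t[#t] = nil                       -- remove the sentinel, restoring 't'
  return s
end

local lines = {"alpha", "beta", "gamma"}
print(joinlines(lines) == "alpha\nbeta\ngamma\n")   --> true
print(#lines)                                       --> 3
```

For an empty list the result is the empty string, since concat has only the sentinel to join.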

11.7 Graphs

Like any reasonable language, Lua allows multiple implementations for graphs, each one better adapted to some particular algorithms. Here we will see a simple object-oriented implementation, where we represent nodes as objects (actually tables, of course) and arcs as references between nodes.

We will represent each node as a table with two fields: name, with the node’s name; and adj, the set of nodes adjacent to this one. Because we will read the graph from a text file, we need a way to find a node given its name. So, we will use an extra table mapping names to nodes. Given a name, function name2node returns the corresponding node:

local function name2node (graph, name)
  local node = graph[name]
  if not node then
    -- node does not exist; create a new one
    node = {name = name, adj = {}}
    graph[name] = node
  end
  return node

end
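A quick check of this design: asking twice for the same name returns the very same table, and an arc is just one node used as a key in another node's adj set. The block repeats name2node so that it runs on its own; the names "a" and "b" are arbitrary.

```lua
-- Copy of name2node, so this sketch is self-contained
local function name2node (graph, name)
  local node = graph[name]
  if not node then
    node = {name = name, adj = {}}   -- create the node on first use
    graph[name] = node
  end
  return node
end

local graph = {}
local a = name2node(graph, "a")
local b = name2node(graph, "b")
a.adj[b] = true                      -- arc a -> b: the key is the node itself

print(name2node(graph, "a") == a)    --> true (same table, not a copy)
for node in pairs(a.adj) do
  print(node.name)                   --> b
end
```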

Listing 11.1 shows the function that builds a graph. It reads a file where each line has two node names, meaning that there is an arc from the first node to the second. For each line, it uses string.match to split the line into two names, finds the nodes corresponding to these names (creating the nodes if needed), and connects the nodes.
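The splitting step relies on the pattern "(%S+)%s+(%S+)": two runs of non-space characters separated by whitespace. The following sketch checks how it behaves on a few sample lines (the sample strings are arbitrary).

```lua
local pattern = "(%S+)%s+(%S+)"

-- Any run of spaces or tabs separates the two names
local from, to = string.match("node1 \t node2", pattern)
print(from)   --> node1
print(to)     --> node2

-- Extra text after the second name is ignored
from, to = string.match("a b c", pattern)
print(to)     --> b

-- A line with a single name does not match: both results are nil
from, to = string.match("lonely", pattern)
print(from == nil and to == nil)   --> true
```

With the listing as written, a line without two names (an empty line, say) therefore makes namefrom nil, and name2node then raises an error when it executes graph[name] = node with a nil key.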

Listing 11.2 illustrates an algorithm using such graphs. Function findpath searches for a path between two nodes using a depth-first traversal. Its first parameter is the current node; the second is its goal; the third parameter keeps the path from the origin to the current node; the last parameter is a set with all the nodes already visited (to avoid loops). Note how the algorithm manipulates nodes directly, without using their names. For instance, visited is a set of nodes, not of node names. Similarly, path is a list of nodes.

Listing 11.1. Reading a graph from a file:

function readgraph ()
  local graph = {}
  for line in io.lines() do
    -- split line in two names
    local namefrom, nameto = string.match(line, "(%S+)%s+(%S+)")
    -- find corresponding nodes
    local from = name2node(graph, namefrom)
    local to = name2node(graph, nameto)
    -- adds 'to' to the adjacent set of 'from'
    from.adj[to] = true
  end
  return graph
end

Listing 11.2. Finding a path between two nodes:

function findpath (curr, to, path, visited)
  path = path or {}
  visited = visited or {}
  if visited[curr] then    -- node already visited?
    return nil             -- no path here
  end
  visited[curr] = true     -- mark node as visited
  path[#path + 1] = curr   -- add it to path
  if curr == to then       -- final node?
    return path
  end
  -- try all adjacent nodes
  for node in pairs(curr.adj) do
    local p = findpath(node, to, path, visited)
    if p then return p end
  end
  path[#path] = nil        -- remove node from path
end

To test this code, we add a function to print a path and some code to put it all to work:

function printpath (path)
  for i = 1, #path do
    print(path[i].name)
  end
end

g = readgraph()
a = name2node(g, "a")
b = name2node(g, "b")
p = findpath(a, b)

if p then printpath(p) end
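To run the whole example without preparing an input file, the graph can be built from a string. This self-contained sketch repeats name2node and findpath, and uses a hypothetical helper, graphfromstring, in place of readgraph; the sample graph deliberately contains a cycle (a -> b -> a) to exercise the visited set.

```lua
local function name2node (graph, name)
  local node = graph[name]
  if not node then
    node = {name = name, adj = {}}
    graph[name] = node
  end
  return node
end

-- Hypothetical stand-in for readgraph: one "from to" pair per line
local function graphfromstring (text)
  local graph = {}
  for line in string.gmatch(text, "[^\n]+") do
    local namefrom, nameto = string.match(line, "(%S+)%s+(%S+)")
    local from = name2node(graph, namefrom)
    local to = name2node(graph, nameto)
    from.adj[to] = true                  -- directed arc from -> to
  end
  return graph
end

local function findpath (curr, to, path, visited)
  path = path or {}
  visited = visited or {}
  if visited[curr] then return nil end   -- already visited: no path here
  visited[curr] = true                   -- mark node as visited
  path[#path + 1] = curr                 -- add it to the path
  if curr == to then return path end     -- reached the goal
  for node in pairs(curr.adj) do         -- try all adjacent nodes
    local p = findpath(node, to, path, visited)
    if p then return p end
  end
  path[#path] = nil                      -- dead end: remove node from path
end

local g = graphfromstring("a b\nb a\nb c\nc d\n")
local p = findpath(g.a, g.d)
local names = {}
for i = 1, #p do names[i] = p[i].name end
print(table.concat(names, " -> "))   --> a -> b -> c -> d
print(findpath(g.d, g.a) == nil)     --> true (d has no outgoing arcs)
```

Being a depth-first search, findpath returns the first path it happens to find, which is not necessarily the shortest one; the order in which pairs visits the adjacent nodes is unspecified.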

Exercises

Exercise 11.1: Modify the queue implementation so that both indices return to zero when the queue is empty.

Exercise 11.2: Repeat Exercise 10.3 but, instead of using length as the criteria
