The string data type in Object Pascal is far more sophisticated than a simple array of characters, and has features that go well beyond what most programming languages do with similar data types. In this section I'll introduce the key concepts behind this data type, and in coming sections we'll explore some of these features in more detail.
In the following bullet list I've captured the key concepts for understanding how strings work in the language (remember, you can use strings without knowing much of this, as the internal behavior is very transparent):
• Data for the string type is dynamically allocated on the heap. A string variable is just a reference to the actual data. You rarely have to worry about this, as the compiler handles it transparently. As with a dynamic array, a newly declared string is empty.
• While you can assign data to a string in many ways, you can also allocate a specific memory area by calling the SetLength procedure. The parameter is the number of characters (of 2 bytes each) the string should be able to hold. When you extend a string, the existing data is preserved (but it might be moved to a new physical memory location). When you reduce the size, the content beyond the new length is lost. Setting the length of a string is seldom necessary. The only common case is when you need to pass a string buffer to an operating system function for the given platform.
• If you want to increase the size of a string in memory (by concatenating it with another string) but there is something else in the adjacent memory, then the string cannot grow in the same memory location, and a full copy of the string must therefore be made in another location.
• To clear a string you don't operate on the reference itself; you simply set it to an empty string, that is ''. Alternatively, you can use the Empty constant, which corresponds to that value.
• According to the rules of Object Pascal, the length of a string (which you can obtain by calling Length) is the number of valid elements, not the number of allocated elements. Unlike C, which has the concept of a string terminator (#0), all versions of Pascal since the early days tend to favor the use of a specific memory area (part of the string) where the actual length information is stored. At times, however, you'll find strings that also have the terminator.
• Object Pascal strings use a reference-counting mechanism, which keeps track of how many string variables are referring to a given string in memory. Reference counting will free the memory when a string isn't used anymore, that is, when there are no more string variables referring to the data and the reference count reaches zero.
Marco Cantù, Object Pascal Handbook
• Strings use a copy-on-write technique, which is highly efficient. When you assign a string to another or pass one to a string parameter, no data is copied and the reference count is increased. However, if you do change the content of one of the references, the system will first make a copy and then alter only that copy, with the other references remaining unchanged.
• The use of string concatenation for adding content to an existing string is generally very fast and has no significant drawback. While there are alternative approaches, concatenation remains a fast and powerful option, which is not the case in many programming languages these days.
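To make the points about SetLength and empty strings in the list above more concrete, here is a minimal console sketch. It is not one of the book's demos: the program name and the variable are made up for illustration, Writeln stands in for an output helper, and the sketch assumes a Unicode-enabled Delphi compiler.

```pascal
program SetLengthSketch; // hypothetical name, not one of the book's projects

{$APPTYPE CONSOLE}

var
  S: string;
begin
  // A newly declared string starts out empty
  Writeln(Length(S));        // 0

  // Each character of a Unicode string takes 2 bytes
  Writeln(SizeOf(Char));     // 2

  S := 'hello world';
  Writeln(Length(S));        // 11

  // Shrinking drops the characters beyond the new length
  SetLength(S, 5);
  Writeln(S);                // hello

  // Growing keeps the existing characters;
  // the added ones are not initialized, so only the first five are printed
  SetLength(S, 20);
  Writeln(Length(S));        // 20
  Writeln(Copy(S, 1, 5));    // hello

  // Clearing the string: assign the empty string
  S := '';
  Writeln(Length(S));        // 0
end.
```

Note that the sketch never prints the whole string after growing it, since the content of the extra characters is undefined.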
I realize this description can be a little confusing, so let's look at the use of strings in practice. In a while I'll get to a demo showcasing some of the operations above, including reference counting and copy-on-write. Before we do so, however, let me get back to the string helper operations and some other fundamental RTL functions for string management.
Before we proceed further, let me examine some of the elements of the previous list in terms of actual code. Given that string operations are quite seamless, it is difficult to fully grasp what happens unless you start looking inside the string's memory structure, which I'll do later in this chapter, as it would be too advanced for now. So let's start with some simple string operations, extracted from the Strings101 application project:
var
  String1, String2: string;
begin
This first snippet, when executed, shows that if you assign two strings to the same content, modifying one won't affect the other. That is, String1 is not affected by the changes to String2:
1: hello world
2: hello world
1: hello world
2: hello world, again
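A sequence of operations consistent with the output above can be sketched as follows. This is a reconstruction under assumptions, not the listing from the Strings101 project: the program name is hypothetical, and Writeln replaces the Show helper so that the code can run as a stand-alone console application.

```pascal
program AssignSketch; // hypothetical name, not the Strings101 project

{$APPTYPE CONSOLE}

var
  String1, String2: string;
begin
  String1 := 'hello world';
  String2 := String1;               // both variables now have the same content
  Writeln('1: ' + String1);
  Writeln('2: ' + String2);

  String2 := String2 + ', again';   // change only String2
  Writeln('1: ' + String1);         // String1 is unaffected
  Writeln('2: ' + String2);
end.
```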
Still, as we'll figure out better in a later demo, the initial assignment doesn't cause a full copy of the string; the copy is delayed until one of the strings is modified (again, a feature called copy-on-write).
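One way to observe the delayed copy is to look at the reference count and at the address of the string data before and after a change. The following is only a sketch under a few assumptions: it targets a Delphi compiler where the System unit offers StringRefCount, the program name is hypothetical, and the string is built at run time because string literals are stored as constants and are not reference counted in the usual way.

```pascal
program CopyOnWriteSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // IntToStr

var
  S1, S2: string;
begin
  // Build the string at run time, so its data lives on the heap
  S1 := 'hello ' + IntToStr(2024);

  // Assignment copies only the reference
  S2 := S1;
  Writeln('Ref count after assignment: ', StringRefCount(S1));
  Writeln('Same memory block: ', Pointer(S1) = Pointer(S2));

  // Changing S2 forces a separate copy; S1 keeps the original data
  S2 := S2 + '!';
  Writeln('Ref count after change: ', StringRefCount(S1));
  Writeln('Same memory block: ', Pointer(S1) = Pointer(S2));
  Writeln(S1);
  Writeln(S2);
end.
```

With the behavior described in this section, the reference count should go from two back to one, and the two variables should share a memory block only before the change.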
Another important feature to understand is how the length is managed. If you ask for the length of a string, you get the actual value (which is stored in the string metadata, making the operation very fast). But if you call SetLength, you are allocating memory, which will most often be uninitialized. This is generally used when passing the string as a buffer to an external system function. If you need a blank string, instead, you can use the pseudo-constructor (Create). Finally, you can use SetLength to trim a string. All of these are demonstrated by the following code:
var
  string1: string;
begin
  string1 := 'hello world';
  Show(string1);
  Show('Length: ' + string1.Length.ToString);

  // grow the string: the extra characters are not initialized
  SetLength(string1, 100);
  Show(string1);
  Show('Length: ' + string1.Length.ToString);

  // assigning a new value resets the length
  string1 := 'hello world';
  Show(string1);
  Show('Length: ' + string1.Length.ToString);

  // append 100 blanks, then trim the result to exactly 100 characters
  string1 := string1 + string.Create(' ', 100);
  SetLength(string1, 100);
  Show(string1);
  Show('Length: ' + string1.Length.ToString);
end;
The output is more or less the following:
hello world
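The buffer scenario mentioned above, passing a string to an external system function, typically follows an allocate, call, and trim pattern. The sketch below is an illustration under assumptions rather than code from the book: it is Windows-only, it uses the GetWindowsDirectory API as an arbitrary example, and the program and variable names are made up.

```pascal
program BufferSketch; // hypothetical name, Windows-only illustration

{$APPTYPE CONSOLE}

uses
  Winapi.Windows; // GetWindowsDirectory, MAX_PATH

var
  Buffer: string;
  CharCount: Cardinal;
begin
  // Reserve room for MAX_PATH characters; the content is not initialized yet
  SetLength(Buffer, MAX_PATH);

  // The API fills the buffer and returns the number of characters written
  // (zero if the call fails)
  CharCount := GetWindowsDirectory(PChar(Buffer), Length(Buffer));

  // Trim the string to the characters that were actually written
  SetLength(Buffer, CharCount);
  Writeln(Buffer);
end.
```

The second call to SetLength matters: without it, the string would still report a length of MAX_PATH and carry uninitialized characters after the actual path.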
The third concept I want to underline in this section is that of an empty string. A string is empty when it holds no characters, that is, when its length is zero. For both assignment and testing you can use two consecutive quotes, or specific functions:
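var
  string1: string;
begin
  string1 := 'hello world'; // assumed initial value, any non-empty text works
  if string1 = '' then
    Show('Empty')
  else
    Show('Not empty');
  string1 := ''; // or string1 := string.Empty;
  if string1.IsEmpty then
    Show('Empty')
  else
    Show('Not empty');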
With this simple output:
Not empty
Empty
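The different ways of testing for an empty string are interchangeable, as this small console sketch suggests. It assumes a Delphi compiler with the string helper from System.SysUtils, and the program name is hypothetical. Each of the four tests should report TRUE.

```pascal
program EmptySketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // string helper: Empty, IsEmpty, IsNullOrEmpty

var
  S: string;
begin
  S := string.Empty;                  // same as S := ''

  Writeln(S = '');                    // comparison with two consecutive quotes
  Writeln(S.IsEmpty);                 // helper method
  Writeln(Length(S) = 0);             // length test
  Writeln(string.IsNullOrEmpty(S));   // helper class function
end.
```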