Standard C++ is silent on the subject of threads. Unlike Java, C++ has no built-in support for threads, and does not attempt to address thread-safety issues through the language. So why an Item on thread- and concurrency-related issues? Simply because more and more of us are writing multithreaded (MT) programs, and no discussion of copy-on-write
String
implementations is complete if it does not cover thread-safety issues. Good C++ threading classes and frameworks have existed nearly as long as C++ itself. Today, they include facilities such as those in ACE,[2] the omnithread library available aspart of omniORB,[3] and many commercial libraries. Chances are that today you're already using one
of these libraries, or using your own in-house wrapper around your operating system's native services.
[2] See http://www.gotw.ca/publications/mxc++/ace.htm.
[3] See http://www.gotw.ca/publications/mxc++/omniorb.htm.
In conjunction with this Item, see also Appendix : Optimizations That Aren't (In a Multithreaded World). That appendix overlaps with and builds upon the results we'll see in this Item. It demonstrates not only the effects that mistaken optimizations can have on you and your production code, but also how to detect them and protect yourself against them. This Item presents a summary of results for which more details are shown in Appendix : Test Results for Single-Threaded Versus Multithread-Safe String Implementations.
1. Why is
Optimized::String
(from Item 15) not thread-safe? Give examples.It is the responsibility of code that uses an object to ensure that access to the object is serialized as needed. For example, if a certain
String
object could be modified by two different threads, it's not the poorString
object's job to defend itself from abuse. It's the job of the code that uses theString
to make sure that two threads never try to modify the sameString
object at the same time.The problem with the code in Item 15 is twofold. First, the copy-on-write (COW) implementation is designed to hide the fact that two different visible
String
objects could be sharing a common hidden state. Hence, it's theString
class's responsibility to ensure that the calling code never modifies aString
whose representation is shared. TheString
code shown already does precisely that, by performing a deep copy (unsharing the representation), if necessary, the moment a caller tries to modify theString
. This part is fine in general.Unfortunately, it now brings us to the second problem, which is that the code in
String
that unshares the representation isn't thread-safe. Imagine twoString
objectss1
ands2
such that:a.
s1
ands2
happen to share the same representation under the covers (okay, becauseString
is designed for this);b. Thread 1 tries to modify
s1
(okay, because thread 1 knows that no one else is trying to modifys1
);c. Thread 2 tries to modify
s2
(okay, because thread 2 knows that no one else is trying to modifys2
);d. At the same time (error)
The problem is (d). At the same time, both
s1
ands2
will attempt to unshare their sharedrepresentation, and the code to do that is not thread-safe. Specifically, consider the very first line of code in
String::AboutToModify()
:void String::AboutToModify(
size_t n,
bool markUnshareable /* = false */
)
{
if( data_->refs > 1 && data_->refs != Unshareable )
{
/* ... etc. ... */
This
if
-condition is not thread-safe. For one thing, evaluating evendata_->refs > 1
may not be atomic. If so, it's possible that if thread 1 tries to evaluatedata_->refs > 1
while thread 2 is updating the value ofrefs
, the value read fromdata_->refs
might be anything—1, 2, or even something that's neither the original value nor the new value. The problem is thatString
isn't following the basic thread-safety requirement that code that uses an object must ensure that access to the object is serialized as needed. Here,String
has to ensure that no two threads are going to use the samerefs
value in a dangerous way at the same time. The traditional way to accomplish this is to serialize access to the sharedStringBuf
(or just itsrefs
member) using a mutex orsemaphore. In this case, at minimum the "comparing an
int
" operation must be guaranteed to be atomic.This brings us to the second issue: Even if reading and updating
refs
were atomic, and even if we knew we were running on a single-CPU machine so that the threads were interleaved rather than truly concurrent, there are still two parts to theif
condition. The problem is that the thread executing the above code could be interrupted after it evaluates the first part of the condition, but before it evaluates the second part. In our example:Thread 1 Thread 2
enter
s1
'sAboutToModify()
evaluate "data_-
>refs > 1
" (true
, becausedata_->refs
is2
)
context switch
enter
s2
'sAboutToModify()
(runs to completion, including decrementing
data_->refs
to the value1
) exits2
'sAboutToModify()
context switch evaluate "data_->refs != Unshareable" (
true
, becausedata_->refs
is now1
) entersAboutToModify()
's "I'm shared and need tounshare" block, which clones the representation, decrements
data_->refs
to the value0
, and gives up the last pointer to theStringBuf
. Poof, we have a memory leak, because theStringBuf
that had been shared bys1
ands2
can now never be deleted exits1
'sAboutToModify()
Having covered that, we're ready to see how to solve these safety problems.
Protecting COW Strings
In all that follows, remember that the goal is not to protect
Optimized::String
against every possible abuse. After all, the code using theString
object is still responsible for knowing whether any given visibleString
object could be used by different threads and doing the properconcurrency control if it is—that's just normal. The problem here is that because of the extra sharing going on under the covers, the calling code can't just perform its normal duty of care because it doesn't know which
String
objects really are completely distinct and which only look distinct but are actually invisibly joined. The goal, then, is simply forOptimized::String
to get back up to the level at which it's safe enough so that other code that usesString
can assume that distinctString
objects really are distinct, and therefore do the usual proper concurrency control as needed.2. Demonstrate how to make
Optimized::String
thread-safe:a) assuming that there are atomic operations to get, set, and compare integer values; and
b) assuming that there aren't.
I'm going to answer (b) first because it's more general. What's needed here is a lock-management device such as a mutex.[4] The key to what we're about to do is quite simple. It turns out that if we do
things right, we need only lock access to the reference count itself.
[4] If you're using Win32, what's called a "critical section" (a slight hijacking of the term) is a lot more efficient
than a mutex, and should be used unless you really need the Win32 mutex's heavyweight facilities.
Before doing anything else, we need to add a