Quite a while ago, I answered a question about the basic deadlock scenario on Stack Overflow. More recently, I got an interesting comment on it. The poster asked if it was possible to get a deadlock with a single lock and an I/O operation. My first gut reaction was “no, not really”, but it got me thinking. So let’s try to unroll the scenario and see if my gut feeling holds up.
Let’s quickly recap what the standard deadlock scenario is
You can get a ’normal’ deadlock in different ways, but the idiomatic deadlock requires two threads and two locks. Let’s call the two locks ‘a’ and ‘b’. You get into the deadlock situation when the first thread acquires lock a and waits to acquire lock b, while the second thread manages to acquire lock b and waits to acquire lock a. To express this in pseudocode, the situation looks something like this:
Global locks:

    ...
    mutex a;
    mutex b;
    ...

First thread:

    ...
    a.acquire(); // Waits for and manages to acquire lock a
    b.acquire(); // Now waits to acquire lock b
    ...

Second thread:

    ...
    b.acquire(); // Waits for and manages to acquire lock b
    a.acquire(); // Now waiting for lock a
    ...
The scenario above is ‘harmless’ in the sense that it will work in a lot of cases, until both threads hit the lock acquisition code in parallel. With the timing right, each thread manages to grab one lock and ends up waiting for the other until the end of the universe, or at least until the user kills the process. The part about the two threads waiting forever on each other is the defining characteristic of a deadlock. It is also why certain deadlock mitigation mechanisms, like attempted acquisition with a timeout, work.
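The timeout-based mitigation mentioned above can be sketched in a few lines of Python. This is only an illustration, not production deadlock handling; the helper `try_both` and its timeout value are my own invention:

```python
import threading

a = threading.Lock()
b = threading.Lock()

def try_both(first, second, timeout=0.1):
    """Acquire two locks, but back off if the second one
    can't be taken within the timeout instead of waiting forever."""
    first.acquire()
    try:
        if second.acquire(timeout=timeout):
            second.release()
            return True
        return False  # would have deadlocked; give up and let the caller retry
    finally:
        first.release()

# Simulate the contended case: some other thread already holds lock b.
b.acquire()
print(try_both(a, b))  # False: b is held elsewhere, so we back off
b.release()
print(try_both(a, b))  # True: both locks are free
```

Because the thread gives up the lock it already holds before retrying, the circular wait that defines the deadlock can never become permanent.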
So how does I/O fit into this deadlock scenario?
I apologise if I’m putting words into the Stack Overflow commenter’s mouth, but given the brevity of his question I have to make a few assumptions. For the sake of this blog post, let’s assume that the pseudocode in question looks something like this:
Global locks:

    ...
    mutex a;
    ...

First thread:

    ...
    a.acquire();
    myReallyHugeFile.AppendGigabytesOfData();
    a.release();
    ...

Second thread:

    ...
    a.acquire();
    myReallyHugeFile.ReadMegabytesOfData();
    a.release();
    ...
Let’s also, for the time being, ignore that the pseudocode above would really benefit from a reader/writer lock. If we take a closer look at the code, we can reason that under pretty much all normal operating circumstances, we have what I would call a pseudo deadlock, caused by performing large I/O operations while the lock is held. That is especially true for the write operation. The reason it is only a pseudo deadlock is that the process will eventually recover from its apparent deadlock condition: one of the two threads will complete its I/O operation and release the lock. While the observable behaviour is similar to a proper deadlock, it lacks the permanence that a real deadlock suffers from.
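The pseudo deadlock is easy to reproduce. In this Python sketch, a `time.sleep()` stands in for the hypothetical `AppendGigabytesOfData()`; the second thread stalls, but the process recovers on its own:

```python
import threading
import time

a = threading.Lock()

def writer():
    # Stand-in for AppendGigabytesOfData(): a long operation
    # performed while the lock is held.
    with a:
        time.sleep(0.3)  # simulated slow I/O

t = threading.Thread(target=writer)
t.start()
time.sleep(0.05)  # give the writer time to grab the lock first

# The "reader" blocks here - but only until the slow write completes.
with a:
    print("reader got the lock after the write finished")
t.join()
```

The reader experiences a noticeable stall, yet the lock is eventually released, which is exactly the difference between this and a real deadlock.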
The above scenario was what I had in mind when I told the commenter that I don’t think it was possible to have a deadlock with a single lock and an I/O operation. But the longer I stared at the pseudocode above, the more I had that nagging feeling that I could come up with a scenario where a simple programming error would lead to an actual deadlock. The scenario in question may not even require I/O, although all the examples I can come up with do include I/O.
Obviously the scenario for this type of deadlock requires a mistake on the part of the programmer. Given that most, if not all, deadlock scenarios are the result of a programming error, this is not exactly a stretch of imagination.
Let’s build a deadlock with a single lock and an I/O operation
To set the background for our scenario, let’s assume that the developer who wrote the code above tries to reduce the wait time of the second thread by chunking the writes. Let’s also assume that a) there are no exceptions and b) he or she made a small mistake when writing the code.
Global locks:

    mutex a;

First (writer) thread:

    ...
    a.acquire();
    while true {
        var nextChunk = GetNextChunkToWrite();
        myReallyHugeFile.AppendSmallChunkForBetterMultithreading(nextChunk);
    }
    a.release();
    ...
Let’s also assume that the second thread’s code didn’t change. The second thread will still try to acquire the mutex a and then read a larger chunk of the file.
The way I wrote the pseudocode above should make it very obvious why the two threads would deadlock. After all, the write loop is an infinite loop! It’s blindingly obvious! Why do I even bother mentioning such an obvious mistake? Sorry, I slipped into “Internet commenting god” mode here for a second…
Yes, the reason for the deadlock is pretty obvious if I write the code like this. But an infinite loop isn’t always that easy to spot. Let’s for a moment assume the code was written to use a non-blocking I/O check to read from a socket, so the loop terminates as soon as there is no more data to be read. That looks like this:
a.acquire();
while DoWeHaveDataWaiting() {
    var nextChunk = GetNextChunkToWrite();
    myReallyHugeFile.AppendSmallChunkForBetterMultithreading(nextChunk);
}
a.release();
No big deal, right? Works as intended for a long time, until someone refactors the code to do something like this, which is obviously a lot better at showing the intent:
a.acquire();
while (var nextChunk = GetNextChunkToWrite()) {
    myReallyHugeFile.AppendSmallChunkForBetterMultithreading(nextChunk);
}
a.release();
All it takes to create a deadlock is to make GetNextChunkToWrite() a blocking operation.
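To make that failure mode concrete, here is a Python sketch of the broken pattern. A blocking `queue.Queue.get()` stands in for the hypothetical blocking `GetNextChunkToWrite()`, and the `None` sentinel is my own convention for "no more data":

```python
import threading
import queue

a = threading.Lock()
chunks = queue.Queue()          # empty, so the writer's get() will block
holding = threading.Event()

def writer():
    with a:
        holding.set()           # signal that the lock is now held
        while True:
            chunk = chunks.get()  # blocking call *inside* the critical section
            if chunk is None:     # sentinel: end of data
                break
            # stand-in for AppendSmallChunkForBetterMultithreading(chunk)

t = threading.Thread(target=writer)
t.start()
holding.wait()

# The "reader" can never take the lock while the writer is parked in get().
got_lock = a.acquire(timeout=0.2)
print(got_lock)  # False: observably a deadlock until someone feeds the queue

chunks.put(None)  # unblock the writer so this demo can terminate
t.join()
```

If no other thread ever feeds the queue, the writer holds the lock forever while blocked, and every other thread waits on the lock forever: the same observable behaviour as the two-lock deadlock.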
Of course, to avoid these issues in the first place, the code should simply have read:
while (var nextChunk = GetNextChunkToWrite()) {
    a.acquire();
    myReallyHugeFile.AppendSmallChunkForBetterMultithreading(nextChunk);
    a.release();
}
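The corrected scoping can be sketched in Python as well. Again, `queue.Queue.get()` stands in for the blocking `GetNextChunkToWrite()`, a list stands in for the file, and the `None` sentinel is my own addition; the key point is that the lock now only wraps the append:

```python
import threading
import queue

a = threading.Lock()
chunks = queue.Queue()
written = []  # stand-in for myReallyHugeFile

def writer():
    while True:
        chunk = chunks.get()   # may block, but the lock is NOT held here
        if chunk is None:      # sentinel: no more data
            break
        with a:                # lock scoped to just the "file" access
            written.append(chunk)

t = threading.Thread(target=writer)
t.start()

for c in ["one", "two", "three"]:
    chunks.put(c)
chunks.put(None)
t.join()

with a:  # another thread can take the lock between chunks
    print(written)  # ['one', 'two', 'three']
```

Because the blocking call happens outside the critical section, a stalled data source only stalls the writer itself, never the other threads waiting on the lock.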
As I mentioned, putting the lock into the wrong scope is an easy mistake to make. It’s especially easy when you’re refactoring code, but it’s also easy if you’re writing new code and shaping it as you write it. It’s simple human nature.
Do we really only have a single lock in this case?
Ah, young Padawan, that’s the big question. I would say that we actually do have two locks in play here. It’s just that the second lock isn’t labelled as such. Instead, the second lock is the blocking I/O operation itself.
One could argue that this isn’t really a true deadlock scenario, although the observed behaviour is the same. In the canonical scenario I described in the first section, the true deadlock is caused by two threads trying to acquire two locks in different order. In the I/O case, the deadlock is caused by one operation accidentally not releasing the single lock and then blocking while it waits for another resource.
The “blocking while it waits for another resource” sounds awfully like a deadlock, doesn’t it?
Summary
I’m now of the opinion that the original answer I gave on Stack Overflow is incorrect. The reason for my change of mind is simply that a blocking I/O operation essentially functions as a second lock in the scenario I describe. In fact, that’s pretty much the colloquial definition of a mutex: block and wait for a resource to become available.