Christian at the Carbon Five Community recently published an interesting blog article on Multithreaded Testing. Christian showed how to use Java 5 built-in concurrency features to create a nice, clean multithreaded test for an article dispatcher. It’s a good example of using the Java concurrency features, but I wondered about the effectiveness of the test.
The implementation of the Article class was not thread safe. This was acceptable because of the assumption that the ORM library would create the Article instances in the requestor’s thread. However, assumptions of that kind can be dangerous over the long term. Let’s pretend that the performance of the ArticleService was poor and a developer was given the task of improving it. To simulate this scenario, I created an implementation of the ArticleService that stored Article instances in memory (as if they were cached or a framework like Terracotta were being used). The Article class needs to be thread safe for this implementation, but I left it unmodified. We would hope the test case fails against the incorrect code. However, the test succeeded almost every time on my dual-core workstation. Why was that?
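Before answering, here is a minimal sketch of the kind of hazard that sharing cached instances between threads can expose. The original post does not show the Article fields, so `UnsafeArticle`, `SafeArticle`, `recordView`, and the `hammer` helper are all hypothetical names invented for illustration, and the code assumes Java 8 or later for the lambda syntax. It is a stand-in for the idea, not the code under discussion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Article: a plain int field updated without synchronization.
class UnsafeArticle {
    private int viewCount;
    void recordView() { viewCount++; }   // read-modify-write, not atomic
    int getViewCount() { return viewCount; }
}

// The same hypothetical class made safe for sharing between threads.
class SafeArticle {
    private final AtomicInteger viewCount = new AtomicInteger();
    void recordView() { viewCount.incrementAndGet(); }
    int getViewCount() { return viewCount.get(); }
}

class SharedArticleDemo {
    // Runs task callsPerThread times on each of `threads` threads,
    // releasing all workers at once with a start latch.
    static void hammer(int threads, int callsPerThread, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();
                    for (int i = 0; i < callsPerThread; i++) {
                        task.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown();
        try {
            done.await();   // also makes the workers' writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        UnsafeArticle unsafe = new UnsafeArticle();
        SafeArticle safe = new SafeArticle();
        hammer(4, 100_000, unsafe::recordView);
        hammer(4, 100_000, safe::recordView);
        System.out.println("expected 400000");
        System.out.println("unsafe   " + unsafe.getViewCount()); // may be lower; varies per run
        System.out.println("safe     " + safe.getViewCount());
    }
}
```

The unsafe count can come out lower than expected because concurrent increments overwrite each other, but nothing guarantees that it will on any given run, which is exactly the difficulty with tests of this kind.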
The primary reason the test passed was that it was too fast. Christian warns us about the slow speed of his testing technique, but the technique itself is not slow. The original code under test accessed a relational database, so it was slow for that reason, not because of the testing technique. With my ArticleService implementation, the test executed in 15-30 milliseconds. My theory is that this short execution time is what caused the test to pass almost every time. In so brief a run, the HotSpot JIT probably does not get the chance to compile and reorder instructions, and it’s possible the threads all ran on the same CPU core. When the test did fail, it seemed to be when the machine was under a little more load and the execution time was 40 ms or greater. In another experiment, I increased the size of the Article collection from 250 to 2500 and the test failed every time. For a test like this, it might be useful to measure the execution time in the test and signal an error if it runs too fast. Of course, if we didn’t know the test’s effectiveness was a function of execution speed, we’d never think of adding the timing check.
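The timing check could be sketched roughly as follows. `TimingGuard`, its method names, and the 40 ms floor are assumptions made for illustration (the floor simply echoes the run times observed above), and a real threshold would have to be tuned per machine.

```java
import java.util.concurrent.TimeUnit;

class TimingGuard {
    // Hypothetical floor, taken from the observation that failures
    // only showed up in runs of roughly 40 ms or more.
    static final long MIN_MILLIS = 40;

    // Runs the body and returns the wall-clock time it took, in milliseconds.
    static long elapsedMillis(Runnable body) {
        long start = System.nanoTime();
        body.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Signals an error when the run was too short to be trusted.
    static void requireSlowEnough(long elapsedMillis, long minMillis) {
        if (elapsedMillis < minMillis) {
            throw new IllegalStateException(
                "Concurrent test finished in " + elapsedMillis + " ms, below the "
                + minMillis + " ms floor; the threads may not have overlapped");
        }
    }
}
```

A test would wrap its concurrent section in `elapsedMillis` and pass the result to `requireSlowEnough(elapsed, TimingGuard.MIN_MILLIS)`. A run that finishes too quickly is then reported as inconclusive instead of being counted as a pass, though a slow run is still no proof that the threads actually interleaved.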
In general, we should be very skeptical about multithreaded test cases. Just because they pass doesn’t necessarily mean our code is correct. A nondeterministic test that passes 99% of the time will give us false confidence. How can we be sure that our test is worthy of our confidence? Other than advanced tools like ConTest or code reviews by engineers who are experts in the Java Memory Model and multithreading, I’m not sure. It’s a difficult problem with no easy answers.