QuickCheck vs. SmallCheck

Published on ; updated on

QuickCheck and SmallCheck are the two classical libraries for property testing in Haskell. There are newer developments, like SmartCheck and Hedgehog, but I haven’t tried them much yet.

So here is my take on the comparative strengths and weaknesses of the current versions of QuickCheck and SmallCheck (2.11.3 and 1.1.4, respectively). Being the maintainer of SmallCheck, I am obviously somewhat biased towards it. But quite often I would prefer QuickCheck to SmallCheck for a specific task. It also sometimes happens that I think QuickCheck would work better in principle for a given task, but in practice SmallCheck is more convenient for that task — usually it has to do with generic instances or failure reporting. So here are the considerations I take into account when choosing between the two.

If you do not know much about QuickCheck and/or SmallCheck, the main difference between them is that QuickCheck generates random test cases, whereas SmallCheck enumerates test cases exhaustively up to some depth. Some (though not all) of the pros and cons come from this fundamental difference in paradigms.

Advantages of QuickCheck

Control over the number of tests/execution time

With SmallCheck, the number of tests is usually exponential in their depth. Say, when enumerating trees (from Data.Tree), depth 5 gives you 189 tests in less then 0.01s, while depth 6 gives 6479 tests in 4s. If you only have the patience to run 1000 tests, you have to settle for 189. QuickCheck, on the other hand, allows you to set the exact number of tests independently from their depth (which is called size in QuickCheck).

Update (2018-07-05). In smallcheck-1.1.5, I added a function

limit :: Monad m => Int -> Series m a -> Series m a

which acts similarly to take for lists. It is still not as convenient as QuickCheck, because you have to apply it to each “big” generator as opposed to setting a single global limit on the number of tests, but you can use it to fight the exponential blowup.

Floating-point numbers

SmallCheck has instances for floating-point numbers, e.g.

> listSeries 5 :: [Positive Float]
[1.0,2.0,0.5,4.0,0.25,8.0,0.125,16.0,3.0,6.25e-2,6.0,1.5,12.0,0.75,24.0,0.375,48.0,0.1875]

These are probably enough if you need to fill some fractional field in a data type, but the logic of your functions doesn’t depend on the number in a complex way.

But if you are testing any truly numerical code (say, matrix operations or physics simulation), it is much more useful to have QuickCheck generate random floating-point numbers.

Advantages of SmallCheck

Generic deriving

If you have a couple of types that rarely change, writing the instances by hand may not a big deal.

Often, especially when writing tests for commercial code, there is a big number of nested algebraic data types. And even if you have the patience to write the instances, they will become incomplete the next time a constructor is added to one of the types (and the compiler won’t warn you).

So it is essential to be able to derive most of the instances. SmallCheck has had generic instances since 2011, thanks to the contribution from Bas van Dijk. QuickCheck still does not have them (github issue).

There is a separate package, generic-arbitrary, but it doesn’t seem to support recursive data types. The recursive types also seem to be the reason why QuickCheck itself doesn’t have the generic implementation.

The solution to the recursive type problem is straightforward: reduce the depth (size) every time you descend into a nested type. As an example, here’s an instance for trees from Data.Tree:

instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = scale (\n -> max (n-1) 0) $
    Node <$> arbitrary <*> arbitrary

This approach is not ideal (your size may get down to 0 too quickly), but IMO it is better than no generic instances at all.

Update (2018-07-05). There are some third-party packages that implement generic deriving for QuickCheck and support recursive types, in particular testing-feat and generic-random.

testing-feat looks very cool and powerful from a theoretical perspective, and I’d like to study it in depth some day. In practice it involves definitions like

deriveEnumerable ''MyType
instance Arbitrary MyType
  where
    arbitrary = sized uniform
    shrink = genericShrink

Sometimes you need to define an Enumerable instance for an abstract or primitive type, which I ended up doing like this (for Text):

instance Enumerable T.Text where
  enumerate = share (c1 T.pack)

generic-random is a bit easier to use, since there’s no additional Enumerable class. The declaration looks like this:

instance Arbitrary MyType
  where
    arbitrary = genericArbitraryU'
    shrink = genericShrink

In both cases, genericShrink comes from the QuickCheck package itself, but it’s not the default implementation for shrink; the default is shrink _ = [].

Failure reporting

Say we have a property

prop x = reverse (reverse x) == x

By default, both SmallCheck and QuickCheck only tell you the generated values that led to the failure (i.e. x). But I would also like to know why the test failed — in this case, what reverse (reverse x) was.

There is a StackOverflow question about this with two answers, neither of which is perfect.

Edward Z. Yang says:

Because QuickCheck gives you the inputs to the function, and because the code under test is pure (it is, right?), you can just feed those inputs to the function and get the result. This is more flexible, because with those inputs you can also repeatedly test with tweaks to the original function until it’s correct.

There are many reasons why this may not work:

  1. The code may not be pure by design — both SmallCheck and QuickCheck are capable of testing IO code.
  2. The code may be intended to be pure, but the very bug you are trying to fix may lead to it being impure/non-deterministic.
  3. Copy-pasting values may not be very convenient: think large matrices.
  4. Also, strings produced by Show do not always map back to the original values effortlessly. E.g. Maps, Sets, and Vectors all result in fromList [...] when shown. If you have several of them in the same structure, you’ll have to disambiguate those fromLists.

The other answer, by Paul Johnson, gives a working solution (using MkResult), but it is rather unwieldy, and you have to include the parameter values manually if you need them.

Update (2018-07-05). I’ve since found and posted a much better way to report a failure in QuickCheck:

import Test.QuickCheck.Property (succeeded, failed, reason)

prop a b =
  if res /= []
    then succeeded
    else failed { reason = reason }
   where
      (res, reason) = checkCode a b

In SmallCheck, you can return Either String String instead of Bool, where the Strings give explanation for failure and success, respectively. For example, this property

prop_reverse :: [Int] -> Bool
prop_reverse x = reverse x == x

results in the message

there exists [0,1] such that
  condition is false

But we can change the property to explain itself

prop_reverse :: [Int] -> Either String String
prop_reverse x =
  if reverse x == x
    then Right "Match"
    else Left $ "reverse x == " ++ show (reverse x)

which results in

there exists [0,1] such that
  reverse x = [1,0]

Better than shrinking

QuickCheck tends to produce complex counterexamples and then tries to shrink them to find simpler ones. While shrinking does lead to somewhat simpler examples, they are usually not minimal. Also, it’s on you to implement the shrinking function for your own types (although there is genericShrink).

Because SmallCheck enumerates the values in order, from smaller to bigger, it automatically produces the minimal example.

Determinism and exhaustiveness

Because QuickCheck generates its examples randomly, the set of test cases your property is tested on is rather arbitrary. (It also changes from run to run by default, though this can be fixed by suppplying a fixed seed.)

Sometimes that’s good: if you have a high- or infinite-dimensional space of possibilities, all you can hope for is to test on a few diverse samples from that space. But other times, that space is narrow, and it’s better to test deterministically and exhaustively.