Introduction to golden testing

Published on December 4, 2017

Golden tests are like unit tests, except the expected output is stored in a separate file. I learned about them in 2010 from Max Grigorev at ZuriHac.

Let’s say you want to test Python’s json module. One way to do that would be to encode an object and compare the result to a reference string:

import json

assert(json.dumps([1,2,3]) == "[1, 2, 3]")

Alternatively, you could create a file with contents

[1, 2, 3]

and read it to know the expected output:

import json

with open("example1.json", "r") as ex1_file:
    ex1 = ex1_file.read().rstrip()
    assert(json.dumps([1,2,3]) == ex1)

The file example1.json is called a golden file.

Here are some advantages of golden tests over ordinary unit tests:

If the expected output is large in size, it may be impractical to put it inside the source code.
No need to escape quotes or binary data in the expected output.
When you add a new test, your testing framework can generate the missing golden file from the current output of the function.

It is best if you can write down the expected output without looking at the actual output, but it is not always possible. The output may be too big to type it character by character, or it may be hard to predict. For instance, in the json example, you couldn’t tell in advance whether there would be spaces between array elements or not. So often what you do is launch an interactive interpreter (if your language of choice even has one), run the function, and then copy-paste its output into the test code.

This process can be easily automated if you use golden files.
The expected output can be automatically updated.

Say you changed your json module to replace some of the spaces with newlines to make the output more aesthetically pleasing. You have 40 test cases that need updating. Can you imagine doing this by hand?

With golden tests, you can tell your test framework to update all golden files from the current outputs, then check git diff to ensure that all changes are valid, and commit them.
If some of your tests suddently started failing, you can use diff or other such tools to compare the golden file to the actual file and figure out what exactly changed. Perhaps your testing framework could even show the diff automatically on test failure?

While advantages 1-2 are automatic, 3-5 require special support from your testing framework. The rest of this article will be focused on a Haskell testing framework tasty and its add-on package for golden tests, tasty-golden.

Basic usage

To illustrate how tasty-golden works, consider this yaml-to-json conversion module:

{-# LANGUAGE TypeApplications #-}
module YamlToJson where

import qualified Data.Yaml as Y
import Data.Aeson as J
import qualified Data.ByteString.Lazy as LBS

yamlToJson :: LBS.ByteString -> LBS.ByteString
yamlToJson = J.encode . Y.decode @Value . LBS.toStrict

Because JSON contains quotes and YAML spans multiple lines, it is not very practical to store them as string literals in the source code file. Instead, you will keep them both in files.

Note that the name “golden file” only refers to the file containing the output, not the input. There is no requirement that the input is stored in a file or that there even is any “input” at all; but in practice it is often convenient to store them both in files so that there is an input file for every output file and vice versa.

import Test.Tasty (defaultMain, TestTree, testGroup)
import Test.Tasty.Golden (goldenVsString, findByExtension)
import qualified Data.ByteString.Lazy as LBS
import YamlToJson (yamlToJson)
import System.FilePath (takeBaseName, replaceExtension)

main :: IO ()
main = defaultMain =<< goldenTests

goldenTests :: IO TestTree
goldenTests = do
  yamlFiles <- findByExtension [".yaml"] "."
  return $ testGroup "YamlToJson golden tests"
    [ goldenVsString
        (takeBaseName yamlFile) -- test name
        jsonFile -- golden file path
        (yamlToJson <$> LBS.readFile yamlFile) -- action whose result is tested
    | yamlFile <- yamlFiles
    , let jsonFile = replaceExtension yamlFile ".json"
    ]

This is all the code you need to support one, two, or a thousand test cases. When run, this code will:

find all .yaml files in the current directory
for each .yaml file, construct a golden test that evaluates yamlToJson on the input read from file and compares the result to the golden file, which has the name and the .json extension
put all individual tests in a test group and pass it to defaultMain for execution

To see how this works in practice, create an input file, fruits.yaml, with the following contents:

- orange
- apple
- banana

Now run your test suite (note: in a proper cabalized project, you’d run cabal test or stack test instead):

% stack runghc test.hs
YamlToJson golden tests
  fruits: OK
    Golden file did not exist; created

All 1 tests passed (0.00s)

tasty-golden realized that this is a new test case because the golden file was absent, so it went ahead and initialized the golden file based on the function’s output. You can now examine the file to see if it makes sense:

% cat fruits.json
["orange","apple","banana"]

If you are happy with it, check in both input and output files to git. This is important so that your collaborators can run the tests, but it also helps when dealing with failing tests, as you’ll see next.

% git add fruits.yaml fruits.json && git commit -m "fruits test case"

Dealing with test failures

Occasionally, your tests will fail. A test that cannot fail is a useless test.

A golden test fails when the actual output does not match the contents of the golden file. You then need to figure out whether this is a bug or an intentional code change.

Let’s say you decide that the output of yamlToJson should end with a newline.

The new function definition is

yamlToJson = (<> "\n") . J.encode . Y.decode @Value . LBS.toStrict

Now run the test suite:

% stack runghc test.hs
YamlToJson golden tests
  fruits: FAIL
    Test output was different from './fruits.json'. It was: "[\"orange\",\"apple\",\"banana\"]\n"

1 out of 1 tests failed (0.00s)

Ok, this is not very helpful. There are two main ways to get better diagnostics. One is to use the goldenVsStringDiff function as an alternative to goldenVsString. This will include the diff right in the tasty output.

But my preferred workflow is to use git for this. First, rerun the tests and pass the --accept option. This will update the golden files with the new output:

% stack runghc -- test.hs --accept
YamlToJson golden tests
  fruits: OK
    Accepted the new version

All 1 tests passed (0.00s)

Now, because your golden file is tracked by git, you can examine the differences between the old and new golden files with git diff:

% git diff
diff --git fruits.json fruits.json
index c244c0a..ed447d4 100644
--- fruits.json
+++ fruits.json
@@ -1 +1 @@
-["orange","apple","banana"]
\ No newline at end of file
+["orange","apple","banana"]

Because this is the change you expected, you can now commit the updated file to git.

This workflow lets you use all the powerful git diff options like --color-words, or even launch a graphical diff tool like kdiff3 with git difftool.

Basic usage

Dealing with test failures

See also