New patterns in tasty

Published on January 8, 2018

When I wrote tasty in 2013, I borrowed the pattern language and its implementation from test-framework. I wasn’t fond of that pattern language, but it did the job most of the time, and the task of coming up with a better alternative was daunting.

Over the years, however, the pattern language and implementation received more feature requests and complaints than any other aspect of tasty.

Mikhail Glushenkov wanted to run all tests in a group except a selected few, e.g. --pattern 'test-group/**' --ignore 'test-group/test-3'.
Utku Demir wanted to specify a list of tests to run, such as “A and B, but not C and D”; Philipp Hausmann and Süleyman Özarslan requested something similar in the same issue.
Rob Stewart wanted a pattern that would match Bar but not FooBar.
Daniel Mendler reported that patterns with slashes didn’t work because slashes have special meaning in the patterns.
Levent Erkök, whose package sbv has more than 30k tasty tests, was concerned about the performance of pattern matching.
Allowing regular expressions in patterns would fulfill some (though not all) of these requests. However, using a regular expression library has its downsides. Carter Schonwald repeatedly complained to me that regex-tdfa takes ages to compile. Simon Jakobi noted that regex-tdfa brings along several transitive dependencies (including parsec), which further increases compilation time, makes the package more prone to breakages, and makes it harder for the maintainers of core packages to use tasty.

Every time someone filed an issue about tasty patterns, I would say something like “oh, this will be fixed once I get around to that issue from 2013 about the new patterns”. But I still had no idea what that new pattern language should look like.

The new pattern language had to be expressive, containing at least the boolean operators. It had to allow matching against the test name, the name of any of the groups containing the test, or the full test path, like Foo/Bar/Baz for a test named Baz in the test group Bar, which itself is contained in the top-level group Foo.

Finally, there was an issue of familiarity. Whatever ad-hoc DSL I would come up with, I had to document thoroughly its syntax and semantics, and then I had to convince tasty users to learn a new language and read the docs every time they wanted to filter their tests. (Not that the old patterns were particularly intuitive.)

The insight came to me last summer while I was spending time with my family and working remotely from a cabin in Poltava oblast, Ukraine. The language I needed already existed and was relatively well-known. It’s called AWK!

In AWK, the variable¹ $0 refers to the current line of input (called a “record”), and the variables $1, $2 etc. refer to the fields resulting from splitting the record on the field separator (like a tab or a comma).

The analogy with test names in tasty is straightforward: $0 denotes the full path of the test, $1 denotes the outermost test group name, $2 for the next group name, and so on. The test’s own name is $NF.

Then you can use these variables together with string, numeric, and boolean operators. Some examples:

$2 == "Two" — select the subgroup Two
$2 == "Two" && $3 == "Three" — select the test or subgroup named Three in the subgroup named Two
$2 == "Two" || $2 == "Twenty-two" — select two subgroups
$0 !~ /skip/ or ! /skip/ — select tests whose full names (including group names) do not contain the word skip
$NF !~ /skip/ — select tests whose own names (but not group names) do not contain the word skip
$(NF-1) ~ /QuickCheck/ — select tests whose immediate parent group name contains QuickCheck

The list of all supported functions and operators can be found in the README.

As a shortcut, if the -p/--pattern argument consists of letters, digits, and characters, it is matched against the full test path, so -p foo is equivalent to -p /foo/.

The subset of AWK recognized by tasty contains only expressions (no statements like loops or function definitions), no assignment operators, and no variables except NF. Other than that, the most salient deviation is that pattern matching (as in $3 ~ /foo/) does not use regular expressions, for the reasons stated above. Instead, a pattern match means a simple substring search — an idea suggested by Levent Erkök. So /foo/ in tasty means exactly the same as in AWK, while AWK’s /foo+/ cannot be expressed.

This allowed me to drop regex-tdfa as a dependency and significantly speed up the compilation time. An installation of tasty-1.0 (the new major release featuring AWK patterns) from scratch (a fresh cabal sandbox) takes 24 seconds on my laptop², while an installation of tasty-0.12.0.1 (the previous version, which depends on regex-tdfa) takes 2 minutes 43 seconds.

The performance improved, too. I tried Levent’s example, which runs 30k dummy tests (j @?= j). When run with --quiet (so no time is wasted on output), tasty-0.12.0.1 takes 0.3 seconds to run all tests and 0.6 seconds to run a single test selected by a pattern (-p 9_2729). The new tasty-1.0 takes the same time without a pattern, and less than 0.1 seconds with a pattern (also -p 9_2729, which is equivalent to $0 ~ /9_2729/). The overhead of pattern matching, although it was pretty low already (0.3 seconds per 30k tests), became much smaller — so that it is now outweighed by the benefit of running fewer dummy tests. I haven’t done any performance optimization at all³, so I don’t even know where the speedup came from, exactly.

Earlier I said that I dropped regex-tdfa as a dependency for tasty and that regex-tdfa in turn depended on parsec; but didn’t I have to retain parsec or a similar library to parse the AWK syntax? No! We already have a perfectly fine parser combinators module in the base library, Text.ParserCombinators.ReadP. Its original purpose was to back the standard Read instances, but there is no reason it can’t be used for something more fun.

I did borrow one small module from megaparsec for parsing expressions (Text.Megaparsec.Expr), which I adapted to work with ReadP and to parse ternary operators. The expression parser originally comes from parsec, but Mark Karpov did a great job refactoring it, so I recommend you read Mark’s version instead. The expression parser is an ingenious algorithm deserving a separate blog post.

Enjoy the new tasty patterns!