New patterns in tasty
Published on
When I wrote tasty in 2013, I borrowed the pattern language and its implementation from test-framework. I wasn’t fond of that pattern language, but it did the job most of the time, and the task of coming up with a better alternative was daunting.
Over the years, however, the pattern language and implementation received more feature requests and complaints than any other aspect of tasty.
- Mikhail Glushenkov wanted to run
all tests in a group except a selected few,
e.g.
--pattern 'test-group/**' --ignore 'test-group/test-3'
. - Utku Demir wanted to specify a list of tests to run, such as “A and B, but not C and D”; Philipp Hausmann and Süleyman Özarslan requested something similar in the same issue.
- Rob Stewart wanted a
pattern that would match
Bar
but notFooBar
. - Daniel Mendler reported that patterns with slashes didn’t work because slashes have special meaning in the patterns.
- Levent Erkök, whose package sbv has more than 30k tasty tests, was concerned about the performance of pattern matching.
- Allowing regular expressions in patterns would fulfill some (though not all) of these requests. However, using a regular expression library has its downsides. Carter Schonwald repeatedly complained to me that regex-tdfa takes ages to compile. Simon Jakobi noted that regex-tdfa brings along several transitive dependencies (including parsec), which further increases compilation time, makes the package more prone to breakages, and makes it harder for the maintainers of core packages to use tasty.
Every time someone filed an issue about tasty patterns, I would say something like “oh, this will be fixed once I get around to that issue from 2013 about the new patterns”. But I still had no idea what that new pattern language should look like.
The new pattern language had to be expressive, containing at least
the boolean operators. It had to allow matching against the test name,
the name of any of the groups containing the test, or the full test
path, like Foo/Bar/Baz
for a test named Baz
in
the test group Bar
, which itself is contained in the
top-level group Foo
.
Finally, there was an issue of familiarity. Whatever ad-hoc DSL I would come up with, I had to document thoroughly its syntax and semantics, and then I had to convince tasty users to learn a new language and read the docs every time they wanted to filter their tests. (Not that the old patterns were particularly intuitive.)
The insight came to me last summer while I was spending time with my family and working remotely from a cabin in Poltava oblast, Ukraine. The language I needed already existed and was relatively well-known. It’s called AWK!
In AWK, the variable1 $0
refers to the
current line of input (called a “record”), and the variables
$1
, $2
etc. refer to the fields resulting from
splitting the record on the field separator (like a tab or a comma).
The analogy with test names in tasty is straightforward:
$0
denotes the full path of the test, $1
denotes the outermost test group name, $2
for the next
group name, and so on. The test’s own name is $NF
.
Then you can use these variables together with string, numeric, and boolean operators. Some examples:
$2 == "Two"
— select the subgroupTwo
$2 == "Two" && $3 == "Three"
— select the test or subgroup namedThree
in the subgroup namedTwo
$2 == "Two" || $2 == "Twenty-two"
— select two subgroups$0 !~ /skip/
or! /skip/
— select tests whose full names (including group names) do not contain the wordskip
$NF !~ /skip/
— select tests whose own names (but not group names) do not contain the wordskip
$(NF-1) ~ /QuickCheck/
— select tests whose immediate parent group name containsQuickCheck
The list of all supported functions and operators can be found in the README.
As a shortcut, if the -p
/--pattern
argument
consists of letters, digits, and characters, it is matched against the
full test path, so -p foo
is equivalent to
-p /foo/
.
The subset of AWK recognized by tasty contains only expressions (no
statements like loops or function definitions), no assignment operators,
and no variables except NF
. Other than that, the most
salient deviation is that pattern matching (as in
$3 ~ /foo/
) does not use regular expressions, for the
reasons stated above. Instead, a pattern match means a simple substring
search — an idea suggested by Levent Erkök. So /foo/
in
tasty means exactly the same as in AWK, while AWK’s /foo+/
cannot be expressed.
This allowed me to drop regex-tdfa as a dependency and significantly speed up the compilation time. An installation of tasty-1.0 (the new major release featuring AWK patterns) from scratch (a fresh cabal sandbox) takes 24 seconds on my laptop2, while an installation of tasty-0.12.0.1 (the previous version, which depends on regex-tdfa) takes 2 minutes 43 seconds.
The performance improved, too. I tried Levent’s
example, which runs 30k dummy tests (j @?= j
). When run
with --quiet
(so no time is wasted on output),
tasty-0.12.0.1 takes 0.3 seconds to run all tests and 0.6 seconds to run
a single test selected by a pattern (-p 9_2729
). The new
tasty-1.0 takes the same time without a pattern, and less than 0.1
seconds with a pattern (also -p 9_2729
, which is equivalent
to $0 ~ /9_2729/
). The overhead of pattern matching,
although it was pretty low already (0.3 seconds per 30k tests), became
much smaller — so that it is now outweighed by the benefit of running
fewer dummy tests. I haven’t done any performance optimization at all3, so I don’t even know where the
speedup came from, exactly.
Earlier I said that I dropped regex-tdfa as a dependency for tasty
and that regex-tdfa in turn depended on parsec; but didn’t I have to
retain parsec or a similar library to parse the AWK syntax? No! We
already have a perfectly fine parser combinators module in the base
library, Text.ParserCombinators.ReadP.
Its original purpose was to back the standard Read
instances, but there is no reason it can’t be used for something more
fun.
I did borrow one small module from megaparsec for parsing expressions
(Text.Megaparsec.Expr),
which I adapted to work with ReadP
and to parse ternary
operators. The expression parser originally comes from parsec, but Mark
Karpov did a great job refactoring it, so I recommend you read Mark’s
version instead. The expression parser is an ingenious algorithm
deserving a separate blog post.
Enjoy the new tasty patterns!