Roman Cheplyaka

New patterns in tasty

Published on January 8, 2018

When I wrote tasty in 2013, I borrowed the pattern language and its implementation from test-framework. I wasn’t fond of that pattern language, but it did the job most of the time, and the task of coming up with a better alternative was daunting.

Over the years, however, the pattern language and implementation received more feature requests and complaints than any other aspect of tasty.

Every time someone filed an issue about tasty patterns, I would say something like “oh, this will be fixed once I get around to that issue from 2013 about the new patterns”. But I still had no idea what that new pattern language should look like.

The new pattern language had to be expressive, containing at least the boolean operators. It had to allow matching against the test name, the name of any of the groups containing the test, or the full test path, like Foo/Bar/Baz for a test named Baz in the test group Bar, which itself is contained in the top-level group Foo.

Finally, there was an issue of familiarity. Whatever ad-hoc DSL I would come up with, I had to document thoroughly its syntax and semantics, and then I had to convince tasty users to learn a new language and read the docs every time they wanted to filter their tests. (Not that the old patterns were particularly intuitive.)

The insight came to me last summer while I was spending time with my family and working remotely from a cabin in Poltava oblast, Ukraine. The language I needed already existed and was relatively well-known. It’s called AWK!

My workspace in Poltava oblast
My workspace in Poltava oblast

In AWK, the variable1 $0 refers to the current line of input (called a “record”), and the variables $1, $2 etc. refer to the fields resulting from splitting the record on the field separator (like a tab or a comma).

The analogy with test names in tasty is straightforward: $0 denotes the full path of the test, $1 denotes the outermost test group name, $2 for the next group name, and so on. The test’s own name is $NF.

Then you can use these variables together with string, numeric, and boolean operators. Some examples:

The list of all supported functions and operators can be found in the README.

As a shortcut, if the -p/--pattern argument consists of letters, digits, and characters, it is matched against the full test path, so -p foo is equivalent to -p /foo/.

The subset of AWK recognized by tasty contains only expressions (no statements like loops or function definitions), no assignment operators, and no variables except NF. Other than that, the most salient deviation is that pattern matching (as in $3 ~ /foo/) does not use regular expressions, for the reasons stated above. Instead, a pattern match means a simple substring search — an idea suggested by Levent Erkök. So /foo/ in tasty means exactly the same as in AWK, while AWK’s /foo+/ cannot be expressed.

This allowed me to drop regex-tdfa as a dependency and significantly speed up the compilation time. An installation of tasty-1.0 (the new major release featuring AWK patterns) from scratch (a fresh cabal sandbox) takes 24 seconds on my laptop2, while an installation of tasty- (the previous version, which depends on regex-tdfa) takes 2 minutes 43 seconds.

The performance improved, too. I tried Levent’s example, which runs 30k dummy tests (j @?= j). When run with --quiet (so no time is wasted on output), tasty- takes 0.3 seconds to run all tests and 0.6 seconds to run a single test selected by a pattern (-p 9_2729). The new tasty-1.0 takes the same time without a pattern, and less than 0.1 seconds with a pattern (also -p 9_2729, which is equivalent to $0 ~ /9_2729/). The overhead of pattern matching, although it was pretty low already (0.3 seconds per 30k tests), became much smaller — so that it is now outweighed by the benefit of running fewer dummy tests. I haven’t done any performance optimization at all3, so I don’t even know where the speedup came from, exactly.

Earlier I said that I dropped regex-tdfa as a dependency for tasty and that regex-tdfa in turn depended on parsec; but didn’t I have to retain parsec or a similar library to parse the AWK syntax? No! We already have a perfectly fine parser combinators module in the base library, Text.ParserCombinators.ReadP. Its original purpose was to back the standard Read instances, but there is no reason it can’t be used for something more fun.

I did borrow one small module from megaparsec for parsing expressions (Text.Megaparsec.Expr), which I adapted to work with ReadP and to parse ternary operators. The expression parser originally comes from parsec, but Mark Karpov did a great job refactoring it, so I recommend you read Mark’s version instead. The expression parser is an ingenious algorithm deserving a separate blog post.

Enjoy the new tasty patterns!

  1. Actually, unlike in Perl, in AWK $0 is an expression: the operator $ applied to the number 0. Instead of $0 you could write $(1-1). The most practically relevant implication is that you can write $NF for the last field in the record, $(NF-1) for the second to last field and so on.

  2. Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz, 2 cores (4 virtual), SSD.

  3. I had an idea to do partial evaluation of the expression, so that the condition $2 == "X" would prune the whole test subgroup with name $2 == "Y" without having to consider each test individually. But after doing the measurement, I saw that matching is already fast enough. and the extra complexity is not justified.