Lazy validation
Published on
Why lazy validation?
The Validation applicative functor is a convenient tool for checking anything for errors. It’s similar to the Either type, but its Applicative instance works differently. Whereas an Either computation stops at the very first Left value, a Validation computation continues, collecting all other errors so they all can be presented to the user at once. I gave a more detailed introduction to the Validation applicative in the article Smarter validation.
However, most of the time you don’t want all of the errors; you only want some reasonable number of them. Once you reach, say, 100 errors, the computation can be aborted. In particular, if we only request a single error, then the Validation applicative should be equivalent to the Either applicative.
This is precisely the issue that I tackle in Smarter validation. In that article, I describe an implementation that maintains a bounded list of errors internally. The maximum number of errors (“capacity”) is specified on the type level so that it is accessible to the Applicative instance.
While that approach works, it has a few drawbacks:
- The implementation is somewhat complicated.
- Keeping track of the list capacity on the type level is inconvenient.
- The capacity has to be specified in advance, before running the computation. For instance, it is not possible to get the first 10 errors, then request another 10.
There is an alternative that is more lightweight and in the spirit of Haskell: just make the Validation applicative produce the errors incrementally (lazily) as the computation progresses. Then you can take (as in Data.List.take) any number of errors you need, and the computation will only be run until it either completes or produces the requested number of errors.
An implementation
Here’s one way to implement a lazy Valdation applicative:
newtype Validation a b = Validation { runValidation :: Either a b }
deriving Functor
instance Monoid a => Applicative (Validation a) where
pure = Validation . Right
<*> ax =
af case runValidation af of
Right f -> fmap f ax
Left e1 ->
let e2 =
case runValidation ax of
Right _ -> mempty
Left e -> e
in Validation (Left (e1 <> e2))
Notice that when the first argument of <*>
, namely
af
, is Left
, we return the result
Validation (Left (e1 <> e2))
without analyzing the
second argument, namely ax
, although the value
e2
does depend on it. But without knowing anything about
e2
, we can already tell that the result is a
Left
, and the inner value (the list of errors) starts with
e1
.
The laziness of this Validation applicative means that, for instance, the code
case runValidation a of
Right r -> r
Left errs -> take 10 errs
can return the first 10 errors as soon as they are available and stop
the further evaluation of a
. It can even run on an infinite
computation, such as
Validation (Left ["error"]) *>
Validation (Right ()) *>)) fix (
Comparing different implementations
I tested the laziness of the different implementations, including the ones I could find on hackage. The test suite is available here, and its results are shown below:
fix (f *>)
|
f *> f *> fix (s *>)
|
f *> (f *> fix (s *>))
|
f *> fix (s *>)
|
|
---|---|---|---|---|
transformers-0.5.5.0 | ✗ | ✗ | ✗ | ✗ |
transformers-0.5.5.0 (patched) | ✓ | ✓ | ✓ | ✓ |
either-5.0.1 | ✓ | ✗ | ✓ | ✗ |
validation-1 | ✓ | ✗ | ✓ | ✗ |
This article | ✓ | ✓ | ✓ | ✓ |
Smarter (cap = 1) | ✓ | ✓ | ✓ | ✓ |
Smarter (cap = 2) | ✓ | ✓ | ✓ | ✗ |
Each test attempts to extract the first error from an infinite Validation computation. These computations are:
fix (f *>)
, an infinite stream of failures.fix (f *>)
simply meansf *> (f *> (f *> ... ))
.f *> f *> fix (s *>)
, two failures followed by an infinite stream of successes. Since*>
is defined asinfixl
, this really means(f *> f) *> fix (s *>)
.f *> (f *> fix (s *>))
, again two failures followed by an infinite stream of successes, but this time the grouping is different. The Applicative laws say that this shouldn’t be any different from the previous test, but the table shows that they do differ in practice.f *> fix (s *>)
, a single failure followed by an infinite stream of successes.
The implementations tested are:
- The Errors type from transformers-0.5.5.0, based on the Lift applicative.
- The same Errors type, but with a patched Applicative Lift instance (see below).
- The Validation type from either-5.0.1.
- The Validation type from validation-1, which, from what I can tell, is more or less the same code as in either-5.0.1.
- The Validation type defined above in this article.
- The
SmartValidation
type defined in the earlier article, configured to retain no more than 1 error. - The same
SmartValidation
type, but configured to retain no more than 2 errors.
So what conclusions can we draw from this experiment?
The Errors type from transformers is usually my go-to implementation, since it is available in the de-facto standard package and therefore doesn’t incur any extra dependencies. As the table shows, it doesn’t pass any tests, so it’s not lazy at all. This is because the current Applicative Lift instance assumes that both sides of
<*>
are evaluated:instance (Applicative f) => Applicative (Lift f) where pure = Pure Pure f <*> Pure x = Pure (f x) Pure f <*> Other y = Other (f <$> y) Other f <*> Pure x = Other (($ x) <$> f) Other f <*> Other y = Other (f <*> y)
However, this instance can be easily made lazy:
instance (Applicative f) => Applicative (Lift f) where pure = Pure Pure f <*> ax = f <$> ax Other f <*> ax = Other $ f <*> case ax of (Pure x -> pure x Other y -> y )
This is the “transformers-0.5.5.0 (patched)” entry in the table. I also opened an issue to put the lazy version into transformers, and Ross Paterson promptly made the change, so starting from the next version, transformers will provide a lazy Validation applicative.
The either-5.0.1 and validation-1 versions of Validation appear to be identical, and they pass the same set of tests. For practical purposes they can be considered lazy, although they do fail on some edge cases. These failures, however, are not accidental: they are a consequence of relying only on the Semigroup instance of errors, not Monoid. On the one hand, this is a good property, because it makes it less likely that you’ll end up with
Failure mempty
.On the other hand, because there’s no
mempty
, the implementation of<*>
has a “latency” of 1 error: in order to emit an error, it needs to see not only this error but also the next one (or reach the end of the computation to find out that it was the last one).Let’s see why this is the case. Say you see an error
e
. If the error type is a Monoid, you can emite <> thunk
, wherethunk
will evaluate either toe2 <> thunk2
ormempty
.Now suppose the error type is a Semigroup but not a Monoid. Say you emitted
e <> thunk
when you first saw it, but then the computation ended without any additional errors. What should the thunk evaluate to?To work around that issue, wait until you see either the next error,
e2
, or the end of the computation. In the first case, emite <> thunk
, knowing that the thunk can at least evaluate toe2
. In the second case, simply returne
.To summarize, either-5.0.1 and validation-1 are not perfectly lazy, but they come pretty close and have other advantages.
The “smart validation” applicative, when restricted to produce a single error, does pass all the tests. But it’s not really lazy—we just cheated when we told it in advance how many errors we were going to request. When we told it to hold no more than 2 errors and then gave it an infinite stream with only one error, it never returned that error.
That said, notice that “Smarter (cap = 2)” passed the
f *> f *> fix (s *>)
test, which either-5.0.1 and validation-1 did not pass. Because the smart validation functor is continuation-based, it doesn’t care how the original computation was grouped.This has also practical implications regarding performance. The “smart validation” approach has linear complexity no matter how the computation is oragnized, whereas the more straightforward approaches have quadratic worst-case complexity.
In the following test, we requested not the first, but the last error from a long left-nested stream of errors. The test passes only if it finishes in under 100ms.
iterate (*> f) f !! 10000
transformers-0.5.5.0
✗
transformers-0.5.5.0 (patched)
✗
either-5.0.1
✗
validation-1
✗
This article
✗
Smarter (cap = 1)
✓
Smarter (cap = 2)
✓