Smarter validation
Published on ; updated on
Today we’ll explore different ways of handling and reporting errors
in Haskell. We shall start with the well-known Either
monad, proceed to a somewhat less common Validation
applicative, and then improve its efficiency and user experience.
The article contains several exercises that will hopefully help you better understand the issues that are being addressed here.
See also: Lazy validation.
Running example
{-# LANGUAGE GeneralizedNewtypeDeriving, KindSignatures, DataKinds,
ScopedTypeVariables, RankNTypes, DeriveFunctor #-}
import Text.Printf
import Text.Read
import Control.Monad
import Control.Applicative
import Control.Applicative.Lift (Lift)
import Control.Arrow (left)
import Data.Functor.Constant (Constant)
import Data.Monoid
import Data.Traversable (sequenceA)
import Data.List (intercalate, genericTake, genericLength)
import Data.Proxy
import System.Exit
import System.IO
import GHC.TypeLits
Our running example will consist of reading a list of integer numbers from a file, one number per line, and printing their sum.
Here’s the simplest way to do this in Haskell:
printSum1 :: FilePath -> IO ()
= print . sum . map read . lines =<< readFile path printSum1 path
This code works as expected for a well-formed file; however, if a line in the file can’t be parsed as a number, we’ll get unhelpful
Prelude.read: no parse
Either monad
Let’s rewrite our function to be aware of possible errors.
parseNum :: Int -- line number (for error reporting)
-> String -- line contents
-> Either String Integer
-- either parsed number or error message
=
parseNum ln str case readMaybe str of
Just num -> Right num
Nothing -> Left $
"Bad number on line %d: %s" ln str
printf
-- Print a message and exit
die :: String -> IO ()
= do
die msg
hPutStrLn stderr msg
exitFailure
printSum2 :: FilePath -> IO ()
=
printSum2 path either die print .
sum .
liftM sequence . zipWith parseNum [1..] .
lines =<< readFile path
Now, upon reading a line that is not a number, we’d see something like
Bad number on line 2: foo
This is a rather standard usage of the Either monad, so I won’t get into details here. I’ll just note that there are two ways in which this version is different from the first one:
- We call
readMaybe
instead ofread
and, upon detecting an error, construct a helpful error message. For this reason, we keep track of the line number. - Instead of throwing a runtime exception right away (using the
error
function), we return a pureEither
value, and then combine theseEither
s together using theMoand Either
isntance.
The two changes are independent; there’s no reason why we couldn’t
use error
and get the same helpful error message. The
exceptions emulated by the Either
monad have the same
semantics here as the runtime exceptions. The benefit of the pure
formulation is that the semantics of runtime exceptions is built-in; but
the semantics of the pure data is programmable, and we will take
advantage of this fact below.
Validation applicative
You get a thousand-line file with numbers from your accountant. He asks you to sum them up because his enterprise software crashes mysteriously when trying to read it.
You accept the challenge, knowing that your Haskell program won’t let you down. The program tells you
Bad number on line 378: 12o0
— I see! Someone put o instead of zero. Let me fix it.
You locate the line 378 in your editor and replace 12o0
with 1200
. Then you save the file, exit the editor, and
re-run the program.
Bad number on line 380: 11i3
— Come on! There’s another similar mistake just two lines below. Except now 1 got replaced by i. If you told me about both errors from the beginning, I could fix them faster!
Indeed, there’s no reason why our program couldn’t try to parse every line in the file and tell us about all the mistakes at once.
Except now we can’t use the standard Monad
and
Applicative
instances of Either
. We need the
Validation
applicative.
The Validation
applicative combines two Either values in
such a way that, if they are both Left
, their left values
are combined with a monoidal operation. (In fact, even a
Semigroup
would suffice.) This allows us to collect errors
from different lines.
newtype Validation e a = Validation { getValidation :: Either e a }
deriving Functor
instance Monoid e => Applicative (Validation e) where
pure = Validation . Right
Validation a <*> Validation b = Validation $
case a of
Right va -> fmap va b
Left ea -> either (Left . mappend ea) (const $ Left ea) b
The following example demonstrates the difference between the
standard Applicative
instance and the
Validation
one:
> let e1 = Left "error1"; e2 = Left " error2"
> e1 *> e2
Left "error1"
> getValidation $ Validation e1 *> Validation e2
Left "error1 error2"
A clever implementation of the same applicative functor exists inside the transformers package. Ross Paterson observes that this functor can be constructed as
type Errors e = Lift (Constant e)
(see Control.Applicative.Lift
).
Anyway, let’s use this to improve our summing program.
printSum3 :: FilePath -> IO ()
=
printSum3 path either (die . intercalate "\n") print .
sum .
liftM . sequenceA .
getValidation map (Validation . left (\e -> [e])) .
zipWith parseNum [1..] .
lines =<< readFile path
Now a single invocation of the program shows all the errors it can find:
Bad number on line 378: 12o0
Bad number on line 380: 11i3
Exercise. Could we use Writer [String]
to collect error messages?
Exercise. When appending lists, there is a danger of
incurring quadratic complexity. Does that happen in the above function?
Could it happen in a different function that uses the
Validation
applicative based on the list monoid?
Smarter Validation applicative
Next day your accountant sends you another thousand-line file to sum up. This time your terminal gets flooded by error messages:
Bad number on line 1: 27297.
Bad number on line 2: 11986.
Bad number on line 3: 18938.
Bad number on line 4: 22820.
...
You already see the problem: every number ends with a dot. This is trivial to diagnose and fix, and there is absolutely no need to print a thousand error messages.
In fact, there are two different reasons to limit the number of reported errors:
- User experience: it is unlikely that the user will pay attention to more than, say, 10 messages at once. If we try to display too many errors on a web page, it may get slow and ugly.
- Efficiency: if we agree it’s only worth printing the first 10 errors, then, once we gather 10 errors, there is no point processing the data further.
Turns out, each of the two goals outlined above will need its own mechanism.
Bounded lists
We first develop a list-like datatype which stores only the first
n
elements and discards anything else that may get
appended. This primarily addresses our first goal, user experience,
although it will be handy for achieving the second goal too.
Although for validation purposes we may settle with the limit of 10, it’s nice to make this a generic, reusable type with a flexible limit. So we’ll make the limit a part of the type, taking advantage of the type-level number literals.
Exercise. Think of the alternatives to storing the limit in the type. What are their pros and cons?
On the value level, we will base the new type on difference lists, to avoid the quadratic complexity issue that I allude to above.
data BoundedList (n :: Nat) a =
BoundedList
!Integer -- current length of the list
Endo [a]) (
Exercise. Why is it important to cache the current length instead of computing it from the difference list?
Once we’ve figured out the main ideas (encoding the limit in the type, using difference lists, caching the current length), the actual implementation is straightforward.
singleton :: KnownNat n => a -> BoundedList n a
= fromList [a]
singleton a
toList :: BoundedList n a -> [a]
BoundedList _ (Endo f)) = f []
toList (
fromList :: forall a n . KnownNat n => [a] -> BoundedList n a
= BoundedList (min len limit) (Endo (genericTake limit lst ++))
fromList lst where
= natVal (Proxy :: Proxy n)
limit = genericLength lst
len
instance KnownNat n => Monoid (BoundedList n a) where
mempty = BoundedList 0 mempty
mappend b1@(BoundedList l1 f1) (BoundedList l2 f2)
| l1 >= limit = b1
| l1 + l2 <= limit = BoundedList (l1 + l2) (f1 <> f2)
| otherwise = BoundedList limit (f1 <> Endo (genericTake (limit - l1)) <> f2)
where
= natVal (Proxy :: Proxy n)
limit
full :: forall a n . KnownNat n => BoundedList n a -> Bool
BoundedList l _) = l >= natVal (Proxy :: Proxy n)
full (
null :: BoundedList n a -> Bool
null (BoundedList l _) = l <= 0
SmartValidation
Now we will build the smart validation applicative which stops doing
work when it doesn’t make sense to collect errors further anymore. This
is a balance between the Either
applicative, which can only
store a single error, and Validation
, which collects all of
them.
Implementing such an applicative functor is not as trivial as it may appear at first. In fact, before reading the code below, I recommend doing the following
Exercise. Try implementing a type and an applicative instance for it which adheres to the above specification.
Did you try it? Did you succeed? This is not a rhetorical question, I am actually interested, so let me know. Is your implementation the same as mine, or is it simpler, or more complicated?
Alright, here’s my implementation.
newtype SmartValidation (n :: Nat) e a = SmartValidation
getSmartValidation :: forall r .
{Either (BoundedList n e) (a -> r) -> Either (BoundedList n e) r }
deriving Functor
instance KnownNat n => Applicative (SmartValidation n e) where
pure x = SmartValidation $ \k -> k <*> Right x
SmartValidation a <*> SmartValidation b = SmartValidation $ \k ->
let k' = fmap (.) k in
case a k' of
Left errs | full errs -> Left errs
-> b r r
And here are some functions to construct and analyze
SmartValidation
values.
-- Convert SmartValidation to Either
fatal :: SmartValidation n e a -> Either [e] a
= left toList . ($ Right id) . getSmartValidation
fatal
-- Convert Either to SmartValidation
nonFatal :: KnownNat n => Either e a -> SmartValidation n e a
= SmartValidation $ (\k -> k <+> left singleton a)
nonFatal a
-- like <*>, but mappends the errors
<+>)
( :: Monoid e
=> Either e (a -> b)
-> Either e a
-> Either e b
<+> b = case (a,b) of
a Right va, Right vb) -> Right $ va vb
(Left e, Right _) -> Left e
(Right _, Left e) -> Left e
(Left e1, Left e2) -> Left $ e1 <> e2 (
Exercise. Work out what fmap (.) k
does
in the definition of <*>
.
Exercise. In the definition of
<*>
, should we check whether k
is full
before evaluating a k'
?
Exercise. We developed two mechanisms —
BoundedList
and SmartValidation
, which seem to
do about the same thing on different levels. Would any one of these two
mechanisms suffice to achieve both our goals, user experience and
efficiency, when there are many errors being reported?
Exercise. If the SmartValidation
applicative was based on ordinary lists instead of difference lists,
would we be less or more likely to run into the quadratic complexity
problem compared to simple Validation
?
Conclusion
Although the Validation
applicative is known among
Haskellers, the need to limit the number of errors it produces is rarely
(if ever) discussed. Implementing an applicative functor that limits the
number of errors and avoids doing extra work is somewhat tricky. Thus, I
am happy to share my solution and curious about how other people have
dealt with this problem.