Smarter validation
Published on ; updated on
Today we’ll explore different ways of handling and reporting errors
in Haskell. We shall start with the well-known Either
monad, proceed to a somewhat less common Validation
applicative, and then improve its efficiency and user experience.
The article contains several exercises that will hopefully help you better understand the issues that are being addressed here.
See also: Lazy validation.
Running example
{-# LANGUAGE GeneralizedNewtypeDeriving, KindSignatures, DataKinds,
ScopedTypeVariables, RankNTypes, DeriveFunctor #-}
import Text.Printf
import Text.Read
import Control.Monad
import Control.Applicative
import Control.Applicative.Lift (Lift)
import Control.Arrow (left)
import Data.Functor.Constant (Constant)
import Data.Monoid
import Data.Traversable (sequenceA)
import Data.List (intercalate, genericTake, genericLength)
import Data.Proxy
import System.Exit
import System.IO
import GHC.TypeLitsOur running example will consist of reading a list of integer numbers from a file, one number per line, and printing their sum.
Here’s the simplest way to do this in Haskell:
printSum1 :: FilePath -> IO ()
printSum1 path = print . sum . map read . lines =<< readFile pathThis code works as expected for a well-formed file; however, if a line in the file can’t be parsed as a number, we’ll get unhelpful
Prelude.read: no parse
Either monad
Let’s rewrite our function to be aware of possible errors.
parseNum
:: Int -- line number (for error reporting)
-> String -- line contents
-> Either String Integer
-- either parsed number or error message
parseNum ln str =
case readMaybe str of
Just num -> Right num
Nothing -> Left $
printf "Bad number on line %d: %s" ln str
-- Print a message and exit
die :: String -> IO ()
die msg = do
hPutStrLn stderr msg
exitFailure
printSum2 :: FilePath -> IO ()
printSum2 path =
either die print .
liftM sum .
sequence . zipWith parseNum [1..] .
lines =<< readFile pathNow, upon reading a line that is not a number, we’d see something like
Bad number on line 2: foo
This is a rather standard usage of the Either monad, so I won’t get into details here. I’ll just note that there are two ways in which this version is different from the first one:
- We call
readMaybeinstead ofreadand, upon detecting an error, construct a helpful error message. For this reason, we keep track of the line number. - Instead of throwing a runtime exception right away (using the
errorfunction), we return a pureEithervalue, and then combine theseEithers together using theMoand Eitherisntance.
The two changes are independent; there’s no reason why we couldn’t
use error and get the same helpful error message. The
exceptions emulated by the Either monad have the same
semantics here as the runtime exceptions. The benefit of the pure
formulation is that the semantics of runtime exceptions is built-in; but
the semantics of the pure data is programmable, and we will take
advantage of this fact below.
Validation applicative
You get a thousand-line file with numbers from your accountant. He asks you to sum them up because his enterprise software crashes mysteriously when trying to read it.
You accept the challenge, knowing that your Haskell program won’t let you down. The program tells you
Bad number on line 378: 12o0
— I see! Someone put o instead of zero. Let me fix it.
You locate the line 378 in your editor and replace 12o0
with 1200. Then you save the file, exit the editor, and
re-run the program.
Bad number on line 380: 11i3
— Come on! There’s another similar mistake just two lines below. Except now 1 got replaced by i. If you told me about both errors from the beginning, I could fix them faster!
Indeed, there’s no reason why our program couldn’t try to parse every line in the file and tell us about all the mistakes at once.
Except now we can’t use the standard Monad and
Applicative instances of Either. We need the
Validation applicative.
The Validation applicative combines two Either values in
such a way that, if they are both Left, their left values
are combined with a monoidal operation. (In fact, even a
Semigroup would suffice.) This allows us to collect errors
from different lines.
newtype Validation e a = Validation { getValidation :: Either e a }
deriving Functor
instance Monoid e => Applicative (Validation e) where
pure = Validation . Right
Validation a <*> Validation b = Validation $
case a of
Right va -> fmap va b
Left ea -> either (Left . mappend ea) (const $ Left ea) bThe following example demonstrates the difference between the
standard Applicative instance and the
Validation one:
> let e1 = Left "error1"; e2 = Left " error2"
> e1 *> e2
Left "error1"
> getValidation $ Validation e1 *> Validation e2
Left "error1 error2"
A clever implementation of the same applicative functor exists inside the transformers package. Ross Paterson observes that this functor can be constructed as
type Errors e = Lift (Constant e)(see Control.Applicative.Lift).
Anyway, let’s use this to improve our summing program.
printSum3 :: FilePath -> IO ()
printSum3 path =
either (die . intercalate "\n") print .
liftM sum .
getValidation . sequenceA .
map (Validation . left (\e -> [e])) .
zipWith parseNum [1..] .
lines =<< readFile pathNow a single invocation of the program shows all the errors it can find:
Bad number on line 378: 12o0
Bad number on line 380: 11i3
Exercise. Could we use Writer [String]
to collect error messages?
Exercise. When appending lists, there is a danger of
incurring quadratic complexity. Does that happen in the above function?
Could it happen in a different function that uses the
Validation applicative based on the list monoid?
Smarter Validation applicative
Next day your accountant sends you another thousand-line file to sum up. This time your terminal gets flooded by error messages:
Bad number on line 1: 27297.
Bad number on line 2: 11986.
Bad number on line 3: 18938.
Bad number on line 4: 22820.
...
You already see the problem: every number ends with a dot. This is trivial to diagnose and fix, and there is absolutely no need to print a thousand error messages.
In fact, there are two different reasons to limit the number of reported errors:
- User experience: it is unlikely that the user will pay attention to more than, say, 10 messages at once. If we try to display too many errors on a web page, it may get slow and ugly.
- Efficiency: if we agree it’s only worth printing the first 10 errors, then, once we gather 10 errors, there is no point processing the data further.
Turns out, each of the two goals outlined above will need its own mechanism.
Bounded lists
We first develop a list-like datatype which stores only the first
n elements and discards anything else that may get
appended. This primarily addresses our first goal, user experience,
although it will be handy for achieving the second goal too.
Although for validation purposes we may settle with the limit of 10, it’s nice to make this a generic, reusable type with a flexible limit. So we’ll make the limit a part of the type, taking advantage of the type-level number literals.
Exercise. Think of the alternatives to storing the limit in the type. What are their pros and cons?
On the value level, we will base the new type on difference lists, to avoid the quadratic complexity issue that I allude to above.
data BoundedList (n :: Nat) a =
BoundedList
!Integer -- current length of the list
(Endo [a])Exercise. Why is it important to cache the current length instead of computing it from the difference list?
Once we’ve figured out the main ideas (encoding the limit in the type, using difference lists, caching the current length), the actual implementation is straightforward.
singleton :: KnownNat n => a -> BoundedList n a
singleton a = fromList [a]
toList :: BoundedList n a -> [a]
toList (BoundedList _ (Endo f)) = f []
fromList :: forall a n . KnownNat n => [a] -> BoundedList n a
fromList lst = BoundedList (min len limit) (Endo (genericTake limit lst ++))
where
limit = natVal (Proxy :: Proxy n)
len = genericLength lst
instance KnownNat n => Monoid (BoundedList n a) where
mempty = BoundedList 0 mempty
mappend b1@(BoundedList l1 f1) (BoundedList l2 f2)
| l1 >= limit = b1
| l1 + l2 <= limit = BoundedList (l1 + l2) (f1 <> f2)
| otherwise = BoundedList limit (f1 <> Endo (genericTake (limit - l1)) <> f2)
where
limit = natVal (Proxy :: Proxy n)
full :: forall a n . KnownNat n => BoundedList n a -> Bool
full (BoundedList l _) = l >= natVal (Proxy :: Proxy n)
null :: BoundedList n a -> Bool
null (BoundedList l _) = l <= 0SmartValidation
Now we will build the smart validation applicative which stops doing
work when it doesn’t make sense to collect errors further anymore. This
is a balance between the Either applicative, which can only
store a single error, and Validation, which collects all of
them.
Implementing such an applicative functor is not as trivial as it may appear at first. In fact, before reading the code below, I recommend doing the following
Exercise. Try implementing a type and an applicative instance for it which adheres to the above specification.
Did you try it? Did you succeed? This is not a rhetorical question, I am actually interested, so let me know. Is your implementation the same as mine, or is it simpler, or more complicated?
Alright, here’s my implementation.
newtype SmartValidation (n :: Nat) e a = SmartValidation
{ getSmartValidation :: forall r .
Either (BoundedList n e) (a -> r) -> Either (BoundedList n e) r }
deriving Functor
instance KnownNat n => Applicative (SmartValidation n e) where
pure x = SmartValidation $ \k -> k <*> Right x
SmartValidation a <*> SmartValidation b = SmartValidation $ \k ->
let k' = fmap (.) k in
case a k' of
Left errs | full errs -> Left errs
r -> b rAnd here are some functions to construct and analyze
SmartValidation values.
-- Convert SmartValidation to Either
fatal :: SmartValidation n e a -> Either [e] a
fatal = left toList . ($ Right id) . getSmartValidation
-- Convert Either to SmartValidation
nonFatal :: KnownNat n => Either e a -> SmartValidation n e a
nonFatal a = SmartValidation $ (\k -> k <+> left singleton a)
-- like <*>, but mappends the errors
(<+>)
:: Monoid e
=> Either e (a -> b)
-> Either e a
-> Either e b
a <+> b = case (a,b) of
(Right va, Right vb) -> Right $ va vb
(Left e, Right _) -> Left e
(Right _, Left e) -> Left e
(Left e1, Left e2) -> Left $ e1 <> e2Exercise. Work out what fmap (.) k does
in the definition of <*>.
Exercise. In the definition of
<*>, should we check whether k is full
before evaluating a k'?
Exercise. We developed two mechanisms —
BoundedList and SmartValidation, which seem to
do about the same thing on different levels. Would any one of these two
mechanisms suffice to achieve both our goals, user experience and
efficiency, when there are many errors being reported?
Exercise. If the SmartValidation
applicative was based on ordinary lists instead of difference lists,
would we be less or more likely to run into the quadratic complexity
problem compared to simple Validation?
Conclusion
Although the Validation applicative is known among
Haskellers, the need to limit the number of errors it produces is rarely
(if ever) discussed. Implementing an applicative functor that limits the
number of errors and avoids doing extra work is somewhat tricky. Thus, I
am happy to share my solution and curious about how other people have
dealt with this problem.