# Smarter validation

Published on

Today we’ll explore different ways of handling and reporting errors in Haskell. We shall start with the well-known `Either`

monad, proceed to a somewhat less common `Validation`

applicative, and then improve its efficiency and user experience.

The article contains several exercises that will hopefully help you better understand the issues that are being addressed here.

## Running example

```
{-# LANGUAGE GeneralizedNewtypeDeriving, KindSignatures, DataKinds,
ScopedTypeVariables, RankNTypes, DeriveFunctor #-}
import Text.Printf
import Text.Read
import Control.Monad
import Control.Applicative
import Control.Applicative.Lift (Lift)
import Control.Arrow (left)
import Data.Functor.Constant (Constant)
import Data.Monoid
import Data.Traversable (sequenceA)
import Data.List (intercalate, genericTake, genericLength)
import Data.Proxy
import System.Exit
import System.IO
import GHC.TypeLits
```

Our running example will consist of reading a list of integer numbers from a file, one number per line, and printing their sum.

Here’s the simplest way to do this in Haskell:

This code works as expected for a well-formed file; however, if a line in the file can’t be parsed as a number, we’ll get unhelpful

`Prelude.read: no parse`

## Either monad

Let’s rewrite our function to be aware of possible errors.

```
parseNum
:: Int -- line number (for error reporting)
-> String -- line contents
-> Either String Integer
-- either parsed number or error message
parseNum ln str =
case readMaybe str of
Just num -> Right num
Nothing -> Left $
printf "Bad number on line %d: %s" ln str
-- Print a message and exit
die :: String -> IO ()
die msg = do
hPutStrLn stderr msg
exitFailure
printSum2 :: FilePath -> IO ()
printSum2 path =
either die print .
liftM sum .
sequence . zipWith parseNum [1..] .
lines =<< readFile path
```

Now, upon reading a line that is not a number, we’d see something like

`Bad number on line 2: foo`

This is a rather standard usage of the Either monad, so I won’t get into details here. I’ll just note that there are two ways in which this version is different from the first one:

- We call
`readMaybe`

instead of`read`

and, upon detecting an error, construct a helpful error message. For this reason, we keep track of the line number. - Instead of throwing a runtime exception right away (using the
`error`

function), we return a pure`Either`

value, and then combine these`Either`

s together using the`Moand Either`

isntance.

The two changes are independent; there’s no reason why we couldn’t use `error`

and get the same helpful error message. The exceptions emulated by the `Either`

monad have the same semantics here as the runtime exceptions. The benefit of the pure formulation is that the semantics of runtime exceptions is built-in; but the semantics of the pure data is programmable, and we will take advantage of this fact below.

## Validation applicative

You get a thousand-line file with numbers from your accountant. He asks you to sum them up because his enterprise software crashes mysteriously when trying to read it.

You accept the challenge, knowing that your Haskell program won’t let you down. The program tells you

`Bad number on line 378: 12o0`

*— I see! Someone put o instead of zero. Let me fix it.*

You locate the line 378 in your editor and replace `12o0`

with `1200`

. Then you save the file, exit the editor, and re-run the program.

`Bad number on line 380: 11i3`

*— Come on! There’s another similar mistake just two lines below. Except now 1 got replaced by i. If you told me about both errors from the beginning, I could fix them faster!*

Indeed, there’s no reason why our program couldn’t try to parse every line in the file and tell us about all the mistakes at once.

Except now we can’t use the standard `Monad`

and `Applicative`

instances of `Either`

. We need the `Validation`

applicative.

The `Validation`

applicative combines two Either values in such a way that, if they are both `Left`

, their left values are combined with a monoidal operation. (In fact, even a `Semigroup`

would suffice.) This allows us to collect errors from different lines.

```
newtype Validation e a = Validation { getValidation :: Either e a }
deriving Functor
instance Monoid e => Applicative (Validation e) where
pure = Validation . Right
Validation a <*> Validation b = Validation $
case a of
Right va -> fmap va b
Left ea -> either (Left . mappend ea) (const $ Left ea) b
```

The following example demonstrates the difference between the standard `Applicative`

instance and the `Validation`

one:

```
> let e1 = Left "error1"; e2 = Left " error2"
> e1 *> e2
Left "error1"
> getValidation $ Validation e1 *> Validation e2
Left "error1 error2"
```

A clever implementation of the same applicative functor exists inside the transformers package. Ross Paterson observes that this functor can be constructed as

(see `Control.Applicative.Lift`

).

Anyway, let’s use this to improve our summing program.

```
printSum3 :: FilePath -> IO ()
printSum3 path =
either (die . intercalate "\n") print .
liftM sum .
getValidation . sequenceA .
map (Validation . left (\e -> [e])) .
zipWith parseNum [1..] .
lines =<< readFile path
```

Now a single invocation of the program shows all the errors it can find:

```
Bad number on line 378: 12o0
Bad number on line 380: 11i3
```

**Exercise.** Could we use `Writer [String]`

to collect error messages?

**Exercise.** When appending lists, there is a danger of incurring quadratic complexity. Does that happen in the above function? Could it happen in a different function that uses the `Validation`

applicative based on the list monoid?

## Smarter Validation applicative

Next day your accountant sends you another thousand-line file to sum up. This time your terminal gets flooded by error messages:

```
Bad number on line 1: 27297.
Bad number on line 2: 11986.
Bad number on line 3: 18938.
Bad number on line 4: 22820.
...
```

You already see the problem: every number ends with a dot. This is trivial to diagnose and fix, and there is absolutely no need to print a thousand error messages.

In fact, there are two different reasons to limit the number of reported errors:

- User experience: it is unlikely that the user will pay attention to more than, say, 10 messages at once. If we try to display too many errors on a web page, it may get slow and ugly.
- Efficiency: if we agree it’s only worth printing the first 10 errors, then, once we gather 10 errors, there is no point processing the data further.

Turns out, each of the two goals outlined above will need its own mechanism.

### Bounded lists

We first develop a list-like datatype which stores only the first `n`

elements and discards anything else that may get appended. This primarily addresses our first goal, user experience, although it will be handy for achieving the second goal too.

Although for validation purposes we may settle with the limit of 10, it’s nice to make this a generic, reusable type with a flexible limit. So we’ll make the limit a part of the type, taking advantage of the type-level number literals.

**Exercise.** Think of the alternatives to storing the limit in the type. What are their pros and cons?

On the value level, we will base the new type on difference lists, to avoid the quadratic complexity issue that I allude to above.

**Exercise.** Why is it important to cache the current length instead of computing it from the difference list?

Once we’ve figured out the main ideas (encoding the limit in the type, using difference lists, caching the current length), the actual implementation is straightforward.

```
singleton :: KnownNat n => a -> BoundedList n a
singleton a = fromList [a]
toList :: BoundedList n a -> [a]
toList (BoundedList _ (Endo f)) = f []
fromList :: forall a n . KnownNat n => [a] -> BoundedList n a
fromList lst = BoundedList (min len limit) (Endo (genericTake limit lst ++))
where
limit = natVal (Proxy :: Proxy n)
len = genericLength lst
instance KnownNat n => Monoid (BoundedList n a) where
mempty = BoundedList 0 mempty
mappend b1@(BoundedList l1 f1) (BoundedList l2 f2)
| l1 >= limit = b1
| l1 + l2 <= limit = BoundedList (l1 + l2) (f1 <> f2)
| otherwise = BoundedList limit (f1 <> Endo (genericTake (limit - l1)) <> f2)
where
limit = natVal (Proxy :: Proxy n)
full :: forall a n . KnownNat n => BoundedList n a -> Bool
full (BoundedList l _) = l >= natVal (Proxy :: Proxy n)
null :: BoundedList n a -> Bool
null (BoundedList l _) = l <= 0
```

### SmartValidation

Now we will build the smart validation applicative which stops doing work when it doesn’t make sense to collect errors further anymore. This is a balance between the `Either`

applicative, which can only store a single error, and `Validation`

, which collects all of them.

Implementing such an applicative functor is not as trivial as it may appear at first. In fact, before reading the code below, I recommend doing the following

**Exercise.** Try implementing a type and an applicative instance for it which adheres to the above specification.

Did you try it? Did you succeed? This is not a rhetorical question, I am actually interested, so let me know. Is your implementation the same as mine, or is it simpler, or more complicated?

Alright, here’s my implementation.

```
newtype SmartValidation (n :: Nat) e a = SmartValidation
{ getSmartValidation :: forall r .
Either (BoundedList n e) (a -> r) -> Either (BoundedList n e) r }
deriving Functor
instance KnownNat n => Applicative (SmartValidation n e) where
pure x = SmartValidation $ \k -> k <*> Right x
SmartValidation a <*> SmartValidation b = SmartValidation $ \k ->
let k' = fmap (.) k in
case a k' of
Left errs | full errs -> Left errs
r -> b r
```

And here are some functions to construct and analyze `SmartValidation`

values.

```
-- Convert SmartValidation to Either
fatal :: SmartValidation n e a -> Either [e] a
fatal = left toList . ($ Right id) . getSmartValidation
-- Convert Either to SmartValidation
nonFatal :: KnownNat n => Either e a -> SmartValidation n e a
nonFatal a = SmartValidation $ (\k -> k <+> left singleton a)
-- like <*>, but mappends the errors
(<+>)
:: Monoid e
=> Either e (a -> b)
-> Either e a
-> Either e b
a <+> b = case (a,b) of
(Right va, Right vb) -> Right $ va vb
(Left e, Right _) -> Left e
(Right _, Left e) -> Left e
(Left e1, Left e2) -> Left $ e1 <> e2
```

**Exercise.** Work out what `fmap (.) k`

does in the definition of `<*>`

.

**Exercise.** In the definition of `<*>`

, should we check whether `k`

is full before evaluating `a k'`

?

**Exercise.** We developed two mechanisms — `BoundedList`

and `SmartValidation`

, which seem to do about the same thing on different levels. Would any one of these two mechanisms suffice to achieve both our goals, user experience and efficiency, when there are many errors being reported?

**Exercise.** If the `SmartValidation`

applicative was based on ordinary lists instead of difference lists, would we be less or more likely to run into the quadratic complexity problem compared to simple `Validation`

?

## Conclusion

Although the `Validation`

applicative is known among Haskellers, the need to limit the number of errors it produces is rarely (if ever) discussed. Implementing an applicative functor that limits the number of errors and avoids doing extra work is somewhat tricky. Thus, I am happy to share my solution and curious about how other people have dealt with this problem.