Testing FFT with R

Published on

In the previous article, we have built a simple FFT algorithm in Haskell. Now it’s time to test it, for correctness and for (asymptotic) efficiency. If you expect the conclusion to be “it’s good” on both accounts, read on; it may get a bit more interesting than that.

This is also a good occasion to introduce and play with inline-r, a Haskell package that allows to run R code from within Haskell.

Testing correctness

In R, FFT is available straight out of the box, without a need to import a single package.

We use tasty and quickcheck to generate random lists of complex numbers, run R’s FFT and our implementation on these random inputs and compare the results.

{-# LANGUAGE QuasiQuotes, ScopedTypeVariables, DataKinds, PartialTypeSignatures, GADTs #-}
import Test.Tasty
import Test.Tasty.QuickCheck
import Test.QuickCheck.Monadic
import Data.Complex

-- inline-r imports
import qualified Foreign.R as R (cast)
import Foreign.R.Type as R -- SComplex
import H.Prelude (runRegion, withEmbeddedR, defaultConfig)
import Language.R.QQ (r)
import Language.R.HExp as HExp (HExp(Complex),hexp)
import Data.Vector.SEXP as R (toList)

-- the module we wrote in the previous article
import FFT

-- Call R's FFT
r_fft :: [Complex Double] -> IO [Complex Double]
r_fft nums = runRegion $ do
  r_result1 <- [r|fft(nums_hs)|]
  let r_result2 = R.cast R.SComplex r_result1
      HExp.Complex r_result3 = hexp r_result2
      r_result4 = R.toList r_result3
  return r_result4

main =
  withEmbeddedR defaultConfig
  . defaultMain
  . testProperty "Haskell vs. R"
  $ \(nums :: [Complex Double]) -> monadicIO $ do

    r_result <- run $ r_fft nums

    let haskell_result = fst $ fft nums

    assert $
      Prelude.length r_result == Prelude.length haskell_result &&
      all ((<1e-8) . magnitude) (zipWith (-) r_result haskell_result)

The result?

Haskell vs. R: OK (0.28s)
  +++ OK, passed 100 tests.

All 1 tests passed (0.28s)

That’s reassuring; at least we are computing the right thing. Now let’s see if we’ve managed to stay within our \(O(n \log n)\) bound.

Testing complexity

Luckily, our implementation already records the number of arithmetic operations it takes to compute the answer.

> snd . fft $ replicate 13 1
312

And we can follow this number as \(n\) grows:

> map (\n -> snd . fft $ replicate n 1) [1..20]
[0,4,12,16,40,36,84,48,144,100,220,96,312,196,420,128,544,324,684,240]

But these numbers don’t tell us much. How do we know if this is \(\Theta(n \log n)\) or \(\Theta(n^2)\)? For this, we will again turn to R.

R’s lm function approximates one value as a linear combination of other values. In this case, we’ll try to find a linear combination of \(n\), \(n \log n\), and \(n^2\) that approximates the number of operations it takes our FFT implementation to complete on an input of size \(n\). If it turns out that the coefficient of \(n^2\) in this combination is a non-negligible positive number, then the complexity of our algorithm is probably not \(O(n \log n)\).

{-# LANGUAGE QuasiQuotes #-}
import H.Prelude (runRegion, withEmbeddedR, defaultConfig)
import qualified H.Prelude as H
import Language.R.QQ (r)

import FFT

fft_ops :: Int -> Int
fft_ops n = snd . fft $ replicate n 1

analyze
  :: [Int] -- sequence of n's
  -> IO ()
analyze ns = runRegion $ do
  let ops = map fft_ops ns

      ns_dbl, ops_dbl :: [Double]
      ns_dbl = map fromIntegral ns
      ops_dbl = map fromIntegral ops

  H.print =<< [r| ops <- ops_dbl_hs; n <- ns_dbl_hs; lm(ops ~ I(n^2) + I(n*log(n)) + n)|]

main = withEmbeddedR defaultConfig $ do
  analyze [ 2^k   | k <- [1..10]]
  analyze [ 2*k-1 | k <- [1..30]]

First, we try \(n\)s which are powers of 2. Then the number of points for evaluation halves (while remaining even!) at each level of our divide-and-conquer algorithm.

Call:
lm(formula = ops ~ I(n^2) + I(n * log(n)) + n)

Coefficients:
  (Intercept)         I(n^2)  I(n * log(n))              n
    3.036e-13      1.840e-17      2.885e+00      3.399e-14

The number of steps is rather well approximated by \(2.885\, n \log n\). That’s a win! (By the way, can you tell or guess where the number 2.885 comes from?)

But remember, the number of points only shrinks when it’s even. What the number is odd from the very beginning? Then it’ll never reduce at all!

Call:
lm(formula = ops ~ I(n^2) + I(n * log(n)) + n)

Coefficients:
  (Intercept)         I(n^2)  I(n * log(n))              n
    2.657e-12      2.000e+00      1.834e-13     -2.000e+00

Our fears are confirmed; this time the \(n \log n\) coefficient is negligible, and the number of operations appears to be \(2(n^2-n)\).

For an arbitrary number \(n\), the efficiency of the algorithm depends on the largest power of 2 that divides \(n\). If \(n = 2^m q\), where \(m\) is odd, then \(n\) will halve during the first \(m\) steps, and then stabilize at \(q\).

Concluding remarks on inline-r

The way inline-r works is very cool. It takes advantage of some unique features of R; for example, that the type of R’s AST and the type of its runtime values is the same type, just like in Lisp.

So the quasi-quote [r|...|] works by calling a normal R function that parses the expression and returns its AST as an R value. Then inline-r traverses the AST, replaces metavariables like foo_hs by values dynamically constructed from foo, and passes the expression-as-a-value back to R for evaluation.

To learn more, refer to the paper Project H: Programming R in Haskell. Also, if you understand Russian, listen to our podcast where we discuss inline-r with Alexander Vershilov, one of its authors.

On the other hand, the code using inline-r may get somewhat bulky, as in the r_fft function. It’s hard for me to say yet whether this complexity is justified. The module organization is questionable; in our first program, accessing fairly basic functionality required importing 6 different modules. Finally, the documentation lacks in content and organization.

Yet this is a very impressive young project, and I would love to see its continued development.