Trying to implement recursion schemes but bumped into issue when testing in script

Hello there!
While I was trying to play with contingent-claims library, I followed adventures in uncertainty: Recursion Schemes, Part II: A Mob of Morphisms to learn recursion schemes
I was trying to implement cata in daml following the post as below

import Daml.Script
import Daml.Control.Arrow

data Expr a = 
    Literal Int
  | Indent Text
  | Unary with sign: Text; sub: a 
  | Binary with l: a; sign: Text; r: a
  | Call with func: a; args: [a]
  | Paren a
  deriving (Show, Eq, Functor)

newtype Term f = Term { out : f (Term f) }

cata : Functor f => (f b -> b) -> Term f -> b
cata f = out >>> fmap (cata f) >>> f

testScript = script do
  let 
    ten = Term (Literal 10)
    five = Term (Literal 5)
    add = Term (Indent "add")
    call = Term (Call with func=add; args = [ten, five])

    countNodes : Expr Int -> Int    
    countNodes (Literal _) = 1
    countNodes (Indent _) = 1
    countNodes (Unary _ arg) = arg + 1
    countNodes (Binary left _ right) = left + right + 1
    countNodes (Paren arg) = arg + 1
    countNodes (Call fn args) = fn + sum args
    
    totalNodes = cata countNodes call
  debug totalNodes
  pure ()

trying to run the above script will spawn up a java process taking all my cpu/memory and ultimately crash my pc, I am not sure what is wrong with this implementation.
Can someone points me a direction?
(I also notice that in daml-ctl library Recursion.daml we have a similar but different implementation, maybe there’s a reason that we don’t use structure like Term in daml as a wrapper?)

That’s a great blog - I learned this stuff from there too.

I suspect I know what the problem is - I remember when I wrote it I ran into a problem in how cata is defined. Can you try changing it to

cata f a = (out >>> fmap  (cata f) >>> f) a

And see if that fixes it?

Yes, what’s called Term in the blog is the Fix type in Haskell. The approach we use is a little different; it’s explained in one of the last blog posts in that series, so keep reading :smile:! If you’re interested you can also look at the recursion-schemes Haskell library, which implements both approaches.

The reason we were forced to chose the latter representation is that Fix (or Term) can not be serialized in Daml, which doesn’t support serialization of higher-order kinds (the f : * -> *) parameter.

@Luciano it worked after adding “a” back! Thanks a lot.
May I ask why does compiler view the two expressions differently?

And cool, I will keep reading :stuck_out_tongue:

I suspect this is a bug. Let’s hear what someone from the core engineering teams has to say.

Haven’t looked into it deeply but this sounds like the classical issues you run into when copying code from a lazy language to a strict language. You start evaluating once you have applied all arguments which in this case is too soon and you just end up looping. By adding the argument you delay evaluation.

When you look at the use of cata it’s natural to think of how it will apply to Term Expr, which as you’ve defined it indeed is valid as a strict data structure. This isn’t always the case; Term (Int,) isn’t valid as a strict data structure; Term [] is, because you can use the empty list.

However, the issue of Term Expr’s validity is totally different from the evaluation of cata, in a strict language like Daml.

In Haskell, the evaluation of the subexpression (cata f) is demand-driven; when Expr’s fmap reaches a base case, e.g. Literal, it won’t evaluate that subexpression (because fmap on Literal doesn’t use the function), thus effectively terminating the recursion. So the fact that you’re using Expr matters.

However, that isn’t true in a strict language like Daml. When you apply cata to f, the whole definition and every subexpression must be evaluated before a valid value is produced. And that includes the recursive call, which includes another recursive call, ad infinitum. So Expr and its fmap never come into consideration; in fact they are never even called before you run out of memory, because none of the subexpressions to produce fmap’s argument ever terminate.

@Luciano’s variation redefines the subexpression cata f to evaluate, instead of to another copy of the same definition, to the lambda \a -> (out >>> fmap (cata f) >>> f) a. Which is already fully evaluated, and will only be expanded on-demand, and more importantly will not be expanded for the fmap base cases e.g. Literal. It comes from the observation that for all expressions k : a -> b, also (\x -> k x) : a -> b but with delayed evaluation of k. Thus delaying evaluation as @cocreature mentioned.

3 Likes

Thank all of you for the swift replies. (and wow, @Stephen your answer is so detailed and educational!!)

Actually, I wonder if it’s possible for the compiler to tell me that this is ad infinitum (so noob like me won’t walk right into it, :joy:)
Anyway great thanks to all, got my answer and will continue my journey in recursion-schemes!!