Constructors with multiple fields must give explicit field names

Hey,

When I write:

data Person = Person Text Int Int

I get the error:
Constructors with multiple fields must give explicit field names, e.g. Foo with bar : Int; baz : Int

I ran into this recently and was wondering whether there are technical reasons for it (LF), or whether it is more of a style-enforcement feature.

Both the ledger API (with verbose on) and LF require record fields to have names. As a result, anything downstream can, and often does, exploit this: Java codegen uses the field names in the records it generates, the field names are a required part of the JSON encoding for the JSON API, etc.

The most obvious objection is “you could just infer names like _1”. However, there is a big problem with that. It’s true that, from the point of view of Daml the language, the only difference between labelled and unlabelled records is convenience: you access them in different ways, but they represent the same data. However, the presence of labels in the API means that adding or removing labels would be an incompatible API change. So it is worth making that kind of change fully intentional, not to mention making the labels you present in the first place an explicit part of your interface.

Thank you!

So just to make sure I understand correctly, is the issue that if I write:

data Foo = Foo Int Int

And Daml were to turn this into the equivalent of

data Foo = Foo with _1: Int; _2: Int

at some point in the compilation process, then later someone would change the original definition to

data Foo = Foo with myField: Int; myOtherField: Int

then _1 would have to change to myField in the API, thereby breaking it?

Rather: then the API would change and all your integrations would break.

Would it solve this if _1 meant “the first field in the record, regardless of whether it has a name”?

I don’t think so. What if you suddenly change its type?

Note that if you really want your fields to be called _x, you can just use a tuple.
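To illustrate the tuple remark: in Daml a pair is itself a record whose fields are named _1 and _2, so positional names are already available without the compiler inferring anything. A minimal sketch, assuming the Daml.Script library is on hand; the module name and the values are made up for illustration:

```daml
module TupleFields where

import Daml.Script

-- A pair is a record with fields _1 and _2, so dot syntax works on it.
-- The values below are hypothetical.
pairExample : Script ()
pairExample = script do
  let p = ("Alice", 42)
  assert (p._1 == "Alice")
  assert (p._2 == 42)
```

The same data could of course be read with fst and snd; the point is only that the _1 and _2 labels here come from the tuple type, not from a user-defined constructor.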

Wouldn’t that break API, regardless of what the field is called?

I don’t want my fields to be called that. I’m just trying to understand why the example in my original post is not allowed, so I’m asking clarifying questions, and I was responding to @Stephen’s point about this.

Looking at the docs for Java codegen, I’m still not sure why this is not an issue with constructors with a single field.

If I have:

data Foo = Foo Int

and then I change it to

data Foo = Foo with bar: Int

doesn’t that lead to the same API change as with multiple field names? Why is that allowed then?

No, it would not, because the real issue is response formats, not input formats.

You are already allowed to leave off field names for inputs. Even JSON API allows this, though the feature is obscure.

But a verbose response, the field names in codegens, the response JSON format for JSON API, and everything else that incorporates field names due to metaprogramming or parsing records by looking up field names would break.

But why is this not an issue with single field constructors?

I understand that Daml is Daml and not Haskell, but it does use GHC as a frontend, so I often find that my colleagues and I approach it with a Haskell mindset. I think it would be great if Daml helped me understand why I’m not allowed to write code a certain way, e.g. if error messages referenced documentation explaining why a certain restriction is necessary.

Daml already sort of does this in some cases, e.g.: Modules compiled with the QuantifiedConstraints language extension might not work properly with data-dependencies. This might stop the whole package from being extensible or upgradable using other versions of the SDK. Use this language extension at your own risk.

Which also makes me wonder why one is outright forbidden and the other is at my own risk.

First, I don’t think the compatibility arguments really apply. A type is addressed by the triple (packageid, modulename, typename). Any change to a type is breaking because the package id changes.

The real reason imho is still related to client libraries: we generally try not to have any compiler-generated names in serializable types. Those names are relevant over the ledger API, so users would have to understand how they are generated, which is just confusing. Generating record field names automatically would violate that. You can argue that the generation is simple enough here that it doesn’t matter, which may be true, but I’d argue that your APIs get much cleaner if you do actually specify field names.

You could imagine the compiler enforcing this only for serializable types and allowing unnamed fields on non-serializable types. That would definitely be possible and somewhat reasonable imho, but it also adds implementation complexity.

Which also makes me wonder why one is outright forbidden and the other is at my own risk.

Allowing QuantifiedConstraints is effectively a no-op. Nothing in the compiler tries to handle them specially. Allowing unnamed fields would require the compiler to actually do work to support them (e.g. generate _1, _2, … field names). Of course that’s doable, but it’s not free.

But why is this not an issue with single field constructors?

Those are generated as variants not records.
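A small sketch of the two cases side by side may help here. It follows this thread's description (one unlabelled field is accepted and generated as a variant, several fields need labels and form a record); the module, type, and field names are all made up for illustration:

```daml
module FieldNames where

-- One unlabelled field: accepted as written. Per the reply above, this is
-- generated as a variant, so no field name has to be invented for it.
data Wrapped = Wrapped Int

-- Several fields: each one needs a label (x and y are hypothetical names).
data Point = Point with x : Int; y : Int

-- The single field is reached by pattern matching on the constructor.
unwrap : Wrapped -> Int
unwrap (Wrapped n) = n

-- Record fields are reached through their labels.
sumPoint : Point -> Int
sumPoint p = p.x + p.y
```

Writing Point as `data Point = Point Int Int` instead would trigger the error quoted at the top of the thread.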

We do not say that “all your client code is invalid when you change package ID”. There is “incompatible” and there is incompatible, and it behooves us to help users make as many of the former and as few of the latter cases as possible.

As it turns out, this is documented; I just wasn’t able to find it. It’s under “Banned declarations”.