useStreamQuery disconnecting

Hello,

I noticed that in our @daml/react based app, the websockets opened by the useStreamQuery hook typically close with error code 1006 after several minutes (on both Chrome and Firefox). This happens even when the app just sits open idly, without any changes or re-renders being triggered in the React components. We’re using SDK version 1.5.0 for DAML and the JS libraries, and I’m seeing this behavior when the frontend is deployed on DABL.

In a local sandbox environment, I noticed that after a few minutes the websocket will close with the console error WebSocket connection to 'ws://localhost:3000/v1/stream/query' failed: One or more reserved bits are on: reserved1 = 1, reserved2 = 0, reserved3 = 0

I’m not sure if this is related to the error seen when deployed, but thought it worth mentioning. Any ideas for what’s going on?

Hi @Alex_Matson,

The WebSocket connection can fail for a number of reasons, but should gracefully and transparently reconnect. Error code 1006 is a pretty low-level error, which may indicate a connection issue between your browser and the server.

How could I reproduce this? Do you have a minimal repro for the local issue? Does it happen with the raw create-daml-app template? Is the dabl-hosted app accessible?

The connection issue is looking to be a likely culprit. Typically as I’ve been testing this, throughout yesterday and today, I open the app and leave it running while doing other things, then check back on it and see the errors. While away from it, I don’t notice any significant network disruptions (such as my wi-fi disconnecting), but it seems plausible that there might be some momentary interruption that causes the websocket to break.

I have also had my coworker test this, and he has been able to observe the issue as well, but less often than I do. One thing that seems to increase the odds of triggering it is to open the app in multiple tabs.

There’s a hosted version available here: https://f9i31vsdzjj5ulhs.projectdabl.com/

Log in with DABL, choose the Investor role (first option), and fill out the sign up form (can be fake info of course). Afterwards you should be on the investor home screen, and hopefully you’ll observe this within 3 minutes.

The source for that particular landing page is here: da-marketplace/Investor.tsx at 5be53aae3e11382a8835f7fbde98cc9d5a429b8a (digital-asset/da-marketplace on GitHub)

Hi @Alex_Matson,

I have opened the application as instructed, then forgot about it for a day. I’ve had losses of connection, VPN connections/disconnections, and put my laptop to sleep, so it’s hard to isolate a single cause. However, I did have a handful of similar messages in the console when I checked again today.

It is completely expected that a WebSocket connection may fail for various reasons. Therefore, our library (the underlying daml/ledger one) tries to reconnect when that happens; you have some control over that behaviour through the reconnectThreshold parameter: the stream will only try to reconnect if it had previously been live for at least that amount of milliseconds (default 30 000).

In most cases, if these disconnects are infrequent enough (i.e. less frequent than reconnectThreshold), this should have no impact on your application, assuming the stream can establish a new WebSocket connection. However, prior to v1.6.0-snapshot.20200922.5258.0.cd4a06db (and that seems to apply to you as the linked code is on 1.5.0), there was a bug in the behaviour of streamSubmit whereby upon any connection issue, the close event would trigger, regardless of whether the stream was able to reconnect. That close event is handled by our React wrapper (useStreamQuery) to indicate the component should be put in a loading state (and log that error you’re seeing), which can thus trigger too eagerly.

Hopefully v1.6.0-snapshot.20200922.5258.0.cd4a06db should fix that by making the behaviour more actionable: the close handlers on the stream will now only get called when the connection is really broken (i.e. it has failed too quickly and will not attempt to reconnect), at which point your application should try to handle it somehow.

Would you mind trying a recent snapshot (v1.6.0-snapshot.20200922.5258.0.cd4a06db or v1.6.0-snapshot.20200930.5312.0.b9a1905d) and seeing whether that improves your situation? You should be able to run 1.6-snapshot JS code against a 1.5 ledger with no issue.

I noticed similar behavior when running with snapshot v1.6.0-snapshot.20200930.5312.0.b9a1905d, however this time the errors do seem to be getting caught at a higher level in the library (code 4001 rather than 1006). This one is hosted on https://on9yz8uh8kokqosf.projectdabl.com/ if you’d like to take a look

I’ll experiment with trying different values of reconnectThreshold, but as an alternative is there a way to detect and respond to close events from the application level?

Hi @Alex_Matson,

Looking through the useStreamQuery code (not very familiar with this part), it looks like there is no provision to notify the application on stream close. As far as I can tell this means the application cannot recover from a stream close.

That seems problematic, so I’ll look into fixing that. I don’t really have a workaround for you in the meantime, unfortunately.

What’s the supported method of modifying reconnectThreshold? I see it’s an option to the Ledger class in @daml/ledger, but it doesn’t seem to be available as a prop when creating a <DamlLedger> component with @daml/react (which is how I would expect to supply it)

Yes, that is also what I would expect. It appears you have no way to set this from the React bindings at the moment. I’m really sorry I don’t have a better answer for you at this time.

Hi @Alex_Matson,

I have just merged two PRs that will allow users to:

  • Specify the reconnectThreshold value as an optional field in LedgerProps (value in ms, default 30 000, i.e. 30s).
  • Pass in an optional 4th argument to useStreamQuery and useFetchByKey; this argument should be typed (StreamCloseEvent) => void and will be called when the stream closes.
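
As a rough illustration of what such a close handler could look like, here is a minimal sketch. It assumes the close event carries a numeric code and a reason string; the StreamCloseEvent type below is a local stand-in rather than the library’s own export, and StreamStatus and makeCloseHandler are hypothetical names used only for this example.

```typescript
// Local stand-in for the event type (assumption: a numeric code plus a reason string).
type StreamCloseEvent = { code: number; reason: string };

// Hypothetical application-level status for one stream.
type StreamStatus = { connected: boolean; lastError?: string };

// Build a handler of type (StreamCloseEvent) => void that reports the close
// to the application through the supplied callback.
function makeCloseHandler(
  onStatus: (status: StreamStatus) => void,
): (event: StreamCloseEvent) => void {
  return (event) => {
    onStatus({ connected: false, lastError: `${event.code}: ${event.reason}` });
  };
}
```

In a component, the callback would typically be a state setter, and the resulting handler would be what gets passed as the optional 4th argument described above.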

It’s unfortunately too late for inclusion in the 1.6 line of releases, but should be part of the very first 1.7-snapshot, which should be published this Wednesday. While I can’t stress enough that you shouldn’t rely on snapshots for production use, I would appreciate your feedback on this feature, so please consider testing the 1.7-snapshots.

Hi @Alex_Matson,

We have just released our weekly snapshot, dubbed 1.7.0-snapshot.20201006.5358.0.0c1cadcf, which includes the above changes. Would you mind trying it out? Any feedback you provide can be used to make this better before 1.7 rolls out.

@Gary_Verhaegen thank you for all the assistance and attention you’ve been giving this issue, it is much appreciated!

I tested out the 1.7.0 snapshot, and the situation appears to be much improved. I should also mention that the code has a lot of templates that are being streamed, so we have about 15-16 concurrent websocket connections open at a time. I’m not sure if that’s responsible for what I’ve been seeing, but it’s something to consider.

That said, whereas initially I was noticing some websockets getting dropped after a few minutes, they now persist much longer on average. I tested today and saw about 20 minutes (though there are some still alive as I type this), and last night they lasted up to about 45 minutes.

I would say that is quite reasonable, and at that point the user can refresh/reload the page to get new connections going again. Thus, I will mark this thread as solved for now, and let you know if any further developments arise.

The only thing that I find curious is that despite having a reconnectThreshold of 0, when a websocket closes I see the library attempts to reopen a new one, but the new one closes immediately, and no further reconnect attempts are made. I wonder if this is due to the react library, or perhaps something on the DABL server side that rejected them?

Here’s a screenshot that highlights four such websockets over time

The console only reports these closings with the error 4001 and reason "ws connection failed". I thought you might be interested to know about that.

Glad I could help!

What you describe matches the behaviour I would expect given the code we currently have. For reference, you can find the relevant code here. The logic, as currently defined, is to only try to reconnect if the connection had been alive for at least reconnectThreshold milliseconds on the previous attempt. In other words (see the isLiveSince variable in that block), if the connection fails immediately after a reconnect, it will stop trying to reconnect, which seems to be what you’re observing.
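
To make that rule concrete, here is a simplified model of the decision. It is a sketch, not the library’s actual code: shouldReconnect and its parameters are hypothetical names, and treating a connection that never went live as "do not reconnect" is an assumption, chosen because it would explain a retry stopping even with a reconnectThreshold of 0.

```typescript
// Simplified, hypothetical model of the reconnect rule; not the library's code.
function shouldReconnect(
  liveSince: number | undefined, // ms timestamp when the connection went live; undefined if it never did
  closedAt: number,              // ms timestamp when the close was observed
  reconnectThreshold: number,    // minimum live time in ms before a reconnect is attempted
): boolean {
  // Assumption: a connection that never went live is not retried.
  if (liveSince === undefined) return false;
  // Retry only if the previous connection stayed live long enough.
  return closedAt - liveSince >= reconnectThreshold;
}
```

Under this model, a stream that was live for 45 seconds with the default threshold of 30 000 would reconnect, while a replacement socket that closes before ever going live would not, whatever the threshold.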

I suppose we could try to have a more sophisticated behaviour, say some sort of exponential backoff with setTimeout, but ultimately I don’t think we can make up a generic approach that works for all applications. I’m happy to take suggestions on that, but in the meantime I think the best path forward is for you to handle the reconnection behaviour you want (probably set the component to some sort of loading state so the user knows it’s disconnected, perhaps display a timer of when you’ll try to reconnect next, and offer a button to retry now or something?) using the new closeHandler hook.
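
For anyone who wants to experiment with the backoff idea at the application level, a minimal sketch follows. Everything in it is hypothetical (backoffDelay, retryWithBackoff, the 1 s base and 60 s cap), and the connect callback stands in for whatever your application does to re-establish its streams; how that is wired into a component is left out.

```typescript
// Delay in ms before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Call `connect` until it resolves to true or `maxAttempts` is exhausted,
// sleeping with setTimeout between failed attempts.
async function retryWithBackoff(
  connect: () => Promise<boolean>, // hypothetical: resolves true once reconnected
  maxAttempts: number,
  baseMs = 1000,
  maxMs = 60000,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await connect()) return true;
    if (attempt < maxAttempts - 1) {
      const delay = backoffDelay(attempt, baseMs, maxMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  return false;
}
```

With the defaults, the waits would be 1 s, 2 s, 4 s and so on up to the 60 s cap; the same delay value could also drive a "retrying in N seconds" timer in the UI.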

Hi @Gary_Verhaegen @Alex_Matson
I encountered the same error with nginx proxy settings. Are there any updates on keeping the WebSocket connection alive?
Could we have something like a ping mechanism in the dabl lib to keep the connection active?

Hi @rtly,

I’m not sure what more I can say than what I said above. As far as I’m aware, that part of the code has not significantly changed in the past two years. I still don’t know what else to do in a generic library.