1Password Connect tokens randomly stop working
We've been using 1Password Connect for infrastructure secret management at our company for a few months now, and have run into this mysterious issue a couple of times already.
What happens is that, for reasons unknown to us, the access tokens used by the Connect servers to authenticate with the 1Password backend (the ones contained in the JSON file that we download when deploying a new Connect server) just stop working. By "stop working" we mean that the 1password/connect-sync logs fill up with the error "Signin credentials are not compatible with the provided user auth from server", and the Connect server no longer syncs with 1Password.
We run several independent deployments of our infrastructure (for different clients), each with its own Connect servers that grant access to different vaults. So far, each time this breakage has occurred, it seems to have hit all Connect servers at more or less the same time. For example, the first of these messages is timestamped 2024-04-20T14:54:54.063924705Z on one server and 2024-04-20T14:54:06.450593881Z on another.
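To make "more or less the same time" concrete, the gap between the first error timestamps can be computed directly. The snippet below is only a sketch: the helper names `parse_ts` and `spread_seconds` are made up for illustration, and the nanosecond fraction is truncated to microseconds because Python's `datetime` does not store more precision than that.

```python
from datetime import datetime, timezone

def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp like 2024-04-20T14:54:54.063924705Z.

    The fractional part is truncated to 6 digits (microseconds).
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = frac[:6].ljust(6, "0")
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)

def spread_seconds(timestamps: list[str]) -> float:
    """Return the seconds between the earliest and latest timestamp."""
    parsed = [parse_ts(t) for t in timestamps]
    return (max(parsed) - min(parsed)).total_seconds()

# The two first-error timestamps quoted in this post.
first_errors = [
    "2024-04-20T14:54:54.063924705Z",
    "2024-04-20T14:54:06.450593881Z",
]
print(f"{spread_seconds(first_errors):.3f} s apart")
```

For the two timestamps quoted here, that works out to a little under 48 seconds.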
To add to the mystery, we also saw that with two redundant Connect servers running in the same deployment (with the same 1password-credentials.json), both of which had been logging this error since April, one was serving outdated secrets while the other was somehow still serving up-to-date secrets (up to date as in: including changes made very recently, long after the error had first appeared in the logs).
Restarting the Connect server doesn't help when this happens. The only way we've found to get things back to normal is to completely re-create (all) the Connect servers through https://olisto.1password.com/developer-tools/infrastructure-secrets/connect/, re-deploy them, and then also re-provision all the applications with new access tokens.
Obviously it's extremely unfortunate that this happens. It means we have to do quite some work to get everything back in working order, but much worse than that, it causes outages and malfunctions in our systems that can be very hard to track down (partly because the Connect servers keep serving -outdated- secrets, so the resulting problems can be quite subtle). We would really like to understand what's happening here and how to prevent it from happening again. We have also started monitoring the connect-sync output for these errors, so we can catch them earlier when this happens again.
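For anyone wanting to set up similar monitoring, a log check along these lines might be a starting point. This is a minimal sketch, not a description of any particular setup: it assumes the connect-sync logs are available as a stream of text lines, and the function name `find_auth_errors` is hypothetical. The only thing taken from the real logs is the error message itself.

```python
from typing import Iterable, Iterator

# The error message as it appears in the connect-sync logs.
ERROR_MARKER = (
    "Signin credentials are not compatible with the provided user auth from server"
)

def find_auth_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield every log line that contains the credential error message."""
    for line in lines:
        if ERROR_MARKER in line:
            yield line.rstrip("\n")
```

Fed with `sys.stdin`, this can sit at the end of whatever command streams the container logs, and each yielded line can trigger an alert in the tooling of your choice.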
I'm pretty sure I've seen similar things happening (independently) with the access tokens used by the application (SDK) itself and/or with service account tokens, but I have no logs of that at hand.
1Password Version: Not Provided
Extension Version: Not Provided
OS Version: Ubuntu 22.04.4 LTS
Browser: Not Provided