1.6.2 -> 2.0.2 upgrade on AKS results in "LetsEncrypt not available on port :3002"

JimL · June 2021

I am running the SCIM Bridge on AKS. I have just upgraded from 1.6.2 to 2.0.2, following the instructions to update the configuration, but the container no longer listens on port 8080 or 8443. The logs from the pod are below:

6:37PM INF 1Password SCIM bridge, starting up application=op-scim version=2.0.2
6:37PM INF registering new health component application=op-scim component=RedisCache service=health version=2.0.2
6:37PM INF starting to poll components for health reports application=op-scim service=health version=2.0.2
6:37PM INF registering new health component application=op-scim component=SetupServer service=health version=2.0.2
6:37PM INF starting setup server addr=:3002 application=op-scim service=SetupServer version=2.0.2
6:37PM INF LetsEncrypt not available on port :3002. Unless you are using a custom load balancer, ensure OP_LETSENCRYPT_DOMAIN is set in your configuration. Refer to the documentation for more information. addr=:3002 application=op-scim service=SetupServer version=2.0.2

My op-scim-config.yaml definitely contains the hostname I've chosen as OP_LETSENCRYPT_DOMAIN. If I exec into the container and cat /proc/1/environ, I can see that hostname listed there as the value of OP_LETSENCRYPT_DOMAIN, so it's definitely making it into the container.
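
For anyone repeating this check: /proc/1/environ is NUL-separated, so it is easier to read one variable at a time. A minimal sketch, assuming a POSIX shell on the local machine and a placeholder pod name (op-scim-bridge-xxxx); env_value is a helper name invented here, not part of any tool:

```shell
# Read a NUL-separated environ stream on stdin and print the value of the
# variable named in $1 (prints nothing if the variable is absent).
env_value() {
  tr '\0' '\n' | sed -n "s/^$1=//p"
}

# Usage (the pod name is a placeholder, substitute your own):
#   kubectl exec op-scim-bridge-xxxx -- cat /proc/1/environ | env_value OP_LETSENCRYPT_DOMAIN
```

If this prints the expected hostname, the ConfigMap is reaching the process and the problem lies elsewhere.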

apiVersion: v1
kind: ConfigMap
metadata:
  name: op-scim-configmap
data:
  # Set this to the FQDN you've selected for your SCIM Bridge deployment
  OP_LETSENCRYPT_DOMAIN: "<my_hostname_redacted>"
  # (advanced) only change the options below if you need to
  OP_REDIS_URL: "redis://op-scim-redis:6379"
  OP_SESSION: "/secret/scimsession"
  OP_PRETTY_LOGS: "0"
  OP_DEBUG: "0"

Since the redeployment, I have updated the DNS entry for my hostname to point at the LoadBalancer Ingress IP address returned by kubectl describe service/op-scim-bridge.
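
One way to confirm the record really points at the load balancer is to compare the two values directly. A rough sketch, assuming the Service is named op-scim-bridge as in the command above, that it exposes an IP rather than a hostname, and that dig is installed locally; check_dns_matches_lb is a made-up helper name:

```shell
# Compare the Service's external IP with the A record for the given hostname.
# Returns 0 only when both are non-empty and identical.
check_dns_matches_lb() {
  host=$1
  # Assumption: the Service is called op-scim-bridge and has an ingress IP.
  lb_ip=$(kubectl get service op-scim-bridge \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  # With a CNAME chain, dig +short lists the final address last.
  dns_ip=$(dig +short "$host" A | tail -n 1)
  echo "LoadBalancer IP: $lb_ip"
  echo "DNS A record:    $dns_ip"
  [ -n "$lb_ip" ] && [ "$lb_ip" = "$dns_ip" ]
}

# Usage (hostname is a placeholder):
#   check_dns_matches_lb scim.example.com && echo "DNS matches"
```

A mismatch right after a DNS change may simply mean the old record is still cached somewhere.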

Even with OP_DEBUG set to "1" I don't get any more debugging information from the pod.

If I port forward port 3002 from the pod out to my local machine, in the browser I see the setup page, but entering my hostname gives:

Couldn't verify domain. Check your configuration and try again. Ensure the DNS record has had time to propagate, and that port 80 and 443 are open on your firewall.

Ports 80 and 443 forward to 8080 and 8443, as specified in the default op-scim-service.yaml, but neither of those ports is listening in the pod.

How can I find out what is going wrong?

Many thanks.

1P_Amanda · June 2021

It looks to me like your SCIM bridge is missing the scimsession file. Did it get lost in the upgrade somehow? I would recommend following the steps to add the scimsession file as a secret, then restarting your SCIM bridge and seeing if that helps.

JimL · June 2021

Hi Amanda. Thanks for coming back to me. It was good to know that the problem was the scimsession file. It wasn't missing, as such, but digging into it I found the problem and have got further. Thinking back, after I initially upgraded, I had a timeout issue connecting to Let's Encrypt. Having confirmed outbound connectivity with a test container, in desperation I decided to recreate the scimsession file. I deleted the scimsession secret and recreated it with the new file, as the documentation suggested:

kubectl create secret generic scimsession --from-file=/path/to/scimsession

However, when I took a look at the secret I found this:

kubectl describe secret scimsession

Name:         scimsession
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
scimsession-1:  1049 bytes

The -1 in scimsession-1 rang alarm bells. When I had downloaded a new scimsession file, my Mac had named it scimsession-1, as there was already a scimsession file in my Downloads directory (errr, from two years ago). I didn't think the filename would matter in the --from-file argument but clearly it does. Perhaps this is worth a note in the docs?
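
For reference, the key inside the secret defaults to the basename of the file passed to --from-file, which is why the local filename leaked through. kubectl also accepts --from-file=key=path, which pins the key name regardless of what the file is called. A minimal sketch, assuming the bridge expects the key scimsession (inferred from OP_SESSION: "/secret/scimsession" in the ConfigMap above); both function names are invented for illustration:

```shell
# Print the key kubectl derives from a --from-file path when no explicit
# key is given: the file's basename.
default_secret_key() {
  basename "$1"
}

# Recreate the secret with an explicit key, so a local filename such as
# scimsession-1 no longer matters. $1 is the path to the downloaded file.
recreate_scimsession_secret() {
  kubectl delete secret scimsession --ignore-not-found &&
    kubectl create secret generic scimsession --from-file=scimsession="$1"
}

# Usage (path is a placeholder):
#   default_secret_key "$HOME/Downloads/scimsession-1"
#   recreate_scimsession_secret "$HOME/Downloads/scimsession-1"
```

Running kubectl describe secret scimsession afterwards should list the key as scimsession rather than scimsession-1.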

This has now got me back to the problem I had immediately after the upgrade, where the SCIM Bridge times out connecting to Let's Encrypt.

5:43PM INF 1Password SCIM bridge, starting up application=op-scim version=2.0.2
5:43PM INF registering new health component application=op-scim component=RedisCache service=health version=2.0.2
5:43PM INF starting to poll components for health reports application=op-scim service=health version=2.0.2
5:43PM INF registering new health component application=op-scim component=SCIMServer service=health version=2.0.2
5:43PM INF registering new health component application=op-scim component=CertificateManager service=health version=2.0.2
5:43PM INF registering new health component application=op-scim component=ChallengeServer service=health version=2.0.2
5:43PM INF starting LetsEncrypt challenge server addr=:8080 application=op-scim service=ChallengeServer version=2.0.2
5:43PM DBG redicrypt: getting cert for key redicrypt/<my_hostname_redacted>+rsa application=op-scim version=2.0.2
5:43PM DBG redicrypt: getting cert for key redicrypt/acme_account+key application=op-scim version=2.0.2
5:44PM ??? Server: (failed to run 1Password SCIM bridge), Wrapped: (failed to GenerateCertificate), Network: (failed to getCertificateWithTimeout), Wrapped: (updateCertificateWithTimeout timed out on certManager.GetCertificate), LetsEncrypt timed out application=op-scim version=2.0.2

As I mentioned, I have tested outbound connectivity from the cluster with another pod (if there is a good way to test it with tools available within the SCIM Bridge container, do let me know).

I can also get inbound connectivity via curl to the SCIM Bridge on port 80 on my chosen hostname in the period between LetsEncrypt challenge server startup and the timeout, thus proving the inbound connectivity and the DNS entry pointing to the correct ingress IP.

Any tips on what might be wrong at this point?

Thanks again,

Jim

1P_Amanda · June 2021

My best guess is that you're getting rate limited by LetsEncrypt - can you try with a different subdomain?

JimL · June 2021

Hi Amanda. Thanks for the quick response. Yes, that's it. If I create a new DNS entry and update OP_LETSENCRYPT_DOMAIN in op-scim-config.yaml and delete the pod, it works:

6:23PM DBG redicrypt: writing cert for key redicrypt/<my_new_hostname_redacted>+rsa application=op-scim version=2.0.2
6:23PM INF starting 1Password TLS SCIM bridge server addr=:8443 application=op-scim component=SCIMServer version=2.0.2
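
The workaround above can be sketched as follows, assuming a POSIX shell, the op-scim-config.yaml layout shown earlier in the thread, and a pod name you look up yourself with kubectl get pods; set_letsencrypt_domain and redeploy_bridge are hypothetical helper names:

```shell
# Rewrite the OP_LETSENCRYPT_DOMAIN line in a ConfigMap file, leaving
# every other line untouched. $1 = config file, $2 = new hostname.
set_letsencrypt_domain() {
  config_file=$1
  new_domain=$2
  tmp_file=$(mktemp) || return 1
  # Note: mv replaces the file, so it takes on the temp file's permissions.
  sed "s|^\( *OP_LETSENCRYPT_DOMAIN:\).*|\1 \"$new_domain\"|" \
    "$config_file" > "$tmp_file" &&
    mv "$tmp_file" "$config_file"
}

# Apply the updated ConfigMap, then delete the pod so its replacement
# starts with the new value. $1 = config file, $2 = pod name.
redeploy_bridge() {
  kubectl apply -f "$1" &&
    kubectl delete pod "$2"
}

# Usage (hostname and pod name are placeholders):
#   set_letsencrypt_domain op-scim-config.yaml scim2.example.com
#   redeploy_bridge op-scim-config.yaml op-scim-bridge-xxxx
```

The new hostname needs its own DNS record pointing at the same ingress IP before the pod restarts.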

Thank you very much for your help!

1P_Amanda · June 2021

No problem, happy to help!
