How to diagnose/fix CloudFormation/autoscaling SSL errors on file download

I have an autoscaling group that was created by AWS CloudFormation. It runs on Amazon Linux 2. Last week, it was working fine. Now, new instances throw a "certificate has expired" error when trying to download phpMyAdmin. It appears cfn-init is running yum fine, and grabbing files from S3 fine, but not working when it goes to grab remote files (which it appears it does with Python).

EDIT: see below, but I initially thought this was some problem with the Python scripts. I'm no longer sure that's true, because it works fine in a different autoscaling group grabbing different files. So ... now I'm thinking it's actually some problem with the certificate on phpMyAdmin, but ... I'm not sure. And I still don't know how to diagnose further.

/var/log/cfn-init.log:

2021-10-06 10:08:24,895 [INFO] -----------------------Starting build------------
-----------
2021-10-06 10:08:24,896 [DEBUG] Not setting a reboot trigger as scheduling suppo
rt is not available
2021-10-06 10:08:24,898 [INFO] Running configSets: default
2021-10-06 10:08:24,899 [INFO] Running configSet default
2021-10-06 10:08:24,906 [INFO] Running config config
2021-10-06 10:08:34,173 [DEBUG] Installing/updating ['mod_ssl', 'ksh', 'httpd'] 
via yum
2021-10-06 10:08:37,523 [INFO] Yum installed ['mod_ssl', 'ksh', 'httpd']
2021-10-06 10:08:37,523 [DEBUG] No groups specified
2021-10-06 10:08:37,523 [DEBUG] No users specified
2021-10-06 10:08:37,523 [DEBUG] No sources specified
 ... [successfully installs some files from S3 ... ]

2021-10-06 10:08:37,621 [DEBUG] Writing content to /tmp/phpmyadmin.tgz
2021-10-06 10:08:37,621 [DEBUG] Retrieving contents from https://www.phpmyadmin.
net/downloads/phpMyAdmin-latest-all-languages.tar.gz
2021-10-06 10:08:37,640 [ERROR] SSLError
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages
/urllib3/connectionpool.py", line 544, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages
/urllib3/connectionpool.py", line 341, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages
/urllib3/connectionpool.py", line 762, in _validate_conn
    conn.connect()
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages
/urllib3/connection.py", line 238, in connect
    ssl_version=resolved_ssl_version)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/packages/requests/packages/urllib3/util/ssl_.py", line 265, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib64/python3.7/ssl.py", line 423, in wrap_socket
    session=session
  File "/usr/lib64/python3.7/ssl.py", line 870, in _create
    self.do_handshake()
  File "/usr/lib64/python3.7/ssl.py", line 1139, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1091)
...

but, if I log into the machine, downloading with wget works just fine, EDIT no longer true: so I don't think it's a problem on the remote site, I think it's something with cfn-init or the associated Python scripts.

EDIT: From a different autoscaling group, also created by CloudFormation, the remote downloads work:

...
2021-10-07 04:21:10,855 [DEBUG] Writing content to /tmp/mysql80.rpm
2021-10-07 04:21:10,855 [DEBUG] Retrieving contents from https://dev.mysql.com/g
et/mysql80-community-release-el7-2.noarch.rpm
2021-10-07 04:21:11,425 [DEBUG] Setting mode for /tmp/mysql80.rpm to 000600
2021-10-07 04:21:11,425 [DEBUG] Setting owner 0 and group 0 for /tmp/mysql80.rpm
...

So, it's successfully downloading from https://dev.mysql.com/, but failing with an "expired certificate" from https://www.phpmyadmin.net, even though wget and going directly there through a browser don't show any errors.

end EDIT

Any idea on how to debug this?


Solution 1:

Short answer:

Use openssl to find if there is an expired certificate:

openssl s_client -showcerts -connect <falingwebsite.com>:443 | grep "certificate has expired"

Get the new, unexpired certificate and attach it to the AWS package ca_bundle:

cat <newcert>.pem >> /path/to/your/python/env/lib/python3.9/site-packages/cfnbootstrap/packages/requests/cacert.pem

Long answer:

I ended up writing some python scripts to try and download the server side certificate, to see if there was something going awry with the download. I never did get the certificates directly downloaded with urllib3, but while chasing that down on StackOverflow, @clockwatcher discovered the actual problem, and it seems likely it will affect others, and not just with phpMyAdmin.

It has to do with the DST Root CA X3 Expiration, which happened on September 30th. AWS does not include the new certificate in their bundled approved certificates, so you have to manually add the new one.

You can grab the self-signed ISRG Root X1 pem file from LetsEncrypt, and then add it to cfnbootstrap's ca_cert.pem file, as per above.

Now, this solves the problem in the case where you have a running machine; however, it's still an issue for autoscaling groups, as there's no easy way to use the

  AWS::CloudFormation::Init:
    config:
    files:
        /path/to/file:
          source: https://somewhere.using.letsencrypt.com/file   

behavior. So, anywhere in your CloudFormation templates using that paradigm, will need to be rewritten so that both the file you actually want and the LetsEncrypt certificate are downloaded with wget as part of the commands: section. For example,

        01_get_phpmyadmin:
          command: !Join ["",
                       ["wget -O  phpmyadmin.tgz ",
                        "https://www.phpmyadmin.net/downloads/",
                        "phpMyAdmin-latest-all-languages.tar.gz"]]
          cwd: /tmp