Elevated SSL Issuance Disruption
Incident Report for AutoPrint Software
Postmortem

After extensive investigation, we were able to fully identify the root cause of the connectivity interruptions, which turned out to be a byproduct of the Let’s Encrypt SSL bug that occurred on 02.29.2020.

In an effort to avoid any erroneous certificate disruptions from the aforementioned Let’s Encrypt SSL bug, we had to revoke and then reissue all TLS/SSL certificates before Wednesday 03.04.2020 00:00 UTC.

The revocation of all certificates completed successfully and in a timely manner, but the reissuing portion of the process stalled and eventually timed out. Normally, revoking and reissuing a certificate takes between 3 and 10 seconds, but in this case each certificate was taking upwards of 300 seconds, which caused sites to either load extremely slowly or not load at all. The root cause of the SSL reissue delay was that Let’s Encrypt’s hard-coded rate limits were being exceeded. The rate limits affected many companies throughout the world. One example is Atlassian, which hosts the status page you are currently viewing. See their explanation here: https://metastatuspage.com
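For illustration only, the sketch below shows one common way a bulk reissue job can cope with a certificate authority's rate limits: retry each domain with capped exponential backoff and jitter instead of stalling until a timeout. It is not a description of the process that ran during this incident. The `RateLimited` exception, the `reissue` callable, and all parameter names and default values are hypothetical placeholders rather than part of any Let’s Encrypt client.

```python
import random
import time
from typing import Callable, Dict, Iterable


class RateLimited(Exception):
    """Hypothetical signal that the CA rejected a request due to a rate limit."""


def reissue_with_backoff(
    domains: Iterable[str],
    reissue: Callable[[str], None],  # hypothetical: reissues one domain's certificate
    max_attempts: int = 5,           # assumed value, tune for the real workload
    base_delay: float = 1.0,         # seconds; assumed value
    max_delay: float = 60.0,         # seconds; assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Try to reissue each domain, backing off whenever the CA rate limits us.

    Returns a mapping of domain -> True if the reissue succeeded, else False.
    """
    results: Dict[str, bool] = {}
    for domain in domains:
        results[domain] = False
        for attempt in range(max_attempts):
            try:
                reissue(domain)
                results[domain] = True
                break
            except RateLimited:
                if attempt == max_attempts - 1:
                    break  # give up on this domain; the caller can retry later
                # Exponential backoff capped at max_delay, with full jitter
                delay = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, delay))
    return results
```

Domains that still fail after `max_attempts` are reported as `False` so they can be queued for a later pass rather than blocking the rest of the batch.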

The Let’s Encrypt developers implemented a temporary but substantial rate limit increase, which allowed the SSL reissue process to continue at the appropriate speed and complete successfully.

The official Let’s Encrypt explanation, along with a timeline of the actions they took, is available here: https://community.letsencrypt.org/t/revoking-certain-certificates-on-march-4/114864

From what we can tell, these two incidents (the Let’s Encrypt bug and the resulting rate limit delays) were isolated and should not recur.

Posted Mar 03, 2020 - 21:03 UTC

Resolved
This incident has been fully resolved. Please read the Postmortem for a full explanation of the events that unfolded.
Posted Mar 03, 2020 - 20:24 UTC
Update
We are continuing to monitor for any further issues.
Posted Mar 03, 2020 - 19:38 UTC
Monitoring
A fix was implemented and we will continue to monitor the results for the remainder of the day.
Posted Mar 03, 2020 - 16:09 UTC
Identified
The issue has been identified and a fix has been implemented.
Posted Mar 03, 2020 - 16:07 UTC
Investigating
We're experiencing an elevated level of errors and are currently looking into the issue.
Posted Mar 03, 2020 - 15:00 UTC
This incident affected: AutoPrint Core Software (ACS) (Frontend (Store.*), Backend (Admin.*)) and 3rd Party Integrations (Let's Encrypt).