SSL certificate error preventing access to web application in US Government environment
Incident Report for AchieveIt
Postmortem

Summary

On Monday, November 29, 2021, at approximately 15:56 UTC, we became aware that the SSL certificate for the domain gov-api.achieveit.com had expired and had not been properly replaced with a new certificate. The AchieveIt web application uses this domain to send and retrieve data between the browser and the server. Once the certificate expired, users' browsers began rejecting requests to that domain, causing required data connections to fail. This interrupted service for most customers using the system in our US Government environment. We remediated the problem by manually updating the SSL certificate, and by 16:40 UTC service was fully restored.

Root Cause

We use an automated process to update the SSL certificate for each service as that certificate approaches its expiration. Most of those certificates are valid for one year and are renewed 15 to 30 days before the expiration date. In the case of the gov-api.achieveit.com domain, the SSL certificate was renewed as expected, but the process that makes the new certificate available for use failed. As a result, the previous certificate continued to be served after it expired, and browsers contacting gov-api.achieveit.com rejected the responses because of the expired certificate.

Our investigation found that the process failed because of a missing permission. After correcting the permission, we were able to trigger the automated certificate update successfully.
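
The failure mode described above (a certificate that is renewed but never reaches the server) can be detected by comparing the renewed certificate with the one the endpoint actually serves. The following is a minimal sketch of such a check, not a description of our actual tooling. The function names are hypothetical, and the sketch assumes the renewed certificate is available as PEM text.

```python
import hashlib
import ssl


def fingerprint_pem(pem: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def certificates_match(renewed_pem: str, served_pem: str) -> bool:
    """True when the served certificate is the renewed one."""
    return fingerprint_pem(renewed_pem) == fingerprint_pem(served_pem)


def renewal_propagated(renewed_pem: str, host: str, port: int = 443) -> bool:
    """Fetch the certificate the host is serving and compare fingerprints.

    ssl.get_server_certificate does not validate the certificate by default,
    so this comparison still works when the served certificate has expired.
    """
    served_pem = ssl.get_server_certificate((host, port))
    return certificates_match(renewed_pem, served_pem)
```

Run after each renewal, a check like `renewal_propagated(renewed_pem, "gov-api.achieveit.com")` returning False would indicate that the renewed certificate has not yet been made available to the service.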

Mitigation Actions

To reduce the likelihood of a similar failure interrupting the service in the future, we have implemented additional monitoring to detect expired SSL certificates. We also corrected the missing permission so that future automated certificate renewals are propagated successfully.
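
As an illustration of what certificate expiry monitoring can look like, here is a minimal sketch using only the Python standard library. It is not our production monitoring. The function names and the 15-day alert threshold are assumptions, with the threshold chosen to match the lower bound of the renewal window described in the Root Cause section.

```python
import socket
import ssl
from datetime import datetime, timezone

# Assumption: alert if fewer than 15 days remain, since renewal should
# already have happened by then.
ALERT_THRESHOLD_DAYS = 15


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string,
    e.g. 'Nov 29 15:56:00 2021 GMT' (the format getpeercert() returns)."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def classify(days_left: float, threshold: float = ALERT_THRESHOLD_DAYS) -> str:
    """Map remaining validity to a monitoring status."""
    if days_left <= 0:
        return "expired"
    if days_left < threshold:
        return "renewal overdue"
    return "ok"


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate served by host."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(host: str) -> str:
    """Check one host; a failed verification covers already-expired certs."""
    try:
        not_after = fetch_not_after(host)
    except ssl.SSLCertVerificationError:
        return "expired or invalid"
    return classify(days_until_expiry(not_after, datetime.now(timezone.utc)))
```

Because the check inspects the certificate the endpoint is actually serving, it would flag a renewal that succeeded but was never propagated, well before the old certificate expires.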

Posted Nov 30, 2021 - 14:24 EST

Resolved
We have verified that all system access has returned to normal.
Posted Nov 29, 2021 - 14:07 EST
Monitoring
We have updated the SSL certificate and confirmed that the errors preventing the app from functioning are cleared. We will continue to monitor the issue for the next hour.
Posted Nov 29, 2021 - 11:40 EST
Identified
We have identified that one of the SSL certificates in our US Government environment was not automatically renewed as it should have been. We're working to update that certificate and restore service.
Posted Nov 29, 2021 - 11:20 EST
Investigating
We are currently investigating this issue.
Posted Nov 29, 2021 - 10:56 EST
This incident affected: Web Application - US Government Environment.