On Wednesday, January 12, 2022, at approximately 14:52 UTC (9:52 AM EST), we were notified that some customers were experiencing errors when trying to load plan item data in certain views, including List View and list widgets in Custom Dashboards. We were able to replicate the errors in certain scenarios and began investigating the cause within 10 minutes of the initial report. We also confirmed that most application features and operations on plans and plan items continued to function normally. By 17:00 UTC (12:00 PM EST), we had confirmed that the issue was likely related to an increase in the time it took to return some data from queries for plan items, and that the issue had only begun on January 12. Over the succeeding 9 hours, we continued to research the problem and make adjustments to the database. Some of these adjustments increased overall load, particularly between 21:00 UTC and 23:00 UTC (4:00 PM - 6:00 PM EST). During this time, customers likely experienced significant slowness throughout the application. At approximately 02:00 UTC on January 13 (9:00 PM EST on January 12), we identified a misconfiguration in the primary production database. We corrected that configuration and saw all system operations immediately return to normal.
This incident did not affect the US Government production environment. No data was lost or corrupted in either environment.
The database misconfiguration was caused by maintenance activities we performed during a production release on the evening of January 11, 2022. During the maintenance, we inadvertently omitted a specific configuration that affects the performance of retrieving comments on plan item progress updates. As a result, queries that included this comment data took much longer to complete in certain instances. At times, the application would time out waiting on the query and display an error to the user.
We manage all configurations for our environments in version control, just as we do code. We have already reviewed and corrected the database configuration under version control to ensure that it is not missing any other configuration information. We are also in the process of changing how we store database configuration changes, which are made from time to time to manage system performance and security, so that they live in a more centralized location and can be reviewed more easily in our quality processes.
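
A review of this kind can in principle be automated. The following is a minimal sketch only, assuming the version-controlled configuration and the live database configuration can each be exported as a flat mapping of names to values. The function name and the example keys are hypothetical and do not reflect the actual tooling or database settings involved in this incident.

```python
from typing import Dict, List


def find_config_drift(
    expected: Dict[str, str], actual: Dict[str, str]
) -> Dict[str, List[str]]:
    """Compare version-controlled settings (expected) with live settings (actual).

    Returns the names of settings that are missing from the live database,
    that differ in value, or that exist live but are not under version control.
    """
    missing = sorted(name for name in expected if name not in actual)
    changed = sorted(
        name
        for name in expected
        if name in actual and actual[name] != expected[name]
    )
    unexpected = sorted(name for name in actual if name not in expected)
    return {"missing": missing, "changed": changed, "unexpected": unexpected}


if __name__ == "__main__":
    # Hypothetical example data, not real settings from this incident.
    expected = {"comments_lookup": "enabled", "query_timeout": "30s"}
    actual = {"query_timeout": "30s"}
    print(find_config_drift(expected, actual))
```

Running a comparison like this after each maintenance window would flag a setting that was dropped during the work before customers encounter slow queries.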