Overview:
On March 10th, 2023 at 12:55 PM CST, Fleetio experienced a complete outage of our systems. This outage lasted for approximately 48 minutes and impacted all users and services, including the browser app, iOS and Android apps, and the API. The root cause of the issue was a failed deployment initiated by our team, which resulted in an interruption to our database services.
Root Cause:
The outage was caused by a failed deployment in our Amazon Web Services (AWS) instance. The deployment was intended to improve internal system configuration, but an unexpected error during the process caused all database connections to fail. As a result, our production systems were unable to connect to their databases, and users were met with error messages when attempting to access Fleetio.
Impact:
For approximately 48 minutes, users were completely unable to access Fleetio. This affected every user and every service, including the browser app, the iOS and Android apps, and the API.
Resolution:
To resolve the issue, our team identified the errors that led to the deployment failure and rolled back the changes, which restored database connectivity. We then performed a comprehensive review of our deployment processes to identify areas for improvement and to prevent similar incidents from occurring in the future.
Conclusion:
As always, we understand that Fleetio is mission-critical for our customers, and that any disruption has a real-world impact on their operations. We apologize for any inconvenience caused by this outage and appreciate your patience and understanding as we worked to resolve the issue. We remain committed to providing a reliable and resilient system for our users and will continue to prioritize the ongoing improvement of our processes and systems.