System Drive out of space
Incident Report for RATELIMITED
Postmortem

Due to a misconfiguration of our (forensic) web server logs, log rotation was not working properly, and the logs were being written to the main system drive (/) instead of the data drive (/d2/).

With the increase in requests due to our HackerOne page, and the hackers who use automated tools to discover vulnerabilities (you know who you are), our logs started growing by around 10 gigabytes per day, and today's cleanup process failed to free up the space.

We have now moved the logs to our data drive (/d2/). To avoid similar issues in the future, we will validate our configuration as thoroughly as possible before deploying.
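One way to validate this particular configuration before deploying is to confirm that the log directory really sits on the data drive rather than the system drive. What follows is a minimal sketch of such a check under stated assumptions. It does not describe existing tooling. It assumes /d2 is a separately mounted filesystem, and it uses /d2/logs/apache2 only as a hypothetical log path. The check compares the device IDs returned by `os.stat()`. A directory that is merely named under /d2 but actually lives on the root filesystem would therefore still be flagged.

```python
import os
from typing import List, Optional


def device_of(path: str) -> Optional[int]:
    """Return the device ID of the filesystem holding `path`, or None if it is missing."""
    try:
        return os.stat(path).st_dev
    except FileNotFoundError:
        return None


def check_log_dir(log_dir: str, data_mount: str = "/d2", root: str = "/") -> List[str]:
    """Return a list of problems; an empty list means the log layout looks right.

    Assumption: `data_mount` is meant to be a separate filesystem from `root`.
    """
    problems: List[str] = []
    log_dev, data_dev, root_dev = device_of(log_dir), device_of(data_mount), device_of(root)

    # Missing paths make the device comparison meaningless, so report and stop.
    for path, dev in ((log_dir, log_dev), (data_mount, data_dev), (root, root_dev)):
        if dev is None:
            problems.append(f"{path} does not exist")
    if problems:
        return problems

    # If the "data drive" shares a device with /, it is just a directory on the system drive.
    if data_dev == root_dev:
        problems.append(f"{data_mount} is on the same filesystem as {root}")
    # The logs must be on the same device as the data drive.
    if log_dev != data_dev:
        problems.append(f"{log_dir} is not on the {data_mount} filesystem")
    return problems


if __name__ == "__main__":
    # Hypothetical path: substitute the directory the web server actually writes to.
    for problem in check_log_dir("/d2/logs/apache2"):
        print("log layout check failed:", problem)
```

Comparing device IDs is a deliberate choice. Checking path prefixes alone would not catch the failure described above, because a path can look like /d2/... while still being stored on /.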

Posted Jan 01, 2019 - 22:33 UTC

Resolved
We have confirmed that the service is stable again.
Posted Jan 01, 2019 - 22:29 UTC
Monitoring
We have moved the logs over and have given Apache a kick. Everything should be working.

We will keep monitoring this for the next hour or so.
Posted Jan 01, 2019 - 22:13 UTC
Update
We have begun moving the logs. We estimate this will take about 40 minutes.
Posted Jan 01, 2019 - 21:57 UTC
Update
We have traced this issue to a web server log rotation problem.

We are currently applying a patch and moving logs to our data drive.
Posted Jan 01, 2019 - 21:54 UTC
Identified
We have identified that our system drive (*not* our data drive) has run out of storage space.

We have stopped all services in order to avoid data loss and we are currently working on fixing the issue.
Posted Jan 01, 2019 - 21:52 UTC
This incident affected: Frontend (Frontend (Origin)), APIs (File Uploads, File Storage (Minio)), and Service Functions (User Dashboard).