System Drive out of space
Incident Report for RATELIMITED
Postmortem

Due to a misconfiguration of our (forensic) webserver logging, log rotation was not working properly, and the logs were being written to the main system drive (/) instead of the data drive (/d2/).

With the increase in requests due to our HackerOne page, and the hackers that use automated tools to discover vulnerabilities (you know who you are), our logs started growing by around 10 gigabytes per day, and today's cleanup process didn't work.
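
As a rough illustration of how quickly growth like this eats a drive, the remaining time can be estimated from the free space and the daily log growth. The sketch below is in Python. The 10 GB/day figure comes from this report; the free-space value and the helper name `days_until_full` are made up for the example and are not measurements from the affected server.

```python
GIB = 1024 ** 3


def days_until_full(free_bytes: int, growth_bytes_per_day: int) -> float:
    """Estimate how many days remain before a drive runs out of space."""
    if growth_bytes_per_day <= 0:
        raise ValueError("growth_bytes_per_day must be positive")
    return free_bytes / growth_bytes_per_day


# ~10 GiB/day is the approximate log growth mentioned in the report.
# 40 GiB of free space is a hypothetical value, for illustration only.
remaining = days_until_full(free_bytes=40 * GIB, growth_bytes_per_day=10 * GIB)
print(f"Estimated days until full: {remaining:.1f}")
```

An estimate like this is only as good as the growth figure fed into it, so it is better suited to alerting thresholds than to precise forecasts.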

We have now moved the logs to our data drive (/d2/) and will avoid such issues in the future by validating our configuration as thoroughly as possible before deploying.
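
A pre-deployment check of this kind could look roughly like the Python sketch below. It is not the check we actually run; it only shows the idea of confirming that the configured log directory resolves to a location on the data drive and that the system drive still has headroom. The paths `/` and `/d2/` are from this report, while the function names, the example log directories, and the 10% free-space threshold are assumptions made for illustration.

```python
import os
import shutil


def is_under(path: str, parent: str) -> bool:
    """Return True if `path` resolves to a location inside `parent`."""
    real_path = os.path.realpath(path)
    real_parent = os.path.realpath(parent)
    # commonpath compares whole path components, so "/d2-old" is not "/d2".
    return os.path.commonpath([real_path, real_parent]) == real_parent


def free_fraction(mount_point: str) -> float:
    """Return the fraction of free space on the filesystem holding `mount_point`."""
    usage = shutil.disk_usage(mount_point)
    return usage.free / usage.total


def validate(log_dir: str, data_drive: str, system_drive: str = "/",
             min_free: float = 0.10) -> list:
    """Collect human-readable problems; an empty list means the check passed."""
    problems = []
    if not is_under(log_dir, data_drive):
        problems.append(f"{log_dir} is not on the data drive {data_drive}")
    if free_fraction(system_drive) < min_free:
        problems.append(f"{system_drive} has less than {min_free:.0%} free space")
    return problems


if __name__ == "__main__":
    # "/d2/logs/apache" is a hypothetical log directory, not our real layout.
    for problem in validate("/d2/logs/apache", "/d2/"):
        print("CONFIG PROBLEM:", problem)
```

Resolving paths with `os.path.realpath` before comparing them means a symlink that quietly points back to the system drive would also be flagged.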

Posted 15 days ago. Jan 01, 2019 - 22:33 UTC

Resolved
We have validated that the service is now stable again.
Posted 15 days ago. Jan 01, 2019 - 22:29 UTC
Monitoring
We have moved the logs over and have given Apache a kick. Everything should be working.

We will keep monitoring this for the next hour or so.
Posted 15 days ago. Jan 01, 2019 - 22:13 UTC
Update
We have begun moving the logs. We estimate this will take about 40 minutes.
Posted 15 days ago. Jan 01, 2019 - 21:57 UTC
Update
This issue has been tracked down to a webserver log rotation problem.

We are currently applying a patch and moving logs to our data drive.
Posted 15 days ago. Jan 01, 2019 - 21:54 UTC
Identified
We have identified that our system drive (*not* our data drive) has run out of storage space.

We have stopped all services in order to avoid data loss and we are currently working on fixing the issue.
Posted 15 days ago. Jan 01, 2019 - 21:52 UTC
This incident affected: Frontend, APIs (Upload API, File Storage (Minio)), and Service Functions (Panel).