I'm sure many of you have noticed that the site has had problems over the last four or five days.
The downtime was caused by unrecoverable hardware issues with our server. As a result, we had to rebuild the setup from scratch on a new machine and restore from a backup. We have also switched to a new provider, since the previous one was consistently causing issues.
After this was done, image uploads were still broken for some time. There was no good fix, because one of the database files used by pictrs had become corrupted by the aforementioned hardware issues. To resolve this, we had to use a non-corrupted database file from a very old backup. Image uploads should now work fine. The downside is that recently uploaded images will not show up properly. There is no fix for this other than to reupload those images. Sorry about this.
Most of the technical problems with the site should be fixed now.
There is one maintenance task remaining: moving the image storage to a new S3 provider, which will require some downtime. We will announce this beforehand.
Thanks for your patience.
Yep, it’s been pretty unstable for the past few days and I appreciate you taking the time to fix it.
Do/should we have a status page so that users know that something is wrong and the maintainers are aware of it?
No status page, but we can be more diligent about at least acknowledging these issues in a pinned announcement much like this one. I think that would be the best way.
Uptime Kuma is FOSS status page software, if you're ever considering one. It allows for status pages and updates, as well as automated alerting to basically any messaging platform.
Is this something that is supposed to be public-facing? Or something that a website's administrators use internally? I understand what it does, but I don't know how it is put into practice in real life.
It can be both. The status pages are meant to be public. Once you make a status page with a tracker on it, you can make that the default page visitors see when they go to the service's address (status.lemmygrad.ml, for example). You'll likely want this running in a different location, so that if the VPS Lemmy is running on fails, the status page stays up and can report the outage.
You can then configure notifications via a huge selection of services. You can have it send a public post via Mastodon or into a Matrix channel, or a private message via email or a private Matrix channel. It can even send you an SMS if you have the right services to hook into.
You can track a web service via ping, HTTP response code, and a few other methods. You can also set how often it checks, and whether it should wait and retry before reporting an outage.
I use it at work internally to get alerts about switches and servers. It’s simple to host, manage, and back up.
I have one I use for testing, and I have a status page for some Lemmy instances here: https://status.redwizard.party/
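To make the check-and-retry behaviour described above a bit more concrete, here is a minimal Python sketch of the idea. It is not Uptime Kuma's code or configuration; the function names, the "2xx/3xx means up" rule, and the default intervals are assumptions chosen for illustration. An outage is only reported once the first check and every retry have failed, and a notification is sent only when the up/down state changes.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical probe: True if the URL answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (4xx/5xx), DNS failures, timeouts
        return False


def is_down(probe: Callable[[], bool], retries: int = 2,
            retry_interval: float = 20.0) -> bool:
    """Report an outage only if the first check and every retry fail."""
    for attempt in range(retries + 1):
        if probe():
            return False
        if attempt < retries:
            time.sleep(retry_interval)  # wait before retrying
    return True


def monitor(name: str, probe: Callable[[], bool], check_interval: float = 60.0,
            retries: int = 2, retry_interval: float = 20.0,
            notify: Callable[[str], None] = print,
            max_checks: Optional[int] = None) -> None:
    """Check periodically; notify only when the up/down state changes."""
    was_down = False
    checks = 0
    while max_checks is None or checks < max_checks:
        down = is_down(probe, retries, retry_interval)
        if down != was_down:
            notify(f"{name} is {'DOWN' if down else 'back UP'}")
            was_down = down
        checks += 1
        time.sleep(check_interval)


if __name__ == "__main__":
    # Example only: check the main site every 60 s, printing state changes.
    monitor("lemmygrad.ml", lambda: http_probe("https://lemmygrad.ml"))
```

In a real setup, `notify` would be replaced by whatever alerting channel you use (Mastodon, Matrix, email), and, as noted above, the monitor should run on a different machine from the service it watches.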
Thanks a lot
Any time comrade.
Would it be worth making a post on lemmy.ml if lemmygrad becomes inaccessible? I thought it might be a DDoS attack, since one of the big servers (maybe lemmy.world) was hit yesterday.
We will make a Mastodon account for the website for when communication through here is not possible.
A status page is a good idea, I would say.