Backup status check

Hey @cintiadr,

It has happened again that backups for OCL were not being created and pushed to AWS. Setting the cause aside (it's now fixed), do you know of any way to set up a check that would warn us if it happens again? The problem is that I only discovered the backups were broken after 3 months; fortunately, nothing bad happened in the meantime.

I could write a script for that, which could be run by our CI, but maybe there's already a solution? Basically, check every 24h whether there's a new backup on AWS.
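A rough sketch of what such a CI script could check, assuming the backups are stored as S3 objects and the listing has the shape boto3's `list_objects_v2` returns under `Contents` (dicts with `Key` and a timezone-aware `LastModified`). The key names, the stand-in listing and the 24h threshold are illustrative assumptions, not the real OCL setup:

```python
import sys
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Assumed threshold: we expect at least one backup per day.
MAX_BACKUP_AGE = timedelta(hours=24)


def latest_backup_time(objects: Iterable[dict]) -> Optional[datetime]:
    """Return the newest LastModified timestamp, or None if there are no backups.

    Each item is assumed to look like an S3 listing entry:
    {"Key": "...", "LastModified": <timezone-aware datetime>}.
    """
    times = [obj["LastModified"] for obj in objects]
    return max(times) if times else None


def backup_is_fresh(objects: Iterable[dict], now: datetime,
                    max_age: timedelta = MAX_BACKUP_AGE) -> bool:
    """True if the newest backup is at most max_age old."""
    latest = latest_backup_time(objects)
    return latest is not None and now - latest <= max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Stand-in listing with hypothetical keys; a real check would fetch
    # this from the backup bucket.
    listing = [
        {"Key": "ocl/backup-old.tar.gz", "LastModified": now - timedelta(hours=30)},
        {"Key": "ocl/backup-new.tar.gz", "LastModified": now - timedelta(hours=2)},
    ]
    if not backup_is_fresh(listing, now):
        print("No backup in the last 24h", file=sys.stderr)
        sys.exit(1)  # non-zero exit fails the CI job
    print("Latest backup is recent")
```

A non-zero exit code makes the CI job fail, so the failed build itself could serve as the warning.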

The ticket would be

Ideally that would be a Datadog check; that's my favourite way to be alerted. I'm already pushing metrics for pending files and failing files:

I've just created an alert that fires if there have been no files pending upload for more than 24h. I think it should catch most problems?
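The logic of that alert, written out as a plain Python sketch rather than Datadog's own monitor syntax. It assumes each sample is a (timestamp, pending count) pair taken from the pending-files metric, and that a new backup shows up as pending for a while before it is pushed to AWS:

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple

# (sample time, number of files pending upload)
Sample = Tuple[datetime, int]

WINDOW = timedelta(hours=24)


def should_alert(samples: Sequence[Sample], now: datetime,
                 window: timedelta = WINDOW) -> bool:
    """Alert if no sample in the last `window` reported any pending files.

    Assuming each new backup is queued as pending before upload, a pending
    count stuck at zero suggests no new backups are being created.
    An empty window (no samples at all) also triggers the alert.
    """
    recent = [count for ts, count in samples if now - window <= ts <= now]
    return max(recent, default=0) == 0
```

Treating an empty window as an alert is a deliberate choice: if the metric stops being reported altogether, that is probably worth knowing about too.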

Can you also fix backups for Nakuru?

Nakuru should be fixed now, thanks for double checking.

Yes, the alert you’ve just created should be good enough for most cases, thanks!

Should we be using latest or 1.0.0 for the cron-backup image? I have both of them.

Latest did not work for OCL, see

So I created the 1.0.0 tag (from an older revision) and switched OCL to use that instead. I didn't know the image was used anywhere else…

Looks good to me.

Do you reckon we should revert that merge, then, if that's what broke it?

We are using latest in most places, but I haven't deployed it to many machines.

The change breaks scheduled backups. You can still trigger a backup manually, though.

I think it should be reverted, unless some service depends on the changes it introduced. It changed the syntax of the backup/restore commands from ‘TIMESTAMP’ to ‘backup restore TIMESTAMP’.

Also it’s safer not to use the latest tag for services…


I'll try to work on it tomorrow. I hadn't realised there was a breaking change there.

I think I've reverted the commit, and the ‘latest’ tag got updated. I'll give it a day to make sure it's working.

After that, I plan to update all Docker Compose configurations to use ‘latest’ and remove ‘1.0.0’, so maybe you'd like to do another test?