[Scheduled Outage] ID Dashboard Upgrade

I will be bringing ID Dashboard down for around 30-45 minutes, but will allow 1 hour for the maintenance window.

The upgrade will likely happen on Sunday; I have not yet decided at what time. During the outage, you will be unable to authenticate with any of the services. I will do everything I can to bring it back up as quickly as possible.

OpenMRS ID is the global identity system for OpenMRS. During this time you will not be able to log in to the following services:

  • JIRA (Issues)
  • Confluence (Wiki)
  • Talk (you’re using it to read this)
  • Modulus

Please plan either to not need any of the services above or to stay logged in. If you remain authenticated, you should be able to keep using them; this should be true for all of the services above.

Also, as an FYI – I will attempt to give at least a few hours’ notice, but expect it anytime.

Update

This will happen at 5:00 PM EDT (9:00 PM UTC). I have scheduled the outage on the OpenMRS Infra status system.

I am delaying this a bit – will begin at 7:00 PM

Update

It should be back up… I’m not sure if Atlas and Desk are working.

Reverted. There are some things I can’t test in the staging environment, and I don’t want the pressure to fix them hanging over me. Will try to deploy again once I fix the issue.

Is there any timeline for when we can expect the next attempt to deploy the new version of ID?

I had the impression the new version should handle OpenLDAP timeouts better, preventing the huge number of tickets we have related to that.

@r0bby, when @cintiadr was taking me through this today, we were not able to create any accounts because of timeout problems. Is there anything we could do to help reduce the pressure on you so that the new version gets deployed? :slight_smile:

So yeah, I seem to be bad at after-action reports:

Everything works great, except for SSO with the multipass SSO clients (Atlas and Desk). I’m not sure I have a way to test whether things work without actually deploying, or at least I don’t know how to do it. I PM’d @burke on Telegram but got no response. I need access to Desk; I already have the necessary access to the Atlas server.

@pascal, @surangak, @burke or @jeffneiman, can any of you give @r0bby access to Desk? A number of people are failing to create OpenMRS IDs, and I cannot even help them by creating these accounts manually because of timeout errors.

Yeah – I can’t see why the code is failing either. So yay. I double-checked the code and it should work… something is weird.

Update

I got Atlas running locally using Docker \o/ and tested it: SSO works perfectly. I suspect the issue is specific to production, though that shouldn’t be the case since I’m running the same code…

As for Desk.com – That’s something I can only test in production – but I’m happy that the code isn’t broken.

@r0bby, did you actually do the upgrade? Whatever you did, I just wanted to report that it now works acceptably well. It has failed only twice out of the many times I have used it since you did this. So thanks again for the awesome work! :smile:

No, I did not. I wound up rolling back, because I did not want to spend a week babysitting it to see what’s broken. I’d rather NOT have the identity system in a broken state… that’s bad. I won’t be attempting the upgrade again until I figure out why SSO with Desk and Atlas is failing, but only in production… this is too critical a system to play with.

I didn’t change anything with regard to it failing; the cause was a bad configuration setting for the LDAP connection to ID Dashboard. That part wasn’t documented in the best way…

Oh, I see! Thanks for figuring out the configuration setting for the LDAP connection to ID Dashboard that was bad. :slight_smile:

@r0bby, looks like we are back to the same problem: LDAPError: Timeout. Do you think a restart will do the trick? Do you have any pointers? Or is there anything I can do to help?

@dkayiwa – keep trying – sometimes it happens. It shouldn’t take more than 2-3 tries.
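
For anyone scripting around this, the "keep trying" advice can be sketched as a small retry helper. This is only an illustration in Python, not ID Dashboard's actual code: `retry_on_timeout`, its parameters, and the use of Python's built-in `TimeoutError` as a stand-in for `LDAPError: Timeout` are all assumptions.

```python
import time


def retry_on_timeout(operation, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper: only TimeoutError (standing in for the
    "LDAPError: Timeout" seen in this thread) is retried; any other
    exception propagates immediately.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TimeoutError as error:
            last_error = error
            # Pause before the next try, but not after the final one.
            if attempt < attempts:
                sleep(delay_seconds)
    # Every attempt timed out: surface the last timeout to the caller.
    raise last_error
```

You would wrap whatever call is timing out, for example `retry_on_timeout(lambda: create_account(form))`, where `create_account` and `form` are placeholders for your own code. Three attempts matches the "2-3 tries" above; retrying only hides the symptom and does not fix the underlying LDAP timeout.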