Thanks so much for the guidance; I will strictly follow each step. Working with you has been a blessing to me.
Found one more small thing: apparently `nvm.sh` was not being marked as executable, so the builds depending on nvm failed; but otherwise, `xindi` seems to have successfully run around 80 builds.
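In case it helps anyone else, the fix I'd expect is simply restoring the executable bit; the path below is the default nvm install location, which I haven't verified on the agents:

```bash
# Restore the executable bit on nvm.sh (path is the default nvm location; adjust for the agents).
chmod +x ~/.nvm/nvm.sh
```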
Kindly, I don’t have access to AWS; how do I go about it?
Ok, I found another small issue, but it only affects one artefact: coreapps uses PhantomJS 1 to run some browser-based tests. The problem here is that PhantomJS 1 depends on an old, vulnerable version of OpenSSL. I’m going to see if those tests can be changed to work with Chrome or Firefox, since we seem to have headless versions of those available on the Bamboo agents.
I’m sorry for leaving the party without notice. I had to deal with some infection and stayed in bed for the last few days. Thanks @ibacher for stepping in.
I see you fixed the coreapps issue already.
Yeah. Switching to Firefox in headless mode seems to work. The agents have Chromium rather than Chrome, which, apparently, the (very old) version of karma doesn’t recognise.
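For anyone who wants to reproduce it locally, the change boils down to pointing karma at the Firefox headless launcher; a minimal sketch, assuming karma-firefox-launcher is already a dev dependency (I haven't checked the coreapps package.json):

```bash
# Run the browser tests once against headless Firefox instead of Chrome.
# Assumes karma-firefox-launcher is installed; if not: npm install --save-dev karma-firefox-launcher
npx karma start --browsers FirefoxHeadless --single-run
```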
Let me get the latest update here before I go to sleep and confuse everyone else in the process.
- All vms should now be created - full list. I can successfully run ansible on all of them, but the actual docker containers/services/atlassian suite haven’t been installed yet or are probably broken.
- I was forced to upgrade datadog and change our custom monitoring to get them working again. Seems fine.
- I’ve cleaned up a bunch of things which I reckon aren’t being used anymore. If I deleted something I shouldn’t have, there’s always git to recover it!
- Instead of keeping all the forks of multiple ansible roles, I just added them to the `custom_roles` folder. It will be simpler for us to maintain.
- I removed the Jetstream 1 machines from our ansible inventory, as the changes I’m making are, more likely than not, incompatible. If you need to change something there, do it manually for the time being, and let me know.
I’m not sure exactly which machines I will be migrating this weekend, but I will let you know once I have a step-by-step for each machine, in case someone with jetstream/terraform/ansible access would like to help.
@cintiadr, you are a ROCK STAR!
Setup
- Follow the instructions in the terraform readme file to generate a credentials file for both Jetstream 1 and 2. It only works if you already had terraform permissions.
- Ensure you can run `./build.rb plan <machine>` on both a Jetstream 1 and a Jetstream 2 machine (see the quick check after this list). Check the `region` in our docs to differentiate Jetstream 2 (`region: v2`) from Jetstream 1.
- Ensure you have access to our backups in AWS S3. Read about how to recover docker backups.
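As a quick sanity check, something like the following should succeed on one machine from each Jetstream (the machine names are placeholders; pick real ones from our docs):

```bash
# Placeholders: substitute actual machine names from our docs.
./build.rb plan <jetstream1-machine>   # a machine still on Jetstream 1
./build.rb plan <jetstream2-machine>   # a machine on Jetstream 2 (region: v2 in the docs)
```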
Per machine
- Change ansible on the machine until you are satisfied with the status.
- Verify what needs to have a backup, and how to extract backups. For docker compose apps, check the `backups`. Otherwise, atlassian apps will have their data either in a database or in a home folder under `/data`.
- Create a maintenance notification on our status page.
- Go to terraform, edit the previous DNS records to add `-v1`, and change the `-v2` record to the new one. Apply via `terraform plan/apply` (a rough sketch of these commands follows this list). Please note that most of our DNS entries have a TTL of 5 minutes.
- Update the terraform docs via `./build.rb docs && ./build.rb plan docs && ./build.rb apply docs`.
- Update the ansible variables with the new value. You will need to recreate the letsencrypt cert and nginx config (tags `tls` and `web`). Follow the instructions in the README.
- Please note that the previous server does not have ansible, so if you need to access it, you may need to modify things manually.
- Generate the backups and move them across to the new server. Apply them.
- Confirm things work as expected.
- End the maintenance on our status page.
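A rough sketch of the DNS/docs/ansible commands from the list above, for orientation only; `<machine>` is a placeholder, and the exact ansible playbook and inventory names come from the README rather than from here:

```bash
# 1. After editing the DNS records in terraform (-v1 for the old box, -v2 pointing to the new one):
./build.rb plan <machine>
./build.rb apply <machine>

# 2. Regenerate the terraform docs:
./build.rb docs && ./build.rb plan docs && ./build.rb apply docs

# 3. Recreate the letsencrypt cert and nginx config on the new machine
#    (playbook/inventory names are assumptions; the README has the real invocation):
ansible-playbook -i inventory <playbook>.yml --limit <machine> --tags tls,web
```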
Current state of new and old machines:
Migrated machines
- `bele` and `bonga` are ready from my point of view. I’m not sure which parts of `emr-3` need to be migrated? I migrated some. Feel free to edit terraform/ansible to get it back, @ibacher. Also, @raff, feel free to point the `OCL` dns entries to the new machines. `bonga` will probably need to be upgraded from `quad` to `medium` soon, so go ahead if you need to do it. From my point of view, `balaka`, `dowa`, `nairobi`, `nakuru` and `narok` can be powered off as soon as you let me know.
- `worabe`, our new CI server, took me a while! It was broken in several different ways, our backups weren’t working, it seems like storing artefacts in S3 was also not working, and it took me forever to upgrade. `lobi` can be powered off.
- `adaba`, our new ID server. I migrated crowd, ldap and ID there, but I discovered in the process that our SMTP server isn’t working anymore. I had to do some ungodly tricks with symlinks to get ldap to work with TLS. So new sign ups aren’t working. `ako`, `ambam` and `baragoi` can be powered off.
- `mojo`, our new jira server, seems fine. `maroua` can be powered off.
- `mota`, our new wiki server, seems ok. `menji` and `salima` can be powered off.
Pending machines:
- `maji`: I’m also struggling to get the new discourse/talk up; it’s complaining about some ruby things I’m clueless about. To be discovered, but I don’t want to migrate before we fix the SMTP issues anyway. Let me know if you’d like to investigate.
- `goba` and `gode`: miscellaneous services, haven’t even started. Will probably do them during the week.
- `jinka`: website and several redirects. Haven’t even started. I guess I might do it, at least partially, during the week.

If you think you can help me, please pick `goba`, `gode` or `jinka`.
Updated `bonga` to include `oclclient-prd`, in addition to stg, qa and demo, and pointed the OCL DNS entries to it. @ibacher is `oclclient-dev` or `oclclient-clone` still needed? It’s currently not deployed to `bonga`. I haven’t updated the VM to medium yet.
`oclclient-dev` would probably be good to keep up (it’s essentially tracking the `master` branch). `oclclient-clone` was intended to be a short-lived part and can be dropped, AFAIK.
@raff I had deployed `oclclient-prd` to `bele`, not sure if we should delete it from there, then?
Also, we seem to be having certificate issues there with https://openmrs.openconceptlab.org/
Remaining machines
Please note I’m keeping the vms docs always updated.
- `jinka`: redirects and website migrated successfully. `mua` and `campo` can be powered off. I’ll keep an eye on it, but if there are any issues, you can manually change the DNS to the old server and it should automatically work.
- `maji`: same as before. Discourse wasn’t starting last time I checked.
- `goba` and `gode`: miscellaneous services, I haven’t even started.
Known issues
- I reckon backups for atlassian jira/wiki/bamboo are probably not working properly.
- We still have the issue of having to restart LDAP every couple of months to pick up new certs. To be honest, I might have added new certificate issues there…
- I was forced to add `-refresh=false` to our terraform plans, as they were attempting to create new data volumes. Not sure what’s happening; maybe it will solve itself on Jetstream.
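For context, `-refresh=false` is a standard terraform flag; a minimal sketch with plain terraform (however it is actually wired into our `build.rb` wrapper may differ):

```bash
# Plan without refreshing state, so terraform stops trying to recreate the data volumes.
terraform plan -refresh=false -out=plan.tfplan
terraform apply plan.tfplan
```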
My bad, I didn’t notice it. I deleted `oclclient-prd` from `bonga` and let it run on `bele`. Fixed the certificate issue.
Added `oclclient-dev` to `bonga`. I’ll leave the `oclclient-clone` config around, but I won’t deploy it unless someone asks for it.
Update of the day:
- `maji` has a discourse running. I was forced to move to `stable`. Will migrate talk over the weekend.
- `gode` is down? Not sure what happened, didn’t touch it.
Thanks @cintiadr, @ibacher, and @raff for the migration. Great to see the progress!
I noticed we can’t edit any pages on the OpenMRS wiki (trying to edit any page returns a System Error page). It appears to be caused by: Confluence MySQL database migration causes a `content_procedure_for_denormalised_permissions` does not exist error. The solution is to include `--routines` in the `mysqldump` command when backing up, to include the stored procedures that were introduced since Confluence 7.11.0. I see a `mysqldump.sh.j2` ansible template. I’m guessing we’d want to add `--routines` to its `OPTIONS`, assuming this is what is used to back up our Confluence data. I’m leery of making these changes, since I don’t want to break things when we only have 10 days left to complete the migration.
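For illustration, a minimal sketch of what the backup command could look like with the flag added; the real values live in the `mysqldump.sh.j2` template, so everything below (variable names, paths) is a placeholder:

```bash
#!/usr/bin/env bash
# Hypothetical sketch only: the real command is templated by mysqldump.sh.j2.
# --routines exports the stored procedures that Confluence 7.11.0+ relies on.
OPTIONS="--single-transaction --routines"

mysqldump $OPTIONS \
  -u "$CONFLUENCE_DB_USER" -p"$CONFLUENCE_DB_PASSWORD" \
  "$CONFLUENCE_DB_NAME" > "/backups/confluence-$(date +%F).sql"
```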
Can we make a new backup from our Jetstream 1 Confluence instance using the `--routines` option? I think we need this before our wiki will work again.
In case you haven’t seen, Burke was correct: I copied the routines and it seemed to do the trick.
I come with bad news about talk. I spent hours trying to get the migration going. I wanted to finish it during the weekend, to disrupt you the least. It wasn’t successful at all.
Let’s see what they have to say.
Today’s update:
Please note I’m keeping the vms docs always updated.
- `maji`: I’m worried about talk. Hopefully the request we opened will be enough help.
- `gode`: staging for addons and atlas. Done.
- `goba`: migrated addons and atlas. Missing `implementation`, `quizgrader`, `shields` and `radarproxy`. Should be done this week.
I will continue to delete Jetstream 1 machines as the week progresses.
Known issues
- I reckon backups for atlassian jira/wiki/bamboo are probably not working properly.
- We still have the issue of having to restart LDAP every couple of months to pick up new certs. To be honest, I might have added new certificate issues there…
- I was forced to add `-refresh=false` to our terraform plans, as they were attempting to create new data volumes. Not sure what’s happening; maybe it will solve itself on Jetstream.
Maybe it’s because our split config only upgrades web by default. So, while our Talk might report itself as, say, 2.9.0.beta7, it’s really only that version for the web component, and an older version (from the last manual rebuild of data) for the data component. That could cause havoc for a migration that expects the data to be `tests-passed` but is getting data from some arbitrary older state.
Did you rebuild both web and data on prod before creating the backup for migration?
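For reference, in a standard two-container discourse_docker layout, rebuilding both would look roughly like this; the container names (`data`, `web_only`) are the conventional ones and are an assumption, not taken from our ansible:

```bash
# On the old talk server, assuming the usual /var/discourse checkout and split-container names.
cd /var/discourse
./launcher rebuild data      # rebuild the data (postgres/redis) container first
./launcher rebuild web_only  # then rebuild the web container against it
```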
I’m creating a whole new server from scratch. I did delete all the data and rebuild both containers dozens of times.
So it turns out the problem was the branch we were using to clone the discourse launcher. Somewhere along the line it changed from `master` to `main`, but our ansible continued to point to `master`.
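If anyone hits the same thing, the manual equivalent of the fix is just cloning the launcher from `main`; the repo URL and path below are the usual discourse_docker defaults, assumed rather than copied from our playbooks:

```bash
# Clone the discourse launcher from the renamed default branch.
git clone --branch main https://github.com/discourse/discourse_docker.git /var/discourse
```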
The new talk server is empty, but finally up, with the new version I needed, `2.9.0.beta7`. I will schedule the talk migration for a few hours from now, during my lunch time; I think it will be the least disruptive time.
- `maji`: New talk is up. I will attempt to migrate it again tomorrow.
- `goba`: I migrated all the little things there. I’m not sure if `radarproxy` and `shields` are working… they showed an empty screen when accessed from the browser, so I’m not sure if I broke something else.
- Somehow the `bonga` machine was tainted (marked for full recreation) in terraform. I undid that because I don’t think we need to delete it right now (see the untaint sketch below).
- Previous known issues still apply.
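For the record, undoing the taint is a one-liner; the resource address below is a placeholder, since the real one depends on how our terraform names the instance:

```bash
# Un-mark the instance so terraform no longer plans to destroy and recreate it.
terraform untaint <bonga-instance-resource-address>
```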