I don’t think it’s sensible for us to move all machines in one go, as we are also upgrading the operating system. I think it’s a lot safer if we create new machines and do the migration machine by machine. I’m putting names below, but they are just suggestions, ok?
Upgrade Crowd too, for the same reason. Honestly, I’m not sure if @permissionerror would be keen on doing this.
@burke if you’d like, you can definitely give us a list of names for the new machines; I’d be keen on you naming them <3
So I’m thinking in this order:
This week, either @permissionerror or myself create the first machine via terraform. Commit that, so others can create machines and run ansible
Let’s make the first machine a bamboo agent, @raff has been keen on making a new shiny bamboo agent for ages. In order to connect a new bamboo agent to the existing bamboo server, we need to whitelist the new IP in terraform.
When @raff is happy with the new agent, more can be created by anyone, and the old ones deleted
After we manage to run terraform and basic ansible, anyone can create and migrate any machine except ID (ako, ambam, baragoi) and confluence/jira/bamboo, due to the database (salima).
In the meantime, either @permissionerror or myself will be working on the database and ID.
I also want a shiny new Bamboo agent or 3, so this seems like a good place to start. I know there’s already a lot in this migration, but do you envision doing this in Ansible, or will we continue managing the Bamboo agents with Puppet?
Are there steps we need to take to bring secondary DNS names offline before / during the migration? E.g., when migrating things like qa-refapp.openmrs.org (where the Bamboo agent logs in to qa-refapp.openmrs.org to pull the latest Docker images).
Thanks for this write-up! It looks like a great way forward!
I know I’ve been MIA, but the way I’ve done migrations is to create old.new.openmrs.org
So, for example, refapp.new.openmrs.org with a TTL of 5m (300s). Then, when it’s working, nuke the old one, change DNS to refapp.openmrs.org, and you’re done.
Lower the TTL on the existing record to 5m (300s) – then when you’re done – change it back to 1h (3600s).
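If it helps, checking that the lowered TTL and the final cutover have actually propagated is just a dig away (refapp.openmrs.org here is only the example from above):

```bash
# The second field of the answer is the remaining TTL in seconds; it should drop to <= 300
dig +noall +answer refapp.openmrs.org

# After the cutover, confirm the name resolves to the new machine's IP
dig +short refapp.openmrs.org
```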
Let’s keep puppet for now. It’s a lot to deal with already.
My recommendation would be to use a temp DNS while provisioning, then power off the old one, copy the data across, and switch over to the new DNS. Nothing too fancy.
Bamboo agents are disposable and don’t need that at all.
FYI – it looks like Jetstream1 will be turned off 31 July… so, any infrastructure we haven’t managed to migrate by the end of July will likely be turned off without warning.
@cintiadr would we just (temporarily) double the number of entries in openmrs-contrib-itsm-terraform or create a separate branch for jetstream2? While global_variables.tf will be nearly the same (just need to add ubuntu_22 entries), I think there will be several changes needed within the /base-network folder (e.g., TACC doesn’t really exist anymore, gateways are likely different, etc.).
And is there a need to do some initial network config on Jetstream2? Or will terraform automatically recognize the lack of network settings (routers, gateways, security groups, etc.) and rebuild them when the first machine is provisioned?
In order to use terraform, one needs to generate V2 creds, and update the openrc-personal file to have both V1 and V2 creds. V1 will eventually go away next month.
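For anyone generating the V2 creds, that half of openrc-personal is just the standard OpenStack application-credential variables; roughly this (the endpoint, IDs and region are placeholders, so use whatever the docs and your generated credential say, and keep your V1 variables alongside as the repo expects):

```bash
# V2 (application credential) section of openrc-personal; placeholder values, replace with your own
export OS_AUTH_TYPE=v3applicationcredential
export OS_AUTH_URL=https://<jetstream2-keystone-endpoint>/v3
export OS_APPLICATION_CREDENTIAL_ID=<credential-id>
export OS_APPLICATION_CREDENTIAL_SECRET=<credential-secret>
export OS_REGION_NAME=<region>
```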
A simple restart of the docker service fixed it: systemctl restart docker. I’ll do some maven and js builds next and let you know how it goes.
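(For completeness, what I ran, plus a quick check that the daemon came back healthy:)

```bash
sudo systemctl restart docker
systemctl status docker --no-pager    # should report active (running)
docker info > /dev/null && echo "docker daemon responding"
```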
UPDATE:
Another fix that was needed was related to permissions on .m2 (the local Maven repo). Some dirs got created owned by root when they should have been owned by bamboo-agent, failing a build with:
27-Jun-2022 10:54:38 [ERROR] The project org.openmrs.module:uiframework-omod:3.23.0-SNAPSHOT (/home/bamboo-agent/bamboo-agent/xml-data/build-dir/UIFR-UIFR-JOB1/omod/pom.xml) has 1 error
27-Jun-2022 10:54:38 [ERROR] Unresolveable build extension: Plugin org.openmrs.maven.plugins:maven-openmrs-plugin:1.0.1 or one of its dependencies could not be resolved: The following artifacts could not be resolved: org.apache.maven:maven-plugin-api:jar:2.0, org.apache.maven:maven-artifact:jar:2.0: Could not transfer artifact org.apache.maven:maven-plugin-api:jar:2.0 from/to openmrs-repo (https://mavenrepo.openmrs.org/nexus/content/repositories/public): /home/bamboo-agent/.m2/repository/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.jar.part.lock (Permission denied) -> [Help 2]
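The fix was just handing .m2 back to the agent user; something like this (the path and user come from the error above, the group name is assumed to match the user):

```bash
# Re-own the local Maven repository; some directories had been created by root
sudo chown -R bamboo-agent:bamboo-agent /home/bamboo-agent/.m2
```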
@raff I assume we need all three openconceptlab machines in the new Jetstream, correct? I can provision the machines tomorrow, and I can leave the ansible details for you.
The only thing deployed to these machines is oclclient, which is an nginx serving some static files. I would use the smallest VMs for them (no need for extra volumes either). Let me know once provisioned and I can run ansible.
I see. I assume none of the openconceptlab.org DNS entries are using that machine anymore, and we only have the openmrs client.
Can we have two, one for QA+Staging and one for production?
Nope, we usually went with cities in countries that hosted OpenMRS conferences ‘recently’. I just randomly picked Ethiopia for now; I think it’s a great country for me to learn cities and towns from.
That would be ideal. Honestly, if we have a production instance I’d probably deploy the “prod” instance of 3.x there (o3.openmrs.org) which needs enough resources to run the OpenMRS backend, but shouldn’t be much of a resource hog beyond that.
That’s a happy coincidence since we’ve currently got a serious implementation effort there!
As I’m migrating things over, I’m consolidating some services. I’m also consolidating git repos (if I need to change an ansible module, I’m copying it across to our repository); that helps when we need to apply modifications, and should require fewer workarounds. So that’s why a bunch of new files are showing up in custom_roles: I’m tired of having these random modules in other git repos making them so hard to update. People aren’t updating their roles anyway.
So here’s the current state for new machines:
xindi: can I get confirmation that I can provision the other two machines like this? And let me know when I can turn off the previous agents.
salwa: ready for me to use later, once I get any atlassian tools moved over
gode: has basic ansible; it has a (temp) DNS for addons-stg and atlas-stg. Still missing: deploying all the docker things.
goba: has basic ansible; it has a (temp) DNS for addons, atlas, implementation, quizgrader, shields, radarproxy and sonar. Still missing: deploying all the docker things.
We need to sort out the docker configuration. We need to change the docker data folder to /data/, but we probably don’t want to install docker via ansible as it’s already part of our base image. Will do that tomorrow.
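For the data folder change, I expect it to be the usual daemon.json approach; a minimal sketch, assuming /data is already mounted, /data/docker is the path we settle on, and there isn’t an existing daemon.json to merge with:

```bash
# Point the docker data directory at the /data volume, then restart the daemon
sudo mkdir -p /data/docker
echo '{ "data-root": "/data/docker" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
docker info | grep "Docker Root Dir"    # should now show /data/docker
```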
Datadog monitoring isn’t working well for device usage. We probably need to upgrade the module.
Extra questions:
Do we need all the openmrs sync instances? Or can we just not move them? I’m not sure they are still being used.
I assume that we are only hosting the openconceptlab client, not openconceptlab itself anymore. Please confirm if that’s correct.
Do we still need openhim and others? The description was PLIR PoC Test Servers, I’m not sure if they are still being used.
I can do it, but what is inside that instance? I can reallocate it somewhere else more appropriate if I have more context.
It’s essentially the equivalent of demo.openmrs.org but for the 3.x version of OpenMRS (3.x is a work in progress… it’s not yet the production version but hopefully will be soon). The stack has a database, a Tomcat server and two nginx containers… one that acts as a gateway and one that just serves some static files (the 3.x frontend). In the ansible repo, it’s the emr-3-demo folder. The dev version is in emr-3-dev.
I’m happy for that to happen. Do we need to make updates to the scripts for the things Raff caught above? I can verify they work for what we need and then let you know to turn the jetstream1 ones off.
I don’t think there’s any real point in migrating sonar. We’re not actively using it, and it’s quite out of date, so if we wanted to set up an on-premise Sonar instance we’d likely want to start from scratch anyway.
There are three bamboo agents in Jetstream 2: xindi, xiao and yu. The last two are currently marked as disabled in Bamboo. If you are confident, go ahead and enable them, and disable yue/yokobue/yak. I will turn them off on the weekend unless needed.
goba and gode have docker and the docker-compose files installed correctly. Took me a hot minute. They are ready for the data migration
sawla is still waiting for any atlassian migration, probably something for the weekend
Feel free to pick up the data migration for addons, atlas, implementationID, quizgrader, shields and radarproxy (I’m not sure which ones actually have data).
Requirements:
make sure you can SSH into the machines
make sure you have access to our backups in AWS
configure your terraform to access Jetstream 2 and 1 (as per docs and personal creds); a few quick sanity checks for all three are sketched right after this list.
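A rough way to verify the requirements before starting (the hostname, backup bucket and paths are placeholders, not the real names):

```bash
# 1. SSH reachability (use a real hostname from the terraform outputs)
ssh some-new-machine.openmrs.org 'hostname'

# 2. Access to the backups in AWS (bucket name is a placeholder)
aws s3 ls s3://openmrs-backups-placeholder/ | head

# 3. Terraform can talk to both clouds (run inside the terraform repo, with creds sourced)
source openrc-personal
terraform init && terraform plan
```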
Doing the migration:
Take a backup of the existing server. This wiki page should help. You can put it on S3, or you can SCP it to your machine.
Go to terraform and change the DNS record of the old machine to something else, like addons-legacy.openmrs.org. Run plan/apply to get the DNS propagated. Wait for the DNS TTL before proceeding.
Restore the backup into the new machine
Go to terraform and change the DNS record of the new machine to the desired DNS.
Go to ansible and update the host vars to change both the letsencrypt certificate and the nginx server name. Run the remove-certs and site playbooks (as instructed in the README) to apply the change. A rough command-level sketch of the whole sequence is below.
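Roughly, using addons as the example (the host aliases, backup paths and playbook/group names are my guesses; check the wiki page and the README for the real ones):

```bash
# 1. Back up the old machine (the wiki page has the real procedure; this is the scp variant)
ssh addons-old 'sudo tar czf /tmp/addons-backup.tar.gz /data'    # path is a placeholder
scp addons-old:/tmp/addons-backup.tar.gz .

# 2. In terraform: rename the old machine's DNS record (e.g. addons-legacy.openmrs.org),
#    apply, then wait out the TTL before continuing
terraform plan && terraform apply

# 3. Restore the backup onto the new machine
scp addons-backup.tar.gz addons-new:/tmp/
ssh addons-new 'sudo tar xzf /tmp/addons-backup.tar.gz -C /'

# 4. In terraform: point the desired DNS name at the new machine, plan/apply again

# 5. In ansible: update host vars (letsencrypt cert + nginx server name), then run the
#    playbooks mentioned in the README (file and group names assumed here)
ansible-playbook remove-certs.yml --limit addons
ansible-playbook site.yml --limit addons
```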
Docker gave me quite a headache today. I won’t be doing openmrs tomorrow, but I should be back on Friday.