I don’t think it’s sensible for us to move all machines in one go, as we are also upgrading the operating system. I think it’s a lot safer if we create new machines and do the migration machine by machine. I’m putting names below, but they are just suggestions, ok?
Upgrade Crowd too, for the same reason. Honestly, I’m not sure if @permissionerror would be keen on doing this.
@burke if you’d like, you can definitely give us a list of names for the new machines; I’d be keen on you naming them <3
So I’m thinking in this order:
This week, either @permissionerror or myself create the first machine via terraform. Commit that, so others can create machines and run ansible
Let’s make the first machine a bamboo agent, @raff has been keen on making a new shiny bamboo agent for ages. In order to connect a new bamboo agent to the existing bamboo server, we need to whitelist the new IP in terraform.
When @raff is happy with the new agent, more can be created by anyone, and the old ones deleted
After we manage to run terraform and basic ansible, anyone can create and migrate any machine except ID (ako, ambam, baragoi) and confluence/jira/bamboo, due to the database (salima).
In the meantime, either @permissionerror or myself will be working on the database and ID.
I also want a shiny new Bamboo agent or 3, so this seems like a good place to start. I know there’s already a lot in this migration, but do you envision doing this in Ansible, or will we continue managing the Bamboo agents with Puppet?
Are there steps we need to take to bring secondary DNS names offline before / during the migration? E.g., when migrating things like qa-refapp.openmrs.org (where the Bamboo agent logs in to qa-refapp.openmrs.org to pull the latest Docker images).
Thanks for this write-up! It looks like a great way forward!
I know I’ve been MIA, but the way I’ve done migrations is to create old.new.openmrs.org
So, for example, refapp.new.openmrs.org with a TTL of 5m (300s). Then, when it’s working, nuke the old one, change DNS to refapp.openmrs.org, and you’re done.
Lower the TTL on the existing record to 5m (300s) – then when you’re done – change it back to 1h (3600s).
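If it helps, checking that the lowered TTL and the final cutover have actually propagated is just a dig away (refapp.openmrs.org here is only the example from above):

```bash
# The second field of the answer is the remaining TTL in seconds; it should drop to <= 300
dig +noall +answer refapp.openmrs.org

# After the cutover, confirm the name resolves to the new machine's IP
dig +short refapp.openmrs.org
```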
Let’s keep puppet for now. It’s a lot to deal with already.
My recommendation would be to use a temp DNS while provisioning, then power off the old one, copy the data across, and switch over to the new DNS. Nothing too fancy.
Bamboo agents are disposable and don’t need that at all.
FYI – it looks like Jetstream1 will be turned off 31 July… so, any infrastructure we haven’t managed to migrate by the end of July will likely be turned off without warning.
@cintiadr would we just (temporarily) double the number of entries in openmrs-contrib-itsm-terraform or create a separate branch for jetstream2? While global_variables.tf will be nearly the same (just need to add ubuntu_22 entries), I think there will be several changes needed within the /base-network folder (e.g., TACC doesn’t really exist anymore, gateways are likely different, etc.).
And is there a need to do some initial network config on Jetstream2? Or will terraform automatically recognize the lack of network settings (routers, gateways, security groups, etc.) and rebuild them when the first machine is provisioned?
In order to use terraform, one needs to generate V2 creds, and update the openrc-personal file to have both V1 and V2 creds. V1 will eventually go away next month.
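For anyone generating the V2 creds, that half of openrc-personal is just the standard OpenStack application-credential variables; roughly this (the endpoint, IDs and region are placeholders, so use whatever the docs and your generated credential say, and keep your V1 variables alongside as the repo expects):

```bash
# V2 (application credential) section of openrc-personal; placeholder values, replace with your own
export OS_AUTH_TYPE=v3applicationcredential
export OS_AUTH_URL=https://<jetstream2-keystone-endpoint>/v3
export OS_APPLICATION_CREDENTIAL_ID=<credential-id>
export OS_APPLICATION_CREDENTIAL_SECRET=<credential-secret>
export OS_REGION_NAME=<region>
```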
A simple restart of the docker service fixed it: systemctl restart docker. I’ll do some maven and js builds next and let you know how it goes.
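(For completeness, what I ran, plus a quick check that the daemon came back healthy:)

```bash
sudo systemctl restart docker
systemctl status docker --no-pager    # should report active (running)
docker info > /dev/null && echo "docker daemon responding"
```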
UPDATE:
Another fix that was needed was related to permissions on .m2 (the local Maven repo). Some dirs got created owned by root when they should have been owned by bamboo-agent, failing a build with:
27-Jun-2022 10:54:38 [ERROR] The project org.openmrs.module:uiframework-omod:3.23.0-SNAPSHOT (/home/bamboo-agent/bamboo-agent/xml-data/build-dir/UIFR-UIFR-JOB1/omod/pom.xml) has 1 error
27-Jun-2022 10:54:38 [ERROR] Unresolveable build extension: Plugin org.openmrs.maven.plugins:maven-openmrs-plugin:1.0.1 or one of its dependencies could not be resolved: The following artifacts could not be resolved: org.apache.maven:maven-plugin-api:jar:2.0, org.apache.maven:maven-artifact:jar:2.0: Could not transfer artifact org.apache.maven:maven-plugin-api:jar:2.0 from/to openmrs-repo (https://mavenrepo.openmrs.org/nexus/content/repositories/public): /home/bamboo-agent/.m2/repository/org/apache/maven/maven-plugin-api/2.0/maven-plugin-api-2.0.jar.part.lock (Permission denied) -> [Help 2]
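The fix was just handing .m2 back to the agent user; something like this (the path and user come from the error above, the group name is assumed to match the user):

```bash
# Re-own the local Maven repository; some directories had been created by root
sudo chown -R bamboo-agent:bamboo-agent /home/bamboo-agent/.m2
```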
@raff I assume we need all three openconceptlab machines in the new Jetstream, correct? I can provision the machines tomorrow, and I can leave the ansible details for you.
The only thing deployed to these machines is oclclient, which is an nginx serving some static files. I would use the smallest VMs for them (no need for extra volumes either). Let me know once provisioned and I can run ansible.
I see. I assume none of the openconceptlab.org DNS entries are using that machine anymore, and we only have the openmrs client.
Can we have two, one for QA+Staging and one for production?
Nope, we usually went with cities in countries that hosted OpenMRS conferences ‘recently’. I just randomly picked Ethiopia for now; I think it’s a great country for me to learn cities and towns from.
That would be ideal. Honestly, if we have a production instance I’d probably deploy the “prod” instance of 3.x there (o3.openmrs.org) which needs enough resources to run the OpenMRS backend, but shouldn’t be much of a resource hog beyond that.
That’s a happy coincidence since we’ve currently got a serious implementation effort there!
As I’m migrating things over, I’m consolidating some services. I’m also consolidating git repos (if I need to change an ansible module, I’m copying it across to our repository); that helps when we need to apply modifications, and should require fewer workarounds. So that’s why a bunch of new files are showing up in custom_roles: I’m tired of having these random modules in other git repos making them so hard to update. People aren’t updating their roles anyway.
So here’s the current state for new machines:
xindi: can I get confirmation that I can provision the other two machines like this? And let me know when I can turn off the previous agents.
salwa: ready for me to use later, once I get any atlassian tools moved over
gode: has basic ansible; it has a (temp) DNS for addons-stg and atlas-stg. Still missing: deploying all the docker things.
goba: has basic ansible; it has a (temp) DNS for addons, atlas, implementation, quizgrader, shields, radarproxy and sonar. Still missing: deploying all the docker things.
We need to sort out the docker configuration. We need to change the docker data folder to /data/, but we probably don’t want to install docker via ansible as it’s already part of our base image. Will do that tomorrow.
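For the data folder change, I expect it to be the usual daemon.json approach; a minimal sketch, assuming /data is already mounted, /data/docker is the path we settle on, and there isn’t an existing daemon.json to merge with:

```bash
# Point the docker data directory at the /data volume, then restart the daemon
sudo mkdir -p /data/docker
echo '{ "data-root": "/data/docker" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
docker info | grep "Docker Root Dir"    # should now show /data/docker
```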
Datadog monitoring isn’t working well for device usage. We probably need to upgrade the module.
Extra questions:
Do we need all the openmrs sync instances? Or can we just not move them? I’m not sure they are still being used.
I assume that we are only hosting the openconceptlab client, not openconceptlab itself anymore. Please confirm if that’s correct.
Do we still need openhim and others? The description was PLIR PoC Test Servers, I’m not sure if they are still being used.
I can do it, but what is inside that instance? I can reallocate it somewhere else more appropriate if I have more context.
It’s essentially the equivalent of demo.openmrs.org but for the 3.x version of OpenMRS (3.x is a work in progress… it’s not yet the production version but hopefully will be soon). The stack has a database, a Tomcat server and two nginx containers… one that acts as a gateway and one that just serves some static files (the 3.x frontend). In the ansible repo, it’s the emr-3-demo folder. The dev version is in emr-3-dev.
I’m happy for that to happen. Do we need to make updates to the scripts for the things Raff caught above? I can verify they work for what we need and then let you know to turn the jetstream1 ones off.
I don’t think there’s any real point in migrating sonar. We’re not actively using it, and it’s quite out of date, so if we wanted to set up an on-premise Sonar instance we’d likely want to start from scratch anyway.
There are three bamboo agents in Jetstream 2: xindi, xiao and yu. The last two are currently marked as disabled in Bamboo. If you are confident, go ahead and enable them, and disable yue/yokobue/yak. I will turn them off on the weekend unless needed.
goba and gode have docker and the docker-compose files installed correctly. Took me a hot minute. They are ready for the data migration
sawla is still waiting for any atlassian migration, probably something for the weekend
Feel free to pick up the data migration for addons, atlas, implementationID, quizgrader, shields and radarproxy (I’m not sure which ones actually have data).
Requirements:
make sure you can SSH into the machines
make sure you have access to our backups in AWS
configure your terraform to access Jetstream 2 and 1 (as per docs and personal creds); a few quick sanity checks for all three are sketched right after this list.
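A rough way to verify the requirements before starting (the hostname, backup bucket and paths are placeholders, not the real names):

```bash
# 1. SSH reachability (use a real hostname from the terraform outputs)
ssh some-new-machine.openmrs.org 'hostname'

# 2. Access to the backups in AWS (bucket name is a placeholder)
aws s3 ls s3://openmrs-backups-placeholder/ | head

# 3. Terraform can talk to both clouds (run inside the terraform repo, with creds sourced)
source openrc-personal
terraform init && terraform plan
```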
Doing the migration:
Take a backup of the existing server. This wiki page should help. You can put it on S3, or you can SCP it to your machine.
Go to terraform and change the DNS record of the old machine to something else, like addons-legacy.openmrs.org. Run plan/apply to get the DNS propagated. Wait for the DNS TTL before proceeding.
Restore the backup into the new machine
Go to terraform and change the DNS record of the new machine to the desired DNS.
Go to ansible and update the host vars to change both the letsencrypt certificate and the nginx server name. Run the remove-certs and site playbooks (as instructed in the README) to apply the change. A rough command-level sketch of the whole sequence is below.
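Roughly, using addons as the example (the host aliases, backup paths and playbook/group names are my guesses; check the wiki page and the README for the real ones):

```bash
# 1. Back up the old machine (the wiki page has the real procedure; this is the scp variant)
ssh addons-old 'sudo tar czf /tmp/addons-backup.tar.gz /data'    # path is a placeholder
scp addons-old:/tmp/addons-backup.tar.gz .

# 2. In terraform: rename the old machine's DNS record (e.g. addons-legacy.openmrs.org),
#    apply, then wait out the TTL before continuing
terraform plan && terraform apply

# 3. Restore the backup onto the new machine
scp addons-backup.tar.gz addons-new:/tmp/
ssh addons-new 'sudo tar xzf /tmp/addons-backup.tar.gz -C /'

# 4. In terraform: point the desired DNS name at the new machine, plan/apply again

# 5. In ansible: update host vars (letsencrypt cert + nginx server name), then run the
#    playbooks mentioned in the README (file and group names assumed here)
ansible-playbook remove-certs.yml --limit addons
ansible-playbook site.yml --limit addons
```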
Docker gave me quite a headache today. I won’t be doing openmrs tomorrow, but I should be back on Friday.