Need to retire original Jetstream project and move to official one

burke · May 18, 2017, 4:38pm

We need to get resources off of the original Jetstream project (TG-ASC160035) and move everything under the official Jetstream project (TG-ASC170002). I created an ITSM ticket for this task:

https://issues.openmrs.org/browse/ITSM-4009

Assuming you don’t have any work in progress on bungoma for migrating Confluence, I can destroy that server and recreate it under the new project.

These servers need to be moved to the new project:

garissa (at IU under old project)
thika (at TACC under old project)
lamu (at TACC under old project)

If setup for any of these is automated, then let’s destroy the machine under the old project and rebuild it under the official one (TG-ASC170002). Otherwise, we may need to manually move these. I don’t think there’s a way to simply change the project for a server. Instead, we have to create a snapshot and then migrate it. This page describes how to migrate via snapshot. This discussion suggests there’s a way to share a snapshot with another project without having to download it locally (as the first page mentions).
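
A rough sketch of the snapshot route with the unified `openstack` client. It assumes the old and new projects live in the same Jetstream cloud (IU or TACC), since a shared image is only visible within one cloud, and that you load the old project's credentials for the first function and the new project's for the second. The function names, server names, image ID and project ID are placeholders, and the flags are worth checking against the client version Jetstream provides.

```shell
#!/bin/sh
# Sketch: move a server to another project by sharing a snapshot image.
# Function names and arguments are hypothetical placeholders.
set -eu

# Run with the OLD project's credentials loaded (e.g. its openrc file).
snapshot_and_share() {
  server="$1"; image_name="$2"; new_project_id="$3"
  # Snapshot the server into an image owned by the old project.
  openstack server image create --name "$image_name" --wait "$server"
  # Allow the image to have members, then add the new project as one.
  openstack image set --shared "$image_name"
  openstack image add project "$image_name" "$new_project_id"
}

# Run with the NEW project's credentials loaded.
accept_and_boot() {
  image_id="$1"; new_server="$2"; flavor="$3"; network="$4"
  # Accepting the shared image makes it appear in the new project's image list.
  # The image ID is used because a not-yet-accepted image may not resolve by name.
  openstack image set --accept "$image_id"
  # Boot the replacement server from the snapshot.
  openstack server create --flavor "$flavor" --image "$image_id" \
    --network "$network" --wait "$new_server"
}
```

This avoids downloading the image locally; the trade-off is that the snapshot stays owned by the old project until it is copied or the old project is cleaned up.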

Do you have API access to both projects? I’m happy to help with the migration, but would like to follow your leadership in how to pull it off.

cintiadr · May 21, 2017, 12:56am

Alright.

I’m trying to clean up the automation for all machines being migrated to Jetstream (using only terraform and ansible to provision them).

So here’s what we can do. I will move lamu today, because it’s staging.

Garissa is trivial to move as well; I can do it next weekend. @pascal, do you think you’d be able to create a new machine to replace garissa this week using https://github.com/openmrs/openmrs-contrib-itsm-terraform ? The rationale is that it would be awesome to have someone else test the automation and the README.

Thika is the one that needs a little bit of work to be completely automated. I made https://github.com/bmamlin/openmrs-contrib-quizgrader/pull/2 a couple of weeks ago, because I need a self-contained image deployed to Docker Hub in order to add it to https://github.com/openmrs/openmrs-contrib-ansible-docker-compose . With that, thika is ready to be recreated anytime. Do you reckon you could help me with that, @burke ?

burke · May 22, 2017, 5:22am

@cintiadr, I merged your pull request. I was thinking of making docker-compose-dev.yml and docker-compose-prod.yml files that pull in env variables from either dev.env or prod.env, storing the secrets in the .env files instead of in the docker-compose.yml files (and adding these env files to gitignore). Does it matter to you? Is it easier (or required) for ansible-docker-compose to put the configuration directly into the yml files?
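
To make the proposal concrete: each compose file would name its secrets file under the standard `env_file:` key (for example `env_file: prod.env` in docker-compose-prod.yml), and a small wrapper could refuse to start if that file is missing or not gitignored. The `compose_up` helper below is hypothetical, not something in the quizgrader repo.

```shell
#!/bin/sh
# Hypothetical wrapper for the proposed layout:
#   docker-compose-dev.yml  -> env_file: dev.env
#   docker-compose-prod.yml -> env_file: prod.env
# The *.env files hold the secrets and are listed in .gitignore.
set -eu

compose_up() {
  env_name="$1"                                  # "dev" or "prod"
  compose_file="docker-compose-${env_name}.yml"
  env_file="${env_name}.env"

  [ -f "$compose_file" ] || { echo "missing $compose_file" >&2; return 1; }
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
  # git check-ignore exits 0 only if the path is ignored by git.
  git check-ignore -q "$env_file" || { echo "$env_file is not gitignored" >&2; return 1; }

  docker-compose -f "$compose_file" up -d
}
```

Usage would be `compose_up dev` locally and `compose_up prod` on the server.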

cintiadr · May 22, 2017, 5:50am

If you want to deploy both prd and dev, I’d need two folders in the ansible role, like addons.

I’m using a .env file in ansible so that I can encrypt it. Here is an example of a .env in use, encrypted with ansible-vault:

It should be possible to follow the guidelines:

https://github.com/openmrs/openmrs-contrib-itsmresources/wiki/How-to-deploy-new-docker-compose-application

As long as the docker compose file doesn’t contain credentials and only uses images from Docker Hub, it shouldn’t be hard to deploy.
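
For reference, the encrypt-and-deploy cycle with ansible-vault might look roughly like this. The file paths, inventory and playbook names are placeholders; the wiki page above describes the actual layout of the ITSM repos.

```shell
#!/bin/sh
# Sketch of encrypting an app's .env with ansible-vault and deploying it.
# ENV_FILE, INVENTORY, PLAYBOOK and VAULT_PASS_FILE are hypothetical paths.
set -eu

ENV_FILE="${ENV_FILE:-files/quizgrader/.env}"
INVENTORY="${INVENTORY:-inventories/prod}"
PLAYBOOK="${PLAYBOOK:-site.yml}"
VAULT_PASS_FILE="${VAULT_PASS_FILE:-$HOME/.vault_pass}"

encrypt_env() {
  # Replaces the plaintext file in place with its vault-encrypted form,
  # so only the encrypted version gets committed.
  ansible-vault encrypt --vault-password-file "$VAULT_PASS_FILE" "$ENV_FILE"
}

deploy() {
  # Ansible can decrypt vault-encrypted files it copies at run time,
  # given the same vault password.
  ansible-playbook -i "$INVENTORY" "$PLAYBOOK" --vault-password-file "$VAULT_PASS_FILE"
}
```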

pascal · May 22, 2017, 4:00pm

Sure, sounds fun

burke · May 22, 2017, 5:27pm

Only need to deploy production. Dev is just for local development & testing against a test spreadsheet.

I’ve got quizgrader automatically building, encrypted the env for production secrets, and created a pull request for you to check my work, since I only know enough to be dangerous.

I need to do a little more testing before I can say confidently it’s ready for production in this new configuration, but we’re close!

cintiadr · May 28, 2017, 12:16pm

@burke,

Unfortunately I wasn’t able to move garissa today. It took me way more hours than I want to admit: none of the Ubuntu images I tried could start an m1.medium in TACC. The server apparently refused to start every single time (I waited for more than 20 minutes, and not even ping was reaching it anymore). m1.small works every time, but as soon as I change the flavor, the network becomes unreachable. Is there an image I should try?

About thika, I think most of the code is done:

terraform - VM definition and DNS entries (thika.o.o and quizgrader.o.o)
Ansible inventory and host variables
Docker compose file

So, I updated the docker compose file with some small changes, but it’s not starting the app. I suppose it could be the .env folder.

cintiadr · May 28, 2017, 12:34pm

I created TG-ASC???-test using the CLI (and not terraform) and the result is the same: I cannot ping or ssh into the machine, even though the security groups allow it.

nova boot TG-ASC??????--test \
--flavor m1.medium \
--image 87e08a17-eae2-4ce4-9051-c561d9a54bde \
--key-name TG-ASC??????-terraform-key \
--security-groups TG-ASC??????-ssh-icmp \
--nic net-name=TG-ASC??????-terraform-private

The key, network, and security groups work just fine with m1.small machines I’ve created before.
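
When an instance boots but never answers ping or SSH, the guest's console log is often the quickest clue: on Ubuntu cloud images, cloud-init typically prints whether networking came up and whether the SSH key was installed. A hedged sketch using the unified client (`openstack console log show` is its counterpart to `nova console-log`); the function name is hypothetical and the server argument stands for the redacted test instance.

```shell
#!/bin/sh
# Sketch: collect evidence for an instance that boots but is unreachable.
set -eu

diagnose_unreachable() {
  server="$1"
  # Compare the two flavors side by side (disk, RAM, vCPUs, extra properties).
  openstack flavor show m1.small
  openstack flavor show m1.medium
  # Nova's view: status, any fault message, and the assigned addresses.
  openstack server show "$server"
  # Last lines of the guest's console output (boot and cloud-init messages).
  openstack console log show --lines 200 "$server"
}
```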

burke · June 1, 2017, 3:40pm

There’s some weirdness with our project in the TACC environment. os project show TG-ASC????? works in the IU environment but fails in the TACC environment. I’ve notified folks at Jetstream and they’re looking into it (might be a permissions issue). I’m not sure if this is related to the connectivity problems you’re having… I’ll keep investigating.
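
One way to narrow down the IU/TACC difference is to run the same check against both clouds from one script, written here with the full `openstack` command name. This sketch assumes one openrc file per cloud (the file names are hypothetical) and sources each in a subshell so credentials from one cloud don’t leak into the next.

```shell
#!/bin/sh
# Sketch: run "openstack project show" against several clouds.
# Usage (hypothetical file names; pass paths containing a slash):
#   check_project TG-ASC170002 ./iu-openrc.sh ./tacc-openrc.sh
set -u

check_project() {
  project="$1"; shift
  for rc in "$@"; do
    # The subshell keeps each cloud's OS_* variables isolated.
    if ( . "$rc" && openstack project show "$project" >/dev/null 2>&1 ); then
      echo "$rc: ok"
    else
      echo "$rc: FAILED"
    fi
  done
}
```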

cintiadr · June 11, 2017, 8:46am

It appears to be working now!

I will schedule to move garissa next weekend.

So the last one will be thika.

r0bby · June 11, 2017, 10:35am

We should migrate JIRA off of batouri ASAP. I suspect that will fix the LDAP timeouts, though it’s pretty much a stab in the dark.

cintiadr · June 12, 2017, 3:39am

I’m not entirely sure which hardware constraints you are referring to in this case. While I can see the machine is using more CPU than expected, it’s still more or less OK. It seems to me it could just be JVM garbage collection, because we haven’t tuned the memory settings for Crowd and JIRA (having different values for -Xms and -Xmx doesn’t help us much in that sense either). We appear to be logging GC for Crowd, but not for JIRA.
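
If we do want GC logs for JIRA to match Crowd, JIRA’s `bin/setenv.sh` is the usual place. A hedged excerpt, assuming JIRA 7 on Java 8 (the `-XX:+PrintGC*` flags were replaced by `-Xlog:gc` in Java 9 and later). Depending on the JIRA version, setenv.sh may already contain a GC-logging section, in which case that is the one to adjust. The heap sizes and log path are placeholders, not recommendations for batouri.

```shell
# Hypothetical excerpt for <JIRA install>/bin/setenv.sh (JIRA 7, Java 8 assumed).
# Heap sizes and the log path are placeholders, not tuned values.

# Equal min and max heap (-Xms/-Xmx) so the JVM never resizes the heap.
JVM_MINIMUM_MEMORY="2048m"
JVM_MAXIMUM_MEMORY="2048m"

# Java 8 GC logging with rotation, passed through JIRA's support-args variable.
GC_LOG_FILE="/var/atlassian/application-data/jira/log/gc.log"
JVM_SUPPORT_RECOMMENDED_ARGS="-XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-Xloggc:${GC_LOG_FILE} -XX:+UseGCLogFileRotation \
-XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M"
```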

Even though it’s not perfect, it’s not that high, and there’s plenty of free memory at any given time. No swapping, no increased disk latency, nothing worrying tbf.

Do you have reason to believe that the problem is the CPU utilization of 30-50%? That would be somewhat of a surprise to me. Regardless, ID touches neither Crowd nor JIRA, which I’d assume are the apps that would be slowed down by the amount of GC running.

But I don’t really know how to tune openldap, or even whether some sort of performance penalty from running it in docker is showing up.

Also, I’d love to have help adding JIRA and MySQL to ansible, so I can migrate them to Jetstream.

r0bby · June 12, 2017, 8:00am

re: openldap running in docker: I highly doubt there’s a penalty for using docker. We had the same issues when running it outside of docker.

cintiadr · June 26, 2017, 1:06pm

Thika was recreated, so this should now be done.

@burke, can you confirm thika is now working?

Also, if you could merge:

and move it to the openmrs org, it would be easier for me.

burke · June 29, 2017, 1:54pm

PR merged, repo moved, thika tested & working. Thanks!