Bamboo agents to be recreated this weekend (1st/2nd August)

Hi peeps!

This weekend, I’m going to be destroying and recreating at least one of our Bamboo agents (the thingies that run our CI builds). I will do one at a time, and hopefully that will help with builds running out of disk space.


That will be awesome!!! Thanks @cintiadr :slight_smile:

The yak agent has been replaced.

Next weekend will be yue. Please let me know if there’s anything weird with those machines.

Thanks @cintiadr!

All bamboo agents were recreated. Please let me know if there’s something weirder than usual.


I will be fixing https://ci.openmrs.org/browse/TRAN-TRAN-1463/log tomorrow

Transifex build is finally fixed.

Thanks for the fix.

Is there any possibility of this being related to it? https://ci.openmrs.org/browse/TRUNK-MASTER-2456/log

And: https://ci.openmrs.org/browse/TRUNK-OC3-54/log

Maybe, @dkayiwa.

Core 2.3.0:

 	[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project openmrs-api: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test failed: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?

It seems that the JVMs are just dying.

Could be related to the minor Java update that happened. It was update 252, and it’s now 265.

So @dkayiwa, the problem wasn’t java.

Turns out there’s an OCL build, started a couple of days ago, that is leaving a lot of Docker containers behind. The leftover containers keep eating all the CPU and memory on the agents. The automation that is supposed to help with that doesn’t seem to be working.

Are you able to check with the OCL team and make sure their builds clean up after themselves, so they don’t break other builds?

I don’t know why the automatic cleanup wasn’t working, but we shouldn’t really rely on it.


Thanks @cintiadr for looking into this!

@grace what do you think of this?


Do we know which builds weren’t getting cleaned up? I’m guessing this is from the OCL dev team (e.g., work on OCL API v2) and not the OCL for OpenMRS squad.


I didn’t have time to chase down exactly which build was triggering it, @burke.

But it does seem to be the OCL API that was up, indeed. I could see the API, Celery, and a couple more Docker containers (unless, of course, OCL for OpenMRS starts the whole OCL stack during CI).

I will try to check if there’s any agent suffering from the same problem, and will try to pinpoint the build based on start time.
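For anyone curious, pinpointing the build by start time can be done by listing the leftover containers with their age. This is just a sketch of the kind of command I mean; it assumes shell access to the Bamboo agent with Docker installed:

```shell
# List all containers (including stopped ones) with how long ago each was
# created, so the creation time can be matched against a build's start time.
docker ps -a --format 'table {{.Names}}\t{{.Image}}\t{{.RunningFor}}\t{{.Status}}'
```

Cross-referencing the “running for” column against the Bamboo build history should narrow it down to the offending plan.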

I’m sure it’s the API team, then. I’ll let them know to make sure they bring down containers post-build.

Thanks, @cintiadr!


I believe this is the build: https://ci.openmrs.org/browse/OCL-OCLAPI2

I’ll tag @raff and @sny as they seem to have touched that build.

I’ve added cleanup to https://ci.openmrs.org/build/admin/edit/editBuildTasks.action?buildKey=OCL-OCLAPI2-BO
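For reference, a final cleanup task along these lines is what I have in mind. The `oclapi` name filter is an assumption for illustration; adapt it to however the build actually names its containers:

```shell
# Hedged sketch of a final Bamboo task: force-remove any containers the
# build left behind (the name filter "oclapi" is a placeholder assumption),
# then reclaim disk from dangling images and unused volumes.
docker ps -aq --filter "name=oclapi" | xargs -r docker rm -f
docker system prune -f --volumes
```

Running it as a “final” task means it executes even when the build itself fails, which is exactly when containers tend to get left behind.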

Thanks for tracking it down!


@cintiadr is it in any way related to this? failed; error='Cannot allocate memory' (errno=12) https://ci.openmrs.org/browse/OP-OPM-BS-776/log

I expect it to be related, @dkayiwa.

I will do another cleanup in a few hours; hopefully that will cover it.