Are we in a position to increase the Bamboo agent storage from 20 GB to 40 GB? We are starting to see build failures from running out of disk space, because we are building more Docker images, including for the ARM architecture. Let me know and I can handle the change.
Yeah, should be fine I reckon. I have all the new servers and here’s where we are:
We need to leave space for whatever else I might need to grow, as I was a bit conservative…
That said, changing the size of a volume will cause the volume to be recreated empty (docs). It’s pretty annoying, but that’s how it is.
If that’s ok, all you need to do is:
- disable the bamboo agent in bamboo, ofc, so it doesn’t run any builds
- in terraform, change the vm.tf file to temporarily allow destroying the data volume (see the sketch after this list)
- change the volume size for that instance in the variables file
- terraform plan/apply should destroy the volume, create a new one, and attach it to the VM
- fix your data volume to have the folders you want (I assume rerunning puppet or smth), and re-enable the agent
- update our VM terraform docs via ./build.rb docs && ./build.rb plan docs && ./build.rb apply docs to ensure the Terraform'd VMs - OpenMRS community infrastructure page is updated.
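Roughly what that change might look like — just a sketch; the variable name (data_volume_size), the name prefix, and the assumption that a prevent_destroy lifecycle flag is what guards the volume are all guesses at the real module layout:
# variables file (hypothetical variable name): bump the size for this instance
variable "data_volume_size" {
  default = 40 # was 20
}
# vm.tf: temporarily allow the data volume to be destroyed and recreated
resource "openstack_blockstorage_volume_v2" "data_volume" {
  count = var.data_volume_size > 0 ? 1 : 0
  name  = "${var.name}-data_volume" # hypothetical naming
  size  = var.data_volume_size

  lifecycle {
    prevent_destroy = false # flip back to true once the resize is done
  }
}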
Great. We can always downsize and clear the caches more often, but since we have space I will go ahead and increase. Thank you @cintiadr!
Between the m2 cache, the Docker cache and the NPM cache, I suspect we’ll always need more storage on the build agents than most of our other servers…
Just an update that I wasn’t able to increase the storage yet. I’m not able to log in to https://js2.jetstream-cloud.org/ and am awaiting support from the XSEDE helpdesk.
UPDATE: got a response and I was able to log in. I’ll proceed with expanding storage.
@cintiadr I may need your help. Running apply on yu gives me:
Error: Error creating openstack_blockstorage_volume_v2: Resource not found
on ../modules/single-machine/vm.tf line 40, in resource "openstack_blockstorage_volume_v2" "data_volume":
40: resource "openstack_blockstorage_volume_v2" "data_volume" {
I have no clue what the issue is…
@cintiadr the change is committed to GitHub - openmrs/openmrs-contrib-itsm-terraform. It would be great if you could at least check whether plan and apply run for you for yu.
I just tried and got the same error message.
I tried again with debugging on:
OS_DEBUG=1 TF_LOG=DEBUG ./build.rb apply yu
I’m not skilled enough to interpret all of the log output, and I’m hesitant to publish the full log output in case there are secrets that might be revealed. It looks like there’s a POST to Jetstream2 to create the data volume with size 40 that’s successful (HTTP 200 OK response), followed by an HTTP 404 Not Found. I thought it might be terraform failing to verify the volume was created; however, the POST just before the 404 error looks like it’s from an aws provider trying to get information from AWS, with a step described as “Request ec2/DescribeAccountAttributes Details”.
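In case it helps, a rough way to skim one of these debug runs for the interesting bits (errors, HTTP status lines, request URLs) without reading or sharing the whole file — the filename is just wherever the output was saved:
$ grep -nE 'Error|HTTP/1.1 [45][0-9]{2}|Request URL' terraform-apply.log | less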
I can see the volume is getting created. The log shows terraform posting to https://js2.jetstream-cloud.org:xxxx/v2/d0ea...9430/
and I can see this volume with a matching tenant id in Jetstream2:
# os volume list
+-------------+--------------------------+--------+------+
| ID | Name | Status | Size |
+-------------+--------------------------+--------+------+
| 46f8...fa95 | TG-ASC170002-data_volume | in-use | 40 |
+-------------+--------------------------+--------+------+
# os volume show 46f8...fa95
+------------------------------+-------------+
| Field | Value |
+------------------------------+-------------+
| os-vol-tenant-attr:tenant_id | d0ea...9430 |
+------------------------------+-------------+
@raff maybe if you tried running in debug mode like I did, you could understand the output better than I can. If you just want to view the output, I uploaded the full debug log from trying to apply the plan to yu.openmrs.org:/home/burke/terraform-apply.log.
$ ./build.rb terraform yu "state list"
Running terraform 'state list' on yu
data.terraform_remote_state.base
module.single-machine.data.template_file.provisioning_file
module.single-machine.data.terraform_remote_state.base
module.single-machine.dme_dns_record.hostname
module.single-machine.null_resource.add_gitcrypt_key[0]
module.single-machine.null_resource.add_github_key[0]
module.single-machine.null_resource.ansible[0]
module.single-machine.null_resource.copy_facts[0]
module.single-machine.null_resource.mount_data_volume[0]
module.single-machine.null_resource.upgrade[0]
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm
module.single-machine.openstack_compute_floatingip_v2.ip
module.single-machine.openstack_compute_instance_v2.vm
That’s interesting. The volume itself wasn’t saved in the state file?
Comparing with other ones:
$ ./build.rb terraform xindi "state list"
Running terraform 'state list' on xindi
data.terraform_remote_state.base
module.single-machine.data.template_file.provisioning_file
module.single-machine.data.terraform_remote_state.base
module.single-machine.dme_dns_record.hostname
module.single-machine.null_resource.add_gitcrypt_key[0]
module.single-machine.null_resource.add_github_key[0]
module.single-machine.null_resource.ansible[0]
module.single-machine.null_resource.copy_facts[0]
module.single-machine.null_resource.mount_data_volume[0]
module.single-machine.null_resource.upgrade[0]
module.single-machine.openstack_blockstorage_volume_v2.data_volume[0]
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm
module.single-machine.openstack_compute_floatingip_v2.ip
module.single-machine.openstack_compute_instance_v2.vm
module.single-machine.openstack_compute_volume_attach_v2.attach_data_volume[0]
Annoyingly, changing volume sizes in Terraform causes the volume to be destroyed. I’m not sure how we ended up in this inconsistent state, but maybe it’s easier if we destroy it and create it all again? I can do it over the weekend.
Otherwise we will have to do some terraform state file surgery, which tends to be error-prone and annoying.
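(For what it’s worth, the surgery would probably amount to importing the already-created volume into the state under the address the module expects — untested, and the volume ID is a placeholder:
$ ./build.rb terraform yu "import module.single-machine.openstack_blockstorage_volume_v2.data_volume[0] <volume-id>"
The volume attachment would presumably need the same treatment.)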
I stupidly tried to do this and got stuck with this:
09:51:57 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/DescribeAccountAttributes Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 403 Forbidden
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Content-Type: text/xml;charset=UTF-8
Date: Thu, 08 Sep 2022 13:51:56 GMT
Keep-Alive: timeout=20
Server: AmazonEC2
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: accept-encoding
X-Amzn-Requestid: 1e6b6fe0-1557-46c0-beea-05cb9ad9840e
-----------------------------------------------------: timestamp=2022-09-08T09:51:57.337-0400
2022-09-08T09:51:57.337-0400 [INFO] plugin.terraform-provider-aws_v3.57.0_x5: 2022/09/08 09:51:57 [DEBUG] [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>UnauthorizedOperation</Code><Message>You are not authorized to perform this operation.</Message></Error></Errors><RequestID>1e6b6fe0-1557-46c0-beea-05cb9ad9840e</RequestID></Response>: timestamp=2022-09-08T09:51:57.337-0400
That’s from the error log, and may or may not be the root error…
The messages immediately before it were:
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Endpoint for volumev2: https://js2.jetstream-cloud.org:8776/v2/d0ea47e384f84fbda90eb955720d9430/
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] openstack_blockstorage_volume_v2 create options: schedulerhints.CreateOptsExt{VolumeCreateOptsBuilder:(*volumes.CreateOpts)(0xc000be1360), SchedulerHints:schedulerhints.SchedulerHints{DifferentHost:[]string(nil), SameHost:[]string(nil), LocalToInstance:"", Query:"", AdditionalProperties:map[string]interface {}(nil)}}
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Request URL: POST https://js2.jetstream-cloud.org:8776/v2/d0ea47e384f84fbda90eb955720d9430/volumes
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Request Headers:
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: Accept: application/json
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: Cache-Control: no-cache
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: Content-Type: application/json
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: User-Agent: HashiCorp Terraform/0.12.31 (+https://www.terraform.io) Terraform Plugin SDK/1.17.2 gophercloud/2.0.0
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: X-Auth-Token: ***
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Request Body: {
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: "volume": {
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: "name": "TG-ASC170002-data_volume",
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: "size": 40
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: }
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: }
@cintiadr, from what I could infer from the logs, it looks like the request to Jetstream2 to create the volume succeeded, but the following request (an AWS request… perhaps to persist the state change?) fails with a 404 error. So, it would make sense that Jetstream2 has the volume, but terraform’s state doesn’t show it.
I think @ibacher may have gone in through the horizon interface and manually attached the volume to yu (which could explain why the volume not only exists but is attached to yu despite terraform not completely applying the plan).
Do you get the same error that we’re getting when you try to apply the plan to yu? Maybe we need an upgrade to our terraform or our AWS module?
Turns out this was the root cause of nothing at all, though it was trivial to disable.
The underlying cause seems to be that Jetstream2 now only supports v3 of the openstack_blockstorage_volume resource. v3 has the nice property that resizing a volume doesn’t mean recreating it (I suppose only when the volume size increases). So yu.openmrs.org is now back with 40 GB of disk space in /data. I’m currently applying the same update to xiao.
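For reference, the shape of the fix in the module — a sketch only, with hypothetical variable names; the real vm.tf has more arguments:
# vm.tf: the data volume now uses the v3 block storage resource
resource "openstack_blockstorage_volume_v3" "data_volume" {
  count = var.data_volume_size > 0 ? 1 : 0
  name  = "${var.name}-data_volume"
  size  = var.data_volume_size # per the note above, growing this under v3 shouldn't force recreation
}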
./build.rb terraform yu "state list" now looks like this:
module.single-machine.data.template_file.provisioning_file
module.single-machine.data.terraform_remote_state.base
module.single-machine.dme_dns_record.hostname
module.single-machine.null_resource.add_gitcrypt_key[0]
module.single-machine.null_resource.add_github_key[0]
module.single-machine.null_resource.ansible[0]
module.single-machine.null_resource.copy_facts[0]
module.single-machine.null_resource.mount_data_volume[0]
module.single-machine.null_resource.upgrade[0]
module.single-machine.openstack_blockstorage_volume_v3.data_volume[0]
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm
module.single-machine.openstack_compute_floatingip_v2.ip
module.single-machine.openstack_compute_instance_v2.vm
module.single-machine.openstack_compute_volume_attach_v2.attach_data_volume[0]
Great news! I suspected v2 wasn’t supported anymore, but I couldn’t find any info about that. Thanks for fixing this while I was away!
FWIW, I added this recipe to our ITSM docs.
Now xindi doesn’t have a public IP address.
When I tried plan & apply on xindi, the apply fails:
$ ./build.rb apply xindi
Terraform apply is NOT thread-safe, and there's no lock mechanism enabled. Two concurrent calls on the same stack will cause inconsistences.
Do you really want to modify stack xindi? [y/N]: y
Running terraform apply on xindi
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm: Creating...
module.single-machine.openstack_compute_volume_attach_v2.attach_data_volume[0]: Creating...
Error: Error creating openstack_compute_floatingip_associate_v2: Resource not found
on ../modules/single-machine/vm.tf line 35, in resource "openstack_compute_floatingip_associate_v2" "fip_vm":
35: resource "openstack_compute_floatingip_associate_v2" "fip_vm" {
Error: Error creating openstack_compute_volume_attach_v2 4e800a7a-3bed-47c7-8443-f96e2563b5e2: Resource not found
on ../modules/single-machine/vm.tf line 53, in resource "openstack_compute_volume_attach_v2" "attach_data_volume":
53: resource "openstack_compute_volume_attach_v2" "attach_data_volume" {
Are there more OpenStack resources that need @ibacher’s trick of switching from v2 to v3?
On a cheerier note: I was able to figure out why our SourceForge uploads haven’t been working and hopefully have fixed them with ITSM-4320.
So, xindi being offline is my fault. The Core 2.5.x build failures seemed to be related to a corrupted Maven cache on xindi, and the data volume was weird… there was a /data directory, but no corresponding device when I used fdisk -l, so I figured I’d recreate it using ./build.rb destroy and then ./build.rb plan and ./build.rb apply, but I got the same error you’re getting above, which I haven’t been able to track down yet.