Are we in a position to increase the Bamboo agent storage from 20 GB to 40 GB? We are starting to see build failures from running out of disk space, because we are building more Docker images, including for the ARM architecture. Let me know and I can handle the change.
Yeah, should be fine I reckon. I have all the new servers and here’s where we are:
We need to leave space for whatever else I might need to grow, as I was a bit conservative…
That said, changing the size of a volume will cause the volume to be recreated empty (docs). It’s pretty annoying, but that’s how it is.
If that’s ok, all you need to do is:
- disable the bamboo agent in bamboo, ofc, so it doesn’t run any builds
- in terraform, change the vm.tf file to temporarily allow destroying the data volume (see the sketch after this list)
- change the volume size for that instance in the variables file
- terraform plan/apply should destroy the volume, create a new one, and attach it to the VM
- fix your data volume to have the folders you want (I assume rerunning puppet or smth), and re-enable the agent
- update our VM terraform docs via ./build.rb docs && ./build.rb plan docs && ./build.rb apply docs to ensure the Terraform'd VMs - OpenMRS community infrastructure page is updated.
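Roughly what that change might look like — just a sketch; the variable name (data_volume_size), the name prefix, and the assumption that a prevent_destroy lifecycle flag is what guards the volume are all guesses at the real module layout:
# variables file (hypothetical variable name): bump the size for this instance
variable "data_volume_size" {
  default = 40 # was 20
}
# vm.tf: temporarily allow the data volume to be destroyed and recreated
resource "openstack_blockstorage_volume_v2" "data_volume" {
  count = var.data_volume_size > 0 ? 1 : 0
  name  = "${var.name}-data_volume" # hypothetical naming
  size  = var.data_volume_size

  lifecycle {
    prevent_destroy = false # flip back to true once the resize is done
  }
}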
Great. We can always downsize and clear the caches more often, but since we have space I will go ahead and increase. Thank you @cintiadr!
Between the m2 cache, the Docker cache and the NPM cache, I suspect we’ll always need more storage on the build agents than most of our other servers…
Just an update that I wasn’t able to increase the storage yet. I’m not able to log in to https://js2.jetstream-cloud.org/ and am awaiting support from the XSEDE helpdesk.
UPDATE: got a response and I was able to log in. I’ll proceed with expanding storage.
@cintiadr I may need your help. Running apply on yu gives me:
Error: Error creating openstack_blockstorage_volume_v2: Resource not found
on ../modules/single-machine/vm.tf line 40, in resource "openstack_blockstorage_volume_v2" "data_volume":
40: resource "openstack_blockstorage_volume_v2" "data_volume" {
I have no clue what the issue is…
@cintiadr the change is committed to GitHub - openmrs/openmrs-contrib-itsm-terraform. It would be great if you could at least check whether plan and apply run for you for yu.
I just tried and got the same error message.
I tried again with debugging on:
OS_DEBUG=1 TF_LOG=DEBUG ./build.rb apply yu
I’m not skilled enough to interpret all of the log output, and I’m hesitant to publish the full log output in case there are secrets that might be revealed. It looks like there’s a POST to Jetstream2 to create the data volume with size 40 that’s successful (HTTP 200 OK response), followed by an HTTP 404 Not Found. I thought it might be terraform failing to verify the volume was created; however, the POST just before the 404 error looks like it’s from an aws provider trying to get information from AWS, with a step described as “Request ec2/DescribeAccountAttributes Details”.
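In case it helps, a rough way to skim one of these debug runs for the interesting bits (errors, HTTP status lines, request URLs) without reading or sharing the whole file — the filename is just wherever the output was saved:
$ grep -nE 'Error|HTTP/1.1 [45][0-9]{2}|Request URL' terraform-apply.log | less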
I can see the volume is getting created. The log shows terraform posting to https://js2.jetstream-cloud.org:xxxx/v2/d0ea...9430/
and I can see this volume with a matching tenant id in Jetstream2:
# os volume list
+-------------+--------------------------+--------+------+
| ID | Name | Status | Size |
+-------------+--------------------------+--------+------+
| 46f8...fa95 | TG-ASC170002-data_volume | in-use | 40 |
+-------------+--------------------------+--------+------+
# os volume show 46f8...fa95
+------------------------------+-------------+
| Field | Value |
+------------------------------+-------------+
| os-vol-tenant-attr:tenant_id | d0ea...9430 |
+------------------------------+-------------+
@raff maybe if you tried running in debug mode like I did, you could understand the output better than I can. If you just want to view the output, I uploaded the full debug log from trying to apply the plan to yu.openmrs.org:/home/burke/terraform-apply.log.
$ ./build.rb terraform yu "state list"
Running terraform 'state list' on yu
data.terraform_remote_state.base
module.single-machine.data.template_file.provisioning_file
module.single-machine.data.terraform_remote_state.base
module.single-machine.dme_dns_record.hostname
module.single-machine.null_resource.add_gitcrypt_key[0]
module.single-machine.null_resource.add_github_key[0]
module.single-machine.null_resource.ansible[0]
module.single-machine.null_resource.copy_facts[0]
module.single-machine.null_resource.mount_data_volume[0]
module.single-machine.null_resource.upgrade[0]
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm
module.single-machine.openstack_compute_floatingip_v2.ip
module.single-machine.openstack_compute_instance_v2.vm
That’s interesting. The volume itself wasn’t saved in the state file?
Comparing with other ones:
$ ./build.rb terraform xindi "state list"
Running terraform 'state list' on xindi
data.terraform_remote_state.base
module.single-machine.data.template_file.provisioning_file
module.single-machine.data.terraform_remote_state.base
module.single-machine.dme_dns_record.hostname
module.single-machine.null_resource.add_gitcrypt_key[0]
module.single-machine.null_resource.add_github_key[0]
module.single-machine.null_resource.ansible[0]
module.single-machine.null_resource.copy_facts[0]
module.single-machine.null_resource.mount_data_volume[0]
module.single-machine.null_resource.upgrade[0]
module.single-machine.openstack_blockstorage_volume_v2.data_volume[0]
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm
module.single-machine.openstack_compute_floatingip_v2.ip
module.single-machine.openstack_compute_instance_v2.vm
module.single-machine.openstack_compute_volume_attach_v2.attach_data_volume[0]
Annoyingly, changing volume sizes in Terraform causes the volume to be destroyed. I’m not sure how we ended up in this inconsistent state, but maybe it’s easier if we destroy it and create it all again? I can do it over the weekend.
Otherwise we will have to do some terraform state file surgery, which tends to be error-prone and annoying.
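(For what it’s worth, the surgery would probably amount to importing the already-created volume into the state under the address the module expects — untested, and the volume ID is a placeholder:
$ ./build.rb terraform yu "import module.single-machine.openstack_blockstorage_volume_v2.data_volume[0] <volume-id>"
The volume attachment would presumably need the same treatment.)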
I stupidly tried to do this and got stuck with this:
09:51:57 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/DescribeAccountAttributes Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 403 Forbidden
Connection: close
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Content-Type: text/xml;charset=UTF-8
Date: Thu, 08 Sep 2022 13:51:56 GMT
Keep-Alive: timeout=20
Server: AmazonEC2
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: accept-encoding
X-Amzn-Requestid: 1e6b6fe0-1557-46c0-beea-05cb9ad9840e
-----------------------------------------------------: timestamp=2022-09-08T09:51:57.337-0400
2022-09-08T09:51:57.337-0400 [INFO] plugin.terraform-provider-aws_v3.57.0_x5: 2022/09/08 09:51:57 [DEBUG] [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>UnauthorizedOperation</Code><Message>You are not authorized to perform this operation.</Message></Error></Errors><RequestID>1e6b6fe0-1557-46c0-beea-05cb9ad9840e</RequestID></Response>: timestamp=2022-09-08T09:51:57.337-0400
That’s from the error log, and may or may not be the root error…
The messages immediately before it were:
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Endpoint for volumev2: https://js2.jetstream-cloud.org:8776/v2/d0ea47e384f84fbda90eb955720d9430/
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] openstack_blockstorage_volume_v2 create options: schedulerhints.CreateOptsExt{VolumeCreateOptsBuilder:(*volumes.CreateOpts)(0xc000be1360), SchedulerHints:schedulerhints.SchedulerHints{DifferentHost:[]string(nil), SameHost:[]string(nil), LocalToInstance:"", Query:"", AdditionalProperties:map[string]interface {}(nil)}}
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Request URL: POST https://js2.jetstream-cloud.org:8776/v2/d0ea47e384f84fbda90eb955720d9430/volumes
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Request Headers:
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: Accept: application/json
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: Cache-Control: no-cache
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: Content-Type: application/json
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: User-Agent: HashiCorp Terraform/0.12.31 (+https://www.terraform.io) Terraform Plugin SDK/1.17.2 gophercloud/2.0.0
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: X-Auth-Token: ***
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: 2022/09/08 09:51:57 [DEBUG] OpenStack Request Body: {
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: "volume": {
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: "name": "TG-ASC170002-data_volume",
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: "size": 40
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: }
2022-09-08T09:51:57.216-0400 [DEBUG] plugin.terraform-provider-openstack_v1.43.0: }
@cintiadr, from what I could infer from the logs, it looks like the request to Jetstream2 to create the volume succeeded, but the following request (an AWS request… perhaps to persist the state change?) fails with a 404 error. So, it would make sense that Jetstream2 has the volume, but terraform’s state doesn’t show it.
I think @ibacher may have gone in through the horizon interface and manually attached the volume to yu (which could explain why the volume not only exists but is attached to yu despite terraform not completely applying the plan).
Do you get the same error that we’re getting when you try to apply the plan to yu? Maybe we need an upgrade to our terraform or our AWS module?
Turns out this was the root cause of nothing at all, though it was trivial to disable.
The underlying cause seems to be that Jetstream2 now only supports v3 of the openstack_blockstorage_volume resource. v3 has the nice property that resizing a volume doesn’t mean recreating it (I suppose only when the volume size increases). So yu.openmrs.org is now back with 40 GB of disk space in /data. I’m currently applying the same update to xiao.
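For reference, the shape of the fix in the module — a sketch only, with hypothetical variable names; the real vm.tf has more arguments:
# vm.tf: the data volume now uses the v3 block storage resource
resource "openstack_blockstorage_volume_v3" "data_volume" {
  count = var.data_volume_size > 0 ? 1 : 0
  name  = "${var.name}-data_volume"
  size  = var.data_volume_size # per the note above, growing this under v3 shouldn't force recreation
}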
./build.rb terraform yu "state list" now looks like this:
module.single-machine.data.template_file.provisioning_file
module.single-machine.data.terraform_remote_state.base
module.single-machine.dme_dns_record.hostname
module.single-machine.null_resource.add_gitcrypt_key[0]
module.single-machine.null_resource.add_github_key[0]
module.single-machine.null_resource.ansible[0]
module.single-machine.null_resource.copy_facts[0]
module.single-machine.null_resource.mount_data_volume[0]
module.single-machine.null_resource.upgrade[0]
module.single-machine.openstack_blockstorage_volume_v3.data_volume[0]
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm
module.single-machine.openstack_compute_floatingip_v2.ip
module.single-machine.openstack_compute_instance_v2.vm
module.single-machine.openstack_compute_volume_attach_v2.attach_data_volume[0]
Great news! I suspected v2 wasn’t supported anymore, but I couldn’t find any info about that. Thanks for fixing this while I was away!
FWIW, I added this recipe to our ITSM docs.
Now xindi doesn’t have a public IP address.
When I tried plan & apply on xindi, the apply fails:
$ ./build.rb apply xindi
Terraform apply is NOT thread-safe, and there's no lock mechanism enabled. Two concurrent calls on the same stack will cause inconsistences.
Do you really want to modify stack xindi? [y/N]: y
Running terraform apply on xindi
module.single-machine.openstack_compute_floatingip_associate_v2.fip_vm: Creating...
module.single-machine.openstack_compute_volume_attach_v2.attach_data_volume[0]: Creating...
Error: Error creating openstack_compute_floatingip_associate_v2: Resource not found
on ../modules/single-machine/vm.tf line 35, in resource "openstack_compute_floatingip_associate_v2" "fip_vm":
35: resource "openstack_compute_floatingip_associate_v2" "fip_vm" {
Error: Error creating openstack_compute_volume_attach_v2 4e800a7a-3bed-47c7-8443-f96e2563b5e2: Resource not found
on ../modules/single-machine/vm.tf line 53, in resource "openstack_compute_volume_attach_v2" "attach_data_volume":
53: resource "openstack_compute_volume_attach_v2" "attach_data_volume" {
Are there more OpenStack resources that need @ibacher’s trick of switching from v2 to v3?
On a cheerier note: I was able to figure out why our SourceForge uploads haven’t been working and hopefully have fixed them with ITSM-4320.
So, xindi being offline is my fault. The Core 2.5.x build failures seemed to be related to a corrupted Maven cache on xindi, and the data volume was weird… there was a /data directory, but no corresponding device when I used fdisk -l, so I figured I’d recreate it using ./build.rb destroy and then ./build.rb plan and ./build.rb apply, but I got the same error you’re getting above, which I haven’t been able to track down yet.