QA Framework fails build plans with Bamboo

kdaud · October 11, 2021, 1:57pm

It’s now been a week that the module has been failing build plans on Bamboo. Looking at the commit history, there isn’t any significant merge that could be the cause, so I’m wondering why the build plans are constantly failing.

The module is configured to run automated tests both with GitHub Actions, which uses Firefox, and with Bamboo, which uses Chrome, so as to give confidence in what these tests are automating. Interestingly, build plans are passing with GitHub Actions but failing with Bamboo.

This is what I have done so far:

I ran the existing tests locally with both Firefox and Chrome, and they all pass.
I manually ran the module several times with Bamboo and observed the following:

All the workflow tests are passing. See the Cucumber Reports.
The full build log shows that the dependent modules uitestframework and distro-referenceapplication build successfully, and at the end the qaframework build summary shows success. On the contrary, Bamboo shows a red report (build plan failure).

Three questions are ringing in my mind while I look for a possible solution:

Is there something I am not reading well within the full build report that could help point out the issue?
Could this be associated with some configurations within Bamboo that are failing to produce the right report on the build plans?
Could this be linked to the ongoing work on refapp 2.12, perhaps as a result of upgrading some modules? cc: @herbert24

cc: @ibacher @mozzy @sharif @dkayiwa

sharif · October 11, 2021, 2:58pm

Thanks @kdaud. I have also been closely watching this build, especially the logs. It could be that the remote machine storing the Bamboo configurations is not building some artifacts, so they end up incompatible at runtime.

dkayiwa · October 11, 2021, 4:31pm

It is the Publish results to cucumber studio task that failed, as per the last log. Unfortunately I made so many changes on Bamboo that I do not know which one fixed it.

sharif · October 11, 2021, 5:15pm

Interesting, thanks. I also ran the CI build multiple times but there was no difference. Thanks @dkayiwa

kdaud · October 11, 2021, 5:29pm

Not sure whether you meant Bamboo configurations. However, I see some recent configurations here that were made by @ibacher.

@dkayiwa I have also noticed some recent Bamboo build plans showing green. Does this mean the configurations have been rectified?

dkayiwa · October 11, 2021, 5:39pm

I noticed that they turned green because the Job was disabled (probably accidentally by myself). After enabling it, the plan resumed failing. But this time with real test failures.

kdaud · October 18, 2021, 8:54am

I spent the weekend looking into why Publish results to cucumber studio is failing CI even though all the tests in the qaframework module are passing.

I discovered from the logs that the issue comes from a failure of server certificate verification. curl performs SSL certificate verification by default, using a “bundle” of Certificate Authority (CA) public keys (CA certs).

Most likely, verification is failing due to a problem with the certificate itself (it might have expired, or its name might not match the domain name in the URL), since the HTTPS server is expected to use a certificate signed by a CA represented in the bundle. I do not yet know how to fix this.

With the ongoing work on the qaframework, we would like to keep CI green on the module, so for now I have enabled the task (Publish results to cucumber studio) to run in CI and temporarily set it to bypass certificate verification, though this is not a real solution, especially whenever there is a genuine certificate issue with Web PKI.
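
Rather than bypassing verification for every kind of failure, a CI step can at least report when the failure is specifically a certificate problem. The sketch below is illustrative and is not the actual Bamboo task: the function names `explain_curl_exit` and `check_endpoint` are hypothetical, and the exit codes are the ones listed in the curl man page (60 means the peer certificate could not be authenticated with the known CA certificates).

```shell
# Map a few documented curl exit codes to a short diagnosis.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect" ;;
    35) echo "TLS handshake failed" ;;
    60) echo "certificate verification failed" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

# Hypothetical helper: send a HEAD request with verification left ON
# (no -k / --insecure) and explain the outcome.
check_endpoint() {
  url="$1"
  rc=0
  curl --silent --show-error --head --output /dev/null "$url" || rc=$?
  explain_curl_exit "$rc"
}
```

With this, `check_endpoint https://studio.cucumber.io` prints `certificate verification failed` whenever curl exits with code 60, which separates a trust problem on the agent from an ordinary network failure.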

cc: @dkayiwa @burke @ibacher @bistenes @mozzy

dkayiwa · October 18, 2021, 1:38pm

This command confirms that the certificate has expired: curl --cert-status https://studio.cucumber.io/cucumber_project/results
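
Note that `--cert-status` asks curl to verify the certificate status via OCSP stapling. To read the validity dates directly, something like the following could be used. It is only a sketch, assuming `openssl` and GNU `date` are available on the agent; `show_cert_dates` and `expiry_status` are hypothetical helper names.

```shell
# Print subject, issuer and validity window of the leaf certificate served by a host.
show_cert_dates() {
  host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Read "notAfter=..." (as printed by `openssl x509 -dates`) on stdin and print
# "expired" or "valid" relative to a reference epoch (default: now).
expiry_status() {
  ref_epoch="${1:-$(date +%s)}"
  not_after="$(sed -n 's/^notAfter=//p')"
  end_epoch="$(date -u -d "$not_after" +%s)" || return 1   # GNU date assumed
  if [ "$end_epoch" -le "$ref_epoch" ]; then
    echo "expired"
  else
    echo "valid"
  fi
}
```

Usage would look like `show_cert_dates studio.cucumber.io | expiry_status`. This only checks the leaf certificate; it says nothing about the intermediate or root certificates further up the chain.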

mozzy · October 18, 2021, 1:52pm

Has this been fixed? I can see the Publish results task has passed.

dkayiwa · October 18, 2021, 2:00pm

@kdaud just disabled certificate verification, for now.

kdaud · October 18, 2021, 2:04pm

I made a temporary change so the task bypasses certificate verification during CI while we look for a proper solution.

@dkayiwa is this certificate renewable, or do we need to get a new one? And either way, how is it done?

ibacher · October 18, 2021, 2:54pm

It’s a little bit more complicated than an expired certificate. We’re most likely getting bitten by this change to Let’s Encrypt. E.g.,

~ curl -LvsI https://studio.cucumber.io
*   Trying 109.232.236.90...
* TCP_NODELAY set
* Connected to studio.cucumber.io (109.232.236.90) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=studio.cucumber.io
*  start date: Sep 16 09:43:10 2021 GMT
*  expire date: Dec 15 09:43:09 2021 GMT
*  subjectAltName: host "studio.cucumber.io" matched cert's "studio.cucumber.io"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fcdee00f800)
> HEAD / HTTP/2
> Host: studio.cucumber.io
> User-Agent: curl/7.64.1
> Accept: */*
>

October 1st also corresponds to the first failure caused by the upload task. I think the underlying issue is that the Bamboo build servers are running Ubuntu 16.04 and probably need to be updated to take account of the newer certificate.

ibacher · October 18, 2021, 4:29pm

I’ve upgraded the TLS packages for the three Bamboo build agents, so the certificates should now be validated correctly.
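
The exact packages are not listed here, so the following is only a sketch of how an agent's trust store might be checked. It assumes a Debian/Ubuntu layout where trusted roots are linked under /etc/ssl/certs, and it assumes the relevant roots are ISRG Root X1 (the current Let's Encrypt root) and DST Root CA X3 (the root that expired on September 30, 2021). The helper names `ca_dir_has` and `report_roots` and the file names are assumptions.

```shell
# Succeed if a certificate file with the given name exists in a CA directory.
ca_dir_has() {
  [ -e "$1/$2" ]
}

# Report whether the two roots relevant to the Let's Encrypt chain are present.
report_roots() {
  dir="${1:-/etc/ssl/certs}"
  for name in ISRG_Root_X1.pem DST_Root_CA_X3.pem; do
    if ca_dir_has "$dir" "$name"; then
      echo "$name: present"
    else
      echo "$name: missing"
    fi
  done
}

# On Debian/Ubuntu the trust store and TLS library could then be upgraded with
# something like the following (not necessarily what was run on the Bamboo agents):
#   sudo apt-get update
#   sudo apt-get install --only-upgrade ca-certificates openssl
```

Running `report_roots` with no argument inspects /etc/ssl/certs; passing a directory checks that location instead.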

kdaud · October 18, 2021, 4:57pm

I have changed the flag to -s so that certificates are validated when the task runs, and from the consecutive manual builds I have done after the change, it is clear that the certificates are now being validated correctly.

@ibacher how did you do it?

ibacher · October 18, 2021, 7:22pm

I just updated the SSL certificates on the server.

kdaud · October 18, 2021, 7:35pm

Thanks @ibacher

Next time they expire, I will do the same as you did.

ibacher · October 18, 2021, 7:43pm

“Next time” will be in 2035. Hopefully by then things will be smoother.

To be clear, it’s not just that “a” certificate expired… SSL certificates are stored as chains where each certificate is signed by an intermediate certificate and the intermediate certificate is signed by a root certificate. There are only a couple dozen widely-recognised root certificates in existence (so your browser and OS don’t need to know the certificates of potentially millions of web sites). This problem came about because one of those root certificates (and a very widely used one) expired. Normal SSL certificates expire relatively frequently (generally under a year, almost always under 3 years; Let’s Encrypt is by default 90 days), but the root certificates generally have expiration dates on the order of 20-25 years.

(The catch is every time something like this happens people say “next time, we’ll do better” and they probably intend to, but 15 years can be quite a long time to keep those intentions over).
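
To see the chain described above for a given host, one option is to ask `openssl s_client` for every certificate the server sends. This is a sketch assuming `openssl` is installed; `show_chain` and `count_pem_certs` are hypothetical helper names.

```shell
# Count PEM certificates on stdin (one per "BEGIN CERTIFICATE" marker).
count_pem_certs() {
  grep -c 'BEGIN CERTIFICATE'
}

# Print the "Certificate chain" section of the s_client output: the
# subject/issuer of each certificate the server presents, plus its PEM body.
# The root itself is normally not sent; it comes from the local trust store.
show_chain() {
  host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" -showcerts \
    </dev/null 2>/dev/null | sed -n '/^Certificate chain/,/^---$/p'
}
```

For example, `show_chain studio.cucumber.io` lists each link with its subject and issuer, and `show_chain studio.cucumber.io | count_pem_certs` prints how many certificates the server sent.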

kdaud · October 18, 2021, 7:57pm

ibacher: “Next time” will be in 2035.

I was guessing 2025.

ibacher: Hopefully by then things will be smoother.

IT infrastructures are changing every single day with the magic of open source development.

Well, by that time it won’t take me 3 days as it did to figure out this issue. By then I guess it will take me less than an hour to discover the snag and fix it.

Thanks @ibacher for providing the wider context, which has equipped me with more info on the subject.

kdaud · December 21, 2021, 3:04pm

Recently, the Publish results to cucumber studio task started failing again. It has been 2 months since @ibacher updated the SSL certificates on the server, and things were running well; however, certificate verification is now failing the build plan, as per this log for the QA Framework module.

cc: @sharif @dkayiwa

dkayiwa · December 21, 2021, 3:12pm

Do we seriously use those reports? Or should we just get rid of that task?