I think it’s fine and good to run release tests frequently (i.e., not just when preparing a release), but we need to be careful how this is communicated – i.e., we need to maintain a primary dashboard (the first thing folks see at CI) with green lights. We cannot afford any red amongst the green on our main dashboard.
Is it possible to set up a secondary “release” dashboard and/or change the default display at https://ci.openmrs.org/ to be a build dashboard that excludes the release tests?
Bamboo doesn’t offer multiple dashboards, but it does allow individual users to filter the dashboard however they see fit. One way to make this easier is to apply useful labels to specific plans. That would let people quickly “opt in” to their own filter to view one or more “collections” of plans.
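To make the “collections via labels” idea concrete, here is a small sketch of the selection logic. The plan keys and labels below are made up for illustration (they are not necessarily real OpenMRS Bamboo plans), and the dict shape is an assumption, not Bamboo’s actual REST response format:

```python
# Hypothetical sketch: given plan metadata (plan key, name, labels),
# select the plans a user opts into by label. All data here is invented
# for illustration.

def plans_for_labels(plans, wanted_labels):
    """Return plans tagged with at least one of the wanted labels."""
    wanted = set(wanted_labels)
    return [p for p in plans if wanted & set(p.get("labels", []))]

plans = [
    {"key": "CORE-BUILD", "name": "openmrs-core", "labels": ["build"]},
    {"key": "REFAPP-DISTRO", "name": "Reference Application", "labels": ["build", "release"]},
    {"key": "UI-RELEASE", "name": "UI release tests", "labels": ["release"]},
]

print([p["key"] for p in plans_for_labels(plans, ["release"])])
# → ['REFAPP-DISTRO', 'UI-RELEASE']
```

A “build” collection and a “release” collection would then just be two label filters over the same set of plans.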
What labels would people suggest, and which plans should go in those labels?
I think it’s important that the default view of CI be green lights – even if we need to fashion a front page ourselves. A developer’s first impression of CI should be all green and she shouldn’t have to opt-in to filtering out red lights (or even log into Bamboo). Thanks to entropy, red lights on a (default) dashboard are viral.
Can we make a default “collection” other than “all plans”?
Personally, I disagree. Accuracy is more valuable than false advertisement.
That said, it should be clear which failed builds are urgent, and which are not. Filters can help. And AFAIK (someone else may know better) it’s not possible to make a filter default, because they’re user-defined.
It’s OK if some tests run less often than others, but the more tests we run on CI, the merrier. And if they are red, they should show up as red – a red test is a call to fix it.
But on the other hand, some builds are more important than others (some builds you cannot wait days to fix; for others, you’ll be fine if they’re fixed by the end of the week). Usually we come up with some naming convention (for example, a project called ‘Less Important’, or plans named like 'A - '/'B - '/'C - '). Labels are a way of doing that, but I’ve found them less visible than a project or plan name.
I assume what Burke is fundamentally suggesting is that we should have a simple summary view of the status of CI builds, that covers only the “prioritized” ones, and we can publicize this as the view of CI that everyone should be looking at more often. Then some of our builds can be lower-priority (and perhaps less reliable?).
Given the distributed and volunteer effort of OpenMRS, we can’t force people to respond to build errors in the same way that a commercial team should (e.g., the expectation that a broken build means “drop everything, and someone fixes it ASAP”). So I could see value in having a smaller set of tests that we push as hard as we can on, and a larger set of tests that get less attention. This shouldn’t mean they are ignored, but it may be a necessary concession to the reality of our limited dedicated resources.
Personally, I think what would be most useful is having a summary widget that shows either “all tests passing” or else lists just the failing builds, that we can embed in other places in OpenMRS’s web presence, so that casual OpenMRS developers who are not going to have an open window to ci.openmrs.org can see when things are broken.
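The core of such a summary widget is a simple collapse of build results into a single message. A minimal sketch of that logic (the result format here is an assumption for illustration, not Bamboo’s actual API):

```python
# Sketch of the summary-widget logic: collapse a set of build results
# into either a single "all passing" message or a short list of just
# the failing builds. The input format is invented for illustration.

def summarize(results):
    """results: mapping of plan name -> state ("Successful"/"Failed")."""
    failing = sorted(name for name, state in results.items()
                     if state != "Successful")
    if not failing:
        return "All builds passing"
    return "Failing: " + ", ".join(failing)

print(summarize({"openmrs-core": "Successful", "UI Tests": "Failed"}))
# → Failing: UI Tests
```

The widget itself would just poll CI for current results and render this one line wherever it is embedded.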
What @darius said. I’m not suggesting we hide failing tests; rather, that we be careful not to replace an obvious “everything is okay” (or “everything is not okay”) message with a “if you look at this information long enough, you will be able to discern whether or not the system is functioning properly” message.
We had a CI page that was spotted red & green a few years ago. Nobody paid any attention to it and broken builds would languish because nobody could see the one speck on a dirty window that needed cleaning. We didn’t solve that problem by getting all tests to pass; rather, we solved it by prioritizing those tests that needed to stay green and limiting our CI page to those tests. Once we had a green page, the red lights stood out and got quicker attention. If we overload CI with more tests than the community can maintain in a timely manner, then we risk returning to a CI that gets ignored. I would love to see all tests green all the time, but we don’t have an infinite number of people sitting around waiting to fix broken tests. The alternatives are to limit our testing to only priority tests or, as we’ve discussed, split tests into build & release stages, ensuring that build tests are always green and ensuring release tests are green when preparing a new release.
All the tests should always be run on CI. If you do not want a test to be displayed as a failed one, you can simply create the appropriate ticket and disable the test with an explicit comment that you do not want to fix it now. However, information about the current status/reliability of the application should always be easily accessible.
I do not understand the arguments about failing tests being ignored. If someone breaks a test, they should fix it. If they cannot, the change should be reverted. This rule is really simple, and you do not need any resources to enforce it.
The QA team is putting a lot of effort into creating test cases and implementing automated tests, so it is a little discouraging that you are not convinced to fully embrace this chance to improve your processes and create better software.
Didn’t we remove several builds/modules from CI and focus our efforts on the reference application? I know we went from an ignored CI page to a usable one that was meant to be kept green and I know that we didn’t fix every test that was broken. My point is that we improved the situation by getting to a green page and keeping it green.
We all agree that we should keep all tests passing and I am not suggesting that we hide failing tests. We talked about dividing tests between build & release tests specifically to distinguish between priority & urgency of fixing these tests. If all tests should (and can) be fixed immediately, then there’s no reason to make this distinction.
My assumption (and I believe Darius shares it) is that we could easily end up with changes to the UI that break dozens of high-level or edge-case integration tests – not because a bug was introduced, but because all of those UI-specific tests need to be rewritten to match the new UI. Developers focused on building the app may not be interested or motivated to maintain a suite of high-level integration tests for the community if it isn’t a priority for the implementation paying their salary. To the extent we can make it easy for them to maintain those tests, so it’s not a big deal and everyone does it, then we’re all good and the distinction between build & release tests may be unnecessary. If our assumptions are correct, and a suite of high-level integration (“release”) tests that must change each time the UI changes is not being maintained promptly by our developers, then we need a process that allows those failing tests to be communicated and fixed via a workflow that doesn’t dilute the green glow of our prioritized (“build”) tests.
OK, so please correct me if I am wrong, but this means that we shouldn’t divide these tests unless there is a problem with failing tests that aren’t fixed over the long term, right?
I would suggest we keep adding new tests and running all of them on CI. If issues arise with long-term failing tests, then we can think about how to solve them. This problem doesn’t exist now, so we shouldn’t spend too much time on it.
Sounds good to me. In my opinion, if the QA team focuses as much or more on the infrastructure of testing (ensuring tests are easy to make & maintain) as testing itself, we can accomplish more.
QA as testers approach
Developers create unit tests. QA team creates and maintains a suite of integration tests.
The QA team wakes up asking “what new tests do we need?”
QA as infrastructure approach
Developers create & maintain unit tests and integration tests. The QA team focuses on making development and maintenance of tests easier over time (e.g., each month, writing/running/fixing unit tests and integration tests is easier and faster than it was the previous month). The QA team helps build & maintain tests, but their primary responsibility is maintaining & improving the testing infrastructure (testing workflows, time to run tests, clarity of failure messages, ease of identifying and fixing a failing test, documentation on how to test, etc.) so unit tests run quickly and anyone (QA or dev) can easily create & maintain integration tests.
The QA team wakes up asking “what can we do to our testing infrastructure this week that will make it easier for both us and developers to create, run, and maintain our tests?”
In the typical “QA as testers” approach, as the number of unit tests grows they take longer to run, and devs begin skipping tests because testing takes too long. If we’re lucky, they run tests before pushing changes. Integration testing grows and improves relative to the size & resources of the QA team.
In the “QA as infrastructure” approach, someone (i.e., the QA team) takes the time to ensure that, as the number of unit tests grows, they can still be run in a timely manner. Developers are less likely to skip testing because it doesn’t slow them down significantly. Integration tests grow and improve relative to the size & resources of the entire community (dev & QA members), since everyone feels ownership of them.
Returning to the topic of build vs. integration tests…
I’m fine with not having a distinction as long as we can maintain a culture of “all tests green all the time”, people know who should be addressing a failing test, and failing tests are addressed quickly. If we find developers are frustrated and/or failing tests are not getting fixed in a timely manner, because our test suite becomes fragile, then we may need to create separate tiers of tests (i.e., different workflows, like a suite of build tests that everyone knows should always pass and a suite of release tests that must pass before releasing a new version but for which a failed state doesn’t equate to “drop everything and fix this now”).
Hi, the distinction between build and integration tests is OK, but it should be understood to mean that the build tests are run by the developer on their own machine (so those tests shouldn’t take too long) and the integration tests are run on the CI instance (where all the tests should run after every commit).
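As an illustration of that split in a Maven build (a sketch only; OpenMRS’s actual build configuration may differ), the Surefire plugin runs fast unit tests on every local build by default, while the Failsafe plugin can run `*IT.java` integration tests under a profile that the CI server activates on every commit:

```xml
<!-- Sketch: unit tests run everywhere via Surefire (Maven's default),
     while integration tests (*IT.java, Failsafe's default pattern) run
     only when the "integration-tests" profile is activated, e.g. on CI:
     mvn verify -Pintegration-tests
     The profile id is illustrative. -->
<profiles>
  <profile>
    <id>integration-tests</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-failsafe-plugin</artifactId>
          <executions>
            <execution>
              <goals>
                <goal>integration-test</goal>
                <goal>verify</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With this shape, a developer’s ordinary `mvn test` stays fast, and only CI pays the cost of the full integration suite on every commit.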
I believe that the QA team currently plays both roles (testers and maintainers of infrastructure). We are preparing a lot of utilities that can be reused to quickly and easily create new tests, so all developers should be able to add their own automated test cases without too much effort. However, we are currently playing the main role in implementing new tests, because there aren’t many volunteers for QA.