I think it’s fine and good to run release tests frequently (i.e., not just when preparing a release), but we need to be careful how this is communicated – i.e., we need to maintain a primary dashboard (the first thing folks see at CI) with green lights. We cannot afford any red amongst the green on our main dashboard.
Is it possible to set up a secondary “release” dashboard and/or change the default display at https://ci.openmrs.org/ to be a build dashboard that excludes the release tests?
Bamboo doesn’t offer multiple dashboards, but it does allow individual users to filter the dashboard however they see fit. One way to make this easier is to apply useful labels to specific plans. That would let people quickly “opt in” to their own filter to view one or more “collections” of plans.
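To make the “collections via labels” idea concrete, here is a small sketch of the selection logic. The plan keys and labels below are made up for illustration (they are not necessarily real OpenMRS Bamboo plans), and the dict shape is an assumption, not Bamboo’s actual REST response format:

```python
# Hypothetical sketch: given plan metadata (plan key, name, labels),
# select the plans a user opts into by label. All data here is invented
# for illustration.

def plans_for_labels(plans, wanted_labels):
    """Return plans tagged with at least one of the wanted labels."""
    wanted = set(wanted_labels)
    return [p for p in plans if wanted & set(p.get("labels", []))]

plans = [
    {"key": "CORE-BUILD", "name": "openmrs-core", "labels": ["build"]},
    {"key": "REFAPP-DISTRO", "name": "Reference Application", "labels": ["build", "release"]},
    {"key": "UI-RELEASE", "name": "UI release tests", "labels": ["release"]},
]

print([p["key"] for p in plans_for_labels(plans, ["release"])])
# → ['REFAPP-DISTRO', 'UI-RELEASE']
```

A “build” collection and a “release” collection would then just be two label filters over the same set of plans.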
What labels would people suggest, and which plans should go in those labels?
I think it’s important that the default view of CI be green lights – even if we need to fashion a front page ourselves. A developer’s first impression of CI should be all green and she shouldn’t have to opt-in to filtering out red lights (or even log into Bamboo). Thanks to entropy, red lights on a (default) dashboard are viral.
Can we make a default “collection” other than “all plans”?
Personally, I disagree. Accuracy is more valuable than false advertisement.
That said, it should be clear which failed builds are urgent, and which are not. Filters can help. And AFAIK (someone else may know better) it’s not possible to make a filter default, because they’re user-defined.
It’s OK if some tests run less often than others, but the more tests we run on CI, the merrier. And if they are red, they should show up as red – a red test is a call to fix it.
But on the other hand, some builds are more important than others (some builds you cannot wait days to fix; for others, you’ll be fine if they’re fixed by the end of the week). Usually we come up with some naming convention (for example, a project called ‘Less Important’, or plans named like 'A - '/'B - '/'C - '). Labels are a way of doing that, but I’ve found them less visible than a project or plan name.
I assume what Burke is fundamentally suggesting is that we should have a simple summary view of the status of CI builds, that covers only the “prioritized” ones, and we can publicize this as the view of CI that everyone should be looking at more often. Then some of our builds can be lower-priority (and perhaps less reliable?).
Given the distributed and volunteer effort of OpenMRS, we can’t force people to respond to build errors in the same way that a commercial team should (e.g., the expectation that a broken build means “drop everything, and someone fixes it ASAP”). So I could see value in having a smaller set of tests that we push as hard as we can on, and a larger set of tests that get less attention. This shouldn’t mean they are ignored, but it may be a necessary concession to the reality of our limited dedicated resources.
Personally, I think what would be most useful is having a summary widget that shows either “all tests passing” or else lists just the failing builds, that we can embed in other places in OpenMRS’s web presence, so that casual OpenMRS developers who are not going to have an open window to ci.openmrs.org can see when things are broken.
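The core of such a summary widget is a simple collapse of build results into a single message. A minimal sketch of that logic (the result format here is an assumption for illustration, not Bamboo’s actual API):

```python
# Sketch of the summary-widget logic: collapse a set of build results
# into either a single "all passing" message or a short list of just
# the failing builds. The input format is invented for illustration.

def summarize(results):
    """results: mapping of plan name -> state ("Successful"/"Failed")."""
    failing = sorted(name for name, state in results.items()
                     if state != "Successful")
    if not failing:
        return "All builds passing"
    return "Failing: " + ", ".join(failing)

print(summarize({"openmrs-core": "Successful", "UI Tests": "Failed"}))
# → Failing: UI Tests
```

The widget itself would just poll CI for current results and render this one line wherever it is embedded.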
What @darius said. I’m not suggesting we hide failing tests; rather, that we be careful not to replace an obvious “everything is okay” (or “everything is not okay”) message with a “if you look at this information long enough, you will be able to discern whether or not the system is functioning properly” message.
We had a CI page that was spotted red & green a few years ago. Nobody paid any attention to it and broken builds would languish because nobody could see the one speck on a dirty window that needed cleaning. We didn’t solve that problem by getting all tests to pass; rather, we solved it by prioritizing those tests that needed to stay green and limiting our CI page to those tests. Once we had a green page, the red lights stood out and got quicker attention. If we overload CI with more tests than the community can maintain in a timely manner, then we risk returning to a CI that gets ignored. I would love to see all tests green all the time, but we don’t have an infinite number of people sitting around waiting to fix broken tests. The alternatives are to limit our testing to only priority tests or, as we’ve discussed, split tests into build & release stages, ensuring that build tests are always green and ensuring release tests are green when preparing a new release.
All the tests should always be run on CI. If you do not want a test to be displayed as a failed one, you can simply create the appropriate ticket and disable the test with an explicit comment that you do not want to fix it now. However, information about the current status/reliability of the application should always be easily accessible.
I do not understand the arguments about failing tests being ignored. If someone breaks a test, they should fix it. If they cannot, the change should be reverted. This rule is really simple, and you do not need any resources to enforce it.
The QA team is putting a lot of effort into creating test cases and implementing automated tests, so it is a little discouraging that you are not convinced to fully embrace this chance to improve your processes and create better software.
Didn’t we remove several builds/modules from CI and focus our efforts on the reference application? I know we went from an ignored CI page to a usable one that was meant to be kept green and I know that we didn’t fix every test that was broken. My point is that we improved the situation by getting to a green page and keeping it green.
We all agree that we should keep all tests passing and I am not suggesting that we hide failing tests. We talked about dividing tests between build & release tests specifically to distinguish between priority & urgency of fixing these tests. If all tests should (and can) be fixed immediately, then there’s no reason to make this distinction.
My assumption (and I believe Darius shares it) is that we could easily end up with changes to the UI that break dozens of high-level or edge-case integration tests – not because a bug was introduced, but because all of those UI-specific tests need to be rewritten to match the new UI. Developers focused on building the app may not be interested or motivated to maintain a suite of high-level integration tests for the community if it isn’t a priority for the implementation paying their salary. To the extent we can make it easy for them to maintain those tests, so it’s not a big deal and everyone does it, then we’re all good and the distinction between build & release tests may be unnecessary. If our assumptions are correct, and a suite of high-level integration (“release”) tests that must change each time the UI changes is not being maintained promptly by our developers, then we need a process that allows those failing tests to be communicated and fixed via a workflow that doesn’t dilute the green glow of our prioritized (“build”) tests.
OK, so please correct me if I am wrong, but this means that we shouldn’t divide these tests unless there is a problem with failing tests that aren’t fixed over the long term, right?
I would suggest we keep adding new tests and running all of them on CI. If issues arise with long-term failing tests, then we can think about how to solve them. This problem doesn’t exist now, so we shouldn’t spend too much time on it.
Sounds good to me. In my opinion, if the QA team focuses as much or more on the infrastructure of testing (ensuring tests are easy to make & maintain) as testing itself, we can accomplish more.
QA as testers approach
Developers create unit tests. QA team creates and maintains a suite of integration tests.
The QA team wakes up asking “what new tests do we need?”
QA as infrastructure approach
Developers create & maintain unit tests and integration tests. The QA team focuses on making development and maintenance of tests easier over time (e.g., each month, writing/running/fixing unit tests and integration tests is easier and faster than it was the previous month). The QA team helps build & maintain tests, but their primary responsibility is maintaining & improving the testing infrastructure (testing workflows, time to run tests, clarity of failure messages, ease of identifying and fixing a failing test, documentation on how to test, etc.) so unit tests run quickly and anyone (QA or dev) can easily create & maintain integration tests.
The QA team wakes up asking “what can we do to our testing infrastructure this week that will make it easier for both us and developers to create, run, and maintain our tests?”
In the typical “QA as testers” approach, as the number of unit tests grows they take longer to run, and devs begin skipping tests because testing takes too long. If we’re lucky, they run tests before pushing changes. Integration testing grows and improves relative to the size & resources of the QA team.
In the “QA as infrastructure” approach, someone (i.e., the QA team) takes the time to ensure that, as the number of unit tests grows, they can still be run in a timely manner. Developers are less likely to skip testing because it doesn’t slow them down significantly. Integration tests grow and improve relative to the size & resources of the entire community (dev & QA members), since everyone feels ownership of them.
Returning to the topic of build vs. integration tests…
I’m fine with not having a distinction as long as we can maintain a culture of “all tests green all the time”, people know who should be addressing a failing test, and failing tests are addressed quickly. If we find developers are frustrated and/or failing tests are not getting fixed in a timely manner, because our test suite becomes fragile, then we may need to create separate tiers of tests (i.e., different workflows, like a suite of build tests that everyone knows should always pass and a suite of release tests that must pass before releasing a new version but for which a failed state doesn’t equate to “drop everything and fix this now”).
Hi, the distinction between build and integration tests is OK, but it should be understood to mean that the build tests are run by the developer on their own machine (so those tests shouldn’t take too long) and the integration tests are run on the CI instance (where all the tests should run after every commit).
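As an illustration of that split in a Maven build (a sketch only; OpenMRS’s actual build configuration may differ), the Surefire plugin runs fast unit tests on every local build by default, while the Failsafe plugin can run `*IT.java` integration tests under a profile that the CI server activates on every commit:

```xml
<!-- Sketch: unit tests run everywhere via Surefire (Maven's default),
     while integration tests (*IT.java, Failsafe's default pattern) run
     only when the "integration-tests" profile is activated, e.g. on CI:
     mvn verify -Pintegration-tests
     The profile id is illustrative. -->
<profiles>
  <profile>
    <id>integration-tests</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-failsafe-plugin</artifactId>
          <executions>
            <execution>
              <goals>
                <goal>integration-test</goal>
                <goal>verify</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With this shape, a developer’s ordinary `mvn test` stays fast, and only CI pays the cost of the full integration suite on every commit.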
I believe that the QA team currently plays both roles (testers and maintainers of infrastructure). We are preparing a lot of utilities that can be reused to quickly and easily create new tests, so all developers should be able to add their own automated test cases without too much effort. However, we are currently playing the main role in implementing new tests, because there aren’t many volunteers for QA.