Start of automating Reference App tests

As @raff already mentioned, we started to automate our test cases. Currently we are planning to automate the following:

And sorry for the build errors, we’ll do our best to avoid them in the future.


@tmueller, thanks for sharing this plan. I think this topic is definitely worth some discussion.

On the one hand, I think it’s great that this QA effort is increasing our test coverage. On the other hand, I worry that these functional tests are a double-edged sword, and we should be careful about expanding them too far. Specifically, scripted browser tests are (1) slow, (2) brittle, and (3) outside of the standard dev process.

  1. Slow is a problem because the full CI pipeline is already long, and adding lots more of these tests makes it even longer, especially if we start testing all sorts of corner cases. Unit tests are a better fit for those.
  2. Brittle is a problem because it’s easy for small UI refactorings to break tests and cause false alarms.
  3. Outside the standard dev process is a problem because typically a dev will make a change to some module (e.g. registrationapp), but the functional test for it lives somewhere completely different (openmrs-distro-referenceapplication). This is bad for dev workflow.

So, I would generally prefer if we can get our test coverage from (in preference order): (1) “real” unit tests, (2) DB-backed tests, (3) functional tests. (However I know that the other kinds of tests don’t align as nicely with the manual test plans, and they require a different skillset to write.)

Since we’re starting from such a low level of UI test coverage, most of my comments aren’t really relevant yet, but I want to make sure we don’t go down a path of writing UI tests for everything, thus giving ourselves a brittle and slow build pipeline.

Perhaps we can divide the functional tests into a basic suite (basically “smoke tests”) and a more comprehensive suite of longer-running tests that don’t block the CI pipeline and maybe aren’t run on every commit.
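
To make the split concrete, here’s a minimal sketch of how the two suites could be kept apart with JUnit 4 categories. The marker interfaces and test class names below are purely illustrative; nothing like this exists in our codebase yet:

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;

// SmokeTest.java and ReleaseTest.java -- hypothetical marker interfaces, one per suite
public interface SmokeTest {}
public interface ReleaseTest {}

// RegisterPatientSmokeIT.java -- runs on every commit: one quick happy-path check
@Category(SmokeTest.class)
public class RegisterPatientSmokeIT {

    @Test
    public void registerPatientHappyPath() {
        // drive the UI through the shortest successful path only
    }
}

// RegisterPatientValidationIT.java -- runs only in the extended suite
@Category(ReleaseTest.class)
public class RegisterPatientValidationIT {

    @Test
    public void rejectsInvalidBirthdate() {
        // corner cases that would make the main pipeline slow
    }
}
```

CI could then run only the SmokeTest category on each commit (e.g. via Surefire’s groups setting) and everything else on a less frequent plan.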

Now, looking at the specific tests you’ve mentioned:

RA-700 - Capture Vitals Test

Definitely we want to test this. I suggest just a single test of the happy path.
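
To illustrate what I mean by “just the happy path”, something as small as this would do. It’s only a sketch with plain WebDriver; the URL, element ids, and success message are placeholders, not the real Reference App markup:

```java
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

import static org.junit.Assert.assertTrue;

public class CaptureVitalsHappyPathIT {

    private WebDriver driver;

    @Before
    public void setUp() {
        driver = new FirefoxDriver();
        driver.get("http://localhost:8080/openmrs"); // placeholder URL
        // log in and open a patient's Capture Vitals form (details omitted)
    }

    @Test
    public void savesVitalsWithTypicalValues() {
        driver.findElement(By.id("respiratoryRate")).sendKeys("16"); // placeholder id
        driver.findElement(By.id("saveVitals")).click();             // placeholder id
        assertTrue(driver.getPageSource().contains("Vitals saved")); // placeholder text
    }

    @After
    public void tearDown() {
        driver.quit();
    }
}
```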

RA-703 - Contact Info Test

I would prefer a Register Patient test that verifies the patient was created with the right contact info, and a separate Edit Contact Info test.

RA-702 - Merged patient record finding by both names Test

I don’t personally think we need a UI-level test for this. It seems more appropriate to cover with JUnit tests in the main codebase.
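
Roughly what I have in mind is something like the following. (I’m writing the service calls from memory, so treat the exact method names and the test data as assumptions, not working code.)

```java
import java.util.List;

import org.junit.Test;
import org.openmrs.Patient;
import org.openmrs.api.PatientService;
import org.openmrs.api.context.Context;
import org.openmrs.test.BaseModuleContextSensitiveTest;

import static org.junit.Assert.assertTrue;

public class MergedPatientSearchTest extends BaseModuleContextSensitiveTest {

    @Test
    public void mergedPatientShouldBeFoundByEitherName() throws Exception {
        PatientService ps = Context.getPatientService();

        // assumes two patients with different family names exist in the test dataset
        Patient preferred = ps.getPatient(2);
        Patient notPreferred = ps.getPatient(6);
        String survivingName = preferred.getFamilyName();
        String mergedAwayName = notPreferred.getFamilyName();

        ps.mergePatients(preferred, notPreferred);

        // after the merge, searching by either name should return the surviving record
        List<Patient> foundBySurvivingName = ps.getPatients(survivingName);
        List<Patient> foundByMergedName = ps.getPatients(mergedAwayName);

        assertTrue(foundBySurvivingName.contains(preferred));
        assertTrue(foundByMergedName.contains(preferred));
    }
}
```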

RA-686 - Find recently Patient Record Test

Ok.

RA-718 - Active Visits Edit Patient Test

I don’t like the name of this test. I agree we should have an Edit Demographics test, but it has nothing to do with active visits.

RA-717 - End Visit Test

Sounds good

RA-720 - Visit Note Test

Sounds good

RA-700 - Capture Vitals Test was added at

And it fails due to:


This is a great example of the philosophical/practical point we should discuss.

I don’t think that we should be attempting to test every single corner case by automated browser testing.

This test is currently failing because you’re trying to set the respiratory rate to 99999999, but the underlying concept in the CIEL dictionary does not have an absolute upper limit. This is the exact kind of thing that will make our tests slow and brittle, per my prior comment.

I would suggest that the automated browser test should do one single range check, on one of the fields, to make sure the general feature works in the UI layer. And if we want to thoroughly test the data dictionary, we should do so in JUnit tests in the referencemetadata module, where we currently test just a tiny bit of sentinel metadata, but could do much more.
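
For example, a metadata-level check in that module could assert the ranges directly, along these lines (the UUID is a placeholder, and the lookup and test base class are assumptions based on the core API, not existing code):

```java
import org.junit.Test;
import org.openmrs.Concept;
import org.openmrs.ConceptNumeric;
import org.openmrs.api.ConceptService;
import org.openmrs.api.context.Context;
import org.openmrs.test.BaseModuleContextSensitiveTest;

import static org.junit.Assert.assertNotNull;

public class VitalsConceptRangeTest extends BaseModuleContextSensitiveTest {

    // placeholder -- the real CIEL respiratory rate concept UUID would go here
    private static final String RESPIRATORY_RATE_UUID = "...";

    @Test
    public void respiratoryRateShouldHaveAbsoluteRange() {
        ConceptService cs = Context.getConceptService();

        Concept concept = cs.getConceptByUuid(RESPIRATORY_RATE_UUID);
        assertNotNull("respiratory rate concept is not installed", concept);

        // vitals concepts are numeric, so fetch the numeric variant to read its ranges
        ConceptNumeric respiratoryRate = cs.getConceptNumeric(concept.getConceptId());
        assertNotNull("missing absolute lower bound", respiratoryRate.getLowAbsolute());
        assertNotNull("missing absolute upper bound", respiratoryRate.getHiAbsolute());
    }
}
```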

The other thing: this specific issue (putting an absolute upper bound on respiratory rate) cannot be fixed directly by any developer, because it requires a fix in the CIEL dictionary by @akanter. Please create a ticket in the (new) Terminology and Metadata project in JIRA for this.

I need to understand when something requires an absolute criterion versus just a critical one. In this case RR had a critical limit. I added absolute as well, but knowing when that is required (other than a minimum of 0 to prevent negative numbers) would be helpful.

I removed validation from the Capture Vitals Test, so it should pass right now.

It is true that overly complicated automated browser tests running on every build may slow down the development process.

But, @darius, what do you think of the idea of having another set of automated browser tests, run on another server once in a while, that would cover more advanced test cases so that they wouldn’t need to be tested manually?

@tmueller,

I think it makes a lot of sense to have two different suites of automated browser tests: a minimal one (“smoke tests”, always run) and an extended one (“release tests”, run less often).

We might also consider having different guidelines/ownership of the suites?

For example, if someone breaks a test in the minimal suite, the original dev is responsible for seeing this and fixing it, whereas maybe the QA team owns the extended test suite? And perhaps it’s more okay to @Ignore something in the extended suite, whereas breaks in the smoke tests need to be fixed ASAP?

Rather than separating these as Developers' Tests and QAs' Tests (creating the “old school” inefficiencies/challenges of silos by role), how about approaching these as Build Tests and Release Tests? Failing build tests need to be addressed immediately. Failing release tests need to be addressed prior to releasing. Both Devs and QAs share ownership of all testing.

QAs can help by

  • creating and maintaining automated release tests
  • evolving automated release testing so each month it’s less brittle than the previous month (simplify scripting, refine conventions, etc.)
  • helping ensure build testing is always getting better over time (more automated, easier to run, etc.)

Devs will focus primarily on build testing in day-to-day development, but will utilize release testing when preparing for a release. The better we maintain automated release testing, the closer we’ll get to the continuous-delivery nirvana of one-click releases.


So, do you think that:

could be in the release test suite, then?

And how would you classify the next test cases we are planning for automating:

(You wanted this in Build Test Suite, right?)

I think it’s going to be less a decision of whether or not a specific test should be a smoke test vs. a release test; rather, it will be deciding which tests should be prioritized to run as smoke tests. To @darius’ point, I would suggest conventions like these to guide whether a test is classified as a smoke test or a release test:

  • Running all smoke tests should never exceed 5 minutes.* If smoke tests reach this threshold, then existing tests must be made faster or one or more removed before another can be added.
  • Smoke tests should focus on basic functionality (not edge cases or advanced features).
  • Smoke tests should not be brittle (a smoke test that issues a false alarm – i.e., the problem is with the test and not the committed code – more than once should be removed or refactored).

Setting a maximum time allotted for smoke tests will help guide decisions. If existing tests are already taking the full time allotted, then the only way to add smoke tests will be to switch lower priority tests to release tests or refactor tests to make them run faster.

In the near term, you can prioritize any tests of basic functionality (e.g., find a patient, register a patient, start a visit, etc.) as a smoke test as long as it’s not brittle and the smoke tests can complete within the allotted time. Over time, we may find that it’s not just basic functionality that warrants smoke testing, but also specific features that are brittle (i.e., repeatedly broken by devs).

*I’ve arbitrarily chosen 5 minutes. A shorter threshold would be welcome, but I don’t think it should be much longer than this. The important point is this threshold should not be a moving target; rather, it should be explicitly defined and well known within the community. In fact, we should consider maintaining a final smoke test that fails if the time elapsed since starting smoke tests has exceeded the threshold. This will help manage expectations and make decisions more straightforward.
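
A rough sketch of that final guard test (the class names are made up, and the real smoke tests would be listed ahead of it; the point is just that the suite records when it starts and the last test enforces the budget):

```java
import java.util.concurrent.TimeUnit;

import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

import static org.junit.Assert.assertTrue;

// SmokeTestSuite.java -- the actual smoke test classes would be listed before the budget check
@RunWith(Suite.class)
@Suite.SuiteClasses({
        // LoginSmokeTest.class, RegisterPatientSmokeTest.class, ... (hypothetical)
        SmokeTestTimeBudgetTest.class // must stay last
})
public class SmokeTestSuite {

    static long startNanos;

    @BeforeClass
    public static void recordStart() {
        startNanos = System.nanoTime();
    }
}

// SmokeTestTimeBudgetTest.java -- fails if the whole suite took longer than the agreed budget
public class SmokeTestTimeBudgetTest {

    private static final long BUDGET_NANOS = TimeUnit.MINUTES.toNanos(5);

    @Test
    public void smokeSuiteStaysWithinFiveMinutes() {
        long elapsed = System.nanoTime() - SmokeTestSuite.startNanos;
        assertTrue("Smoke tests exceeded the 5-minute budget", elapsed < BUDGET_NANOS);
    }
}
```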

I applied a change to the plan configuration so that only Build Tests run on each build. Currently, we have marked 5 tests as Build Tests:
