1.11.4 upgrade fails due to person change to datetime

Continuing the discussion from Changing Person birthdate to datetime:

Thanks @sunbiz for catching the upgrade to 1.11.4 problem with this change!


So this is a learning opportunity for us.

I don’t want to problem-solve in this case (as I think that’s up to folks like you, Daniel), but it seems like the fundamental issue here is that we released a new feature in a maintenance release. At least this is what the paper trail seems to suggest.

I believe our convention is that we don’t do that.

https://wiki.openmrs.org/display/docs/Conventions

Seemingly because, among other reasons, the testing rigor of a major or minor release is very different from that of a maintenance (bugfix) release.

If you look at the paper trail, all had good intentions… an attempt to be responsive to the Bahmni team, @darius trying to simplify things.

Also, technically this is a new feature, and not a bugfix, so it wouldn’t be backported to the release branches.

However, given that this will be very small, and should not affect anything else (i.e. it’s a single, optional field, which doesn’t affect validation), I don’t see a problem backporting it.

Well, in this case, according to @michael, the “small” feature had a database schema modification. That update is what bunged up the upgrade process.

So, we should probably revisit and remind people that subjectively measuring “small vs. large” might not be as effective an approach as simply saying “feature enhancements should never be applied to maintenance releases unless there are extraordinary circumstances surrounding them”.

Just my $0.02


Another potential learning point: was this caught by CI or did @sunbiz discover it himself?

It’s tough. I’m in favor of being very careful about what we add to maintenance releases, and DB schema changes seem like a particular red flag. But, considering the infrequency of OpenMRS releases, there needs to be some flexibility. Would allowing this change have been worth it if the other option was the Bahmni team creating their own core branch and no longer contributing to core? I’m not saying that would happen here, but, as an example, I haven’t done any development against core in years besides bug fixes and other things that can be backported, because the release time frame makes it prohibitive versus just releasing something in a custom module. Now maybe that’s okay, but we should be aware of it.

Mark


Also, I haven’t followed too deeply what happened in this case, but it sounds like the problem is that the new feature was not implemented correctly, versus it being a new feature that inadvertently/unknowingly caused a backwards-incompatibility problem (a much bigger problem, imo).

Mark

@sunbiz discovered it himself!

If CI (or any other automated test) were able to detect bugs in upgrades (involving DB metadata changes) I would be impressed.


I would hope & expect CI could detect if the application cannot be deployed (like the bug @sunbiz discovered in TRUNK-4773). When the change was applied to master, I would expect CI’s attempt to build & deploy to fail with that commit.


CI definitely would have caught this, which is another reason why we really need to set up the CI plans sooner rather than later for testing installation and upgrade scripts on MySQL and MariaDB.

This is a very illustrative example of something.

As Paul says, if we truly never backported new features, this bug would not have happened in 1.11.4. As Burke and Mark point out, this was not an actual incompatibility, but it could/should have been caught in testing.

My personal opinion is that we should remain flexible about backporting small new features, and improve our automated testing processes. This is especially relevant for features that cannot easily be implemented in a module (e.g. an addition to a core table, like this one).

Let’s look at the counterfactual in this example. In real life, it was implemented and backported in June. If instead we had been sticklers and not backported it, then the Bahmni team would still not have this feature, and indeed it would not be scheduled for release until Platform 2.0, scheduled for December. And to get this they would have to also simultaneously take the upgrades to Spring and Hibernate, removal of all deprecated methods, and removal of the legacy UI. And I presume they wouldn’t do this until most of the reference application modules had been upgraded to handle this, e.g. not until March. [I will fork this topic to talk about forks.]

Some of that complexity is because Platform 2.0 is going to be a big upgrade. But even assuming they would get to use the feature in December/January rather than March, adding 6-7 months to the timeline of getting a small optional column added is the kind of thing that leads Mark to say:

I’ll add that I myself basically stopped committing to openmrs-core at the end of 2012, because no feature I put there would ever be ready in time for me to use it in an implementation. From GitHub:

I’m afraid of more devs who are attached to implementations ending up thinking the same way that Mark and I did.

That said, I can see the argument that if OpenMRS truly behaves like a platform, we must be iron-fisted about never backporting data model changes. Does someone want to argue that point? I’d honestly like to know where people’s opinions fall regarding how OpenMRS should balance this tradeoff.


While thinking about backporting changes, improving releases, better testing, etc., we should not lose sight of the fact that we need to withdraw this release: delete the tag from git and any other places (the maven repo, maybe?) where this has percolated, to limit the damage. And think about how to fix this issue. I think @wyclif's suggestion of another liquibase file that will be run on startup irrespective of login is a fine approach, one that I have seen many projects take. That or another solution needs to be implemented soon, since this is a blocker and will prevent the refapp 2.3 release. Two days with a known bad release should not become a week!!
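To make that idea concrete, a standalone changelog along these lines could be guarded with a precondition so it is safe to run at every startup, whether or not anyone has logged in. The changeset id, table, and column below are placeholders for illustration, not the actual changeset from this release:

```xml
<!-- Hypothetical standalone changelog, run unconditionally at startup -->
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog">

    <changeSet id="example-add-person-birthtime" author="example">
        <!-- Skip (and mark as ran) if the column already exists,
             so the changeset stays idempotent across restarts. -->
        <preConditions onFail="MARK_RAN">
            <not>
                <columnExists tableName="person" columnName="birthtime"/>
            </not>
        </preConditions>
        <comment>Add an optional datetime column to person (illustrative only)</comment>
        <addColumn tableName="person">
            <column name="birthtime" type="datetime"/>
        </addColumn>
    </changeSet>

</databaseChangeLog>
```

The precondition is what makes it safe to wire into startup: on a database that already has the change, liquibase just marks the changeset as ran and moves on.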


FWIW, in the past when we had broken releases, we never really removed them; we just posted a newer one and added appropriate documentation.

I think we should focus on fixing the issue for now. Does anyone have any more ideas on how to fix this?

Offhand, if I’m understanding the problem correctly, two solutions could be:

  1. What Saptarshi said (i.e. some liquibase changesets run at startup, before the Spring context is loaded)
  2. Don’t use the regular API to authenticate the user who is logging in to run an upgrade (to avoid the Hibernate mappings), but write this as direct SQL queries instead (see the sketch after this list).
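For the second option, here is a minimal sketch of what authenticating with plain JDBC (no Hibernate mappings) might look like. The users/password/salt/retired names and the hex-encoded SHA-512(password + salt) scheme are assumptions that would have to be checked against the actual schema and hashing logic of the target version; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/**
 * Sketch: authenticate the user who is running the database update with plain
 * JDBC, so nothing touches the Hibernate mappings (which expect the new schema).
 * Table/column names and the hashing scheme are assumptions, not the real code.
 */
public class UpdateAuthSketch {

    public static boolean authenticate(String jdbcUrl, String dbUser, String dbPassword,
                                       String username, String password) throws Exception {
        String sql = "SELECT password, salt FROM users WHERE username = ? AND retired = 0";
        try (Connection conn = DriverManager.getConnection(jdbcUrl, dbUser, dbPassword);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, username);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return false; // no such (non-retired) user
                }
                String storedHash = rs.getString("password");
                String salt = rs.getString("salt");
                // Assumed scheme: hex-encoded SHA-512 of password + salt.
                return storedHash != null && storedHash.equalsIgnoreCase(hash(password + salt));
            }
        }
    }

    private static String hash(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-512")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The point is just that the whole check runs against the old schema with a single query, so it can work before any liquibase changes have been applied.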

To add to your second solution, Darius: I wonder if the same might apply to the daemon user if there are database changes to be run; it seems to me we would need to avoid the API there too.

So, we’re starting up the OpenMRS API and using it to run liquibase changes? Given that the API is built to run against the updated data model, this sounds like a problem. How hard would it be to run updates before firing up the API?

After looking into this a bit, I think this is the expected behavior, for a couple of reasons:

  1. Before closing the ticket, I had tested on the master branch and the setup wizard appeared as expected.
  2. Looking at the stack trace on the ticket, the failure happens in WebDaemon.startOpenmrs (WebDaemon.java:64), which is after Listener.setupNeeded() returns false.
  3. I have confirmed this by doing an upgrade from 1.11.0 to 1.11.4; the changesets ran to completion and the upgrade was successful.

So I suspect @sunbiz may have “auto_update_database” set to true in his runtime properties file, which we are aware of as per:

If the above is not so, then I need a deeper dive into this! :smile:
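For anyone checking their own install, this is the runtime property in question. A minimal sketch of how it might look in the OpenMRS runtime properties file (behavior described as I understand it; the exact file location varies by install):

```properties
# OpenMRS runtime properties (sketch)
# true  = liquibase changesets are applied automatically at startup (the suspected case above)
# false = a superuser has to log in to the update wizard before pending changesets run
auto_update_database=false
```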


To be clear, you are saying that upgrading to 1.11.4 actually does work? (At least it usually does, as long as you do not have auto_update_database=true.)

Yes, that is exactly what I am saying!