Change condition onset date from datetime to date

wyclif · May 1, 2024, 11:52pm

We are facing some undesired behavior in Superset when displaying onset dates for conditions, in the O3 UI the user enters just a date without a time component, being a datetime field on the backend, the time component is added which defaults to midnight. If the servers are running on different timezones e.g. UTC which usually is the case, the translated datetime value can fall on a different date. Even though the date is technically correct as saved in the database, it can be misleading in the Superset UI for end users when the converted value rolls over to a different date.

Given that onset date is usually an estimate, can we treat it like birthdate and change its datatype to just date in the database?

mksd · May 2, 2024, 4:54pm

Agreed, that’s basically the bottom line. Birthdates, onset dates are indeed generally just dates without time component and should be, IMHO, treated as such.

@mseaton @burke, thoughts?

ibacher · May 2, 2024, 6:25pm

The platform team did come up with a high-level data model proposal for supporting dates that weren’t just timestamps, but, to the best of my knowledge, that hasn’t been implemented anywhere. Conditions may be an ideal case to experiment with implementing the partial date support mentioned there.

mseaton · May 2, 2024, 7:22pm

Well, I wouldn’t necessarily use birthdate as the comparable, since we do now actually have a birthtime in the system, and birth dates are just dates for most use cases, but we did decide that we actually needed the time component in other cases.

But generally, as long as the motivation for the change is to make the data model more correct, and that conceptually a condition onset date should always be something that only needs to be at the specificity of “date” and not “datetime”, then I’d be supportive of it. It can’t just be to work around timezone issues in SuperSet.

As @ibacher says, for conditions, having an onset date that is truly flexible enough to be estimated to the level it is known is probably the right approach, as I’d imagine answers like “since I was a teenager” or “since last April” are probably quite common for conditions, and specific dates - let alone times - are not often known. So if we are going to make a data model change, I’d agree that doing it as a means to move towards partial date support would be preferable.

akanter · May 2, 2024, 7:33pm

I like the idea of generally dealing with the issue so that we could support times when needed. I think OHDSI had a big discussion of adding datetime where there previously was dates and there were lots of issues about date calculations. However, there are use cases when a condition needs to be recorded with the actual time. It is more likely that there is lack of specificity for onset, but there are some cases where the model should support time. There are lots of Quality Measures that track times, but many of those are tracking times of procedures, not diagnosis, but I still think it would be better to not exclude time if we can help it.

burke · May 2, 2024, 8:39pm

I came up with a proposal on how to handle dates & datetimes here:

I would think conditions would best be approached as “date with partial date support” – i.e., we don’t track condition onset to the time, but it wouldn’t be uncommon to say the onset was in a given year (even if a guess) without knowing the specific date. In this case, I’d propose the approach:

dateValue (string) in FHIR format to explicitly support partial data (to year, to month, to date)
date (date) stores “effective” date, interpreting partial data into suitable proxy (e.g., “2023” → 2023-01-01)

mksd · May 3, 2024, 3:35pm

Agreed.

@burke in that case, we would migrate to something like this right?

public class Condition ... {
    
    ...

    @Column(name = "onset_date")
    private Date onsetDate;

    @Column(name = "onset_date_value")
    private String onsetDateValue;

    ...
}

And for the table definition:

<column name="onset_date" type="date" />
<column name="onset_date_value" type="string" />

Would this group accept for this increment to be considered a patch to Core?

wyclif · May 3, 2024, 3:50pm

Does it mean if one wishes to capture onset date with time, then the value is stored in onset_date_value column.

burke · May 5, 2024, 8:25pm

Correct, with JavaDoc comments to guide usage. The String value, which allows for partial date is considered the source of truth. Setting onsetDate(Date) would update dateValue to String equivalent. Setting onsetDateValue(String) would not only set onsetDateValue, but also set onsetDate (using proxy values for month or day if not included). If there is ever a dispute between values, the String is considered truth (i.e., the Date is for convenience of embedding proxy value rules inside the object and for backwards compatibility).

I propose the date with partial date support, because I don’t believe there is a real world use case for needing time for conditions If we need time support, then I’d suggest we use the approach to support partial date and time (with timestamp datatype). But, while partial date for condition onset (e.g., only knowing year) is a very common use case, I haven’t heard of any real world use case of needing to know the onset of a condition to the hour (e.g., “At what exact minute did your asthma start in childhood?”). If all we can come up is theoretical use cases, I’d favor sticking with date. But wait, we don’t need to decide this ourselves. FHIR’s Condition.onset allows for time. So, ignore me and use the approach for partial date & time (with String and something like ZonedTime in Java and timestamp in the database to ensure we explicitly support timezone with times).

mksd · May 6, 2024, 7:04am

Thanks @burke, that’s definitely clear enough to create a ticket (if none exists yet).

@dkayiwa would this one change to Condition be accepted as a patch for 2.6.5?

dkayiwa · May 6, 2024, 10:07am

We have generally avoided back porting data model changes. But have relaxed this rule for cases where implementers find themselves with the reality of a platform upgrade being impossible within the current circumstances.

wyclif · May 16, 2024, 9:38am

To make the migration easier, can we just change conditions.onset_date to date datatype in version 2.6.5? And then in master that’s where we will do the rest of the work to add onset_date_value column and support for partial dates.

wyclif · May 17, 2024, 11:56am

I have created TRUNK-6236 and TRUNK-6237

ibacher · May 17, 2024, 12:46pm

I don’t love the idea of changing the definition of a column in a point release. That seems to break the idea that we’re following SemVer. I’d rather we made the change in master and went through the process of releasing 2.7.0, assuming there aren’t any in-process work that needs to be wrapped-up for that to happen.

wyclif · May 17, 2024, 1:01pm

Yes I agree that this is a convention we try to adhere to and generally discourage making DB modifications in a maintenance release but that door is not completely shut as earlier mentioned by @dkayiwa, the Ozone team is pushing to have this done in 2.6.5.

mksd · May 18, 2024, 5:23am

My take on this is that treating conditions onset date as a datetime rather than a date was in fact always a bug that could be patched in 2.6.x.

The rationale being that the time component was never set to anything else than 00:00 anyway.

I know we’re breaking a rule though, so if we need to wait for 2.7.x then we will.

This leads to wider bugs when integrating OpenMRS with other systems, but we can also advertise those bugs by disclaiming that OpenMRS is still in beta.