Cohort Modules Categorization/Typing

Thanks @burke, some more on this topic here on Slack.

(+1 for static vs dynamic.)

In particular it would be great to have your opinion on this:

To me, a Cohort Definition is just that - a definition for how to compute a Cohort. One such definition could be “a specific saved set of patients”. This is how the reporting module sees things - see: StaticCohortDefinition.

In the reporting module, an “EvaluatedCohort” is essentially a cohort that has additional references back to the definition and the context that were used to compute it. So this could serve a similar purpose to “CohortType”, in that you can inspect the CohortDefinition’s class or interface or some other marker to determine whether it is “dynamic” or “static”.
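To make the marker idea concrete, here is a minimal sketch. It assumes stand-in types only: `SketchCohortDefinition`, `SketchStaticDefinition`, `SavedListDefinition`, `QueryDefinition`, and `CohortKinds` are hypothetical names invented for illustration and are not the reporting module’s actual classes.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT the reporting module's real classes.
interface SketchCohortDefinition {
    String getName();
}

// Marker-style subtype: a definition that just points at a saved set of patients.
interface SketchStaticDefinition extends SketchCohortDefinition {
    List<Integer> getSavedPatientIds();
}

final class SavedListDefinition implements SketchStaticDefinition {
    private final String name;
    private final List<Integer> patientIds;

    SavedListDefinition(String name, List<Integer> patientIds) {
        this.name = name;
        this.patientIds = Collections.unmodifiableList(patientIds);
    }

    public String getName() { return name; }
    public List<Integer> getSavedPatientIds() { return patientIds; }
}

// A definition that must be evaluated (here just an opaque query string).
final class QueryDefinition implements SketchCohortDefinition {
    private final String name;
    private final String query;

    QueryDefinition(String name, String query) {
        this.name = name;
        this.query = query;
    }

    public String getName() { return name; }
    public String getQuery() { return query; }
}

final class CohortKinds {
    private CohortKinds() { }

    // Static vs. dynamic is inferred from the definition's type,
    // so no separate "cohort type" discriminator is stored.
    static boolean isStatic(SketchCohortDefinition definition) {
        return definition instanceof SketchStaticDefinition;
    }
}
```

With this shape, callers never need a stored type flag: `CohortKinds.isStatic(definition)` answers the question from the definition itself.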

Overall, I think it would be helpful for us to step back and look at the different places where we have built out Cohort-related functionality, and to come up with a future-looking vision for what we want to support and where. Among those places should be:

  1. The OpenMRS core notion of a “cohort” and how this has evolved from a static list of patients to one where cohort “memberships” are optionally date-constrained

  2. The OpenMRS reporting module’s Cohort Definitions and how these definitions are stored and categorized and how these relate to core Cohort.

  3. The OpenMRS cohort module, and how it has evolved - what use cases it was originally designed to address such as “household” cohorts that can have Obs and Encounters associated with them, how general-purpose it is, etc.

It would be great if we could lay out a vision for how we would ideally like to architect things, and then see how each of these approaches fits (or doesn’t fit) into that vision and how each might need to evolve to contribute directly to it or at least ensure compatibility down the road.

Before we commit too far to how fundamental the cohort module is to the OpenMRS 3.0 product architecture, or how much we want to continue to indefinitely support the OpenMRS reporting module in its current form, I think it would be helpful to lay this vision out a bit more clearly.

@mseaton last time we spoke about this with @burke and @ibacher (not sure if it was a TAC or an O3 design call), we left it that calculated cohorts would actually rely on the Reporting module, since the Reporting module already offers ways to define cohorts and then calculate (evaluate) them.

… while static cohorts would be done differently, but that was decided without knowing that StaticCohortDefinition existed and what it makes possible. Now it looks like everything could go through the same Reporting module channel?

Thoughts, @burke @ibacher?

@mksd Seems a bit redundant, though, doesn’t it? We talked about leveraging the reporting module to build a cohort (as a list of patient ids) and then translating that into a cohort module / core cohort and serving that to the frontend. I get that we’d get some consistency, but it seems silly to leverage the reporting module to get a list of patients and translate that into a cohort when… we could just already have the static cohort set up (we also avoid the overhead of having to update the StaticCohortDefinition every time we update the cohort itself).

And the reporting module already has a REST API via the reportingrest module.

@ibacher sure, maybe. I’m not close enough to the code of this cohort module to weigh in properly. I was indeed seeking consistency, and that may be overkill. Since there would be some kind of mappers to go from the Reporting module’s cohort datasets to that module’s cohorts, though, this shouldn’t matter too much in principle. But again, I don’t feel strongly about this.

Use of dates is optional. A cohort service that accepts period of membership or can answer “who was in the cohort on YYYY-MM-DD” can be used like a list of people and the dates ignored.
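A minimal sketch of that idea, assuming a hypothetical in-memory model (the `Membership` and `CohortMembers` classes below are illustrative, not the core or Cohort Module API) and assuming an inclusive end date, with `null` meaning open-ended:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not the OpenMRS core or Cohort Module API.
final class Membership {
    final int personId;
    final LocalDate startDate; // null = no known start
    final LocalDate endDate;   // null = still a member

    Membership(int personId, LocalDate startDate, LocalDate endDate) {
        this.personId = personId;
        this.startDate = startDate;
        this.endDate = endDate;
    }

    // Assumption: both bounds are inclusive; a null bound is open-ended.
    boolean isActiveOn(LocalDate date) {
        boolean started = startDate == null || !date.isBefore(startDate);
        boolean notEnded = endDate == null || !date.isAfter(endDate);
        return started && notEnded;
    }
}

final class CohortMembers {
    private final List<Membership> memberships;

    CohortMembers(List<Membership> memberships) {
        this.memberships = new ArrayList<>(memberships);
    }

    // Date-aware view: who was in the cohort on the given day?
    List<Integer> memberIdsOn(LocalDate date) {
        List<Integer> ids = new ArrayList<>();
        for (Membership m : memberships) {
            if (m.isActiveOn(date)) {
                ids.add(m.personId);
            }
        }
        return ids;
    }

    // Date-agnostic view: the cohort as a plain list of people, dates ignored.
    List<Integer> allMemberIds() {
        Set<Integer> ids = new LinkedHashSet<>();
        for (Membership m : memberships) {
            ids.add(m.personId);
        }
        return new ArrayList<>(ids);
    }
}
```

Callers that don’t care about dates use `allMemberIds()`; callers that do use `memberIdsOn(date)`. Both read the same stored memberships.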

Great point. An example is treating static vs. dynamic as a matter of definition rather than a “type” discriminator. FHIR is using “type” to specify the type of members in the cohort anyway (meaning we probably shouldn’t be using cohort type for another purpose).

Some of the features of the cohort module are worth keeping and others could be removed and brought back again some day. More details below.

To ensure we’re evolving toward FHIR (not diverging or creating new incompatibilities with FHIR), let’s start with what FHIR Group offers:

FHIR Group

| property | description | notes |
|---|---|---|
| active | active / inactive | All cohorts will be seen in FHIR as “active” until we decide we want/need to add the capability to disable cohorts (without voiding/deleting them). |
| actual | “actual” list of people vs. “descriptive” list of “types” of intended individuals | From the documentation, I’m not sure FHIR’s “actual vs. descriptive” is precisely our static vs. dynamic. But maybe. This seems like a FHIR property we could infer when exposing an OpenMRS cohort as a FHIR Group. |
| type | kind of members (e.g., people, cows, syringes, etc.) | We can focus on people and leave grouping of cows & syringes to farming and inventory systems. |
| managingEntity | ownership | We’ll want to have owners, editors, viewers. |
| characteristic | a simplified set of key/value pairs defining include or exclude criteria | This looks like one approach to a cohort definition hardcoded into the FHIR group. We need more flexibility. |
| member | members are anything (people, cows, syringes, etc.) | We’ll focus on people. |
| member.inactive | active / inactive membership | For FHIR, we would infer inactive for any member who doesn’t have a membership with start & end dates covering the time of interest. |
| member.period | period of membership | Same as our membership start date & end date. |

Next, let’s consider what we have in core:

OpenMRS core

  • members are patients
  • period of membership (start date & end date)

And then there’s the Cohort Module (wiki page), which was initially intended to enhance our cohorts in a way that could be re-introduced to core:

Cohort Module

  • members are persons
  • cohort attributes
  • membership attributes
  • membership role (e.g., head of household)
  • obs, program, visit & encounter for cohorts

So, let’s imagine bringing these together into a way that helps align us with FHIR and focuses on the things we need now (e.g., we need to support patient lists asap):

Target state (what we want now)

  • Use cohort definition to define how cohort is determined
    • Doesn’t translate to FHIR Group (which only supports include/exclude list of key/value pairs)
  • Support attributes of cohorts & memberships to empower local & module needs through extension

| property | notes |
|---|---|
| cohort.definition | Take the reporting approach of providing a definition of how a cohort is achieved. The definition might specify the cohort is manually managed or it could point to a handler that can return the cohort. We’ll probably want a definition_handler property to avoid having to pre-define all types of definitions; rather, have the API persist the definition and where to go to turn the definition into an actual list of people. The API would provide common use cases out of the box, but custom cohort-generating algorithms with their own definition schema could be supplied by add-ons. |
| date_members_updated | The UTC timestamp of when the cohort was last updated (whether manually or through a calculation) |
| public | Some mechanism to support visibility (e.g., public vs private) |
| ownership / access | Like FHIR has managingEntity, we want to track ownership. We’ll want more than ownership; rather, we want to be able to relate users to a cohort with access levels (e.g., owner, editor, viewer). |
| cohort.attribute, cohort_member.attribute | Attributes allow implementations and add-ons to innovate without having to fork the code. So, I would favor supporting the OpenMRS attributes pattern for both cohorts and cohort memberships. |
| cohort.type == Person | We should probably migrate away from using cohort.type for other purposes (e.g., static vs. dynamic), since FHIR defines their type as the type of members. In our cases, this will always be persons – not just patients (like we currently have in core), but not cows (like FHIR supports). The idea of static (manually managed) vs. dynamic (calculated) cohorts can be absorbed into support for definitions (since these are definitions of how the cohort is realized). |
| cohort_member.start_date, cohort_member.end_date | We continue to support periods of membership. The API is (or can be) designed so cohorts can be used as a simple list of persons and the start & end dates ignored if they aren’t needed. |
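
The “public” and “ownership / access” properties above could be modeled roughly as follows. This is only a sketch: the `AccessLevel` enum, its ordering (viewer < editor < owner), and the `CohortAccess` class are assumptions made for illustration, not an agreed design.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical access model; names and ordering are assumptions for illustration.
enum AccessLevel {
    VIEWER, EDITOR, OWNER; // declared from least to most privileged

    boolean atLeast(AccessLevel required) {
        return compareTo(required) >= 0;
    }
}

final class CohortAccess {
    private final boolean isPublic;
    private final Map<String, AccessLevel> grants = new HashMap<>();

    CohortAccess(boolean isPublic) {
        this.isPublic = isPublic;
    }

    void grant(String username, AccessLevel level) {
        grants.put(username, level);
    }

    // Public cohorts are visible to everyone; private ones only to users with a grant.
    boolean canView(String username) {
        return isPublic || grants.containsKey(username);
    }

    // Editing always requires an explicit grant of EDITOR or higher.
    boolean canEdit(String username) {
        AccessLevel level = grants.get(username);
        return level != null && level.atLeast(AccessLevel.EDITOR);
    }
}
```

Keeping visibility and per-user grants separate means a public cohort can still restrict who may change it.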

The contract with a definition handler could include methods like:

public interface CohortDefinitionHandler {
  boolean needsUpdate(Cohort cohort); // true if the cohort's membership is out of date
  void update(Cohort cohort); // recompute the cohort's membership from its definition
  boolean supportsManualChanges(Cohort cohort); // true if members may be edited by hand
}
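
To illustrate how such a contract might be used, here is a sketch with two toy handlers and a lookup by handler key. Everything in it is hypothetical: `SketchCohort` is a stand-in rather than the OpenMRS `Cohort` class, the `"manual"` and `"id-list"` keys are invented, and the comma-separated definition format exists only to keep the example small.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a cohort; not the OpenMRS Cohort class.
final class SketchCohort {
    final String definitionHandler; // key naming the handler, e.g. "manual"
    final String definition;        // opaque to everything except the handler
    final Set<Integer> memberIds = new LinkedHashSet<>();

    SketchCohort(String definitionHandler, String definition) {
        this.definitionHandler = definitionHandler;
        this.definition = definition;
    }
}

// Same three methods as the contract above, typed against the stand-in.
interface SketchDefinitionHandler {
    boolean needsUpdate(SketchCohort cohort);
    void update(SketchCohort cohort);
    boolean supportsManualChanges(SketchCohort cohort);
}

// Manually managed cohorts: nothing to compute, users edit members directly.
final class ManualHandler implements SketchDefinitionHandler {
    public boolean needsUpdate(SketchCohort cohort) { return false; }
    public void update(SketchCohort cohort) { /* nothing to recompute */ }
    public boolean supportsManualChanges(SketchCohort cohort) { return true; }
}

// Toy calculated cohort: the definition is a comma-separated list of person ids.
final class IdListHandler implements SketchDefinitionHandler {
    public boolean needsUpdate(SketchCohort cohort) {
        return !parse(cohort.definition).equals(cohort.memberIds);
    }

    public void update(SketchCohort cohort) {
        cohort.memberIds.clear();
        cohort.memberIds.addAll(parse(cohort.definition));
    }

    public boolean supportsManualChanges(SketchCohort cohort) { return false; }

    private static Set<Integer> parse(String definition) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (String part : definition.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Integer.valueOf(trimmed));
            }
        }
        return ids;
    }
}

// Resolves the handler named by a cohort's definition_handler-style key.
final class HandlerRegistry {
    private final Map<String, SketchDefinitionHandler> handlers = new HashMap<>();

    void register(String key, SketchDefinitionHandler handler) {
        handlers.put(key, handler);
    }

    SketchDefinitionHandler handlerFor(SketchCohort cohort) {
        SketchDefinitionHandler handler = handlers.get(cohort.definitionHandler);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for " + cohort.definitionHandler);
        }
        return handler;
    }
}
```

The registry is what would let add-ons supply their own handlers: the cohort stores only a key and an opaque definition, and the handler decides how to turn that into members.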

This means we put the obs/program/visit/encounter for cohorts along with membership roles aside for now (pull them out of the cohort module) and focus the cohort module on cohorts of persons, user access to cohorts, cohort definitions, and continue supporting cohort/member attributes. This would allow us to support most/all of the needs for OpenMRS 3.0’s patient lists.

Future state

  • Add membership roles
  • Add membership sort weight
  • Support obs, program, visit & encounter for cohorts (like FHIR does) in core, so reporting and the rest of OpenMRS API can leverage them

I haven’t addressed this requirement – i.e., mutually exclusive cohorts. I’d suggest we could work through design ideas for supporting mutually exclusive cohorts in a separate Talk topic.

I don’t know what you mean by this. The “StaticCohortDefinition” is just a simple thing that points to an existing saved, named Cohort.

I’m not saying using StaticCohortDefinition is necessarily desired. I just wanted to point out that the reporting module already has an interface / class structure that both abstracts away the notion of whether a cohort is dynamic or static and provides a means to determine if it is static or dynamic (i.e. if it implements StaticCohortDefinition).

Oh! Ok… that’s actually quite cool then!

I tend to shy away a bit from mapping voided directly to a FHIR-readable property. The reason is that the FHIR module generally treats voided content as if it were deleted, so that, e.g., every visible cohort would necessarily be active. If we have a reason for differentiating active from inactive cohorts, then we should probably create a property for that purpose (so that the semantics of voided/not-voided are consistent across the domains rather than domain-specific).

I would agree with @ibacher here. It would seem to me that our closest current analog to “active/inactive” is the membership startDate and endDate. A member is inactive on a given date if the startDate and endDate indicate as much, and a cohort is inactive if it has no active members. But we probably want a new property for this, or to adopt a Cohort attribute for this.

Fair points. We can infer “inactive” for cohort memberships that are outside of their start/end dates. For cohorts, we can just have all cohorts “active” when exposed via FHIR until we decide we want to support disabling of cohorts (without deleting/voiding them).

I’ll update my previous post accordingly.

Catching up on this discussion, so apologies if I don’t get this right. I think FHIR is a good model to shoot for, but it is primarily for transfer of cohorts and not creation of cohorts. We need to think a bit broader about what a cohort is and how we create one. For example, access to patients to calculate the cohort determines the membership. If the cohort was built with one level of access (perhaps I don’t see HIV+ patients or certain private patients) then my cohort denominator will be different from another’s. So privacy or access needs to be captured somewhere.

FHIR’s “actual” property is sort of like the distinction between extensional and intensional value sets. Actual is a list of patient IDs (like a list of codes, i.e., extensional), and descriptive sounds like a rule-based description that requires evaluation to generate the members (like “all children of a particular code”, i.e., intensional). In the case of intensional or descriptive cohorts, you need enough information to evaluate/expand the list. That includes the denominator, the date of the rules, the dates of the data, and the date the query is executed.

For example, I might run a query looking for all people who had a CBC between 1.1.2021 and 3.31.2021. I create that list of patients today. Tomorrow, data is added to an existing patient in the database adding a result that occurred on 2.1.2021. If I re-run the cohort, I will add a new patient to the cohort even though the date range did not change. Somehow we need to capture not only the dates we are interested in, but also the date at which the cohort was evaluated (for dynamic/descriptive cohorts).
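
The CBC scenario can be sketched as below. The types (`LabResult`, `EvaluatedSnapshot`, `CbcCohortQuery`) are hypothetical and the in-memory list stands in for the database; the point is only that the snapshot records the evaluation date alongside the date range of interest.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical types for illustration only.
final class LabResult {
    final int patientId;
    final String test;
    final LocalDate resultDate; // when the result occurred, not when it was entered

    LabResult(int patientId, String test, LocalDate resultDate) {
        this.patientId = patientId;
        this.test = test;
        this.resultDate = resultDate;
    }
}

// The outcome of one evaluation: the range asked about AND when it was asked.
final class EvaluatedSnapshot {
    final LocalDate rangeStart;
    final LocalDate rangeEnd;
    final LocalDate evaluatedOn;
    final Set<Integer> patientIds;

    EvaluatedSnapshot(LocalDate rangeStart, LocalDate rangeEnd, LocalDate evaluatedOn, Set<Integer> patientIds) {
        this.rangeStart = rangeStart;
        this.rangeEnd = rangeEnd;
        this.evaluatedOn = evaluatedOn;
        this.patientIds = patientIds;
    }
}

final class CbcCohortQuery {
    private CbcCohortQuery() { }

    // Finds patients with a CBC result dated within [from, to], as known on evaluatedOn.
    static EvaluatedSnapshot evaluate(List<LabResult> database, LocalDate from, LocalDate to, LocalDate evaluatedOn) {
        Set<Integer> ids = new TreeSet<>();
        for (LabResult r : database) {
            boolean inRange = !r.resultDate.isBefore(from) && !r.resultDate.isAfter(to);
            if ("CBC".equals(r.test) && inRange) {
                ids.add(r.patientId);
            }
        }
        return new EvaluatedSnapshot(from, to, evaluatedOn, ids);
    }
}
```

Two snapshots with the same `rangeStart` and `rangeEnd` can legitimately contain different patients if data was back-entered between them; `evaluatedOn` is what tells them apart.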

The cohort.definition would contain the information needed to evaluate the cohort, including date ranges of concern when appropriate. The specific format & content of the definition would be determined by the handler/type of definition. We might want to include some basic cohort definition handlers within the Cohort Module, but wouldn’t want the Cohort Module to try to define all the possible ways a cohort might be defined.

Good point about needing time last computed. I’ll update my earlier post to include that.

The near term goal for cohorts is to clean up the Cohort Module to focus on the foundational features needed to support our near-term cohort & patient lists. I’ll try to work with @ibacher and @corneliouzbett to define and share a near-term strategy for the Cohort Module.

I think we have two kinds of date_updated that are relevant here: the last time the Cohort itself was modified (for whatever reason) and the last time the Cohort definition was executed. The latter can be stored in an attribute and read by an appropriate handler or some similar mechanism. But we wouldn’t want to confuse things by claiming that the cohort was last updated two minutes ago when all I did was tweak the name or similar.

Good point. My intent was not to use the cohort.date_changed (which would reflect when the Cohort resource metadata like name, definition, etc. was last changed), but introduce a new timestamp for when the list of people in the cohort (cohort membership) was last refreshed/changed/validated.

Something like date_last_evaluated would work for calculated cohorts, where the evaluation of the definition updates the membership. However, what about cohorts defined to be manually managed (e.g., directly by users or through triggered events), where the membership isn’t determined by evaluating the definition? I thought the equivalent for manually managed cohorts would be the timestamp the membership was last altered (manually or via event) or reviewed/validated (e.g., someone verifying a manually managed cohort is correct without changing it would be functionally equivalent to re-calculating a calculated cohort that doesn’t result in any changes).

So, I was trying to capture for all cases of definitions (calculated, manually managed, altered via events, etc.). Maybe date_members_updated or membership_valid_as_of?
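
A small sketch of keeping the two timestamps separate, assuming a hypothetical `TrackedCohort` class (the field names follow the ones floated in this thread; timestamps are passed in explicitly so the behavior is easy to follow):

```java
import java.time.Instant;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; not an existing OpenMRS class.
final class TrackedCohort {
    private String name;
    private final Set<Integer> memberIds = new LinkedHashSet<>();
    private Instant dateChanged;        // metadata (name, definition, ...) last changed
    private Instant dateMembersUpdated; // membership last altered, recalculated, or validated

    TrackedCohort(String name, Instant now) {
        this.name = name;
        this.dateChanged = now;
        this.dateMembersUpdated = now;
    }

    // Metadata edits touch only dateChanged.
    void rename(String newName, Instant now) {
        this.name = newName;
        this.dateChanged = now;
    }

    // Membership edits (manual or event-driven) touch only dateMembersUpdated.
    void addMember(int personId, Instant now) {
        memberIds.add(personId);
        this.dateMembersUpdated = now;
    }

    // Reviewing or recalculating without changes still refreshes the membership timestamp.
    void confirmMembership(Instant now) {
        this.dateMembersUpdated = now;
    }

    String getName() { return name; }
    Set<Integer> getMemberIds() { return memberIds; }
    Instant getDateChanged() { return dateChanged; }
    Instant getDateMembersUpdated() { return dateMembersUpdated; }
}
```

With this split, tweaking the name never makes the membership look fresher than it is, and a manual review counts the same as a recalculation that changes nothing.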

OK so here’s where I think we are at with getting momentum on this.

  1. Who needs this: Ampath, Mekom, & UCSF/OHRI
  2. Dev time: Ampath has committed some of @corneliouzbett’s time to focus on working on the Cohort Module; he is just waiting for direction & tickets/clear tasks.
  3. Architecture/Software Design time: @ibacher has graciously offered to provide direct support for @corneliouzbett, and @mksd will also provide input.

So what do we need next?

  • @burke @ibacher @corneliouzbett (and optionally, @mksd and @eudson) can you guys get together this coming week to assemble a proposal for how to wrangle Cohort Module, and then bring that to the TAC on Monday the following week? (I’d recommend posting the plan here as well)
  • Then folks like @mseaton and @eudson/@mayanja can join the TAC call to hear the approach

@ibacher and @corneliouzbett, I’ve started to draft a Cohort Module Next Steps Google Doc. Let me know if you already have something like this and I can help merge documents. At the moment, I’ve just copied my prior Talk post and started to organize content. I’ll try to find time today & tomorrow to evolve it.

@burke looks like the document ends with creating tickets (Next Steps). I would like to know how I can help with that. This also looks late for Platform 2.5.0: I am planning to release platform-2.5-alpha in two weeks, so this may be knocked off the list and rescheduled for 2.5.x or 2.6.

@ibacher @corneliouzbett are we ready to create tickets out of this? Is anything impeding us? I’m sure it’s the same thing @dkayiwa is waiting for.

We’ve created some tickets to do some of the refactoring. There’s also this one that I guess is open.

The idea of doing this work in a module is that it isn’t bound to the platform release cycle, so while this may be work for the “platform team”, it’s not really related to the 2.5 release. I’d anticipate it first being included in the 3.x RefApp.
