Data model for the revised cohort definition

burke · April 28, 2016, 9:57pm

Agreed. sql_definition assumes the definition will be in SQL. As Darius suggested, I’d use handler (the fully qualified class name of the class in charge of calculating the cohort), handler_config (any configuration the handler needs), and handler_data (a place for the handler to persist its own metadata, such as when the cohort should next be run). If you only want to handle SQL-based definitions, you can write a SQL handler and use it in all of your cases.
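To make the three attributes concrete, here is a minimal sketch in Python (chosen only for brevity; a real handler would be a Java class loaded by name). Everything beyond handler, handler_config, and handler_data is an assumption for illustration: the class names, the JSON shape of handler_config, the registry that stands in for class loading, and the use of SQLite.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class DynamicCohortDefinition:
    handler: str                 # fully qualified name of the class that calculates the cohort
    handler_config: str = "{}"   # configuration, interpreted only by the handler
    handler_data: str = "{}"     # metadata the handler persists between runs


class SqlCohortHandler:
    """Hypothetical handler; assumes handler_config looks like {"sql": "SELECT ..."}."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def calculate(self, definition: DynamicCohortDefinition) -> List[int]:
        sql = json.loads(definition.handler_config)["sql"]
        rows = self.connection.execute(sql).fetchall()
        # The first column of each row is taken to be the patient id.
        return sorted({row[0] for row in rows})


# Stand-in for loading a class by name; the key is an invented example name.
HANDLER_REGISTRY: Dict[str, Type[SqlCohortHandler]] = {
    "org.example.cohort.SqlCohortHandler": SqlCohortHandler,
}


def calculate_members(definition: DynamicCohortDefinition,
                      connection: sqlite3.Connection) -> List[int]:
    """Look up the handler named by the definition and let it do the work."""
    handler_class = HANDLER_REGISTRY[definition.handler]
    return handler_class(connection).calculate(definition)
```

The point of the sketch is that the caller never looks inside handler_config; only the handler knows it contains SQL, so a non-SQL handler could be registered without changing the definition's shape.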

Alternatively, if you want to use sql_definition, then put it into your module-specific table and not directly into the cohort table, since we will not want this design in core.

Also, cohort is meant to serve as the base that handles both static (manually defined) and dynamic (calculated) cohorts. Initially, I had imagined we would keep the cohort table limited to those attributes shared by all types of cohorts and any static- or dynamic-specific attributes (like definitions & scheduling) would go into cohort_static and cohort_dynamic tables extending the cohort table. I realize this could be over-designing (which I tend to do), so I was okay with having a few dynamic-specific attributes in the cohort table that would go unused for static cohorts.
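As a rough illustration of the base-plus-extension layout, the sketch below creates the three tables in an in-memory SQLite database from Python. The column lists are assumptions made for the example (only the table names and the handler attributes come from this thread), and cohort_static is left with no extra columns because none have been proposed here.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE cohort (                 -- attributes shared by every cohort
    cohort_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);
CREATE TABLE cohort_static (          -- manually defined cohorts
    cohort_id INTEGER PRIMARY KEY REFERENCES cohort (cohort_id)
);
CREATE TABLE cohort_dynamic (         -- calculated cohorts
    cohort_id      INTEGER PRIMARY KEY REFERENCES cohort (cohort_id),
    handler        TEXT NOT NULL,
    handler_config TEXT,
    handler_data   TEXT
);
"""


def create_schema(connection: sqlite3.Connection) -> None:
    connection.executescript(SCHEMA)


def cohort_kind(connection: sqlite3.Connection, cohort_id: int) -> Optional[str]:
    """Return 'dynamic' or 'static' depending on which extension row exists."""
    for kind in ("dynamic", "static"):
        row = connection.execute(
            f"SELECT 1 FROM cohort_{kind} WHERE cohort_id = ?", (cohort_id,)
        ).fetchone()
        if row is not None:
            return kind
    return None
```

With this split, the cohort table carries no columns that go unused for static cohorts, at the cost of one extra join when a dynamic cohort's definition is needed.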

As for scheduling, we should not assume that schedules are always simple intervals, as your 3-character schedule_interval suggests. We would also want, for example, to support recalculating a cohort every Thursday at 21:00 GMT+3. Rather than trying to come up with a model to handle all possible scheduling needs, my advice would be to leave this up to the handler to manage and simply put a date_to_expire attribute on the cohort (once that date has passed, the API knows the cohort is stale).
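A small sketch of how that division of labour could look, again in Python for brevity. The function names are invented for the example, and treating a missing date_to_expire as "never stale" is an assumption, not something decided in this thread. Both functions expect timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GMT_PLUS_3 = timezone(timedelta(hours=3))
THURSDAY = 3  # datetime.weekday() counts Monday as 0


def is_stale(date_to_expire: Optional[datetime], now: datetime) -> bool:
    """API-side check: no knowledge of the schedule is needed, only the expiry."""
    # Assumption: no expiry date means the cohort never goes stale.
    return date_to_expire is not None and now > date_to_expire


def next_thursday_2100(now: datetime) -> datetime:
    """Handler-side rule: the next Thursday 21:00 GMT+3 strictly after `now`."""
    local = now.astimezone(GMT_PLUS_3)
    candidate = local.replace(hour=21, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(THURSDAY - local.weekday()) % 7)
    if candidate <= local:
        candidate += timedelta(days=7)
    return candidate
```

After each recalculation the handler would store the value of next_thursday_2100(now) in date_to_expire; a different handler could use an interval, a cron-like rule, or anything else without the cohort table changing.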

As Ada pointed out, this isn’t the planned GSoC project, so you can do this work wherever you see fit. We do plan to add start_date and end_date to the cohort_member table in core, so, if you do the same, it will be easy to transition when these are supplied by core.
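For what start_date and end_date on cohort_member buy you, here is a minimal Python sketch of an "as of" membership query. The class and function names are hypothetical, and the semantics are assumptions: a missing start_date or end_date is treated as open-ended, and both bounds are inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CohortMember:
    patient_id: int
    start_date: Optional[date] = None  # None: member since before records began
    end_date: Optional[date] = None    # None: still a member

    def is_active_on(self, day: date) -> bool:
        started = self.start_date is None or self.start_date <= day
        not_ended = self.end_date is None or day <= self.end_date
        return started and not_ended


def members_on(members: Iterable[CohortMember], day: date) -> List[int]:
    """Patient ids of everyone who belonged to the cohort on the given day."""
    return sorted(m.patient_id for m in members if m.is_active_on(day))
```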

Agreed. This is an argument for separating dynamic cohorts (those that are calculated) from the base cohort, since cohorts are data and the definitions of dynamic cohorts are metadata.