idgen - generate a random identifier

jdick · February 1, 2018, 6:35am

We are migrating from idcards to the idgen module. We can currently use REST calls to generate a new identifier. However, idgen generates the identifiers sequentially. We would strongly prefer to generate a random identifier for two reasons:

It better anonymizes the patient
We are migrating from idcards, which does this. We will need to move the existing identifiers (created in random order) to the idgen log. Given that we already have identifiers in the 99999999x range, a sequential generator that continues from the highest existing identifier would leave us with no identifiers.

@f4ww4z, @burke, others, do you have any recommendations?

Thanks,

JJ, @rtanui

mseaton · February 1, 2018, 3:35pm

@jdick, the idgen module is designed to handle this use case.

A SequentialIdentifierGenerator (by definition) will always generate an identifier based on the next available in a sequence.
An IdentifierPool is another Identifier Source that you can populate with identifiers, and which can then be configured to issue these either sequentially or randomly.
You can set up an IdentifierPool to auto-populate from a SequentialIdentifierGenerator (or any other Identifier Source) to ensure that the pool never runs out (you set a minimum threshold and the number of identifiers to add on each refill).
You can easily add a custom IdentifierSource class from your own module and the idgen module will pick this up and enable you to use it. This can have whatever logic and behavior you want.
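As a rough illustration of the pool behavior described above, here is a minimal Python sketch of a pool that refills itself from a sequential source and issues identifiers either randomly or in order. This is not the idgen module's code: the class names `SequentialSource` and `IdentifierPool`, and the parameters `min_size` and `batch_size`, are hypothetical stand-ins for the concepts in the list.

```python
import random
from itertools import count


class SequentialSource:
    """Hypothetical stand-in for a sequential identifier source."""

    def __init__(self, first=1, width=8):
        self._counter = count(first)
        self._width = width

    def next_identifiers(self, n):
        # Hand out the next n values in sequence, zero-padded to a fixed width.
        return [str(next(self._counter)).zfill(self._width) for _ in range(n)]


class IdentifierPool:
    """Hypothetical pool that refills from another source and issues at random."""

    def __init__(self, source, min_size, batch_size, sequential=False, rng=None):
        self.source = source
        self.min_size = min_size      # refill when fewer than this many remain
        self.batch_size = batch_size  # how many identifiers to pull per refill (>= 1)
        self.sequential = sequential
        self.rng = rng or random.Random()
        self.available = []

    def _refill_if_needed(self):
        # Always keep at least one identifier available, even if min_size is 0.
        if len(self.available) < max(self.min_size, 1):
            self.available.extend(self.source.next_identifiers(self.batch_size))

    def issue(self):
        self._refill_if_needed()
        index = 0 if self.sequential else self.rng.randrange(len(self.available))
        return self.available.pop(index)


pool = IdentifierPool(SequentialSource(), min_size=50, batch_size=100)
print(pool.issue())  # some identifier from the first batch, chosen at random
```

Note that in this sketch the randomness only applies within whatever is currently in the pool, so a small batch size would still produce identifiers that are roughly in sequence.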

Hope this helps, Mike

jdick · February 3, 2018, 4:14am

@mseaton, thanks for your comments.

Are there any docs that you know of on how to set up a pool? I don’t think there’s a UI and I can’t find any example REST calls.

I was hoping to better understand the workflow. Does this seem right?

You create a pool, which seems to have a start identifier and a batch size.
As identifiers are issued (randomly or sequentially), a row is added to the idgen_pooled_identifier table.

A few questions:

When an identifier pool is created, does a record get entered into the idgen_log_entry table? I am hoping that this log is updated only AFTER an identifier is issued from the pool. The alternative would be that upon pool creation, the entire pool is written to the idgen_log_entry table. We would need to create a pool of all universal ids between 00000000 and 999999999 given that we already have a random distribution of these identifiers in place (we cannot set up a sequential range within that because of this pre-existing random distribution).
Why have an idgen_pooled_identifier table and an idgen_log_entry table where both include the identifier? I can understand why you would want to keep track of which identifier was created by which pool. But in that case, why not have the idgen_pooled_identifier table reference the id key of the idgen_log_entry table? Maybe I’m just not understanding the role of the two log tables.

Thanks again.

mseaton · February 5, 2018, 5:34pm

@jdick, there is some documentation on the wiki. There is a UI, accessible from the legacy UI administration page (it is under the “Patients” header, not under its own section there, so it might not be obvious). There has also been an effort to migrate these administrative pages to an OWA, much of which I think has been done, though I don’t know the precise status of this (@dkayiwa or @raff or @wyclif?).

Not exactly. A pool can be configured to auto-populate from another source, but this is optional. If you do this, then you indicate what the batch size should be to refill it, and also the lower threshold that needs to be reached before filling happens. This refilling is independent of a pool’s operation. A pool is simply a construct that holds a bunch of pre-defined identifiers. Making an identifier available in the pool is what adds it to the idgen_pooled_identifier table, with a null value in date_used. Issuing an identifier from the pool sets its date_used property, which is what lets the pool know not to re-issue it in the future.

The idgen_log_entry table is what records which identifiers have been issued, and when, regardless of source. So identifiers issued from a pool, a sequential generator, a remote source, or some custom source that you add will all result in this being logged in the idgen_log_entry table. This is basically an auditing table that is independent of source.
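To make the roles of the two tables concrete, here is a rough Python sketch using an in-memory SQLite database. The column lists are simplified assumptions for illustration only and are not the module's actual schema, and the helper functions `add_to_pool` and `issue_from_pool` are hypothetical. It shows the behavior described above: populating a pool writes only to idgen_pooled_identifier, while issuing an identifier sets date_used and adds one row to idgen_log_entry.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed columns; the module's real tables may differ.
    CREATE TABLE idgen_pooled_identifier (
        id INTEGER PRIMARY KEY,
        pool_id INTEGER NOT NULL,
        identifier TEXT NOT NULL,
        date_used TEXT            -- NULL until the identifier is issued
    );
    CREATE TABLE idgen_log_entry (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        identifier TEXT NOT NULL,
        date_generated TEXT NOT NULL
    );
""")


def add_to_pool(pool_id, identifiers):
    # Making identifiers available touches only the pool table.
    conn.executemany(
        "INSERT INTO idgen_pooled_identifier (pool_id, identifier) VALUES (?, ?)",
        [(pool_id, i) for i in identifiers],
    )


def issue_from_pool(pool_id):
    # Pick one unused identifier at random.
    row = conn.execute(
        "SELECT id, identifier FROM idgen_pooled_identifier "
        "WHERE pool_id = ? AND date_used IS NULL ORDER BY RANDOM() LIMIT 1",
        (pool_id,),
    ).fetchone()
    if row is None:
        raise LookupError("pool is empty")
    now = datetime.now(timezone.utc).isoformat()
    # Mark it as used so the pool never re-issues it ...
    conn.execute("UPDATE idgen_pooled_identifier SET date_used = ? WHERE id = ?", (now, row[0]))
    # ... and record the issuance in the source-independent audit log.
    conn.execute(
        "INSERT INTO idgen_log_entry (source, identifier, date_generated) VALUES (?, ?, ?)",
        (f"pool:{pool_id}", row[1], now),
    )
    return row[1]
```

In this sketch, calling `add_to_pool(1, ["A1", "A2", "A3"])` leaves idgen_log_entry empty, and each later `issue_from_pool(1)` adds exactly one log row.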

I am guessing that if you want random identifiers produced from a pool, and for these to go from 00000000 to 999999999, this will not perform well given the existing design. So I’d probably recommend you create a new source for this. Something that takes a minValue and a maxValue, generates a random value in this range, and then checks the log table to see whether it has previously been used probably makes sense. This seems like it would be valuable to add directly to the module for general-purpose use.
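The logic of such a source could look roughly like the Python sketch below. It is only an outline of the idea under stated assumptions, not an idgen API: the class name `RandomRangeSource`, the `max_attempts` limit, and the in-memory set standing in for the log table are all hypothetical.

```python
import random


class RandomRangeSource:
    """Hypothetical source: random identifiers in [min_value, max_value], never repeated."""

    def __init__(self, min_value, max_value, already_issued=(), width=None,
                 rng=None, max_attempts=1000):
        if min_value > max_value:
            raise ValueError("min_value must not exceed max_value")
        self.min_value = min_value
        self.max_value = max_value
        # Zero-pad to the width of the largest value unless told otherwise.
        self.width = width or len(str(max_value))
        self.rng = rng or random.Random()
        self.max_attempts = max_attempts
        # Stand-in for the log table: every identifier issued so far, from any source.
        self.log = set(already_issued)

    def issue(self):
        for _ in range(self.max_attempts):
            candidate = str(self.rng.randint(self.min_value, self.max_value)).zfill(self.width)
            if candidate not in self.log:   # the "check the log table" step
                self.log.add(candidate)
                return candidate
        raise RuntimeError("no unused identifier found; the range may be nearly exhausted")


source = RandomRangeSource(0, 99999999, already_issued={"00000042"})
print(source.issue())  # an 8-digit identifier that is not "00000042"
```

Retrying on collision is cheap while the range is mostly empty but slows down as it fills up, which is why the sketch caps the number of attempts. A real implementation would also need the check and the log insert to happen atomically, for example by relying on a unique constraint, so that two concurrent requests cannot receive the same identifier.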

Let me know if you have more questions, Mike