Do we have a consistent and reliable logging strategy? I’m thinking about this from a security and compliance perspective, so that when we decide something needs to get logged, we can be sure it’s actually happening.
So far, it seems like a lot of the logging is happening in individual modules, but if we want to be sure there are no gaps, it seems like we should be logging closer to the data access layer. This could be particularly important for implementors that have to comply with specific auditing requirements.
@ibacher, you mentioned at the TAC today that a Hibernate interceptor might be suitable for this, perhaps as a separate module. Would you mind describing that in a little more detail? Has any work already been done on this?
So, as I understand it, the concern is about ensuring we have a proper audit trail for changes made to a record, especially when a record is deleted. We should probably also think about having some way of auditing when data changes. The current system we have in place maintains this information for each object:
When the object was created
When the object was last updated
When the object was voided
This leaves us with two gaps in our audit logging:
Changes where the object is updated multiple times
Changes where the object is purged from the database
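To make that concrete, here's a rough sketch (not the actual openmrs-core classes) of the audit metadata we already keep on each data object, and why it isn't enough on its own:

```java
import java.util.Date;
import org.openmrs.User;

// Illustrative sketch only, not the real openmrs-core class: the audit metadata
// that OpenMRS data objects already carry via the Auditable/Voidable pattern.
public class AuditedDataExample {
    private User creator;        // who created the object
    private Date dateCreated;    // when it was created
    private User changedBy;      // who made the *most recent* update; earlier values are overwritten
    private Date dateChanged;    // when the most recent update happened
    private Boolean voided;      // soft-delete flag
    private User voidedBy;       // who voided the object
    private Date dateVoided;     // when it was voided
    // A purge (hard delete) removes the row entirely, so none of this metadata survives it.
}
```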
There was this module which provided that kind of functionality, but it hasn’t been actively worked on and has some known issues. The heart of the module, though, is pretty much exactly what I was imagining: a Hibernate Interceptor that captures Hibernate events and writes them somewhere.
So my thinking is that the easiest way to meet that requirement might be to revive that module and try to deal with its limitations. Ideally, this is something we could eventually move either into core itself or at least into the platform distribution. (The advantage of doing this in a module is that it’s easier for already-running systems to take advantage of the functionality without needing to upgrade their entire OMRS backend.)
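For anyone who hasn’t seen the pattern, here’s a minimal sketch of the kind of Hibernate Interceptor I mean; the writeAuditEntry bit is a placeholder, not the actual auditlog module code:

```java
import java.io.Serializable;
import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

// Minimal sketch of the interceptor approach (placeholder persistence logic,
// not the real auditlog module). Hibernate calls these hooks as part of the
// normal save/flush/delete lifecycle, so every mapped entity is covered
// regardless of which module or service triggered the change.
public class AuditLogInterceptor extends EmptyInterceptor {

    @Override
    public boolean onSave(Object entity, Serializable id, Object[] state,
            String[] propertyNames, Type[] types) {
        writeAuditEntry("CREATED", entity, id);
        return false; // we did not modify the entity state
    }

    @Override
    public boolean onFlushDirty(Object entity, Serializable id, Object[] currentState,
            Object[] previousState, String[] propertyNames, Type[] types) {
        // previousState vs currentState gives a full before/after view of each update,
        // which is what closes the "object updated multiple times" gap.
        writeAuditEntry("UPDATED", entity, id);
        return false;
    }

    @Override
    public void onDelete(Object entity, Serializable id, Object[] state,
            String[] propertyNames, Type[] types) {
        // Fires on hard deletes (purges), which the changed_by/voided columns never capture.
        writeAuditEntry("DELETED", entity, id);
    }

    // Placeholder: the real module would persist this somewhere durable (a table, a log, a queue).
    private void writeAuditEntry(String action, Object entity, Serializable id) {
        System.out.printf("%s %s#%s%n", action, entity.getClass().getSimpleName(), id);
    }
}
```

Because the interceptor is registered against the SessionFactory, it sits below the services and modules, which is what puts the logging “closer to the data access layer” as @grace was asking.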
Wow, that’s good news for logging. This was one of the areas the NCSU security team identified as lacking, but I think bringing back the audit log module would fix all the gaps.
This is another area where I’ve always felt there was overlap with other initiatives that require Change Data Capture (CDC), and for which we are starting to look more at Debezium, since it captures not only API-level changes but also direct SQL changes (including, importantly, those that might be introduced in a liquibase changeset).
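For context, consuming Debezium change events can look roughly like the embedded-engine sketch below. Treat it as an illustration of the CDC idea rather than how OpenMRS EIP is actually wired up; the connector properties are abbreviated and vary by Debezium version:

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

// Rough sketch of consuming MySQL change events with Debezium's embedded engine.
// Property names follow the 1.x connector and are abbreviated; this is an
// illustration of the CDC idea, not the OpenMRS EIP configuration.
public class CdcAuditSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "omrs-cdc-audit");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/omrs-cdc-offsets.dat");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "secret");
        props.setProperty("database.server.id", "85744");
        props.setProperty("database.server.name", "openmrs");
        props.setProperty("table.include.list", "openmrs.obs,openmrs.person,openmrs.patient");
        props.setProperty("database.history", "io.debezium.relational.history.FileDatabaseHistory");
        props.setProperty("database.history.file.filename", "/tmp/omrs-cdc-dbhistory.dat");

        // Each event carries the before/after row state, whether the change came
        // through the OpenMRS API, direct SQL, or a liquibase changeset.
        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> System.out.println(record.value()))
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine);
    }
}
```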
I’m interested in Mekom’s take in particular: is something like the auditlog module (with a Hibernate interceptor) something they’d support, distinct from something like OpenMRS EIP?
I’d be very keen to join a conversation as to how we can build and maintain framework level pieces that can be linked together to play various roles around sync, integration, data warehousing, audit logging, etc.
Thanks @mseaton, that’s a fair point and yes there are certainly overlaps. Happy to discuss this during TAC calls when appropriate.
The thing is that a complete audit log tool would also need to track reads, and Debezium won’t do that as far as I can tell. So perhaps it would be time to:
Upgrade the Event module to feed off Debezium.
Have “Audit Log 2.0” leverage the Event module, if it doesn’t already; that would cover C(R)UD.
Track reads on top. Ad-hoc interceptors would probably be good enough for this, but maybe not, depending on the business needs; that’s to be determined.
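For point 3, a minimal sketch of what an ad-hoc read interceptor could look like is below. Note that it only sees reads that go through Hibernate (not raw SQL or reporting queries), and the logging is just a placeholder:

```java
import java.io.Serializable;
import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

// Sketch of read tracking with a Hibernate interceptor. Placeholder logging,
// and blind to reads done through raw SQL, reports, or direct DB access.
public class ReadAuditInterceptor extends EmptyInterceptor {

    @Override
    public boolean onLoad(Object entity, Serializable id, Object[] state,
            String[] propertyNames, Type[] types) {
        // Fires whenever Hibernate hydrates an entity, i.e. whenever it is
        // read through the API / data access layer.
        System.out.printf("READ %s#%s%n", entity.getClass().getSimpleName(), id);
        return false; // state not modified
    }
}
```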
I think when we say audit logging we need to be very specific about what exactly we mean. From a security perspective, the interest is in knowing who read/inserted/updated/deleted a row. Debezium can’t tell you who performed the operation, and I doubt it notifies you of reads, though it can track the other operations; from an integration perspective, using Debezium works fine.
With that said, if we can reliably update changedBy/dateChanged from the API (which I think the OpenMRS API already takes care of), then when a Debezium event is fired we can take the values of the changed_by and date_changed columns from the current row state associated with the event and use them to identify the API user who made the operation.
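To sketch that mapping (assuming Jackson and Debezium’s standard JSON envelope; how date_changed is serialized depends on the connector’s time handling settings, and looking the user_id up against the users table is left out):

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Rough sketch of attributing a Debezium change event to an API user by reading
// changed_by/date_changed from the row state carried in the event. The payload
// shape follows Debezium's JSON envelope ({"payload": {"before", "after", "op", ...}}).
public class DebeziumUserAttribution {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String attribute(String eventValueJson) throws Exception {
        JsonNode payload = MAPPER.readTree(eventValueJson).path("payload");
        // "after" holds the current row state; for deletes only "before" is populated.
        JsonNode row = payload.hasNonNull("after") ? payload.get("after") : payload.get("before");

        long changedBy = row.path("changed_by").asLong();     // user_id of the API user
        String dateChanged = row.path("date_changed").asText(); // format depends on connector config

        return String.format("user_id=%d at %s performed op=%s",
                changedBy, dateChanged, payload.path("op").asText());
    }
}
```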