Migration: Accounts with the same email address

@plypy and I are working on the upcoming migration, where we copy user data from OpenLDAP to MongoDB in Dashboard 2.0.

One thing we need to consider is keeping email addresses unique. Primary email addresses are already “supposed” to be unique, but we have many accounts with the same primary email address. Presumably these accounts were created before we migrated to the ID Dashboard in 2011, and therefore could not have had their primary email address checked. Most appear to be duplicate accounts created by the same person.

I wrote a script to analyze duplicate email addresses in the userbase. These were the results I found:

6030 email addresses
130 email addresses duplicated 142 times.
128 of these are primary-only duplicates.
0 of these are secondary-only duplicates.

This means there are 128 cases where more than one account uses the same primary email address. Due to the way MongoDB indexes data, we cannot keep these duplicate addresses around.
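For anyone who wants to reproduce this kind of check, here is a minimal sketch of the idea. It is not the script I actually ran. The record layout (`username`, `primary_email`) and the sample accounts are hypothetical, and lowercasing addresses before comparing them is an assumption about how we want to treat case.

```python
from collections import defaultdict


def find_duplicate_primaries(accounts):
    """Map each primary address shared by 2+ accounts to the usernames using it."""
    by_email = defaultdict(list)
    for account in accounts:
        # Assumption: compare addresses case-insensitively and ignore stray spaces.
        email = account["primary_email"].strip().lower()
        by_email[email].append(account["username"])
    return {email: users for email, users in by_email.items() if len(users) > 1}


# Hypothetical sample data, not real accounts.
accounts = [
    {"username": "alice", "primary_email": "alice@example.org"},
    {"username": "alice2", "primary_email": "Alice@example.org"},
    {"username": "bob", "primary_email": "bob@example.org"},
]

duplicates = find_duplicate_primaries(accounts)
for email, users in sorted(duplicates.items()):
    print(f"{email}: {', '.join(users)}")
```

Grouping by normalized address keeps the check to a single pass over the accounts, and the resulting mapping doubles as the review list of which usernames collide.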

I see a couple of potential options:

  • Change the email addresses on old, duplicate accounts to something like username@users.noreply.openmrs.org. Affected users, if they were actually using that account, would then be able to go in and update to a new email address.
  • Don’t import these users automatically; only import them when their account first authenticates to the Dashboard. Prevent subsequent accounts from importing to the Dashboard without a Helpdesk ticket to resolve the email address conflict.
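To make the first option concrete, here is a rough sketch of how the rewrite could work. It assumes each account record carries a `last_login` value as an ISO date string and that the most recently used account should keep the real address. Both of those, along with the field names and the sample data, are assumptions for illustration rather than anything decided.

```python
NOREPLY_DOMAIN = "users.noreply.openmrs.org"  # domain from the first option above


def resolve_duplicates(accounts):
    """Return copies of the accounts where only one keeps each primary address."""
    # Assumption: ISO date strings sort chronologically, so reverse order
    # puts the most recently used account first.
    ordered = sorted(accounts, key=lambda a: a["last_login"], reverse=True)
    seen = set()
    resolved = []
    for account in ordered:
        email = account["primary_email"].strip().lower()
        updated = dict(account)  # copy, so the input records are left untouched
        if email in seen:
            # An account used more recently already holds this address,
            # so give this one a placeholder built from its username.
            updated["primary_email"] = f"{account['username']}@{NOREPLY_DOMAIN}"
        else:
            seen.add(email)
        resolved.append(updated)
    return resolved


# Hypothetical sample data, not real accounts.
accounts = [
    {"username": "alice_old", "primary_email": "alice@example.org", "last_login": "2009-06-12"},
    {"username": "alice", "primary_email": "alice@example.org", "last_login": "2015-03-01"},
    {"username": "bob", "primary_email": "bob@example.org", "last_login": "2014-11-20"},
]

resolved = {a["username"]: a["primary_email"] for a in resolve_duplicates(accounts)}
```

Because usernames are unique, the placeholder addresses should not collide with one another, though it would still be worth checking that nobody already has a real address on that domain.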

Since I believe the vast majority of these clashes are with old accounts that have been forgotten about, I don’t think that either option will cause much of a problem.

Thoughts?

I’d like to review the 128 accounts (somewhere non-public to prevent unintentional disclosure of e-mail addresses) to verify.

We need to see what those duplicates have created in apps like JIRA or Confluence, specifically. Generally, I’m inclined to support your approach #2 (don’t import automatically) because I don’t really want lots of deactivated accounts around. But removing accounts from the directory would cause problems for JIRA and Confluence, I think. (We need to verify this in a non-production environment.)

Also, if we had to have a few duplicate accounts, would email address tags work?

Maybe. But since email address tags don’t work with Google Groups, those addresses would be somewhat-invalid. And not all addresses we’re looking at are from providers who support them.