@plypy and I are working on the upcoming migration, where we copy user data from OpenLDAP to MongoDB in Dashboard 2.0.
One thing we need to consider is keeping email addresses unique. Primary email addresses are already “supposed” to be unique, but we have many accounts with the same primary email address. Presumably these accounts were created before we migrated to the ID Dashboard in 2011, and therefore could not have had their primary email address checked. Most appear to be duplicate accounts created by the same person.
I wrote a script to analyze duplicate email addresses in the userbase. These were the results I found:
6030 email addresses
130 email addresses duplicated 142 times.
128 of these are primary-only duplicates.
0 of these are secondary-only duplicates.
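For anyone who wants to reproduce numbers like these, here is a minimal sketch of how such an analysis could be done. It is not the actual script. It assumes accounts have already been exported from OpenLDAP into plain dicts with hypothetical `username`, `primary`, and `secondary` keys, that addresses are compared case-insensitively, and that "duplicated N times" means extra occurrences beyond the first. "Primary-only" is taken to mean an address that never appears as a secondary address on any account.

```python
from collections import Counter


def normalize(addr):
    # Assumption: addresses that differ only in case or padding are the same.
    return addr.strip().lower()


def analyze_duplicates(accounts):
    """Summarise duplicate email addresses across accounts.

    `accounts` is an iterable of dicts with a 'primary' address and an
    optional 'secondary' list (a hypothetical export format).
    """
    primary = Counter()
    secondary = Counter()
    for acct in accounts:
        primary[normalize(acct["primary"])] += 1
        for addr in acct.get("secondary", []):
            secondary[normalize(addr)] += 1

    total = primary + secondary
    duplicated = [addr for addr, n in total.items() if n > 1]
    return {
        "addresses": len(total),
        "duplicated": len(duplicated),
        # Occurrences beyond the first for each duplicated address.
        "extra_occurrences": sum(total[a] - 1 for a in duplicated),
        # Duplicated addresses that only ever appear as a primary address.
        "primary_only": sum(1 for a in duplicated if secondary[a] == 0),
        # Duplicated addresses that only ever appear as a secondary address.
        "secondary_only": sum(1 for a in duplicated if primary[a] == 0),
    }
```

Calling `analyze_duplicates(accounts)` returns a dict of counts that can be printed in the same shape as the results above.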
This means there are 128 cases where more than one account is using the same primary email address. Due to the way MongoDB indexes data, we cannot keep these duplicate addresses around.
I see a couple of potential options:
- Change the email addresses on old, duplicate accounts to something like username@users.noreply.openmrs.org. Affected users, if they were actually using that account, would then be able to go in and update it to a new email address.
- Don’t import these users automatically; import them only when their account first authenticates to the Dashboard. Prevent subsequent accounts with the same address from importing to the Dashboard without a Helpdesk ticket to resolve the email address conflict.
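To make the first option concrete, here is a rough sketch of the rewrite step, not a proposal for the actual migration code. It assumes the same hypothetical dict format (`username`, `primary`), that usernames are unique, and that the input is already ordered so the account that should keep the real address comes first. How to pick that account is a policy decision this sketch does not answer.

```python
NOREPLY_DOMAIN = "users.noreply.openmrs.org"


def dedupe_primary_emails(accounts):
    """Rewrite clashing primary addresses to a per-user noreply address.

    The first account seen with a given address keeps it (assumption:
    the caller has ordered `accounts` accordingly). Accounts are
    modified in place; the usernames that were changed are returned.
    """
    seen = set()
    changed = []
    for acct in accounts:
        key = acct["primary"].strip().lower()
        if key in seen:
            # A later account with the same address gets a placeholder.
            acct["primary"] = f'{acct["username"]}@{NOREPLY_DOMAIN}'
            changed.append(acct["username"])
        else:
            seen.add(key)
    return changed
```

The returned list of usernames could be used to notify the affected users, or to keep a record for the Helpdesk in case someone asks why their address changed.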
Since I believe the vast majority of these clashes are with old accounts that have been forgotten about, I don’t think that either option will cause much of a problem.
Thoughts?