Out-of-the-box, what kind of names to support?

darius · April 20, 2015, 7:59pm

Hi All,

We are just now fixing a bug for the OpenMRS 2.2 release, where accented characters were not being allowed in patient names. E.g. “Martínez” was not allowed.

Under the hood, there is a global property setting for this, where the admin can specify a regular expression that limits allowed patient names. The current (overly-restrictive) default is a-z, A-Z, space, or dash.
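To make the problem concrete, here is a minimal Java sketch of what such a restrictive default does. The exact pattern string, the class name `RestrictiveNameCheck`, and the method `isAllowed` are assumptions for illustration; the real global property value may be written differently.

```java
import java.util.regex.Pattern;

final class RestrictiveNameCheck {
    // Assumed form of the default described above: a-z, A-Z, space, or dash.
    static final Pattern ASCII_ONLY = Pattern.compile("^[a-zA-Z \\-]+$");

    static boolean isAllowed(String name) {
        return ASCII_ONLY.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Martinez")); // true
        System.out.println(isAllowed("Martínez")); // false: the accented i is outside a-z
    }
}
```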

My question:

Should the default out-of-the-box behavior be to limit names to characters commonly used in Latin names? (E.g. in addition to the above, we’d allow accented characters and the dot.) This would mean that someone using OpenMRS with names in other character sets (Chinese, Arabic, etc.) would need to change the global property to make this work.

Or should the out-of-the-box behavior be to do no validation, leaving this as an advanced feature that an administrator can enable by setting the global property?

(My initial weak preference is to go with the first option, i.e. limit to Latin out of the box.)

burke · April 20, 2015, 9:13pm

This may help. The post is old, but it looks like you could add two files (xregexp.js and xregexp-unicode-base.js) and use the XRegExp "^\\p{L}[\\p{L} ']*$" to support UTF-8 names… or you could create one of the longest and ugliest global property values imaginable.

darius · April 20, 2015, 9:34pm

As a technical point, this validation happens on the Java side, so I presume that \p{L} is handled without need for a plugin. But I’m more interested in what people think the standard out-of-the-box behavior should be.
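A quick sketch to check that presumption: `java.util.regex` does understand `\p{L}` (the Unicode "Letter" category) without any add-on. The pattern is the one quoted earlier in the thread; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class UnicodeLetterNameCheck {
    // Pattern from the earlier post: one letter, then letters, spaces, or straight apostrophes.
    static final Pattern LETTERS = Pattern.compile("^\\p{L}[\\p{L} ']*$");

    static boolean isAllowed(String name) {
        return LETTERS.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Prints each sample name with the result of the check.
        for (String name : new String[] {"Martínez", "李小龍", "محمد", "Al-Kurd", "3rd"}) {
            System.out.println(name + " -> " + isAllowed(name));
        }
    }
}
```

Two caveats with this exact pattern: it rejects dashes and digits, and it rejects names typed with separate combining accents (category `\p{M}`), since those are not letters.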

Burke, from the fact that you sent that, I take it you’re suggesting the default should be to allow any Unicode letter, which sounds like a good approach.

As an aside, I think we need to support at least the following non-letters:

apostrophe, e.g. O’Brien
dash, e.g. Al-Kurd
spaces, e.g. Maria Theresa
periods, e.g. Jr.
numbers, e.g. 3rd
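One way the list above could be folded into a single pattern, as a rough sketch only. The pattern, the class name `PermissiveNameCheck`, and the choice to also accept combining marks and the typographic apostrophe are assumptions, not anything OpenMRS ships.

```java
import java.util.regex.Pattern;

final class PermissiveNameCheck {
    // Hypothetical pattern: any mix of Unicode letters, combining marks, digits,
    // space, period, straight apostrophe, typographic apostrophe (U+2019), and dash.
    static final Pattern NAME = Pattern.compile("^[\\p{L}\\p{M}\\p{N} .'\u2019-]+$");

    static boolean isAllowed(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Every example from the list above is accepted by this pattern.
        for (String name : new String[] {"O'Brien", "Al-Kurd", "Maria Theresa", "Jr.", "3rd"}) {
            System.out.println(name + " -> " + isAllowed(name));
        }
    }
}
```

Note that this sketch still accepts a name made only of spaces or punctuation, so it is not a complete validator.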

michael · April 21, 2015, 12:10pm

What is the rationale for not allowing names with “special” and/or non-Latin characters? Is there any good reason to restrict them, when people generally won’t be able to enter them anyway unless they have an input device set up to create them? (And if they do, they’ll likely need those characters!)

soddoadam · April 21, 2015, 2:02pm

This is a great topic. Please forgive me for expanding this question beyond character set issues. We use the Latin alphabet instead of Amharic fidel to enter our patients into MRS. I could see the benefit of supporting fidel (which has Unicode definitions), but at present we spell names phonetically in the English alphabet.

Here in Ethiopia we have a traditional naming convention that differs from the Western one. We need different default fields instead of first, middle, and last, with first and last being mandatory.

Everyone has a given name (e.g. Rahel). Everyone has a second name, which is their father’s given name (e.g. Daniel). Some people have a third name, which is their grandfather’s given name (e.g. Ephraim).

So a patient may be entered as Rahel Daniel Ephraim, or if they don’t use their grandfather’s name she would be entered as Rahel Daniel.

We are using a custom registration app that collects and displays the correct data by default, but it would be nice if we could go into the reference app and change the name behavior to the format we want: First Name and Second Name mandatory, and Third Name optional.
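As a rough illustration of that rule, here is a small Java sketch. `ThreePartName` and its field names are hypothetical; this is not the OpenMRS person name model or the custom registration app, just the mandatory/optional rule written out.

```java
final class ThreePartName {
    final String givenName;        // mandatory
    final String fatherName;       // mandatory: the father's given name
    final String grandfatherName;  // optional: null when not used

    ThreePartName(String givenName, String fatherName, String grandfatherName) {
        if (isBlank(givenName) || isBlank(fatherName)) {
            throw new IllegalArgumentException("Given name and father's name are required");
        }
        this.givenName = givenName.trim();
        this.fatherName = fatherName.trim();
        this.grandfatherName = isBlank(grandfatherName) ? null : grandfatherName.trim();
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Joins the parts in order, leaving out the third name when it is absent.
    String display() {
        String base = givenName + " " + fatherName;
        return grandfatherName == null ? base : base + " " + grandfatherName;
    }

    public static void main(String[] args) {
        System.out.println(new ThreePartName("Rahel", "Daniel", "Ephraim").display()); // Rahel Daniel Ephraim
        System.out.println(new ThreePartName("Rahel", "Daniel", null).display());      // Rahel Daniel
    }
}
```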

Thanks for all the hard work everyone!

mseaton · April 21, 2015, 2:50pm

I agree with Downey 100%. I don’t understand this feature. If we want to allow restricting names with regex I guess that’s ok, but I would default to not doing it at all. Certainly any default shipped with core should leave this global property blank and not try to do any kind of validation whatsoever. Our implementations have been burned by well-intentioned validation rules introduced in new OpenMRS versions so many times, with many, many lost days of productivity, that we need to be very careful about this kind of thing in the future.

Mike

darius · April 21, 2015, 2:59pm

Historically, it looks like the validation was added to the 1.x UI in OpenMRS 1.9, in TRUNK-338. It also started hitting the 2.x UI because of OpenMRS Platform 1.11 (TRUNK-2616).

There’s no real rationale or argument given for this in the ticket, just:

Valid name could be like the one which doesn’t contain (numbers, / , " , ~ etc)

Thinking more, I guess we should not validate by default. At most, we might consider having the default validation disallow < and >, or else disallow “< script >”.

mogoodrich · April 21, 2015, 3:18pm

+100

Say a small subset of your production patients have special characters in their names, and you don’t test against that subset. Then you update your production system, and start getting emails/calls about errors when trying to update patient records. It takes some time talking with the users and testing to determine that it is only patients with special characters in their names, so you then start looking for a bug that may have been introduced in handling special characters. Finally you discover that it was actually a validation that was added, and you get annoyed… We’ve had things like this happen before.

Take care, Mark

darius · April 21, 2015, 4:52pm

So, at this point I’m convinced that we want zero or absolutely minimal validation out of the box, and the ability to set this is an advanced setting.

Does anybody disagree?

dkayiwa · April 21, 2015, 7:47pm

What does minimal validation mean?

lluismf · April 21, 2015, 10:46pm

The only reason I can imagine to disallow accented characters is to avoid failed searches (if the name is stored as Martínez, searching for Martinez won’t work) or sorting problems (in Java or the DB). But in both cases there are solutions. A common approach is to store the name “normalized”, without accents and in uppercase.
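A minimal sketch of that normalization in Java, using the standard `java.text.Normalizer`. The class name `NameSearchKey` and the method `searchKey` are made up for illustration, and this is one possible approach rather than what OpenMRS does.

```java
import java.text.Normalizer;
import java.util.Locale;

final class NameSearchKey {
    static String searchKey(String name) {
        // NFD splits an accented letter into its base letter plus a combining accent.
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // Drop the combining marks, then uppercase with a fixed locale.
        return decomposed.replaceAll("\\p{M}+", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(searchKey("Martínez")); // MARTINEZ
        System.out.println(searchKey("Martinez")); // MARTINEZ
    }
}
```

Searching then compares keys instead of raw names. Letters that have no decomposition (for example ø) pass through this step unchanged apart from the uppercasing, so it is not a full transliteration.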

darius · April 21, 2015, 10:55pm

I meant that we might disallow “<” and “>” or else disallow the string “< script >”.

This isn’t quite the correct way to avoid XSS. I mean: we should work on handling XSS in the general case, and not with a particular regex that is only applied to names.

So, forget I said that. You might imagine some truly minimal validation that ensures names aren’t pure whitespace or something, but beyond that, by default we should allow whatever they type.
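Something like the following would be about as minimal as it gets. This is only a sketch; `MinimalNameCheck` and `isAcceptable` are hypothetical names, and treating the non-breaking space as blank is an assumption.

```java
final class MinimalNameCheck {
    static boolean isAcceptable(String name) {
        if (name == null) {
            return false;
        }
        // Accept the name if at least one character is not whitespace.
        // isSpaceChar also covers the non-breaking space, which isWhitespace excludes.
        return name.codePoints()
                .anyMatch(cp -> !Character.isWhitespace(cp) && !Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("   "));      // false
        System.out.println(isAcceptable("Martínez")); // true
    }
}
```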

dkayiwa · April 22, 2015, 7:11am

Makes sense!

For avoiding XSS attacks, should we leave that to a global property which the end user can change?

burke · April 22, 2015, 4:38pm

For now. Eventually, we should have conventions for avoiding XSS attacks. I imagine we’d want utility function(s) that escape content of all user-derived data unless explicitly disabled for specific fields – e.g., you wouldn’t want HTML escaped in the HTML Form Entry form definition field.
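A sketch of what such a utility function might look like, escaping at output time instead of restricting input. `HtmlEscaper` and `escapeHtml` are hypothetical names; a real project would more likely rely on a well-tested encoder library, and this version only covers HTML text and quoted attribute values.

```java
final class HtmlEscaper {
    // Replaces the five characters that are significant in HTML text and quoted attributes.
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<script>")); // &lt;script&gt;
        System.out.println(escapeHtml("O'Brien"));  // O&#39;Brien
    }
}
```

With escaping applied wherever user-derived data is rendered, a name field can accept any characters and a stored “< script >” is displayed as plain text.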