Concept units with superscripts don't display properly in the UI since 1.11

When a concept's units were set to cell/mm<sup>3</sup>, they were displayed correctly as cell/mm³ in 1.10.x. Since 1.11, however, the <sup> tags are displayed as plain text (cell/mm<sup>3</sup>) in the UI. I have attached images of the displayed units in 1.10.x and 1.11.x.

On looking at the code, we found that the <c:out> JSP tag escapes HTML characters, which is causing this issue. The <c:out> tag was added in 1.11 to avoid exposure to XSS attacks. escapeXml="false" can be used with the <c:out> tag so that HTML characters are not escaped, but that is the same as not having the <c:out> tag at all, since its only purpose here is to escape HTML. Any suggestions on fixing this issue?
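To illustrate the effect, here is a minimal Java sketch that mimics the character escaping <c:out> applies by default. The class and method names are hypothetical and this is not the JSTL source; the five replacements are assumed to match the escapeXml mapping documented for JSTL.

```java
// Hypothetical helper that mimics the default escaping of <c:out>
// (escapeXml="true"); a sketch, not the JSTL source.
final class CoutEscaping {
    private CoutEscaping() {}

    static String escapeXml(String s) {
        // '&' is replaced first so the entities added below are not escaped again.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&#039;")
                .replace("\"", "&#034;");
    }

    public static void main(String[] args) {
        // The stored markup becomes literal text, so the browser shows
        // "cell/mm<sup>3</sup>" instead of a superscript.
        System.out.println(escapeXml("cell/mm<sup>3</sup>"));
    }
}
```

Running it prints cell/mm&lt;sup&gt;3&lt;/sup&gt;, which the browser renders as the literal text cell/mm<sup>3</sup>, matching what we see in 1.11.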

IMO, storing HTML markup in the DB is not correct, because the data may also be displayed in non-web applications.


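Although it is indeed not ideal to store HTML in the DB, we should make an exception for the sup tag in the units field. Something along these lines could work:

<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%@ taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions" %>
<%-- Escape everything first, then re-enable only <sup> and </sup>. --%>
<c:set var="escapedUnits" value="${fn:escapeXml(command.concept.units)}" />
<c:set var="units" value="${fn:replace(escapedUnits, '&lt;sup&gt;', '<sup>')}" />
${fn:replace(units, '&lt;/sup&gt;', '</sup>')}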

Is there a single Unicode character for superscript 3?

Apologies for not googling before sending my last message…

There is a Unicode character for superscript 3: U+00B3 SUPERSCRIPT THREE (³).

Maybe this is a better approach?

(Alternatively, we could try whitelisting safe tags, though if we do that work I wouldn't want to support only sup.)


Inserting the Unicode glyph for superscript three (³) directly into the unit text could be used as a workaround.

In general, we should be escaping HTML/JavaScript using tools like <c:out></c:out> to prevent XSS attacks in user-entered data; however, I don't think it's necessary for admin-entered metadata. A general-purpose HTML cleaner (something like a hypothetical <c:out safeHTML="true"></c:out>) would be ideal for cases like this, but until we have that, I think it would be reasonable to treat this specific case (concept units) as a regression and remove the <c:out></c:out> when rendering concept units.

Inserting the Unicode character for superscript 3 (&#179) doesn't work either, since <c:out> also escapes the '&' character. As Burke mentioned, I feel that <c:out> is not necessary for admin-entered metadata like concept units. If that is okay, a pull request has already been raised to remove the <c:out> tag. Can it be merged?

&#179 is not the Unicode character for superscript 3; it's the HTML-escaped value.


As Lluis says, I'm not suggesting that you put the HTML entity value in the units field, but rather that you put the actual Unicode character in the database field.
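A small Java sketch of why the glyph works where the entity does not, assuming the same <c:out>-style escaping (all names here are hypothetical). U+00B3 has code point 179, so &#179; is just an HTML reference to that character; escaping turns the stored & into &amp;, and the browser then shows the reference as text. The real character contains nothing to escape and passes through unchanged.

```java
// Hypothetical names; the escaping mimics <c:out> with escapeXml="true".
final class SuperscriptUnits {
    // U+00B3 SUPERSCRIPT THREE stored as a real character.
    static final String UNITS_WITH_GLYPH = "cell/mm\u00B3";
    // The same symbol stored as an HTML numeric character reference.
    static final String UNITS_WITH_ENTITY = "cell/mm&#179;";

    private SuperscriptUnits() {}

    static String escapeXml(String s) {
        // '&' is replaced first so the entities added below are not escaped again.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&#039;")
                .replace("\"", "&#034;");
    }

    public static void main(String[] args) {
        // The glyph's code point is 179, which is what &#179; refers to.
        System.out.println((int) UNITS_WITH_GLYPH.charAt(UNITS_WITH_GLYPH.length() - 1));
        // Nothing to escape: the string is returned unchanged.
        System.out.println(escapeXml(UNITS_WITH_GLYPH));
        // '&' is escaped, so the page shows the literal text cell/mm&#179;
        System.out.println(escapeXml(UNITS_WITH_ENTITY));
    }
}
```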

Generally speaking, we should still escape admin-entered text. For example, you could easily imagine an implementation where some users can manage concepts but aren't supposed to be able to manage users; unescaped concept units would give them an attack surface.

My suggestion is to implement a tag that properly allows whitelisted HTML tags for this.
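A rough Java sketch of what such a tag could do internally, assuming a whitelist of attribute-less tags (sup and sub are illustrative choices; the class and method names are hypothetical, not an existing OpenMRS API): escape everything the way <c:out> does, then restore only the exact <tag> and </tag> forms. Tags carrying attributes, such as an event handler, stay escaped. It does not check that tags are balanced, so a real implementation might instead delegate to an established sanitizer such as jsoup's Safelist or the OWASP Java HTML Sanitizer.

```java
// Hypothetical whitelist-based escaper; a sketch, not an existing OpenMRS API.
final class WhitelistEscaper {
    // Illustrative whitelist: only these tags, without attributes, are re-enabled.
    private static final String[] ALLOWED_TAGS = {"sup", "sub"};

    private WhitelistEscaper() {}

    static String escapeXml(String s) {
        // '&' is replaced first so the entities added below are not escaped again.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&#039;")
                .replace("\"", "&#034;");
    }

    static String escapeAllowingWhitelist(String s) {
        String escaped = escapeXml(s);
        for (String tag : ALLOWED_TAGS) {
            // Only the exact forms <tag> and </tag> are restored; anything with
            // attributes (e.g. onmouseover) does not match and stays escaped.
            escaped = escaped.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                             .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return escaped;
    }

    public static void main(String[] args) {
        // Renders as a superscript.
        System.out.println(escapeAllowingWhitelist("cell/mm<sup>3</sup>"));
        // Script markup is still shown as text.
        System.out.println(escapeAllowingWhitelist("<script>alert(1)</script>"));
    }
}
```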

As Burke says, if we don't have time to do this now, and if the Unicode-superscript-3 trick doesn't work (and because I think the CIEL dictionary uses sup sometimes), I am okay with unprotecting this one field. (But do create a ticket for fixing it in the longer term!)

Also, remember that concept units are displayed in quite a few places in the UI, including in modules.

Copying the actual character (³) into the text field worked fine. For now, we will go with this approach and update our documentation for implementers.
