Rest API for generic tagging

Hello,

One of the GSoC projects this summer is about handling generic tagging in OpenMRS, we want to make sure we know what the rest API for tagging should look like and then design the java API with it in mind. Below is what I have as the initial draft of the rest API;

Set tag(s) for an object which should effectively support adding/removing tags

POST .../ws/rest/v1/tag {"type": "patient", "uuid": "patient-uuid", "tags": ["tag1"]} //Add tag1 to a patient
POST .../ws/rest/v1/tag {"type": "patient", "uuid": "patient-uuid", "tags": ["tag1", "tag2"]}  //Add tag2 to a patient
POST .../ws/rest/v1/tag {"type": "patient", "uuid": "patient-uuid", "tags": ["tag1"]}  //Should effectively remove tag2 from the previous patient

OR

POST .../ws/rest/v1/patient/patient-uuid {...., "tags": ["tag1"]} //Add tag1 to a patient
POST .../ws/rest/v1/patient/patient-uuid {...., "tags": ["tag1", "tag2"]}  //Add tag2 to a patient
POST .../ws/rest/v1/patient/patient-uuid {...., "tags": ["tag1"]}  //Should effectively remove tag2 from a patient

In the first option, we resolve the java type from the resource that matches the concatenation of the version and the specified type field

Get all objects with the specified tag(s) and of the given type(s), tags should be required and types should be optional.

GET .../ws/rest/v1/tag?type=patient,concept&tag=tag1,tag2

Fetching any resource should Include its tags for the full representation

GET .../ws/rest/v1/patient?v=full

Returns:

 {
    uuid: "some-uuid",
    person: {},
    identifiers : [{}],
    tags: ["tag1", "tag2"],
    .... //Other fields
}

Ideas are welcome, thanks!

I have made some changes to the rest end points in the original post, you might want to take another look

I don’t think the URLs ending with “tag” make sense, because tag is a token, not a resource. And what you get back from searching for everything that has given tag(s) would not be a list of tags, but would be a list of resources (like FHIR’s Bundle).

If you’re looking to search for objects by tag it would be best as:

GET .../patient?tag=tag1

Alternately I could see it as

GET .../tagsearch?resource=patient&tag=tag1

and this second should return a list of links.

Basically having a resource named “tag” implies you’re getting the tags themselves, so we should give it a clearer name.

Why not also include tags on the default representation? Performance concerns?

I’m fine with including tags for all representations. From @burke and @darius’ responses, adding/removing tags would be:

POST .../ws/rest/v1/patient/patient-uuid {...., "tags": ["tag1"]} //Add tag1 to a patient
POST .../ws/rest/v1/patient/patient-uuid {...., "tags": ["tag1", "tag2"]}  //Add tag2 to a patient
POST .../ws/rest/v1/patient/patient-uuid {...., "tags": ["tag1"]}  //Should effectively remove tag2 from a patient

And searching for object(s) by tag(s) would be:

GET .../tagsearch?resource=patient,concept&tag=tag1,tag2

The option above would support searching by multiple tags across multiple domain objects instead of just one. I prefer the request param name as type instead of resource since it would be consistent with others areas in the code base that are type sensitive.

I’m assuming you all agree with the revised version i last posted.

The other issues we need to decide on are:

  1. Do we want to implement this in a module or core?

  2. Should we have a tags table and a mapping table between the tags and the tagged objects? Or should we just have a single table that contains or the details about the mappings? With the latter approach, it means the single table will be kind like the concept_name table as opposed to the first where it would be similar to location tags.

This doesn’t seem very RESTful. tagsearch sounds a lot more like a method name than a resource to me. Do you get back a list of TagSearch resources? :wink:

How about something like this instead:

GET ../tag/tag1

Response

{
  "results": [
    {
      "type": "org.openmrs.Patient",
      "value": { /* patient */ }
    },
    ...
  ]
}

Getting objects with either tag1 or tag2 would be done in separate calls:

GET ../tag/tag1?types=patient,concept
GET ../tag/tag2?types=patient,concept

This would leave room for returning additional metadata about a tag (e.g., when it was created, who created it, etc.) if/when we wanted to track those data.

I would favor “core code in a module” approach (i.e., breaking module conventions to put classes into packages as if they were in core, so they can be migrated into core without changing packages). But, I’m not sure a module is even an option. Would you be able to introduce tags as new properties of core objects via a module?

As long as tags are not created independent of being used (unlike location “tags”), it doesn’t matter which approach you use. In other words, a client should be able to a tag a patient as “foo” without having to first create a “foo” tag. Likewise, once the last “foo” tag is removed from an object, then there shouldn’t be a “foo” tag laying around disconnected from anything. If nothing is tagged as “foo”, then there is no “foo” tag. This can be accomplished with either approach, but lends itself to the single table approach. As long as you automatically create & remove tags inside the API, then either internal approach would behave the same for the client.

<aside>Personally, I’d favor renaming LocationTag to LocationType to reduce the confusion and to align how we’ve approached other domain object classifiers (encounter types, patient identifier types, etc.).</aside>

This looks good :smiley:

Yes, we will have to modify existing get Methods for various objects to get tags as a property for these objects, hence core will have to be modified.

for using a seperate module, we can have an independent module which takes care of all general tag related methods involving interaction with the database, but i guess modifying the core will be more appropriate. In this case the REST api will have to be modified accordingly.

If we want to use tags in a String approach then we only need to create a single table, and we don’t need to create “tag” as an object. However if we want to add other properties to tags, or if we want an approach where we create and assign tags to objects rather than directly add a string tag to an object, then creating a separate table’s would be the required approach.

I am not sure if there is a use case, but there might be cases where tags used are important strings and are used pretty often, for grouping data and providing extra info, where concepts and other properties cannot provide info. In such a case, we can store tags in a seperate table. Even in such a case we can have it so that if there is no object that is tagged, then the tag itself will be deleted.

We’re not modifying domain objects in core to have a tag property, my assumption is that tags for a given domain object will be managed via a new tagging service.

I don’t like GET .../tag/tag1, because it implies one is fetching tag resources yet it actually returns a list of other resources, tag needs to be a request parameter given that it’s really a search parameter. Given that this functionality cuts across all domain objects, we need a generic end point like:

GET .../resource?tag=tag1,tag2&type=patient,concept

OR

GET .../openmrsobject?tag=tag1,tag2&type=patient,concept

We also don’t need to include type in the returned matches.

@burke for clarity, we need the rest API to allow one to fetch domain objects with the given tag and not tag resources, if a user wants to view tags along with extra metadata like who created it and when it was created, that’s something else and an end point like ../tag/tag1 would make sense but that’s not what we’re trying to address here.

Requesting a tag resource would return a tag resource as expected, it just would include a list of all resources tagged with that tag as a property.

I’d rather say GET .../taggedobject if we’re trying to noun-ify the search.

But how about this modification of Burke’s proposal. I think this addresses Wyclif’s concern that getting a tag should get you a tag, but it’s also the way that you search.

GET .../tag/my-favorites?v=default

{
    "tag": "my-favorites",
    "usageCount": 5
}

GET .../tag/my-favorites?v=full

{
    "tag": "my-favorites",
    "usageCount": 5,
    "links": [
        {"rel": "patient", "uri": ".../patient/uuid-of-tagged-patient"},
        {"rel": "concept", "uri": ".../patient/uuid-of-tagged-concept"}
        // and the other three
    ]
}

I would strongly vote for doing this in a module. I don’t believe we’ve yet proven that this s actually a valuable feature for real implementations, and I want to avoid changing all of our core XyzService.getXyz(…) to support this until we’ve done so. Further, the hibernate query required to e.g. search for a patient by name, and also on a tag would be very complex and probably have performance implications because we wouldn’t be joining on an indexed field.

So I suggest:

  • do it in a module
  • initially we only support searching by tag +/- type, e.g. using TagService.getTaggedObjects(String tagName) and the REST API above
  • See whether there’s some tricky thing we can do in the REST module to support GET .../patient?tag=abc without having to actually modify the underlying core code.

We do not want to do this. Tags should be just tags, and not grow into something else fancier.

I think an important bit of UX is that when the user is going to add a tag to something, they get an autosuggest behavior. Similar to what Discourse does when you add a label to a post, like this example where I typed forum:

Can we achieve this (and have it perform well) with just a single table? It could be possible, if we put the right index on the table, but I suggest you research this.

(Basically, my first reaction is that we should just have a single table, because having a tag table just so it can have a single tag varchar seems pointless. But the UX scenario above might push me to two tables.)

I’m inclined towards most of the things Darius said, if we’re to have a single object_tag table and wish to support the auto suggest feature, we would have to search against the tag column in the object_tag table.

Lucene, perhaps?

Can we call just call it tag? Or tag_map? “tag” isn’t a SQL reserved word. :slight_smile:

object_tag seems more correct, kind of like location_tag instead tag_map

All the db tables referring to this feature should start with tag for clarity and good organization. Please let’s not call anything object_tag.

Let me be more clear about what I meant to say here.

Myopically, the way to add tag support throughout our core Java API would mean:

  • for each domain object
  • find the XyzService.getXyzs(…) method that takes the most parameters
  • [1] copy that method into a new one that adds String havingTag as an additional parameter
  • [2] also modify the HibernateXyzDAO.getXyzs(…) method

[2] is the one where I’d be concerned about performance impact (because of joining on a non-indexed uuid column). This is purely a technical problem that can be analyzed and solved.

[1] is actually my real concern. We should not do this as I’ve described it here, because it continues to explode the number of API methods we have, without much proven real-world value. Instead, I think we need to refactor our API so that every domain object has XyzService.getXyzs(XyzSearchCriteria). This was done for Encounters (only) as part of TRUNK-4571 in this commit. Once we’ve done that, it becomes feasible to apply something crosscutting like tags to our whole API in a way that keeps the code clean.

1 Like

I was thinking the tagging service would provide a searching mechanism i.e. fetching patients with a given tag, we could add XyzService.getXyzs(…, String… havingTags) but it’s optional. I think it might be trickier to support searching by tags across multiple domain objects since hibernate doesn’t support it, we would have to bake in some custom code to achieve it.

Yes, implied in my comment, but I didn’t say this explicitly:

We should support TagService.getTaggedObjects(String tag, Class xyz), but we should delay adding this to all the other services until we’ve proven the utility.

@jtatia needs to get started with the coding phase and we need to come up with decisions, I think we should initially implement this in a module with the org.openmrs namespace since we intend to move this into core. This means we don’t have to worry about including tags in the full representation in rest or changing core code for now.

Some questions:

What is a better module Id? tag, tags, tagging And to those working on any GSoC projects on new modules, do you have jira projects for those modules?

My vote: openmrs-module-tag