Metadata Mapping Module development progress

kosmik · October 13, 2015, 7:21pm

Dear team,

I started working on the Metadata Mapping Module a few weeks back. More specifically, I am implementing the new set of features described at Metadata Mapping (Design Page). Under this topic I will share updates on development progress.

Thus far I have been focusing on design and learning module development basics. Progress has been a bit slow as I am able to commit only a modest five hours a week to the project. I have also set up a Metadata Mapping Module wiki page.

Please let me know if you have any questions, requirements or ideas.

Stay tuned!

Mikko

michael · October 15, 2015, 1:31am

I don’t have any requirements or ideas, but I want to give you a VERY BIG THANKS!

This is fantastic to hear and I for one am very grateful!

Please keep everyone posted about your progress and how we can help you along the way … and maybe even find some people to work alongside you on this!

kosmik · October 16, 2015, 5:32pm

@michael, you are very much welcome! And I appreciate your kind words. Though I have to say it has been quite easy for me to take on this project. Overall the OpenMRS docs are quite good, this particular task was quite nicely described and @raff has been very supportive, for instance, by immediately suggesting a telco to get things started. So how about some cake and coffee (“kakkukahvit” in Finnish) for the whole community:

Good work, team!

michael · October 16, 2015, 5:42pm

Sounds good to me!

kosmik · October 28, 2015, 6:22am

A quick update:

We now have a JIRA project: https://issues.openmrs.org/browse/MAP
We haven’t yet integrated anything but development is going smoothly and I plan to have the API implemented by the end of next week. And when I say “implemented”, I do not mean “released”. Finishing up stuff for releasing might take another few weeks. Of course we can release an alpha version earlier, if needed.

mseaton · October 29, 2015, 12:08am

Awesome work - great to see this move forward. Thanks!

kosmik · November 15, 2015, 5:45pm

Ok, my schedule estimate was a bit too optimistic. I have had problems allocating time to this project, and perhaps @raff has too. However, things are still moving forward.

I’m currently implementing the MetadataSet type and related API methods. I made a design decision that differs from the original design specification and was wondering if anyone would like to comment on it.

The original design regarding sets is something like this (in Java-ish pseudocode):

class MetadataTerm {
    MetadataSource source;
    String code;
    // ...
}

class MetadataSet {
    MetadataTerm parent;
    MetadataTerm member;
    double sortWeight;
    // ...
}

Here a set is fetched by using the code of its parent MetadataTerm, and sortWeight is used to “optionally give members of a metadata set a reliable sequence”. I was thinking that the MetadataSet API is a bit awkward like this and that we can let Hibernate take care of maintaining the “reliable sequence”. Thus I came up with the following:

class MetadataTerm {
    MetadataSource source;
    String code;
    // ...
}

class MetadataSetMember {
    MetadataTerm metadataTerm;
    // ...
}

class MetadataSet {
    MetadataSource source;
    String code;
    List<MetadataSetMember> members;
    // ...
}

Here members would be mapped as an ordered list of components (a Hibernate concept) where the sequence index is stored in the MetadataSetMember database table. This means the order of items in the members list is preserved (by Hibernate) between saving and loading a MetadataSet. Also, a MetadataSet is identified by MetadataSource and code.

Any comments on this design?

raff · November 16, 2015, 12:59pm

We must be prepared that some MetadataSets may have hundreds of members and require paging. The initial design lets you properly implement fetching a subset of members and paging in a service method. In your proposal we must rely on Hibernate lazy loading and API users’ careful handling, who should not be calling getMembers().size() for example.

I vote for the initial design.

kosmik · November 16, 2015, 3:31pm

Thanks @raff. If the sets are that big then the performance issue is a valid point. So let’s forget about the ordered list of components mapping.

The other change I proposed earlier is about the MetadataSet “parent” reference to MetadataTerm. Does this design come from an actual functional requirement, or could we just drop that reference in favour of MetadataSet being identified by a source/code pair itself? I’m suggesting this because I think it would focus the meaning of MetadataTerm and result in a cleaner API. So we would have something like:

// types

class MetadataTerm {
    MetadataSource source;
    String code;
    // ...
}

class MetadataSetMember {
    MetadataTerm metadataTerm;
    double sortWeight;
    // ...
}

class MetadataSet {
    MetadataSource source;
    String code;
    // ...
}

// service

<T extends OpenmrsMetadata> List<T> getItems(Class<T> type, String metadataSourceName, String metadataSetCode, int start, int limit);
List<MetadataSetMember> getMetadataSetMembers(String metadataSourceName, String metadataSetCode, int start, int limit);
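
As a rough illustration of the start/limit semantics these signatures imply, here is a minimal in-memory sketch. The class `MetadataSetPagingSketch`, the simplified `Member` type and its `termCode` field are assumptions made for this example only; a real service method would presumably push ordering and paging down to the database query rather than sort a list in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for the proposed service method; not the module's actual code.
public class MetadataSetPagingSketch {

    /** Simplified member: only the fields needed to show ordering and paging. */
    static final class Member {
        final String termCode;   // assumption: stands in for the MetadataTerm reference
        final double sortWeight;

        Member(String termCode, double sortWeight) {
            this.termCode = termCode;
            this.sortWeight = sortWeight;
        }
    }

    /** Returns at most `limit` members, ordered by sortWeight, starting at index `start`. */
    static List<Member> getMetadataSetMembers(List<Member> allMembers, int start, int limit) {
        if (start < 0 || limit < 0) {
            throw new IllegalArgumentException("start and limit must be non-negative");
        }
        List<Member> sorted = new ArrayList<>(allMembers);
        sorted.sort(Comparator.comparingDouble(m -> m.sortWeight));
        if (start >= sorted.size()) {
            return Collections.emptyList();
        }
        // Use long arithmetic so a very large limit cannot overflow.
        int end = (int) Math.min((long) sorted.size(), (long) start + limit);
        return new ArrayList<>(sorted.subList(start, end));
    }

    public static void main(String[] args) {
        List<Member> members = Arrays.asList(
                new Member("c", 3.0), new Member("a", 1.0), new Member("b", 2.0));
        // Second page of a set ordered a, b, c with start = 1 and limit = 2.
        for (Member m : getMetadataSetMembers(members, 1, 2)) {
            System.out.println(m.termCode + " " + m.sortWeight);
        }
    }
}
```

The point of the sketch is only that sortWeight gives a stable order to page over; with it, a page is fully determined by start and limit.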

What do you think?

raff · November 16, 2015, 4:30pm

I like that. MetadataTerm would not serve two purposes… as a term and as a reference for a set, which makes a cleaner API indeed.

mseaton · November 16, 2015, 5:31pm

Looks good to me too. Presumably MetadataSetMember would have a reference to MetadataSet.

kosmik · November 17, 2015, 4:19pm

You are right, I missed that one.

raff · November 23, 2015, 3:49pm

I’d like to throw in a somewhat related question here. Should it be possible to create a MetadataSet containing other MetadataSets? For example, to build a MetadataSet for an implementation containing MetadataSets supporting different forms. It seems that the original design without MetadataSetMember allowed for that. I wonder if it was intentional.

kosmik · November 24, 2015, 7:30am

Well, sets inside sets would definitely be easier to implement with the original design. I can try to think of other alternatives but the original might still be the best solution.

However, I’m also interested in what getMetadataSetItems() would return. All metadata items that are contained in the set, no matter how deep in the set hierarchy they are? (A nightmare to implement efficiently, btw.) Only immediate children in the set? What about paging? Or should we just skip paging and change to a tree-based iteration model?
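
To make the first two options concrete, here is a minimal sketch of both retrieval semantics over nested sets. Everything in it (the class `NestedSetSketch`, the `MetadataSetNode` type, plain strings standing in for metadata items) is a hypothetical model for illustration, not the module's design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of nested sets, for illustration only.
public class NestedSetSketch {

    /** A set whose children are either items (plain strings here) or further nested sets. */
    static final class MetadataSetNode {
        final String code;
        final List<String> items = new ArrayList<>();
        final List<MetadataSetNode> childSets = new ArrayList<>();

        MetadataSetNode(String code, String... items) {
            this.code = code;
            this.items.addAll(Arrays.asList(items));
        }
    }

    /** Option 1: only the items directly contained in the set. */
    static List<String> immediateItems(MetadataSetNode set) {
        return new ArrayList<>(set.items);
    }

    /** Option 2: items at any depth, collected depth-first. */
    static List<String> allItems(MetadataSetNode root) {
        List<String> result = new ArrayList<>();
        collect(root, new HashSet<>(), result);
        return result;
    }

    private static void collect(MetadataSetNode set, Set<MetadataSetNode> visited, List<String> out) {
        if (!visited.add(set)) {
            return; // each set is visited once, so cyclic nesting still terminates
        }
        out.addAll(set.items);
        for (MetadataSetNode child : set.childSets) {
            collect(child, visited, out);
        }
    }

    public static void main(String[] args) {
        MetadataSetNode root = new MetadataSetNode("implementation", "x");
        MetadataSetNode form = new MetadataSetNode("form-a", "y", "z");
        root.childSets.add(form);
        System.out.println(immediateItems(root));
        System.out.println(allItems(root));
    }
}
```

Note that the second option has to walk the whole hierarchy before it knows how many items there are, which is what makes paging over it awkward.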

A quick recap: The current design is that a MetadataSet contains MetadataTermMappings (previously “MetadataTerm”) which in turn refer to OpenmrsMetadata objects. Said objects can be fetched via:

<T extends OpenmrsMetadata> List<T> getMetadataSetItems(Class<T> type, String metadataSourceName, String metadataSetCode, int start, int limit);

mseaton · November 25, 2015, 5:23pm

For reference, I believe this is how things currently work for Concepts (e.g. a concept can itself be a set that contains other concepts as members), and I don’t know if performance has been a major issue for people.

kosmik · November 25, 2015, 6:49pm

Thanks @mseaton, I wasn’t aware of that. We are now talking about the following method, right?

org.openmrs.api.ConceptService#getConceptsByConceptSet(Concept)

The implementation seems to load the items in each child set separately from the database and just append all the results to a list. This is quite inefficient but I’m sure it can be fast enough in practice if there are not too many sets inside the root set. So far I don’t have any hard numbers on how many items a MetadataSet might contain, but @raff commented on my earlier design proposal:

We must be prepared that some MetadataSets may have hundreds of members and require paging. The initial design lets you properly implement fetching a subset of members and paging in a service method. In your proposal we must rely on Hibernate lazy loading and API users’ careful handling, who should not be calling getMembers().size() for example.

getConceptsByConceptSet does not support paging, and using its design to do paging would be basically as inefficient as loading the whole list of descendant items and returning just a subset (one page) of them. Also, getConceptsByConceptSet does not sort the items, but of course it could be modified to do sorting in memory.

If the Concept/ConceptSet approach is deemed acceptable then I can implement that easily. Even the MetadataSet/MetadataSetMember approach I proposed earlier will work if we just allow MetadataSetMember to reference either a MetadataTermMapping or MetadataSet. No problem there.
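
A minimal sketch of what such a member type could look like, assuming a member points at exactly one of the two. The class `SetMemberSketch`, the factory method names and the `owner` field are assumptions made for this example; the stub `MetadataTermMapping` and `MetadataSet` types carry only a code.

```java
import java.util.Objects;

// Hypothetical types for illustration; field and method names are assumptions.
public class SetMemberSketch {

    static final class MetadataTermMapping {
        final String code;
        MetadataTermMapping(String code) { this.code = code; }
    }

    static final class MetadataSet {
        final String code;
        MetadataSet(String code) { this.code = code; }
    }

    /** A member that references exactly one of: a term mapping or a nested set. */
    static final class MetadataSetMember {
        final MetadataSet owner;               // the set this member belongs to
        final MetadataTermMapping termMapping; // null when the member is a nested set
        final MetadataSet nestedSet;           // null when the member is a term mapping
        final double sortWeight;

        private MetadataSetMember(MetadataSet owner, MetadataTermMapping termMapping,
                                  MetadataSet nestedSet, double sortWeight) {
            this.owner = Objects.requireNonNull(owner);
            this.termMapping = termMapping;
            this.nestedSet = nestedSet;
            this.sortWeight = sortWeight;
        }

        static MetadataSetMember ofTermMapping(MetadataSet owner, MetadataTermMapping mapping,
                                               double sortWeight) {
            return new MetadataSetMember(owner, Objects.requireNonNull(mapping), null, sortWeight);
        }

        static MetadataSetMember ofNestedSet(MetadataSet owner, MetadataSet nested,
                                             double sortWeight) {
            Objects.requireNonNull(nested);
            if (nested == owner) {
                throw new IllegalArgumentException("a set cannot directly contain itself");
            }
            return new MetadataSetMember(owner, null, nested, sortWeight);
        }

        boolean isNestedSet() {
            return nestedSet != null;
        }
    }

    public static void main(String[] args) {
        MetadataSet root = new MetadataSet("implementation");
        MetadataSetMember a = MetadataSetMember.ofTermMapping(root, new MetadataTermMapping("weight"), 1.0);
        MetadataSetMember b = MetadataSetMember.ofNestedSet(root, new MetadataSet("form-a"), 2.0);
        System.out.println(a.isNestedSet() + " " + b.isNestedSet());
    }
}
```

The private constructor plus two factory methods is just one way to keep the "either/or" rule in one place; indirect cycles between sets would still need a separate check.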

However, I would be more comfortable if we could take a step back and first think about the requirements of clients. How big are the sets and how many nested sets might they have? How would a client iterate over the hierarchy of sets? What are all the different use cases? Is paging a requirement for the actual utilization of MetadataSets or just for the management of MetadataSets? Does ordering matter and, if it does, in which use cases?

raff · November 30, 2015, 2:07pm

Let’s bring up your questions on a design call. @jthomas, could you please help us find a slot?

jthomas · December 1, 2015, 6:21pm

Hi @raff,

I have had a couple of requests this week for design call time. To help organize and simplify things, please respond to the post I have created - Want some time on an upcoming design call?

Let me know if you need anything else.

kosmik · December 3, 2015, 8:35pm

I won’t be attending the summit but I’m willing to help in other ways, if needed. Even a telco would probably be possible for me. As long as it’s not at 3 am Finnish time.

jthomas · December 18, 2015, 1:36pm

Just a reminder that we have a design forum on metadata mapping on Monday, December 21 at 4-5pm UTC. Design Forum 2015-12-21: Metadata Mapping Module & Platform 2.0 Module Support