Replacing Modulus: Design Thoughts

It’s time to kill off the current modules.openmrs.org, because (a) the codebase is unwieldy and we haven’t been able to support it, and (b) we want to evolve to a more distributed model where not everything has to belong to, and be hosted by, OpenMRS.

I propose the following:

  • We build a very thin application that functions as an index of OpenMRS modules
  • We do not store or host modules ourselves
  • Support indexing modules from a few (pluggable) providers, e.g. mavenrepo.openmrs.org, bintray, and github releases.
  • We should also support OWAs (and storage on npm) and other types of things
  • The functions we need to support are:
    • MVP:
      • as a user I want to browse/search the index
      • as a user I want to download a module version
      • as an author I want to add my module (that I have published via maven, bintray, or github) to the index
      • as an admin I want to approve/remove a module from the index
    • Probably also:
      • support for installing a module from the OpenMRS legacy UI manage modules page
    • Future:
      • tagging modules (e.g. “in Refapp 2.5”) and letting users browse by tag
      • display compatibility of module versions with openmrs-core versions and/or search based on this

I’ve been thinking about the simplest possible way to do this (e.g. “serverless”, so we don’t have to mess with auth schemes or OpenMRS ID integration):

  • the database of modules to index is just a google sheet
  • authors can submit their modules via a google form that adds to that sheet
  • admins approve a module (or mark it as spam) by setting a field in the sheet
  • server component using spring-boot (to leverage the OpenMRS community’s Java/Spring experience); there’s a rough sketch after this list
  • this watches the google sheet, regularly checks for updates to any of the modules listed there, and indexes them
  • “index” can be in-memory, or stored ephemerally with no need to back it up, since it can be regenerated at any point (long-term data is in the google sheet)
  • exposes a simple private REST API to the web client
  • web client component is a simple JS+REST single-page application
  • provides read-only access so there’s no need to implement auth
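To make the server side a little more concrete, here’s roughly the shape I have in mind. This is just a sketch with made-up names (ModuleIndexer, IndexedModule, the /api/modules path); nothing is implemented yet:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical: reads the google sheet, asks the providers for details/versions,
// and returns the full list of modules to expose.
interface ModuleIndexer {
    List<IndexedModule> buildIndex();
}

// Hypothetical shape of an indexed module; the real fields are TBD.
class IndexedModule {
    public String uid;
    public String name;
    public String description;
    public List<String> versions;
}

@Component
class IndexHolder {

    // The whole index lives in memory and is swapped atomically on each rebuild,
    // since it can always be regenerated from the sheet + the providers.
    private final AtomicReference<List<IndexedModule>> index =
            new AtomicReference<>(Collections.<IndexedModule>emptyList());

    private final ModuleIndexer indexer;

    IndexHolder(ModuleIndexer indexer) {
        this.indexer = indexer;
    }

    // Requires @EnableScheduling on the application class.
    @Scheduled(fixedDelay = 15 * 60 * 1000) // e.g. rebuild every 15 minutes
    void rebuild() {
        index.set(indexer.buildIndex());
    }

    List<IndexedModule> current() {
        return index.get();
    }
}

@RestController
class IndexController {

    private final IndexHolder holder;

    IndexController(IndexHolder holder) {
        this.holder = holder;
    }

    // Read-only search endpoint for the web client; nothing to authenticate.
    @GetMapping("/api/modules")
    List<IndexedModule> search(@RequestParam(required = false) String q) {
        return holder.current().stream()
                .filter(m -> q == null || m.name.toLowerCase().contains(q.toLowerCase()))
                .collect(Collectors.toList());
    }
}
```

Holding the index behind a single atomic reference means a rebuild never leaves the web client looking at a half-built index, and losing it on restart doesn’t matter because the next scheduled run regenerates it from the sheet.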

What do people think about this?

PS- It’s also worth looking at the OWA App Store project from GSoC: wiki page, talk post, blog post. That project takes a similar approach, but specifically uses bintray, and assumes everything is stored in one (OpenMRS) bintray account.

Do people think my idea of “distributed publishing” (i.e. modules are primarily published to a host of the author’s choice, using bintray/github/maven’s standard tooling) has value? Or would it make more sense to just make some incremental changes to this GSoC project?

Don’t the providers (mavenrepo.openmrs.org, bintray, and github releases) have some sort of REST API that our thin client could be based on?

I mean that our server component would know how to talk to maven, bintray, and github using their own REST APIs.

And it would expose all modules (regardless of where they’re indexed from) via a simple unified REST API that our web client is based on. So the web client wouldn’t know about maven vs bintray vs github.
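Concretely, I’m imagining a small per-provider abstraction inside the server, something along these lines (a sketch only; all of the names and fields here are placeholders):

```java
import java.util.List;

// Hypothetical abstraction: one implementation per provider (OpenMRS maven repo,
// bintray, github releases), each talking to that provider's own REST API.
interface BackendHandler {

    // Can this handler resolve the given reference (e.g. a maven groupId/artifactId,
    // a bintray package, a github repo)?
    boolean supports(ModuleReference reference);

    // Ask the provider which versions exist, and where each one can be downloaded from.
    List<ModuleVersion> fetchVersions(ModuleReference reference);
}

// Hypothetical value objects; the fields are placeholders.
class ModuleReference {
    public String backend;     // e.g. "openmrs-maven", "bintray", "github"
    public String coordinates; // whatever that backend needs to locate the module
}

class ModuleVersion {
    public String version;
    public String downloadUrl; // we never host the artifact, we only link out to it
}
```

The unified REST API would then just expose whatever these handlers resolved, including the provider-side download URL, so the web client never needs to care which backend a module came from.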

Oh, I see! That would be cool. :slight_smile: Do you mind also explaining why the simple REST API to the web client would have to be private instead of public?

Two things:

  1. Don’t want us to commit to maintaining/documenting/versioning this REST API. Its purpose is to let us write a web client, not to spur external usage
  2. Don’t want to bother having to protect against DDOS attacks

(If later we wanted to invest more in this we could make it public, but I don’t really know why we’d want to.)

Instead of using a google spreadsheet, I’d probably go with a git repository (a yaml file, or a bunch of yaml files, in a public git repository). A git repository makes it very easy to review changes (pull requests) and run commands on changes (a CI job), and it’s very visible who did what and when. Also, git repos are particularly easy to change from CI if necessary.

The decision can be to either auto-discover new releases of a module (with the option to ‘hide’ some undesired versions from the UI) or to use the configuration files to add new versions. I tend to prefer the former (fewer manual interactions) but I can certainly understand if there’s a desire to approve every new version that gets released.

To make the MVP even simpler, I’d just choose one backend to start with. Not sure which one, though.

From the deployment perspective, I’d like it to be a docker container. Unless you want to go full serverless, in which case I’m quite happy deploying a javascript-only application to S3, using dynamoDB or whatever (have I gone too hipster here?). Other bits and pieces are not expensive to use either, if desired (like CDN/cloudfront, or lambda for the module version autodiscovery).

Ooooh, git repo…clever! That would be a bit less user-friendly for the person asking to have their module indexed, but it would be easier to implement.

I agree that once a module is added to the index we should autodiscover new versions.

For the MVP we should start with just the mavenrepo.openmrs.org backend, since all the reference application modules are already there.

I was thinking to deploy it as a docker-compose of (1) the spring-boot server app (which can also serve the static content of the webapp), and (2) elasticsearch/mongodb for the index. (And since the index can always be rebuilt, we don’t need to care about making the db persistent.)

Now that I’ve written this, it occurs to me that we could even just use an in-memory DB, or just a list of objects, in the server app. (500 modules * 10 versions each * 10k per version = 50MB of memory). The downside of this would be that the index is lost every time we deploy newer code, so it’s probably better to use a separate docker container for the DB.

PS- For my own curiosity I would use redis for the index, or do everything with aws lambda, but that’s not justifiable given the point is to quickly build something that is easy for the OpenMRS community to maintain.

Yeah, when I said ‘docker image’ I meant it as a docker-compose constellation :slight_smile: I’m OK with making the index cache a volume on the host; that would be faster for regular deployments on the same host.

PS: Just for the sake of completeness, even if the app were S3 only, we could still use redis or memcache from AWS.

Stepping back, I’d like to hear from people like @burke, @wyclif, @mogoodrich, @mseaton, etc about whether this is worth doing or not.

(I have some time this weekend where I might spike on it.)

@darius, I am very supportive of this idea. We have to replace modulus. I still think we should continue to push to use bintray for our release artifacts, and to get ourselves approved on the oss.jfrog.org Artifactory instance. These shouldn’t be mutually exclusive. I think having our own lightweight app that indexes these will enable us to do some additional useful things that have been long overdue, like making it easier to find modules by function, feature, or author, and enabling things like crowd-sourced certification of version compatibility. And it won’t commit us to a single artifact repository down the road. If nothing else, it will allow us to build something a little more consumer friendly than a developer-focused artifact repository.

We definitely need to learn our lessons from modulus and ensure that whatever we put together, it is something that the community can maintain and contribute to easily. I really like @cintiadr’s idea of using a github repo for all of the back-end data and configuration, rather than a google sheet. That was my only real question mark with your original proposal, and I think a simple github repo containing structured text files will be straightforward and familiar to everyone, and will allow us to maintain the same type of permissions and review processes as we do with the OpenMRS codebase.

Mike

Definitely +1 for this.

I agree with text/yaml files in github instead of a google spreadsheet.

If you do go the Amazon Lambda route, I’ve got some experience with it.

Sounds like a great idea to me, and I’m more inclined towards github than a google spreadsheet.

@raff are you reading this? :smile:

+1 as well to the idea

+1 to Spring Boot, Docker and github for config

I’d stay away from Amazon Lambda and Redis and use a plain old SQL db, e.g. H2, which can work in-memory or be stored on disk and can be bundled in a jar (no requirement other than a JVM). The fewer external dependencies the better for such a small app.

I’m all for replacing Modulus with something people can easily contribute to going forward. Some additional thoughts:

  • We want to continue supporting legacy URLs + responses, so we don’t break every instance of OpenMRS. Ideally, we can follow Atlassian’s example and eventually move module management into a module that itself can be upgraded independent of the core, giving us a path to drop legacy support.
  • I’d think we’d like to try to maintain data beyond just a list of modules – i.e., download counts (which have been, and will continue to be, useful data), and anticipating tags, ratings, and comments in the future.
  • Accountability over authorization for now. While using OpenMRS IDs is nice, controlling updates via GitHub access can work (at least up until we want people to be able to rate or add comments).
  • Keep it simple. For example, a single JSON document for each module containing all metadata. Ratings or comments could be layered on.

For example, if each module is a single file in a folder of a github repo, then new entries and changes can be submitted via pull requests, pushes, or properly authorized scripts. I’d imagine layering of ratings or comments would need a separate datastore local to the app.

Deploying as a docker-compose app is really the only option these days. :slight_smile:

Thanks all for the input and feedback. I began working on this over the past couple of days, and I’ve made a good start.

I decided to call the project “Add-On Index” which sounds a bit awkward, but does make the points that (a) it can cover modules and OWAs, and (b) it’s an index, not a repository.

There is one file (add-ons-to-index.json) that lists all of the add-ons we want to be tracking, with minimal details. Anyone who wants to have their omod/owa indexed would send a PR, adding its details to this file. Thanks @cintiadr for this idea. The application periodically updates this file from github, and then it fetches details/versions of the modules one by one, and indexes them.
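To give a feel for the format, each entry in add-ons-to-index.json just needs enough information for the server to know where to look; on the Java side it maps onto something like this (the field names below are illustrative, not necessarily what’s in the actual file):

```java
import java.util.List;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Illustrative only -- these field names are a guess at the kind of information each
// entry in add-ons-to-index.json needs, not the actual format in the repo.
@JsonIgnoreProperties(ignoreUnknown = true)
class AddOnToIndex {
    public String uid;          // stable identifier for the add-on
    public String name;
    public String description;
    public String type;         // e.g. "OMOD" or "OWA"
    public String backend;      // which handler should index it, e.g. "openmrs-maven"
    public String coordinates;  // whatever that backend needs, e.g. a maven groupId:artifactId
    public List<String> maintainers;
}

// The file as a whole is just a list of these; the application re-fetches it from
// github periodically before asking each backend for the available versions.
@JsonIgnoreProperties(ignoreUnknown = true)
class AddOnsToIndex {
    public List<AddOnToIndex> addOns;
}
```

Since that file is the long-term source of truth, everything else (version lists, download URLs, and so on) gets fetched fresh from the backends on every indexing pass.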

The server application uses Spring Boot. This felt very familiar (because of using Spring in OpenMRS) but the latest Spring Boot just makes everything you want to do really easy. Spring has done a great job with this.

I only implemented the “OpenMRS Maven Repo” handler; we should add handlers for Bintray, etc. And I’m not using any sort of database; the “index” is currently just a List of objects.
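The nice thing about the maven case is that a Maven repository already publishes a maven-metadata.xml per artifact listing its versions, so the handler mostly boils down to fetching and parsing that one file. Very roughly (this shows the idea, not the actual code; the URL handling and names are simplified):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.springframework.web.client.RestTemplate;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Simplified sketch of asking a Maven repository which versions of an artifact exist.
class MavenRepoVersionFetcher {

    private final RestTemplate http = new RestTemplate();

    List<String> fetchVersions(String repoBaseUrl, String groupId, String artifactId) throws Exception {
        // Standard Maven repository layout: groupId dots become slashes, and the
        // version list lives in maven-metadata.xml next to the artifact.
        String url = repoBaseUrl + "/" + groupId.replace('.', '/') + "/" + artifactId
                + "/maven-metadata.xml";
        String xml = http.getForObject(url, String.class);

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Collect the text of every <version> element under <versioning><versions>.
        NodeList nodes = doc.getElementsByTagName("version");
        List<String> versions = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            versions.add(nodes.item(i).getTextContent());
        }
        return versions;
    }
}
```

From there, the download link for a given version is just the conventional artifact path in the same repository, so there’s nothing extra for us to store.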

I created a small React app that lists the add-ons and lets you search, view, and download a version (by linking you through to the place where it’s actually hosted). It needs some CSS love, but I’m not going to get distracted by that for now.

I tried out Snap CI (a ThoughtWorks product) for this project, and honestly their UI for setting up a multi-stage deployment pipeline is amazing. Once I can get some OpenMRS server space, I’ll set it up for CD.

Also, I spent a lot of time on the README. It should be easy for anyone to build and run this, assuming you’ve got Java 8 and Node/NPM installed. (Thanks @mseaton for running through this and finding the error in my instructions.)

Code: https://github.com/djazayeri/openmrs-contrib-addonindex
CI: https://app.snap-ci.com/djazayeri/openmrs-contrib-addonindex

Anyone interested in getting involved?

Some of the next things to do:

  • move the github repo over to OpenMRS
  • docker packaging
  • make it look pretty
  • support for required OpenMRS version, and for required module versions
  • support Bintray and prove it works by indexing https://bintray.com/openmrs/owa/openmrs-owa-conceptdictionary
  • show download stats (from bintray)
  • show ratings (from bintray)
  • use a real database with full-text search capabilities
  • implement the legacy API
  • tags/labels
  • “lists” of add-ons e.g. “list of modules in refapp 2.5”

Continued progress will be noted at Replacing Modulus: Project Status Updates

(I’m also trying out a new approach to status updates for the PM call.)

Burke requested download counts as a key feature, and said that he uses this feature in the existing modulus.

(I disagree that it’s actionable info for OpenMRS, or at least I don’t think we’ve used it in any meaningful way, but I don’t want to argue about that.)

@burke (or others), please define how “Download Counts” should work in a modulus-replacing Add On Index, i.e. what’s the MVP and/or ideal story for this feature.

A few background points:

  • currently modulus collects a count for the number of downloads of each module version, with no additional info
  • fancier services like bintray and sourceforge track individual downloads with timestamps and country downloaded from
  • in this replacement we aren’t hosting the modules, so downloads may happen outside of our webapp.
    • bintray tracks downloads itself, and I believe we can get stats for the last 30 days, but not longer
    • mavenrepo.openmrs.org does not track download counts
    • we could choose based on where the addon is hosted whether to track ourselves, or periodically pull down stats from the host system
    • we would need to decide what granularity of stats we care about presenting in our index (vs linking out to see more detailed stats on the host)

I have definitely always felt that our download counts are misleading, since downloading a module from modules.openmrs.org is only one way of obtaining an omod (and it is the one that many of us use least often, since much of our tooling downloads omods from Nexus, or modules are automatically included in various distributions).

If we are able to aggregate across multiple sources relatively accurately, then I do think this is useful, but misleading or bad data is often worse than no data at all. Do you know if Bintray or Nexus is able to distinguish between “new” requests and “repeated” requests? If the download count on artifacts increase every time a build runs that requires these, then this is obviously going to dramatically skew the results…

Interested in other thoughts, Mike

I have personally been using download counts to get a rough estimate of how many end users (non-nexus users) are willing to try out a module. I do not take the counts to be necessarily accurate, but they give me a sense of direction as to whether I should continue investing time in a module or not. So taking out this functionality because it is not an accurate reflection of the true download count seems like a bad idea.