Architecture Proposal: Module Filter Load Order

ibacher · December 17, 2021, 10:55pm

TLDR; the important part is the recommendations below.

Servlets and Filters

You can skip this if your familiar with the servlet API.

The Java Servlet API provides with two basic classes:

Servlets with process web requests and return responses, similar to how CGI (the web interface, not the movie graphics) and its more modern variants work.
Filters, which provide the equivalent of “middleware” in most web stacks.

Generally, Servlets are used for providing the (equivalent of) dynamic pages while Filters are used for providing cross-cutting concerns. Common use-cases for filters are: authentication, logging, compression, etc.

Servlets and Filters in OpenMRS Modules

In core, we provide the same concepts for modules, using the classes ModuleServlet and ModuleFilters. Essentially, these two classes are mounted in the web context using the normal mechanism (web.xml) and then they separately load the servlets and filters defined in modules to allow them to process requests.

Servlets, in this model, have a named path, meaning that requests for this named location will (eventually) be served by the servlet that owns that path. For example, as an implementation detail, the OpenMRS FHIR server provided by the FHIR2 module is also accessible at /ms/fhir2Servlet (/ms/ is the mount point of the general ModuleServlet and fhir2Servlet is the name of the HAPI FHIR Servlet we use to provide the API).

Filters, on the other hand, are loaded around requests to servlets and executed based on whether they match a URL pattern. For example, a filter with the URL pattern /* is supposed to run on any request to the web server, whereas one with the pattern /ws/fhir2/* (used a few times in the FHIR2 module) is supposed to only run on requests that start with /ws/fhir2/.

Because each request can have many filters (but only one servlet), the order in which the filters are executed becomes important. And here’s where we have a problem: the order in which module filters are loaded is difficult to reason about and basically impossible to control. Concretely, this means that the ways in which filters loaded from different modules interact can be unpredictable in a running environment.

The Issue

This problem was brought to my attention by @ruhanga, who was trying to get the oauth2login module (which uses filters to provide OAuth2 authentication in front of OpenMRS’s usual authentication mechanisms) to work with FHIR2 (which provides a filter to support HTTP Basic authentication and one to forward requests from /ws/fhir2/ to the servlet I mentioned earlier). This later filter, unfortunately, short-circuits the remaining filters and so never provides an opportunity for the OAuth2 filter to provide authentication. There are some ways that we can solve that particular problem, but it seems like it would be easier if we had a way of ensuring that all filters in the OAuth2 module could execute before any filters in the FHIR2 module (and this would also solve the related general problem and honestly allow for cleaner overall solutions to custom authentication—e.g., 2FA as a module—or other uses of filters).

Analysis

This is probably boring. Onward to the recommendation!

This general issue arises because, as I said above, the order that module filters get executed in is hard to reason about and very hard to control. Let’s look into why that is.

When a request comes in and it (eventually) hits the ModuleFilter. ModuleFilter is actually quite simple: it creates a ModuleFilterChain from the list of module filters that match the request URL. (A FilterChain is one of the concepts from the Servlet API… the chain gets passed to each filter which is allowed to either call chain.doFilter(request, response) to continue running further filters or to not call the chain to break the execution flow. The ModuleFilterChain is the same concept except that the chain is made up of module filters followed by any remaining web context filters).

The ModuleFilterChain itself just runs through the filters in iteration order, calling each one in turn. The iteration order of the module filters is determined by the result of the call to WebModuleUtil.getFiltersForRequest(). The getFiltersForRequest() call is a little more complicated, but essentially, it takes the request path and then iterates through each registered module filter. If the module filter was configured with a URL pattern that matches the requested URL, the filter is added to the list of filters returned.

For our purposes, the key question is what order are the filters that the getFiltersForRequest() call iterates over stored in? The order is determined by the order in which filters are saved to this list and filters are added to that list as calls are made to the loadFilters() function for a module. Inside loadFilters() for each module, the filters are loaded in the appropriate order (top → bottom as the <filter-mapping> entries are encountered in config.xml). These loaded filter mappings are then appended to the end of the list of filters.

loadFilters() itself is called in two places, either as each module loads or after the initial Spring refresh is done (this is usually when filters are loaded on startup). The order in which loadFilters() is called for different modules depends on the order they are returned from ModuleFactory.getStartedModules() and this order in turn is determined by the order in which they are returned from a WeakHashMap. A WeakHashMap has essentially the same order guarantees as a HashMap, which is to say consistent (as long as items are not added or deleted between iterations), but very unobvious.

Recommendation

loadFilters() (and loadServlets(), though this is less important) should be called in the order that the modules were loaded. This is to ensure that the order in which filters are executed is consistent with the order in which modules are loaded. This will ensure that filters for modules are loaded in the same order as modules themselves, and thus easier to reason about.
Filters should be loaded by prepending new filters to the list of filters.

2 might take a bit of explanation. The idea here is this: filters can have effects that apply either before a request is handled to the servlet (e.g., authentication), after a request is handled by the servlet (e.g. compressing the result) or do both of these things. Filters that have effects before the request reaches the servlet should also have their effects before the request reaches any filter from a module they depend on. This makes the solution to the problem I outlined above simply a matter of making the OAuth2 module aware_of the FHIR2 module, so that its authentication filter runs before the FHIR2 forwarding filter. Similarly, filters that have their effects after the request reaches the servlet should be able to be applied after all the filters of modules they depend on have run.

Finally, and this is incidental to all of the above, (so is probably a separate platform ticket): we should replace the caches using a WeakHashMap<String, Module> with ones using HashMap<String, WeakReference<Module>> as this expresses what I assume was the original intention. (WeakHashMaps hold WeakReferences to their keys, meaning when their keys are garbage collected, their entries are effectively removed from the map. The idea of a WeakHashMap is that you can store metadata about an object that doesn’t prevent that object from being garbage collected. However a WeakHashMap where the keys are strings is basically a less performant HashMap because Java Strings are stored as constants in the string pool and never removed by the garbage collector; a map where the values are WeakReferences would allow the values—the module objects in this case—to be garbage collected when there are no other references to them. The only issue is that a map with values of weak references needs to periodically poll itself to remove keys that point to values that are now null).

burke · December 18, 2021, 7:00pm

Thanks for the thoughtful proposal and excellent explanation, @ibacher!

I agree with your solution and I’m glad to avoid the anti pattern of assigning relative weight/priority numbers to try to implicitly handle sorting (and inevitably becomes a race to the top).

Could the get/iterator methods be designed to remove/filter nulls just in time (e.g., skip nulls or behave as if a null entry doesn’t exist & trigger cleanup only when needed)?

dkayiwa · December 18, 2021, 9:09pm

I like this proposal.

This makes a lot of sense and i wonder how we missed it!