Hi all! I wanted to start this thread to get some community discussion going on AI-generated code, as we’re seeing more contributions that are clearly made with AI. The intention here is not to ban AI, but to figure out how we as a community can take advantage of the growing capabilities of these agents to the benefit of the projects we’re building.
AI coding agents are very capable of generating code—sometimes even excellent and efficient code—but they also have important caveats and limitations. For example:
- AI agents tend to be “better” when using common languages and frameworks in common ways. Much of OpenMRS’s technical stack is built with common languages and frameworks (TypeScript + React, Spring + Java), but not necessarily used in common ways.
- AI agents are currently quite bad at understanding the context of a modular application developed across numerous repos, the way OpenMRS is. AI agents tend to pick up on resources and patterns that already exist in a given repo, but won’t know how to leverage resources and patterns outside it. For example, an agent will be completely unaware of APIs available in another module, in OpenMRS Core, or even in existing dependencies. That is to say, AI agents often have a worse understanding of the OpenMRS codebase than your average IDE.
In addition to the technical limitations of AI agents (and at least the second issue I outlined is likely to improve in the next 3-4 months), there are non-technical considerations as well.
With AI tools, the role of a developer changes a bit, but I don’t think the fundamental responsibilities we all have as developers change much. As developers, we have a responsibility to create code that solves real problems in a way that can be understood and maintained by others. Part of doing this is ensuring that your code adheres to our standards and conventions (backend, frontend). Beyond that, you are also responsible for reviewing the output of the tools you use to ensure that it conforms to those standards, conventions, and the common patterns we use across the code base.
This brings up the main issue with using AI coding agents to contribute to an open source project—pretty much any open source project: LLMs tend to confidently produce an answer for the instructions you give them, but it may not be the “right” one. This is basically what LLMs are trained to do. They generally won’t stop to ask questions like:
- Is this feature I’m working on well-specified enough to build?
- Is the place I’ve been told to build this code the right place for it?
- How are similar features built in other OpenMRS modules?
In my experience, the best results with AI coding agents come not from simply prompting the agent with the problem or challenge you are trying to solve, but from doing at least some of the intellectual work of software development up front: having a clear understanding of what the problem is, analyzing why and where the issue occurs, coming up with a reproduction of the issue (ideally as some kind of unit test or tests), and looking at similar solutions to the same problem.
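To make the “reproduction as a test” point concrete, here’s a sketch of what that up-front work might look like. Everything here is hypothetical—the helper and the bug are invented for illustration, not real OpenMRS code:

```typescript
// Hypothetical buggy helper: computes age in years, but ignores whether
// the birthday has already occurred in the target year.
function calculateAge(birthDate: Date, onDate: Date): number {
  return onDate.getFullYear() - birthDate.getFullYear();
}

// A minimal, executable repro pins the bug down before any prompting
// happens. On 2024-03-01, someone born 1990-06-15 is still 33, but the
// buggy helper reports 34. In a real project, this would live in a
// vitest test file asserting the correct value.
const reported = calculateAge(new Date('1990-06-15'), new Date('2024-03-01'));
console.log(reported); // → 34, though the correct answer is 33
```

Handing the agent a failing test like this, rather than a prose description of the symptom, gives both the problem and the fix a verifiable definition of “done”.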
Working with AI agents can, in some ways, feel like a higher-level role: your main responsibility may no longer be writing the code but reviewing it, ensuring that it’s not just a solution but a good one—understandable, maintainable, and not likely to easily regress.
Concretely, we’ve been getting a lot of submissions that are entirely or largely LLM-written, and this has been raising a few issues. Primarily, we rely on community members to review PRs as well as write them. Code reviews can be a time-consuming process, which is why we have a checklist of minimal things you can do to make your code easier to review. That checklist is just a bare minimum; you should be doing more to ensure your code meets quality standards. Making your PRs as clear, concise, and readable as possible is the best strategy to ensure your PR is reviewed and merged—regardless of whether you use an AI coding agent or not.
Long PRs and PRs full of changes unrelated to their stated purpose are less likely to be reviewed or accepted. When you open a PR, you want it to look like something that could be merged, and the best way to do that is to familiarize yourself with the PRs we already merge and make yours look as much like them as possible.
As a last piece of advice: you should almost certainly be hand-editing the code even if you’re using a coding agent to generate most of it. Agents have a tendency to write code that is plausible, but not exactly right. There may be a better way to express things, or things just need to be slightly tweaked, etc.
I don’t want to leave this as just a scolding post, so I’ll point to how I think this can be done well. For example, here is a PR I wrote that was primarily coded by Claude Code. Creating that PR took a series of conversations; I wrote something like 70 individual messages: an initial prompt, course corrections on various things the agent was doing, checks that we were working through the plan (outlined separately) in a reasonable order, and detailed instructions on a few really minor technical points (e.g., how to correctly add a new dependency and how to update dependencies; Claude consistently forgot that after adding a new dependency, it needed to re-run `yarn install` for the dependency to actually be installed).
FWIW, here was my initial prompt (yes, ideally some of this would be in a CLAUDE.md or AGENT.md file):
> Inside the notes folder, there’s a document called “UNIT_TESTING_PLAN.md”. We’re going to start working on that. Since the state-management component is kind of core to a lot of the other key functionality (e.g., auth data and extensions are stored in state-management stores), that seems like the logical place to start.
>
> For testing, we’re using vitest with appropriate mocks. Since there are no tests in esm-state currently, we’ll need to start by adding the `vitest` library. We’re currently using `^3.1.4` and for compatibility with the rest of the packages in the monorepo, we’ll want to use exactly that string. We’re using `yarn` for package management and `yarn workspaces` for monorepo organization. Monorepo-wide tasks tend to be run with `turbo`, but each package’s tests should be executable using, e.g., `yarn workspace @openmrs/esm-state test`. We’ll also need to add at least a `test` script to the packages without tests and a `vitest.config.ts` file, which should be as minimal as possible. While some use of a setupFile is fine (which should be named `setup-tests.ts` and live at the same level as package.json), I prefer that we import test code explicitly in the tests themselves.
>
> Please read through the document and these notes and then start setting up the project. Ideally at the end of the setup phase we’ll have a single minimal test to prove the configuration works. Use the esm-expression-evaluator package as a model and feel free to ask questions for any points that need to be clarified. I prefer we clear things up before writing code, even though code can be rewritten.
You might notice:
- I had a Markdown file already describing a plan for how to proceed
- While the final PR ended up on Vitest 4, we were initially working with Vitest 3 (I ended up adding code coverage support, and it was simpler to do that as part of migrating to Vitest 4)
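As a rough sketch of the kind of minimal setup the prompt asks for (illustrative only—the actual files the agent produced may differ), the `vitest.config.ts` can stay close to empty, with the package’s `test` script simply running `vitest run`:

```typescript
// vitest.config.ts -- an illustrative sketch, not necessarily the file
// that ended up in the PR.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // No setupFiles entry: the prompt prefers that test code be imported
    // explicitly in the tests themselves, so the config stays minimal.
  },
});
```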
One of the things this PR added tests for was the shallowEqual() function from @openmrs/esm-utils, which is meant to mimic React’s shallow-equals comparison. This actually caused the agent a lot of difficulty, because it assumed that the implementation was something like:
```typescript
export function shallowEqual(a: any, b: any) {
  return a === b;
}
```
Rather than the actual implementation, which is a bit more complex.
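For context, a typical shallow-equality check compares one level of own properties with reference equality, rather than comparing the two references directly. Something along these lines (an illustrative sketch, not the actual @openmrs/esm-utils implementation):

```typescript
// Illustrative shallow-equal sketch, not the real @openmrs/esm-utils code.
function shallowEqual(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) {
    return true;
  }
  // Anything that isn't a non-null object can only be equal by identity,
  // which was already handled above.
  if (typeof a !== 'object' || a === null || typeof b !== 'object' || b === null) {
    return false;
  }
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) {
    return false;
  }
  // Every own key of a must exist on b with an identical value; nested
  // objects are compared by reference, hence "shallow".
  return keysA.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(b, key) &&
      Object.is((a as Record<string, unknown>)[key], (b as Record<string, unknown>)[key]),
  );
}
```

Note that the one-liner the agent assumed would report two distinct objects with identical contents as unequal—exactly the kind of gap that made its generated tests go wrong.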
Now for the more interactive part of this:
- What standards should we, as a community, have for LLM-written code?
- How can we best ensure that LLM-written code doesn’t become a pain to review and so is actually meaningfully contributing towards the goals of OpenMRS?
- Are there considerations I’ve missed? Things I’ve gotten wrong here?