[Discussion] AI-Assisted PR Reviews in OpenMRS

Hey everyone, @jayasanka brought this up on a recent call and @dkayiwa raised it again today - an automated AI pipeline that does a first-pass review before a human maintainer has to touch a PR. Daniel’s ask was simple: bring ideas to Talk. So here’s mine.

I’ve been spending time in the PR queue lately and I have some observations I think are worth putting on the table before the conversation gets too abstract.

The Actual Problem

Maintainer time is finite, and right now a lot of it goes toward PRs that fall down on the same issues week after week. Missing ticket links. Changes that nobody asked for. Minor UX changes that weren’t approved. During GSoC season this gets bad; the volume spikes and the ratio of review time to merged value gets ugly fast.

Before we get to AI: two things we can do right now

While reviewing PRs from newcomers I noticed a recurring pattern. A lot of the PR noise we’re trying to filter out with automation is not a PR problem in the first place but a ticket vacuum problem. When there aren’t enough approved, well-scoped tickets for contributors to pick up, they manufacture their own work: creating issues nobody asked for, or skipping the ticket step entirely and just pushing changes. Something @ibacher also mentioned. You can’t entirely blame them. They want to contribute and the project didn’t give them a queue to work from.

Fix 1: A curated ticket backlog

The most effective thing we could do before writing a single line of automation is have someone from within the project, someone who actually understands the roadmap, consistently curate a backlog of real, workable tickets. Not issues created by newcomers trying to find something to fix, but tickets that reflect actual project needs, scoped well enough that a new contributor can pick one up without needing deep context. That’s how we keep both the contributors’ and the reviewers’ efforts focused on things that actually matter.

Fix 2: A ticket check in CI

The second thing: a GitHub Action that checks for a linked Jira ticket before a PR can be marked ready for review. I think we should have this check already: something that parses the PR body and fails the check if no valid ticket reference is found. The obvious concern is maintainer and automated PRs that legitimately don’t need tickets; that’s handled cleanly either with a bypass label that org members can apply, or by scoping the check to external contributors only. One refinement worth building in from the start: the check should verify the ticket actually exists in Jira, not just that someone typed a plausible-looking ID. Otherwise it’s a gate people learn to spoof in about ten seconds. Please correct me if I’m missing something here.

What can AI actually do?

Fix 3: Scope discipline

Another pattern I keep seeing: PRs that bundle unrelated concerns. TypeScript clean-up alongside a UI change. A bug fix that quietly refactors something adjacent. These are harder to review, harder to revert cleanly, and harder to trace in history when something breaks.

The conventional solution here is a documented scope rule in the contributing guidelines: one logical concern per PR. The AI check layer could then flag when changed files span very different parts of the module tree, or when changes in the same file are disconnected from the PR title/body.
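As a sketch of how crude that structural signal can be and still be useful (the threshold and path conventions here are invented, not project policy):

```python
from pathlib import PurePosixPath

def top_level_areas(changed_files: list[str]) -> set[str]:
    """Map each changed file to its top-level directory, a rough proxy
    for 'part of the module tree'."""
    return {PurePosixPath(f).parts[0] for f in changed_files if f}

def flag_scope_spread(changed_files: list[str], max_areas: int = 2) -> bool:
    """Heuristic: ask for a human look when the diff touches more
    top-level areas than expected. Purely structural, no intent needed."""
    return len(top_level_areas(changed_files)) > max_areas
```

This would be an advisory signal, not a hard gate, since plenty of legitimate PRs (version bumps, renames) span the tree.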

Other Ideas

The AI could look at the PR body and the diff and check for the following:

  • Common Coding Conventions/Security/Breaking Changes
  • Inconsistent Patterns. For example, inline CSS in a codebase that uses SCSS classes, or a change that doesn’t match the existing style of the file.
  • OpenMRS-specific patterns. Easy example, hardcoded strings not wrapped in t()
  • Description Adequacy: for example, for UX work, asking the author to add links to the supporting discussions that motivated those changes.
  • Coherence: does the PR description match what the diff actually does?
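For the t() case specifically, even a naive heuristic illustrates the idea. A real implementation would live in an ESLint i18n rule working on the AST rather than a regex; the pattern below is only a sketch:

```python
import re

# Naive heuristic: visible JSX text that contains letters and is not an
# expression (no braces), i.e. not already routed through t().
JSX_TEXT = re.compile(r">\s*([A-Za-z][^<>{}]*?)\s*<")

def hardcoded_strings(jsx_source: str) -> list[str]:
    """Return visible JSX text nodes that look like untranslated strings."""
    return [m.group(1) for m in JSX_TEXT.finditer(jsx_source)]
```

A check like this is exactly the low-ambiguity kind the pipeline could surface with confidence, since a false positive is cheap for the contributor to dismiss.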

The failure mode I’d worry about is an AI commenting confidently on things it can’t reliably evaluate; that erodes trust in the pipeline fast, and once contributors stop reading bot comments the whole thing is worse than useless. I’d recommend using it as a preliminary filter that helps the contributor spot issues and correct them before a human even goes through it, rather than as an aid to the reviewers themselves.


Those are my observations and ideas. Curious what others think, and whether you’re seeing the same patterns in the queue, or if my sample is skewed.

Let’s discuss this.

Hey, this is a really thoughtful breakdown, especially the point that a lot of “PR noise” is actually a ticket backlog problem, not a PR problem. That distinction is important and often overlooked.

I strongly agree with the idea of a curated, mentor-approved backlog. As a contributor, one of the hardest parts is knowing what is actually needed vs what just looks useful. Without that clarity, people default to creating their own tasks, which leads to the exact issues you described.

The CI ticket check also feels like a high-impact, low-effort win. Verifying that a PR is tied to a real, existing ticket (not just a formatted ID) would immediately filter out a large portion of low-signal contributions without adding friction for serious contributors.

On the AI side, I like the framing of it as a pre-review assistant for contributors rather than a reviewer replacement. Using it to catch things like:

  • missing t() wrappers

  • inconsistent patterns

  • PR scope mismatch

before human review would improve PR quality without overloading maintainers or risking trust in the system.

One small addition: it might also help to introduce a “good first issue” + “GSoC-ready” tagging system with stricter standards, so contributors can clearly distinguish between exploratory tasks and production-impact work. That could complement the curated backlog idea really well.

Overall, I think starting with process fixes (tickets + CI checks) before layering AI on top is the right approach.

Curious to see how others feel about enforcing ticket linkage vs keeping things flexible for small fixes.


This is one of the more grounded proposals I have seen on this topic, and I appreciate that it leads with process fixes before jumping to AI as the solution. That ordering matters.

The ticket vacuum framing resonates with me. We have been treating PR quality as a review problem when a significant part of it is actually an upstream clarity problem. Contributors are not being careless. They are operating in an environment where the path to a meaningful contribution isn’t always obvious, so they pave their own. A well-maintained backlog of roadmap-aligned tickets doesn’t just reduce noise; it actively channels contributor energy toward work the project actually needs.

On the CI ticket check, I would push for this to be implemented sooner rather than later. The suggestion to verify ticket existence in Jira rather than just checking for a formatted ID is exactly right — a gate people can learn to spoof in ten seconds isn’t a gate; it’s theatre. One thing worth thinking through: how do we handle the edge case where a contributor opens a ticket themselves and immediately links it in their PR? Technically it passes the check, but it’s the same underlying problem. The check might need to include some signal about ticket age or approval state to be meaningful.

On scope discipline, this is the issue I find hardest to review consistently. When a PR bundles a legitimate bug fix with a quiet refactor nearby, the refactor often goes unreviewed because attention naturally focuses on the stated change. An AI layer that flags when changed files span unrelated parts of the module tree, or when the diff diverges from the PR description, would genuinely help here. That kind of structural check is something a tool can do reliably without needing to understand intent.

The failure mode you flagged (AI commenting confidently on things it can’t reliably evaluate) is worth taking seriously. I’d suggest being very conservative about what checks get surfaced as blocking vs informational in the early rollout. Start with high-confidence, low-ambiguity checks (missing t() wrappers, hardcoded strings, absent ticket links) and treat everything else as advisory comments the contributor can choose to act on. Let the pipeline build a track record before expanding its scope. If contributors start ignoring bot comments because they’re noisy or wrong too often, that trust is hard to recover.
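That blocking-vs-advisory split could be as simple as a severity map in the pipeline config. The check names below are hypothetical:

```python
from enum import Enum

class Severity(Enum):
    BLOCKING = "blocking"   # fails the CI check run
    ADVISORY = "advisory"   # surfaced as a comment only

# Hypothetical check names; only low-ambiguity checks start out blocking.
CHECK_SEVERITY = {
    "missing-ticket-link": Severity.BLOCKING,
    "hardcoded-string": Severity.BLOCKING,
    "scope-spread": Severity.ADVISORY,
    "description-adequacy": Severity.ADVISORY,
}

def should_fail(findings: list[str]) -> bool:
    """CI fails only when at least one blocking check fired;
    everything else is informational."""
    return any(CHECK_SEVERITY.get(f) is Severity.BLOCKING for f in findings)
```

Promoting a check from advisory to blocking then becomes a one-line, reviewable change once it has earned a track record.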

One thing I would add to the discussion: contributor feedback on the automated checks matters. If someone gets flagged for something and the flag feels wrong or unclear, they should have a lightweight way to surface that. Otherwise, the feedback loop for improving the pipeline depends entirely on maintainers noticing patterns, which is exactly the bandwidth problem we’re trying to solve.

Overall the sequencing here (backlog curation, then CI checks, then AI layer on top) seems right. Each step is independently valuable and doesn’t depend on the next one working perfectly. That’s a more robust rollout path than trying to solve everything at once.


This is a fantastic discussion, and I fully support the “Process first, AI second” approach that @sbuwule and @stillchamp highlighted. Filling the ticket vacuum addresses the root cause, while AI acts as a safety net.

To push the “how” forward based on how other open-source projects are handling this, here are a few technical implementations we could consider:

1. Structuring the CI Ticket Check to prevent spoofing

@sbuwule raised a great point about contributors creating their own tickets just to bypass the CI gate. We can easily prevent this “theatre.” Our GitHub Action shouldn’t just check if the Jira ticket exists; it should query the Jira API to check the ticket’s Status or Labels. If a PR references a ticket that isn’t sitting in “Ready for Work” or doesn’t have an “Approved” label, the CI check should fail with a helpful message directing them to the curated backlog.
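A sketch of that status gate, assuming the standard Jira REST v2 issue endpoint; the status and label names are invented placeholders, not our actual workflow:

```python
import json
import urllib.request

APPROVED_STATUSES = {"Ready for Work"}  # hypothetical workflow status name
APPROVED_LABELS = {"approved"}          # hypothetical label convention

def is_workable(status_name: str, labels: list[str]) -> bool:
    """Decision logic: the ticket must have been vetted, not merely exist."""
    return status_name in APPROVED_STATUSES or bool(set(labels) & APPROVED_LABELS)

def ticket_is_workable(ticket_id: str,
                       jira_base: str = "https://openmrs.atlassian.net") -> bool:
    """Fetch status and labels for the ticket, then apply the gate."""
    url = f"{jira_base}/rest/api/2/issue/{ticket_id}?fields=status,labels"
    try:
        with urllib.request.urlopen(url) as resp:
            fields = json.load(resp)["fields"]
    except Exception:
        return False
    return is_workable(fields["status"]["name"], fields.get("labels", []))
```

Keeping the decision logic separate from the fetch makes the gate trivially unit-testable without hitting Jira.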

2. The Feedback Loop & Mitigating AI Trust Loss

The concern about AI hallucinating and eroding trust is very real. We can mitigate this by implementing a reaction-based feedback loop. If the bot leaves a comment that is wrong, we instruct the contributor to react with a :-1:. We could easily hook that up to log the failure, giving maintainers a high-signal way to tune the AI prompt without having to read every single PR.
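The logging half of that loop is a small script against the GitHub reactions API; repository names and token handling are left abstract here:

```python
import json
import urllib.request

def has_thumbs_down(reactions: list[dict]) -> bool:
    """True if any reaction on a comment is a :-1:."""
    return any(r.get("content") == "-1" for r in reactions)

def fetch_reactions(owner: str, repo: str, comment_id: int,
                    token: str) -> list[dict]:
    """Fetch reactions for one issue comment via the GitHub REST API."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/issues/comments/{comment_id}/reactions")
    req = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A scheduled workflow could sweep the bot’s recent comments with this, and any comment where `has_thumbs_down` is true goes into a review log for prompt tuning.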

3. Strict Rule “Allowlisting” for the AI

Instead of letting the AI loosely review the code (which is when it hallucinates the most), we should use a Strict Allowlisting approach. We can pass a custom_rules.md file to the AI’s context window that strictly says: “ONLY comment if you detect a violation of these 3 rules: 1. t() wrappers, 2. Hardcoded Strings, 3. Scope mismatches. Do not provide general refactoring advice.”
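One possible shape for such a file (contents purely illustrative):

```
# custom_rules.md (illustrative)
ONLY comment if you detect a violation of one of these rules:
1. User-visible strings must be wrapped in t().
2. No hardcoded values that belong in constants or configuration.
3. The diff must match the scope described in the PR title and body.
Do not provide general refactoring advice. If none of the rules are
violated, output nothing.
```

The explicit “output nothing” case matters: without it, these tools tend to manufacture a comment just to have something to say.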

4. Tooling Recommendations

Before we build a custom AI pipeline from scratch, we should look into existing, battle-tested tools. For example:

  • PR-Agent by Qodo: It’s open-source, highly customizable, and allows injecting custom project guidelines natively.

  • CodeRabbit: Often free for OSS projects and is exceptionally good at summarizing diffs and checking against specific .coderabbit.yaml rules.

By using existing tools with strict prompts and a smart Jira CI check, we can probably get an MVP of this running very quickly. Would we be open to trialing it on a single repository first to measure the noise-to-value ratio?
