Hey everyone, @jayasanka brought this up on a recent call and @dkayiwa raised it again today - an automated AI pipeline that does a first-pass review before a human maintainer has to touch a PR. Daniel’s ask was simple: bring ideas to Talk. So here’s mine.
I’ve been spending time in the PR queue lately and I have some observations I think are worth putting on the table before the conversation gets too abstract.
The Actual Problem
Maintainer time is finite, and right now a lot of it goes toward PRs that fall down on the same issues week after week. Missing ticket links. Changes that nobody asked for. Minor UX changes that weren’t approved. During GSoC season this gets bad; the volume spikes and the ratio of review time to merged value gets ugly fast.
Before we get to AI: two things we can do right now
While working through PR reviews from newcomers I noticed one pattern more than any other. A lot of the PR noise we’re trying to filter out with automation is not a PR problem in the first place; it’s a ticket vacuum problem. When there aren’t enough approved, well-scoped tickets for contributors to pick up, they manufacture their own, creating issues nobody asked for, or they skip the ticket step entirely and just push changes. @ibacher has mentioned this as well. You can’t entirely blame them: they want to contribute and the project didn’t give them a queue to work from.
Fix 1: A curated ticket backlog
The most effective thing we could do before writing a single line of automation is have someone from within the project, someone who actually understands the roadmap, consistently curate a backlog of real, workable tickets. Not issues created by newcomers trying to find something to fix, but tickets that reflect actual project needs, scoped well enough that a new contributor can pick one up without needing deep context. That’s how we keep both the contributors’ and the reviewers’ efforts focused on things that actually matter.
Fix 2: A ticket check in CI
The second thing: a GitHub Action that checks for a linked Jira ticket before a PR can be marked ready for review. I think we should have this check already: something that parses the PR body and fails the check if no valid ticket reference is found. The obvious concern is maintainer and automated PRs that legitimately don’t need tickets; that’s handled cleanly either with a bypass label that org members can apply, or by scoping the check to external contributors only. One refinement worth building in from the start: the check should verify the ticket actually exists in Jira, not just that someone typed a plausible-looking ID. Otherwise it’s a gate people learn to spoof in about ten seconds. A rough sketch of what I have in mind is below. Please correct me if I am missing something here.
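To make this concrete, here’s a minimal sketch of the check logic, the kind of thing that could run from a workflow step (e.g. via `actions/github-script` or a small Node script). The ticket pattern, the bypass-label handling, and the assumption that our Jira at issues.openmrs.org exposes the standard REST issue endpoint are my placeholders, not anything decided:

```ts
// Hypothetical sketch of a PR ticket check (Node 18+, so global fetch is available).
// Everything here is an assumption to illustrate the shape of the check.

const JIRA_BASE = "https://issues.openmrs.org"; // assumed public Jira instance
const TICKET_PATTERN = /\b[A-Z][A-Z0-9]+-\d+\b/g; // e.g. O3-1234, TRUNK-5678

async function ticketExists(key: string): Promise<boolean> {
  // Standard Jira REST endpoint; returns 404 when the issue doesn't exist.
  const res = await fetch(`${JIRA_BASE}/rest/api/2/issue/${key}?fields=summary`);
  return res.ok;
}

export async function checkPrBody(prBody: string, hasBypassLabel: boolean): Promise<void> {
  if (hasBypassLabel) return; // maintainer/bot PRs skip the check

  const candidates = prBody.match(TICKET_PATTERN) ?? [];
  if (candidates.length === 0) {
    throw new Error("No Jira ticket reference found in the PR description.");
  }

  // Verify at least one referenced ticket actually exists, so IDs can't be spoofed.
  const results = await Promise.all(candidates.map(ticketExists));
  if (!results.some(Boolean)) {
    throw new Error(`Referenced ticket(s) not found in Jira: ${candidates.join(", ")}`);
  }
}
```

The throw is just a stand-in for failing the status check; the real wiring would live in the workflow.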
What could AI actually help with?
Fix 3: Scope discipline
Another pattern I keep seeing: PRs that bundle unrelated concerns. TypeScript clean-up alongside a UI change. A bug fix that quietly refactors something adjacent. These are harder to review, harder to revert cleanly, and harder to trace in history when something breaks.
The conventional solution here is a documented scope rule in the contributing guidelines: one logical concern per PR. The AI check layer could potentially flag PRs where the changed files span very different parts of the module tree, or where changes within the same file have no connection to the PR title/body or to each other. A rough heuristic for the first case is sketched below.
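Something like this could even run before any AI is involved at all; the grouping depth and the threshold are numbers I made up and would need tuning against real OpenMRS repos:

```ts
// Hypothetical heuristic: group changed files by their top-level area and flag
// PRs that touch several unrelated areas.

function topLevelArea(path: string, depth = 2): string {
  // e.g. "packages/esm-patient-chart-app/src/foo.tsx" -> "packages/esm-patient-chart-app"
  return path.split("/").slice(0, depth).join("/");
}

export function flagScopeCreep(changedFiles: string[], maxAreas = 2): string | null {
  const areas = new Set(changedFiles.map((f) => topLevelArea(f)));
  if (areas.size > maxAreas) {
    return `This PR touches ${areas.size} distinct areas (${[...areas].join(", ")}). ` +
      "Consider splitting it into one PR per logical concern.";
  }
  return null;
}
```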
Other Ideas
The AI could look at the PR body and the diff and check for things like the following:
- Common coding conventions, obvious security issues, and breaking changes.
- Inconsistent patterns. For example, someone using inline CSS in a codebase that uses SCSS classes, or a change that doesn’t match the existing style of the file it touches.
- OpenMRS-specific patterns. Easy example: hardcoded UI strings not wrapped in t() (see the sketch after this list).
- Description adequacy. For example, for UX work, asking the contributor to link the supporting discussion that motivated those changes.
- Coherence: does the PR description match what the diff actually does?
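For the t() case, even a deliberately naive, non-AI pass over the added lines of a diff would catch a lot. This is only a sketch; the regex and function names are placeholders, and it assumes the usual react-i18next-style t() the O3 frontend uses:

```ts
// Hypothetical, deliberately naive check for hardcoded UI strings in added JSX lines.
// It only looks at literal text between JSX tags and would need real-world tuning.

const JSX_TEXT = />\s*([A-Za-z][^<>{}]{2,})\s*</; // literal text between tags

export function findHardcodedStrings(addedLines: string[]): string[] {
  const findings: string[] = [];
  for (const line of addedLines) {
    const match = line.match(JSX_TEXT);
    // Skip lines that already go through t(...) or contain an interpolation.
    if (match && !line.includes("t(") && !line.includes("{")) {
      findings.push(`Possible untranslated string: "${match[1].trim()}"`);
    }
  }
  return findings;
}
```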
The failure mode I’d worry about is an AI commenting confidently on things it can’t reliably evaluate; that erodes trust in the pipeline fast, and once contributors stop reading bot comments the whole thing is worse than useless. I’d recommend using it as a preliminary filter that helps the contributor spot issues and correct them before a human even looks at the PR, rather than as a tool for the reviewers themselves.
These were my observations and ideas. I’m curious what others think, whether you’re seeing the same patterns in the queue, or whether the slice I’ve been looking at is skewed.
Let’s discuss this.