AI Integrations and OpenMRS

Hello, as part of the @EMR4All team, I am excited to share new developments and milestones regarding AI integration with OpenMRS. What started as a simple Python agent has now evolved into a nearly ready-to-use AI application. Here is a walk-through of a few lessons, discoveries, challenges, and solutions so far.

We decided to first solve the challenges we faced with the very first agent (built with Python and its ecosystem of tools/libraries), so that we could then apply the solutions, wherever applicable, to the new Java OpenMRS module we are working on, which will perhaps offer more features than just text-to-SQL (plain-English questions returning responses along with the corresponding SQL).

The biggest hurdle yet has been hallucination; we realized that with repeated testing/validation of the generated SQL, and by being very explicit in our queries/prompts, we could reduce hallucination by a good measure!
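
To illustrate the idea, here is a minimal sketch of such a generate-validate-retry loop; generateSql and validateAgainstSchema are hypothetical stand-ins for our actual functions, and the error-feedback wording is just an example:

	// Minimal sketch of a generate-validate-retry loop (hypothetical helpers).
	String generateValidatedSql(String question, int maxAttempts) {
		String feedback = "";
		for (int attempt = 0; attempt < maxAttempts; attempt++) {
			// generateSql() wraps the LLM call behind our explicit prompt
			String sql = generateSql(question + feedback);
			// every table/column in the SQL is checked against the data model
			java.util.List<String> errors = validateAgainstSchema(sql);
			if (errors.isEmpty()) {
				return sql;
			}
			// feed the concrete validation errors back into the next attempt
			feedback = "\nYour previous attempt had these errors: " + String.join("; ", errors);
		}
		throw new IllegalStateException("No valid SQL after " + maxAttempts + " attempts");
	}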

Also, the more you tell the AI about the subject (i.e. the more context provided), the better the responses: we went from the models using our exact words as table/column names when generating queries to much more sensible queries, and thus better responses.

We also provided the data model to the LLM in YAML format, detailing the tables, columns, and relationships that we require the model to use in the prompt(s). We also made certain that the system automatically standardizes queries. For instance, the person table is used for demographic data and the patient table for clinical data; while it may seem obvious that person_id = patient_id and that these two tables have no direct relationship (foreign keys), to the model it was NOT! So we had to clearly state that, and how to join the two tables when required. We therefore have functions that flag the generated queries as correct or incorrect based on the facts described in the data model, with other functions making minor adjustments for better results.
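
For illustration, a simplified version of the flagging idea could look like the sketch below; the YAML layout (a plain table-to-columns map), the class name QueryFlagger, and the use of SnakeYAML are assumptions for this example, and real queries with table aliases need more resolution work than this regex does:

	import java.io.InputStream;
	import java.util.List;
	import java.util.Map;
	import java.util.regex.Matcher;
	import java.util.regex.Pattern;
	import org.yaml.snakeyaml.Yaml;

	// Sketch: every table.column reference in the generated SQL must exist
	// in the YAML data model, otherwise the query is flagged as incorrect.
	public class QueryFlagger {

		private static final Pattern COLUMN_REF = Pattern.compile("(\\w+)\\.(\\w+)");

		private final Map<String, List<String>> tables; // table name -> column names

		@SuppressWarnings("unchecked")
		public QueryFlagger(InputStream dataModelYml) {
			// assumes a simple layout like:  person: [person_id, gender, birthdate]
			this.tables = (Map<String, List<String>>) new Yaml().load(dataModelYml);
		}

		public boolean isValid(String sql) {
			Matcher m = COLUMN_REF.matcher(sql);
			while (m.find()) {
				List<String> columns = tables.get(m.group(1).toLowerCase());
				// unknown table, or a column that does not exist on it
				// (this is exactly how something like person.patient_id gets caught)
				if (columns == null || !columns.contains(m.group(2).toLowerCase())) {
					return false;
				}
			}
			return true;
		}
	}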

The other major issue was the resources needed to run Ollama, OpenMRS, and the AI application(s)/agents. To solve that, we moved the AI agents/applications out of the Ollama compose file, making each of the stacks independently accessible and off-loadable to different servers. We also provide a REST API that other applications, such as the ESMs, can query instead of connecting to Ollama directly.
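
As a rough sketch of that facade (the path, class, and service names here are hypothetical, not our final API), a Spring controller could look like this:

	import org.springframework.web.bind.annotation.PostMapping;
	import org.springframework.web.bind.annotation.RequestBody;
	import org.springframework.web.bind.annotation.RestController;

	// Sketch: a thin REST facade so clients such as the ESMs never talk to
	// Ollama directly; the agent stack behind it can move to another server.
	@RestController
	public class AiQueryController {

		private final SqlAgentService agent; // hypothetical service wrapping the LLM calls

		public AiQueryController(SqlAgentService agent) {
			this.agent = agent;
		}

		@PostMapping("/ws/rest/v1/ai/query")
		public String ask(@RequestBody String question) {
			// generates, validates and runs the SQL, then returns the answer
			return agent.answer(question);
		}
	}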

These, among many other things we have done, have enabled us to guarantee more accurate responses with minimal resources, and we know it will only get better with more refinement. Here is an example: the full prompt template we use for text-to-SQL.

You are an expert SQL query generator for OpenMRS (Open Medical Record System). Your task is to translate natural language questions into accurate SQL queries.

**CRITICAL CONSTRAINT - READ CAREFULLY:**
You MUST use ONLY the exact table names and column names listed in the OpenMRS data model below. NO EXCEPTIONS!

**OPENMRS DATA MODEL (USE ONLY THESE TABLES AND COLUMNS):**
{table_schemas_description}

**CRITICAL TABLE USAGE RULES:**
1. **PERSON TABLE**: Use ONLY for demographic data (gender, birthdate, age, address, etc.)
2. **PATIENT TABLE**: Use ONLY for clinical data and patient identification
3. **ENCOUNTER TABLE**: Use for visit/encounter data
4. **OBS TABLE**: Use for clinical observations and measurements
5. **IMPORTANT**: patient_id in patient table = person_id in person table (they contain the same values)

**CRITICAL COLUMN RULES:**
1. NEVER use person.patient_id (this column doesn't exist)
2. NEVER use patient.person_id (this column doesn't exist)
3. Use person.person_id for demographic data
4. Use patient.patient_id for clinical data
5. Use encounter.patient_id to link encounters to patients
6. Use obs.person_id to link observations to persons

**QUERY PATTERNS:**
- For demographic data: SELECT FROM person
- For clinical data: SELECT FROM patient JOIN encounter ON patient.patient_id = encounter.patient_id
- For observations: SELECT FROM obs JOIN person ON obs.person_id = person.person_id
- For encounters with observations: SELECT FROM encounter JOIN obs ON encounter.encounter_id = obs.encounter_id

**DATE FIELDS IN THE DATA MODEL:**
{format_date_fields_for_prompt()}

**DATE FIELD MAPPINGS:**
- For patient registration/enrollment: use patient.date_created
- For encounter visits: use encounter.encounter_datetime or encounter.date_created
- For observations: use obs.obs_datetime or obs.date_created
- For any "when" or "time" related questions: use the appropriate date_created field for the main table

**COMMON DATE FIELD PATTERNS:**
- Most tables have date_created, date_changed, and date_voided fields
- encounter table has encounter_datetime
- obs table has obs_datetime
- Use these fields for any date/time filtering

**EXAMPLE DATE FILTERS:**
- "this month": WHERE MONTH(date_created) = MONTH(CURRENT_DATE()) AND YEAR(date_created) = YEAR(CURRENT_DATE())
- "last week": WHERE date_created >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
- "today": WHERE DATE(date_created) = CURRENT_DATE()

**STRICT VALIDATION RULES:**
1. Every table name in your query MUST exactly match a table name from the data model above
2. Every column name must exist in the corresponding table from the data model above
3. DO NOT invent or guess table names or column names
4. If you cannot find the required tables/columns in the data model, state that clearly
5. Always use table aliases for JOINs to make queries readable
6. Verify that foreign key relationships exist before using them in JOINs
7. For date/time filtering, ONLY use the date fields listed in the "DATE FIELDS" section above
8. DO NOT hallucinate date fields like "date_enrolled", "date_registered", "visit_date", etc.
9. If a question refers to a concept like "enrollment" or "registration", use the appropriate date_created field

**QUERY CAPABILITIES:**
- SELECT with JOINs, GROUP BY, HAVING, ORDER BY
- Subqueries and CTEs (Common Table Expressions)
- Aggregate functions (COUNT, SUM, AVG, MAX, MIN)
- Date/time functions and calculations using ONLY the date fields listed above
- String operations and pattern matching

**Question:** {question}

**Thought Process:**
1. Identify required data from the question (demographic vs clinical)
2. Use person table for demographic data
3. Use patient table for clinical data
4. Use appropriate JOINs to link tables based on person_id = patient_id
5. Write the complete SQL query using only data model tables/columns

7 Likes

Thanks @miirochristopher

1 Like

Are these serving as evals?

This is great and interesting stuff. I am curious to know more about what your validation process looks like. Do you have a test suite of queries with known correct results? When dealing with counts, do you cross-check them against similar queries that are line-list in nature?

1 Like

Join us tonight at AI - Continued Discussions

1 Like

We do have functions, @dkayiwa, that are the equivalent of the Evaluator API/interface provided by Spring AI; they evaluate the correctness of the final answers. For this particular case, however (“we have functions to flag the generated queries correct or incorrect based on the facts as described in the data model”), the functions very specifically look up wrong column/table names or combinations in the queries and adjust the output query to the specs of the data model before it is run against the database. The actual evals validate the overall responses. We do look to standardize things for the Java implementation, though.
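
As a toy illustration of the adjustment step (the real functions are driven by the data model rather than hard-coded pairs like these), one such fix-up could be:

	// Toy sketch of an adjustment function: rewrite column/table combinations
	// that the data model says cannot exist, before the query is executed.
	public String adjustQuery(String sql) {
		return sql
				// person.patient_id does not exist; person_id holds the same values
				.replace("person.patient_id", "person.person_id")
				// patient.person_id does not exist either
				.replace("patient.person_id", "patient.patient_id");
	}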

1 Like

@veronica for a more generalized response, we do have tests with expected results, so we would know when the LLMs hallucinate; for instance:

	@Test
	public void chat_should_generate_valid_response() {

		// ask a question with a known expected answer so hallucinations are obvious
		UserMessage userMessage = UserMessage.from("What is the name of the process by which the body breaks down food?");
		ChatResponse response = model.chat(userMessage);

		// the answer must mention digestion and must not request any tool executions
		AiMessage aiMessage = response.aiMessage();
		assertThat(aiMessage.text())
				.contains("digestion");
		assertThat(aiMessage.toolExecutionRequests()).isEmpty();

		// the response must come from the model we configured
		ChatResponseMetadata metadata = response.metadata();
		assertThat(metadata.modelName())
				.isEqualTo(MODEL_NAME);

		// token accounting must be present and internally consistent
		TokenUsage tokenUsage = metadata.tokenUsage();
		assertThat(tokenUsage.inputTokenCount()).isPositive();
		assertThat(tokenUsage.outputTokenCount()).isPositive();
		assertThat(tokenUsage.totalTokenCount())
				.isEqualTo(tokenUsage.inputTokenCount() + tokenUsage.outputTokenCount());
		assertThat(metadata.finishReason())
				.isEqualTo(FinishReason.STOP);
	}

The other thing we have done is run a database with a dataset that we are very familiar with; we always run the generated SQL against that database manually and verify that we get the exact same response(s).

We’ve also provided an interface to run custom queries or even the generated SQL (obviously, we RESTRICT Data Manipulation Language commands/statements: INSERT, UPDATE, and DELETE).
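
A minimal sketch of that restriction could look like the following; note that it also blocks a few DDL keywords beyond the three statements mentioned above, which is an assumption on my part:

	import java.util.regex.Pattern;

	// Sketch: keep the query interface read-only by rejecting anything
	// that is not a plain read before it ever reaches the database.
	public final class ReadOnlyGuard {

		private static final Pattern FORBIDDEN = Pattern.compile(
				"\\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE)\\b",
				Pattern.CASE_INSENSITIVE);

		public static void check(String sql) {
			if (FORBIDDEN.matcher(sql).find()) {
				throw new IllegalArgumentException("Only read-only queries are allowed");
			}
		}
	}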

2 Likes

Emr4all-ai-arch.pdf (59.6 KB)

Here is the proposed architecture; kindly feel free to suggest changes and/or ideas! The goal is to have multiple agents handling different tasks and communicating with each other via ACP, while at the same time having our MCP (Model Context Protocol) servers set up to provide OpenMRS context.

1 Like

What do you think about this entry in the Thoughtworks Technology Radar, which has just come out today? https://www.thoughtworks.com/radar/techniques/text-to-sql

1 Like