GSoC 2025: Performance Testing Enhancements – Current Progress, Future Scope & Community Discussion

Hello Everyone,

I’m excited to share that I’ve been selected for GSoC 2025 to work on Performance Testing Enhancements. I’m truly grateful for this opportunity and would like to thank my mentors – @Jayasanka (Primary Mentor) and @bawanthathilan (Secondary Mentor) – for their support and guidance so far.

The goal of this discussion thread is not only to provide regular project updates but also to open up a space where the community can share suggestions, improvements, or ideas that could help shape the future direction of the project. While I aim to fulfill the proposed objectives, I’m also eager to go beyond the initial scope where possible, with input from both the mentors and the broader community.

:pushpin: Current Scope of the Project:

  1. Restructuring parts of the existing codebase to improve maintainability.
  2. Implementing logic to prevent duplication of test data.
  3. Introducing new personas to simulate a wider range of user behaviors.
  4. Expanding scenario coverage to ensure comprehensive testing across the codebase.
  5. Improving the configurability and ease of adjusting performance test loads.
  6. Deploying the enhanced testing suite on AWS for better scalability and integration.

:magnifying_glass_tilted_left: Potential Future Enhancements:

  1. Integrating tools like Prometheus and Grafana to monitor system metrics such as CPU and memory usage during performance tests.

:information_source: Resources:

cc: @beryl @grace


Discussion: Addressing Data Duplication in OpenMRS Platform Testing

This discussion outlines the challenges and proposed solutions related to data duplication issues during performance testing of the OpenMRS platform. The root cause is a limited pool of patient records: when the number of simulated users is scaled up, the same patient data is reused cyclically across test runs, which leads to duplicate data errors.


Finalized Solution: Endpoint-Specific Handling

After evaluating multiple approaches, the team has decided to proceed with handling each problematic endpoint individually. This strategy allows for customized solutions tailored to the behavior and constraints of each API endpoint, rather than implementing a one-size-fits-all approach.

Example: Allergy Addition Endpoint

One known issue arises with the Allergy Addition endpoint, which fails due to duplicate entries. Two options were considered and are currently used as needed:

  • Option 1: Delete the allergy immediately after adding it, to avoid conflicts in subsequent runs.
  • Option 2: Use a customized allergy entry that is unique per test execution, reducing the chance of duplication.

This method gives us granular control and allows us to tune the behavior per endpoint while maintaining test reliability.
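As a minimal sketch of Option 2, a per-run unique allergy entry can be produced by suffixing the allergen name with a short random token. The class and method names below are illustrative assumptions, not taken from the actual test suite:

```java
import java.util.UUID;

public class UniqueAllergy {
    // Hypothetical helper: append a short per-execution suffix so each test
    // run posts a distinct free-text allergen, avoiding the duplicate-entry
    // failure on the Allergy Addition endpoint.
    static String uniqueAllergenName(String base) {
        return base + "-" + UUID.randomUUID().toString().substring(0, 8);
    }
}
```

A scenario would then build its allergy payload from `uniqueAllergenName("Peanut")` (or similar) instead of a fixed string, so repeated runs against the same patient never collide.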


Other Considered Approaches (Not Chosen)

1. Dynamic Patient Creation

In this approach, patients would be dynamically registered during the test run using a dedicated setup phase. This helps:

  • Stress-test the system under heavy registration load.
  • Reduce reuse of the same patient records.

Drawback: While effective for load generation, this method does not reflect realistic production-like patient data. Most generated patients are nearly empty, limiting the depth of test coverage.


2. Pre-generated Demo Patients

This involved generating realistic demo patients ahead of time and loading them via a database backup during test setup. Methods included:

  • Manual generation using shell scripts.
  • Automatic regeneration via GitHub Actions.

Drawback: Although this approach provides rich and realistic data, it is time-consuming to generate and maintain. The regeneration process can take up to 4 hours, making it unsuitable for frequent iteration.


Summary

| Solution | Status | Pros | Cons |
| --- | --- | --- | --- |
| Endpoint-Specific Handling | :white_check_mark: Chosen | Targeted fixes; flexible; low overhead | Requires ongoing maintenance per endpoint |
| Dynamic Patient Creation | :cross_mark: Not chosen | Reduces duplication; supports load testing | Unrealistic data; lacks associations |
| Pre-generated Demo Patients | :cross_mark: Not chosen | Realistic and comprehensive data | Time-consuming setup; slow to regenerate |

Issue Summary:

We are currently facing a problem where the same patient_uuids.csv file is used as a feeder across multiple scenarios. This causes conflicts when the same UUID is used simultaneously to initiate visits or perform other patient-related activities, resulting in inconsistent behavior and data collisions.

Proposed Solutions:

  1. Global Feeder (Preferred Solution): Implement a single, centralized feeder using a thread-safe BlockingQueue. This approach ensures that UUIDs are consumed safely and concurrently across multiple scenarios, preventing duplicate usage. A prototype implementation of this global feeder is available in the associated PR.
  2. Separate CSV Files per Scenario: Maintain individual copies of the patient_uuids.csv file for each scenario, either manually or through automated generation. However, this approach is less scalable, especially if managed manually, and increases maintenance overhead.

CC: @jayasanka @bawanthathilan
