Almost every organization, no matter how small, needs some form of regular system performance monitoring. Monitoring can take a variety of forms, each designed to address different issues with systems and servers. Regardless of the implementation size, the need for server, network, and infrastructure monitoring with good tooling cannot be overlooked.
For quite some time I’ve been struggling with unexpected slowdowns, hard-to-find bottlenecks, and system failures. Although I am not an infrastructure specialist, I have learned a few things about system performance and infrastructure. With the right monitoring tools, you can detect performance issues, gain additional insight into what’s going on in a server, and uncover the root causes. I’ve always found it inconvenient to read and browse through server/application logs, and I would much rather export them to an external platform with advanced filtering and analytical capabilities. Let’s not worry about logs for now and instead look at the key metrics to measure:
- OS metrics - CPU utilization and memory usage
- JVM metrics - JVM memory, garbage collector (collection count & collection time)
- Tomcat metrics - request throughput and latency, thread pool and executors, errors (error count)
- Database connection pool metrics
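To make the OS and JVM items above concrete, here is a minimal sketch that reads a few of them from the standard platform MXBeans that ship with the JDK. The class name `JvmSnapshot` and the metric names are my own placeholders, not anything OpenMRS provides. Note that the standard API reports a system load average rather than a CPU utilization percentage, and that Tomcat and connection pool metrics are not covered here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects a handful of OS and JVM readings in one map.
final class JvmSnapshot {

    static Map<String, Number> collect() {
        Map<String, Number> metrics = new LinkedHashMap<>();

        // OS level. The load average is negative on platforms that do not report it.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        metrics.put("os_available_processors", os.getAvailableProcessors());
        metrics.put("os_system_load_average", os.getSystemLoadAverage());

        // JVM heap memory.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("jvm_heap_used_bytes", heap.getUsed());
        metrics.put("jvm_heap_committed_bytes", heap.getCommitted());

        // Garbage collection, summed over all collectors.
        // A collector returns -1 when a value is undefined, so clamp at zero.
        long gcCount = 0;
        long gcTimeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMillis += Math.max(0, gc.getCollectionTime());
        }
        metrics.put("jvm_gc_collection_count", gcCount);
        metrics.put("jvm_gc_collection_time_ms", gcTimeMillis);

        // Live threads, daemon and non-daemon.
        metrics.put("jvm_threads_live", ManagementFactory.getThreadMXBean().getThreadCount());
        return metrics;
    }

    public static void main(String[] args) {
        collect().forEach((name, value) -> System.out.println(name + " " + value));
    }
}
```

Running `main` prints one `name value` pair per line, which is enough to sanity-check what a JVM reports before wiring up a full monitoring stack.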
I’m looking for a solution that is adaptable, low-maintenance, built on modern technology, simple to use, and integrable with an alert/notification manager. Prometheus + Grafana is the popular choice for this case. However, I am aware that different implementations use different setups for application and infrastructure monitoring. Please share your monitoring experience, particularly as it relates to an OpenMRS instance. I’ve seen solutions in the community, e.g. emr-monitor and the usage statistics module.
From what I know, OpenMRS doesn’t provide out-of-the-box monitoring tools. I intend to work on a solution that optionally (via a configuration setting) exposes JVM-based metrics, i.e. metrics on classloaders, memory, garbage collection, threads, etc., along with the ability to check application health. I envision a configuration such as ENABLE_METRICS: true that exposes JVM metrics through a /metrics endpoint, which can then be visualized using Grafana.
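As a rough illustration of that idea, the sketch below serves two JVM gauges in the Prometheus text format on /metrics, but only when an ENABLE_METRICS environment variable is set to true. It uses only the JDK's built-in HTTP server. The class name `MetricsEndpoint`, reading the flag from an environment variable, and port 9404 are all assumptions made for this example, not an existing OpenMRS feature.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a config-gated /metrics endpoint.
final class MetricsEndpoint {

    // Renders two gauges in the Prometheus text exposition format.
    static String render() {
        long heapUsed = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return "# TYPE jvm_heap_used_bytes gauge\n"
                + "jvm_heap_used_bytes " + heapUsed + "\n"
                + "# TYPE jvm_threads_live gauge\n"
                + "jvm_threads_live " + threads + "\n";
    }

    // Only the value "true" (case-insensitive) enables metrics; null or anything else disables them.
    static boolean isEnabled(String flagValue) {
        return Boolean.parseBoolean(flagValue);
    }

    // Starts the HTTP server; port 0 asks the OS for any free port. The caller is responsible for stop().
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        if (!isEnabled(System.getenv("ENABLE_METRICS"))) {
            System.out.println("Metrics disabled; set ENABLE_METRICS=true to expose /metrics");
            return;
        }
        HttpServer server = start(9404); // arbitrary port chosen for this sketch
        System.out.println("Serving http://localhost:" + server.getAddress().getPort() + "/metrics");
    }
}
```

A real implementation would more likely live inside the web application itself (for example as a servlet) and use a metrics library rather than hand-built strings; the point here is only to show the shape of the flag and the endpoint.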
First approach:
Using the Prometheus JMX Exporter: a collector that can be configured to scrape and expose the MBeans of a JMX target. Since OpenMRS already ships an openmrs-core Docker image with Tomcat as the base, the setup should be as straightforward as adding the Java agent JAR file to the container along with the configuration:
CATALINA_OPTS="$CATALINA_OPTS -javaagent:/path/to/<metrics>.jar"
Second approach:
Targeting Tomcat: this requires building a custom Java client agent.
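For a sense of what such a custom agent might read, here is a minimal sketch that queries the in-process MBean server for Tomcat's per-connector request statistics. It assumes the code runs inside the same JVM as Tomcat and that Tomcat registers `GlobalRequestProcessor` MBeans with `requestCount`, `errorCount`, and `processingTime` attributes; please verify the names against your Tomcat version. The class name `TomcatMBeanReader` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: reads Tomcat connector statistics over in-process JMX.
final class TomcatMBeanReader {

    // Returns "<connector>.<attribute>" -> value; empty when no Tomcat MBeans are registered.
    static Map<String, Object> readRequestStats() {
        Map<String, Object> stats = new LinkedHashMap<>();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            // Wildcard domain so both "Catalina" (standalone) and "Tomcat" (embedded) match.
            ObjectName pattern = new ObjectName("*:type=GlobalRequestProcessor,*");
            for (ObjectName name : server.queryNames(pattern, null)) {
                String connector = name.getKeyProperty("name");
                for (String attribute : new String[] {"requestCount", "errorCount", "processingTime"}) {
                    stats.put(connector + "." + attribute, server.getAttribute(name, attribute));
                }
            }
        } catch (JMException e) {
            throw new IllegalStateException("Could not read Tomcat MBeans", e);
        }
        return stats;
    }

    public static void main(String[] args) {
        Map<String, Object> stats = readRequestStats();
        if (stats.isEmpty()) {
            System.out.println("No Tomcat request processors found in this JVM");
        } else {
            stats.forEach((key, value) -> System.out.println(key + " = " + value));
        }
    }
}
```

Run outside Tomcat, this finds nothing to report, which is expected. Thread pool figures could presumably be read the same way from the `ThreadPool` MBeans.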
Basically, both approaches expose data to be scraped by Prometheus, stored as time series, and then visualized using Grafana. This is not new; I would like to hear more about how other implementations have achieved system monitoring and a reliable alerting system.
Thoughts?