The promise of OpenMRS ETL: Advanced Reporting/Analytics for OpenMRS

Just to follow up from Slack: I am curious how distributed deployments of OpenMRS deal with issues like synchronization. I could not find a clear answer on Talk or the wiki, so I posted this thread.

This is relevant to the topic of ETL/ELT/Analytics because, if what we care about is mostly single-machine deployments, then processing incremental updates to the OpenMRS database (i.e., a streaming ETL/ELT solution) can potentially run on the same machine, or on one other machine working off a replica (slave) copy of the DB. I always assumed that the ETL solution we are talking about here needs to scale to many nodes, but I am now questioning whether that is the right assumption.
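To make concrete what I mean by processing incremental updates on a single machine, here is a rough sketch of watermark-based extraction. It is only an illustration under assumptions: the `source_rows` table, its `changed_at` column, and the function names are all hypothetical (not the OpenMRS schema), and an in-memory SQLite database stands in for the replica. A real pipeline might instead read the database's change log, which this sketch does not attempt.

```python
import sqlite3


def extract_increment(conn, watermark):
    """Return rows changed after `watermark`, plus the advanced watermark.

    `source_rows` and `changed_at` are hypothetical names used for
    illustration; they are not taken from the OpenMRS schema.
    """
    rows = conn.execute(
        "SELECT id, value, changed_at FROM source_rows "
        "WHERE changed_at > ? ORDER BY changed_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen; keep it if nothing new.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


def make_demo_replica():
    """Build an in-memory SQLite database standing in for the replica."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source_rows ("
        "id INTEGER PRIMARY KEY, value TEXT, changed_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO source_rows VALUES (?, ?, ?)",
        [(1, "a", 10), (2, "b", 20), (3, "c", 30)],
    )
    return conn


if __name__ == "__main__":
    replica = make_demo_replica()
    batch, mark = extract_increment(replica, 0)    # first run: everything
    print(len(batch), mark)
    batch, mark = extract_increment(replica, mark)  # second run: nothing new
    print(len(batch), mark)
```

Each run only touches rows newer than the stored watermark, which is why a single extra machine could plausibly keep up as long as the change rate is modest. One caveat of this simple version: with a strict `>` comparison, rows that share the same `changed_at` value as the last watermark but arrive later would be skipped, so a real implementation would need a tie-breaker or an overlap window.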

@burke, when you talk about the scalability of the ETL solution, where is a good place to get some stats on the data scale we are aiming for? And which sites run a distributed deployment of OpenMRS across a large number of nodes?