As part of the OpenHIE initiative we are trying to create a Shared Health Record out of OpenMRS. Basically what we need to do is break down CDA documents and create obs out of them. This becomes very obs insert intensive. Currently 50% of documents insert in <800ms but our 90th percentile is around 3s and the longest request is around 6s. We are trying to improve our performance, preferably to around the 200 - 500ms range.
We seem to be database constrained, based on the profiling we have done so far. Here are the queries taking the most time with our performance improvements applied:
We seem to be severely insert constrained. We have tried a few things to get to this point; these include the following, in order of biggest performance gain:
Set GP search.caseSensitiveDatabaseStringComparison to false
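For reference, that global property can be flipped with a one-line update against the OpenMRS `global_property` table (table and property names as in the standard OpenMRS schema; adjust if your install differs, and restart or flush caches so the setting takes effect):

```sql
-- Hedged sketch: switch OpenMRS string comparisons to the database's
-- (case-insensitive) collation instead of forcing BINARY comparisons.
UPDATE global_property
   SET property_value = 'false'
 WHERE property = 'search.caseSensitiveDatabaseStringComparison';
```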
We have also tried a few things that had no effect for us:
Try to get rid of the CIEL concepts we don’t need and see if that makes a difference - no difference
Try MariaDB - no difference
Try MariaDB using the TokuDB engine - no difference
Our next step is to try to make sure each document insert happens in one big transaction, but after that we are out of ideas. We aren’t SQL performance tuning experts, so we wanted to ask: does anyone have any other ideas for what we can try next?
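To sketch the one-transaction-per-document idea: wrap all of a document’s obs in a single transaction and use multi-row INSERTs so MySQL pays the per-statement overhead once. (Column list abbreviated and values illustrative; the real obs table has more required columns.)

```sql
-- Sketch: insert all obs for one CDA document in a single transaction,
-- batching rows into one multi-row INSERT statement.
START TRANSACTION;
INSERT INTO obs
  (person_id, concept_id, encounter_id, obs_datetime,
   value_numeric, creator, date_created, uuid)
VALUES
  (123, 5089, 456, NOW(), 70.5, 1, NOW(), UUID()),
  (123, 5090, 456, NOW(), 172.0, 1, NOW(), UUID());
COMMIT;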
A couple of ideas. Is the cardinality of the obs table very high? At some point (even with state of the art databases) you must partition very big tables to get decent performance. I don’t know if MySQL supports partitioned tables though.
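For what it’s worth, MySQL has supported table partitioning since 5.1. A hedged sketch of what that could look like for obs (MySQL requires the partitioning column to appear in every unique key, so hashing on the `obs_id` primary key is the path of least resistance; note that MySQL does not allow foreign keys on partitioned tables, which may rule this out for a stock OpenMRS schema without first dropping the obs table’s FK constraints):

```sql
-- Sketch: split obs into 8 hash partitions on its primary key.
-- Caveat: MySQL forbids foreign keys on partitioned InnoDB tables,
-- so the obs table's FK constraints would have to be dropped first.
ALTER TABLE obs
  PARTITION BY HASH (obs_id)
  PARTITIONS 8;
```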
Are you loading thousands of obs? Try dropping the indexes on the obs table, performing the load, and recreating them at the end. Depending on the number of obs before and after the load, this can sometimes be slower.
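The drop-load-recreate pattern looks roughly like this (the index name below is hypothetical; list the real ones with `SHOW INDEX FROM obs` first):

```sql
-- Sketch: remove a secondary index before a bulk load, recreate it after.
SHOW INDEX FROM obs;                          -- find the real index names
ALTER TABLE obs DROP INDEX obs_concept_idx;   -- hypothetical index name
-- ... perform the bulk load here ...
ALTER TABLE obs ADD INDEX obs_concept_idx (concept_id);
```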
Using one big transaction won’t make it faster, because the operations still have to be stored in the rollback segment. But smaller ones (loading the obs in chunks) will probably help. The drawback is that if one insert fails you lose the whole chunk.
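The chunked pattern is simply a commit every few hundred rows instead of one giant transaction, which keeps rollback-segment growth bounded (the chunk size of ~500 is an illustrative starting point, not a tuned value):

```sql
-- Sketch: commit in chunks of ~500 rows rather than one huge transaction.
START TRANSACTION;
-- ... first ~500 obs INSERT rows ...
COMMIT;

START TRANSACTION;
-- ... next ~500 obs INSERT rows ...
COMMIT;
```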
Out of curiosity, are you using the latest version of MySQL? (PIH got an absolutely massive speedup when upgrading Mirebalais from MySQL 5.5 to 5.6, though that was for many-outer-joined reporting queries.)
@rcrichton, for insert-intensive work, disabling indexes might be a good idea. InnoDB is infamous for updating its indexes on every insert, which is why bulk imports using transactions or LOAD DATA INFILE techniques run much faster. If you can wait a bit, write the rows to a file and then do LOAD DATA INFILE; you will get better insert performance. But if writing to the file is just as slow, or you need live/real-time data, then this doesn’t help.
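A minimal LOAD DATA INFILE sketch (the file path and column list are illustrative, and the server’s `secure_file_priv` setting may restrict where it is allowed to read files from):

```sql
-- Sketch: bulk-load pre-extracted obs rows from a tab-separated file.
LOAD DATA INFILE '/tmp/obs_batch.tsv'
  INTO TABLE obs
  FIELDS TERMINATED BY '\t'
  (person_id, concept_id, encounter_id, obs_datetime,
   value_numeric, creator, date_created, uuid);
```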
On the hardware front, there might be a surprise hidden in SSD drives. Although you might expect them to improve performance, cheaper SSD controllers perform poorly under database-heavy write loads.
Thanks for the tips, all. Just to note that what we are trying to achieve is good insert performance for a transactional system, so I don’t think we will be able to disable indexes. This isn’t a one-time import but load that the system should expect on a day-to-day basis.
Partitioning tables is an interesting idea that we could try; I don’t have much experience with that.
I did my benchmarking on MySQL 5.5.43, so perhaps I should try again with 5.6. It sounds like there may be some good improvements to be had.
Thanks for the advice; if anyone has anything more, keep the comments flowing!