Lucene patient search is here!

Hi,

Happy New Year everyone!

I wanted to share the news that TRUNK-425, i.e. Lucene patient search, is done and will be available in OpenMRS Platform 2.1.0! Thanks to Aravind Madishetty and Udayakumar Rayal for their initial attempts! @adamg (working for @SolDevelo) and I paired recently to finally make it work!

I’ve run a basic performance test: https://github.com/openmrs/openmrs-core/commit/3a8d71767ace844f4db598856ffa9e2f532836b1

It runs a few queries against a DB with 20,000 patients. Compared to the H2 SQL query, Lucene speeds up the patient search several times over, depending on the query (roughly 2x to 12x in the limited-result runs in the output below). I haven’t had a chance to test on MySQL yet, since it’s cumbersome to run our tests against other DBs.

Note that requesting all results can be similarly slow with both Lucene and SQL when there are many matches, because most of the time is spent fetching the matched objects from the DB (see the output for “Al”, “London”, “uric”). The power of Lucene shows once the number of results is limited (paging).
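
To illustrate the paging point, here is a toy sketch. It is not the openmrs-core code, and every name in it (`PagingSketch`, `loadPatient`, `fetch`) is made up for illustration. It assumes the index lookup has already produced the matching ids, and only counts how many full records get loaded, since that is where most of the time goes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in openmrs-core.
class PagingSketch {

    /** Counts how many full records were loaded, standing in for DB fetches. */
    static int loads = 0;

    /** Pretends to fetch one patient by id; in a real system this is the slow part. */
    static String loadPatient(int id) {
        loads++;
        return "patient-" + id;
    }

    /**
     * Loads records for the matched ids, stopping after 'limit' of them.
     * A null limit means "return everything".
     */
    static List<String> fetch(List<Integer> matchedIds, Integer limit) {
        int n = (limit == null) ? matchedIds.size() : Math.min(limit, matchedIds.size());
        List<String> page = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            page.add(loadPatient(matchedIds.get(i)));
        }
        return page;
    }

    public static void main(String[] args) {
        // 576 matching ids, the match count reported for 'Al' in the test output.
        List<Integer> matched = new ArrayList<>();
        for (int i = 0; i < 576; i++) {
            matched.add(i);
        }

        loads = 0;
        fetch(matched, null);
        System.out.println("unlimited loads: " + loads); // 576

        loads = 0;
        fetch(matched, 15);
        System.out.println("limited loads: " + loads); // 15
    }
}
```

With all results requested, every match is loaded no matter how quickly the ids were found; with a limit of 15, only one page of objects is loaded, so the cost of finding the matches becomes the dominant part.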

We tried to make the search behave the same way as the old one, but we may have missed something that wasn’t covered by tests. We would appreciate further testing by implementations!

In addition, we can finally add efficient proximity search for misspelled names! I’ll create a separate issue for that (if one doesn’t exist yet).

Test output on my machine for those who are interested:

SQL search:

Starts with search for 'Al' name returned 576 in 1399 ms
Starts with search for 'Al' name limited to 15 results returned in 1418 ms
Starts with search for 'Al Dem' name returned 2 in 1295 ms
Starts with search for 'Al Dem' name limited to 15 results returned in 929 ms
Starts with search for 'Jack' name returned 43 in 1208 ms
Starts with search for 'Jack' name limited to 15 results returned in 1187 ms
Starts with search for 'Jack Sehgal' name returned 1 in 1255 ms
Starts with search for 'Jack Sehgal' name limited to 15 results returned in 943 ms
Anywhere search for 'aso' name returned 45 in 619 ms
Anywhere search for 'aso' name limited to 15 results returned in 481 ms
Anywhere search for 'aso os' name returned 3 in 752 ms
Anywhere search for 'aso os' limited to 15 results returned in 791 ms
Exact search for '9243' identifier returned 1 in 804 ms
Exact search for 'London' attribute returned 1000 in 1319 ms
Exact search for 'London' attribute limited to 15 results returned in 804 ms
Anywhere search for 'uric' attribute returned 1000 in 1424 ms
Anywhere search for 'uric' attribute limited to 15 results returned in 1076 ms

Lucene search:

Starts with search for 'Al' name returned 576 in 1302 ms
Starts with search for 'Al' name limited to 15 results returned in 239 ms
Starts with search for 'Al Dem' name returned 2 in 390 ms
Starts with search for 'Al Dem' name limited to 15 results returned in 258 ms
Starts with search for 'Jack' name returned 43 in 348 ms
Starts with search for 'Jack' name limited to 15 results returned in 98 ms
Starts with search for 'Jack Sehgal' name returned 1 in 115 ms
Starts with search for 'Jack Sehgal' name limited to 15 results returned in 100 ms
Anywhere search for 'aso' name returned 45 in 193 ms
Anywhere search for 'aso' name limited to 15 results returned in 87 ms
Anywhere search for 'aso os' name returned 3 in 134 ms
Anywhere search for 'aso os' limited to 15 results returned in 112 ms
Exact search for '9243' identifier returned 1 in 115 ms
Exact search for 'London' attribute returned 1000 in 1413 ms
Exact search for 'London' attribute limited to 15 results returned in 417 ms
Anywhere search for 'uric' attribute returned 1000 in 1430 ms
Anywhere search for 'uric' attribute limited to 15 results returned in 397 ms
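
For anyone who wants to compare the two runs directly, a small sketch along these lines turns the timings into ratios. The class and method names are my own invention, and only a handful of rows from the output above are included:

```java
import java.util.Locale;

// Hypothetical helper for reading the numbers above; not part of the performance test.
class SpeedupSketch {

    /** Ratio of SQL time to Lucene time; values above 1 mean Lucene was faster. */
    static double speedup(long sqlMs, long luceneMs) {
        return (double) sqlMs / luceneMs;
    }

    public static void main(String[] args) {
        // Each row pairs a label with {SQL ms, Lucene ms}, copied from the output above.
        String[] labels = {
            "'Al' all results",
            "'Al' limited to 15",
            "'Jack' limited to 15",
            "'9243' identifier",
            "'London' limited to 15"
        };
        long[][] times = {
            {1399, 1302},
            {1418, 239},
            {1187, 98},
            {804, 115},
            {804, 417}
        };
        for (int i = 0; i < labels.length; i++) {
            double ratio = speedup(times[i][0], times[i][1]);
            System.out.printf(Locale.ROOT, "%-24s %.1fx%n", labels[i], ratio);
        }
    }
}
```

On these rows the ratio ranges from about 1.1x for the unlimited 'Al' search to about 12.1x for 'Jack' limited to 15 results. These are single runs on one machine, so treat the ratios as indicative only.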

Awesome. This is good news. We will spike this out and comment.

By the way, what is the impact on memory (RAM)?

What would be the best way to measure Lucene’s RAM impact? The index size on disk for 20,000 test patients is 11 MB.
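
One rough option, offered only as a starting point: compare the JVM's used heap before and after a piece of work using the standard `java.lang.management` API. The class below is a sketch with made-up names (`HeapDeltaSketch`, `measure`); the `Runnable` would be whatever triggers the searches, and an 8 MB array stands in for it here. Heap deltas are noisy because `gc()` is only a request, and this would not capture off-heap memory such as the OS page cache, which matters if the index files are memory-mapped.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical sketch; it measures JVM heap only, not off-heap or OS cache usage.
class HeapDeltaSketch {

    /** Requests a GC (a hint, not a guarantee) and returns the used heap in bytes. */
    static long usedHeapBytes() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        mem.gc();
        return mem.getHeapMemoryUsage().getUsed();
    }

    /** Runs the work and returns the approximate change in used heap, in bytes. */
    static long measure(Runnable work) {
        long before = usedHeapBytes();
        work.run();
        long after = usedHeapBytes();
        return after - before;
    }

    public static void main(String[] args) {
        // Keep a reference so the allocation is still live when we measure again.
        final byte[][] holder = new byte[1][];
        long delta = measure(() -> holder[0] = new byte[8 * 1024 * 1024]);
        System.out.println("approx. heap growth: " + (delta / (1024 * 1024))
                + " MB for a retained array of " + holder[0].length + " bytes");
    }
}
```

For a more reliable picture, it would probably be better to watch the running server with a profiler or heap dump while executing searches, and to compare against a run with the old SQL search.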