Soundex search in LuceneQueries

fruether · March 9, 2020, 9:32pm

I have a new question about the business logic that I would like to validate before I put the code into the PR. The reason is that this logic does not seem to be completely covered by PersonServiceTest.

The query executed by getSimilarName for two names n1 and n2 is part of the second if statement in HibernatePersonDAO.java and looks like this:

    case  when pname.givenName is null then 1
          when pname.givenName = '' then 1
          when soundex(pname.givenName) = soundex(:n1) then 4
          when soundex(pname.givenName) = soundex(:n2) then 3
          else 0  end
    +
    case  when pname.middleName is null then 1
          when pname.middleName = '' then 1
          when soundex(pname.middleName) = soundex(:n1) then 3
          when soundex(pname.middleName) = soundex(:n2) then 4
          else 0  end
    +
    case  when pname.familyName is null then 1
          when pname.familyName = '' then 1
          when soundex(pname.familyName) = soundex(:n1) then 3
          when soundex(pname.familyName) = soundex(:n2) then 4
          else 0  end
    +
    case  when pname.familyName2 is null then 1
          when pname.familyName2 = '' then 1
          when soundex(pname.familyName2) = soundex(:n1) then 3
          when soundex(pname.familyName2) = soundex(:n2) then 4
          else 0  end
    ) > 6

I had a look at the logic and concluded that the query selects people according to the following rules:

1. People for whom n1 matches all three of middleName, familyName and familyName2, OR for whom n1 matches givenName and at least one other name element.
2. People for whom n2 matches at least two name elements among givenName, middleName, familyName and familyName2.
3. People who have at least one name element matching n1 and one name element matching n2.
4. People who have only givenName set, and it matches n1.
5. People who have exactly one name element other than givenName matching n2, with all other name elements empty.

*name element: Either givenName, familyName, familyName2 or middleName
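To check the rules above against the query, the scoring can be reproduced outside the database. The following is only a minimal sketch: the class `SimilarNameScore` and the enum `Match` are hypothetical names that do not exist in OpenMRS, and the soundex comparison itself is abstracted away. Each name element is simply classified by the first CASE branch it would hit (empty or null, matches n1, matches n2, or no match).

```java
// Hypothetical helper, not part of OpenMRS: mirrors the CASE expressions of the query.
final class SimilarNameScore {

    /** Which CASE branch a name element hits first: EMPTY covers both null and ''. */
    enum Match { EMPTY, N1, N2, NONE }

    // givenName: a soundex match with n1 counts 4, a match with n2 counts 3.
    static int givenNameScore(Match m) {
        switch (m) {
            case EMPTY: return 1;
            case N1:    return 4;
            case N2:    return 3;
            default:    return 0;
        }
    }

    // middleName, familyName, familyName2: n1 counts 3, n2 counts 4.
    static int otherNameScore(Match m) {
        switch (m) {
            case EMPTY: return 1;
            case N1:    return 3;
            case N2:    return 4;
            default:    return 0;
        }
    }

    // Sum of the four CASE expressions.
    static int score(Match given, Match middle, Match family, Match family2) {
        return givenNameScore(given) + otherNameScore(middle)
                + otherNameScore(family) + otherNameScore(family2);
    }

    // The query keeps a person only if the sum is greater than 6.
    static boolean isSimilar(Match given, Match middle, Match family, Match family2) {
        return score(given, middle, family, family2) > 6;
    }

    public static void main(String[] args) {
        // Only givenName set, matching n1: 4 + 1 + 1 + 1 = 7
        System.out.println(isSimilar(Match.N1, Match.EMPTY, Match.EMPTY, Match.EMPTY));
        // givenName matches n2, middleName matches n1, rest do not match: 3 + 3 + 0 + 0 = 6
        System.out.println(isSimilar(Match.N2, Match.N1, Match.NONE, Match.NONE));
    }
}
```

Enumerating all combinations with this helper is one way to derive test cases for PersonServiceTest. One combination that may deserve its own test: givenName matching n2 plus a single other element matching n1 scores only 6 when the two remaining elements are set but do not match, so such a person would not pass the `> 6` threshold.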

I would like to validate or discuss the following two points before creating the pull request with new tests and the corresponding changes:

Does anyone see additional or different business logic in the query compared to my list?
I am not sure whether rules 1 and 2 are really valid in terms of business logic. They sound a bit odd to me.