Initializer not parsing lines containing newlines correctly

Module: Initializer

Module Version: My fork, rebased off the latest master (1f55548).

Issue or Question: When I run Initializer (Iniz) with a CSV containing newlines, it freaks out:

example.csv (\M here means carriage return)

uuid,Void/Retire,text,some bool
41a3e98e-9ca1-acdd-0c5b-db6fc785f706,FALSE,"single line",FALSE
dfcdca5e-887f-330c-9b03-7717072a558d,FALSE,"Some text\M
that runs across multiple\M
\M
lines",FALSE

Output:

ERROR - CsvParser.saveAll(205) |2019-03-07 10:36:53,068| An OpenMRS object could not be constructed or saved from the following CSV line: [that runs across multiple]
java.lang.IllegalArgumentException: 'that runs across multiple' did not pass the soft check for being a valid OpenMRS UUID.
        at org.openmrs.module.initializer.api.BaseLineProcessor.getUuid(BaseLineProcessor.java:75)
        at org.openmrs.module.initializer.api.BaseLineProcessor.getUuid(BaseLineProcessor.java:85)
        at org.openmrs.module.initializer.api.obs.ObsLineProcessor.bootstrap(ObsLineProcessor.java:49)
        at org.openmrs.module.initializer.api.obs.ObsLineProcessor.bootstrap(ObsLineProcessor.java:25)
        at org.openmrs.module.initializer.api.CsvParser.createInstance(CsvParser.java:90)
        at org.openmrs.module.initializer.api.CsvParser.saveAll(CsvParser.java:186)
        at org.openmrs.module.initializer.api.ConfigDirUtil.loadCsvFiles(ConfigDirUtil.java:428)
        at org.openmrs.module.initializer.api.impl.InitializerServiceImpl.loadObservations(InitializerServiceImpl.java:216)
...
ERROR - CsvParser.saveAll(205) |2019-03-07 10:36:53,069| An OpenMRS object could not be constructed or saved from the following CSV line: [null]
java.lang.ArrayIndexOutOfBoundsException: 1
        at org.openmrs.module.initializer.api.BaseLineProcessor.getVoidOrRetire(BaseLineProcessor.java:91)
        at org.openmrs.module.initializer.api.BaseLineProcessor.getVoidOrRetire(BaseLineProcessor.java:99)
        at org.openmrs.module.initializer.api.obs.ObsLineProcessor.bootstrap(ObsLineProcessor.java:59)
        at org.openmrs.module.initializer.api.obs.ObsLineProcessor.bootstrap(ObsLineProcessor.java:25)
        at org.openmrs.module.initializer.api.CsvParser.createInstance(CsvParser.java:90)
        at org.openmrs.module.initializer.api.CsvParser.saveAll(CsvParser.java:186)
        at org.openmrs.module.initializer.api.ConfigDirUtil.loadCsvFiles(ConfigDirUtil.java:428)
        at org.openmrs.module.initializer.api.impl.InitializerServiceImpl.loadObservations(InitializerServiceImpl.java:216)
...
ERROR - CsvParser.saveAll(205) |2019-03-07 10:36:53,068| An OpenMRS object could not be constructed or saved from the following CSV line: [lines,FALSE]
java.lang.IllegalArgumentException: 'lines' did not pass the soft check for being a valid OpenMRS UUID.
        at org.openmrs.module.initializer.api.BaseLineProcessor.getUuid(BaseLineProcessor.java:75)
...

Unfortunately and bizarrely, I haven’t been able to reproduce this in a unit test. When I add newlines to a quoted field in a test CSV, I get no such error; the field parses as it should. This is true whether I add carriage returns or not.

@mksd @mksrom I’m going to continue investigating, but do either of you have any intuitions about what might be going on?

Thanks!

Note that I first ran into this error (and was unable to reproduce it) on 6f5032b, prior to the OpenCSV upgrade to 4.5. I had hoped that the upgrade would fix exactly this problem.

Isn’t the carriage return ^M? That’s the DOS one right? The weirdest thing is the discrepancy between runtime and tests. Is it really for the same CSV file??

I’m actually surprised it works at all. I would ensure that the line endings get Unix’d before the file is fed to OpenCSV, i.e. turned into \n. If that’s an option of course.
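For what it's worth, here is a rough sketch of what I mean by Unix'ing the content first. The class and method names (LineEndings, toUnix) are made up for illustration and are not part of Iniz or OpenCSV. Note that it also rewrites carriage returns inside quoted fields, which may or may not be what you want for the note text.

```java
class LineEndings {

    /** Converts Windows (\r\n) and lone carriage return (\r) line endings to Unix (\n). */
    static String toUnix(String text) {
        // Replace the two-character sequence first so that \r\n does not become \n\n.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // Hypothetical sample mimicking the CSV from the first post.
        String dos = "uuid,text\r\nabc,\"Some text\r\nthat runs across multiple\r\n\r\nlines\"\r\n";
        System.out.print(toUnix(dos));
    }
}
```

The normalized string could then be wrapped in a java.io.StringReader and handed to whichever CSV reader is in use.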

Alternatively I’d try to narrow it down to a clear cut bug in OpenCSV (outside of Iniz.)

Yeah, I piped a line of the real data onto the test file and got the same result (test passed, no problem).

When I started debugging Iniz at runtime, IntelliJ informed me that the source code and bytecode disagree: a different version of Iniz is running at runtime than in the tests. I’m still wrestling with Maven to get it to point my distro at the correct Iniz package.

You should validate that your test works with Maven, that’s your source of truth. I don’t know what the test class name is, but if it were ‘FooTest’ you’d run:

mvn test -Dtest=FooTest -pl api/

Keep me posted. The key point is to figure out whether the bug is in OpenCSV or in Iniz.

Running that command, the test passes.

Ok so you do have a case where the same code is fine within tests but bumps into an issue at runtime…

Could you narrow it down to a bug in OpenCSV?

I got it straightened out. OpenCSV by default doesn’t parse CSVs according to spec – it treats backslash as an escape character. At some point some doctor ended a consult note with a backslash, which the CSV writer, per spec, wrote un-escaped, but which CSVReader read as escaping the field-closing double-quote.

I’ve updated Iniz to use the correctly functioning parser, RFC4180Parser.
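In case it helps anyone else who hits this, below is a toy illustration of the failure mode. It is NOT OpenCSV's code, just a minimal hand-written parser with a switch for the two behaviours, and the names (BackslashEscapeDemo, parse, backslashEscapes) and sample data are invented for the example. With backslash treated as an escape character, the backslash at the end of the note swallows the closing double-quote, so the field never ends where it should and the following row gets absorbed into it. The exact symptoms in a real parser will differ (as in the log above), but the cause is the same.

```java
import java.util.ArrayList;
import java.util.List;

class BackslashEscapeDemo {

    /**
     * Toy CSV parser. With backslashEscapes=false it follows RFC 4180
     * (only a doubled quote escapes a quote). With backslashEscapes=true
     * a backslash inside quotes escapes a following quote or backslash.
     */
    static List<List<String>> parse(String csv, boolean backslashEscapes) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            char next = i + 1 < csv.length() ? csv.charAt(i + 1) : '\0';
            if (backslashEscapes && inQuotes && c == '\\' && (next == '"' || next == '\\')) {
                field.append(next); // escaped character is taken literally
                i++;
            } else if (c == '"' && inQuotes && next == '"') {
                field.append('"'); // doubled quote is a literal quote
                i++;
            } else if (c == '"') {
                inQuotes = !inQuotes; // opening or closing quote
            } else if (c == ',' && !inQuotes) {
                row.add(field.toString());
                field.setLength(0);
            } else if (c == '\n' && !inQuotes) {
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
            } else {
                field.append(c); // includes newlines inside quoted fields
            }
        }
        // Flush a final record that has no trailing newline (or was left open).
        if (field.length() > 0 || !row.isEmpty()) {
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // First note ends with a backslash, written un-escaped as the spec allows.
        String csv = "id-1,FALSE,\"note ending in a backslash\\\",FALSE\n"
                   + "id-2,FALSE,\"next note\",FALSE\n";
        System.out.println(parse(csv, false).size()); // 2 rows, 4 fields each
        System.out.println(parse(csv, true).size());  // 1 row: second row swallowed
    }
}
```

This also explains why my unit test passed: the test CSV had newlines in quoted fields but no field ending in a backslash.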