The first six links I found were all about jsoup. It has very good documentation, and after playing with it for a while I think it will suit my needs. But I would like to know whether the general community has a recommendation on what to use.
I would also like to know whether I can parse these HTML files using just the HTML Form Entry module, since we will already be using it to create the HTML forms that serve as report templates. In particular, can it parse general HTML files that were not created with the module?
@darius thanks for your quick reply. I guess I will go with the jsoup library then, since the XML-only processing that the HTML Form Entry module does will not be sufficient for us: the report templates are in HTML5.
HTML is not strictly a subset of XML, but if the markup is well formed (as XHTML is) and its structure is known, you can easily use any XML parser (JAXB, XStream, Dom4J …). I’ve heard that jsoup is used mostly for web scraping.
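
To make the "well formed" condition concrete, here is a minimal sketch that uses only the JDK's built-in DOM parser rather than any of the libraries named above. The class name `XmlParserOnHtml`, the helper `cellTexts`, and the sample table markup are all made up for illustration; they are not taken from the report templates discussed in this thread. The sketch returns the cell texts when the markup parses as XML and an empty `Optional` when it does not, which is what typically happens with HTML5 that omits closing tags.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class XmlParserOnHtml {

    // Returns the text of every <td> element, or Optional.empty()
    // when the markup is not well-formed XML.
    static Optional<List<String>> cellTexts(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            NodeList cells = doc.getElementsByTagName("td");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                texts.add(cells.item(i).getTextContent());
            }
            return Optional.of(texts);
        } catch (SAXException e) {
            // The XML parser rejects markup with unclosed or mismatched tags.
            return Optional.empty();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well formed: every tag is closed, so an XML parser accepts it.
        String wellFormed = "<table><tr><td>Weight</td><td>70</td></tr></table>";
        // Legal HTML5 (optional end tags, void <br>), but not well-formed XML.
        String html5 = "<table><tr><td>Weight<td>70<br></table>";

        System.out.println(cellTexts(wellFormed));
        System.out.println(cellTexts(html5));
    }
}
```

The second input is the kind of markup that a lenient HTML parser such as jsoup is designed to tolerate, which is one reason to prefer it when the templates are HTML5 rather than XHTML.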