Stripping HTML tags in Java [duplicate]
Use jsoup. It is well documented and available on Maven, and after a day spent trying several libraries it is the best one I found. In my opinion, converting HTML to plain text should be possible in one line of code; otherwise the library has failed somehow. With Markdown4J and MarkdownJ nothing like that is possible, and with HtmlCleaner it is painful, taking roughly 50 lines of code. So here is the jsoup one-liner:
String plain = new HtmlToPlainText().getPlainText(Jsoup.parse(html));
What you get is real plain text, not just the HTML source code as a String as with some other libraries; it really does a great job. The quality is more or less the same as Markdownify for PHP. Note that HtmlToPlainText lives in the org.jsoup.examples package, so it needs its own import.
This is what I found on Google. It worked fine for me.
String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");
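For anyone trying the regex route, here is a minimal sketch of how it behaves, using only the standard library. The class name `TagStripper` and the method `stripTags` are made up for illustration, and I added `Pattern.DOTALL` (not in the one-liner above) so that tags spanning a line break are also matched. The regex knows nothing about HTML, so it does not decode entities such as `&amp;` and it will eat ordinary text that happens to sit between a `<` and a `>`.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Precompiled once. DOTALL lets '.' match line breaks, so a tag
    // that is split across lines is still removed.
    private static final Pattern TAG = Pattern.compile("<.*?>", Pattern.DOTALL);

    // Hypothetical helper: removes everything that looks like a tag.
    public static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Simple markup is handled as expected.
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
        // Entities are left undecoded.
        System.out.println(stripTags("Tom &amp; Jerry"));
        // Plain text between '<' and '>' is removed by mistake.
        System.out.println(stripTags("1 < 2 and 3 > 2"));
    }
}
```

So the regex is fine for quick cleanup of simple, trusted markup, but for real-world HTML a parser such as jsoup is the safer choice.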