Convert HTML to Plain Text in Swift
I'm working on a simple RSS Reader app as a beginner project in Xcode. I currently have it set up that it parses the feed, and places the title, pub date, description and content and displays it in a WebView.
I recently decided to show the description (or a truncated version of the content) in the TableView used to select a post. However, when doing so:
cell.textLabel?.text = item.title?.uppercaseString
cell.detailTextLabel?.text = item.itemDescription //.itemDescription is a String
It shows the raw HTML of the post.
I would like to know how to convert the HTML into plain text for just the TableView's detailed UILabel.
Thanks!
You can add this extension to convert your html code to a regular string:
edit/update:
Discussion The HTML importer should not be called from a background thread (that is, the options dictionary includes documentType with a value of html). It will try to synchronize with the main thread, fail, and time out. Calling it from the main thread works (but can still time out if the HTML contains references to external resources, which should be avoided at all costs). The HTML import mechanism is meant for implementing something like markdown (that is, text styles, colors, and so on), not for general HTML import.
Xcode 11.4 • Swift 5.2
extension Data {
var html2AttributedString: NSAttributedString? {
do {
return try NSAttributedString(data: self, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
} catch {
print("error:", error)
return nil
}
}
var html2String: String { html2AttributedString?.string ?? "" }
}
extension StringProtocol {
var html2AttributedString: NSAttributedString? {
Data(utf8).html2AttributedString
}
var html2String: String {
html2AttributedString?.string ?? ""
}
}
cell.detailTextLabel?.text = item.itemDescription.html2String
Swift 4, Xcode 9
extension String {
var utfData: Data {
return Data(utf8)
}
var attributedHtmlString: NSAttributedString? {
do {
return try NSAttributedString(data: utfData, options: [
.documentType: NSAttributedString.DocumentType.html,
.characterEncoding: String.Encoding.utf8.rawValue
],
documentAttributes: nil)
} catch {
print("Error:", error)
return nil
}
}
}
extension UILabel {
func setAttributedHtmlText(_ html: String) {
if let attributedText = html.attributedHtmlString {
self.attributedText = attributedText
}
}
}
Here is my suggested answer. Instead of extension, if you want to put inside function.
func decodeString(encodedString:String) -> NSAttributedString?
{
let encodedData = encodedString.dataUsingEncoding(NSUTF8StringEncoding)!
do {
return try NSAttributedString(data: encodedData, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType,NSCharacterEncodingDocumentAttribute:NSUTF8StringEncoding], documentAttributes: nil)
} catch let error as NSError {
print(error.localizedDescription)
return nil
}
}
And call that function and cast NSAttributedString to String
let attributedString = self.decodeString(encodedString)
let message = attributedString.string