Tokenizes an HTML string, extracting plain text while ignoring HTML tags.
Extracts plain text from Markdown strings.
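
As a rough illustration of the two behaviours described above, the sketch below uses only the Python standard library. The names `html_to_text`, `markdown_to_text`, and `_TextCollector` are hypothetical and are not taken from the original code. The Markdown handling is a regex approximation covering a few common constructs, not a full parser.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: keeps text nodes, discards tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        # Note: this also receives the contents of <script> and <style>.
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the text content of an HTML string with tags removed."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.chunks).split())


def markdown_to_text(md: str) -> str:
    """Approximate plain text for a Markdown string (assumed subset only)."""
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", md)    # images -> alt text
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # links -> label
    text = re.sub(r"^[ ]{0,3}#{1,6}\s+", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"^[ ]{0,3}>\s?", "", text, flags=re.MULTILINE)       # blockquotes
    text = re.sub(r"\*{1,2}|`", "", text)  # asterisk emphasis and backticks
    return " ".join(text.split())


print(html_to_text("<p>Hello <b>world</b>!</p>"))
print(markdown_to_text("# Title\n\nSome **bold** text with a [link](https://example.com)."))
```

Underscore emphasis (`_text_`) is deliberately left untouched in this sketch, since stripping underscores would also damage identifiers such as `snake_case`. A real Markdown parser would be the safer choice where accuracy matters.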