I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities that Beautiful Soup 3 doesn't automatically decode for me:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("<p>&pound;682m</p>")
>>> text = soup.find("p").string
>>> print text
&pound;682m
How can I decode the HTML entities in text to get "£682m" instead of "&pound;682m"?
In Python 3.4+, use html.unescape():
import html
print(html.unescape("&pound;682m"))
html.parser.HTMLParser.unescape has been deprecated since Python 3.4. It was supposed to be removed in 3.5 but was left in by mistake, and it was finally removed in Python 3.9.
On older versions (Python 2.6 through 3.3), you can use HTMLParser.unescape() from the standard library:
>>> try:
...     # Python 2.6-2.7
...     from HTMLParser import HTMLParser
... except ImportError:
...     # Python 3
...     from html.parser import HTMLParser
...
>>> h = HTMLParser()
>>> print(h.unescape("&pound;682m"))
£682m
You can also use the six compatibility library to simplify the import:
>>> from six.moves.html_parser import HTMLParser
>>> h = HTMLParser()
>>> print(h.unescape("&pound;682m"))
£682m
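If the same code has to run on several Python versions, the approaches above can be folded into one import fallback. This is only a sketch and not part of the original answer: the helper name decode_entities is hypothetical, and the fallback branches assume an interpreter older than 3.4, where HTMLParser.unescape still exists.

```python
try:
    # Python 3.4+: the supported, non-deprecated function.
    from html import unescape as _unescape
except ImportError:
    try:
        # Python 3.0-3.3 (assumption: the method still exists here).
        from html.parser import HTMLParser
    except ImportError:
        # Python 2.6-2.7
        from HTMLParser import HTMLParser
    _unescape = HTMLParser().unescape


def decode_entities(text):
    """Hypothetical helper: decode named and numeric HTML entities in text."""
    return _unescape(text)


decoded = decode_entities("&pound;682m")
print(decoded)
```

On Python 3.4 and later only the first import ever runs, so the deprecated method is never touched. The same helper also handles numeric references such as &#163; and &#xA3;.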