UnicodeEncodeError: 'charmap' codec can't encode characters


I'm trying to scrape a website, but it gives me an error.

I'm using the following code:

import urllib.request
from bs4 import BeautifulSoup

get = urllib.request.urlopen("https://www.website.com/")
html = get.read()

soup = BeautifulSoup(html, "html.parser")


And I'm getting the following error:

File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to <undefined>
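A minimal way to reproduce this error without any network access, assuming a short sample string containing U+2603 (SNOWMAN) as a stand-in for the scraped page. cp1252, the codec named in the traceback, has no mapping for that character, while UTF-8 can encode it. The traceback suggests that something in the script (for example printing soup to a cp1252 console, or writing it to a file) encoded the text with cp1252; the helper name `try_encode` is hypothetical.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # For cp1252 the message starts with "'charmap' codec can't encode ..."
        print(f"{encoding}: {exc}")
        return None


sample = "Snowman: \u2603"  # hypothetical stand-in; U+2603 has no cp1252 mapping
cp1252_result = try_encode(sample, "cp1252")  # fails, like the traceback above
utf8_result = try_encode(sample, "utf-8")     # succeeds: UTF-8 covers all of Unicode
print(utf8_result)
```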

What can I do to fix this?

Answer rating: 446

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

with open(fname, "w") as f:

with this:

import io
with io.open(fname, "w", encoding="utf-8") as f:

Using io.open gives you backward compatibility with Python 2; in Python 3, io.open is the same function as the built-in open.

If you only need to support Python 3, you can use the built-in open function instead:

with open(fname, "w", encoding="utf-8") as f:

If your file should use an encoding other than UTF-8, pass that encoding as the encoding argument instead.
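A small sketch of the fix in context, assuming some hypothetical scraped text that contains a character cp1252 cannot encode, and a hypothetical output path in a temporary directory. Writing with an explicit encoding="utf-8" succeeds regardless of the platform default, and reading back with the same encoding returns the original text.

```python
import io
import os
import tempfile

# Hypothetical scraped content with a character outside cp1252.
scraped = "<p>Snowman: \u2603</p>"

# Hypothetical output path, created in a fresh temporary directory.
fname = os.path.join(tempfile.mkdtemp(), "page.html")

# Explicit encoding: does not depend on the platform default (e.g. cp1252 on Windows).
with io.open(fname, "w", encoding="utf-8") as f:
    f.write(scraped)

# Read it back with the same encoding to confirm the round trip.
with io.open(fname, "r", encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == scraped)
```

Reading with the same encoding used for writing matters: opening the file later without encoding="utf-8" on the same Windows machine would decode it as cp1252 and produce garbled text.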

Answer rating: 218

I fixed it by adding .encode("utf-8") to soup.

That means that print(soup) becomes print(soup.encode("utf-8")).
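A sketch of what this change does, using a plain string as a hypothetical stand-in for str(soup) so it runs without BeautifulSoup. .encode("utf-8") turns the text into a bytes object, and printing bytes shows their repr, in which non-ASCII bytes appear as \x escapes. That repr is pure ASCII, so the console never has to encode the original characters; the trade-off is that the output is a b'...' literal rather than readable text.

```python
html_text = "<p>Snowman: \u2603</p>"  # hypothetical stand-in for str(soup)

encoded = html_text.encode("utf-8")
print(type(encoded).__name__)  # bytes
print(encoded)                 # b'<p>Snowman: \xe2\x98\x83</p>'
```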

Answer rating: 67

In Python 3.7 on Windows 10, this worked (I am not sure whether it works on other platforms or other versions of Python).

Replacing this line:

with open("filename", "w") as f:

With this:

with open("filename", "w", encoding="utf-8") as f:

This works because the file is now written using UTF-8, which can encode every Unicode character. Without an explicit encoding, open() uses the platform's default encoding (cp1252 on many Windows systems), which cannot represent many characters and raises an error when it encounters one.
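A sketch of that explanation, assuming a hypothetical sample string and a hypothetical file name in a temporary directory. It prints the default encoding that open() would use when no encoding is given, then writes the same text once with cp1252 (what the default resolves to on many Windows setups) and once with UTF-8.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (varies by platform).
print("default encoding:", locale.getpreferredencoding(False))

text = "Snowman: \u2603"  # hypothetical sample; U+2603 is not in cp1252
path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical file name

results = {}
for enc in ("cp1252", "utf-8"):
    try:
        with open(path, "w", encoding=enc) as f:
            f.write(text)
        results[enc] = "ok"
    except UnicodeEncodeError as exc:
        # cp1252 has no mapping for U+2603, so writing fails here.
        results[enc] = f"failed: {exc.reason}"

print(results)
```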

