str.encode in Python


Encodes a string to bytes / bytestrings using a registered codec.

str.encode([encoding[, errors]])

-> bytes in Python 3 / str in Python 2

encoding - Encoding name. The default is the system default encoding, available from sys.getdefaultencoding() (utf-8 in Python 3, ascii in Python 2).

+py2.3 errors='strict' - The name of the error handling scheme. The default is 'strict'.


Since +py2.7, the parameters can be passed as keyword arguments.

Note

The names of the available encodings are best looked up in the codecs module documentation.
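
As a small illustration of the note above, an encoding name or alias can also be checked at runtime with codecs.lookup(). This is only a sketch: the helper name canonical_name is made up for this example and is not part of the codecs module.

```python
import codecs


def canonical_name(name):
    """Return the canonical codec name for an encoding name or alias.

    Hypothetical helper for illustration; it only wraps codecs.lookup().
    """
    return codecs.lookup(name).name


# Different spellings of the same encoding resolve to one registered codec.
print(canonical_name("utf8"))
print(canonical_name("UTF-8"))

# An unknown name raises LookupError, both in codecs.lookup() and in str.encode().
try:
    "cat".encode("no-such-codec")
except LookupError as exc:
    print("LookupError:", exc)
```

Checking a name this way before calling str.encode() avoids a LookupError in the middle of processing.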


from sys import getdefaultencoding

# Python 3 =====================

getdefaultencoding()  # 'utf-8'

my_string = 'кот cat'

type(my_string)  # str

my_string.encode()
# b'\xd0\xba\xd0\xbe\xd1\x82 cat'

my_string.encode('ascii')
# UnicodeEncodeError

my_string.encode('ascii', errors='ignore')
# b' cat'

my_string.encode('ascii', errors='replace')
# b'??? cat'

my_string.encode('ascii', errors='xmlcharrefreplace')
# b'&#1082;&#1086;&#1090; cat'

my_string.encode('ascii', errors='backslashreplace')
# b'\\u043a\\u043e\\u0442 cat'

my_string.encode('ascii', errors='namereplace')
# b'\\N{CYRILLIC SMALL LETTER KA}\\N{CYRILLIC SMALL LETTER O}\\N{CYRILLIC SMALL LETTER TE} cat'

surrogated = '\udcd0\udcba\udcd0\udcbe\udcd1\udc82 cat'

surrogated.encode()
# UnicodeEncodeError

surrogated.encode(errors='surrogateescape')
# b'\xd0\xba\xd0\xbe\xd1\x82 cat'

surrogated.encode(errors='surrogatepass')
# b'\xed\xb3\x90\xed\xb2\xba\xed\xb3\x90\xed\xb2\xbe\xed\xb3\x91\xed\xb2\x82 cat'


# Python 2 =====================

getdefaultencoding()  # 'ascii'


my_string = 'кот cat'.decode('utf-8')

type(my_string)  # unicode

my_string.encode()
# UnicodeEncodeError

my_string.encode(errors='ignore')
# ' cat'

my_string.encode(errors='replace')
# '??? cat'

my_string.encode('utf-8')
# '\xd0\xba\xd0\xbe\xd1\x82 cat'

my_string.encode(errors='xmlcharrefreplace')
# '&#1082;&#1086;&#1090; cat'

my_string.encode(errors='backslashreplace')
# '\\u043a\\u043e\\u0442 cat'

Error handling schemes

The schemes differ in what happens when the string being encoded contains characters that the target encoding does not support.

 
Name                Since    Description
strict                       UnicodeError (or a subclass) is raised.
ignore                       Unsupported characters are skipped.
replace                      Characters are replaced with a replacement marker: ? when encoding, U+FFFD (REPLACEMENT CHARACTER) when decoding.
xmlcharrefreplace   +py2.3   Characters are replaced with the corresponding XML character references (for example, &#1082;).
backslashreplace    +py2.3   Characters are replaced with backslash escape sequences (for example, \u043a).
namereplace         +py3.5   Characters are replaced with sequences of the form \N{...}.
surrogateescape     +py3.1   On decoding, each undecodable byte is replaced with a surrogate code point (U+DC80 to U+DCFF); on encoding, those surrogates are turned back into the original bytes.
surrogatepass       +py3.1   Surrogate code points are encoded and decoded as-is instead of raising an error. Works with the utf-8 codec and, starting with +py3.4, with utf-16, utf-32, utf-16-be, utf-16-le, utf-32-be and utf-32-le.
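
The surrogateescape scheme is easiest to understand as a round trip. The following is a minimal Python 3 sketch; the variable names raw, text and restored are chosen for this example only.

```python
# UTF-8 bytes for a Cyrillic word followed by ASCII text; not valid ASCII.
raw = b"\xd0\xba\xd0\xbe\xd1\x82 cat"

# Decoding as ASCII with surrogateescape maps each undecodable byte 0xNN
# to the lone surrogate U+DCNN instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")
assert text == "\udcd0\udcba\udcd0\udcbe\udcd1\udc82 cat"

# Encoding with the same scheme turns the surrogates back into the original bytes.
restored = text.encode("ascii", errors="surrogateescape")
assert restored == raw
```

This makes it possible to carry bytes of unknown encoding through str-based code and write them back unchanged.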


You can register a new scheme using codecs.register_error().
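
A minimal Python 3 sketch of registering a custom scheme follows. The scheme name hexplaceholder and the function hex_placeholder are invented for this example; only codecs.register_error() and the attributes of UnicodeEncodeError come from the standard library.

```python
import codecs


def hex_placeholder(exc):
    """Hypothetical handler: replace each unencodable character with <U+XXXX>."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding errors
    # exc.object is the string being encoded; start/end delimit the bad slice.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<U+{:04X}>".format(ord(ch)) for ch in bad)
    # Return the replacement text and the position at which encoding resumes.
    return replacement, exc.end


# Make the handler available under a scheme name usable in errors=...
codecs.register_error("hexplaceholder", hex_placeholder)

encoded = "кот cat".encode("ascii", errors="hexplaceholder")
print(encoded)  # b'<U+043A><U+043E><U+0442> cat'
```

The handler receives the exception instance and must return a tuple of the replacement and the index from which to continue; the replacement itself has to be encodable in the target encoding.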




