UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

I'm using NLTK to perform kmeans clustering on my text file, in which each line is considered a document. So for example, my text file is something like this:

belong finger death punch
hasty
mike hasty walls jericho
jägermeister rules
rules bands follow performing jägermeister stage
approach

Now the demo code I'm trying to run is this:

import sys

import numpy
from nltk.cluster import KMeansClusterer, GAAClusterer, euclidean_distance
import nltk.corpus
from nltk import decorators
import nltk.stem

stemmer_func = nltk.stem.EnglishStemmer().stem
stopwords = set(nltk.corpus.stopwords.words("english"))

@decorators.memoize
def normalize_word(word):
    return stemmer_func(word.lower())

def get_words(titles):
    words = set()
    for title in job_titles:
        for word in title.split():
            words.add(normalize_word(word))
    return list(words)

@decorators.memoize
def vectorspaced(title):
    title_components = [normalize_word(word) for word in title.split()]
    return numpy.array([
        word in title_components and not word in stopwords
        for word in words], numpy.short)

if __name__ == "__main__":

    filename = "example.txt"
    if len(sys.argv) == 2:
        filename = sys.argv[1]

    with open(filename) as title_file:

        job_titles = [line.strip() for line in title_file.readlines()]

        words = get_words(job_titles)

        # cluster = KMeansClusterer(5, euclidean_distance)
        cluster = GAAClusterer(5)
        cluster.cluster([vectorspaced(title) for title in job_titles if title])

        # NOTE: This is inefficient, cluster.classify should really just be
        # called when you are classifying previously unseen examples!
        classified_examples = [
                cluster.classify(vectorspaced(title)) for title in job_titles
            ]

        for cluster_id, title in sorted(zip(classified_examples, job_titles)):
            print cluster_id, title

(which can also be found here)

The error I receive is this:

Traceback (most recent call last):
  File "cluster_example.py", line 40, in <module>
    words = get_words(job_titles)
  File "cluster_example.py", line 20, in get_words
    words.add(normalize_word(word))
  File "<string>", line 1, in <lambda>
  File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 183, in memoize
    result = func(*args)
  File "cluster_example.py", line 14, in normalize_word
    return stemmer_func(word.lower())
  File "/usr/local/lib/python2.7/dist-packages/nltk/stem/snowball.py", line 694, in stem
    word = (word.replace(u"\u2019", u"\x27")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13: ordinal not in range(128)

What is happening here?
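
A likely explanation, assuming the text file is UTF-8 encoded: in Python 2, open() returns byte strings, and the stemmer then calls replace with unicode arguments on one of them. Python 2 reacts by implicitly decoding the bytes with the ascii codec, which fails on the first non-ASCII byte. 0xe2 is the lead byte of many three-byte UTF-8 sequences, for example the right single quotation mark U+2019. The sketch below reproduces the failure with an explicit decode and shows one possible fix, decoding at read time with io.open. The sample strings and the temporary file are made up for illustration.

```python
import io
import os
import tempfile

# Hypothetical sample line. U+2019 encodes to the bytes e2 80 99 in UTF-8,
# so 0xe2 is the first byte that the ascii codec cannot decode.
raw_line = u"rock \u2019n\u2019 roll".encode("utf-8")

try:
    # Python 2 performs this decode implicitly when bytes meet unicode.
    raw_line.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding explicitly with the file's real encoding avoids the problem.
decoded_line = raw_line.decode("utf-8")

# The same idea applied to file reading: io.open decodes every line,
# so the stemmer only ever sees unicode text.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"j\u00e4germeister rules\n")
with io.open(path, encoding="utf-8") as title_file:
    job_titles = [line.strip() for line in title_file]
```

In the script from the question, this would amount to replacing open(filename) with io.open(filename, encoding="utf-8"), provided the file really is UTF-8.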

Related questions

Meaning of @classmethod and @staticmethod for beginner?

5 answers

By user1632861

Could someone explain to me the meaning of @classmethod and @staticmethod in python? I need to know the difference and the meaning.

As far as I understand, @classmethod tells a class that it's a method which should be inherited into subclasses, or... something. However, what's the point of that? Why not just define the class method without adding @classmethod or @staticmethod or any @ definitions?

tl;dr: when should I use them, why should I use them, and how should I use them?

1726

Answer #1

Though classmethod and staticmethod are quite similar, there's a slight difference in usage for both entities: classmethod must have a reference to a class object as the first parameter, whereas staticmethod can have no parameters at all.

Example

class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        date1 = cls(day, month, year)
        return date1

    @staticmethod
    def is_date_valid(date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        return day <= 31 and month <= 12 and year <= 3999

date2 = Date.from_string("11-09-2012")
is_date = Date.is_date_valid("11-09-2012")

Explanation

Let's assume an example of a class, dealing with date information (this will be our boilerplate):

class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

This class obviously could be used to store information about certain dates (without timezone information; let's assume all dates are presented in UTC).

Here we have __init__, a typical initializer of Python class instances, which receives arguments as a typical instancemethod, having the first non-optional argument (self) that holds a reference to a newly created instance.

Class Method

We have some tasks that can be nicely done using classmethods.

Let's assume that we want to create a lot of Date class instances having date information coming from an outer source encoded as a string with format "dd-mm-yyyy". Suppose we have to do this in different places in the source code of our project.

So what we must do here is:

  1. Parse a string to receive day, month and year as three integer variables or a 3-item tuple consisting of those variables.
  2. Instantiate Date by passing those values to the initialization call.

This will look like:

day, month, year = map(int, string_date.split("-"))
date1 = Date(day, month, year)

C++ could handle this with constructor overloading, but Python has no overloading. Instead, we can use classmethod. Let's create another "constructor".

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        date1 = cls(day, month, year)
        return date1

date2 = Date.from_string("11-09-2012")

Let's look more carefully at the above implementation, and review what advantages we have here:

  1. We've implemented date string parsing in one place and it's reusable now.
  2. Encapsulation works fine here (if you think that you could implement string parsing as a single function elsewhere, this solution fits the OOP paradigm far better).
  3. cls is an object that holds the class itself, not an instance of the class. It's pretty cool because if we inherit our Date class, all children will have from_string defined also.
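
The third point can be checked with a small sketch. EuropeanDate and its display method are hypothetical names introduced only for this illustration; the Date class is the one from the example above.

```python
class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        # cls is whichever class the method was called on.
        return cls(day, month, year)


class EuropeanDate(Date):  # hypothetical subclass, for illustration only
    def display(self):
        return "%02d.%02d.%04d" % (self.day, self.month, self.year)


# from_string is inherited and builds an instance of the subclass.
date3 = EuropeanDate.from_string("11-09-2012")
is_subclass_instance = isinstance(date3, EuropeanDate)
formatted = date3.display()
```

Because from_string calls cls(...) rather than Date(...), the subclass gets a working alternative constructor without redefining anything.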

Static method

What about staticmethod? It's pretty similar to classmethod but doesn't take any obligatory parameters (like a class method or instance method does).

Let's look at the next use case.

We have a date string that we want to validate somehow. This task is also logically bound to the Date class we've used so far, but doesn't require instantiation of it.

Here is where staticmethod can be useful. Let's look at the next piece of code:

    @staticmethod
    def is_date_valid(date_as_string):
        day, month, year = map(int, date_as_string.split("-"))
        return day <= 31 and month <= 12 and year <= 3999

# usage:
is_date = Date.is_date_valid("11-09-2012")

So, as we can see from the usage of staticmethod, we don't have any access to what the class is. It's basically just a function, called syntactically like a method, but without access to the object and its internals (fields and other methods). A classmethod, by contrast, does receive the class.
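
One caveat about the validation itself: the check above accepts impossible dates such as "31-02-2012", as well as zero or negative numbers. A stricter variant, sketched here as one possible approach and not as part of the original answer, delegates the range checks to the standard library's datetime.date, which raises ValueError for invalid dates.

```python
import datetime


class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year

    @staticmethod
    def is_date_valid(date_as_string):
        try:
            day, month, year = map(int, date_as_string.split("-"))
            # datetime.date raises ValueError for out-of-range values.
            datetime.date(year, month, day)
        except ValueError:
            # Covers non-numeric parts, a wrong number of parts,
            # and dates that do not exist in the calendar.
            return False
        return True


valid = Date.is_date_valid("11-09-2012")
invalid_day = Date.is_date_valid("31-02-2012")
not_a_date = Date.is_date_valid("abc")
```

The method still needs neither self nor cls, so staticmethod remains a reasonable fit.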

Answer #2

Rostyslav Dzinko's answer is very appropriate. I thought I could highlight one other reason you should choose @classmethod over @staticmethod when you are creating an additional constructor.

In the example above, Rostyslav used the @classmethod from_string as a Factory to create Date objects from otherwise unacceptable parameters. The same can be done with @staticmethod as is shown in the code below:

class Date:
  def __init__(self, month, day, year):
    self.month = month
    self.day   = day
    self.year  = year


  def display(self):
    return "{0}-{1}-{2}".format(self.month, self.day, self.year)


  @staticmethod
  def millenium(month, day):
    return Date(month, day, 2000)

new_year = Date(1, 1, 2013)               # Creates a new Date object
millenium_new_year = Date.millenium(1, 1) # also creates a Date object. 

# Proof:
new_year.display()           # "1-1-2013"
millenium_new_year.display() # "1-1-2000"

isinstance(new_year, Date) # True
isinstance(millenium_new_year, Date) # True

Thus both new_year and millenium_new_year are instances of the Date class.

But, if you observe closely, the Factory process is hard-coded to create Date objects no matter what. What this means is that even if the Date class is subclassed, the subclasses will still create plain Date objects (without any properties of the subclass). See that in the example below:

class DateTime(Date):
  def display(self):
      return "{0}-{1}-{2} - 00:00:00PM".format(self.month, self.day, self.year)


datetime1 = DateTime(10, 10, 1990)
datetime2 = DateTime.millenium(10, 10)

isinstance(datetime1, DateTime) # True
isinstance(datetime2, DateTime) # False

datetime1.display() # returns "10-10-1990 - 00:00:00PM"
datetime2.display() # returns "10-10-2000" because it's not a DateTime object but a Date object. Check the implementation of the millenium method on the Date class for more details.

datetime2 is not an instance of DateTime? WTF? Well, that's because of the @staticmethod decorator used.

In most cases, this is undesired. If what you want is a Factory method that is aware of the class that called it, then @classmethod is what you need.

Rewriting Date.millenium as (that's the only part of the above code that changes):

@classmethod
def millenium(cls, month, day):
    return cls(month, day, 2000)

ensures that the class is not hard-coded but rather learnt. cls can be any subclass. The resulting object will rightly be an instance of cls.
Let's test that out:

datetime1 = DateTime(10, 10, 1990)
datetime2 = DateTime.millenium(10, 10)

isinstance(datetime1, DateTime) # True
isinstance(datetime2, DateTime) # True


datetime1.display() # "10-10-1990 - 00:00:00PM"
datetime2.display() # "10-10-2000 - 00:00:00PM"

The reason is, as you know by now, that @classmethod was used instead of @staticmethod.

Answer #3

@classmethod means: when this method is called, we pass the class as the first argument instead of the instance of that class (as we normally do with methods). This means you can use the class and its properties inside that method rather than a particular instance.

@staticmethod means: when this method is called, we don't pass an instance of the class to it (as we normally do with methods). This means you can put a function inside a class but you can't access the instance of that class (this is useful when your method does not use the instance).
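
A minimal sketch of the two definitions above, with each method simply returning whatever Python passes in implicitly. Demo and its method names are made up for the illustration.

```python
class Demo(object):

    def instance_method(self):
        # Receives the instance the method was called on.
        return self

    @classmethod
    def class_method(cls):
        # Receives the class, even when called through an instance.
        return cls

    @staticmethod
    def static_method():
        # Receives nothing implicitly.
        return "no implicit argument"


obj = Demo()
got_instance = obj.instance_method() is obj
got_class_from_instance = obj.class_method() is Demo
got_class_from_class = Demo.class_method() is Demo
static_result = Demo.static_method()
```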

How can I open multiple files using "with open" in Python?

5 answers

I want to change a couple of files at one time, iff I can write to all of them. I'm wondering if I somehow can combine the multiple open calls with the with statement:

try:
  with open("a", "w") as a and open("b", "w") as b:
    do_something()
except IOError as e:
  print "Operation failed: %s" % e.strerror

If that's not possible, what would an elegant solution to this problem look like?

788

Answer #1

As of Python 2.7 (or 3.1 respectively) you can write

with open("a", "w") as a, open("b", "w") as b:
    do_something()

In earlier versions of Python, you can sometimes use contextlib.nested() to nest context managers. This won't work as expected for opening multiple files, though -- see the linked documentation for details.
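
The reason contextlib.nested() falls short is that both open() calls run before nested() takes control, so if the second call fails, the first file is never closed. On versions without the comma syntax, plain nested with statements are a safe fallback. The sketch below uses a temporary directory and made-up file names so that it can run anywhere.

```python
import os
import tempfile

directory = tempfile.mkdtemp()
path_a = os.path.join(directory, "a")
path_b = os.path.join(directory, "b")

# Nested with statements: if opening "b" fails, "a" is still closed properly.
with open(path_a, "w") as a:
    with open(path_b, "w") as b:
        a.write("first\n")
        b.write("second\n")

both_closed = a.closed and b.closed
```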


In the rare case that you want to open a variable number of files all at the same time, you can use contextlib.ExitStack, starting from Python version 3.3:

from contextlib import ExitStack

with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # Do something with "files"

Most of the time you have a variable set of files, you likely want to open them one after the other, though.
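
Both patterns can be sketched side by side. The file names and the temporary directory are assumptions made so the example is self-contained; the first loop handles one file at a time, the second block keeps all of them open at once and closes every one when the ExitStack exits.

```python
import os
import tempfile
from contextlib import ExitStack

directory = tempfile.mkdtemp()
filenames = [os.path.join(directory, name) for name in ("a.txt", "b.txt", "c.txt")]

# One after the other: each file is closed before the next one is opened.
for fname in filenames:
    with open(fname, "w") as handle:
        handle.write(os.path.basename(fname) + "\n")

# All at the same time: ExitStack closes every file when the block ends,
# even if one of the later open() calls raises.
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    first_lines = [f.readline().strip() for f in files]

all_closed = all(f.closed for f in files)
```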

open() in Python does not create a file if it doesn't exist

5 answers

What is the best way to open a file as read/write if it exists, or if it does not, then create it and open it as read/write? From what I read, file = open("myfile.dat", "rw") should do this, right?

It is not working for me (Python 2.6.2) and I'm wondering if it is a version problem, or not supposed to work like that or what.

The bottom line is, I just need a solution for the problem. I am curious about the other stuff, but all I need is a nice way to do the opening part.

The enclosing directory was writeable by user and group, not other (I'm on a Linux system... so permissions 775 in other words), and the exact error was:

IOError: no such file or directory.

778

Answer #1

You should use open with the w+ mode:

file = open("myfile.dat", "w+")
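
Keep in mind that w+ truncates an existing file, so it only fits the question if losing the old contents is acceptable. "rw" is not one of the documented mode strings. If the contents must survive, one possible approach, sketched here with a hypothetical helper named open_read_write, is to let os.open create the file when it is missing and then wrap the descriptor in a regular file object.

```python
import os
import tempfile


def open_read_write(path):
    """Open path for reading and writing; create it if missing, keep contents if present."""
    # O_CREAT creates the file only when it does not exist; nothing is truncated.
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    return os.fdopen(fd, "r+")


path = os.path.join(tempfile.mkdtemp(), "myfile.dat")

with open_read_write(path) as f:   # file did not exist: it is created
    f.write("hello")

with open_read_write(path) as f:   # file exists: contents are preserved
    preserved = f.read()

with open(path, "w+") as f:        # "w+" empties the existing file
    after_w_plus = f.read()
```

The a+ mode also creates a missing file without truncating it, but every write then goes to the end of the file.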

Difference between modes a, a+, w, w+, and r+ in built-in open function?

5 answers

In the python built-in open function, what is the exact difference between the modes w, a, w+, a+, and r+?

In particular, the documentation implies that all of these will allow writing to the file, and says that it opens the files for "appending", "writing", and "updating" specifically, but does not define what these terms mean.

721

Answer #1

The opening modes are exactly the same as those for the C standard library function fopen().

The BSD fopen manpage defines them as follows:

 The argument mode points to a string beginning with one of the following
 sequences (Additional characters may follow these sequences.):

 ``r''   Open text file for reading.  The stream is positioned at the
         beginning of the file.

 ``r+''  Open for reading and writing.  The stream is positioned at the
         beginning of the file.

 ``w''   Truncate file to zero length or create text file for writing.
         The stream is positioned at the beginning of the file.

 ``w+''  Open for reading and writing.  The file is created if it does not
         exist, otherwise it is truncated.  The stream is positioned at
         the beginning of the file.

 ``a''   Open for writing.  The file is created if it does not exist.  The
         stream is positioned at the end of the file.  Subsequent writes
         to the file will always end up at the then current end of file,
         irrespective of any intervening fseek(3) or similar.

 ``a+''  Open for reading and writing.  The file is created if it does not
         exist.  The stream is positioned at the end of the file.
         Subsequent writes to the file will always end up at the then
         current end of file, irrespective of any intervening fseek(3) or
         similar.
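
The practical differences are easy to observe on a scratch file. This is a small sketch that uses a temporary path chosen for the example; the variable names are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:       # "w": create (or truncate) and write
    f.write("hello")

with open(path, "a") as f:       # "a": writes always land at the end
    f.write(" world")

with open(path, "r+") as f:      # "r+": no truncation, starts at the beginning
    f.write("J")                 # overwrites the first character only
with open(path) as f:
    after_r_plus = f.read()

with open(path, "a+") as f:      # "a+": readable after seeking back...
    f.seek(0)
    seen_by_a_plus = f.read()
    f.write("!")                 # ...but writes are still appended
with open(path) as f:
    after_a_plus = f.read()

with open(path, "w+") as f:      # "w+": readable, but the file was just emptied
    after_w_plus = f.read()
```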
