Writing Unicode text to a text file?

| | |

👻 Check our latest review to choose the best laptop for Machine Learning engineers and Deep learning tasks!

I"m pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a Wordpress page).

It has some non-ASCII symbols. How can I convert these safely to symbols that can be used in HTML source?

Currently I"m converting everything to Unicode on the way in, joining it all together in a Python string, then doing:

import codecs
f = codecs.open("out.txt", mode="w", encoding="iso-8859-1")
f.write(all_html.encode("iso-8859-1", "replace"))

There is an encoding error on the last line:

UnicodeDecodeError: "ascii" codec can"t decode byte 0xa0 in position 12286: ordinal not in range(128)

Partial solution:

This Python runs without an error:

row = [unicode(x.strip()) if x is not None else u"" for x in row]
all_html = row[0] + "<br/>" + row[1]
f = open("out.txt", "w")
f.write(all_html.encode("utf-8"))

But then if I open the actual text file, I see lots of symbols like:

Qur’an 

Maybe I need to write to something other than a text file?

👻 Read also: what is the best laptop for engineering students?

Writing Unicode text to a text file? join: Questions

Why is it string.join(list) instead of list.join(string)?

5 answers

Evan Fosmark By Evan Fosmark

This has always confused me. It seems like this would be nicer:

my_list = ["Hello", "world"]
print(my_list.join("-"))
# Produce: "Hello-world"

Than this:

my_list = ["Hello", "world"]
print("-".join(my_list))
# Produce: "Hello-world"

Is there a specific reason it is like this?

1906

Answer #1

It"s because any iterable can be joined (e.g, list, tuple, dict, set), but its contents and the "joiner" must be strings.

For example:

"_".join(["welcome", "to", "stack", "overflow"])
"_".join(("welcome", "to", "stack", "overflow"))
"welcome_to_stack_overflow"

Using something other than strings will raise the following error:

TypeError: sequence item 0: expected str instance, int found

1906

Answer #2

This was discussed in the String methods... finally thread in the Python-Dev achive, and was accepted by Guido. This thread began in Jun 1999, and str.join was included in Python 1.6 which was released in Sep 2000 (and supported Unicode). Python 2.0 (supported str methods including join) was released in Oct 2000.

  • There were four options proposed in this thread:
    • str.join(seq)
    • seq.join(str)
    • seq.reduce(str)
    • join as a built-in function
  • Guido wanted to support not only lists and tuples, but all sequences/iterables.
  • seq.reduce(str) is difficult for newcomers.
  • seq.join(str) introduces unexpected dependency from sequences to str/unicode.
  • join() as a built-in function would support only specific data types. So using a built-in namespace is not good. If join() supports many datatypes, creating an optimized implementation would be difficult, if implemented using the __add__ method then it would ve O(n¬≤).
  • The separator string (sep) should not be omitted. Explicit is better than implicit.

Here are some additional thoughts (my own, and my friend"s):

  • Unicode support was coming, but it was not final. At that time UTF-8 was the most likely about to replace UCS2/4. To calculate total buffer length of UTF-8 strings it needs to know character coding rule.
  • At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn"t support extending built-in types until 2.2. At that time it was difficult to provide basic iterable class (which is mentioned in another comment).

Guido"s decision is recorded in a historical mail, deciding on str.join(seq):

Funny, but it does seem right! Barry, go for it...
Guido van Rossum

1906

Answer #3

Because the join() method is in the string class, instead of the list class?

I agree it looks funny.

See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:

Historical note. When I first learned Python, I expected join to be a method of a list, which would take the delimiter as an argument. Lots of people feel the same way, and there’s a story behind the join method. Prior to Python 1.6, strings didn’t have all these useful methods. There was a separate string module which contained all the string functions; each function took a string as its first argument. The functions were deemed important enough to put onto the strings themselves, which made sense for functions like lower, upper, and split. But many hard-core Python programmers objected to the new join method, arguing that it should be a method of the list instead, or that it shouldn’t move at all but simply stay a part of the old string module (which still has lots of useful stuff in it). I use the new join method exclusively, but you will see code written either way, and if it really bothers you, you can use the old string.join function instead.

--- Mark Pilgrim, Dive into Python

How can I open multiple files using "with open" in Python?

5 answers

I want to change a couple of files at one time, iff I can write to all of them. I"m wondering if I somehow can combine the multiple open calls with the with statement:

try:
  with open("a", "w") as a and open("b", "w") as b:
    do_something()
except IOError as e:
  print "Operation failed: %s" % e.strerror

If that"s not possible, what would an elegant solution to this problem look like?

788

Answer #1

As of Python 2.7 (or 3.1 respectively) you can write

with open("a", "w") as a, open("b", "w") as b:
    do_something()

In earlier versions of Python, you can sometimes use contextlib.nested() to nest context managers. This won"t work as expected for opening multiples files, though -- see the linked documentation for details.


In the rare case that you want to open a variable number of files all at the same time, you can use contextlib.ExitStack, starting from Python version 3.3:

with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # Do something with "files"

Most of the time you have a variable set of files, you likely want to open them one after the other, though.

open() in Python does not create a file if it doesn"t exist

5 answers

What is the best way to open a file as read/write if it exists, or if it does not, then create it and open it as read/write? From what I read, file = open("myfile.dat", "rw") should do this, right?

It is not working for me (Python 2.6.2) and I"m wondering if it is a version problem, or not supposed to work like that or what.

The bottom line is, I just need a solution for the problem. I am curious about the other stuff, but all I need is a nice way to do the opening part.

The enclosing directory was writeable by user and group, not other (I"m on a Linux system... so permissions 775 in other words), and the exact error was:

IOError: no such file or directory.

778

Answer #1

You should use open with the w+ mode:

file = open("myfile.dat", "w+")

Difference between modes a, a+, w, w+, and r+ in built-in open function?

5 answers

In the python built-in open function, what is the exact difference between the modes w, a, w+, a+, and r+?

In particular, the documentation implies that all of these will allow writing to the file, and says that it opens the files for "appending", "writing", and "updating" specifically, but does not define what these terms mean.

721

Answer #1

The opening modes are exactly the same as those for the C standard library function fopen().

The BSD fopen manpage defines them as follows:

 The argument mode points to a string beginning with one of the following
 sequences (Additional characters may follow these sequences.):

 ``r""   Open text file for reading.  The stream is positioned at the
         beginning of the file.

 ``r+""  Open for reading and writing.  The stream is positioned at the
         beginning of the file.

 ``w""   Truncate file to zero length or create text file for writing.
         The stream is positioned at the beginning of the file.

 ``w+""  Open for reading and writing.  The file is created if it does not
         exist, otherwise it is truncated.  The stream is positioned at
         the beginning of the file.

 ``a""   Open for writing.  The file is created if it does not exist.  The
         stream is positioned at the end of the file.  Subsequent writes
         to the file will always end up at the then current end of file,
         irrespective of any intervening fseek(3) or similar.

 ``a+""  Open for reading and writing.  The file is created if it does not
         exist.  The stream is positioned at the end of the file.  Subse-
         quent writes to the file will always end up at the then current
         end of file, irrespective of any intervening fseek(3) or similar.

We hope this article has helped you to resolve the problem. Apart from Writing Unicode text to a text file?, check other join-related topics.

Want to excel in Python? See our review of the best Python online courses 2022. If you are interested in Data Science, check also how to learn programming in R.

By the way, this material is also available in other languages:



Frank Sikorski

Rome | 2022-12-03

I was preparing for my coding interview, thanks for clarifying this - Writing Unicode text to a text file? in Python is not the simplest one. Checked yesterday, it works!

Chen Porretti

Prague | 2022-12-03

Simply put and clear. Thank you for sharing. Writing Unicode text to a text file? and other issues with open was always my weak point 😁. Will use it in my bachelor thesis

Frank Galleotti

Warsaw | 2022-12-03

Thanks for explaining! I was stuck with Writing Unicode text to a text file? for some hours, finally got it done 🤗. Will get back tomorrow with feedback

Shop

Learn programming in R: courses

$

Best Python online courses for 2022

$

Best laptop for Fortnite

$

Best laptop for Excel

$

Best laptop for Solidworks

$

Best laptop for Roblox

$

Best computer for crypto mining

$

Best laptop for Sims 4

$

Latest questions

NUMPYNUMPY

Common xlabel/ylabel for matplotlib subplots

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

12 answers

NUMPYNUMPY

Flake8: Ignore specific warning for entire file

12 answers

NUMPYNUMPY

glob exclude pattern

12 answers

NUMPYNUMPY

How to avoid HTTP error 429 (Too Many Requests) python

12 answers

NUMPYNUMPY

Python CSV error: line contains NULL byte

12 answers

NUMPYNUMPY

csv.Error: iterator should return strings, not bytes

12 answers

News


Wiki

Python | How to copy data from one Excel sheet to another

Common xlabel/ylabel for matplotlib subplots

Check if one list is a subset of another in Python

sin

How to specify multiple return types using type-hints

exp

Printing words vertically in Python

exp

Python Extract words from a given string

Cyclic redundancy check in Python

Finding mean, median, mode in Python without libraries

cos

Python add suffix / add prefix to strings in a list

Why do I get "Pickle - EOFError: Ran out of input" reading an empty file?

Python - Move item to the end of the list

Python - Print list vertically