Split Strings into words with multiple word boundary delimiters

I think what I want to do is a fairly common task, but I've found no reference on the web. I have text with punctuation, and I want a list of the words.

"Hey, you - what are you doing here!?"

should be

["hey", "you", "what", "are", "you", "doing", "here"]

But Python's str.split() accepts only one separator, so after splitting on whitespace every word still has its punctuation attached. Any ideas?
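
One possible approach, offered as a minimal sketch rather than an answer taken from this thread, is to match runs of word characters with re.findall instead of splitting on delimiters. The helper name split_words is hypothetical.

```python
import re

def split_words(text):
    """Return the lowercase words in text, ignoring punctuation and whitespace."""
    # \w+ matches runs of letters, digits, and underscores, so every other
    # character effectively acts as a delimiter.
    return re.findall(r"\w+", text.lower())

print(split_words("Hey, you - what are you doing here!?"))
# ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
```

Note that this pattern also splits on apostrophes, so "don't" becomes "don" and "t"; a different pattern would be needed if contractions should stay whole.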

Split Strings into words with multiple word boundary delimiters __del__: Questions

How can I make a time delay in Python?

5 answers

I would like to know how to put a time delay in a Python script.


Answer #1

import time
time.sleep(5)   # Delays for 5 seconds. You can also use a float value.

Here is another example where something is run approximately once a minute:

import time
while True:
    print("This prints once a minute.")
    time.sleep(60) # Delay for 1 minute (60 seconds).


Answer #2

You can use the sleep() function in the time module. It can take a float argument for sub-second resolution.

from time import sleep
sleep(0.1) # Time in seconds
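
The loop in Answer #1 runs only approximately once a minute because the time spent in the loop body is added on top of each delay. One way to reduce that drift, sketched here with a hypothetical helper named run_every, is to sleep until the next scheduled deadline measured with time.monotonic():

```python
import time

def run_every(interval, task, repeats):
    """Call task() `repeats` times, aiming for one call every `interval` seconds."""
    next_deadline = time.monotonic()
    for _ in range(repeats):
        task()
        next_deadline += interval
        # Sleep only for the time remaining until the next deadline, so the
        # duration of task() does not accumulate as drift.
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)

calls = []
run_every(0.01, lambda: calls.append(time.monotonic()), repeats=3)
print(len(calls))  # 3
```

The short 0.01 second interval is only there to keep the example quick; for the once-a-minute case the interval would be 60.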

How to delete a file or folder in Python?

5 answers

How do I delete a file or folder in Python?


Answer #1

Path objects from the Python 3.4+ pathlib module also expose these instance methods:
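
The methods themselves are not listed here. As an illustration only, the sketch below uses two pathlib methods documented in the standard library, Path.unlink() for files and Path.rmdir() for empty directories. The temporary directory and the file name are made up for the example.

```python
import tempfile
from pathlib import Path

# Create a throwaway directory containing one file (names are illustrative).
folder = Path(tempfile.mkdtemp())
file_path = folder / "example.txt"
file_path.write_text("temporary data")

file_path.unlink()   # Delete the file.
folder.rmdir()       # Delete the directory; it must be empty.

print(file_path.exists(), folder.exists())  # False False
```

Path.rmdir() raises an error if the directory still has contents; shutil.rmtree() from the standard library removes a directory together with everything inside it.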

Best way to strip punctuation from a string

3 answers

By Lawrence Johnston

It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?"  # Sample string
out = s.translate(string.maketrans("", ""), string.punctuation)  # Python 2 only

Is there?


Answer #1

From an efficiency perspective, you're not going to beat

s.translate(None, string.punctuation)

That form works only in Python 2. In Python 3, use:

s.translate(str.maketrans("", "", string.punctuation))

It's performing raw string operations in C with a lookup table; there's not much that will beat that short of writing your own C code.
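
To make the lookup table concrete, the following sketch (Python 3) inspects what str.maketrans builds: a dictionary from character code points to None, which str.translate treats as "delete this character". The sample string is the one from the question.

```python
import string

# Map every punctuation character to None, i.e. mark it for deletion.
table = str.maketrans("", "", string.punctuation)

print(table[ord(".")])                        # None
print(len(table) == len(string.punctuation))  # True

s = "string. With. Punctuation?"
print(s.translate(table))  # string With Punctuation
```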

If speed isn't a worry, another option is:

exclude = set(string.punctuation)
s = "".join(ch for ch in s if ch not in exclude)

This is faster than calling s.replace for each character, but won't perform as well as non-pure-Python approaches such as regexes or string.translate, as the timings below show. For this type of problem, doing it at as low a level as possible pays off.

Timing code:

# Ported to Python 3; the timings quoted below were measured with the
# original Python 2 version of this script.
import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = str.maketrans("", "", string.punctuation)  # Deletes all punctuation
regex = re.compile("[%s]" % re.escape(string.punctuation))

def test_set(s):
    return "".join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub("", s)

def test_trans(s):
    return s.translate(table)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

print("sets      :", timeit.Timer("f(s)", "from __main__ import s,test_set as f").timeit(1000000))
print("regex     :", timeit.Timer("f(s)", "from __main__ import s,test_re as f").timeit(1000000))
print("translate :", timeit.Timer("f(s)", "from __main__ import s,test_trans as f").timeit(1000000))
print("replace   :", timeit.Timer("f(s)", "from __main__ import s,test_repl as f").timeit(1000000))

This gave the following results (total seconds for 1,000,000 calls each):

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

Remove all special characters, punctuation and spaces from string

3 answers

I need to remove all special characters, punctuation and spaces from a string so that I only have letters and numbers.


Answer #1

This can be done without regex:

>>> string = "Special $#! characters   spaces 888323"
>>> "".join(e for e in string if e.isalnum())
'Specialcharactersspaces888323'

You can use str.isalnum:

S.isalnum() -> bool

Return True if all characters in S are alphanumeric
and there is at least one character in S, False otherwise.

If you insist on using regex, other solutions will do fine. However, note that if it can be done without using a regular expression, that's the best way to go about it.


Answer #2

Here is a regex to match a string of characters that are not letters or numbers:

[^A-Za-z0-9]+

Here is the Python command to do a regex substitution:

import re
re.sub("[^A-Za-z0-9]+", "", mystring)
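
The two answers are not fully equivalent. As a small sketch with a made-up sample string, str.isalnum() accepts any Unicode letter or digit, while the character class [^A-Za-z0-9] keeps only ASCII letters and digits:

```python
import re

sample = "café 123!"  # Illustrative input containing a non-ASCII letter

# Keeps every character that Unicode classifies as a letter or digit.
isalnum_result = "".join(ch for ch in sample if ch.isalnum())

# Keeps only the ASCII ranges A-Z, a-z, and 0-9.
regex_result = re.sub("[^A-Za-z0-9]+", "", sample)

print(isalnum_result)  # café123
print(regex_result)    # caf123
```

Which behavior is wanted depends on whether accented and non-Latin letters count as "letters" for the task at hand.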

