I have two lists:
- a list of about 750K "sentences" (long strings)
- a list of about 20K "words" that I would like to delete from my 750K sentences
So, I have to loop through 750K sentences and perform about 20K replacements, but ONLY if my words are actually "words" and are not part of a larger string of characters.
I am doing this by pre-compiling my words so that they are flanked by the word-boundary metacharacter:
compiled_words = [re.compile(r"\b" + word + r"\b") for word in my20000words]
Then I loop through my "sentences":
import re
for sentence in sentences:
    for word in compiled_words:
        sentence = re.sub(word, "", sentence)
    # put sentence into a growing list
This nested loop is processing about 50 sentences per second, which is nice, but it still takes several hours to process all of my sentences.
Is there a way to use the str.replace method (which I believe is faster), but still require that replacements only happen at word boundaries? Alternatively, is there a way to speed up the re.sub method? I have already improved the speed marginally by skipping re.sub if the length of my word is greater than the length of my sentence, but it's not much of an improvement.
I'm using Python 3.5.2.
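One possible direction for the question above, sketched under an assumption that is not stated in the post: every entry in the word list consists only of word characters (letters, digits, underscore). In that case each maximal `\w+` run is already delimited by word boundaries, so a single pass per sentence with a set lookup should delete the same text as 20K separate `\b...\b` substitutions. The names `build_word_remover` and `remove_words` are hypothetical, and any speed-up should be measured on the real data.

```python
import re

def build_word_remover(words):
    """Return a function that deletes whole-word occurrences of `words`.

    Assumes every word is made up only of \\w characters.
    """
    banned = set(words)
    # \w+ matches maximal runs of word characters, so each match
    # starts and ends at a word boundary.
    token = re.compile(r"\w+")

    def remove(sentence):
        # Replace a token with "" only when it is in the banned set.
        return token.sub(
            lambda m: "" if m.group(0) in banned else m.group(0),
            sentence,
        )

    return remove

remove_words = build_word_remover(["cat", "dog"])
sentences = ["the cat sat", "a dogma and a dog", "concatenate"]
cleaned = [remove_words(s) for s in sentences]
print(cleaned)
```

As with the original `re.sub` loop, surrounding whitespace is left in place, so removing a word from the middle of a sentence leaves two adjacent spaces.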
Speed up millions of regex replacements in Python 3 __del__: Questions
How can I make a time delay in Python?
5 answers
I would like to know how to put a time delay in a Python script.
Answer #1
import time
time.sleep(5) # Delays for 5 seconds. You can also use a float value.
Here is another example where something is run approximately once a minute:
import time
while True:
    print("This prints once a minute.")
    time.sleep(60) # Delay for 1 minute (60 seconds).
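The loop above runs only approximately once a minute because the time spent inside the loop body is added to each 60-second sleep. A minimal sketch of one way to reduce that drift, assuming a fixed schedule is wanted: track the next scheduled time with `time.monotonic()` and sleep only for the remainder. The helper name `run_periodically` and its parameters are hypothetical, and a short interval is used so the example finishes quickly.

```python
import time

def run_periodically(task, interval, iterations):
    """Call `task` every `interval` seconds, `iterations` times."""
    next_run = time.monotonic()
    for _ in range(iterations):
        task()
        next_run += interval
        # Sleep only for the time left until the next scheduled run.
        time.sleep(max(0.0, next_run - time.monotonic()))

calls = []
run_periodically(lambda: calls.append(time.monotonic()), interval=0.05, iterations=3)
print(len(calls))
```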
Answer #2
You can use the sleep() function in the time module. It can take a float argument for sub-second resolution.
from time import sleep
sleep(0.1) # Time in seconds
How to delete a file or folder in Python?
5 answers
How do I delete a file or folder in Python?
Answer #1
- os.remove() removes a file.
- os.rmdir() removes an empty directory.
- shutil.rmtree() deletes a directory and all its contents.
Path objects from the Python 3.4+ pathlib module also expose these instance methods:
- pathlib.Path.unlink() removes a file or symbolic link.
- pathlib.Path.rmdir() removes an empty directory.
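The functions listed above can be tried safely in a scratch directory. The following is a minimal sketch, assuming Python 3.6 or later (so that `os.remove` and `os.rmdir` accept `Path` objects); the file and directory names are made up for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Scratch directory so the example never touches real data.
base = Path(tempfile.mkdtemp())

# os.remove(): delete a single file.
file_a = base / "a.txt"
file_a.touch()
os.remove(file_a)

# os.rmdir(): delete a directory, which must be empty.
empty_dir = base / "empty"
empty_dir.mkdir()
os.rmdir(empty_dir)

# Path.unlink() and Path.rmdir(): the pathlib equivalents.
file_b = base / "b.txt"
file_b.touch()
file_b.unlink()
other_dir = base / "other"
other_dir.mkdir()
other_dir.rmdir()

# shutil.rmtree(): delete a directory together with everything inside it.
(base / "tree" / "nested").mkdir(parents=True)
shutil.rmtree(base)

print(base.exists())
```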
Answer #3
Python syntax to delete a file
import os
os.remove("/tmp/<file_name>.txt")
Or
import os
os.unlink("/tmp/<file_name>.txt")
Or
Using the pathlib library (Python version >= 3.4):
import pathlib
file_to_rem = pathlib.Path("/tmp/<file_name>.txt")
file_to_rem.unlink()
Path.unlink(missing_ok=False)
The unlink method removes the file or symbolic link.
If missing_ok is false (the default), FileNotFoundError is raised if the path does not exist.
If missing_ok is true, FileNotFoundError exceptions will be ignored (same behavior as the POSIX rm -f command).
Changed in version 3.8: The missing_ok parameter was added.
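A short sketch of the two `missing_ok` behaviours described above, assuming Python 3.8 or later. The path is a deliberately nonexistent file inside a fresh temporary directory, and the variable names are illustrative only.

```python
import tempfile
from pathlib import Path

# A path that is known not to exist.
scratch_dir = Path(tempfile.mkdtemp())
missing = scratch_dir / "does_not_exist.txt"

# With missing_ok=True the call returns silently.
missing.unlink(missing_ok=True)

# With the default (missing_ok=False) FileNotFoundError is raised.
try:
    missing.unlink()
    raised = False
except FileNotFoundError:
    raised = True

scratch_dir.rmdir()  # clean up the empty scratch directory
print(raised)
```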
Best practice
- First, check whether the file or folder exists, and only then delete it. This can be achieved in two ways:
a. os.path.isfile("/path/to/file")
b. Use exception handling.
EXAMPLE for os.path.isfile
#!/usr/bin/python
import os
myfile = "/tmp/foo.txt"
## If file exists, delete it ##
if os.path.isfile(myfile):
    os.remove(myfile)
else:  ## Show an error ##
    print("Error: %s file not found" % myfile)
Exception Handling
#!/usr/bin/env python3
import os
## Get input (input() replaces Python 2's raw_input()) ##
myfile = input("Enter file name to delete: ")
## Try to delete the file ##
try:
    os.remove(myfile)
except OSError as e:  ## if failed, report it back to the user ##
    print("Error: %s - %s." % (e.filename, e.strerror))
RESPECTIVE OUTPUT
Enter file name to delete : demo.txt
Error: demo.txt - No such file or directory.
Enter file name to delete : rrr.txt
Error: rrr.txt - Operation not permitted.
Enter file name to delete : foo.txt
Python syntax to delete a folder
shutil.rmtree()
Example for shutil.rmtree()
#!/usr/bin/env python3
import shutil
# Get directory name (input() replaces Python 2's raw_input())
mydir = input("Enter directory name: ")
## Try to remove tree; if failed show an error using try...except on screen
try:
    shutil.rmtree(mydir)
except OSError as e:
    print("Error: %s - %s." % (e.filename, e.strerror))
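The file and folder cases above can be combined into one non-interactive helper. This is only a sketch of the exception-handling approach, not part of the original answer: `delete_path` is a hypothetical name, and treating a missing path as "nothing to do" rather than an error is an assumption about the desired behaviour.

```python
import os
import shutil
import tempfile

def delete_path(path):
    """Delete a file, symlink, or directory tree.

    Returns True if something was removed, False if the path did not exist.
    Other OSError subclasses (e.g. PermissionError) still propagate.
    """
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    except FileNotFoundError:
        return False
    return True

# Demonstration inside a scratch directory.
scratch = tempfile.mkdtemp()
target_dir = os.path.join(scratch, "data")
os.makedirs(os.path.join(target_dir, "nested"))
target_file = os.path.join(scratch, "note.txt")
open(target_file, "w").close()

results = [
    delete_path(target_dir),   # directory tree
    delete_path(target_file),  # regular file
    delete_path(target_file),  # already gone
]
shutil.rmtree(scratch)
print(results)
```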
Is there a simple way to delete a list element by value?
5 answers
I want to remove a value from a list if it exists in the list (which it may not).
a = [1, 2, 3, 4]
b = a.index(6)
del a[b]
print(a)
The above case (in which it does not exist) shows the following error:
Traceback (most recent call last):
  File "D:\zjm_code\a.py", line 6, in <module>
b = a.index(6)
ValueError: list.index(x): x not in list
So I have to do this:
a = [1, 2, 3, 4]
try:
    b = a.index(6)
    del a[b]
except:
    pass
print(a)
But is there not a simpler way to do this?
Answer #1
To remove an element's first occurrence in a list, simply use list.remove:
>>> a = ["a", "b", "c", "d"]
>>> a.remove("b")
>>> print(a)
['a', 'c', 'd']
Mind that it does not remove all occurrences of your element. Use a list comprehension for that.
>>> a = [10, 20, 30, 40, 20, 30, 40, 20, 70, 20]
>>> a = [x for x in a if x != 20]
>>> print(a)
[10, 30, 40, 30, 40, 70]
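Since list.remove raises ValueError when the value is absent, the "may not exist" case from the question still needs a guard. A minimal sketch of one way to wrap it, catching only ValueError instead of using a bare except; the helper name `remove_if_present` is hypothetical.

```python
def remove_if_present(items, value):
    """Remove the first occurrence of `value` from `items` in place.

    Returns True if an element was removed, False if it was not found.
    """
    try:
        items.remove(value)
    except ValueError:
        return False
    return True

a = [1, 2, 3, 4]
missing_result = remove_if_present(a, 6)  # 6 is not in the list
found_result = remove_if_present(a, 3)    # 3 is in the list
print(a)
```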