This code takes a base URL, appends each /name path to it, opens the resulting page, and writes the matched string to test1.csv:
import urllib2
import re
import csv
url = ("http://www.example.com")
bios = [u"/name1", u"/name2", u"/name3"]
csvwriter = csv.writer(open("/test1.csv", "a"))
for l in bios:
    OpenThisLink = url + l
    response = urllib2.urlopen(OpenThisLink)
    html = response.read()
    item = re.search("(JD)(.*?)(\d+)", html)
    if item:
        JD = item.group()
        csvwriter.writerow(JD)
    else:
        NoJD = "NoJD"
        csvwriter.writerow(NoJD)
But I get this result:
J,D,",", ,C,o,l,u,m,b,i,a, ,L,a,w, ,S,c,h,o,o,l,....
If I change the string to ("JD", "Columbia Law School" ....) then I get
JD, Columbia Law School...)
I couldn't find in the documentation how to specify the delimiter.
If I try to use delimeter
I get this error:
TypeError: 'delimeter' is an invalid keyword argument for this function
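The output above is consistent with how writerow() treats its argument: it iterates over the row and writes one field per element, so a bare string is split into characters. The following is a minimal sketch in Python 3 (the question's code is Python 2), writing to an in-memory buffer instead of test1.csv. The helper name rows_to_csv and the sample text are assumptions made up for illustration.

```python
import csv
import io


def rows_to_csv(rows, **writer_kwargs):
    """Write rows to an in-memory buffer and return the CSV text."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, **writer_kwargs)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


jd = "JD, Columbia Law School"  # illustrative value, not real scraped data

# A bare string is iterated character by character: one field per character.
per_char = rows_to_csv([jd])

# Wrapping the string in a list makes it a single field.
single_field = rows_to_csv([[jd]])

# The keyword is spelled "delimiter"; here the fields are separated by ";".
semicolons = rows_to_csv([["JD", "Columbia Law School"]], delimiter=";")

print(per_char)
print(single_field)
print(semicolons)
```

In the question's loop, this would mean calling csvwriter.writerow([JD]) and csvwriter.writerow([NoJD]). The TypeError most likely comes from the misspelled keyword: the csv module expects delimiter, not delimeter.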
Why does csvwriter.writerow() put a comma after each character? __del__: Questions
How can I make a time delay in Python?
5 answers
I would like to know how to put a time delay in a Python script.
Answer #1
import time
time.sleep(5) # Delays for 5 seconds. You can also use a float value.
Here is another example where something is run approximately once a minute:
import time
while True:
    print("This prints once a minute.")
    time.sleep(60) # Delay for 1 minute (60 seconds).
Answer #2
You can use the sleep() function in the time module. It can take a float argument for sub-second resolution.
from time import sleep
sleep(0.1) # Time in seconds
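To see a sub-second delay in practice, one option is to time the call with a monotonic clock. This is only a sketch; the 0.05 second delay is an arbitrary value chosen so the example finishes quickly.

```python
import time

start = time.monotonic()
time.sleep(0.05)  # float argument: a delay of about 50 milliseconds
elapsed = time.monotonic() - start

# The measured delay is usually slightly longer than requested,
# because of scheduling and timer granularity.
print(f"slept for about {elapsed:.3f} s")
```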
How to delete a file or folder in Python?
5 answers
How do I delete a file or folder in Python?
Answer #1
- os.remove() removes a file.
- os.rmdir() removes an empty directory.
- shutil.rmtree() deletes a directory and all its contents.
Path objects from the Python 3.4+ pathlib module also expose these instance methods:
- pathlib.Path.unlink() removes a file or symbolic link.
- pathlib.Path.rmdir() removes an empty directory.
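The calls listed above can be tried safely inside a throwaway directory. This is a minimal sketch assuming Python 3.6 or later (so that os.remove() accepts Path objects); all file and directory names are made up for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Work inside a temporary directory so nothing real is deleted.
base = Path(tempfile.mkdtemp())

file_a = base / "a.txt"
file_a.write_text("scratch")
os.remove(file_a)              # remove a single file

empty_dir = base / "empty"
empty_dir.mkdir()
os.rmdir(empty_dir)            # remove an empty directory

nested = base / "tree" / "nested"
nested.mkdir(parents=True)
(nested / "b.txt").write_text("scratch")
shutil.rmtree(base / "tree")   # remove a directory and everything in it

file_c = base / "c.txt"
file_c.write_text("scratch")
file_c.unlink()                # pathlib counterpart of os.remove()
base.rmdir()                   # pathlib counterpart of os.rmdir(); base is empty now

print(base.exists())
```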
Why does csvwriter.writerow() put a comma after each character? find: Questions
Finding the index of an item in a list
5 answers
Given a list ["foo", "bar", "baz"] and an item in the list "bar", how do I get its index (1) in Python?
Answer #1
>>> ["foo", "bar", "baz"].index("bar")
1
Reference: Data Structures > More on Lists
Caveats follow
Note that while this is perhaps the cleanest way to answer the question as asked, index is a rather weak component of the list API, and I can't remember the last time I used it in anger. It's been pointed out to me in the comments that because this answer is heavily referenced, it should be made more complete. Some caveats about list.index follow. It is probably worth initially taking a look at the documentation for it:
list.index(x[, start[, end]])
Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError if there is no such item.
The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.
Linear time-complexity in list length
An index call checks every element of the list in order until it finds a match. If your list is long and you don't know roughly where in the list the item occurs, this search could become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000) is more than four orders of magnitude faster than straight l.index(999_999), because the former only has to search 10 entries, while the latter searches a million:
>>> import timeit
>>> timeit.timeit("l.index(999_999)", setup="l = list(range(0, 1_000_000))", number=1000)
9.356267921015387
>>> timeit.timeit("l.index(999_999, 999_990, 1_000_000)", setup="l = list(range(0, 1_000_000))", number=1000)
0.0004404920036904514
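Two points from this caveat can be illustrated on a small list: the index returned with a start/end hint is still relative to the full list, and a "different data structure" could be a dictionary built once for repeated lookups. This is only a sketch. The names letters and first_index are hypothetical, and the dictionary approach assumes the list items are hashable.

```python
letters = ["a", "b", "c", "b", "d"]

# Search only positions 2..4; the result is still an index into the full list.
second_b = letters.index("b", 2, 5)

# Hypothetical alternative for repeated lookups: build a value -> first index
# mapping once (linear time); each later lookup is a dictionary access.
first_index = {}
for i, value in enumerate(letters):
    first_index.setdefault(value, i)  # keep only the first position seen

print(second_b)          # position of the second "b"
print(first_index["b"])  # position of the first "b"
print(first_index["d"])
```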
Only returns the index of the first match to its argument
A call to index searches through the list in order until it finds a match, and stops there. If you expect to need indices of more matches, you should use a list comprehension, or generator expression.
>>> [1, 1].index(1)
0
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> next(g)
0
>>> next(g)
2
Most places where I once would have used index, I now use a list comprehension or generator expression because they're more generalizable. So if you're considering reaching for index, take a look at these excellent Python features.
Throws if element not present in list
A call to index results in a ValueError if the item's not present.
>>> [1, 1].index(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 2 is not in list
If the item might not be present in the list, you should either
- Check for it first with item in my_list (clean, readable approach), or
- Wrap the index call in a try/except block which catches ValueError (probably faster, at least when the list to search is long, and the item is usually present.)
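The two options above might look like the following sketch. Both helper names and the default value of -1 are assumptions chosen for illustration, not part of the list API.

```python
def index_or_default(items, target, default=-1):
    """Hypothetical helper: try index() and catch ValueError if absent."""
    try:
        return items.index(target)
    except ValueError:
        return default


def index_if_present(items, target, default=-1):
    """Hypothetical helper: check membership first, then call index().

    On a hit this scans the list twice: once for `in`, once for index().
    """
    return items.index(target) if target in items else default


names = ["foo", "bar", "baz"]
print(index_or_default(names, "bar"))
print(index_or_default(names, "qux"))
print(index_if_present(names, "baz"))
print(index_if_present(names, "qux"))
```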
Answer #2
One thing that is really helpful in learning Python is to use the interactive help function:
>>> help(["foo", "bar", "baz"])
Help on list object:
class list(object)
...
|
| index(...)
| L.index(value, [start, [stop]]) -> integer -- return first index of value
|
which will often lead you to the method you are looking for.
Answer #3
The majority of answers explain how to find a single index, but their methods do not return multiple indexes if the item is in the list multiple times. Use enumerate():
for i, j in enumerate(["foo", "bar", "baz"]):
    if j == "bar":
        print(i)
The index() function only returns the first occurrence, while iterating with enumerate() lets you collect every occurrence.
As a list comprehension:
[i for i, j in enumerate(["foo", "bar", "baz"]) if j == "bar"]
Here's also another small solution with itertools.count() (which is pretty much the same approach as enumerate):
from itertools import izip as zip, count # Python 2 only: izip is the lazy zip (in Python 3 the built-in zip is already lazy)
[i for i, j in zip(count(), ["foo", "bar", "baz"]) if j == "bar"]
In the Python 2 timings below, this is slightly faster than using enumerate() on a larger list:
$ python -m timeit -s "from itertools import izip as zip, count" "[i for i, j in zip(count(), ['foo', 'bar', 'baz']*500) if j == 'bar']"
10000 loops, best of 3: 174 usec per loop
$ python -m timeit "[i for i, j in enumerate(['foo', 'bar', 'baz']*500) if j == 'bar']"
10000 loops, best of 3: 196 usec per loop
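The itertools snippet above is written for Python 2. A rough Python 3 equivalent, where the built-in zip is already lazy, could look like the sketch below. The function name all_indexes is hypothetical, and the timings shown above should not be assumed to carry over to Python 3.

```python
from itertools import count


def all_indexes(items, target):
    """Hypothetical helper: return every position at which target occurs."""
    # zip(count(), items) pairs each item with 0, 1, 2, ... like enumerate().
    return [i for i, item in zip(count(), items) if item == target]


words = ["foo", "bar", "baz", "bar"]
print(all_indexes(words, "bar"))
print(all_indexes(words, "qux"))
```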