I have a dataframe with repeated values in column A. I want to drop duplicates, keeping the row with the highest value in column B.
So this:
A B
1 10
1 20
2 30
2 40
3 10
Should turn into this:
A B
1 20
2 40
3 10
Wes has added some nice functionality to drop duplicates: http://wesmckinney.com/blog/?p=340. But AFAICT, it's designed for exact duplicates, so there's no mention of criteria for selecting which rows get kept.
I'm guessing there's probably an easy way to do this (maybe as easy as sorting the dataframe before dropping duplicates), but I don't know groupby's internal logic well enough to figure it out. Any suggestions?
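The sorting idea raised in the question can be sketched as follows. This is a minimal example, not an answer taken from the thread: it assumes a recent pandas with sort_values and the keep argument of drop_duplicates, and the names df, by_sort and by_group are made up for illustration.

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({"A": [1, 1, 2, 2, 3], "B": [10, 20, 30, 40, 10]})

# Sort by B so the largest B for each A comes last, keep that last row,
# then restore the original row order.
by_sort = df.sort_values("B").drop_duplicates(subset="A", keep="last").sort_index()

# Alternative: look up the index label of the maximum B within each A group.
by_group = df.loc[df.groupby("A")["B"].idxmax()]

print(by_sort)
```

Both variants keep the original index labels of the surviving rows; call reset_index(drop=True) on the result if a fresh 0..n-1 index is preferred.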
python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B log: Questions
Python's equivalent of && (logical-and) in an if-statement
Here's my code:
def front_back(a, b):
    # +++your code here+++
    if len(a) % 2 == 0 && len(b) % 2 == 0:
        return a[:(len(a)/2)] + b[:(len(b)/2)] + a[(len(a)/2):] + b[(len(b)/2):]
    else:
        #todo! Not yet done. :P
        return
I'm getting an error in the if conditional.
What am I doing wrong?
Answer #1
You want the and keyword instead of &&.
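As a sketch, the function from the question with and in place of && might look like this. Swapping / for // is an assumption beyond the answer, made so the slice indices stay integers on Python 3; the odd-length branch is left unfinished, as in the question.

```python
def front_back(a, b):
    # "and" is Python's logical-and; "&&" is a syntax error.
    if len(a) % 2 == 0 and len(b) % 2 == 0:
        half_a = len(a) // 2  # integer division keeps the index an int
        half_b = len(b) // 2
        return a[:half_a] + b[:half_b] + a[half_a:] + b[half_b:]
    else:
        # Odd lengths are not handled in the original question either.
        return None


print(front_back("abcd", "wxyz"))  # abwxcdyz
```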
Answer #2
Python uses the and and or keywords in conditionals.
For example:
if foo == "abc" and bar == "bac" or zoo == "123":
    # do something
    pass
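One thing worth knowing when mixing the two keywords: and binds more tightly than or, so the condition above is read as (foo and bar tests) or (zoo test). The sketch below illustrates this; the helper name check and the sample values are made up for the example.

```python
def check(foo, bar, zoo):
    # No parentheses: "and" is evaluated before "or".
    implicit = foo == "abc" and bar == "bac" or zoo == "123"
    # The same grouping written out explicitly.
    grouped_and = (foo == "abc" and bar == "bac") or zoo == "123"
    # A different grouping, which can give a different answer.
    grouped_or = foo == "abc" and (bar == "bac" or zoo == "123")
    return implicit, grouped_and, grouped_or


print(check("nope", "bac", "123"))  # (True, True, False)
```

Adding explicit parentheses costs nothing and makes the intended grouping obvious to the reader.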
How do you get the logical xor of two variables in Python?
For example, I have two variables that I expect to be strings. I want to test that only one of them contains a True value (is not None or the empty string):
str1 = raw_input("Enter string one:")
str2 = raw_input("Enter string two:")
if logical_xor(str1, str2):
    print "ok"
else:
    print "bad"
The ^ operator seems to be bitwise, and not defined on all objects:
>>> 1 ^ 1
0
>>> 2 ^ 1
3
>>> "abc" ^ ""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ^: 'str' and 'str'
Answer #1
If you're already normalizing the inputs to booleans, then != is xor.
bool(a) != bool(b)
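Wrapped up as the logical_xor function the question asks for, this could look like the sketch below. It is written for Python 3 and uses fixed strings instead of user input so it can be run as is.

```python
def logical_xor(a, b):
    """Return True when exactly one of a and b is truthy."""
    return bool(a) != bool(b)


print(logical_xor("abc", ""))     # True
print(logical_xor("abc", "def"))  # False
print(logical_xor("", None))      # False
```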
Create list of single item repeated N times
I want to create a series of lists, all of varying lengths. Each list will contain the same element e, repeated n times (where n = length of the list).
How do I create the lists, without using a list comprehension [e for number in xrange(n)] for each list?
Answer #1
You can also write:
[e] * n
Note that if e is, for example, an empty list, you get a list with n references to the same list, not n independent empty lists.
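A short sketch of that pitfall, with variable names chosen for illustration. When independent copies of a mutable element are needed, a list comprehension is one common workaround.

```python
n = 3

# [e] * n repeats the same object, so every slot refers to one list.
shared = [[]] * n
shared[0].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# A comprehension evaluates [] once per slot, giving n separate lists.
independent = [[] for _ in range(n)]
independent[0].append("x")
print(independent)  # [['x'], [], []]
```

For immutable elements such as numbers or strings, the sharing is harmless and [e] * n is fine.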
Performance testing
At first glance it seems that repeat is the fastest way to create a list with n identical elements:
>>> timeit.timeit("itertools.repeat(0, 10)", "import itertools", number = 1000000)
0.37095273281943264
>>> timeit.timeit("[0] * 10", "import itertools", number = 1000000)
0.5577236771712819
But wait - it's not a fair test...
>>> itertools.repeat(0, 10)
repeat(0, 10) # Not a list!!!
The function itertools.repeat doesn't actually create the list, it just creates an object that can be used to create a list if you wish! Let's try that again, but converting to a list:
>>> timeit.timeit("list(itertools.repeat(0, 10))", "import itertools", number = 1000000)
1.7508119747063233
So if you want a list, use [e] * n. If you want to generate the elements lazily, use repeat.
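To repeat the comparison on your own machine, a script along these lines could be used. It is only a sketch: the iteration count is reduced to keep it quick, the timings dictionary is an illustrative name, and the numbers will differ from those quoted above depending on hardware and Python version.

```python
import itertools
import timeit

# repeat() is lazy; wrapping it in list() yields the same list as [0] * 10.
assert list(itertools.repeat(0, 10)) == [0] * 10

statements = (
    "itertools.repeat(0, 10)",        # builds only the lazy repeat object
    "[0] * 10",                       # builds the list directly
    "list(itertools.repeat(0, 10))",  # builds the list via repeat
)

timings = {}
for stmt in statements:
    timings[stmt] = timeit.timeit(stmt, setup="import itertools", number=100000)
    print(f"{stmt:32} {timings[stmt]:.4f} s")
```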
What is the best way to repeatedly execute a function every x seconds?
I want to repeatedly execute a function in Python every 60 seconds forever (just like an NSTimer in Objective C). This code will run as a daemon and is effectively like calling the python script every minute using a cron, but without requiring that to be set up by the user.
In this question about a cron implemented in Python, the solution appears to effectively just sleep() for x seconds. I don't need such advanced functionality, so perhaps something like this would work:
import time

while True:
    # Code executed here
    time.sleep(60)
Are there any foreseeable problems with this code?
Answer #1
If your program doesn't have an event loop already, use the sched module, which implements a general-purpose event scheduler.
import sched, time
s = sched.scheduler(time.time, time.sleep)
def do_something(sc):
    print("Doing stuff...")
    # do your stuff
    s.enter(60, 1, do_something, (sc,))

s.enter(60, 1, do_something, (s,))
s.run()
If you're already using an event loop library like asyncio, trio, tkinter, PyQt5, gobject, kivy, or many others, just schedule the task using your existing event loop library's methods instead.
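For asyncio, one way this might look is sketched below. The helper run_periodically and its iterations parameter are hypothetical, added so the example terminates; a long-running service would loop forever and use a 60-second interval.

```python
import asyncio


async def run_periodically(interval, func, iterations):
    """Call func(), then yield to the event loop for interval seconds."""
    for _ in range(iterations):
        func()
        # asyncio.sleep lets other tasks run while this one waits.
        await asyncio.sleep(interval)


calls = []
# A tiny interval and three iterations keep the demonstration short.
asyncio.run(run_periodically(0.01, lambda: calls.append("tick"), 3))
print(calls)  # ['tick', 'tick', 'tick']
```

In an application that already runs an event loop, the coroutine would typically be started with asyncio.create_task from within existing async code rather than with asyncio.run.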
Answer #2
Lock your time loop to the system clock like this:
import time
starttime = time.time()
while True:
    print("tick")
    # Sleep only for the remainder of the current 60-second interval.
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))
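The same idea can be wrapped in a reusable function, sketched here. Two things differ from the answer and are assumptions of this example: it uses time.monotonic() instead of time.time(), so adjustments to the system clock do not disturb the schedule, and the hypothetical count parameter bounds the loop so the example finishes.

```python
import time


def run_every(interval, func, count):
    """Call func() count times, aligned to multiples of interval seconds."""
    start = time.monotonic()
    for _ in range(count):
        func()
        # Sleep for what is left of the current interval, so the time
        # spent inside func() does not accumulate as drift.
        time.sleep(interval - ((time.monotonic() - start) % interval))


ticks = []
run_every(0.05, lambda: ticks.append(time.monotonic()), 3)
print(len(ticks))  # 3
```

If func() takes longer than one interval, the next call lands on the following interval boundary instead of running late, so some ticks are skipped.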