python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B

| | |

I have a dataframe with repeat values in column A. I want to drop duplicates, keeping the row with the highest value in column B.

So this:

A B
1 10
1 20
2 30
2 40
3 10

Should turn into this:

A B
1 20
2 40
3 10

Wes has added some nice functionality to drop duplicates: http://wesmckinney.com/blog/?p=340. But AFAICT, it"s designed for exact duplicates, so there"s no mention of criteria for selecting which rows get kept.

I"m guessing there"s probably an easy way to do this---maybe as easy as sorting the dataframe before dropping duplicates---but I don"t know groupby"s internal logic well enough to figure it out. Any suggestions?

python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B log: Questions

log

Python"s equivalent of && (logical-and) in an if-statement

5 answers

delete By delete

Here"s my code:

def front_back(a, b):
  # +++your code here+++
  if len(a) % 2 == 0 && len(b) % 2 == 0:
    return a[:(len(a)/2)] + b[:(len(b)/2)] + a[(len(a)/2):] + b[(len(b)/2):] 
  else:
    #todo! Not yet done. :P
  return

I"m getting an error in the IF conditional.
What am I doing wrong?

934

Answer #1

You would want and instead of &&.

934

Answer #2

Python uses and and or conditionals.

i.e.

if foo == "abc" and bar == "bac" or zoo == "123":
  # do something

log

How do you get the logical xor of two variables in Python?

5 answers

Zach Hirsch By Zach Hirsch

How do you get the logical xor of two variables in Python?

For example, I have two variables that I expect to be strings. I want to test that only one of them contains a True value (is not None or the empty string):

str1 = raw_input("Enter string one:")
str2 = raw_input("Enter string two:")
if logical_xor(str1, str2):
    print "ok"
else:
    print "bad"

The ^ operator seems to be bitwise, and not defined on all objects:

>>> 1 ^ 1
0
>>> 2 ^ 1
3
>>> "abc" ^ ""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ^: "str" and "str"
794

Answer #1

If you"re already normalizing the inputs to booleans, then != is xor.

bool(a) != bool(b)

python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B repeat: Questions

Create list of single item repeated N times

5 answers

I want to create a series of lists, all of varying lengths. Each list will contain the same element e, repeated n times (where n = length of the list).

How do I create the lists, without using a list comprehension [e for number in xrange(n)] for each list?

618

Answer #1

You can also write:

[e] * n

You should note that if e is for example an empty list you get a list with n references to the same list, not n independent empty lists.

Performance testing

At first glance it seems that repeat is the fastest way to create a list with n identical elements:

>>> timeit.timeit("itertools.repeat(0, 10)", "import itertools", number = 1000000)
0.37095273281943264
>>> timeit.timeit("[0] * 10", "import itertools", number = 1000000)
0.5577236771712819

But wait - it"s not a fair test...

>>> itertools.repeat(0, 10)
repeat(0, 10)  # Not a list!!!

The function itertools.repeat doesn"t actually create the list, it just creates an object that can be used to create a list if you wish! Let"s try that again, but converting to a list:

>>> timeit.timeit("list(itertools.repeat(0, 10))", "import itertools", number = 1000000)
1.7508119747063233

So if you want a list, use [e] * n. If you want to generate the elements lazily, use repeat.

What is the best way to repeatedly execute a function every x seconds?

5 answers

DavidM By DavidM

I want to repeatedly execute a function in Python every 60 seconds forever (just like an NSTimer in Objective C). This code will run as a daemon and is effectively like calling the python script every minute using a cron, but without requiring that to be set up by the user.

In this question about a cron implemented in Python, the solution appears to effectively just sleep() for x seconds. I don"t need such advanced functionality so perhaps something like this would work

while True:
    # Code executed here
    time.sleep(60)

Are there any foreseeable problems with this code?

391

Answer #1

If your program doesn"t have a event loop already, use the sched module, which implements a general purpose event scheduler.

import sched, time
s = sched.scheduler(time.time, time.sleep)
def do_something(sc): 
    print("Doing stuff...")
    # do your stuff
    s.enter(60, 1, do_something, (sc,))

s.enter(60, 1, do_something, (s,))
s.run()

If you"re already using an event loop library like asyncio, trio, tkinter, PyQt5, gobject, kivy, and many others - just schedule the task using your existing event loop library"s methods, instead.

391

Answer #2

Lock your time loop to the system clock like this:

import time
starttime = time.time()
while True:
    print "tick"
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))

Shop

Best laptop for Fortnite

$

Best laptop for Excel

$

Best laptop for Solidworks

$

Best laptop for Roblox

$

Best computer for crypto mining

$

Best laptop for Sims 4

$

Best laptop for Zoom

$499

Best laptop for Minecraft

$590

Latest questions

NUMPYNUMPY

psycopg2: insert multiple rows with one query

12 answers

NUMPYNUMPY

How to convert Nonetype to int or string?

12 answers

NUMPYNUMPY

How to specify multiple return types using type-hints

12 answers

NUMPYNUMPY

Javascript Error: IPython is not defined in JupyterLab

12 answers

News

Wiki

Python OpenCV | cv2.putText () method

numpy.arctan2 () in Python

Python | os.path.realpath () method

Python OpenCV | cv2.circle () method

Python OpenCV cv2.cvtColor () method

Python - Move item to the end of the list

time.perf_counter () function in Python

Check if one list is a subset of another in Python

Python os.path.join () method