Does pandas iterrows have performance issues?


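I have noticed very poor performance when using iterrows from pandas.

Is this something others have experienced? Is it specific to iterrows, and should this function be avoided for data of a certain size (I'm working with 2-3 million rows)?

This discussion on GitHub led me to believe it is caused by mixing dtypes in the dataframe, but the simple example below shows the slowdown even when using a single dtype (float64). This takes 36 seconds on my machine: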

    import pandas as pd
    import numpy as np
    import time

    # Two float64 columns with 2,000,000 rows each
    s1 = np.random.randn(2000000)
    s2 = np.random.randn(2000000)
    dfa = pd.DataFrame({"s1": s1, "s2": s2})

    # Time a loop that does nothing except iterate over the rows
    start = time.time()
    i = 0
    for rowindex, row in dfa.iterrows():
        i += 1
    end = time.time()
    print(end - start)
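One way to see what each iteration involves is to inspect what `iterrows` yields. The sketch below uses a small hypothetical frame (`df` with columns `a` and `b` are illustrative names, not part of the benchmark above). It suggests that every row comes back as a newly built `pd.Series`, and that a row spanning integer and float columns is upcast to a single dtype.

```python
import pandas as pd

# Hypothetical small frame: one integer column and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# iterrows yields (index_label, row) pairs; each row is a new Series.
index_label, row = next(df.iterrows())

row_type = type(row)        # a pandas Series, built for this one row
row_dtype = row.dtype       # integer and float values share one dtype here
column_dtypes = df.dtypes   # the frame itself keeps a dtype per column

print(row_type.__name__, row_dtype)
print(column_dtypes.to_dict())
```

Building one such object per row is work that a loop over 2,000,000 rows repeats 2,000,000 times, even when the loop body does nothing.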

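Why are vectorized operations like apply so much quicker? I imagine there must be some row-by-row iteration going on there too.

I cannot figure out how to avoid iterrows in my case (I'll save that for a future question). So I would appreciate hearing whether you have consistently been able to avoid this kind of iteration. I'm making calculations based on data in separate dataframes. Thank you!

---Edit: a simplified version of what I want to run has been added below---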
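The difference can be measured with a sketch like the one below. It assumes a reduced row count (`n_rows = 20_000`, an arbitrary choice so it finishes quickly) and uses hypothetical helper names (`timed`, `sum_with_iterrows`, `sum_with_apply`, `sum_with_columns`). Note that `apply` with `axis=1` still calls a Python function once per row; it is whole-column arithmetic that hands the loop to NumPy's compiled code. Timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Smaller than the 2,000,000 rows in the question so the sketch runs quickly.
n_rows = 20_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "s1": rng.standard_normal(n_rows),
    "s2": rng.standard_normal(n_rows),
})


def timed(func):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def sum_with_iterrows():
    # Builds one Series per row, then adds the two values in Python.
    return np.array([row["s1"] + row["s2"] for _, row in df.iterrows()])


def sum_with_apply():
    # apply(axis=1) also calls a Python function once per row.
    return df.apply(lambda row: row["s1"] + row["s2"], axis=1).to_numpy()


def sum_with_columns():
    # Whole-column arithmetic: the loop runs inside compiled NumPy code.
    return (df["s1"] + df["s2"]).to_numpy()


approaches = [
    ("iterrows", sum_with_iterrows),
    ("apply(axis=1)", sum_with_apply),
    ("column arithmetic", sum_with_columns),
]

results = {}
for name, func in approaches:
    values, seconds = timed(func)
    results[name] = values
    print(f"{name}: {seconds:.4f} s")
```

All three produce the same values; only the place where the loop runs differs.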

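    import pandas as pd
    import numpy as np

    #%% Create the original tables
    t1 = {"letter": ["a", "b"],
          "number1": [50, -10]}

    t2 = {"letter": ["a", "a", "b", "b"],
          "number2": [0.2, 0.5, 0.1, 0.4]}

    table1 = pd.DataFrame(t1)
    table2 = pd.DataFrame(t2)

    #%% Create the body of the new table
    table3 = pd.DataFrame(np.nan, columns=["letter", "number2"], index=[0])

    #%% Define optimization (placed before the loop so the script runs top to bottom)
    def optimize(t2info, t1info):
        calculation = []
        for index, r in t2info.iterrows():
            calculation.append(r["number2"] * t1info)
        # Position of the largest product; the first one wins on ties
        maxrow = calculation.index(max(calculation))
        return t2info.ix[maxrow]

    #%% Iterate through filtering relevant data, optimizing, returning info
    # Note: .ix was removed in pandas 1.0; current versions need .loc or .iloc
    for row_index, row in table1.iterrows():
        t2info = table2[table2.letter == row["letter"]].reset_index()
        table3.ix[row_index,] = optimize(t2info, row["number1"])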
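For this particular calculation, one possible way to avoid both `iterrows` loops is a merge followed by a grouped `idxmax`. This is only a sketch under the assumption that `optimize` should pick, for each letter, the `table2` row with the largest `number2 * number1`, as the code above does; the names `merged`, `calculation` and `best_rows` are illustrative.

```python
import pandas as pd

table1 = pd.DataFrame({"letter": ["a", "b"], "number1": [50, -10]})
table2 = pd.DataFrame({"letter": ["a", "a", "b", "b"],
                       "number2": [0.2, 0.5, 0.1, 0.4]})

# Attach number1 to every table2 row that shares the same letter.
merged = table2.merge(table1, on="letter", how="inner")

# The same product the optimize() loop computes one row at a time.
merged["calculation"] = merged["number2"] * merged["number1"]

# For each letter, the index label of the row with the largest product.
best_rows = merged.groupby("letter")["calculation"].idxmax()

# Keep only the winning rows and the two columns table3 is meant to hold.
table3 = merged.loc[best_rows, ["letter", "number2"]].reset_index(drop=True)
print(table3)
```

With the sample data this selects `number2 = 0.5` for letter `a` and `number2 = 0.1` for letter `b`, matching what the loop version picks.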
