Conversion Functions in Pandas DataFrame

Python is a great language for data analysis, thanks primarily to its ecosystem of data-centric packages. Pandas is one such package, and it makes importing and analyzing data much easier. The examples in this article use the "nba.csv" file.

Cast a pandas object to a specified type

The DataFrame.astype() function casts a pandas object to a specified dtype. astype() can also convert any suitable existing column to a categorical type.

Code #1: Convert the data type of the Weight column.

# import pandas
import pandas as pd

# Create a data frame from the CSV file
df = pd.read_csv("nba.csv")

# Show the first 10 rows of the data frame
df[:10]

Since the data contains some NaN values, we first drop every row that contains a NaN, so that the cast below does not raise an error.

# discard all rows that contain a 'nan' value
df.dropna(inplace=True)
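To see why this step matters, here is a minimal sketch on a toy Series (the values are made up for illustration and are not taken from nba.csv). It assumes a reasonably recent pandas, where casting NaN to an integer dtype raises a ValueError.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from nba.csv)
weights = pd.Series([180.0, np.nan, 205.0], name="Weight")

# Casting a float column that still holds NaN to int64 fails,
# because NaN has no integer representation.
try:
    weights.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

# After dropping the missing entries, the same cast succeeds.
clean = weights.dropna().astype("int64")

print(cast_failed)   # True
print(clean.dtype)   # int64
```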

# find the data type of the Weight column before the cast
# (.iloc[0] reads the first remaining row regardless of its index label)
before = type(df.Weight.iloc[0])

# convert the column to int64
df.Weight = df.Weight.astype("int64")

# find the data type after the cast
after = type(df.Weight.iloc[0])

# show the type before the cast
before

# show the type after the cast
after

Output:

# print the data frame to see
# how it looks after the change
df
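The description of astype() above also mentions conversion to a categorical type, which the example does not exercise. The following is a small sketch of that case on a toy data frame; the column name "Position" and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data (hypothetical column name and values)
df_demo = pd.DataFrame({"Position": ["PG", "SF", "PG", "C", "SF"]})

# Cast the text column to a categorical dtype
df_demo["Position"] = df_demo["Position"].astype("category")
after_dtype = df_demo["Position"].dtype

# The distinct labels are stored once, as the categories
categories = list(df_demo["Position"].cat.categories)

print(after_dtype)   # category
print(categories)    # ['C', 'PG', 'SF']
```

A categorical column keeps one copy of each distinct label plus integer codes per row, which tends to help when a column repeats a small set of values many times.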

Infer a better data type for object columns

# import pandas
import pandas as pd

# Create a data frame
df = pd.DataFrame({"A": ["sofia", 5, 8, 11, 100],
                   "B": [2, 8, 77, 4, 11],
                   "C": ["amy", 11, 4, 6, 9]})

# Print the data frame
print(df)

Output:

Let's see the dtype (data type) of each column in the data frame.

# print basic information
df.info()

As we can see in the output, the first and third columns are of type object, whereas the second column is of type int64. Now slice the data frame and create a new data frame from it.

# slice from the row at position 1 to the end
df_new = df[1:]

# Let's print the new data frame
df_new

# Now let's print the data type of the columns
df_new.info()

Output:

As we can see in the output, columns "A" and "C" are still of object type even though, after the slice dropped the only row with strings, they contain only integer values. So let's try infer_objects().

# apply the infer_objects() function
df_new = df_new.infer_objects()

# Print the dtypes after the function is applied
df_new.info()

Output:

Now if we look at the dtype of each column, we can see that columns "A" and "C" are now of type int64.
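A compact way to check the same behavior is to compare the dtypes attribute before and after the call, instead of reading the info() report. The sketch below uses a shortened, self-contained version of the data frame above; the variable names tail, before and after are introduced only for this illustration.

```python
import pandas as pd

# Shortened version of the data frame used above
df_demo = pd.DataFrame({"A": ["sofia", 5, 8], "B": [2, 8, 77]})

# Drop the first row, which holds the only string in column "A"
tail = df_demo[1:]

# dtypes before and after inference, as plain strings
before = tail.dtypes.astype(str).to_dict()
after = tail.infer_objects().dtypes.astype(str).to_dict()

print(before)   # {'A': 'object', 'B': 'int64'}
print(after)    # {'A': 'int64', 'B': 'int64'}
```

Note that infer_objects() returns a new object rather than modifying the data frame in place, which is why the article reassigns the result to df_new.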

Detect missing values

The DataFrame.isna() function is used to detect missing values. It returns a boolean object of the same shape as the input, indicating whether each value is NA. NA values such as None or numpy.NaN are mapped to True; everything else is mapped to False. Values such as empty strings ("") or numpy.inf are not considered NA values (unless you have set pandas.options.mode.use_inf_as_na = True).

Code #1: Use the isna() function to detect missing values in the data frame.

# import pandas
import pandas as pd

# Create the data frame
df = pd.read_csv("nba.csv")

# Print the data frame
df

Let's use the isna() function to find missing values.

# detect missing values
df.isna()

Output:

In the output, cells that correspond to missing values are True; all other cells are False.
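The rules described in this section can be checked on a tiny data frame that does not depend on nba.csv. This is a minimal sketch with made-up values, assuming the default pandas options (so numpy.inf is not treated as NA).

```python
import numpy as np
import pandas as pd

# One column holding each kind of value discussed above
df_demo = pd.DataFrame(
    {"value": [None, np.nan, "", np.inf, 1.5]},
    dtype=object,
)

# True only where the value is considered missing
mask = df_demo.isna()

print(mask["value"].tolist())   # [True, True, False, False, False]
```

None and NaN are flagged as missing, while the empty string, infinity and the ordinary number are not.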

Find existing (non-missing) values

The DataFrame.notna() function detects existing (non-missing) values in the data frame. It returns a boolean object of the same shape as the object it is applied to, indicating whether each individual value is present (not NA). All non-missing values are mapped to True and missing values are mapped to False.

Code #1: Use notna() to find all non-missing values in the data frame.

# import pandas
import pandas as pd

# Create the data frame
# NOTE: the second value of column "D" is missing in the source;
# 3 is an assumed placeholder (the text states that no value is missing)
df = pd.DataFrame({"A": [14, 4, 5, 4, 1],
                   "B": [5, 2, 54, 3, 2],
                   "C": [20, 20, 7, 3, 8],
                   "D": [14, 3, 6, 2, 6]})

# Print the data frame
print(df)

Let's use the DataFrame.notna() function to find all non-missing values in the data frame.

# find non-missing values
df.notna()

Output:

As we can see in the output, every value in the data frame is mapped to True. There are no False values because the data frame has no missing values.
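Since the data frame above has no gaps, it may help to see notna() on data that does. The following sketch uses a small made-up data frame with two missing entries, and also checks that notna() is the element-wise inverse of isna().

```python
import numpy as np
import pandas as pd

# Toy data with one missing value per column (hypothetical values)
df_demo = pd.DataFrame({"A": [14, np.nan, 5], "B": [5, 2, None]})

# True where a value is present, False where it is missing
present = df_demo.notna()

# notna() and isna() are element-wise opposites
is_inverse = present.equals(~df_demo.isna())

# Summing the booleans counts the non-missing values per column
counts = present.sum().to_dict()

print(present["A"].tolist())   # [True, False, True]
print(present["B"].tolist())   # [True, True, False]
print(is_inverse)              # True
print(counts)                  # {'A': 2, 'B': 2}
```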

DataFrame conversion methods

Function                       Description
DataFrame.convert_objects()    Attempt to infer a better dtype for object columns (deprecated and removed in pandas 1.0; use DataFrame.infer_objects() instead).
DataFrame.copy()               Return a copy of this object's indices and data.
DataFrame.bool()               Return the bool of a single-element pandas object.
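As a brief illustration of DataFrame.copy() from the table, the sketch below shows that changes made to a copy do not reach the original. The data and variable names are made up for this example, and it relies on the default behavior of copy(), which makes a deep copy.

```python
import pandas as pd

original = pd.DataFrame({"A": [1, 2, 3]})

# copy() duplicates both the index and the data by default
duplicate = original.copy()

# Modify the copy only
duplicate.loc[0, "A"] = 100

print(original.loc[0, "A"])    # 1
print(duplicate.loc[0, "A"])   # 100
```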