Indexing and selecting data with pandas


Let's take a look at some examples of indexing in pandas. The examples in this article use the “nba.csv” file.

Multiple row and multiple column selection

Let's take a DataFrame with some sample data and index it. Here we select multiple rows and multiple columns from the DataFrame.

Suppose we want to select the columns Age, College and Salary only for the rows labeled Amir Johnson and Terry Rozier. The result is a DataFrame with two rows and three columns.

Selecting multiple rows and all columns

Let's say we want to select the rows Amir Johnson, Terry Rozier and John Holland with all columns in the DataFrame. The result keeps every column but only these three rows.

Selecting some columns and all rows

Let's say we want to select the Age, Height and Salary columns with all the rows in the DataFrame. The result keeps every row but only these three columns.
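The three selections described above can be sketched on a small stand-in DataFrame. This is only an illustration: the player names come from the article, but every value in the frame is a made-up placeholder and not taken from nba.csv.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: names are from the article,
# all values are invented placeholders, not real data.
data = pd.DataFrame(
    {
        "Name": ["Avery Bradley", "John Holland", "RJ Hunter",
                 "Amir Johnson", "Terry Rozier"],
        "Team": ["Team A"] * 5,
        "Age": [25, 27, 22, 29, 22],
        "Height": ["6-2", "6-5", "6-5", "6-9", "6-2"],
        "College": ["College A", "College B", "College C", None, "College D"],
        "Salary": [100.0, 200.0, 300.0, 400.0, 500.0],
    }
).set_index("Name")

# Multiple rows and multiple columns
rows_and_cols = data.loc[["Amir Johnson", "Terry Rozier"],
                         ["Age", "College", "Salary"]]

# Multiple rows and all columns
rows_only = data.loc[["Amir Johnson", "Terry Rozier", "John Holland"]]

# Some columns and all rows
cols_only = data[["Age", "Height", "Salary"]]

print(rows_and_cols.shape)  # (2, 3)
print(rows_only.shape)      # (3, 5)
print(cols_only.shape)      # (5, 3)
```

The rows and columns come back in the order in which they are listed, not in the order in which they appear in the original frame.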

Indexing in pandas using [], .loc[], .iloc[] and .ix[]

There are several ways to index a pandas DataFrame:
  • DataFrame[]: also known as the indexing operator.
  • DataFrame.loc[]: used for selection by label.
  • DataFrame.iloc[]: used for selection by integer position.
  • DataFrame.ix[]: used for both labels and integer positions.

Collectively they are called indexers. These are by far the most common ways to index data. These four indexers help you get elements, rows, and columns from a DataFrame.
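The difference between label-based and position-based selection is easiest to see when the index labels are integers that do not match the row positions. The frame below is a hypothetical example built only for this purpose.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[2, 0, 1])

by_label = df.loc[2, "value"]   # row labeled 2, which is the first row
by_position = df.iloc[2, 0]     # third row, first column
column = df["value"]            # [] selects a column by name

print(by_label)     # a
print(by_position)  # c
```

Here .loc[2] and .iloc[2] return different rows, because one looks up the label 2 and the other the position 2.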

    Indexing a DataFrame using the [] indexing operator:
    The indexing operator refers to the square brackets following an object. To select a single column, we put the column name between the brackets.

    # import the pandas package
    import pandas as pd

    # create a DataFrame from a CSV file, using the Name column as the index
    data = pd.read_csv("nba.csv", index_col="Name")

    # extract a single column using the indexing operator
    first = data["Age"]

    print(first)


    Selecting multiple columns

    To select multiple columns, we pass a list of column names to the indexing operator.

    # import the pandas package
    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # extract multiple columns using the indexing operator
    first = data[["Age", "College", "Salary"]]

    print(first)

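A single column name and a list of column names return different types. The sketch below uses a tiny hypothetical frame with placeholder values to show this.

```python
import pandas as pd

# Hypothetical two-row frame; the values are placeholders.
data = pd.DataFrame(
    {"Age": [25, 22], "Salary": [100.0, 300.0]},
    index=pd.Index(["Avery Bradley", "RJ Hunter"], name="Name"),
)

as_series = data["Age"]    # a single name returns a Series
as_frame = data[["Age"]]   # a list, even with one name, returns a DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```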

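    Indexing a DataFrame using .loc[]:
    This indexer selects rows and columns by label. To select a single row, we pass a single row label to .loc[].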

    # import the pandas package
    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve single rows using the loc indexer
    first = data.loc["Avery Bradley"]
    second = data.loc["RJ Hunter"]

    # print both rows, separated by blank lines
    print(first, "\n\n", second)

    Both calls return a Series, because a single row label was passed each time.
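To make the return type concrete, here is a minimal sketch on a hypothetical frame with placeholder values. A single label gives a Series whose index holds the column names, while a one-element list keeps the DataFrame shape.

```python
import pandas as pd

# Hypothetical two-row frame; the values are placeholders.
data = pd.DataFrame(
    {"Age": [25, 22], "Salary": [100.0, 300.0]},
    index=pd.Index(["Avery Bradley", "RJ Hunter"], name="Name"),
)

row = data.loc["Avery Bradley"]              # single label: Series
one_row_frame = data.loc[["Avery Bradley"]]  # list of labels: DataFrame

print(row.name)             # Avery Bradley
print(list(row.index))      # ['Age', 'Salary']
print(one_row_frame.shape)  # (1, 2)
```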

    Selecting multiple rows

    To select multiple rows, we put all the row labels in a list and pass the list to .loc[].

    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve multiple rows using the loc indexer
    first = data.loc[["Avery Bradley", "RJ Hunter"]]

    print(first)

    Selecting two rows and three columns

    To select two rows and three columns, we put the two row labels in one list and the three column labels in a second list, like this:

     DataFrame.loc[["row1", "row2"], ["column1", "column2", "column3"]]

    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve two rows and three columns using the loc indexer
    first = data.loc[["Avery Bradley", "RJ Hunter"],
                     ["Team", "Number", "Position"]]

    print(first)

    Selecting all rows and some columns

    To select all rows and some columns, we use a single colon (:) to select all rows and a list of the columns we want, as follows:

     DataFrame.loc[:, ["column1", "column2", "column3"]]

    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve all rows and some columns using the loc indexer
    first = data.loc[:, ["Team", "Number", "Position"]]

    print(first)

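When every row is kept, the colon form of .loc[] and the plain indexing operator give the same result. The following sketch checks this on a hypothetical frame with placeholder values.

```python
import pandas as pd

# Hypothetical frame; the values are placeholders.
data = pd.DataFrame(
    {
        "Team": ["Team A", "Team A", "Team B"],
        "Number": [0, 28, 90],
        "Position": ["PG", "SG", "PF"],
        "Age": [25, 22, 29],
    },
    index=pd.Index(["Avery Bradley", "RJ Hunter", "Amir Johnson"], name="Name"),
)

via_loc = data.loc[:, ["Team", "Number"]]   # colon keeps every row
via_brackets = data[["Team", "Number"]]     # same selection with []

print(via_loc.equals(via_brackets))  # True
```

The .loc[] form becomes necessary once the rows need to be restricted as well, since the indexing operator with a list only selects columns.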

    Indexing a DataFrame using .iloc[]:
    This indexer allows us to retrieve rows and columns by position. To do this, we specify the positions of the rows we need, as well as the positions of the columns we need. df.iloc is very similar to df.loc but uses only integer positions for selection.

    Selecting a single row

    To select a single row using .iloc[], we pass a single integer to .iloc[].

    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve the row at position 3 (the fourth row) using the iloc indexer
    row2 = data.iloc[3]

    print(row2)

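Positions passed to .iloc[] are zero-based, so position 3 is the fourth row. A small sketch on a hypothetical frame (placeholder values) shows that the positional lookup matches a label lookup for the label stored at that position.

```python
import pandas as pd

# Hypothetical frame; the values are placeholders.
data = pd.DataFrame(
    {"Age": [25, 27, 22, 29, 22], "Number": [0, 30, 28, 90, 12]},
    index=pd.Index(
        ["Avery Bradley", "John Holland", "RJ Hunter",
         "Amir Johnson", "Terry Rozier"],
        name="Name",
    ),
)

fourth_by_position = data.iloc[3]         # zero-based: the fourth row
fourth_label = data.index[3]              # label stored at position 3
fourth_by_label = data.loc[fourth_label]  # the same row, looked up by label

print(fourth_label)                                # Amir Johnson
print(fourth_by_position.equals(fourth_by_label))  # True
```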

    Selecting multiple rows

    To select multiple rows, we pass a list of integers to .iloc[].

    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve multiple rows using the iloc indexer
    row2 = data.iloc[[3, 5, 7]]

    print(row2)


    Selecting two rows and two columns

    To select two rows and two columns, we create a list of two integers for the rows and a list of two integers for the columns, and pass both to .iloc[].

    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve two rows and two columns using the iloc indexer
    row2 = data.iloc[[3, 4], [1, 2]]

    print(row2)

    Selecting all rows and some columns

    To select all rows and some columns, we use a single colon (:) to select all rows and a list of integers for the columns, and pass both to .iloc[].

    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve all rows and some columns using the iloc indexer
    row2 = data.iloc[:, [1, 2]]

    print(row2)

    Indexing using .ix[] as .loc[]

    To select a single row, we pass a single row label to .ix[]. The indexer acts like .loc[] when a row label is passed. Note that .ix[] was deprecated in pandas 0.20 and removed in pandas 1.0, so the .ix[] examples run only on older versions.

    # import the pandas package
    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve a row by label using the ix indexer
    # (.ix exists only in pandas versions before 1.0)
    first = data.ix["Avery Bradley"]

    print(first)

      


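    Selecting a single row using .ix[] as .iloc[]

    To select a single row by position, we pass a single integer to .ix[]. With the string index used here, the indexer then acts like .iloc[].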

    # import the pandas package
    import pandas as pd

    # create a DataFrame from a CSV file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieve a row by position using the ix indexer
    # (.ix exists only in pandas versions before 1.0)
    first = data.ix[1]

    print(first)

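Because .ix[] is not available in pandas 1.0 and later, the two lookups above can be written with .loc[] and .iloc[] instead. The sketch below uses a hypothetical frame with placeholder values; the mixed label/position lookups at the end are one possible way to combine the two styles, not the only one.

```python
import pandas as pd

# Hypothetical frame; the values are placeholders.
data = pd.DataFrame(
    {"Age": [25, 27, 22], "Number": [0, 30, 28]},
    index=pd.Index(["Avery Bradley", "John Holland", "RJ Hunter"], name="Name"),
)

by_label = data.loc["Avery Bradley"]  # in place of data.ix["Avery Bradley"]
by_position = data.iloc[1]            # in place of data.ix[1]

# Mixing a row position with a column label:
# translate one side so that both use the same kind of key.
age_via_loc = data.loc[data.index[1], "Age"]
age_via_iloc = data.iloc[1, data.columns.get_loc("Age")]

print(by_position.name)  # John Holland
print(age_via_loc)       # 27
print(age_via_iloc)      # 27
```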

    Indexing Methods in DataFrame

    Function              Description
    DataFrame.head()      Return the top n rows of a DataFrame.
    DataFrame.tail()      Return the bottom n rows of a DataFrame.
    DataFrame.at[]        Access a single value for a row/column label pair.
    DataFrame.iat[]       Access a single value for a row/column pair by integer position.
    DataFrame.iloc[]      Purely integer-location based indexing for selection by position.
    DataFrame.lookup()    Label-based “fancy indexing” function for DataFrame (deprecated in pandas 1.2 and removed in 2.0).
    DataFrame.pop()       Return item and drop it from the frame.
    DataFrame.xs()        Return a cross-section (row(s) or column(s)) from the DataFrame.
    DataFrame.get()       Get item from object for a given key (for example, a DataFrame column).
    DataFrame.isin()      Return a boolean DataFrame showing whether each element in the DataFrame is contained in values.
    DataFrame.where()     Return an object of the same shape as self whose entries are from self where cond is True and otherwise are from other.
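Several of the methods in the table can be tried on a small numeric frame. This is a minimal sketch with hypothetical placeholder values, intended only to show the call shapes and return types.

```python
import pandas as pd

# Hypothetical numeric frame; the values are placeholders.
data = pd.DataFrame(
    {"Age": [25, 22, 29], "Number": [0, 28, 90]},
    index=pd.Index(["Avery Bradley", "RJ Hunter", "Amir Johnson"], name="Name"),
)

top = data.head(2)                   # first two rows
bottom = data.tail(1)                # last row
age = data.at["RJ Hunter", "Age"]    # single value by labels
first_cell = data.iat[0, 0]          # single value by positions
row = data.xs("Avery Bradley")       # cross-section: one row as a Series
missing = data.get("Weight")         # missing key returns None by default
mask = data.isin([22])               # True where the cell equals 22
kept = data.where(data > 23)         # cells failing the condition become NaN

copy = data.copy()
numbers = copy.pop("Number")         # returns the column and drops it from copy

print(age)                 # 22
print(first_cell)          # 25
print(missing)             # None
print(list(copy.columns))  # ['Age']
```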

