Seaborn | Categorical plots

Besides being a statistical plotting library, Seaborn also provides some default datasets. We will use one of them, called "tips". The tips dataset contains information about people who had a meal at a restaurant: whether they tipped the waiters, their gender, whether they smoke, and so on.

Let's take a look at the tips dataset.

Code

# import the seaborn library
import seaborn as sns

# import used to suppress warnings
from warnings import filterwarnings

# read the dataset
df = sns.load_dataset('tips')

# first five records of the dataset
df.head()

Now let's move on to the plots and see how we can visualize these categorical variables.

Barplot

A bar plot is mainly used to aggregate categorical data according to some method, which by default is the mean. It can also be understood as a visualization of a group-by operation. To use this plot, we choose a categorical column for the x-axis and a numerical column for the y-axis, and it draws one bar per category whose height is the mean of the numerical column for that category.
Syntax :

barplot([x, y, hue, data, order, hue_order, …])

Example:

# set the background style of the plot
sns.set_style('darkgrid')

# plot using the default estimator, the mean
sns.barplot(x='sex', y='total_bill', data=df, palette='plasma')

# or
import numpy as np

# change the estimator from the mean to the standard deviation
sns.barplot(x='sex', y='total_bill', data=df,
            palette='plasma', estimator=np.std)

Output:

Explanation / Analysis
Looking at the plot, we can say that the average total_bill for males is higher than for females.

  • palette is used to set the colors of the plot.
  • estimator is the statistical function applied to the values within each categorical bin.
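To make the group-by interpretation concrete, here is a minimal sketch of the aggregation a bar plot performs before drawing. It uses only the standard library and a handful of made-up records (the `records` list and the `aggregate` helper are hypothetical, not part of Seaborn or of the tips dataset).

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy records (category, value); not taken from the tips dataset.
records = [("Male", 20.0), ("Male", 30.0), ("Female", 10.0), ("Female", 20.0)]

def aggregate(records, estimator=mean):
    """Group the values by category and apply the estimator to each group."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {category: estimator(values) for category, values in groups.items()}

# Default behaviour: one bar height per category, the mean of its values.
bar_heights = aggregate(records)

# Swapping the estimator, here for the population standard deviation
# (the same quantity np.std computes by default).
bar_spreads = aggregate(records, estimator=pstdev)

print(bar_heights)  # {'Male': 25.0, 'Female': 15.0}
print(bar_spreads)  # {'Male': 5.0, 'Female': 5.0}
```

With the DataFrame from this article, the same numbers for the real data can be obtained with df.groupby('sex')['total_bill'].mean().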

Countplot

A count plot simply counts the observations in each category and shows the counts as bars. It is one of the simplest plots provided by the Seaborn library.

Syntax :

countplot([x, y, hue, data, order, …])

Example :

sns.countplot(x='sex', data=df)

Output:

Explanation / Analysis
Looking at the plot, we can say that there are more males than females in the dataset. Since it only returns a count based on a categorical column, we only need to specify the x parameter.
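The counting step behind such a plot can be sketched with the standard library alone. The `sexes` list below is made-up illustration data, not the real column.

```python
from collections import Counter

# Hypothetical categorical column; not taken from the tips dataset.
sexes = ["Male", "Female", "Male", "Male", "Female"]

# A count plot draws one bar per category whose height is the number of rows.
counts = Counter(sexes)

print(counts["Male"], counts["Female"])  # 3 2
```

For the real data, df['sex'].value_counts() gives the corresponding counts.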

Boxplot

A box plot is sometimes called a box-and-whisker plot. It shows the distribution of quantitative data in a way that makes comparisons between variables easy. The box shows the quartiles of the dataset, and the whiskers extend to show the rest of the distribution, except for points that are drawn individually as outliers.

Syntax :

boxplot([x, y, hue, data, order, hue_order, …])

Example :

sns.boxplot(x='day', y='total_bill', data=df, hue='smoker')

Output:

Explanation / Analysis
x takes a categorical column and y a numerical column, so we see the distribution of the total bill for each day. The hue parameter adds a further categorical separation. Looking at the plot, we can say that people who do not smoke had a higher bill on Friday than people who smoked.
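The quantities a box plot draws can be computed by hand. The sketch below uses made-up bill amounts and assumes the common rule that whiskers stop at the most extreme points within 1.5 times the interquartile range of the box; the variable names are ours, not Seaborn's.

```python
from statistics import quantiles

# Hypothetical bill amounts for one category; not taken from the tips dataset.
bills = [10, 12, 14, 15, 16, 18, 20, 48]

# method="inclusive" interpolates linearly between neighbouring data points.
q1, median, q3 = quantiles(bills, n=4, method="inclusive")
iqr = q3 - q1

# Assumption: whiskers reach the most extreme points within 1.5 * IQR of the box.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [b for b in bills if low_fence <= b <= high_fence]
whiskers = (min(inside), max(inside))

# Anything beyond the fences is drawn as an individual outlier point.
outliers = [b for b in bills if b < low_fence or b > high_fence]

print(q1, median, q3)  # 13.5 15.5 18.5
print(whiskers)        # (10, 20)
print(outliers)        # [48]
```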

Violinplot

It is similar to a box plot, except that it provides a richer, more advanced visualization and uses a kernel density estimate to give a better description of the distribution of the data.

Syntax :

violinplot([x, y, hue, data, order, …])

Example :

sns.violinplot(x='day', y='total_bill', data=df, hue='sex', split=True)

Output:

Explanation / Analysis

  • hue is used to further split the data by the sex category.
  • Setting split = True draws half a violin for each level. This can make it easier to compare the distributions directly.
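To give a feel for the kernel density estimate that shapes each violin, here is a minimal hand-written Gaussian KDE. It is a sketch only: the samples are made up, the `make_kde` helper is hypothetical, and the bandwidth of 2.0 is an arbitrary choice for illustration (Seaborn picks a bandwidth automatically).

```python
import math

def make_kde(samples, bandwidth):
    """Return a function estimating the density as an average of Gaussian bumps."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # One Gaussian bump centred on every sample, summed and normalised.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

# Hypothetical bill amounts; the bandwidth is an arbitrary illustrative choice.
samples = [10.0, 12.0, 13.0, 20.0]
density = make_kde(samples, bandwidth=2.0)

# Evaluate the curve on a grid; the violin outline mirrors this curve.
step = 0.1
grid = [i * step for i in range(-100, 401)]
area = sum(density(x) for x in grid) * step  # should be close to 1

print(round(area, 3))
print(density(12.0) > density(30.0))  # denser near the cluster of samples
```

The width of the violin at a given height corresponds to the value of such a density curve at that point.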

Stripplot

A strip plot is basically a scatter plot in which one of the variables is categorical.

Syntax:

stripplot([x, y, hue, data, order, …])

Example :

sns.stripplot(x='day', y='total_bill', data=df,
              jitter=True, hue='smoker', dodge=True)

Output:

Explanation / Analysis

  • One problem with a strip plot is that you cannot tell for sure which points are stacked on top of each other, so we use the jitter parameter to add random noise.
  • The jitter parameter adds an amount of jitter (along the categorical axis only), which is useful when many points overlap, because it makes the distribution easier to see.
  • hue is used to provide an additional categorical separation.
  • Setting dodge = True draws separate strips for each category specified by the hue parameter.
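The idea of jitter can be sketched in a few lines: each point keeps its value but receives a small random offset along the categorical axis. The helper name, the width of 0.1 and the seed below are assumptions made for illustration, not Seaborn's internals or defaults.

```python
import random

def jittered_positions(category_index, n_points, width=0.1, seed=0):
    """Spread n_points around a category position with uniform random offsets."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [category_index + rng.uniform(-width, width) for _ in range(n_points)]

# Five points that would otherwise all be drawn at x = 2.
xs = jittered_positions(category_index=2, n_points=5)

print(xs)
```

Only the position along the categorical axis changes; the numerical values stay exactly as they are in the data.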

Swarmplot

It is very similar to a strip plot, except that the points are adjusted so that they do not overlap. Some people also think of it as combining the ideas of a violin plot and a strip plot. The downside of a swarm plot is that it does not scale well to very large numbers of observations, and arranging the points takes a lot of computation. So, to get more out of a swarm plot, we can draw it on top of a violin plot.
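To illustrate why arranging the points is costly, here is a naive placement sketch. It is not Seaborn's actual algorithm: the `swarm_offsets` helper is hypothetical, and it assumes both axes share the same units and that every point is a circle of the given diameter. Each new point is compared against all points already placed, so the work grows quickly with the number of observations.

```python
import math

def swarm_offsets(values, diameter=1.0):
    """Return (offset, value) pairs such that no two points overlap."""
    placed = []
    for value in sorted(values):
        step = 0
        while True:
            # Try the centre first, then alternate right and left in growing steps.
            if step == 0:
                candidates = [0.0]
            else:
                candidates = [step * diameter / 2, -step * diameter / 2]
            free = [
                x for x in candidates
                if all(math.hypot(x - px, value - py) >= diameter for px, py in placed)
            ]
            if free:
                placed.append((free[0], value))
                break
            step += 1
    return placed

# Hypothetical values for one category; close values get pushed sideways.
points = swarm_offsets([1.0, 1.2, 1.3, 3.0, 3.1])

for offset, value in points:
    print(offset, value)
```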

Syntax :

swarmplot([x, y, hue, data, order, …])

Example :

sns.swarmplot(x='day', y='total_bill', data=df)

Output:

Example :

# draw the violins first, then overlay the swarm in black
sns.violinplot(x='day', y='total_bill', data=df)
sns.swarmplot(x='day', y='total_bill', data=df, color='black')

Output:

Factorplot

It is the most general of all these plots and provides a parameter called kind to choose the type of plot we want, which saves us from having to write each of these plots separately. The kind parameter can be bar, violin, swarm, etc. Note that in seaborn 0.9 and later this function is named catplot; factorplot is the older name.

Syntax :

sns.factorplot([x, y, hue, data, row, col, …])

Example :

# on seaborn 0.9 and later, call sns.catplot with the same arguments
sns.factorplot(x='day', y='total_bill', data=df, kind='bar')

Output: