Seaborn | Categorical plots

Besides being a statistical plotting library, Seaborn also provides some built-in datasets. We will use one such dataset, called "tips". The tips dataset contains information about people who had a meal at a restaurant: the total bill, the tip they left for the waiter, their gender, whether they smoke, and so on.

Let’s take a look at the tips dataset.

Code

# import the seaborn library
import seaborn as sns

# imported to suppress warnings
from warnings import filterwarnings
filterwarnings('ignore')

# load the dataset
df = sns.load_dataset('tips')

# first five records of the dataset
df.head()

Now let’s move on to the graphs so we can see how we can visualize these categorical variables.

Barplot

A bar plot is mainly used to aggregate categorical data with some estimator, which defaults to the mean. It can also be understood as a visualization of a group-by operation. To use this plot, we choose a categorical column for the x-axis and a numeric column for the y-axis, and the plot shows the mean of the numeric column for each category.
Syntax :

 barplot ([x, y, hue, data, order, hue_order,…]) 

Example:

# set the plot background style
sns.set_style('darkgrid')

# plot using the default estimator, the mean
sns.barplot(x='sex', y='total_bill', data=df, palette='plasma')

# or

import numpy as np

# change the estimator from the mean to the standard deviation
sns.barplot(x='sex', y='total_bill', data=df,
            palette='plasma', estimator=np.std)

Output:

Explanation / Analysis
Looking at the plot, we can say that the average total_bill for a man is larger than for a woman.

  • palette is used to set the colors of the plot
  • estimator is the statistical function used to compute the value shown for each categorical bin
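
To make the group-by idea concrete, here is a minimal sketch of the aggregation behind the bar heights, written with the standard library only. The rows are made-up values, not the real tips data, and the code illustrates the computation rather than Seaborn's own implementation. The second dictionary uses the population standard deviation, which is what np.std returns with its default settings.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up (sex, total_bill) rows, for illustration only; not the real tips data.
rows = [
    ("Male", 20.0), ("Male", 30.0), ("Male", 25.0),
    ("Female", 18.0), ("Female", 22.0),
]

# Group the numeric values by category, as a group-by would.
groups = defaultdict(list)
for sex, total_bill in rows:
    groups[sex].append(total_bill)

# Default estimator: the mean of each group gives the bar height.
bar_heights = {sex: mean(values) for sex, values in groups.items()}

# Alternative estimator: population standard deviation (np.std with ddof=0).
bar_spread = {sex: pstdev(values) for sex, values in groups.items()}

print(bar_heights)
print(bar_spread)
```

Swapping the estimator changes only the function applied to each group; the grouping step stays the same.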
Countplot

A count plot simply counts the observations in each category and shows the counts as bars. It is one of the simplest plots provided by the Seaborn library.

Syntax :

 countplot ([x, y, hue, data, order,…])

Example :

sns.countplot(x='sex', data=df)

Output:

Explanation / Analysis
Looking at the plot, we can say that there are more males than females in the dataset. Since it only returns a count based on a categorical column, we only need to specify the x parameter.
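
The counting step can be sketched in a few lines with the standard library. The column values below are made up for illustration; the point is only that each bar height is the number of rows in that category.

```python
from collections import Counter

# Made-up values of a categorical column, for illustration only.
sex_column = ["Male", "Female", "Male", "Male", "Female"]

# One count per category; a count plot draws one bar per entry.
counts = Counter(sex_column)
print(counts)
```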

Boxplot

A box plot is sometimes called a box-and-whisker plot. It shows the distribution of quantitative data in a way that makes comparisons between variables easy. The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution; points drawn beyond the whiskers indicate outliers.

Syntax :

 boxplot ([x, y, hue, data, order, hue_order,…])

Example :

sns.boxplot(x='day', y='total_bill', data=df, hue='smoker')

Output:

Explanation / Analysis
x takes a categorical column and y a numeric column. Thus, we see the distribution of the total bill for each day. The hue parameter adds a further categorical separation. Looking at the plot, we can say that people who do not smoke had a higher bill on Friday compared to people who smoked.
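
As a rough illustration of what the box and whiskers summarize, the sketch below computes quartiles and flags outliers for one made-up group of bills. It assumes the common convention that whiskers reach the most extreme points within 1.5 times the interquartile range of the box; the data and variable names are hypothetical.

```python
from statistics import quantiles

# Made-up total_bill values for one category, for illustration only.
bills = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 20.0, 48.0]

# Quartiles with linear interpolation between data points.
q1, median, q3 = quantiles(bills, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: whiskers reach the most extreme data points within
# 1.5 * IQR of the box; anything beyond that is treated as an outlier.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
inside = [b for b in bills if low_fence <= b <= high_fence]
outliers = [b for b in bills if b < low_fence or b > high_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```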

Violinplot

It is similar to a box plot, except that it provides a richer, more advanced visualization and uses kernel density estimation to give a better description of the distribution of the data.

Syntax :

 violinplot ([x, y, hue, data, order,…])

Example :

sns.violinplot(x='day', y='total_bill', data=df, hue='sex', split=True)

Output:

Explanation / Analysis

  • hue is used to split the data further by the sex category
  • setting split = True draws half of a violin for each level of hue. This can make it easier to compare the distributions directly.
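
The kernel density estimate mentioned above can be sketched by hand. The version below is a plain Gaussian KDE on made-up values, with the bandwidth chosen by Scott's rule of thumb as an assumption of this sketch; it is meant to show the idea, not to reproduce Seaborn's exact curve.

```python
import math
from statistics import stdev


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Sum one Gaussian bump per sample, then normalize.
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


# Made-up total_bill values for one category, for illustration only.
bills = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 20.0, 48.0]

# Scott's rule of thumb for the bandwidth (an assumption for this sketch).
bandwidth = stdev(bills) * len(bills) ** (-1 / 5)
density = gaussian_kde(bills, bandwidth)

# Evaluate on a grid from -40 to 140; a violin mirrors this curve
# on both sides of the category axis.
step = 0.5
grid = [i * step for i in range(-80, 281)]
curve = [density(x) for x in grid]

# The curve should integrate to roughly 1, as any density does.
area = sum(curve) * step
print(round(area, 3))
```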
Stripplot

This basically creates a scatter plot in which one of the variables is categorical.

Syntax :

 stripplot ([x, y, hue, data, order,…])

Example :

sns.stripplot(x='day', y='total_bill', data=df,
              jitter=True, hue='smoker', dodge=True)

Output:

Explanation / Analysis

  • One problem with a strip plot is that you cannot tell for sure which points are stacked on top of each other, so we use the jitter parameter to add some random noise.
  • The jitter parameter sets the amount of jitter (along the categorical axis only), which is useful when many points overlap, because it makes the distribution easier to see.
  • hue is used to provide an additional categorical separation
  • The dodge = True parameter is used to draw a separate strip for each category specified by the hue parameter.
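
The effect of jitter can be sketched without any plotting library. The example below uses made-up rows and a hypothetical jitter_width; it shifts each point by a small random amount along the categorical axis and leaves the numeric value untouched.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up (day, total_bill) rows, for illustration only.
rows = [("Thur", 12.5), ("Thur", 12.5), ("Fri", 20.0), ("Fri", 20.0), ("Fri", 21.0)]

# Each category sits at an integer position on the categorical axis.
positions = {"Thur": 0, "Fri": 1}
jitter_width = 0.1  # hypothetical half-width of the random noise

# Noise is added to the categorical coordinate only; the value is unchanged.
points = [
    (positions[day] + rng.uniform(-jitter_width, jitter_width), bill)
    for day, bill in rows
]

for (x, bill), (day, _) in zip(points, rows):
    print(day, round(x, 3), bill)
```

Identical bills on the same day now land at slightly different horizontal positions, so they no longer hide each other.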
Swarmplot

This is very similar to the strip plot, except that the points are adjusted so that they don’t overlap. Some people also like to combine a violin plot and a swarm plot in a single figure. One drawback of the swarm plot is that it does not scale well to very large numbers of observations, because arranging the points takes a lot of computation. So, to show the distribution more clearly, we can draw the swarm plot on top of a violin plot.
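
To give a feel for why arranging the points is costly, here is a deliberately naive sketch of non-overlapping placement. It is not Seaborn's algorithm: the function names, the fixed point diameter, and the left/right search order are all assumptions made for illustration. Each new point is compared against every point already placed, which is where the cost grows with the size of the data.

```python
import math


def first_free_offset(value, placed, diameter):
    """Find the offset closest to 0 where a point at `value` touches no placed point."""
    k = 0
    while True:
        # Try the center first, then steps of one diameter to the right and left.
        candidates = [0.0] if k == 0 else [k * diameter, -k * diameter]
        for offset in candidates:
            if all(math.hypot(offset - px, value - pv) >= diameter for px, pv in placed):
                return offset
        k += 1


def swarm_offsets(values, diameter):
    """Return (offset, value) pairs such that no two points overlap."""
    placed = []
    for value in sorted(values):
        placed.append((first_free_offset(value, placed, diameter), value))
    return placed


# Made-up values with three identical points, for illustration only.
placed = swarm_offsets([1.0, 1.0, 1.0, 5.0], diameter=0.5)
print(placed)
```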

Syntax :

 swarmplot ([x, y, hue, data, order,…])

Example :

sns.swarmplot(x='day', y='total_bill', data=df)

Output:

Example :

# draw the violin plot first, then the swarm plot on top of it
sns.violinplot(x='day', y='total_bill', data=df)
sns.swarmplot(x='day', y='total_bill', data=df, color='black')

Output:

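Factorplot

It is the most general of all these plots and provides a parameter called kind to choose the type of plot we want, which saves us from having to write these plots separately. The kind parameter can be bar, violin, swarm, etc. Note that factorplot was renamed to catplot in seaborn 0.9 and is no longer available in recent releases, where sns.catplot takes the same arguments.

Syntax :

 sns.factorplot ([x, y, hue, data, row, col,…])

Example :

# named factorplot before seaborn 0.9; catplot in current releases
sns.catplot(x='day', y='total_bill', data=df, kind='bar')

Output: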
