
Implementation of the Apriori algorithm in Python


The Apriori algorithm is a machine learning algorithm used to uncover structured relationships (associations) between the items in a dataset. Its best-known practical application is recommending products based on the products already in a user's cart. Walmart in particular has used this algorithm extensively when suggesting products to its customers.
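To make the idea concrete, here is a minimal plain-Python sketch of the level-wise search that gives Apriori its name: an itemset is only considered if all of its smaller subsets are already frequent. This is an illustration under stated assumptions, not the implementation used by mlxtend; the function name `apriori_sketch` and the toy transactions are made up for this example.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Toy transactions (hypothetical, not taken from the Online Retail dataset)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
frequent = apriori_sketch(transactions, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a minimum support of 0.5, the pair bread and milk survives, while milk and butter does not, so the three-item set is pruned without ever being counted.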

Dataset: product data
Implementation of the algorithm in Python:
Step 1: Import the required libraries

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

Step 2: Load and explore the data

# Change the working directory to the file location
cd C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm

 
# Load the data
data = pd.read_excel('Online_Retail.xlsx')
data.head()

# Explore the columns of the data
data.columns

# Explore the different transaction regions
data.Country.unique()

Step 3: Clean up the data

# Strip extra spaces from the description
data['Description'] = data['Description'].str.strip()

# Drop rows without an invoice number
data.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')

# Drop all transactions that were made on credit
# (their invoice numbers contain the letter 'C')
data = data[~data['InvoiceNo'].str.contains('C')]
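The effect of these cleaning steps is easier to see on a tiny table. The sketch below applies the same four operations to a hypothetical four-row frame; the invoice numbers and descriptions are invented for illustration and do not come from the real file.

```python
import pandas as pd

# Toy data (hypothetical): one padded description, one 'C' invoice,
# and one row with a missing invoice number
toy = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", None, 536370],
    "Description": [" WHITE MUG ", "Discount", "RED PLATE", "BLUE CUP "],
})

toy["Description"] = toy["Description"].str.strip()   # trim whitespace
toy = toy.dropna(axis=0, subset=["InvoiceNo"])        # drop missing invoices
toy["InvoiceNo"] = toy["InvoiceNo"].astype("str")     # make every invoice a string
toy = toy[~toy["InvoiceNo"].str.contains("C")]        # drop 'C' invoices

print(toy)
```

Only the first and last rows remain, and their descriptions no longer carry stray spaces.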

Step 4: Split data by transaction region

# Transactions in France
basket_France = (data[data['Country'] == "France"]
    .groupby(['InvoiceNo', 'Description'])['Quantity']
    .sum().unstack().reset_index().fillna(0)
    .set_index('InvoiceNo'))

# Transactions in the United Kingdom
basket_UK = (data[data['Country'] == "United Kingdom"]
    .groupby(['InvoiceNo', 'Description'])['Quantity']
    .sum().unstack().reset_index().fillna(0)
    .set_index('InvoiceNo'))

# Transactions in Portugal
basket_Por = (data[data['Country'] == "Portugal"]
    .groupby(['InvoiceNo', 'Description'])['Quantity']
    .sum().unstack().reset_index().fillna(0)
    .set_index('InvoiceNo'))

# Transactions in Sweden
basket_Sweden = (data[data['Country'] == "Sweden"]
    .groupby(['InvoiceNo', 'Description'])['Quantity']
    .sum().unstack().reset_index().fillna(0)
    .set_index('InvoiceNo'))
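Since the four baskets differ only in the country name, the pivot could be wrapped in a helper. The following is a sketch under assumptions: `build_basket` is a hypothetical name, the toy frame is invented, and the `reset_index` / `set_index` round trip from the code above is left out because the invoice number is already the index after `unstack`.

```python
import pandas as pd

def build_basket(df, country):
    """Pivot one country's rows into an invoice x product quantity table."""
    return (df[df["Country"] == country]
            .groupby(["InvoiceNo", "Description"])["Quantity"]
            .sum()        # total quantity per (invoice, product)
            .unstack()    # products become columns
            .fillna(0))   # product not on the invoice -> quantity 0

# Toy data (hypothetical)
toy = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "2", "3"],
    "Description": ["CUP", "PLATE", "CUP", "CUP", "PLATE"],
    "Quantity": [2, 1, 3, 1, 5],
    "Country": ["France", "France", "France", "France", "Sweden"],
})

basket = build_basket(toy, "France")
print(basket)
```

Each row of the result is one invoice and each column one product; invoice 2 lists CUP twice, so its quantities are summed.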

Step 5: Hot encoding the data

# Define the hot encoding function to make the data suitable
# for the library in use
def hot_encode(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

# Encode the datasets
basket_encoded = basket_France.applymap(hot_encode)
basket_France = basket_encoded

basket_encoded = basket_UK.applymap(hot_encode)
basket_UK = basket_encoded

basket_encoded = basket_Por.applymap(hot_encode)
basket_Por = basket_encoded

basket_encoded = basket_Sweden.applymap(hot_encode)
basket_Sweden = basket_encoded
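For whole-number quantities, the same encoding can also be written as a single vectorized comparison. The sketch below checks on a hypothetical two-invoice basket that both approaches agree; the toy values are invented, and the element-wise version uses `Series.map` per column only to keep the example independent of the pandas version.

```python
import pandas as pd

def hot_encode(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

# Toy basket (hypothetical quantities)
toy_basket = pd.DataFrame(
    {"CUP": [2.0, 0.0], "PLATE": [0.0, 5.0]},
    index=pd.Index(["1", "2"], name="InvoiceNo"),
)

# Element-wise encoding, one cell at a time
elementwise = toy_basket.apply(lambda col: col.map(hot_encode))

# Vectorized alternative: compare the whole table at once
vectorized = (toy_basket >= 1).astype(int)

print(vectorized)
```

Dropping `.astype(int)` leaves a table of True/False values instead of 1/0.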

Step 6: Model and Analyze Results

a) France:

# Build the model
frq_items = apriori(basket_France, min_support=0.05, use_colnames=True)

# Collect the inferred rules in a DataFrame
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

From the above output, you can see that paper cups and paper plates are bought together in France. A plausible explanation is that the French have a culture of getting together with friends and family at least once a week. In addition, since the French government has banned the use of plastic in the country, people have to buy paper-based alternatives.
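The rules are ranked by confidence and lift, and only itemsets above a minimum support are considered. As a rough illustration of what these three numbers mean, the sketch below computes them by hand for a single rule. The function name `rule_metrics` and the toy baskets are assumptions made for this example, not part of mlxtend.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur at least once.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    support_a = sum(a <= t for t in transactions) / n          # P(A)
    support_c = sum(c <= t for t in transactions) / n          # P(C)
    support_ac = sum((a | c) <= t for t in transactions) / n   # P(A and C)
    confidence = support_ac / support_a                        # P(C | A)
    lift = confidence / support_c                              # P(C | A) / P(C)
    return {"support": support_ac, "confidence": confidence, "lift": lift}

# Toy baskets (hypothetical)
baskets = [
    {"cups", "plates"},
    {"cups", "plates", "napkins"},
    {"cups"},
    {"napkins"},
]
metrics = rule_metrics(baskets, {"cups"}, {"plates"})
print(metrics)
```

A lift above 1 means the consequent shows up more often in baskets containing the antecedent than in baskets overall, which is why the code above keeps only rules with a lift of at least 1.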

b) UK:

frq_items = apriori(basket_UK, min_support=0.01, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

Looking at the rules for British transactions a little more closely, we see that the British buy tea plates of different colors together. One possible reason is that the British are typically very fond of tea and often collect tea plates in different colors for different occasions.

c) Portugal:

frq_items = apriori(basket_Por, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

Analyzing the association rules for Portuguese transactions, we see that Knick Knack Tins and colored pencils are bought together. Both products are typically used by a child in elementary school: the tins to carry lunch and the pencils for creative work. It therefore makes sense that they are paired.

d) Sweden:

frq_items = apriori(basket_Sweden, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

Analyzing the above rules, we find that boys' and girls' cutlery are bought together. This makes practical sense: a parent shopping for cutlery for their children would want each item to suit the individual child's wishes.
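Rules like these are what drives the cart recommendations mentioned at the start. A minimal sketch of that last step follows, assuming the rules have already been reduced to plain (antecedent, consequent, confidence) tuples; the `recommend` function, the tuple layout and the numbers are hypothetical and only illustrate the idea.

```python
def recommend(cart, rules, top_n=3):
    """Suggest items whose rule antecedent is fully contained in the cart."""
    cart = set(cart)
    scores = {}
    for antecedent, consequent, confidence in rules:
        if set(antecedent) <= cart:
            # Only suggest items the user does not already have
            for item in set(consequent) - cart:
                scores[item] = max(scores.get(item, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]

# Hypothetical rules: (antecedent, consequent, confidence)
rules = [
    ({"cups"}, {"plates"}, 0.9),
    ({"cups", "plates"}, {"napkins"}, 0.7),
    ({"pencils"}, {"tins"}, 0.8),
]

suggestions = recommend({"cups"}, rules)
print(suggestions)
```

A cart holding only cups triggers the first rule; once plates are added as well, the second rule applies and napkins are suggested instead.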
