Implementation of the Apriori algorithm in Python


The Apriori algorithm is a machine learning algorithm used to discover association relationships between items, that is, which items tend to occur together. Its best-known practical application is recommending products based on the products already in the user's cart. Walmart in particular has used this algorithm extensively when suggesting products to its customers.
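The rules produced later in this article are ranked by confidence and lift, and the itemsets are filtered by support. As a rough illustration of what these three measures mean, here is a minimal pure-Python sketch. The transactions and the helper names (support, confidence, lift) are made up for this example and are not part of the article's dataset or of any library.

```python
# Toy transactions, invented for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support; above 1 hints at a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print(support({"bread", "milk"}, transactions))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions
print(lift({"bread"}, {"milk"}, transactions))
```

In this toy data the lift of bread -> milk is below 1, so buying bread does not make milk more likely than it already is overall.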

Dataset: product data
Implementation of the algorithm in Python:
Step 1: Import the required libraries

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

Step 2: Load and explore the data

# Change the working directory to the file location (notebook command)
cd C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm

 
# Load the data
data = pd.read_excel('Online_Retail.xlsx')
data.head()

# Explore the columns of the data
data.columns

# Explore the different transaction regions
data.Country.unique()

Step 3: Clean up the data

# Remove extra spaces in the description
data['Description'] = data['Description'].str.strip()

# Drop rows without an invoice number
data.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
data['InvoiceNo'] = data['InvoiceNo'].astype('str')

# Discard all transactions that were made on credit
# (invoice numbers containing 'C')
data = data[~data['InvoiceNo'].str.contains('C')]
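To see what the cleaning step does, the same operations can be run on a tiny hand-made frame. This is only a sketch: the rows below are invented, and only the column names InvoiceNo and Description are taken from the code above.

```python
import pandas as pd

# Made-up rows that mimic the two columns used in this step
toy = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", None, 536370],
    "Description": [" PAPER CUPS ", "DISCOUNT", "PAPER PLATES", "TEA PLATE  "],
})

toy["Description"] = toy["Description"].str.strip()   # trim stray spaces
toy = toy.dropna(axis=0, subset=["InvoiceNo"])        # drop the row with no invoice number
toy["InvoiceNo"] = toy["InvoiceNo"].astype("str")     # make every invoice number a string
toy = toy[~toy["InvoiceNo"].str.contains("C")]        # drop invoice numbers containing 'C'

print(toy)
```

Only the first and last rows survive: one row is dropped for its missing invoice number and one for the 'C' in its invoice number.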

Step 4: Split data by transaction region

# Transactions in France
basket_France = (data[data['Country'] == "France"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))

  
# Transactions in the United Kingdom
basket_UK = (data[data['Country'] == "United Kingdom"]
             .groupby(['InvoiceNo', 'Description'])['Quantity']
             .sum().unstack().reset_index().fillna(0)
             .set_index('InvoiceNo'))

  
# Transactions in Portugal
basket_Por = (data[data['Country'] == "Portugal"]
              .groupby(['InvoiceNo', 'Description'])['Quantity']
              .sum().unstack().reset_index().fillna(0)
              .set_index('InvoiceNo'))

 

# Transactions in Sweden
basket_Sweden = (data[data['Country'] == "Sweden"]
                 .groupby(['InvoiceNo', 'Description'])['Quantity']
                 .sum().unstack().reset_index().fillna(0)
                 .set_index('InvoiceNo'))
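The four baskets above are built with the same chain of calls and differ only in the country name. One way to avoid the repetition is a small helper. The function build_basket below is a hypothetical name introduced for this sketch, and the toy rows are invented only to show the shape of the result.

```python
import pandas as pd

def build_basket(data, country):
    """Return an invoice-by-product quantity table for one country.

    Hypothetical helper that wraps the chain of calls used in this step.
    """
    return (data[data["Country"] == country]
            .groupby(["InvoiceNo", "Description"])["Quantity"]
            .sum().unstack().reset_index().fillna(0)
            .set_index("InvoiceNo"))

# Made-up rows, only to show what the result looks like
toy = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "3"],
    "Description": ["CUPS", "PLATES", "CUPS", "TINS"],
    "Quantity": [6, 12, 3, 1],
    "Country": ["France", "France", "France", "Sweden"],
})

basket_toy = build_basket(toy, "France")
print(basket_toy)
```

Each row of the result is one invoice and each column one product. Products missing from an invoice get a quantity of 0.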

Step 5: One-hot encoding the data

# Define the encoding function so that the data fits
# the format expected by the library:
# 0 if the product was not bought, 1 if at least one unit was bought
def hot_encode(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

 
# Encode the datasets
basket_encoded = basket_France.applymap(hot_encode)
basket_France = basket_encoded

basket_encoded = basket_UK.applymap(hot_encode)
basket_UK = basket_encoded

basket_encoded = basket_Por.applymap(hot_encode)
basket_Por = basket_encoded

basket_encoded = basket_Sweden.applymap(hot_encode)
basket_Sweden = basket_encoded
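The element-wise encoding above can also be written as a single vectorized comparison. The sketch below is an alternative, not the article's method. It assumes whole-number quantities and produces True/False instead of 1/0, which, as far as I know, recent mlxtend versions prefer as input. The toy values are made up.

```python
import pandas as pd

# Made-up quantities, including a negative value such as a return
toy_basket = pd.DataFrame(
    {"CUPS": [6.0, 0.0, -1.0], "PLATES": [12.0, 3.0, 0.0]},
    index=pd.Index(["1", "2", "3"], name="InvoiceNo"),
)

# True where at least one unit was bought, False otherwise
encoded = toy_basket >= 1
print(encoded)
```

For whole-number quantities this marks the same cells as hot_encode, with True in place of 1 and False in place of 0.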

Step 6: Model and Analyze Results

a) France:

# Build the model: find itemsets present in at least 5% of transactions
frq_items = apriori(basket_France, min_support=0.05, use_colnames=True)

# Collect the inferred rules in a data frame
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

From the above output, you can see that paper cups and paper plates are bought together in France. One likely reason is that the French have a culture of getting together with friends and family at least once a week. In addition, since the French government has banned single-use plastic tableware, people have to buy paper-based alternatives.
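To make the role of min_support in the apriori call more concrete, here is a minimal pure-Python sketch of level-wise frequent-itemset mining. It is not mlxtend's implementation. The function name frequent_itemsets and the toy transactions are invented for illustration. The key idea is the Apriori pruning rule: an itemset can only be frequent if all of its smaller subsets are frequent.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, keeping only frequent ones."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while current:
        # Keep the size-k candidates whose support reaches the threshold
        level = {}
        for candidate in current:
            s = sum(candidate <= t for t in transactions) / n
            if s >= min_support:
                level[candidate] = s
        result.update(level)
        # Join frequent size-k itemsets into size k+1 candidates
        k += 1
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Apriori pruning: every subset of size k-1 must itself be frequent
        current = [c for c in candidates
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))]
    return result

# Made-up transactions
transactions = [
    {"cups", "plates", "napkins"},
    {"cups", "plates"},
    {"cups", "napkins"},
    {"tins"},
]

found = frequent_itemsets(transactions, min_support=0.5)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

With a threshold of 0.5, the pair of napkins and plates is dropped because it appears in only one of the four transactions. The three-item set is then never counted, because one of its pairs is not frequent.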

b) UK:

frq_items = apriori(basket_UK, min_support=0.01, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

A closer look at the rules for British transactions shows that the British tend to buy tea plates of different colors together. A possible reason is that the British are typically very fond of tea and often collect differently colored tea plates for different occasions.

c) Portugal:

frq_items = apriori(basket_Por, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

Analyzing the association rules for Portuguese transactions, we see that Knick Knack Tins and colored pencils are bought together. Both products are typically bought for a child in elementary school: the tins to carry lunch and the pencils for creative work. It therefore makes sense that they are paired.

d) Sweden:

frq_items = apriori(basket_Sweden, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="lift", min_threshold=1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])
print(rules.head())

Analyzing the above rules, we find that boys' and girls' cutlery are bought together. This makes practical sense: a parent buying cutlery for their children would want each item to suit the child's preferences.




