WEI WEI: Hi there.
Welcome back to our video series, "Building recommendation systems with TensorFlow." My name is Wei, and I'm a developer advocate at Google.
In our last video, we gave you an overview of recommendation systems and introduced several cool open-source projects from Google to help you build powerful recommenders.
In this video, we'll be covering content-based filtering and collaborative filtering.
They are traditional recommendation models, but they are important concepts in the recommendation systems literature and will help lay the foundation for the more advanced models that we'll be discussing in future episodes.
So there are many traditional approaches used to build recommendation systems.
One common approach is content-based filtering.
Content-based filtering uses item features to recommend other items similar to what a user likes based on previous actions or explicit feedback.
For example, here we're illustrating four apps that have different features.
Each row represents an app, and each column represents a feature.
Some apps are educational or science-related, some are relevant to health or health care.
Some are simply time wasters.
When a user installs a health app, we can recommend other health-related apps to that user, because they are similar to the installed health app.
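The health app example can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the app names, the feature columns, and the 0/1 values below are hypothetical and are not taken from the slide, and cosine similarity is one reasonable choice of similarity measure, not necessarily the one used in the video.

```python
import numpy as np

# Hypothetical app-by-feature matrix: each row is an app, each column a feature.
# Names and values are illustrative only.
apps = ["science_app", "health_app_1", "health_app_2", "time_waster"]
features = ["education", "science", "health", "casual"]
item_features = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

def recommend_similar(installed_idx, k=2):
    """Rank the other apps by cosine similarity to the installed app."""
    norms = np.linalg.norm(item_features, axis=1)
    sims = item_features @ item_features[installed_idx] / (norms * norms[installed_idx])
    sims[installed_idx] = -np.inf  # never recommend the app the user already has
    top = np.argsort(-sims, kind="stable")[:k]
    return [apps[i] for i in top]

recommendations = recommend_similar(apps.index("health_app_1"))
print(recommendations)
```

The other health-related app comes out on top because it is the only one that shares a feature with the installed app.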
Another common approach is collaborative filtering.
One limitation with content-based filtering is that it only leverages item similarities.
What if we can use similarities between users and items simultaneously to provide recommendations? This would allow for serendipitous recommendations, namely recommending an item to user A based on the interests of a similar user B.
This is what collaborative filtering is able to do, while content-based filtering is not.
Here we are illustrating a feedback matrix of four users and five movies.
Each row represents a user, and each column represents a movie.
The green checkmark means that a user has watched a particular movie.
We consider this implicit feedback.
In contrast, if a user gives a rating to the movie, that would be explicit feedback.
So as you can see here, the user in the first row has watched three movies: "Harry Potter," "Shrek," and "The Dark Knight Rises." Now the user in the third row has also watched "Harry Potter" and "Shrek." So it may make sense to recommend "The Dark Knight Rises" to her, since the first user has similar preferences to hers.
So that's the idea of collaborative filtering.
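As a rough sketch of that idea, the snippet below finds the most similar user from a feedback matrix and suggests what that neighbor has watched. The rows for the first and third users follow the example; the other two rows, the last two movie names, and the column order are hypothetical. Nearest-neighbor lookup with cosine similarity is just one simple way to illustrate the intuition and is not the matrix factorization method discussed next.

```python
import numpy as np

movies = ["Harry Potter", "Shrek", "The Dark Knight Rises", "movie_d", "movie_e"]

# Rows are users, columns are movies; 1 means watched (implicit feedback).
feedback = np.array([
    [1, 1, 1, 0, 0],  # first user (from the example)
    [0, 0, 0, 1, 1],  # hypothetical user
    [1, 1, 0, 0, 0],  # third user (from the example)
    [0, 0, 1, 1, 0],  # hypothetical user
], dtype=float)

def recommend_from_neighbor(user_idx):
    """Return the most similar user and the movies they watched that user_idx has not."""
    norms = np.linalg.norm(feedback, axis=1)
    sims = feedback @ feedback[user_idx] / (norms * norms[user_idx])
    sims[user_idx] = -np.inf  # a user is not their own neighbor
    neighbor = int(np.argmax(sims))
    unseen = (feedback[neighbor] == 1) & (feedback[user_idx] == 0)
    return neighbor, [m for m, flag in zip(movies, unseen) if flag]

neighbor, suggestions = recommend_from_neighbor(2)  # the third user
print(neighbor, suggestions)
```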
But how do we do this in practice? Let's say we can assign a value between minus 1 and 1 to each user, indicating their interest level for children's movies.
Minus 1 means the highest level of interest in children's movies, and 1 means no interest at all.
In this case, user number 3 likes children's movies a lot, and user number 4 doesn't like children's movies at all.
We can also assign a value between minus 1 and 1 to each movie.
Minus 1 means a movie is highly suitable for children, and 1 means it's not for children at all.
Now we can see "Shrek" is really a great movie for children.
Now these values have become embeddings for users and movies, and the product of a user embedding and a movie embedding should be higher for movies that we expect that user to like.
In this example, we hand-engineer these embeddings, and these embeddings are one-dimensional.
Now, let's say we have another dimension to represent the users and the movies.
Let's assign another value between minus 1 and 1 to each user, indicating their interest level for blockbuster movies.
Similarly, we assign a value between minus 1 and 1 to each movie, indicating whether it is a blockbuster or not.
Now, we have hand-engineered a second dimension of embeddings.
We can go on and add more dimensions if we want.
In practice, these embeddings tend to be of much higher dimensions, but we can learn those embeddings automatically, which is the beauty of collaborative filtering models.
For the sake of easier visualization, we're sticking to two dimensions.
Here we're illustrating 2D embeddings for the users and movies on the right.
Our goal is to learn these embeddings so that the predicted feedback matrix is as close to the ground truth feedback matrix as possible.
Here we denote the user embeddings as U and item embeddings as V. The product of U and V is A, which is a predicted feedback matrix.
For example, if we take the first row of U, 1, 0.1, and the first column of V, 0.9, minus 0.2, and compute the dot product, it gives 0.88, which is the top-left element in the predicted feedback matrix.
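A quick check of this dot product, taking the movie vector as (0.9, -0.2), which is the value that reproduces the stated 0.88 (1 x 0.9 + 0.1 x -0.2). The variable names are mine.

```python
import numpy as np

user_embedding = np.array([1.0, 0.1])    # first row of U
movie_embedding = np.array([0.9, -0.2])  # first column of V; second value assumed to be negative

predicted = float(np.dot(user_embedding, movie_embedding))
print(round(predicted, 2))  # 0.88
```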
So our optimization objective then becomes minimizing the sum of the squared differences between the feedback labels and the predicted feedback, as you can see in the mathematical form in blue.
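Written out, the objective described here would look roughly like the following. The notation is my own assumption, since the slide is not reproduced in the transcript: y_ij is the feedback label for user i and item j, U_i and V_j are their embeddings, and "obs" is the set of observed (user, item) pairs.

```latex
\min_{U,\,V} \; \sum_{(i,j) \in \mathrm{obs}} \left( y_{ij} - \langle U_i, V_j \rangle \right)^2
```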
We can solve this using either Stochastic Gradient Descent, SGD, or Weighted Alternating Least Squares, WALS.
SGD, I'm sure you have heard about it from training your neural networks.
SGD is a generic method, while WALS is specific to this problem.
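To make the SGD option concrete, here is a minimal sketch that fits the embeddings on observed entries only. Everything specific in it is an assumption for illustration: the small feedback matrix, the learning rate, the number of epochs, and the function names are not from the video.

```python
import numpy as np

def observed_loss(feedback, observed, U, V):
    """Sum of squared errors over the observed entries only."""
    return float(np.sum(((feedback - U @ V.T) ** 2)[observed]))

def sgd_factorize(feedback, observed, dim=2, lr=0.05, epochs=500, seed=0):
    """Fit user embeddings U and item embeddings V with plain SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = feedback.shape
    U = rng.normal(scale=0.1, size=(n_users, dim))
    V = rng.normal(scale=0.1, size=(n_items, dim))
    pairs = np.argwhere(observed)          # (user, item) index pairs to train on
    for _ in range(epochs):
        rng.shuffle(pairs)                 # visit observed entries in random order
        for i, j in pairs:
            err = feedback[i, j] - U[i] @ V[j]
            u_old = U[i].copy()
            U[i] += lr * err * V[j]        # gradient step on the user embedding
            V[j] += lr * err * u_old       # gradient step on the item embedding
    return U, V

# Hypothetical implicit-feedback matrix (1 = watched).
feedback = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
], dtype=float)
observed = feedback == 1

U, V = sgd_factorize(feedback, observed)
final_loss = observed_loss(feedback, observed, U, V)
print(final_loss)
```

The constant factor of 2 from differentiating the squared error is folded into the learning rate.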
The idea of WALS is that for each iteration, we alternate between fixing U and solving for V, and then fixing V and solving for U.
We won't go into the mathematical details, but I should point out that SGD and WALS each have their own advantages and disadvantages.
For example, WALS usually converges much faster than SGD, while SGD is more flexible and can handle other loss functions.
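The alternating idea can be sketched as follows. This is a simplified, unweighted version over observed entries only, and it adds a small ridge term (`reg`) that is my own assumption to keep each least-squares solve well posed; it is not a full WALS implementation. The data and function names are hypothetical.

```python
import numpy as np

def solve_rows(targets, mask, fixed, reg):
    """For each row, solve a ridge least-squares problem over its observed columns."""
    dim = fixed.shape[1]
    out = np.zeros((targets.shape[0], dim))
    for r in range(targets.shape[0]):
        cols = np.flatnonzero(mask[r])
        F = fixed[cols]                            # embeddings of the observed columns
        lhs = F.T @ F + reg * np.eye(dim)
        rhs = F.T @ targets[r, cols]
        out[r] = np.linalg.solve(lhs, rhs)
    return out

def als_factorize(feedback, observed, dim=2, reg=0.1, iters=20, seed=0):
    """Alternate between solving for V with U fixed and for U with V fixed."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(feedback.shape[0], dim))
    V = rng.normal(size=(feedback.shape[1], dim))
    for _ in range(iters):
        V = solve_rows(feedback.T, observed.T, U, reg)  # fix U, solve for V
        U = solve_rows(feedback, observed, V, reg)      # fix V, solve for U
    return U, V

# Hypothetical implicit-feedback matrix (1 = watched).
feedback = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
], dtype=float)
observed = feedback == 1

U, V = als_factorize(feedback, observed)
als_loss = float(np.sum(((feedback - U @ V.T) ** 2)[observed]))
print(als_loss)
```

Each half-step is an ordinary linear least-squares problem with a closed-form solution, which is one reason this style of solver can converge in few iterations.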
But so far, we only cared about observed items.
What about the unobserved ones? Observed-only matrix factorization is not good, because if you set the embeddings so that every predicted value is 1, you have already minimized the objective function, which is clearly not what we want.
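A small check of this degenerate solution, using one-dimensional embeddings so that every dot product is exactly 1. The feedback matrix is hypothetical.

```python
import numpy as np

# Hypothetical implicit-feedback matrix (1 = watched, 0 = unobserved).
feedback = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
], dtype=float)
observed = feedback == 1

# One-dimensional embeddings set to all 1s: every predicted entry is 1.
U = np.ones((feedback.shape[0], 1))
V = np.ones((feedback.shape[1], 1))
predicted = U @ V.T

observed_only_loss = float(np.sum(((feedback - predicted) ** 2)[observed]))
print(observed_only_loss)      # 0.0
print(predicted[~observed])    # all 1s: every unwatched movie looks equally recommended
```

The loss is zero, yet the model predicts that every user likes every movie, so it carries no useful ranking information.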
So we need to take the unobserved entries into account.
There are two approaches to handle this.
First, we can treat all unobserved entries as 0 and then solve it using SVD, Singular Value Decomposition.
We won't be reviewing linear algebra here, but you should know that SVD is not very good at this, because the A matrix tends to be very sparse in practice.
So the SVD solution tends to have poor generalization capabilities.
A better approach is weighted matrix factorization.
In this case, we still treat unobserved entries as 0.
But we scale the unobserved part of the objective function, highlighted in orange, so that it's not overweighted.
As you can see, the weight w0 is now a hyperparameter you need to tune.
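A minimal sketch of the weighted objective, assuming it has the form "squared error on observed entries, plus w0 times squared error on unobserved entries treated as 0". The function name, the data, and the value w0 = 0.1 are my own illustrative choices.

```python
import numpy as np

def weighted_mf_loss(feedback, observed, U, V, w0):
    """Observed squared error plus w0 times the squared error on unobserved (zero) entries."""
    sq_err = (feedback - U @ V.T) ** 2
    return float(np.sum(sq_err[observed]) + w0 * np.sum(sq_err[~observed]))

# Hypothetical implicit-feedback matrix (1 = watched, 0 = unobserved).
feedback = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
], dtype=float)
observed = feedback == 1

# The degenerate "predict 1 everywhere" solution is now penalized.
U = np.ones((4, 1))
V = np.ones((5, 1))
loss_all_ones = weighted_mf_loss(feedback, observed, U, V, w0=0.1)
print(loss_all_ones)
```

With 11 unobserved entries each contributing a squared error of 1, the loss is 0.1 x 11, roughly 1.1, rather than 0. A larger w0 pushes predictions for unobserved entries harder toward 0.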
Now to sum up, today we first introduced content-based filtering and then covered collaborative filtering quite a bit.
I have listed a few links to documentation and code implementations of collaborative filtering models based on TensorFlow.
These implementations use the TensorFlow Core API.
In our next video, we'll be introducing you to TensorFlow Recommenders, which makes it a lot easier to build recommendation models.
See you next time.