PageRank (PR) — it is the algorithm used by Google search to rank sites in search results. PageRank was named after Larry Page, one of the founders of Google. PageRank — it is a way of measuring the importance of the pages on a site. According to Google:
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
This is not the only algorithm used by Google to organize search results, but it is the first algorithm used by the company. and it is the most famous.
The above measure of centrality is not implemented for multigraphs.
PageRank Algorithm displays a probability distribution used to represent the probability that a person accidentally clicks on links to land on any particular page. PageRank can be calculated for collections of documents of any size. Several research papers suggest that the distribution is evenly distributed among all documents in a collection at the beginning of the computational process. PageRank calculations require several passes, called “iterations”, through the collection to adjust the estimated PageRank values to more closely reflect the theoretical true value.
Assume that the small universe consists of four web pages: A, B, C, and D. Links from a page to itself or multiple outbound links from one page to another page are ignored. PageRank is initialized to the same value for all pages. In the original form of PageRank, the sum of PageRank across all pages was the total number of pages on the Internet at that time, so each page in this example would have an initial value of 1. However, later versions of PageRank and For the remainder of this section, assume a probability distribution between 0 and 1 Therefore, the starting value for each page in this example is 0.25.
The PageRank carried over from this page to its outbound link targets in the next iteration is split equally among all outbound links.
If the only links in the system were from pages B, C and D to A, each link would pass 0.25 PageRank to A on the next iteration, for a total of 0.75.
Suppose instead that page B contains a link to pages C and A, page C contains a link to page A, and page D contains links on all three pages. So in the first iteration, page B will pass half of its existing value, or 0.125, to page A, and the other half, or 0.125, to page C. Page C will pass all of its existing value, 0.25, only page, to which he links, A. Since D had three outbound links, it would move one third of its existing value, or approximately 0.083, to A. At the end of this iteration, page A will have a PageRank of approximately 0.458.
In other words, the PageRank assigned to an outbound link is equal to the document's own PageRank divided by the number of outbound links L ().
In general, the PageRank value for any page u can be expressed as:
i.e. the PageRank value for page u depends on the PageRank values for each page v contained in the set Bu (the set containing all pages that link to page u) divided by the number L (v) of links from page v. The algorithm includes a damping factor to calculate the page rank. This is similar to the income tax that the government extracts from it, despite the fact that it pays it itself.
Below is the code to calculate the page rank.
The above code is function that was implemented in the networkx library.
To implement the above in networkx, you will need to do the following:
Below is the output you get on IDLE after the required settings.