It is estimated that Google holds over 65 percent of the global search engine market, making it the world’s most popular search engine. The table below shows Google’s market share by country.
Country | Google share | As of |
Argentina | 89.00% | Jan 2008 |
Australia | 87.81% | Jun 2008 |
Austria | 88.00% | Jan 2008 |
Belgium | 96.00% | Mar 2009 |
Brazil | 89.00% | Jan 2008 |
Bulgaria | 80.00% | Dec 2007 |
Canada | 78.00% | Jan 2008 |
Chile | 93.00% | Jan 2008 |
China | 26.60% | Oct 2008 |
Colombia | 91.00% | Jan 2008 |
Czech Republic | 34.50% | Mar 2009 |
Denmark | 92.00% | Jan 2008 |
Estonia | 53.37% | Jul 2008 |
Finland | 92.00% | Jan 2008 |
France | 91.23% | Feb 2009 |
Germany | 93.00% | Mar 2008 |
Hong Kong | 26.00% | Jan 2008 |
Hungary | 96.00% | Aug 2008 |
Iceland | 51.00% | Dec 2007 |
India | 81.40% | Aug 2008 |
Ireland | 76.00% | Jan 2008 |
Israel | 80.00% | 2007 |
Italy | 90.00% | Feb 2009 |
Japan | 38.20% | Jan 2009 |
Korea, South | 3.00% | 2009 |
Latvia | 97.95% | Jul 2008 |
Lithuania | 98.18% | Aug 2008 |
Malaysia | 51.00% | Jan 2008 |
Mexico | 88.00% | Jan 2008 |
Netherlands | 95.00% | Dec 2008 |
New Zealand | 72.00% | Jan 2008 |
Norway | 81.00% | Jan 2008 |
Poland | 95.00% | Q4 2008 |
Portugal | 94.00% | Jan 2008 |
Puerto Rico | 57.00% | Jan 2008 |
Romania | 95.21% | Mar 2009 |
Russia | 32.00% | Jan 2008 |
Singapore | 57.00% | Jan 2008 |
Slovakia | 75.60% | Dec 2007 |
Spain | 93.00% | Jan 2008 |
Sweden | 80.00% | Jan 2008 |
Switzerland | 93.00% | Jan 2008 |
Taiwan | 18.00% | Jan 2008 |
Ukraine | 72.42% | Feb 2009 |
United Kingdom | 90.39% | Dec 2008 |
United States | 63.30% | Feb 2009 |
United States | 72.11% | Feb 2009 |
Venezuela | 93.00% | Jan 2008 |
comScore
Google Sites led the U.S. explicit core search market in August with 65.4 percent market share, followed by Yahoo! Sites with 17.4 percent (up 0.3 percentage points), and Microsoft Sites with 11.1 percent (up 0.1 percentage points). Ask Network captured 3.8 percent of explicit core searches, followed by AOL LLC with 2.3 percent.
comScore Explicit Core Search Share Report*
August 2010 vs. July 2010
Total U.S. – Home/Work/University Locations
Source: comScore qSearch
*“Explicit Core Search” excludes contextually driven searches that do not reflect specific user intent to interact with the search results.
Nearly 15.7 billion explicit core searches were conducted in August. Google Sites ranked first with 10.3 billion searches, followed by Yahoo! Sites in second with 2.7 billion (up 3 percent) and Microsoft Sites in third with 1.7 billion (up 2 percent). Ask Network accounted for 598 million explicit core searches (up 2 percent), followed by AOL LLC Network with 366 million.
comScore Explicit Core Search Query Report
August 2010 vs. July 2010
Total U.S. – Home/Work/University Locations
Source: comScore qSearch
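As a quick arithmetic cross-check (a sketch, not comScore’s methodology), the shares reported above can be approximated from the query volumes in the preceding paragraph; small discrepancies reflect rounding in the published figures.

```python
# Quick cross-check: derive each engine's August 2010 share of explicit
# core searches from the reported query volumes (in billions). Small
# discrepancies from the published shares come from rounding.
volumes = {
    "Google Sites": 10.3,
    "Yahoo! Sites": 2.7,
    "Microsoft Sites": 1.7,
    "Ask Network": 0.598,
    "AOL LLC Network": 0.366,
}
total = 15.7  # billions of explicit core searches in August 2010

shares = {engine: 100 * vol / total for engine, vol in volumes.items()}
for engine, share in shares.items():
    print(f"{engine}: {share:.1f}%")
```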
U.S. Total Core Search
Google Sites accounted for 60.5 percent of total core search queries conducted, followed by Yahoo! Sites with 21.0 percent and Microsoft Sites with 12.8 percent. Ask Network captured 3.5 percent of total search queries, followed by AOL LLC with 2.2 percent.
comScore Total Core Search Share Report*
August 2010 vs. July 2010
Total U.S. – Home/Work/University Locations
Source: comScore qSearch
*“Total Core Search” is based on the five major search engines, including partner searches, cross-channel searches, and contextual searches. Searches for mapping, local directory, and user-generated video sites that are not on the core domain of the five search engines are not included in these numbers.
Americans conducted more than 16.9 billion total core search queries in August with Google Sites leading with 10.3 billion searches, followed by Yahoo! Sites with 3.6 billion and Microsoft Sites with 2.2 billion.
comScore Total Core Search Query Report
August 2010 vs. July 2010
Total U.S. – Home/Work/University Locations
Source: comScore qSearch
A Note about September 2010 qSearch Reporting
Google Instant Search’s introduction will not affect comScore’s ability to measure search activity consistently, but it does introduce a new dynamic to our data-collection methodology, which we have addressed. Microsoft’s powering of specific channels of search activity within Yahoo! will not impact comScore’s ability to report qSearch data for September 2010.
Google was the third most visited website in the United States in July 2010, with over 10.2 billion searches conducted. A server farm estimated at over 100,000 servers supports websites localized in 117 different languages and operating in over 150 countries.
Founders Larry Page and Sergey Brin began operating Google at Stanford University in 1996, when both were students. They received $100,000 in startup capital in 1998 and $25 million in venture capital in 1999. Over 99 percent of their revenue in 2009 came from their paid advertising program, AdWords.
Having made inroads into the marketplace with its innovations in search engine technology, Google earns its profits by auctioning keywords, with AdWords advertisements displayed alongside the search results. Researchers have discovered that Google’s internal auction methodology, which has been so important to the company’s success, is far more innovative than auction experts once believed.
Google’s innovations are only now being matched by some of the other search engines. Yahoo! gives its top ranking to advertisers who pay the most per click, while Google gives its best position to advertisers who are likely to pay Google the most over time. Google estimates that long-term value by multiplying an advertiser’s price per click (PPC) by its estimate of the likelihood that someone will actually click on the ad.
One of the things that makes Google’s keyword auction so unique is that advertisers often bid more than they actually pay to win.
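Google’s keyword auction is commonly described as a generalized second-price (GSP) auction ranked by expected revenue, that is, bid times estimated click-through rate. The sketch below uses purely hypothetical bidders and numbers to show why a winner’s price per click usually falls below its bid.

```python
# Generalized second-price (GSP) auction sketch with hypothetical data.
# Ads are ranked by expected revenue (bid x estimated CTR); each winner
# pays the minimum price per click needed to keep its slot:
#   price_i = bid_{i+1} * ctr_{i+1} / ctr_i
bidders = [            # (name, bid per click in $, estimated CTR)
    ("A", 4.00, 0.05),
    ("B", 3.00, 0.10),
    ("C", 1.00, 0.08),
]

ranked = sorted(bidders, key=lambda b: b[1] * b[2], reverse=True)

prices = {}
for i, (name, bid, ctr) in enumerate(ranked):
    if i + 1 < len(ranked):
        nxt = ranked[i + 1]
        prices[name] = nxt[1] * nxt[2] / ctr   # just enough to outrank next
    else:
        prices[name] = 0.0                     # bottom slot: reserve price only
    print(f"slot {i + 1}: {name} bid ${bid:.2f}, pays ${prices[name]:.2f}")
```

Note that B outranks A here despite bidding less, because its expected revenue per impression is higher, and both winners pay less per click than they bid.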
PageRank measures the chance that a person will arrive at any particular page based on a random click on a link. Several iterations over the collection are required to make the PageRank calculations as accurate as possible.
Probability is expressed as a numeric value between 0 and 1; a probability of 0.5 means there is a 50 percent chance of the event occurring. Hence, a PageRank of 0.5 means there is a 50 percent chance that a person clicking on a random link will be directed to the webpage with the 0.5 PageRank.
Simplified Algorithm
How PageRank Works
Assume a small universe of four webpages: A, B, C, and D. The initial approximation of PageRank would be evenly divided between these four webpages. Hence, each webpage would begin with an estimated PageRank of 0.25.
In the original form of PageRank, initial values were simply 1. This meant that the sum of all pages was the total number of pages on the web. Later versions of PageRank (see the formulas below) would assume a probability distribution between 0 and 1. Here, a simple probability distribution will be used—hence the initial value of 0.25.
If webpages B, C, and D each link only to A, they would each confer 0.25 PageRank to A. All PageRank in this simple system would thus gather to A, because all links would be pointing to A, giving A a PageRank of 0.75.
Suppose that page B has a link to page C as well as to page A, while page D has links to all three pages. The value of the link-votes is divided among all the outbound links on a page. Thus, page B gives a vote worth 0.125 to page A and a vote worth 0.125 to page C. Only one third of D’s PageRank is counted for A’s PageRank (approximately 0.083).
In other words, the PageRank conferred by an outbound link is equal to the webpage’s own PageRank score divided by the normalized number of outbound links L() (it is assumed that links to specific URLs only count once per webpage).
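The vote passing in the example above can be reproduced in a few lines. This sketch assumes, per the simplified example, that C still links only to A, while B links to A and C and D links to A, B, and C:

```python
# One round of PageRank "vote passing" for the four-page example:
# each page splits its current score evenly over its outbound links.
links = {              # page -> pages it links to
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}
pr = {p: 0.25 for p in "ABCD"}        # initial uniform estimate

new_pr = {p: 0.0 for p in "ABCD"}
for page, outbound in links.items():
    vote = pr[page] / len(outbound)   # vote split across outbound links
    for target in outbound:
        new_pr[target] += vote

# A receives 0.25/2 from B, 0.25 from C, and 0.25/3 from D.
print(round(new_pr["A"], 3))   # -> 0.458
```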
In the general case, the PageRank value for any page u can be expressed as:

PR(u) = Σ_{v ∈ B_u} PR(v) / L(v)

That is, the PageRank value for a page u is dependent on the PageRank values for each page v in the set B_u (this set contains all pages linking to page u), divided by the number L(v) of links from page v.
Damping Factor
The PageRank theory holds that even an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85.
The damping factor is subtracted from 1 (and in some variations of the algorithm, the result is divided by the number of webpages N in the collection), and this term is then added to the product of the damping factor and the sum of the incoming PageRank scores. That is,

PR(A) = (1 − d)/N + d (PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D) + …)
So, any page’s PageRank is derived, in large part, from the PageRanks of other pages. The damping factor adjusts the derived value downward. The original paper, however, gave the following formula, which has led to some confusion:

PR(A) = 1 − d + d (PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D) + …)
The difference between them is that the PageRank values in the first formula sum to one, while in the second formula each PageRank gets multiplied by N and the sum becomes N.
Specifically, in the latter formula a page’s PageRank is scaled by the total number of pages: it represents the expected number of times a random surfer would reach that page if he restarted the process as often as the web has pages. If the web had 100 pages and a page had a PageRank value of 2, the random surfer would reach that page on average twice if he restarted 100 times. There are no fundamental differences between the two formulas: the PageRank calculated using the latter formula is simply the PageRank calculated using the former formula multiplied by the number of webpages.
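This relationship can be verified numerically. The sketch below iterates both variants on the four-page example with d = 0.85, assuming (for this check only) that the link-less page A points to the other three pages so that no rank leaks out of the system:

```python
# Compare the two normalizations of PageRank on a small link graph.
# First formula: base term (1-d)/N, so the values sum to 1.
# Second (original-paper) formula: base term (1-d), so the sum becomes N.
links = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
pages = sorted(links)
N, d = len(pages), 0.85

def pagerank(base, iters=200):
    pr = {p: 1.0 / N for p in pages}
    for _ in range(iters):
        new = {p: base for p in pages}
        for page, outs in links.items():
            for target in outs:
                new[target] += d * pr[page] / len(outs)
        pr = new
    return pr

pr1 = pagerank((1 - d) / N)   # sums to 1
pr2 = pagerank(1 - d)         # sums to N; each value is N times larger
print(round(sum(pr1.values()), 6), round(sum(pr2.values()), 6))
print(round(pr2["A"] / pr1["A"], 6))   # -> 4.0
```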
Each time Google crawls the Web and rebuilds its index, PageRank is recalculated. As Google’s collection of documents increases, its initial PageRank estimation for all pages decreases.
The formula reflects a model of a random surfer who gets bored after several clicks and switches to a random page. The PageRank of a page is the probability that the random surfer lands on that page by clicking on a link. The process can be interpreted as a Markov chain in which the states are pages and the transitions, which are all equally probable, are the links between pages.
If a page contains no hyperlinks to other pages, it becomes a sink and terminates the random surfing process. If the surfer arrives at a sink page, it picks another URL at random and continues surfing.
When PageRank is calculated, pages with no outbound links are assumed to link out to all other pages in the collection; their PageRank scores are therefore divided evenly among all other pages. To be fair to pages that are not sinks, these random transitions are added to all nodes in the web, with a residual probability usually set to d = 0.85, estimated from the frequency with which an average surfer uses his or her browser’s bookmark feature.
So, the equation is as follows:

PR(p_i) = (1 − d)/N + d Σ_{p_j ∈ M(p_i)} PR(p_j)/L(p_j)

where p_1, p_2, …, p_N are the pages under consideration, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links on page p_j, and N is the total number of pages.
The PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix. This makes PageRank a particularly elegant metric: the eigenvector is

R = [PR(p_1), PR(p_2), …, PR(p_N)]^T

where R is the solution of the equation

R = [(1 − d)/N, (1 − d)/N, …, (1 − d)/N]^T + d [ℓ(p_i, p_j)] R

where the adjacency function ℓ(p_i, p_j) is 0 if page p_j does not link to p_i, and normalized such that, for each j,

Σ_{i=1}^{N} ℓ(p_i, p_j) = 1

That is, the elements of each column sum to 1, so the matrix is a stochastic matrix (see below for further details). In this way, this measure is similar to the eigenvector centrality measure that is used commonly in network analysis.
The dominant eigenvector of the PageRank matrix can be approximated quickly (only a few iterations are necessary) because of the large eigengap of the modified adjacency matrix.
Using Markov theory, the PageRank of a webpage can be understood as the probability of being at that page after many clicks. This happens to equal t^{-1}, where t is the expected number of clicks (or random jumps) required to get from the page back to itself.
As a consequence, PageRank favors older pages, since even the best new page will not have many links unless it is part of an existing site (such as Wikipedia, which is a densely connected set of pages). Through the Google Directory (itself a derivative of the Open Directory Project), users can see results sorted by PageRank within categories.
There is only one Google service in which PageRank determines display order directly: the Google Directory. PageRank determines the relevance score of the pages shown in search results in Google’s other search services (such as its primary Web search).
Google is known to penalize link farms and other schemes that artificially inflate PageRank. The search engine began to penalize sites selling paid text links in December 2007. How Google detects link farms and other PageRank manipulation tools is among its trade secrets.
Computation
Basically, PageRank can be determined either iteratively or algebraically. The power method (power iteration) can be seen as a variant of the iterative method; both perform the same mathematical operations.
Iterative
In the former case, at t = 0, an initial probability distribution is assumed, usually

PR(p_i; 0) = 1/N

At each time step, the computation, as detailed above, yields

PR(p_i; t + 1) = (1 − d)/N + d Σ_{p_j ∈ M(p_i)} PR(p_j; t)/L(p_j)

or in matrix notation

R(t + 1) = d M R(t) + ((1 − d)/N) 1     (*)

where R_i(t) = PR(p_i; t) and 1 is the column vector of length N containing only ones.
The matrix M is defined as

M_ij = 1/L(p_j) if page p_j links to page p_i, and M_ij = 0 otherwise

that is, M := (K^{-1} A)^T, where A denotes the adjacency matrix of the graph and K is the diagonal matrix with the outdegrees on the diagonal.
The computation ends when, for some small ε,

|R(t + 1) − R(t)| < ε

i.e., when convergence is assumed.
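A minimal sketch of the iterative computation, assuming the four-page example from earlier and the common convention that a dangling page distributes its rank evenly over all pages:

```python
# Iterative PageRank: start from the uniform distribution R(0) = 1/N
# and repeat the update until the change falls below epsilon.
def pagerank_iterative(links, d=0.85, eps=1e-8):
    pages = sorted(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}            # PR(p_i; 0) = 1/N
    while True:
        new = {p: (1 - d) / n for p in pages}   # (1-d)/N base term
        for page, outs in links.items():
            share = d * pr[page]
            if outs:                            # distribute over outlinks
                for target in outs:
                    new[target] += share / len(outs)
            else:                               # dangling page: spread evenly
                for target in pages:
                    new[target] += share / n
        if sum(abs(new[p] - pr[p]) for p in pages) < eps:
            return new
        pr = new

# B -> A, C; C -> A; D -> A, B, C; A has no outbound links.
pr = pagerank_iterative({"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]})
print({p: round(v, 3) for p, v in pr.items()})
```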
Algebraic
In the latter case, for t → ∞ (i.e., in the steady state), the above equation (*) reads

R = d M R + ((1 − d)/N) 1     (**)

The solution is given by

R = ((1 − d)/N) (I − d M)^{-1} 1

with the identity matrix I.
The solution exists and is unique for 0 < d < 1. This can be seen by noting that M is by construction a stochastic matrix and hence, by the Perron-Frobenius theorem, has an eigenvalue equal to one.
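The algebraic route can be checked on the same four-page example. To stay self-contained, this sketch solves (I − dM)R = ((1 − d)/N)·1 with a small hand-rolled Gaussian elimination; in practice one would use a linear-algebra library:

```python
# Algebraic PageRank: R = (I - d M)^(-1) ((1-d)/N) 1.
# M[i][j] = 1/L(p_j) if p_j links to p_i; the dangling page A is
# treated as linking to every page (a uniform column).
d = 0.85
pages = ["A", "B", "C", "D"]
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
n = len(pages)
idx = {p: i for i, p in enumerate(pages)}

M = [[0.0] * n for _ in range(n)]
for page, outs in links.items():
    targets = outs if outs else pages          # dangling column -> 1/N
    for t in targets:
        M[idx[t]][idx[page]] = 1.0 / len(targets)

# Build the linear system (I - dM) R = b with b = (1-d)/N.
A_sys = [[(1.0 if i == j else 0.0) - d * M[i][j] for j in range(n)] for i in range(n)]
b = [(1 - d) / n for _ in range(n)]

# Gaussian elimination with partial pivoting, then back-substitution.
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(A_sys[r][col]))
    A_sys[col], A_sys[piv] = A_sys[piv], A_sys[col]
    b[col], b[piv] = b[piv], b[col]
    for row in range(col + 1, n):
        f = A_sys[row][col] / A_sys[col][col]
        for c in range(col, n):
            A_sys[row][c] -= f * A_sys[col][c]
        b[row] -= f * b[col]
R = [0.0] * n
for row in range(n - 1, -1, -1):
    s = sum(A_sys[row][c] * R[c] for c in range(row + 1, n))
    R[row] = (b[row] - s) / A_sys[row][row]

print({p: round(R[idx[p]], 3) for p in pages})   # sums to 1, A ranks highest
```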
Power Method
If the matrix M is a transition probability, i.e., column-stochastic with no columns consisting of just zeros, and R is a probability distribution (i.e., |R| = 1 and E R = 1, where E is the matrix of all ones), Eq. (**) is equivalent to

R = (d M + ((1 − d)/N) E) R =: M̂ R     (***)

Hence, PageRank R is the principal eigenvector of M̂. A fast and easy way to compute this is using the power method: starting with an arbitrary vector x(0), the operator M̂ is applied in succession, i.e.,

x(t + 1) = M̂ x(t)

until |x(t + 1) − x(t)| < ε.
Note that in Eq. (***) the matrix on the right-hand side in the parenthesis can be interpreted as

((1 − d)/N) E = (1 − d) P 1^T

where P is an initial probability distribution. In the current case,

P = (1/N) 1

Finally, if M has columns with only zero values, they should be replaced with the initial probability vector P. In other words,

M′ := M + D

where the matrix D is defined as

D := P b^T

with b_i = 1 if page p_i has no outbound links (L(p_i) = 0) and b_i = 0 otherwise.
In this case, the above two computations using M only give the same PageRank if their results are normalized:

R_power = R_iterative / |R_iterative| = R_algebraic / |R_algebraic|
Efficiency
Depending on the framework used to perform the computation, the exact implementation of the methods, and the required accuracy of the result, the computation time of these methods can vary greatly.
Google Toolbar
The Google Toolbar’s PageRank feature displays a visited page’s PageRank as a whole number between 0 and 10. The most popular websites have a PageRank of 10; the least popular have a PageRank of 0. Google has not disclosed the precise method for determining a Toolbar PageRank value, and the displayed value is not the actual value Google uses, so it serves only as a rough guide: the PageRank shown in the toolbar does not fully reflect the way Google judges the value of a website.
The PageRank of a particular page is roughly based upon the quantity of inbound links as well as the PageRank of the pages providing those links. Other factors also enter the algorithm, such as the size of a page, how often it changes and how up to date it is, the key texts in headlines, and the words of hyperlinked anchor texts.
The Google Toolbar’s PageRank is updated only about twice a year, so it often shows out-of-date values.
SERP Rank
The search engine results page (SERP) is the actual result returned by a search engine in response to a keyword query. The SERP consists of a list of links to webpages with associated text snippets. The SERP rank of a webpage refers to the placement of the corresponding link on the SERP, where higher placement means higher SERP rank. The SERP rank of a webpage is not only a function of its PageRank, but depends on a relatively large and continuously adjusted set of factors (over 200), commonly referred to by internet marketers as “Google Love.” Search engine optimization (SEO) is aimed at achieving the highest possible SERP rank for a website or a set of webpages.
With the introduction of Google Places into the mainstream organic SERP, PageRank plays little to no role in ranking a business in the local business results. While the theory of citations still figures in the algorithm, PageRank is not a factor, since Google is ranking business listings and not webpages.
Google Directory PageRank
The Google Directory PageRank is an eight-unit measurement, and its values can be viewed in the Google Directory. Unlike the Google Toolbar, which shows a numeric PageRank value on mouseover of the green bar, the Google Directory shows the PageRank only as a green bar.
False or Spoofed PageRank
While the PageRank shown in the toolbar is considered to be derived from an accurate PageRank value (at some time prior to the time of publication by Google) for most sites, this value was at one time easily manipulated. A previous flaw was that any low PageRank page that was redirected via an HTTP 302 response or a “refresh” metatag to a high PageRank page caused the lower PageRank page to acquire the PageRank of the destination page. In theory, a new PR0 page with no incoming links could have been redirected to the Google home page, which is a PR10, and then the PR of the new page would be upgraded to a PR10. This spoofing technique, also known as 302 Google jacking, was a known failing or bug in the system. Any page’s PageRank could have been spoofed to a higher or lower number of the webmaster’s choice, and only Google had access to the real PageRank of the page. Spoofing is generally detected by running a Google search for a URL with questionable PageRank, as the results will display the URL of an entirely different site (the one redirected to).
Manipulating PageRank
For search engine optimization purposes, some companies offer to sell high PageRank links to webmasters. As links from higher-PR pages are believed to be more valuable, they tend to be more expensive. It can be an effective and viable marketing strategy to buy link advertisements on content pages of quality and relevant sites to drive traffic and increase a webmaster’s link popularity. However, Google has publicly warned webmasters that if they are discovered to be selling links for the purpose of conferring PageRank and reputation, their links would be devalued (ignored in the calculation of other pages’ PageRanks). The practice of buying and selling links is intensely debated across the webmaster community. Google advises webmasters to use the nofollow HTML attribute value on sponsored links.
The Intentional Surfer Model
The original PageRank algorithm reflects the so-called random surfer model, meaning that the PageRank of a particular page is derived from the theoretical probability of visiting that page when clicking on links at random. However, real users do not randomly surf the web, but follow links according to their interests and intention. A page-ranking model that reflects the importance of a particular page as a function of how many actual visits it receives by real users is called the intentional surfer model.
The Google Toolbar sends information to Google for every page visited, and thereby provides a basis for computing PageRank based on the intentional surfer model. The introduction of the nofollow attribute by Google to combat spamdexing has the side effect that webmasters commonly use it on outgoing links to increase their own PageRank. This causes a loss of actual links for the web crawlers to follow, thereby making the original PageRank algorithm based on the random surfer model potentially unreliable. Using information about users’ browsing habits provided by the Google Toolbar partly compensates for the loss of information caused by the nofollow attribute. The SERP rank of a page, which determines a page’s actual placement in the search results, is based on a combination of the random surfer model (PageRank) and the intentional surfer model (browsing habits), in addition to other factors.
Other Uses
A version of PageRank has recently been proposed as a replacement for the traditional Institute for Scientific Information (ISI) impact factor and implemented at eigenfactor.org. Instead of merely counting total citations to a journal, the “importance” of each citation is determined in a PageRank fashion.
A similar new use of PageRank is to rank academic doctoral programs based on their records of placing their graduates in faculty positions. In PageRank terms, academic departments link to each other by hiring their faculty from each other (and from themselves).
PageRank has been used to rank spaces or streets to predict how many people (pedestrians or vehicles) come to the individual spaces or streets. In lexical semantics, it has been used to perform word sense disambiguation and to automatically rank WordNet synsets according to how strongly they possess a given semantic property, such as positivity or negativity.
A dynamic weighting method similar to PageRank has been used to generate customized reading lists based on the link structure of Wikipedia.
A web crawler may use PageRank as one of a number of importance metrics it uses to determine which URL to visit during a crawl of the web. One of the early working papers used in the creation of Google is “Efficient Crawling Through URL Ordering,” which discusses the use of a number of different importance metrics to determine how deeply and how much of a site Google will crawl. PageRank is presented as one of a number of these importance metrics, though there are others listed, such as the number of inbound and outbound links for a URL and the distance from the root directory on a site to the URL.
PageRank may also be used as a methodology to measure the apparent impact of a community, such as the blogosphere, on the overall web itself. This approach uses PageRank to measure the distribution of attention, reflecting the scale-free network paradigm.
In any ecosystem, a modified version of PageRank may be used to determine species that are essential to the continuing health of the environment.
Google’s rel=“nofollow” Option
In early 2005, Google implemented a new value, “nofollow,” for the rel attribute of HTML link and anchor elements, so that website developers and bloggers could make links that Google will not consider for the purposes of PageRank—they are links that no longer constitute a “vote” in the PageRank system. The nofollow relationship was added in an attempt to help combat spamdexing.
As an example, people could previously create many message-board posts with links to their website to artificially inflate their PageRank. With the nofollow value, message-board administrators can modify their code to automatically insert rel=“nofollow” to all hyperlinks in posts, thus preventing PageRank from being affected by those particular posts. This method of avoidance, however, has various drawbacks, such as reducing the link value of legitimate comments.
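A simplified sketch of how a message board might automate this, using a regular expression for brevity (a production system would use an HTML parser or a template filter, and the markup shown is hypothetical):

```python
# Add rel="nofollow" to every anchor tag in a post, skipping anchors
# that already declare a rel attribute.
import re

def nofollow_links(html: str) -> str:
    def add_rel(match):
        tag = match.group(0)
        if re.search(r'\brel\s*=', tag, re.IGNORECASE):
            return tag                      # leave existing rel untouched
        return tag[:2] + ' rel="nofollow"' + tag[2:]
    return re.sub(r'<a\b[^>]*>', add_rel, html, flags=re.IGNORECASE)

post = 'Check out <a href="http://example.com/">my site</a>!'
print(nofollow_links(post))
# -> Check out <a rel="nofollow" href="http://example.com/">my site</a>!
```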
In an effort to manually control the flow of PageRank among pages within a website, many webmasters practice what is known as PageRank sculpting, which is the act of strategically placing the nofollow attribute on certain internal links of a website in order to funnel PageRank toward those pages the webmaster deems most important. This tactic has been used since the inception of the nofollow attribute, but many now believe the technique has lost its effectiveness.
The PageRank algebra and content in this section of the book was taken from Wikipedia and is one of the few published examples of how PageRank works.