Once we have determined the site's crawl budget, i.e. how much work Googlebot does on our site and how often it visits us, we try to understand whether the site actually has a crawling problem and therefore a crawl budget problem. To quickly determine if your site has a crawl budget issue, follow these steps:

1. Determine the number of pages submitted in the sitemap (for our site, for example, there are approximately 650 pages in the sitemap).
2. Determine the number of URLs crawled daily (as we saw before, in our case it is approximately 512 pages/day).
3. Divide the number of pages in the sitemap by the number of pages crawled per day; in our case the result is approximately 1.25. This value tells us, on average, how many days pass before Googlebot returns to the same page.
If the value is high (>5), you may have a crawling problem! In our case we have seen that the Google crawler is in fact able to crawl the entire site in a couple of days at most, so no particular critical issues are identified. A value of >10 would be a different matter: it would mean that, on average, each new page (or change) on the site could take up to 10 days to be detected.
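
As a quick illustration, here is a minimal sketch of this check in Python; the page counts are the approximate figures quoted above, and the >5 and >10 thresholds are the rules of thumb just described:

```python
def crawl_ratio(sitemap_pages: int, pages_crawled_per_day: float) -> float:
    """Average number of days before Googlebot revisits the same page."""
    return sitemap_pages / pages_crawled_per_day

# Approximate figures from this article's example site.
ratio = crawl_ratio(650, 512)
print(f"Average revisit interval: {ratio:.2f} days")  # ~1.27, rounded to about 1.25 above

# Rules of thumb discussed in this article.
if ratio > 10:
    print("Each new page or change may take 10+ days to be detected.")
elif ratio > 5:
    print("You may have a crawling problem!")
else:
    print("No particular critical issues.")
```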

Factors that influence the Crawl Budget

The errors that negatively impact site crawling, and therefore the crawl budget, are essentially of the following types:

Hacked pages

It goes without saying that websites containing hacked content and possible threats to user security are penalized: Google does not want to waste time crawling and indexing these pages. We always recommend keeping your CMS and its plugins up to date!

Infinite spaces and proxies

When Googlebot crawls the web, it often finds what is usually referred to as "infinite space".
This is a very large number of links that usually provide little or no new content for Googlebot to index. If this happens on your site, crawling those URLs may consume unnecessary bandwidth and may prevent Googlebot from fully indexing the actual content on your site, wasting resources on low-value pages.

The classic example of "infinite space" is a calendar with links to the following months. Googlebot could keep following those "Next Month" links forever, reaching pages with little value and content.

Another common scenario is filters on ecommerce websites that allow you to view the same products in multiple ways. An online clothing site might let you select and filter items by category, price, color, brand, style, etc. The number of possible filter combinations can grow exponentially. Filters that create dynamic pages in this way can produce thousands of URLs, each displaying a subset of the items sold. This may be convenient for your users, but it is not so useful for Googlebot, which wants to find everything exactly once!
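
To get a feel for how quickly filter combinations explode, here is a back-of-the-envelope sketch; the filter names and counts are hypothetical, chosen only for illustration:

```python
from math import prod

# Hypothetical filter dimensions for an online clothing store;
# the counts below are invented purely for illustration.
filters = {
    "category": 10,
    "price_range": 5,
    "color": 12,
    "brand": 20,
    "style": 8,
}

# If every combination of filter values gets its own URL,
# the crawlable URL space multiplies across all dimensions.
combinations = prod(filters.values())
print(f"Possible filtered URLs: {combinations:,}")  # 96,000
```

Even with these modest counts, five filter dimensions already yield tens of thousands of distinct URLs, all showing subsets of the same catalog.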