Sun. Feb 28th, 2021

So the question I have is, how do we differentiate? Can you give some insights on Googlebot, which crawls the organic side, and the ads part?

Yeah, that's a great question. Think about it like this: we have a large number of machines that can go out and fetch content, and you always want to advertise accurately what you are, whether you're fetching on behalf of the index or on behalf of ads as AdsBot. So usually it's safe to think of those as separate mechanisms. Googlebot will fetch pages, and it has its own IP range; AdsBot will fetch pages too, and I'm not sure whether it has a distinct IP range, but you can always verify Googlebot by taking its IP address and doing a forward and reverse DNS lookup to confirm that it really is Googlebot.

There is one interesting corner case. Imagine Googlebot has just fetched a page, and you're also an advertiser, so you've also bought a link in AdWords. Should AdsBot then go and fetch that page immediately? The answer is, well, that's kind of a lot of redundancy: if you just fetched the page, you don't need to re-fetch it. So we actually do have a crawling cache. When, for example, AdsBot fetches something as AdsBot, that page might get cached, and then if Googlebot wants to fetch that page, it might hit that cache and say, oh, five minutes ago we already fetched that page. So it doesn't matter as much to us whether it was AdsBot or Googlebot that fetched the page; if it's in that cache, we say, ah, okay, I already have a recent copy of that, let's reuse it.

Now, that said, I think the volume of pages that Googlebot fetches is really big compared to the volume of ads, because we've seen something like sixty trillion URLs, so you're far more likely to see a fetch come from Googlebot than from AdsBot. But we try to advertise that clearly in the user agent, so you can always do something depending on whether you can tell that it's
coming from Google. Basically, it's two different user agents, and it's two different mechanisms; we usually don't even share, they're in a different building, we're in a different building. But I think some of that infrastructure can be shared, and not just the cache: you wouldn't want to write a completely different bot for a different part of the company. So think of it more like a crawling service, where one product can say to the Googlebot that's crawling around on the web, hey, we need to fetch a copy of this page, and then we can surface that ability to other properties.

But at all times, at least certainly for the web crawl, you need to be able to tell that it's actually Googlebot visiting. There are a few weird corner cases: when we check for cloaking, we might come from some different IPs, because if you always knew where Googlebot was coming from and you were doing something spammy or deceptive, that could make for a bad user experience. So we do have a few IPs, coming from a slightly different area, that we use to check for abuse or deception. But for the vast majority of cases, for the billions and billions of URLs that we're crawling every day, it's well advertised: it's coming from a well-known IP address range that normally doesn't change all that often, and we're always thinking about whether there are ways we can advertise those IPs so that people can encode that and operate on it without too much trouble.

Good, thank you.
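The verification step described above (a reverse DNS lookup followed by a forward lookup) can be sketched in Python. This is a minimal illustration, not an official tool: the lookup functions are injectable only so the logic can be exercised without network access, and the accepted suffixes (`googlebot.com`, `google.com`) follow Google's published verification guidance.

```python
import socket

def verify_googlebot(ip,
                     reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for a claimed Googlebot IP.

    1. Reverse lookup: the PTR hostname should be under googlebot.com
       or google.com.
    2. Forward lookup: that hostname must resolve back to the same IP,
       otherwise the PTR record could have been spoofed.
    """
    try:
        host = reverse(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # gethostbyname_ex returns (hostname, aliases, ip_list)
        return ip in forward(host)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

In practice you would only run these lookups for requests whose `User-Agent` header already claims to be Googlebot (or AdsBot), since the user agent string itself is trivially forgeable.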

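The shared crawl cache described in the answer can also be sketched. Everything here is hypothetical (the class name, the injected `fetch` function, the five-minute default TTL echoing the "five minutes ago" remark); it only illustrates the idea that whichever bot fetched a URL recently, the next bot reuses that copy instead of hitting the site again.

```python
import time

class CrawlCache:
    """Hypothetical shared fetch cache for multiple bots (sketch only).

    Whether Googlebot or AdsBot fetched a URL first doesn't matter:
    if a copy was fetched within the TTL, it is reused rather than
    re-fetched from the site.
    """

    def __init__(self, fetch, ttl_seconds=300):
        self.fetch = fetch        # function that actually fetches a URL
        self.ttl = ttl_seconds    # how long a cached copy stays fresh
        self.entries = {}         # url -> (fetched_at, body)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        hit = self.entries.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                 # recent copy exists: reuse it
        body = self.fetch(url)            # otherwise fetch and remember it
        self.entries[url] = (now, body)
        return body
```

The design choice this mirrors is that caching happens at the crawl-service layer, below the individual bots, so deduplication works across products without each bot knowing about the others.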