IT Community - Software Programming, Web Development and Technical Support

Googlebot Auto Email

This is a discussion on Googlebot Auto Email within the PHP Programming forums, part of the Web Development category; Auto Email on Googlebot Detected Crawling Page Simple script that you can insert in a .php page that will email ...


#1 (11-15-2007, 08:43 AM) Sabari

Auto Email on Googlebot Detected Crawling Page

A simple script that you can insert in a .php page so that it emails you whenever Googlebot crawls that page. You will need to change the values in the script to match your own site and contact details. Simply copy and paste the code from the following post. Don't forget the opening <?php and closing ?> tags.
__________________
Thanks & Regards
Sabari...

#2 (11-15-2007, 08:43 AM) Sabari

This script is completely free for you to use and modify however you see fit, but if you make any cool changes, please share them with us.

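<?php
// Email a notice whenever the requesting user agent identifies itself as Googlebot.
// $_SERVER replaces the old $HTTP_USER_AGENT global (which relied on register_globals),
// and stripos() replaces eregi(), which was removed in PHP 7.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($userAgent, "googlebot") !== false)
{
    // Replace the address and domain name with your own.
    mail("you@youremail.com", "Googlebot detected on yourdomainname.com", "Google has crawled yourdomainname.com");
}
?>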

#3 (11-15-2007, 08:44 AM) Sabari

Advanced Version:

This is a much better version that will automatically fill in the Domain, the actual Page (including any query strings), as well as tell you the Date and Time the page was crawled. Very useful if you want to add this script to many pages.

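<?php
// Same check as the simple version, but the domain, page, query string and
// crawl time are filled in automatically from $_SERVER.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (stripos($userAgent, "googlebot") !== false)
{
    $host = $_SERVER['SERVER_NAME'];
    $url = "http://" . $host . $_SERVER['PHP_SELF'];
    // Append the query string only when the request actually has one.
    if (!empty($_SERVER['QUERY_STRING']))
    {
        $url .= '?' . $_SERVER['QUERY_STRING'];
    }
    $today = date("F j, Y, g:i a");
    // Replace the address with your own.
    mail("you@youremail.com", "Googlebot detected on http://$host", "$today - Google crawled $url");
}
?>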

#4 (11-15-2007, 10:15 PM) Jeyaseelansarc

Hi Sabari,
I think it is really fantastic to learn about unfamiliar concepts like this.
Can I know more about Googlebot?

I am very new to these concepts.
__________________
With,
J. Jeyaseelan

Everything Possible

Last edited by Jeyaseelansarc : 11-15-2007 at 10:18 PM.

#5 (11-15-2007, 10:34 PM) Sabari

Sure, Mr. Jeyaseelansarc, I'll explain Googlebot step by step.

Googlebot is the name of Google's indexing robot, which scans the web from link to link looking for new pages. You can tell whether Googlebot has visited your website by looking at your server's log files.
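
As a rough illustration of the log-file check, here is a minimal sketch that counts the log lines mentioning Googlebot. The function name countGooglebotHits and the file name access.log are assumptions made for this example; the real location and format of the log depend on your server.

```php
<?php
// Count the lines whose text mentions Googlebot (case-insensitive).
// Hypothetical helper, not part of the scripts posted above.
function countGooglebotHits(iterable $logLines): int
{
    $hits = 0;
    foreach ($logLines as $line) {
        if (stripos((string) $line, 'googlebot') !== false) {
            $hits++;
        }
    }
    return $hits;
}

// Hypothetical log location; point it at wherever your server writes its access log.
$logFile = 'access.log';
if (is_readable($logFile)) {
    echo countGooglebotHits(new SplFileObject($logFile)), " requests from Googlebot\n";
}
```

Keep in mind that this only looks at the user-agent text in each line, which any client can fake.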

#6 (11-15-2007, 10:40 PM) Sabari

What is Googlebot?

"Googlebot" is the term Google uses for their web crawler. Essentially, Googlebot visits pages all over the internet, mainly by following links from existing pages, and creates the Google Index based on what it finds. Googlebot parses the HTML code that is the backbone of most web pages and stores what it finds in the index - which is then quickly and effectively searchable by the Google search engine.

Basically, when users enter a search term or phrase into the search box on Google's web site, what they are getting are not the direct results of the Googlebot's crawlings but recorded results. In other words, the results the user receives can be from a week or two earlier when Googlebot searched the internet for websites and their content.

There are two versions of Googlebot: deepbot and freshbot. These two variations of Googlebot are true to their names. Deepbot attempts to index all there is to index, following every link it finds and indexing all content. Freshbot, on the other hand, is geared toward maintaining a fresh index of frequently updated websites. Using these two variations allows Google to keep a fresh index of constantly updated websites without delaying these results in order to do a complete crawl of the web. Freshbot runs a lot more often than deepbot.

Last edited by Sabari : 11-15-2007 at 11:06 PM.

#7 (11-16-2007, 12:04 AM) Jeyaseelansarc

Hi,

I think this is very useful information for me.

I have another question:
What is a web crawler?
Can I know more about this?

#8 (11-16-2007, 02:21 AM) Sabari

Web Crawler
A web crawler is an automated program that accesses a web site and traverses it by following the links present on its pages. It is also known as a bot, robot, or spider.

#9 (11-16-2007, 02:22 AM) Sabari

Spider
A spider is an automated program that accesses a web site and traverses it by following the links present on its pages. It is also known as a bot, robot, or web crawler.

#10 (11-16-2007, 02:22 AM) Sabari

Bot
A bot is an automated program that accesses a web site and traverses it by following the links present on its pages. It is also known as a robot, spider, or web crawler.

#11 (11-16-2007, 02:23 AM) Sabari

Robot
A robot is an automated program that accesses a web site and traverses it by following the links present on its pages. It is also known as a bot, spider, or web crawler.

#13 (11-16-2007, 02:27 AM) Sabari

The number of pages Googlebot crawls

The Googlebot activity reports in webmaster tools show you the number of pages of your site Googlebot has crawled over the last 90 days. We've seen some of you asking why this number might be higher than the total number of pages on your sites.



Googlebot crawls pages of your site based on a number of things including:

* pages it already knows about
* links from other web pages (within your site and on other sites)
* pages listed in your Sitemap file

More specifically, Googlebot doesn't access pages, it accesses URLs. And the same page can often be accessed via several URLs. Consider the home page of a site that can be accessed from the following four URLs:

* http://www.example.com/
* http://www.example.com/index.html
* http://example.com
* http://example.com/index.html

Although all URLs lead to the same page, all four URLs may be used in links to the page. When Googlebot follows these links, a count of four is added to the activity report.

Many other scenarios can lead to multiple URLs for the same page. For instance, a page may have several named anchors, such as:

* http://www.example.com/mypage.html#heading1
* http://www.example.com/mypage.html#heading2
* http://www.example.com/mypage.html#heading3

And dynamically generated pages often can be reached by multiple URLs, such as:

* http://www.example.com/furniture?type=chair&brand=123
* http://www.example.com/hotbuys?type=chair&brand=123

As you can see, when you consider that each page on your site might have multiple URLs that lead to it, the number of URLs that Googlebot crawls can be considerably higher than the number of total pages for your site.
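
To make the URL-versus-page distinction concrete, here is a small sketch that collapses the named-anchor URLs from the example above into unique pages by dropping everything after the "#". The helper name stripFragment is made up for this illustration, and it only handles the anchor case, not the other kinds of duplicate URLs.

```php
<?php
// Return the URL without its "#anchor" part (hypothetical helper).
function stripFragment(string $url): string
{
    $pos = strpos($url, '#');
    return $pos === false ? $url : substr($url, 0, $pos);
}

// The three anchor URLs from the example above.
$crawled = [
    'http://www.example.com/mypage.html#heading1',
    'http://www.example.com/mypage.html#heading2',
    'http://www.example.com/mypage.html#heading3',
];

// Several crawled URLs can reduce to a single underlying page.
$unique = array_values(array_unique(array_map('stripFragment', $crawled)));

echo count($crawled), " URLs crawled, ", count($unique), " unique page(s)\n";
```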

Of course, you (and we) only want one version of the URL to be returned in the search results. Not to worry -- this is exactly what happens. Our algorithms select a version to include, and you can provide input on this selection process.
[Attached image: pages_final.gif]

Last edited by Sabari : 11-16-2007 at 02:52 AM.

#14 (11-16-2007, 02:33 AM) Sabari

Redirect to the preferred version of the URL
You can do this using a 301 (permanent) redirect. In the first example, which shows four URLs that point to a site's home page, you may want to redirect index.html to http://www.example.com/. And you may want to redirect example.com to www.example.com so that any URLs that begin with one version are redirected to the other version. Note that you can do this latter redirect with the Preferred Domain feature in webmaster tools. (If you also use a 301 redirect, make sure that this redirect matches what you set for the preferred domain.)
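
In PHP, a redirect like this could be sketched as follows. This is only one possible approach, and it assumes that www.example.com is the preferred host and that a trailing index.html should collapse to the directory URL; the function name preferredUrl is invented for this example.

```php
<?php
// Work out where a request should be redirected, or return null if it is
// already on the preferred URL. Hypothetical helper; adjust the host to your site.
function preferredUrl(string $host, string $uri, string $preferredHost = 'www.example.com'): ?string
{
    // Collapse ".../index.html" (with or without a query string) to ".../".
    $path = preg_replace('#^([^?]*/)index\.html(?=$|\?)#', '$1', $uri);
    if (strcasecmp($host, $preferredHost) === 0 && $path === $uri) {
        return null;
    }
    return 'http://' . $preferredHost . $path;
}

// Only runs inside a web request; must execute before any output is sent.
if (isset($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'])) {
    $target = preferredUrl($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
    if ($target !== null) {
        header('Location: ' . $target, true, 301); // 301 = moved permanently
        exit;
    }
}
```

The same effect can usually be achieved in the web server configuration instead, which avoids running PHP for every redirected request.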

#15 (11-16-2007, 02:43 AM) Sabari

Block the non-preferred versions of a URL with a robots.txt file
For dynamically generated pages, you may want to block the non-preferred version using pattern matching in your robots.txt file. (Note that not all search engines support pattern matching, so check the guidelines for each search engine bot you're interested in.) For instance, in the third example, which shows two URLs that point to a page about the chairs available from brand 123, the "hotbuys" section rotates periodically and the content is always available from a primary and permanent location. In that case, you may want to index the first version and block the "hotbuys" version. To do this, add the following to your robots.txt file:

User-agent: Googlebot
Disallow: /hotbuys?*

To ensure that this directive will actually block and allow what you intend, use the robots.txt analysis tool in webmaster tools. Just add this directive to the robots.txt section on that page, list the URLs you want to check in the "Test URLs" section and click the Check button. For this example, you'd see a result like this:



Don't worry about links to anchors, because while Googlebot will crawl each link, our algorithms will index the URL without the anchor.

And if you don't provide input such as that described above, our algorithms do a really good job of picking a version to show in the search results.
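
If you want a feel for how the pattern in the Disallow line above is applied, here is a minimal sketch of the matching. It assumes the commonly documented wildcard behaviour ("*" matches any run of characters, a trailing "$" anchors the end of the URL, and rules otherwise match as prefixes); the function name robotsPatternMatches is invented for this example and it is no substitute for the robots.txt analysis tool.

```php
<?php
// Check whether a robots.txt path pattern matches a URL path (hypothetical helper).
function robotsPatternMatches(string $pattern, string $path): bool
{
    // A trailing "$" means the pattern must match up to the end of the path.
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Escape regex metacharacters, then turn the escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    return preg_match('#^' . $regex . ($anchored ? '$' : '') . '#', $path) === 1;
}

// The two dynamic URLs from the earlier example, tested against "Disallow: /hotbuys?*".
var_dump(robotsPatternMatches('/hotbuys?*', '/hotbuys?type=chair&brand=123'));
var_dump(robotsPatternMatches('/hotbuys?*', '/furniture?type=chair&brand=123'));
```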
[Attached image: robots.jpg]

Last edited by Sabari : 11-16-2007 at 03:06 AM.

#16 (11-16-2007, 02:48 AM) Sabari

Googlebot activity reports
The webmaster tools team has a very exciting mission: we dig into our logs, find as much useful information as possible, and pass it on to you, the webmasters. Our reward is that you more easily understand what Google sees, and why some pages don't make it to the index.

The latest batch of information that we've put together for you is the amount of traffic between Google and a given site. We show you the number of requests, number of kilobytes (yes, yes, I know that tech-savvy webmasters can usually dig this out, but our new charts make it really easy to see at a glance), and the average document download time. You can see this information in chart form, as well as in hard numbers (the maximum, minimum, and average).

For instance, here's the number of pages Googlebot has crawled in the Webmaster Central blog over the last 90 days. The maximum number of pages Googlebot has crawled in one day is 24 and the minimum is 2. That makes sense, because the blog was launched less than 90 days ago, and the chart shows that the number of pages crawled per day has increased over time. The number of pages crawled is sometimes more than the total number of pages in the site -- especially if the same page can be accessed via several URLs. So the URL of the Official Google Webmaster Central Blog post "Learn more about Googlebot's crawl of your site and more!" and that same URL with a named anchor appended are different URLs, but point to the same page (the second points to an anchor within the page).



And here's the average number of kilobytes downloaded from this blog each day. As you can see, as the site has grown over the last two and a half months, the number of average kilobytes downloaded has increased as well.



The first two reports can help you diagnose the impact that changes in your site may have on its coverage. If you overhaul your site and dramatically reduce the number of pages, you'll likely notice a drop in the number of pages that Googlebot accesses.

The average document download time can help pinpoint subtle networking problems. If the average time spikes, you might have network slowdowns or bottlenecks that you should investigate. Here's the report for this blog that shows that we did have a short spike in early September (the maximum time was 1057 ms), but it quickly went back to a normal level, so things now look OK.



In general, the load time of a page doesn't affect its ranking, but we wanted to give this info because it can help you spot problems. We hope you will find this data as useful as we do!
[Attached images: pages_final1.gif, kb_final2.gif, time_final3.gif]

Last edited by Sabari : 11-16-2007 at 02:51 AM.

#17 (11-16-2007, 04:40 AM) Jeyaseelansarc

Hi,
How can we opt out of this crawling by Google?
Is there any way to stop Google from entering our sites?

#18 (11-17-2007, 03:48 AM) Sabari

Yes, we can block it. Please go through the points below.

Blocking Googlebot
Google uses several user-agents. You can block access to any of them by including the bot name on the User-Agent line of an entry. Blocking Googlebot blocks all bots that begin with "Googlebot".

* Googlebot: crawls pages for our web index and our news index
* Googlebot-Mobile: crawls pages for our mobile index
* Googlebot-Image: crawls pages for our image index
* Mediapartners-Google: crawls pages to determine AdSense content. We only use this bot to crawl your site if you show AdSense ads on your site.
* Adsbot-Google: crawls pages to measure AdWords landing page quality. We only use this bot if you use Google AdWords to advertise your site. Find out more about this bot and how to block it from portions of your site.

For instance, to block Googlebot entirely, you can use the following syntax:

User-agent: Googlebot
Disallow: /

Allowing Googlebot
If you want to block access to all bots other than the Googlebot, you can use the following syntax:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

Googlebot follows the line directed at it, rather than the line directed at everyone.

The Allow extension
Googlebot recognizes an extension to the robots.txt standard called Allow. This extension may not be recognized by all other search engine bots, so check with other search engines you're interested in to find out. The Allow line works exactly like the Disallow line. Simply list a directory or page you want to allow.

You may want to use Disallow and Allow together. For instance, to block access to all pages in a subdirectory except one, you could use the following entries:

User-Agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

Those entries would block all pages inside the folder1 directory except for myfile.html.
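
The way Disallow and Allow combine can be sketched in a few lines. The sketch assumes that the longest matching rule wins and that Allow wins a tie, which is how Google describes its own handling; other crawlers may resolve conflicts differently. The function name isAllowed and the rule format are made up for this example, and wildcards are left out to keep it short.

```php
<?php
// Decide whether a path may be crawled under a list of [type, prefix] rules.
// Hypothetical helper: plain prefix rules only, no "*" or "$" support.
function isAllowed(array $rules, string $path): bool
{
    $best = ['allow' => true, 'length' => -1]; // no matching rule means allowed
    foreach ($rules as [$type, $prefix]) {
        // An empty rule (e.g. "Disallow:") matches nothing.
        if ($prefix === '' || strpos($path, $prefix) !== 0) {
            continue;
        }
        $length = strlen($prefix);
        $isAllow = ($type === 'Allow');
        // Prefer the longest matching prefix; on a tie, prefer Allow.
        if ($length > $best['length'] || ($length === $best['length'] && $isAllow)) {
            $best = ['allow' => $isAllow, 'length' => $length];
        }
    }
    return $best['allow'];
}

// The folder1 example from above.
$rules = [
    ['Disallow', '/folder1/'],
    ['Allow', '/folder1/myfile.html'],
];

var_dump(isAllowed($rules, '/folder1/myfile.html'));
var_dump(isAllowed($rules, '/folder1/other.html'));
```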

If you block Googlebot and want to allow another of Google's bots (such as Googlebot-Mobile), you can allow access to that bot using the Allow rule. For instance:

User-agent: Googlebot
Disallow: /

User-agent: Googlebot-Mobile
Allow: /

#19 (12-05-2007, 06:40 AM) Kamalakannan

Blocking Googlebot

If a webmaster wishes to restrict the information on their site available to Googlebot, or another well-behaved spider, they can do so with the appropriate directives in a robots.txt file, or by adding the meta tag <META NAME="Googlebot" CONTENT="nofollow"> to the web page. Note that nofollow only tells Googlebot not to follow the links on that page; to keep the page itself out of the index, use CONTENT="noindex". Googlebot requests to web servers are discernible from the 'Googlebot' token in their user-agent string.

Regards,
R.Kamalakannan.

#20 (12-05-2007, 06:46 AM) Kamalakannan

Webmaster Tools

A problem which webmasters have often noted with the Googlebot is that it takes up an enormous amount of bandwidth. This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for mirror sites which host many gigabytes of data.
Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate.

Regards,
R.Kamalakannan.