"I am webby and I think webby" - AjiNIMC aka Aji Issac Mathew - "I thought and I wrote".

 AjiNIMC logo - Aji Issac Mathew I am Aji Issac Mathew also known as AjiNIMC at various forums. I am webby and I think webby, being a part time blogger, this blog is a documentation of my experiences and my learning.
Blog Stats (06 June 2008): There are currently 306 posts and 1100 comments (and 397,307 spam comments), contained within 17 categories.
RSS for Aji Issac Mathew's blog 
  I am into professional Web Marketing services which includes Web marketing strategies, SEO/SEM, Content Designing, Web Designing for usability, conversion improvement and various other things. There are limited availability per month. We don't take too many clients but we make sure that all our clients get their share of success. I worked on in-house sites for over 5 years, now is the time to help others with my experience. I have a great team helping me achieve this. A very creative and experienced team. Contact aji.issac (at the rate) digitalavenues.com and get your share of success.  

 Home >

Google Bot mystery

Sep
16

This was no less than a mystery for the first timers. As I mentioned in my previous post that we shifted the server as our site was consuming over 100 GB of bandwidth a month and over few 4 GB of hard disk. The growth rate was the factor which made us take this decision. It was growing in terms of GB every month if not week.

As usual after the shift you are suppose to keep a check on the spiders esp the Google bot. Last time I faced a strange problem and lost almost all the cache. This time our team who were checking the raw log file directly and with log analyzer (sawmill, awstats) told me that Google bot is not visiting our site. I took it lightly and took it as their mistake as I could see the latest cache with Google. When the team forced me to look at the raw log file I found them with no guilt, they reported the truth. I did a grep and found no trace of Google bot. It certainly worried me.

I knew that without Google visiting our site it cant create the cache, I decided to check the log creation section. I also asked prabhat to check it. I saw that the log format is common. What does that mean? I started investigating more and found few documents :-

Format for common
LogFormat "%h %l %u %t \"%r\" %>s %b" common

Here

%…h: Remote host
%…u: Remote user (from auth; may be bogus if return status (%s) is 401)
%…l: Remote logname (from identd, if supplied)
%…t: Time, in common log format time format (standard english format)
%…r: First line of request
%…s: Status. For requests that got internally redirected, this is the status of the *original* request —
%…>s for the last.
%…b: Bytes sent, excluding HTTP headers. In CLF format It was not logging the user agent which keeps a track of google bot and other user agents.
Then i decided to go for NCSA extended/combined log format
“%h %l %u %t \”%r\” %>s %b \”%{Referer}i\” \”%{User-agent}i\”"
Here \”%{Referer}i\” keeps a track of referral URLs and \”%{User-agent}i\”" of user agents.
For many of us it was no less that a mystery.

This post was written by AjiNIMC aka Web Kotler at 3:19 pm under category Tech Talks(




Share your thoughts

You are visitor number