Search Engine Crawlers, Website Logs, and SEO Research

20080408, afternoon: the site officially went live.
20080410, 22:00: created the website log folder on the server.
Baidu spider's first visit
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2008-04-10 14:42:56
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
2008-04-10 14:42:55 W3SVC490114653 61.129.14.17 GET /robots.txt - 80 - 61.135.190.55 Baiduspider+(+http://www.baidu.com/search/spider.htm) 404 0 64
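
For anyone who wants to analyze excerpts like this rather than read them by eye, here is a minimal sketch of a parser for this log format. It assumes the fields are whitespace-separated and named by the `#Fields:` directive, as in the excerpt above (IIS writes spaces inside the User-Agent as `+`, so a plain split works here). The function name `parse_w3c_log` is only illustrative.

```python
def parse_w3c_log(lines):
    """Yield one dict per request, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Remember the column names for the data rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives: #Software, #Version, #Date
        values = line.split()
        # Skip rows that do not match the declared column count.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


sample = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Version: 1.0",
    "#Date: 2008-04-10 14:42:56",
    "#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query "
    "s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status",
    "2008-04-10 14:42:55 W3SVC490114653 61.129.14.17 GET /robots.txt - 80 - "
    "61.135.190.55 Baiduspider+(+http://www.baidu.com/search/spider.htm) 404 0 64",
]

entries = list(parse_w3c_log(sample))
for e in entries:
    print(e["date"], e["time"], e["cs-uri-stem"], e["sc-status"], e["cs(User-Agent)"])
```

Keying each row by field name keeps later analysis independent of column order, which matters because the `#Fields:` line can differ between servers.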

Yahoo China's first visit
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2008-04-11 00:49:53
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
2008-04-11 00:49:53 W3SVC490114653 61.129.14.17 GET /robots.txt - 80 - 202.160.178.100 Mozilla/5.0+(compatible;+Yahoo!+Slurp+China;+http://misc.yahoo.com.cn/help.html) 404 0 2

2008-04-11 00:49:53 W3SVC490114653 61.129.14.17 GET /index.htm - 80 - 202.160.180.194 Mozilla/5.0+(compatible;+Yahoo!+Slurp+China;+http://misc.yahoo.com.cn/help.html) 200 0 0

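In other words, the first crawler to find 美丽吧减肥网 (the Meiliba weight-loss site) was probably Baidu. After requesting robots.txt once and receiving a 404, Baidu paused.
The first effective crawl came from Yahoo China. After requesting robots.txt, Yahoo China fetches the home page right away, and a little later it fetches the pages linked from the home page on a large scale.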
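
The "who arrived first" comparison can be sketched in code as below. This is only an illustration: the substring table `CRAWLERS` and the helpers `crawler_name` and `first_visits` are names made up for this example, and the rows are copied from the log excerpts in this post.

```python
from datetime import datetime

# Substrings are checked in order, so "Slurp China" must come before plain "Slurp".
CRAWLERS = [
    ("Baiduspider", "Baidu"),
    ("Yahoo!+Slurp+China", "Yahoo China"),
    ("Yahoo!+Slurp", "Yahoo"),
    ("msnbot", "MSN"),
    ("Googlebot", "Google"),
]


def crawler_name(user_agent):
    """Map a logged User-Agent string to a crawler label, or None if unknown."""
    for needle, name in CRAWLERS:
        if needle in user_agent:
            return name
    return None


def first_visits(rows):
    """rows: iterable of (date, time, uri, user_agent, status) strings.
    Returns {crawler: (timestamp, uri, status)} for each crawler's earliest request."""
    first = {}
    for date, time, uri, ua, status in rows:
        name = crawler_name(ua)
        if name is None:
            continue
        ts = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        if name not in first or ts < first[name][0]:
            first[name] = (ts, uri, status)
    return first


rows = [
    ("2008-04-11", "00:49:53", "/robots.txt",
     "Mozilla/5.0+(compatible;+Yahoo!+Slurp+China;+http://misc.yahoo.com.cn/help.html)", "404"),
    ("2008-04-10", "14:42:55", "/robots.txt",
     "Baiduspider+(+http://www.baidu.com/search/spider.htm)", "404"),
    ("2008-04-11", "01:23:32", "/index.htm",
     "Baiduspider+(+http://www.baidu.com/search/spider.htm)", "200"),
]

visits = first_visits(rows)
for name, (ts, uri, status) in sorted(visits.items(), key=lambda kv: kv[1][0]):
    print(name, ts, uri, status)
```

Matching on the User-Agent rather than the client IP is a deliberate choice here, since the same crawler shows up from several IP addresses in these logs.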

Baidu, by contrast, waits a fairly long time after fetching the home page before it visits inner pages.
Baidu's first effective visit
2008-04-11 01:23:32 W3SVC490114653 61.129.14.17 GET /index.htm - 80 - 61.135.162.212 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0

On its next visit Baidu again fetched the home page first, but it did not request robots.txt again.
2008-04-11 08:24:26 W3SVC490114653 61.129.14.17 GET /index.htm - 80 - 61.135.162.212 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0
After that, Baidu fetched somewhat more of the pages linked from the home page.

2008-04-11 08:26:01 W3SVC490114653 61.129.14.17 GET /remensousuo/RuHeJianFei/index.htm - 80 - 61.135.162.212 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0

Yahoo's first visit
2008-04-11 07:49:53 W3SVC490114653 61.129.14.17 GET /robots.txt - 80 - 74.6.22.176 Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 404 0 2

The Yahoo and Yahoo China crawlers seem to behave alike: after fetching robots.txt and the home page, they go straight on to other pages.
2008-04-11 07:49:56 W3SVC490114653 61.129.14.17 GET /index.htm - 80 - 74.6.16.185 Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 200 0 0
2008-04-11 07:52:06 W3SVC490114653 61.129.14.17 GET /robots.txt - 80 - 74.6.22.94 Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 404 0 2
2008-04-11 07:52:07 W3SVC490114653 61.129.14.17 GET /remensousuo/JianFeiYao_XiaoGuoHao/index.htm - 80 - 74.6.21.95 Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 200 0 0
2008-04-11 07:52:07 W3SVC490114653 61.129.14.17 GET /remensousuo/RuHeJianFeiZuiKuai/index.htm - 80 - 74.6.21.91 Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 200 0 0
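
The difference in style can be put into numbers by measuring how long each crawler waits between a robots.txt request and its next page fetch. The sketch below is an illustration only; `delay_after_robots` is a hypothetical helper, and the rows are the timestamps and paths from the excerpts in this post, grouped by User-Agent.

```python
from datetime import datetime


def delay_after_robots(rows):
    """rows: time-ordered (timestamp, uri) pairs for ONE crawler.
    Returns the seconds between each /robots.txt request and the next other request."""
    delays = []
    pending = None  # time of the most recent robots.txt request not yet followed up
    for stamp, uri in rows:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if uri == "/robots.txt":
            pending = ts
        elif pending is not None:
            delays.append((ts - pending).total_seconds())
            pending = None
    return delays


yahoo = [
    ("2008-04-11 07:49:53", "/robots.txt"),
    ("2008-04-11 07:49:56", "/index.htm"),
    ("2008-04-11 07:52:06", "/robots.txt"),
    ("2008-04-11 07:52:07", "/remensousuo/JianFeiYao_XiaoGuoHao/index.htm"),
    ("2008-04-11 07:52:07", "/remensousuo/RuHeJianFeiZuiKuai/index.htm"),
]
baidu = [
    ("2008-04-10 14:42:55", "/robots.txt"),
    ("2008-04-11 01:23:32", "/index.htm"),
]

yahoo_delays = delay_after_robots(yahoo)
baidu_delays = delay_after_robots(baidu)
print("Yahoo:", yahoo_delays)  # [3.0, 1.0]
print("Baidu:", baidu_delays)
```

On this small sample Yahoo follows up within seconds, while Baidu's first page fetch comes more than ten hours after its robots.txt request.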

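On the 11th we made a large number of updates to the site, and these three crawlers were very active at the same time. The most active was probably Baidu, which crawled diligently from 4 pm through the evening. (IIS writes W3C log times in UTC by default, so 4 pm Beijing time matches the 08:24 Baidu entries above.)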
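
A claim like "most active from 4 pm onward" can be checked by counting hits per crawler per hour. The following is a minimal sketch under two assumptions that are not stated in the logs themselves: that the log times are UTC (the IIS default for W3C logs) and that local time is UTC+8. The helper name `hourly_hits` and its `utc_offset_hours` parameter are illustrative, and the rows are only the few entries quoted in this post, not the full day's log.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_hits(rows, utc_offset_hours=0):
    """rows: (timestamp, crawler) pairs.
    Returns a Counter keyed by (crawler, 'YYYY-MM-DD HH') after shifting by the offset."""
    counts = Counter()
    for stamp, crawler in rows:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        ts += timedelta(hours=utc_offset_hours)  # assumed UTC -> local conversion
        counts[(crawler, ts.strftime("%Y-%m-%d %H"))] += 1
    return counts


rows = [
    ("2008-04-11 01:23:32", "Baidu"),
    ("2008-04-11 08:24:26", "Baidu"),
    ("2008-04-11 08:26:01", "Baidu"),
    ("2008-04-11 07:49:56", "Yahoo"),
    ("2008-04-11 07:52:07", "Yahoo"),
    ("2008-04-11 07:52:07", "Yahoo"),
]

local = hourly_hits(rows, utc_offset_hours=8)
for (crawler, hour), n in sorted(local.items()):
    print(crawler, hour, n)
```

With the +8 offset, the two Baidu requests logged at 08:24 and 08:26 land in the 16:00 bucket, which is consistent with the 4 pm start described above.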

20080412

MSN spider's first visit
2008-04-12 06:51:17 W3SVC490114653 61.129.14.17 GET /robots.txt - 80 - 65.55.212.27 msnbot-media/1.0+(+http://search.msn.com/msnbot.htm) 404 0 2
2008-04-12 06:51:19 W3SVC490114653 61.129.14.17 GET /index.htm - 80 - 65.55.212.27 msnbot-media/1.0+(+http://search.msn.com/msnbot.htm) 200 0 0

Google finally makes its grand entrance
2008-04-12 09:05:21 W3SVC490114653 61.129.14.17 GET /robots.txt - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 404 0 2
2008-04-12 09:05:23 W3SVC490114653 61.129.14.17 GET /index.htm - 80 - 203.208.60.24 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0

About two hours later it started crawling pages

2008-04-12 11:06:54 W3SVC490114653 61.129.14.17 GET /shoushenjigou/index.htm - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0
2008-04-12 11:17:07 W3SVC490114653 61.129.14.17 GET /page/ - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 403 14 5
2008-04-12 11:18:56 W3SVC490114653 61.129.14.17 GET /jianfeicha/index.htm - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0
2008-04-12 11:21:27 W3SVC490114653 61.129.14.17 GET /shoushenFAQ/index.htm - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0
2008-04-12 11:24:31 W3SVC490114653 61.129.14.17 GET /yixuejianfei/index.htm - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0

2008-04-12 11:45:04 W3SVC490114653 61.129.14.17 GET /toutiaozixun/index.htm - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0

2008-04-12 12:02:10 W3SVC490114653 61.129.14.17 GET /guanggaozixun/index.htm - 80 - 203.208.60.12 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0
During this period the various bots visited in turn, interleaved with one another.

2008-04-12 12:18:05 W3SVC490114653 61.129.14.17 GET /shoushenretie/index.htm - 80 - 66.249.67.5 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 0 0

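Googlebot generally fetches a few pages, leaves, and comes back after an interval; it does not crawl too frequently. On its first day Google crawled close to 90 pages.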
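
The "a few pages, then a pause" pattern can be summarized by counting distinct pages fetched successfully and the gaps between consecutive requests. This is a rough sketch only: `crawl_summary` is a hypothetical helper, and the rows are just the eight Googlebot requests quoted above, not the roughly 90 pages of the full first day.

```python
from datetime import datetime
from statistics import median


def crawl_summary(rows):
    """rows: time-ordered (timestamp, uri, status) for one crawler.
    Returns (number of distinct pages served with 200, gaps in seconds between requests)."""
    pages = {uri for _, uri, status in rows if status == "200"}
    times = [datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S") for stamp, _, _ in rows]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return len(pages), gaps


googlebot = [
    ("2008-04-12 11:06:54", "/shoushenjigou/index.htm", "200"),
    ("2008-04-12 11:17:07", "/page/", "403"),
    ("2008-04-12 11:18:56", "/jianfeicha/index.htm", "200"),
    ("2008-04-12 11:21:27", "/shoushenFAQ/index.htm", "200"),
    ("2008-04-12 11:24:31", "/yixuejianfei/index.htm", "200"),
    ("2008-04-12 11:45:04", "/toutiaozixun/index.htm", "200"),
    ("2008-04-12 12:02:10", "/guanggaozixun/index.htm", "200"),
    ("2008-04-12 12:18:05", "/shoushenretie/index.htm", "200"),
]

page_count, gaps = crawl_summary(googlebot)
print("distinct pages fetched with 200:", page_count)
print("median gap between requests (s):", median(gaps))
```

The median is used rather than the mean so that a single long pause does not dominate the summary.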

As of 03:30 on April 13, 2008, none of the major search engines had indexed the site.

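I had planned to build a few backlinks to get the site indexed, but seeing how often the crawlers were already visiting, I decided not to. Instead I will watch how long the crawlers have to keep crawling before the site gets indexed naturally.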

http://www.suixin.net/blog/sem/googlebot.html

Published by 随心 at 2008-4-13. [sem]
