Detecting Crawlers and Spiders in PHP

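We can tell whether a visitor is a spider by checking HTTP_USER_AGENT, since each search engine's spider identifies itself with its own distinctive string. The code below is adapted from sources found online and recorded here for reference.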

Code 1: is_crawler

Source: https://gist.github.com/zhangguiqiang/2859126

// Detect crawlers/spiders. Source: https://gist.github.com/zhangguiqiang/2859126
if ( !function_exists( 'isCrawler' ) ) {
    function isCrawler() {
        if (ini_get('browscap')) {
            // browscap is configured: let get_browser() classify the current User-Agent.
            $browser = get_browser(null, true);
            // get_browser() can return false, and the 'crawler' key may be missing.
            if (!empty($browser['crawler'])) {
                return true;
            }
        } else if (isset($_SERVER['HTTP_USER_AGENT'])) {
            $agent = $_SERVER['HTTP_USER_AGENT'];
            // Patterns without the /i flag are matched case-sensitively.
            $crawlers = array(
                "/spider/",
                "/bot/",
                "/crawl/",
                "/Googlebot/",
                "/Google/",
                "/baidu/",
                "/blogsearch/",
                "/ia_archive/",
                "/Slurp/",
                "/Yandex/",
                "/Yeti/",
                "/msnbot/",
                "/Mediapartners-Google/",
                "/Scooter/",
                "/Yahoo-MMCrawler/",
                "/FAST-WebCrawler/",
                "/Yahoo! Slurp/",
                "/FAST Enterprise Crawler/",
                "/grub-client-/",
                "/MSIECrawler/",
                "/NPBot/",
                "/NameProtect/i",
                "/ZyBorg/i",
                "/worio bot heritrix/i",
                "/Ask Jeeves/",
                "/libwww-perl/i",
                "/Gigabot/i",
                "/bot@bot.bot/i",
                "/SeznamBot/i"
            );
            // Return true as soon as any pattern matches the User-Agent.
            foreach ($crawlers as $c) {
                if (preg_match($c, $agent)) {
                    return true;
                }
            }
        }
        return false;
    }
}
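
To try the pattern-list idea without a live request, the sketch below passes the User-Agent in as an argument instead of reading $_SERVER. The helper name matchesCrawlerPattern and the two sample User-Agent strings are assumptions made for illustration; they are not part of the original gist, and only a short subset of the patterns is used.

```php
<?php
// Hypothetical helper (not from the gist): same regex-list check as above,
// but the User-Agent is a parameter so it can be tested in isolation.
function matchesCrawlerPattern(string $agent, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $agent) === 1) {
            return true;
        }
    }
    return false;
}

// A short subset of the patterns listed above.
$patterns = array('/spider/', '/bot/', '/crawl/', '/Slurp/', '/Yandex/');

// Sample User-Agent strings, assumed here purely for illustration.
$googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$chrome = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

var_dump(matchesCrawlerPattern($googlebot, $patterns)); // bool(true)
var_dump(matchesCrawlerPattern($chrome, $patterns));    // bool(false)
```

Because most of these patterns carry no /i flag, a User-Agent that spells "Bot" with a capital B is only caught if another pattern in the list happens to match it.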

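Code 2: extracted from the WP-PostViews plugin

Extracted from the WP-PostViews plugin; it is much the same as the code above. I happened to notice the plugin contains similar code, so I copied it here as a note.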

if ( !function_exists( 'isCrawler' ) ) {
    // Extracted from the WP-PostViews plugin https://wordpress.org/plugins/wp-postviews
    function isCrawler() {
        // Label => substring to look for in the User-Agent (case-insensitive).
        $bots = array(
            'Google Bot' => 'google',
            'MSN' => 'msnbot',
            'Alex' => 'ia_archiver',
            'Lycos' => 'lycos',
            'Ask Jeeves' => 'jeeves',
            'Altavista' => 'scooter',
            'AllTheWeb' => 'fast-webcrawler',
            'Inktomi' => 'slurp@inktomi',
            'Turnitin.com' => 'turnitinbot',
            'Technorati' => 'technorati',
            'Yahoo' => 'yahoo',
            'Findexa' => 'findexa',
            'NextLinks' => 'findlinks',
            'Gais' => 'gaisbo',
            'WiseNut' => 'zyborg',
            'WhoisSource' => 'surveybot',
            'Bloglines' => 'bloglines',
            'BlogSearch' => 'blogsearch',
            'PubSub' => 'pubsub',
            'Syndic8' => 'syndic8',
            'RadioUserland' => 'userland',
            'Gigabot' => 'gigabot',
            'Become.com' => 'become.com',
            'Baidu' => 'baiduspider',
            'so.com' => '360spider',
            'Sogou' => 'spider',
            'soso.com' => 'sosospider',
            'Yandex' => 'yandex'
        );
        $useragent = isset( $_SERVER['HTTP_USER_AGENT'] ) ? $_SERVER['HTTP_USER_AGENT'] : '';
        // Only the substrings are used here; the labels serve as documentation.
        foreach ( $bots as $name => $lookfor ) {
            if ( ! empty( $useragent ) && ( false !== stripos( $useragent, $lookfor ) ) ) {
                return true;
            }
        }
        return false;
    }
}
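
The labels in the array above are never used by isCrawler(). As a minimal sketch, assuming PHP 7.1 or later for the nullable return type, the variant below returns the label of the first matching entry instead of a boolean. The function name findCrawlerName and the sample User-Agent are hypothetical and do not come from WP-PostViews.

```php
<?php
// Hypothetical variant (not from WP-PostViews): returns the label of the
// first bot whose substring occurs in the User-Agent, or null if none does.
function findCrawlerName(string $useragent, array $bots): ?string
{
    if ($useragent === '') {
        return null;
    }
    foreach ($bots as $name => $lookfor) {
        // stripos() is case-insensitive and returns false when not found.
        if (stripos($useragent, $lookfor) !== false) {
            return $name;
        }
    }
    return null;
}

// A short subset of the list above, in the same relative order.
$bots = array(
    'Google Bot' => 'google',
    'Baidu' => 'baiduspider',
    'Sogou' => 'spider',
);

// Sample User-Agent string, assumed here purely for illustration.
$ua = 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)';

var_dump(findCrawlerName($ua, $bots)); // string(5) "Baidu"
var_dump(findCrawlerName('', $bots));  // NULL
```

Order matters in this variant: the generic 'spider' substring listed under Sogou also occurs in the Baidu User-Agent, so the more specific entries need to come first for the label to be meaningful.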
Unless otherwise noted, articles on 常阳时光 are original. This article's address is https://cyhour.com/875/; reposts must credit the original source with a link.