Accidentally Blocking Feedly


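Yesterday @老灵 told me on QQ that Feedly could no longer fetch Lao Yang's blog. I was puzzled at first, but then realized that when I was tinkering a while back and blocked some "junk" spiders, Feedly probably got caught as collateral damage.

I dug up the code I had added earlier and, sure enough, Feedly and FeedDemon had both been blocked. Oops. Below is the code I am using now; drop it into the functions.php file of your WordPress theme. (Tested on PHP 7.3; lower versions have not been tested.)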

if ( ! is_admin() ) {
    add_action( 'init', 'deny_mirrored_request', 0 );
}

function deny_mirrored_request() {
    // Get the User-Agent string.
    $ua = isset( $_SERVER['HTTP_USER_AGENT'] ) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // Unwanted User-Agent substrings.
    $now_ua = array( 'BOT/0.1 (BOT for JCE)', 'CrawlDaddy', 'Java', 'UniversalFeedParser', 'ApacheBench', 'Swiftbot', 'ZmEu', 'Indy Library', 'oBot', 'jaunty', 'YandexBot', 'AhrefsBot', 'MJ12bot', 'WinHttp', 'EasouSpider', 'HttpClient', 'Microsoft URL Control', 'Python-urllib', 'lightDeckReports Bot' );

    // Block empty User-Agents: mainstream scrapers such as dedecms send an empty UA,
    // and so do some SQL injection tools. Also block any UA containing "PHP".
    if ( empty( $ua ) || preg_match( '/PHP/i', $ua ) ) {
        header( 'Content-type: text/html; charset=utf-8' );
        // The message tells the client, roughly, not to scrape this site.
        wp_die( '请勿采集本站,因为采集的站长木有小JJ!' );
    } else {
        foreach ( $now_ua as $value ) {
            // Check whether the UA contains this entry. preg_quote() keeps
            // characters such as "(" and "." literal instead of regex syntax.
            if ( preg_match( '~' . preg_quote( $value, '~' ) . '~i', $ua ) ) {
                header( 'Content-type: text/html; charset=utf-8' );
                wp_die( '请勿采集本站,因为采集的站长木有小JJ!' );
            }
        }
    }
}
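
One way to lower the risk of this kind of collateral damage is to check an allowlist before the blocklist. The following is only a minimal sketch, not the code running on this site: the function name `ua_is_blocked`, the `$allow` list, and the shortened `$deny` list are assumptions made for illustration. It uses `stripos()` for plain case-insensitive substring matching, so regex metacharacters in an entry (such as the parentheses in 'BOT/0.1 (BOT for JCE)') are treated literally.

```php
<?php
/**
 * Hypothetical helper (not part of the original snippet): decide whether a
 * User-Agent string should be blocked.
 *
 * @param string $ua    The User-Agent string ('' when the header is missing).
 * @param array  $deny  Substrings that mark a UA as unwanted.
 * @param array  $allow Substrings that always let a UA through.
 * @return bool True when the request should be blocked.
 */
function ua_is_blocked($ua, array $deny, array $allow)
{
    // Empty UAs are always blocked, as in the snippet above.
    if ($ua === '') {
        return true;
    }

    // Allowlisted feed readers win over any blocklist entry.
    foreach ($allow as $needle) {
        if (stripos($ua, $needle) !== false) {
            return false;
        }
    }

    // Case-insensitive substring match against the blocklist.
    foreach ($deny as $needle) {
        if (stripos($ua, $needle) !== false) {
            return true;
        }
    }

    return false;
}

// Assumed example lists; adjust them to your own site.
$deny  = array('PHP', 'UniversalFeedParser', 'Python-urllib', 'AhrefsBot');
$allow = array('Feedly', 'FeedDemon');
```

Inside the `init` hook, the call could look like `if ( ua_is_blocked( $ua, $now_ua, $allow ) ) { wp_die( ... ); }`. Checking the allowlist first means a later, overly broad blocklist entry cannot silently knock out a feed reader you depend on.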

You can simulate requests with curl. For example, curl -I -A '' https://cyhour.com simulates a visit with an empty UA:

[root@host ~]# curl -I -A '' https://cyhour.com
HTTP/1.1 500 Internal Server Error
Server: nginx
Date: Tue, 30 Jul 2019 01:50:57 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0

[root@host ~]# curl -I -A 'php' https://cyhour.com
HTTP/1.1 500 Internal Server Error
Server: nginx
Date: Tue, 30 Jul 2019 01:51:07 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0

[root@host ~]# curl -I -A 'Googlebot' https://cyhour.com
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 30 Jul 2019 01:55:26 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
Link: <https://cyhour.com/wp-json/>; rel="https://api.w.org/"
Strict-Transport-Security: max-age=15768000
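
The same checks can be scripted so they are easy to rerun after each change to the blocklist. This is a rough sketch, assuming PHP's http stream wrapper is usable (allow_url_fopen enabled); `status_code_from_line()` and `probe_ua()` are hypothetical helper names, and the Feedly UA string in the list is an assumed example that should be checked against your own access logs.

```php
<?php
// Hypothetical helper: extract the numeric status code from a status line
// such as "HTTP/1.1 500 Internal Server Error". Returns null if the line
// does not look like an HTTP status line.
function status_code_from_line($line)
{
    if (preg_match('~^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b~', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: send a HEAD request with the given User-Agent and
// return the status code of the first response, or null on failure.
function probe_ua($url, $ua)
{
    $context = stream_context_create(array(
        'http' => array(
            'method'     => 'HEAD',
            'user_agent' => $ua,
            'timeout'    => 10,
        ),
    ));
    $headers = @get_headers($url, 0, $context);
    return $headers ? status_code_from_line($headers[0]) : null;
}

// Only touch the network when a URL is passed on the command line,
// e.g. `php probe.php https://cyhour.com` (the file name is arbitrary).
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $agents = array(
        'php',
        'Googlebot',
        // Assumed Feedly UA; verify against your access logs.
        'Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)',
    );
    foreach ($agents as $ua) {
        $code = probe_ua($argv[1], $ua);
        echo str_pad($ua, 30), ' => ', $code === null ? 'no response' : $code, PHP_EOL;
    }
}
```

With the blocking code above in place, blocked agents should come back with 500 (the default status of `wp_die()`), and anything you expect to get through, Feedly included, should come back with 200.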

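References: 张戈博客 (Zhang Ge's blog) - https://zhang.ge/5101.html and https://zhang.ge/4458.html

Unless otherwise noted, all articles on 常阳时光 are original. This article's address is https://cyhour.com/1099/; reprints must credit the original source with a link.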

Comments:2

  1. Many sites block these too, supposedly to prevent scraping.

    2019.07.31 14:29
