Python + Scrapy Proxy Integration: Xun Proxy (訊代理)

The previous article covered integrating the aggregated proxy (聚合代理); now we move on to Xun proxy. The middleware code is as follows:

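<code>import json

import requests


class MaoyanXunProxyMiddleware(object):
    """
    Xun proxy (訊代理): http://www.xdaili.cn/
    Note: each API call requests 10 IPs.
    """

    # ============== Initialise the proxy state ============
    def __init__(self):
        # Proxy API
        self.get_url = "http://daili.spbeen.com/get_api_json/?token=4nZufMcvqklMfwNjmiIXSseJ&num=10"
        # Test URL used to verify a proxy
        self.teep_url = "https://www.baidu.com/"
        # Proxy IP pool
        self.ip_list = []
        # Number of proxy IPs fetched per API call
        self.num = 10  # change this to fetch a different number of IPs
        # 1-based position in the pool of the IP currently in use
        self.count = 0
        # Number of times the current IP has been used
        self.evecount = 0

    # ============== Fetch proxy IPs ============
    def getIPData(self):
        teep_data = requests.get(url=self.get_url).text
        self.ip_list.clear()
        for eve_ip in json.loads(teep_data)["RESULT"]:
            self.ip_list.append(
                # Store the address string, not the whole entry
                {"ip": eve_ip["ip"], "port": eve_ip["port"]}
            )

    # ============ Set the proxy on the request ===========
    def changeProxy(self, request):
        ip = self.ip_list[self.count - 1]["ip"]
        port = self.ip_list[self.count - 1]["port"]
        request.meta["proxy"] = "http://" + str(ip) + ":" + str(port)

    # ============== Verify the proxy IP ============
    def verification(self):
        ip = self.ip_list[self.count - 1]["ip"]
        port = self.ip_list[self.count - 1]["port"]
        proxy = "http://" + str(ip) + ":" + str(port)

        # Check that the proxy is usable, with a 5-second timeout.
        # The test URL is HTTPS, so the proxy is registered for both schemes;
        # with only an "http" entry the check would bypass the proxy entirely.
        requests.get(url=self.teep_url, proxies={"http": proxy, "https": proxy}, timeout=5)

    # ============== Switch the proxy IP ============
    def ifUsed(self, request):
        # Handle the case where the proxy IP is unusable
        try:
            self.changeProxy(request)
            self.verification()
        except Exception:
            if self.count == 0 or self.count >= self.num:
                # Pool exhausted: fetch a new batch and start from its first IP
                self.getIPData()
                self.count = 1
            else:
                self.count = self.count + 1
            # Retry with the next IP (recurses until one passes verification)
            self.ifUsed(request)

    # Scrapy calls this hook as process_request(request, spider)
    def process_request(self, request, spider):
        if self.count == 0 or self.count == self.num:
            self.getIPData()  # fetch proxy IPs
            self.count = 1

        # Check how many times the current proxy IP has been used
        if self.evecount == 3:  # usage threshold per proxy IP
            self.count = self.count + 1
            self.evecount = 0
        else:
            self.evecount = self.evecount + 1
        self.ifUsed(request)  # switch proxy IP if needed
</code>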
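The rotation idea in the middleware (use one IP a few times, move to the next, refetch when the pool runs out) can be tried without Scrapy or a live API. The following is a simplified sketch, not the author's code: `parse_pool`, `ProxyRotator`, and the `fetch` hook are hypothetical names, the sample addresses are made up, and the JSON shape `{"RESULT": [{"ip": ..., "port": ...}]}` is assumed from the way the middleware parses the response.

```python
import json


def parse_pool(payload_text):
    """Turn the API's JSON text into a list of proxy URLs.

    Assumes the shape {"RESULT": [{"ip": ..., "port": ...}, ...]},
    which is what the middleware's parsing code implies.
    """
    entries = json.loads(payload_text)["RESULT"]
    return ["http://{}:{}".format(e["ip"], e["port"]) for e in entries]


class ProxyRotator:
    """Hand out each proxy a fixed number of times, then move on."""

    def __init__(self, fetch, uses_per_ip=3):
        self.fetch = fetch              # callable returning JSON text (hypothetical hook)
        self.uses_per_ip = uses_per_ip  # quota per proxy before switching
        self.pool = []
        self.index = 0                  # position of the current proxy in the pool
        self.used = 0                   # times the current proxy has been handed out

    def next_proxy(self):
        # Move on once the current proxy has reached its quota.
        if self.used >= self.uses_per_ip:
            self.index += 1
            self.used = 0
        # Refill when the pool is empty or exhausted.
        if self.index >= len(self.pool):
            self.pool = parse_pool(self.fetch())
            self.index = 0
            self.used = 0
            if not self.pool:
                raise RuntimeError("proxy API returned no IPs")
        self.used += 1
        return self.pool[self.index]


# Usage with a canned payload instead of a real API call.
SAMPLE = json.dumps({"RESULT": [
    {"ip": "10.0.0.1", "port": "8080"},
    {"ip": "10.0.0.2", "port": "8081"},
]})
rotator = ProxyRotator(fetch=lambda: SAMPLE, uses_per_ip=2)
sequence = [rotator.next_proxy() for _ in range(5)]
print(sequence)
```

Injecting `fetch` keeps the network call out of the rotation logic, so the switching behaviour can be checked on its own. Unlike the middleware, this sketch does not verify each proxy before handing it out.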

Next comes the configuration in the project's settings file:

<code>DOWNLOADER_MIDDLEWARES = {
    # 'maoyan.middlewares.MaoyanDownloaderMiddleware': 543,
    'maoyan.middlewares.MaoyanXunProxyMiddleware': 543,
}
</code>
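The number 543 is a priority: Scrapy merges the project setting with its built-in `DOWNLOADER_MIDDLEWARES_BASE` and calls `process_request` in increasing order. The sketch below only illustrates that ordering rule; `process_request_order` is a hypothetical helper rather than a Scrapy API, and the single built-in entry shown (`HttpProxyMiddleware` at 750) is taken from Scrapy's documented defaults.

```python
# Priority of Scrapy's built-in proxy middleware, as listed in
# DOWNLOADER_MIDDLEWARES_BASE (only the relevant entry is reproduced here).
BUILTIN = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}

# The project setting from this article.
PROJECT = {
    "maoyan.middlewares.MaoyanXunProxyMiddleware": 543,
}


def process_request_order(builtin, project):
    """Return middleware paths in the order their process_request runs.

    Project entries override built-in ones, a value of None disables
    a middleware, and lower numbers run first.
    """
    merged = dict(builtin)
    merged.update(project)
    enabled = {path: prio for path, prio in merged.items() if prio is not None}
    return sorted(enabled, key=enabled.get)


order = process_request_order(BUILTIN, PROJECT)
print(order)
```

At 543 the custom middleware sets `request.meta["proxy"]` before the built-in `HttpProxyMiddleware` sees the request.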

Finally, with the proxy hooked up you can carry on crawling data. Pretty satisfying, isn't it?

