Python3 Crawler Anti-Anti-Scraping: Cracking the Encrypted antitoken Parameter on Tongcheng Travel (同程旅遊)

I. Introduction

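Among the anti-scraping measures websites use today, JavaScript encryption is one of the most common: the site encrypts a request parameter, such as token or sign, in JavaScript. The site in this example does exactly that, encrypting a parameter named antitoken, and this post walks through how to deal with it.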

II. Site Analysis

Once the page has finished loading, open the developer tools, switch to the XHR tab, and find the following request:

[Screenshot: the request in the XHR tab]

Notice that the parameters include antitoken, which is an encrypted string. So how do we obtain this encrypted antitoken parameter?

[Screenshot: the request parameters, including antitoken]


III. Cracking Steps

1. Locating the encryption method

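Run a global search for antitoken in the developer tools; it turns up a JS file named list-newest.js. Switch to the Sources panel, open that file, and click the "{}" button in the lower-left corner to pretty-print it so it is easier to read, as shown here:

[Screenshot: list-newest.js pretty-printed in the Sources panel]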

Searching for antitoken inside this JS file leads to the method that produces it:

<code>e.getantitoken = function() {
    var t = $.cookie("wangba");
    t && void 0 !== t || (t = (new Date).getTime().toString(),
    $.cookie("wangba", t, {
        path: "/",
        domain: "ly.com"
    }));
    return (0,
    r["default"])(t)
};
</code>

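The method first reads a Cookie field named wangba (wangba? As in 網吧, "internet cafe"? Who knows). If wangba is empty, a new value is created and written back to the Cookie, and that value is simply a 13-digit millisecond timestamp.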

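<code>var t = $.cookie("wangba");
// If the cookie is missing, use the current time in milliseconds and store it
t && void 0 !== t || (t = (new Date).getTime().toString(),
$.cookie("wangba", t, { path: "/", domain: "ly.com" }));
</code>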
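For readers reproducing this step outside the browser, the fallback logic can be sketched in Python. This is a minimal sketch rather than the site's code: the helper name `get_wangba` and the plain dict standing in for a cookie jar are assumptions made for illustration.

```python
import time


def get_wangba(cookies: dict) -> str:
    """Return the wangba cookie value, creating a millisecond timestamp if absent."""
    value = cookies.get("wangba")
    if not value:
        # Same idea as (new Date).getTime().toString(): milliseconds since the epoch.
        value = str(int(time.time() * 1000))
        cookies["wangba"] = value
    return value


cookies = {}
print(get_wangba(cookies))                       # newly created value
print(get_wangba(cookies) == cookies["wangba"])  # the stored value is reused
```

An existing value is returned unchanged, so repeated calls with the same dict always yield the same string.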

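Set a breakpoint on the return line, refresh the page to start debugging, and step into the function that return calls, as shown here:

[Screenshot: the function reached from the return statement]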

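To find out how antitoken is generated, we need to know what each of the variables this function uses, such as n, i, and o, refers to, so we have to keep setting breakpoints and debugging.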

First, n. The code shows n = a(30); with a breakpoint in place, the code behind n turns out to be:

<code>n = {
    rotl: function(t, e) {
        return t << e | t >>> 32 - e
    },
    rotr: function(t, e) {
        return t << 32 - e | t >>> e
    },
    endian: function(t) {
        if (t.constructor == Number)
            return 16711935 & n.rotl(t, 8) | 4278255360 & n.rotl(t, 24);
        for (var e = 0; e < t.length; e++)
            t[e] = n.endian(t[e]);
        return t
    },
    randomBytes: function(t) {
        for (var e = []; t > 0; t--)
            e.push(Math.floor(256 * Math.random()));
        return e
    },
    bytesToWords: function(t) {
        for (var e = [], a = 0, n = 0; a < t.length; a++,
        n += 8)
            e[n >>> 5] |= t[a] << 24 - n % 32;
        return e
    },
    wordsToBytes: function(t) {
        for (var e = [], a = 0; a < 32 * t.length; a += 8)
            e.push(t[a >>> 5] >>> 24 - a % 32 & 255);
        return e
    },
    bytesToHex: function(t) {
        for (var e = [], a = 0; a < t.length; a++)
            e.push((t[a] >>> 4).toString(16)),
            e.push((15 & t[a]).toString(16));
        return e.join("")
    },
    hexToBytes: function(t) {
        for (var e = [], a = 0; a < t.length; a += 2)
            e.push(parseInt(t.substr(a, 2), 16));
        return e
    },
    // In the two Base64 helpers, a is the Base64 alphabet string
    // defined in the enclosing module scope.
    bytesToBase64: function(t) {
        for (var e = [], n = 0; n < t.length; n += 3)
            for (var i = t[n] << 16 | t[n + 1] << 8 | t[n + 2], r = 0; r < 4; r++)
                8 * n + 6 * r <= 8 * t.length ? e.push(a.charAt(i >>> 6 * (3 - r) & 63)) : e.push("=");
        return e.join("")
    },
    base64ToBytes: function(t) {
        t = t.replace(/[^A-Z0-9+\/]/gi, "");
        for (var e = [], n = 0, i = 0; n < t.length; i = ++n % 4)
            0 != i && e.push((a.indexOf(t.charAt(n - 1)) & Math.pow(2, -2 * i + 8) - 1) << 2 * i | a.indexOf(t.charAt(n)) >>> 6 - 2 * i);
        return e
    }
};
</code>
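To make the bit manipulation in n easier to follow, here is a rough Python rendering of four of its helpers. It is a sketch for illustration only: the snake_case names are my own, and the explicit 32-bit mask is needed because Python integers are unbounded, whereas JavaScript bitwise operators work on 32-bit values (and can yield negative numbers where this sketch yields unsigned ones).

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def endian(x: int) -> int:
    """Swap the byte order of a 32-bit value, mirroring n.endian for numbers."""
    # 16711935 == 0x00FF00FF and 4278255360 == 0xFF00FF00 in the JavaScript.
    return (0x00FF00FF & rotl(x, 8)) | (0xFF00FF00 & rotl(x, 24))


def bytes_to_words(data: bytes) -> list:
    """Pack bytes into big-endian 32-bit words, like n.bytesToWords."""
    words = [0] * ((len(data) + 3) // 4)
    for i, b in enumerate(data):
        words[i // 4] |= b << (24 - 8 * (i % 4))
    return words


def words_to_bytes(words: list) -> bytes:
    """Unpack 32-bit words into bytes, like n.wordsToBytes."""
    return b"".join((w & MASK32).to_bytes(4, "big") for w in words)


print(hex(endian(0x12345678)))             # 0x78563412
print(bytes_to_words(b"abcd"))             # [1633837924]
print(words_to_bytes([0x61626364]).hex())  # 61626364
```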

Next, i. The code shows i = a(12).utf8; with a breakpoint in place, the code behind i turns out to be:

<code>{
    stringToBytes: function(t) {
        return a.bin.stringToBytes(unescape(encodeURIComponent(t)))
    },
    bytesToString: function(t) {
        return decodeURIComponent(escape(a.bin.bytesToString(t)))
    }
}
</code>

Next, s. The code shows s = a(12).bin; with a breakpoint in place, the code behind s turns out to be:

<code>{
    stringToBytes: function (t) {
        for (var e = [], a = 0; a < t.length; a++)
            e.push(255 & t.charCodeAt(a));
        return e
    },
    bytesToString: function (t) {
        for (var e = [], a = 0; a < t.length; a++)
            e.push(String.fromCharCode(t[a]));
        return e.join("")
    }
}
</code>

Both objects come from a(12), so we can define a single a12 object and take the methods we need from it.

<code>var a12 = {
    utf8: {
        stringToBytes: function (e) {
            return a12.bin.stringToBytes(unescape(encodeURIComponent(e)))
        },
        bytesToString: function (e) {
            // Refer to a12 here; the module object a does not exist outside the site's bundle
            return decodeURIComponent(escape(a12.bin.bytesToString(e)))
        }
    },
    bin: {
        stringToBytes: function (e) {
            for (var t = [], a = 0; a < e.length; a++)
                t.push(255 & e.charCodeAt(a));
            return t
        },
        bytesToString: function (e) {
            for (var t = [], a = 0; a < e.length; a++)
                t.push(String.fromCharCode(e[a]));
            return t.join("")
        }
    }
};
</code>
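The difference between the two stringToBytes variables is easy to see in Python. The sketch below is an approximation under stated assumptions: the function names are hypothetical, and Python's `ord` works on code points while JavaScript's `charCodeAt` works on UTF-16 code units, so the two only agree for characters in the Basic Multilingual Plane. For a digits-only timestamp both helpers return the same ASCII codes.

```python
def utf8_string_to_bytes(text: str) -> list:
    """Like a12.utf8.stringToBytes: UTF-8 encode, one int per byte."""
    return list(text.encode("utf-8"))


def bin_string_to_bytes(text: str) -> list:
    """Like a12.bin.stringToBytes: keep only the low 8 bits of each character code."""
    return [ord(ch) & 0xFF for ch in text]


print(utf8_string_to_bytes("15"))      # [49, 53]
print(bin_string_to_bytes("15"))       # [49, 53]
print(utf8_string_to_bytes("\u00e9"))  # [195, 169]
print(bin_string_to_bytes("\u00e9"))   # [233]
```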

That leaves one last variable, o. Debugging with breakpoints leads to the following code:

[Screenshot: the code that o refers to]

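So assigning null to o is enough: o is only called when the input is not a string, and the input here is always a timestamp string. At this point we have every variable the encryption method depends on; what remains is to implement the encryption itself and produce antitoken.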

2. Implementing the encryption method

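One more detail is needed to implement the encryption: the function is called with two arguments, a 13-digit timestamp and an empty value. Debugging confirms this, as shown here:

[Screenshot: the two arguments passed to the encryption function]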
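The second argument only controls how the input is encoded and how the digest is returned: the return statement of the assembled function checks `asBytes` and `asString` on it, and falls back to a hex string when it is empty. A minimal Python sketch of that selection follows. It assumes the digest is standard MD5, which the constants in the JavaScript suggest but which should be confirmed against the browser; the function name `digest` and the snake_case option keys are hypothetical.

```python
import hashlib
from typing import Optional


def digest(message: str, options: Optional[dict] = None):
    """Return the MD5 digest as hex (default), a list of byte values, or a raw string."""
    raw = hashlib.md5(message.encode("utf-8")).digest()
    if options and options.get("as_bytes"):
        return list(raw)            # like returning the byte array directly
    if options and options.get("as_string"):
        return raw.decode("latin-1")  # one character per byte, like bin.bytesToString
    return raw.hex()                # like bytesToHex, the path taken when options is empty


print(digest("abc"))                               # 900150983cd24fb0d6963f7d28e17f72
print(len(digest("abc", {"as_bytes": True})))      # 16
```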

Putting the variables and methods above together gives the following JavaScript code:

<code>// Define antitoken; e is the 13-digit timestamp string
function antitoken(e) {
    var a12 = {
        utf8: {
            stringToBytes: function (e) {
                return a12.bin.stringToBytes(unescape(encodeURIComponent(e)))
            },
            bytesToString: function (e) {
                return decodeURIComponent(escape(a12.bin.bytesToString(e)))
            }
        },
        bin: {
            stringToBytes: function (e) {
                for (var t = [], a = 0; a < e.length; a++)
                    t.push(255 & e.charCodeAt(a));
                return t
            },
            bytesToString: function (e) {
                for (var t = [], a = 0; a < e.length; a++)
                    t.push(String.fromCharCode(e[a]));
                return t.join("")
            }
        }
    };
    // The second argument passed by the site is empty
    var t = null;
    var n, i, o, s, r;
    n = {
        rotl: function (e, t) {
            return e << t | e >>> 32 - t
        },
        rotr: function (e, t) {
            return e << 32 - t | e >>> t
        },
        endian: function (e) {
            if (e.constructor == Number)
                return 16711935 & n.rotl(e, 8) | 4278255360 & n.rotl(e, 24);
            for (var t = 0; t < e.length; t++)
                e[t] = n.endian(e[t]);
            return e
        },
        randomBytes: function (e) {
            for (var t = []; e > 0; e--)
                t.push(Math.floor(256 * Math.random()));
            return t
        },
        bytesToWords: function (e) {
            for (var t = [], a = 0, n = 0; a < e.length; a++,
            n += 8)
                t[n >>> 5] |= e[a] << 24 - n % 32;
            return t
        },
        wordsToBytes: function (e) {
            for (var t = [], a = 0; a < 32 * e.length; a += 8)
                t.push(e[a >>> 5] >>> 24 - a % 32 & 255);
            return t
        },
        bytesToHex: function (e) {
            for (var t = [], a = 0; a < e.length; a++)
                t.push((e[a] >>> 4).toString(16)),
                t.push((15 & e[a]).toString(16));
            return t.join("")
        },
        hexToBytes: function (e) {
            for (var t = [], a = 0; a < e.length; a += 2)
                t.push(parseInt(e.substr(a, 2), 16));
            return t
        },
        // The two Base64 helpers are not used for antitoken. In the original
        // module, a is the Base64 alphabet string from the enclosing scope.
        bytesToBase64: function (e) {
            for (var t = [], n = 0; n < e.length; n += 3)
                for (var i = e[n] << 16 | e[n + 1] << 8 | e[n + 2], o = 0; o < 4; o++)
                    8 * n + 6 * o <= 8 * e.length ? t.push(a.charAt(i >>> 6 * (3 - o) & 63)) : t.push("=");
            return t.join("")
        },
        base64ToBytes: function (e) {
            e = e.replace(/[^A-Z0-9+\/]/gi, "");
            for (var t = [], n = 0, i = 0; n < e.length; i = ++n % 4)
                0 != i && t.push((a.indexOf(e.charAt(n - 1)) & Math.pow(2, -2 * i + 8) - 1) << 2 * i | a.indexOf(e.charAt(n)) >>> 6 - 2 * i);
            return t
        }
    },
    i = a12.utf8,
    // o is only called for non-string input, so null is enough here
    o = null,
    s = a12.bin,
    (r = function (e, t) {
        e.constructor == String ? e = t && "binary" === t.encoding ? s.stringToBytes(e) : i.stringToBytes(e) : o(e) ? e = Array.prototype.slice.call(e, 0) : Array.isArray(e) || (e = e.toString());
        for (var a = n.bytesToWords(e), l = 8 * e.length, c = 1732584193, d = -271733879, p = -1732584194, u = 271733878, m = 0; m < a.length; m++)
            a[m] = 16711935 & (a[m] << 8 | a[m] >>> 24) | 4278255360 & (a[m] << 24 | a[m] >>> 8);
        a[l >>> 5] |= 128 << l % 32;
        a[14 + (l + 64 >>> 9 << 4)] = l;
        var f = r._ff
          , h = r._gg
          , v = r._hh
          , g = r._ii;
        for (m = 0; m < a.length; m += 16) {
            var y = c
              , _ = d
              , b = p
              , $ = u;
            d = g(d = g(d = g(d = g(d = v(d = v(d = v(d = v(d = h(d = h(d = h(d = h(d = f(d = f(d = f(d = f(d, p = f(p, u = f(u, c = f(c, d, p, u, a[m + 0], 7, -680876936), d, p, a[m + 1], 12, -389564586), c, d, a[m + 2], 17, 606105819), u, c, a[m + 3], 22, -1044525330),
                p = f(p, u = f(u, c = f(c, d, p, u, a[m + 4], 7, -176418897), d, p, a[m + 5], 12, 1200080426), c, d, a[m + 6], 17, -1473231341), u, c, a[m + 7], 22, -45705983),
                p = f(p, u = f(u, c = f(c, d, p, u, a[m + 8], 7, 1770035416), d, p, a[m + 9], 12, -1958414417), c, d, a[m + 10], 17, -42063), u, c, a[m + 11], 22, -1990404162),
                p = f(p, u = f(u, c = f(c, d, p, u, a[m + 12], 7, 1804603682), d, p, a[m + 13], 12, -40341101), c, d, a[m + 14], 17, -1502002290), u, c, a[m + 15], 22, 1236535329),
                p = h(p, u = h(u, c = h(c, d, p, u, a[m + 1], 5, -165796510), d, p, a[m + 6], 9, -1069501632), c, d, a[m + 11], 14, 643717713), u, c, a[m + 0], 20, -373897302),
                p = h(p, u = h(u, c = h(c, d, p, u, a[m + 5], 5, -701558691), d, p, a[m + 10], 9, 38016083), c, d, a[m + 15], 14, -660478335), u, c, a[m + 4], 20, -405537848),
                p = h(p, u = h(u, c = h(c, d, p, u, a[m + 9], 5, 568446438), d, p, a[m + 14], 9, -1019803690), c, d, a[m + 3], 14, -187363961), u, c, a[m + 8], 20, 1163531501),
                p = h(p, u = h(u, c = h(c, d, p, u, a[m + 13], 5, -1444681467), d, p, a[m + 2], 9, -51403784), c, d, a[m + 7], 14, 1735328473), u, c, a[m + 12], 20, -1926607734),
                p = v(p, u = v(u, c = v(c, d, p, u, a[m + 5], 4, -378558), d, p, a[m + 8], 11, -2022574463), c, d, a[m + 11], 16, 1839030562), u, c, a[m + 14], 23, -35309556),
                p = v(p, u = v(u, c = v(c, d, p, u, a[m + 1], 4, -1530992060), d, p, a[m + 4], 11, 1272893353), c, d, a[m + 7], 16, -155497632), u, c, a[m + 10], 23, -1094730640),
                p = v(p, u = v(u, c = v(c, d, p, u, a[m + 13], 4, 681279174), d, p, a[m + 0], 11, -358537222), c, d, a[m + 3], 16, -722521979), u, c, a[m + 6], 23, 76029189),
                p = v(p, u = v(u, c = v(c, d, p, u, a[m + 9], 4, -640364487), d, p, a[m + 12], 11, -421815835), c, d, a[m + 15], 16, 530742520), u, c, a[m + 2], 23, -995338651),
                p = g(p, u = g(u, c = g(c, d, p, u, a[m + 0], 6, -198630844), d, p, a[m + 7], 10, 1126891415), c, d, a[m + 14], 15, -1416354905), u, c, a[m + 5], 21, -57434055),
                p = g(p, u = g(u, c = g(c, d, p, u, a[m + 12], 6, 1700485571), d, p, a[m + 3], 10, -1894986606), c, d, a[m + 10], 15, -1051523), u, c, a[m + 1], 21, -2054922799),
                p = g(p, u = g(u, c = g(c, d, p, u, a[m + 8], 6, 1873313359), d, p, a[m + 15], 10, -30611744), c, d, a[m + 6], 15, -1560198380), u, c, a[m + 13], 21, 1309151649),
                p = g(p, u = g(u, c = g(c, d, p, u, a[m + 4], 6, -145523070), d, p, a[m + 11], 10, -1120210379), c, d, a[m + 2], 15, 718787259), u, c, a[m + 9], 21, -343485551);
            c = c + y >>> 0;
            d = d + _ >>> 0;
            p = p + b >>> 0;
            u = u + $ >>> 0;
        }
        return n.endian([c, d, p, u])
    }
    )._ff = function (e, t, a, n, i, o, s) {
        var r = e + (t & a | ~t & n) + (i >>> 0) + s;
        return (r << o | r >>> 32 - o) + t
    };

    r._gg = function (e, t, a, n, i, o, s) {
        var r = e + (t & n | a & ~n) + (i >>> 0) + s;
        return (r << o | r >>> 32 - o) + t
    };

    r._hh = function (e, t, a, n, i, o, s) {
        var r = e + (t ^ a ^ n) + (i >>> 0) + s;
        return (r << o | r >>> 32 - o) + t
    };

    r._ii = function (e, t, a, n, i, o, s) {
        var r = e + (a ^ (t | ~n)) + (i >>> 0) + s;
        return (r << o | r >>> 32 - o) + t
    };

    r._blocksize = 16;
    r._digestsize = 16;

    // With t empty, the digest is returned as a hex string
    var a = n.wordsToBytes(r(e, t));
    return t && t.asBytes ? a : t && t.asString ? s.bytesToString(a) : n.bytesToHex(a);
}
</code>

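That is the encryption method implemented in JavaScript. The argument e is a 13-digit timestamp, and the function can now be called from either JS or Python. We can verify the result here.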
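The initial values and round constants in the JavaScript above match those of standard MD5, so in Python the same token can probably be produced with `hashlib` alone. This is an assumption based on reading the code, not something the site documents, so compare the output with the value seen in the developer tools before relying on it. The function name `antitoken` mirrors the JavaScript; `make_token_pair` is a hypothetical helper.

```python
import hashlib
import time


def antitoken(timestamp: str) -> str:
    """Hex MD5 of the UTF-8 encoded timestamp (assumed equivalent to the JS function)."""
    return hashlib.md5(timestamp.encode("utf-8")).hexdigest()


def make_token_pair() -> tuple:
    """Return (wangba, antitoken) for a fresh millisecond timestamp."""
    wangba = str(int(time.time() * 1000))
    return wangba, antitoken(wangba)


wangba, token = make_token_pair()
print(wangba, token)
```

Since the token is derived from the wangba Cookie value, a request presumably needs to carry both the Cookie and the matching antitoken.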

First, the screenshot from the developer tools:

[Screenshot: the antitoken value shown in the developer tools]

Then the output of the code:

[Screenshot: the output of running the code]

