A first try with goquery: scraping housing price data

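  This time I want to try the goquery library by scraping housing price information from wofang.com (我房網). The first step is installing goquery; if the download is blocked, see the post "When go get hits the wall". Once the library is installed, we can start.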

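  The most important step is to study the page layout and identify the distinguishing features of the elements to extract.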

  This crawler keeps only the listings that have a price. The code is as follows:

<code>

package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"strconv"
	"time"

	"github.com/PuerkitoBio/goquery"
)

// p crawls the listing pages and saves every listing that has a price to wofang.csv.
func p() {
	a := 0 // number of listings with a price
	fileName := "wofang.csv"
	buf := new(bytes.Buffer)
	r2 := csv.NewWriter(buf)
	for i := 1; i < 202; i++ {
		fmt.Println("Fetching page " + strconv.Itoa(i) + "......")
		url := "http://www.wofang.com/building/p/" + strconv.Itoa(i) + "/"
		if i == 1 {
			// The first page has no "p/<n>/" suffix.
			url = "http://www.wofang.com/building/"
		}
		// NewDocument fetches and parses the page; it is marked deprecated in recent goquery releases.
		doc, err := goquery.NewDocument(url)
		if err != nil {
			log.Fatal(err)
		}
		// Each <li> under ".m ul" is one listing.
		doc.Find(".m ul li").Each(func(i int, s *goquery.Selection) {
			name := s.Find(".title a").Text()
			location := s.Find(".time").Text()
			price := s.Find(".sale-price font").Text()
			if price != "" {
				a++
				record := []string{name, price, location}
				r2.Write(record)
				r2.Flush()
				fmt.Printf("%s,%s,%s\n", name, price, location)
			}
		})
	}
	fout, err := os.Create(fileName)
	if err != nil {
		fmt.Println(fileName, err)
		return
	}
	// Defer Close only after Create is known to have succeeded.
	defer fout.Close()
	fout.WriteString(buf.String())
	fmt.Print(a)
}

func main() {
	t1 := time.Now()
	p()
	elapsed := time.Since(t1)
	fmt.Println("")
	fmt.Println("Crawl finished, total time: ", elapsed)
}
</code>
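
The crawler above mixes URL construction, scraping, and CSV output in one function, and it ignores errors from the CSV writer. As a rough sketch only (not the original author's code), the two pure pieces can be pulled out and checked without touching the network. The helper names `pageURL` and `rowsToCSV` and the sample row are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
)

// pageURL (hypothetical helper) mirrors the paging rule used by the crawler:
// page 1 is the bare listing URL, later pages append "p/<n>/".
func pageURL(i int) string {
	const base = "http://www.wofang.com/building/"
	if i == 1 {
		return base
	}
	return base + "p/" + strconv.Itoa(i) + "/"
}

// rowsToCSV (hypothetical helper) encodes rows as CSV text and returns
// any error reported by the writer instead of dropping it.
func rowsToCSV(rows [][]string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	// WriteAll writes every record and then flushes.
	if err := w.WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	fmt.Println(pageURL(1))
	fmt.Println(pageURL(2))

	// Made-up sample row; the comma in the location forces CSV quoting.
	out, err := rowsToCSV([][]string{{"Example Garden", "12000", "Example District, East"}})
	if err != nil {
		fmt.Println("csv error:", err)
		return
	}
	fmt.Print(out)
}
```

Letting `encoding/csv` handle the encoding means names or locations that contain commas are quoted correctly in the output file.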

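  Finally, following the URL pattern (roughly "http://www.wofang.com/building/" + city key + "-te_住宅/"; I collected the keys the clumsy way, by clicking through the cities one by one), I crawled the listings for each city and visualized them with ECharts. (This article is for reference only.)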
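
The per-city URL pattern described above can be sketched as a small helper. This is only an illustration: the function name `cityURL` is an assumption, and the keys in `main` are placeholders, since the article does not list the real city keys.

```go
package main

import "fmt"

// cityURL (hypothetical helper) builds a per-city listing URL from the
// pattern quoted in the article: base + city key + "-te_住宅/".
func cityURL(key string) string {
	return "http://www.wofang.com/building/" + key + "-te_住宅/"
}

func main() {
	// Placeholder keys for illustration only; the real keys were
	// collected by hand and are not given in the article.
	keys := []string{"citykey1", "citykey2"}
	for _, k := range keys {
		fmt.Println(cityURL(k))
	}
}
```

Each generated URL could then be fed to the same scraping loop used for the main listing pages.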
