Photo credit: Ian Sane
What is Scrapy? Let's look at the official definition:
Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
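Wow, it can crawl websites and extract structured data from their pages. It is 100% Python, runs on Linux, Windows, Mac, and BSD, and comes with very detailed documentation... Hmm, sounds pretty good.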
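To make "extracting structured data from a page" a bit more concrete, here is a minimal sketch that uses only Python's standard library. To be clear, this is not Scrapy's API: `LinkExtractor`, `extract_links`, and `SAMPLE_HTML` are names and data I made up for illustration, and a framework like Scrapy handles the fetching and crawling on top of this kind of parsing.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect every <a href="..."> on a page as a {"text", "url"} record.

    Hypothetical helper for illustration only; not part of Scrapy.
    """

    def __init__(self):
        super().__init__()
        self.links = []      # finished records
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"text": text, "url": self._href})
            self._href = None


def extract_links(html):
    """Return a list of {"text", "url"} dicts for the links in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page; a real crawler would download this over HTTP.
SAMPLE_HTML = """
<ul>
  <li><a href="http://scrapy.org/">Scrapy</a></li>
  <li><a href="http://nutch.apache.org/">Nutch</a></li>
</ul>
"""

if __name__ == "__main__":
    for record in extract_links(SAMPLE_HTML):
        print(record)
```

The point is just the shape of the result: unstructured HTML goes in, and a list of uniform records comes out.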
Still, I wanted to know what alternatives are out there, and that's when a voice spoke up:
If you're looking for a python based crawler, Scrapy is probably your best bet.
─ Eric Wu
So does that mean Scrapy is already very good? Either way, Eric Wu is a kind soul: on Quora he left a very useful list of crawlers written in all sorts of languages.
Java
Nutch => http://nutch.apache.org/
Heritrix => https://webarchive.jira.com/wiki/display/Heritrix/Heritrix...
WebSPHINX => http://www.cs.cmu.edu/~rcm/websphinx/
Python
Scrapy => http://scrapy.org/
Scrape.py => http://zesty.ca/scrape/
HarvestMan => http://harvestmanontheweb.com/
Mechanize (ported from the Perl version) => http://wwwsearch.sourceforge.net/mechanize/
Ruby
scRUBYt => https://github.com/scrubber/scrubyt
Anemone => http://anemone.rubyforge.org/
Ruby: Not Really Crawlers but can be used like one
hpricot => http://hpricot.com/
Nokogiri => http://nokogiri.org/
PHP
Snoopy => http://sourceforge.net/projects/snoopy/
PHPCrawl => http://phpcrawl.cuab.de/
Erlang
eBot => https://github.com/matteoredaelli/ebot
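This list could go on forever, which is not something I'd be happy to see XD. Python is quite a beautiful language to me, so I'd rather try the Python-based tools first. Which crawlers have you used? If you have one to recommend, please let me know :)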
Hi, I'm Seyna. Your comments are welcome :)