Crawler software installation commands:
- Install the requests module
pip install requests
- Install the retrying module (retries a call when it raises an exception)
pip install retrying
- Install from source (run inside the unpacked package directory)
python setup.py install
- Install lxml (XPath support)
pip install lxml
- Download PhantomJS
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
- Extract the archive and create a symlink
tar -xvjf phantomjs-2.1.1-linux-x86_64.tar.bz2
sudo cp -R phantomjs-2.1.1-linux-x86_64 /usr/local/share/
sudo ln -sf /usr/local/share/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin/
- Install selenium
pip install selenium
- Install beautifulsoup4
pip install beautifulsoup4
Official documentation: http://beautifulsoup.readthedocs.io/zh_CN/v4.4.0
- Install Tesseract
sudo apt-get install tesseract-ocr
Check that a program installed successfully:
<program name> --version
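The version check above can also be scripted from Python. The following is a minimal sketch, not part of the original notes: the helper names module_installed and command_version are hypothetical, and the module and executable names are assumptions inferred from the install steps in this list. It uses only the standard library.

```python
import importlib.util
import shutil
import subprocess

# Import names can differ from pip package names (beautifulsoup4 is imported as bs4).
MODULES = ["requests", "retrying", "lxml", "selenium", "bs4", "scrapy", "pymongo"]
# Executable names are assumptions based on the install steps in these notes.
COMMANDS = ["phantomjs", "tesseract", "scrapy"]


def module_installed(name):
    """Return True if the top-level module `name` can be located for import."""
    return importlib.util.find_spec(name) is not None


def command_version(name):
    """Return the first line printed by `name --version`, or None if unavailable."""
    path = shutil.which(name)
    if path is None:
        return None  # not on PATH
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr, so fall back to it.
    lines = (result.stdout or result.stderr).strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    for module in MODULES:
        print(module, "OK" if module_installed(module) else "missing")
    for command in COMMANDS:
        version = command_version(command)
        print(command, version if version is not None else "not found on PATH")
```

Running the script prints one line per module and per executable, which makes it easy to spot a package that failed to install.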
- Install scrapy
pip install scrapy
- Install pymongo
pip install pymongo
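- Run scrapy (from inside the Scrapy project directory)
scrapy crawl xiaohua
xiaohua is the value of the name attribute defined in the spider file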