Installation commands for web-crawler software:

  • Install the requests module
    pip install requests
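  • Install the retrying module (retries calls that raise exceptions)
    pip install retrying
    • Or install from source (run inside the unpacked source directory)
      python setup.py install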
  • Install lxml (provides XPath support)
    pip install lxml
  • Download PhantomJS
    wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
  • Extract the archive and create a symbolic link
    tar -xvjf phantomjs-2.1.1-linux-x86_64.tar.bz2
    sudo cp -R phantomjs-2.1.1-linux-x86_64 /usr/local/share/
    sudo ln -sf /usr/local/share/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin/
  • Install selenium
    pip install selenium
  • Install beautifulsoup4 (official documentation: http://beautifulsoup.readthedocs.io/zh_CN/v4.4.0)
    pip install beautifulsoup4
  • Install Tesseract
    sudo apt-get install tesseract-ocr
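
To confirm that the packages installed above are visible to the current Python interpreter, something like the following sketch can be used. It relies only on the standard library. The helper names check_modules and check_binaries are hypothetical, and the import names assumed here (for example bs4 for beautifulsoup4) are the usual ones for these packages.

```python
import importlib.util
import shutil


def check_modules(names):
    """Map each top-level module name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def check_binaries(names):
    """Map each executable name to True if it is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    # Import names assumed for the pip packages listed above.
    modules = ["requests", "retrying", "lxml", "selenium", "bs4"]
    # Executables assumed to be provided by PhantomJS and tesseract-ocr.
    binaries = ["phantomjs", "tesseract"]

    for name, ok in check_modules(modules).items():
        print(f"module {name}: {'found' if ok else 'MISSING'}")
    for name, ok in check_binaries(binaries).items():
        print(f"binary {name}: {'found' if ok else 'MISSING'}")
```

A "MISSING" entry only means the name was not found in this interpreter or on this PATH; with several Python installations, pip may have installed into a different one.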

  • Check whether a program was installed successfully
    <program name> --version
  • Install Scrapy
    pip install scrapy
  • Install pymongo
    pip install pymongo
  • Run a Scrapy spider (xiaohua is the name set in the spider file)
    scrapy crawl xiaohua