XPath -- Scraping images from Qiushibaike (adult edition) with XPath

Date: 2023-07-16

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
@author: Hurrican
@file: 爬取糗事百科.py
@time: 2018/11/29 20:43
"""

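'''
content returns the raw bytes, while text returns a Unicode string. In other words, text decodes the raw data for you, using an encoding that is guessed from the response headers.
text is generally used for textual responses; content is generally used for other kinds of returned data.
For Chinese text on some sites, however, text may come back garbled, so it is best to use content and decode it yourself.
'''
# Only 10 pages are scraped here, to save disk space, haha.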

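import requests
from lxml import etree

# Listing pages 1 through 10
url_list = ["http://www.qiumeimei.com/page/{}".format(str(i)) for i in range(1, 11)]

for url in url_list:
    r = requests.get(url)
    ret = r.content.decode()  # decode the raw bytes into a str (UTF-8 by default)
    result = etree.HTML(ret)
    # The leading // is required: it matches the first element anywhere in the document
    img_list = result.xpath('//div[@class="home_main_wrap"]/div[@class="panel clearfix"]/div[@class="main clearfix"]/p/img/@data-lazy-src')
    print(img_list)
    for img in img_list:
        # Raw string, so that \U and \01 in the Windows path are not read as escape sequences
        # The last 10 characters of the image URL serve as the file name
        with open(r'C:\Users\Hurrican\PycharmProjects\01\img\%s' % img[-10:], 'wb') as f:
            try:
                r = requests.get(img)
                f.write(r.content)
                print("Downloading %s" % img)
            except Exception as e:
                print(e)

print("Download complete")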
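
The note at the top of the script about content versus text can be reproduced without any network access. The sketch below is only an illustration, not part of the scraper: the byte string `raw` stands in for `r.content`, and the variable names are made up for this example. It assumes the page is really UTF-8 while the encoding guessed from the headers is ISO-8859-1, which is one common way Chinese text ends up garbled.

```python
# Stand-in for r.content: the UTF-8 bytes a server might send for a Chinese page.
# (Hypothetical sample data, not fetched from the real site.)
raw = "糗事百科".encode("utf-8")

# Roughly what r.text looks like when the guessed encoding is wrong
# (assumed here to be ISO-8859-1): every byte becomes its own character.
garbled = raw.decode("iso-8859-1")

# Decoding the bytes yourself with the page's actual encoding,
# which is what r.content.decode() does in the script above.
correct = raw.decode("utf-8")

print(repr(garbled))
print(correct)
```

With Requests, another option is to set `r.encoding = "utf-8"` before reading `r.text`, which tells the library which encoding to use instead of the guessed one.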