
Where is data scraped with Python saved?

Source: 中华考试网 [October 9, 2020]

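Data scraped with Python is saved locally, usually in a file or in a database. Files are the simpler of the two, so if you are only writing a crawler for your own amusement, saving the data to files is enough.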
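As a rough illustration of the two options, the sketch below writes the same records once to a CSV file and once to a SQLite database, using only the standard library. The sample records, the `pages` table, and the function names are assumptions made for this example, not part of the original article.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical records standing in for scraped data.
ROWS = [
    {"title": "first post", "url": "https://example.com/1"},
    {"title": "second post", "url": "https://example.com/2"},
]


def save_to_csv(rows, path):
    """Write the records to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url"])
        writer.writeheader()
        writer.writerows(rows)


def save_to_sqlite(rows, path):
    """Insert the records into a SQLite table, creating it if needed."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS pages (title TEXT, url TEXT)"
            )
            conn.executemany(
                "INSERT INTO pages (title, url) VALUES (:title, :url)", rows
            )
    finally:
        conn.close()


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "pages.csv"
    db_path = Path(tmp) / "pages.db"
    save_to_csv(ROWS, csv_path)
    save_to_sqlite(ROWS, db_path)
    print(csv_path.read_text(encoding="utf-8"))
```

The file version needs nothing beyond `open`, which is why it suits a small hobby crawler. The database version becomes worthwhile once the data has to be queried or deduplicated.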

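# coding=utf-8

import os
import re
import urllib.request

'''
XPath Helper is a Chrome extension; browsers built on the Chrome engine can install it too.
It can be used to debug XPath expressions.

The urllib module provides an interface for reading data from web pages, so data on the
web or on FTP can be read much like a local file.
urlopen opens a URL.
read reads the data at that URL.
'''

def getHtml(url):
    # Fetch the page and return its raw bytes.
    page = urllib.request.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    # Collect every image URL that starts with http.
    imglist = re.findall('img src="(http.*?)"', html)
    return imglist

# Decode the bytes to str so the str regex pattern can be applied.
html = getHtml("https://www.zhihu.com/question/34378366").decode("utf-8")
imagesUrl = getImg(html)

# Create the output directory if it does not exist yet.
if not os.path.exists("D:/imags"):
    os.mkdir("D:/imags")

count = 0
for url in imagesUrl:
    print(url)
    # Look for the extension dot within the last five characters of the URL.
    dot = url.find('.', len(url) - 5)
    if dot != -1:
        name = url[dot:]
        response = urllib.request.urlopen(url)
        # Save the image as <count><extension>, e.g. 0.jpg.
        with open("D:/imags/" + str(count) + name, 'wb') as f:
            f.write(response.read())
        count += 1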
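The script takes the file extension by searching for a dot in the last five characters of the URL, which breaks when the URL carries a query string. One possible alternative, sketched here with the standard library, parses the URL first and reads the extension from its path. The helper name `extension_from_url` and the `.jpg` fallback are assumptions made for this illustration.

```python
import posixpath
from urllib.parse import urlsplit


def extension_from_url(url, default=".jpg"):
    """Return the extension of the URL path, or a default if it has none."""
    path = urlsplit(url).path          # drops the query string and fragment
    ext = posixpath.splitext(path)[1]  # e.g. ".png", or "" when absent
    return ext if ext else default


print(extension_from_url("https://example.com/a/b.png?size=1.5"))
print(extension_from_url("https://example.com/avatar"))
```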

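In testing, the image-download script does its basic job. Most of the time went into getting the regular expression to match, because I am not very familiar with regular expressions.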
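To show where the regular expression is fragile, the sketch below runs the same pattern against a small, made-up HTML fragment and compares it with the standard library's `html.parser`. The parser is a swapped-in alternative, not the method used in the article; the fragment and the names `ImgSrcParser` and `img_sources` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical HTML fragment used only for this illustration.
SAMPLE_HTML = (
    '<p><img src="https://example.com/a.jpg" alt="a">'
    '<img class="pic" src="https://example.com/b.png">'
    '<img src="/relative/c.gif"></p>'
)

# Same pattern as the script: it only matches when src directly follows "img ".
IMG_PATTERN = re.compile(r'img src="(http.*?)"')


class ImgSrcParser(HTMLParser):
    """Collect absolute src values from <img> tags, in any attribute order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value and value.startswith("http"):
                    self.sources.append(value)


def img_sources(html):
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.sources


regex_hits = IMG_PATTERN.findall(SAMPLE_HTML)
parser_hits = img_sources(SAMPLE_HTML)
print(regex_hits)   # only a.jpg: b.png has class= before src
print(parser_hits)  # a.jpg and b.png; the relative URL is skipped by both
```

The regex is fine for a quick experiment on a page whose markup is known, while the parser tolerates attribute reordering at the cost of a few more lines.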

Editor: hym
