Scraping
Extract only the img tags from a page.
# coding: utf-8
# for python3
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


# def download(url, folderName):
def scraping():
    url = 'http://umashika5555.hatenablog.com/'  # the page to scrape
    req = Request(url)
    response = urlopen(req)  # open the URL
    html = response.read()  # read the raw HTML
    soup = BeautifulSoup(html, "lxml")  # parse it with BeautifulSoup (needs the lxml package)
    # contents = soup.find_all(id='contents')  # alternative: select by id
    contents = soup.find_all("img")  # the tags to extract this time
    for i, content in enumerate(contents):
        # print the index, then a separator on the same line, then the tag
        print(i, end="")
        print('-' * 50)
        print(content)


if __name__ == '__main__':
    scraping()
Running it prints every img tag on the page, each preceded by its index and a separator line.
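Usually what you want next is the image URL rather than the whole tag. Here is a minimal sketch of that step. It deliberately swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages or network access. The names ImgSrcCollector, extract_img_sources, and SAMPLE_HTML, as well as the example.com URLs, are made up for illustration and are not from the original script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and passes
        # attributes as a list of (name, value) pairs.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.sources.append(urljoin(self.base_url, src))


def extract_img_sources(html, base_url):
    """Return the absolute URLs of all img tags that have a src attribute."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources


# Inline sample page (an assumption for the demo, not real scraped data).
SAMPLE_HTML = """
<html><body>
  <img src="/images/a.png" alt="a">
  <p>text</p>
  <img src="https://cdn.example.com/b.jpg">
  <img alt="no source">
</body></html>
"""

if __name__ == "__main__":
    for i, src in enumerate(extract_img_sources(SAMPLE_HTML, "http://example.com/blog/")):
        print(i, src)
```

With BeautifulSoup the same idea would be to read the src attribute of each tag returned by find_all("img"); the sketch above just keeps the dependency list empty.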
References
engineer-terminal.com
qiita.com
umashika5555.hatenablog.com