About Python's requests module

July 26, 2021
August 30, 2023

1 What is the requests module?
- 1.1 Example
- 1.2 Explanation

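What is the requests module?

requests is an HTTP client library for Python that lets you retrieve content from websites.
It is commonly used together with the Beautiful Soup module for web scraping.

Example

Run the following code.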

import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt') 
print(res.text[:250])

The output is as follows.

The Project Gutenberg EBook of Romeo and Juliet, by William Shakespeare

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Projec

Explanation

The call has the following form.

response = requests.get(URL, other optional arguments)

The return value is a Response object. Its main attributes are as follows.

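Attribute	Description
status_code	HTTP status code of the response as an integer (for example, 200 or 404)
text	Response body decoded to a string (str)
encoding	Character encoding used to decode text
cookies	Cookies sent back by the server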
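
As a minimal sketch of how these attributes might be inspected together, the helper below collects them into a dictionary. The names summarize_response and preview_chars are hypothetical and not part of requests; the sketch assumes res is a Response object returned by requests.get().

```python
import requests


def summarize_response(res, preview_chars=250):
    """Collect the main Response attributes into a plain dict.

    summarize_response and preview_chars are illustrative names,
    not part of the requests API.
    """
    return {
        'status_code': res.status_code,       # e.g. 200 or 404
        'encoding': res.encoding,             # encoding used to decode res.text
        'preview': res.text[:preview_chars],  # start of the decoded body
        'cookies': res.cookies.get_dict(),    # cookies as a plain dict
    }
```

For example, summarize_response(requests.get('https://automatetheboringstuff.com/files/rj.txt', timeout=10)) would summarize the page used in the example above. Here timeout is one of the optional arguments accepted by requests.get().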

Preparation

Install pip if necessary. The command below uses zypper, the package manager on openSUSE.

sudo zypper install python3-pip

Install the requests module.

pip3 install requests
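
To confirm that the installation worked, one simple check (a sketch, not an official procedure) is to import the module and print its version.

```python
import requests

# Prints the installed version string; the exact value depends on your environment
print(requests.__version__)
```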

Downloading a web page

Download a web page and display only the first 250 characters.

import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt') 
print(res.text[:250])

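Error checking

You can check for errors with the raise_for_status() method of the Response object.
It raises requests.exceptions.HTTPError if the response has a 4xx or 5xx status code, and does nothing if the download succeeded.

res.raise_for_status()

Attempting to download a URL that does not exist results in the following.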

>>> res = requests.get('https://automatetheboringstuff.com/files/rj.txt_notexist') 
>>> res.raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://automatetheboringstuff.com/files/rj.txt_notexist

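If a failed download is not serious enough to stop the program, you can wrap the res.raise_for_status() line in try/except and handle the error without the program terminating abnormally.

import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt_notexist')
try:
    # Raises requests.exceptions.HTTPError for a 4xx or 5xx response
    res.raise_for_status()
except requests.exceptions.HTTPError as exc:
    print('There was a problem: {}'.format(exc))

With this in place, a failed download is reported as follows.

There was a problem: 404 Client Error: Not Found for url: https://automatetheboringstuff.com/files/rj.txt_notexist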
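
Note that requests.get() itself can also raise an exception, for example when the host cannot be reached. The following is a minimal sketch that handles both cases through requests.exceptions.RequestException, the base class of the exceptions raised by requests. The function name fetch_text and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the body of url as a string, or None if the download fails.

    fetch_text is a hypothetical helper, not part of requests.
    """
    try:
        res = requests.get(url, timeout=timeout)
        res.raise_for_status()  # HTTPError for 4xx/5xx responses
    except requests.exceptions.RequestException as exc:
        # Covers HTTPError as well as connection errors and timeouts
        print('Download failed: {}'.format(exc))
        return None
    return res.text
```

Returning None keeps the caller in control: it can skip the page, retry, or stop, depending on how serious the failure is.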

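Saving the downloaded file

You can save a downloaded web page to a file using the open() function and the write() method.
The key points are as follows.

- Pass the string 'wb' as the second argument to open() to create the file in binary write mode. Even for a plain-text page, write in binary mode so that the bytes are saved exactly as received and the page's character encoding is preserved.
- Loop over the Response object's iter_content() method to write the data in several chunks. To split it into chunks of 100 KB, use iter_content(100000).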

import requests
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
res.raise_for_status()
# The with statement closes the file even if a write fails
with open('RomeoAndJuliet.txt', 'wb') as play_file:
    for chunk in res.iter_content(100000):
        play_file.write(chunk)
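
The same steps can be wrapped in reusable functions. The sketch below is one possible arrangement, not the only way to do it: save_response and download_to_file are hypothetical names, and the use of stream=True (which defers downloading the body until iter_content() is consumed) and a 10-second timeout are assumptions added for illustration.

```python
import requests


def save_response(res, path, chunk_size=100000):
    """Write the body of res to path in binary mode; return bytes written."""
    written = 0
    with open(path, 'wb') as out_file:
        for chunk in res.iter_content(chunk_size):
            out_file.write(chunk)
            written += len(chunk)
    return written


def download_to_file(url, path, chunk_size=100000):
    """Download url and save it to path, raising on a 4xx/5xx response."""
    # stream=True avoids loading the whole body into memory at once
    with requests.get(url, stream=True, timeout=10) as res:
        res.raise_for_status()
        return save_response(res, path, chunk_size)
```

For example, download_to_file('https://automatetheboringstuff.com/files/rj.txt', 'RomeoAndJuliet.txt') would save the same file as the code above and return the number of bytes written.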


That's all.
