What should you do if HTTP Error 403: Forbidden occurs when saving HTML contents to a file with Python?
I wrote a Python program that scrapes a specific page. (The part that saves the result to a file is omitted.)
    import urllib.request

    fullUrl = '......'

    response = urllib.request.urlopen(fullUrl)
    data = response.read()
    text = data.decode('utf-8')
    print(text)
It worked fine.
But when I ran the same code against a different page, it failed with a 403 error.
    Traceback (most recent call last):
      File "pageScrap.py", line 5, in <module>
        response = urllib.request.urlopen(fullUrl)
      File "/app/python361/lib/python3.6/urllib/request.py", line 223, in urlopen
        return opener.open(url, data, timeout)
      File "/app/python361/lib/python3.6/urllib/request.py", line 532, in open
        response = meth(req, response)
      File "/app/python361/lib/python3.6/urllib/request.py", line 642, in http_response
        'http', request, response, code, msg, hdrs)
      File "/app/python361/lib/python3.6/urllib/request.py", line 570, in error
        return self._call_chain(*args)
      File "/app/python361/lib/python3.6/urllib/request.py", line 504, in _call_chain
        result = func(*args)
      File "/app/python361/lib/python3.6/urllib/request.py", line 650, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden
[ Cause ]
Stack Overflow never lets us down.
"This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it's easily detected)."
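To see what the quote is referring to, the default headers can be inspected without sending any request. This is a small sketch, not part of the original script: it assumes only the standard library, and the variable name `opener` is just for illustration. An opener created by `build_opener()` carries a `User-agent` header whose value is `Python-urllib/` followed by the Python major.minor version, which is easy for a server to recognize.

```python
import urllib.request

# urlopen() falls back to an opener built like this when none is installed.
opener = urllib.request.build_opener()

# addheaders holds the headers the opener attaches to every request by default.
for name, value in opener.addheaders:
    print(name, value)  # the value has the form Python-urllib/<major>.<minor>
```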
[ Solution ]
Let's add headers to the request so that it carries a browser-like User-Agent.
    req = urllib.request.Request(fullUrl, headers={'User-Agent': 'Mozilla/5.0'})
    response = urllib.request.urlopen(req).read()
    text = response.decode('utf-8')
    print(text)
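Since the saving step was omitted above, here is one way the whole flow could be put together. Treat it as a rough sketch under assumptions: the helper names `build_request`, `fetch_text`, and `save_text` are made up for this example, and catching `HTTPError` is an addition, because some servers check more than the User-Agent and may still answer 403.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, encoding="utf-8"):
    """Fetch a page and decode it; return None if the server answers with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url)) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as err:
        # A 403 can still happen if the server checks more than the User-Agent.
        print(f"HTTP Error {err.code}: {err.reason}")
        return None


def save_text(text, path, encoding="utf-8"):
    """Write the decoded HTML to a file."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)
```

With these helpers, the script would call `text = fetch_text(fullUrl)` and, if `text` is not None, `save_text(text, 'page.html')`, where `page.html` is a placeholder file name.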