[PYTHON] Beautiful Soup

[A crawling library for extracting the data you want from HTML and XML files]

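from bs4 import BeautifulSoup
from urllib.request import urlopen

# Fetch the page and parse it with Python's built-in HTML parser
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    # Print the href of every link, falling back to '/' when it is missing
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))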

○ Lets you pull out and organize only the parts of the HTML you need, such as the title, body, URLs, and text.
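
As a rough illustration of the point above, the sketch below pulls the title, the body text, and the link URLs out of a small inline HTML string and sorts the URLs. The sample markup and the names SAMPLE_HTML, body_text, and links are made up for this example and are not taken from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>Hello, <a href="https://example.com/b">second</a> and
    <a href="https://example.com/a">first</a>.</p>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text of the <title> tag
title = soup.title.string

# Visible text inside <body>, with pieces joined by single spaces
body_text = soup.body.get_text(" ", strip=True)

# href of every <a> tag that has one, sorted alphabetically
links = sorted(a["href"] for a in soup.find_all("a", href=True))

print(title)
print(body_text)
print(links)
```

Passing href=True to find_all skips anchors without an href attribute, so the a["href"] lookup cannot raise a KeyError.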

ex) Extracting the trending search terms from the Daum site

# In the selector, "div" is the tag name and "slide_favorsch" is the class name
for A in soup.select("div.slide_favorsch"):
    print(A)

https://www.crummy.com/software/BeautifulSoup/bs4/doc/


* In practice, elements are mostly located using the CSS selectors found in the site's markup.
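
A minimal sketch of selector-based extraction, assuming a made-up page fragment: the class name slide_favorsch comes from the example above, but the surrounding markup, the link paths, and the names keywords and first are hypothetical, and the real site's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name is taken from the note above.
SAMPLE_HTML = """
<div class="slide_favorsch">
  <ul>
    <li><a href="/search?q=one">one</a></li>
    <li><a href="/search?q=two">two</a></li>
  </ul>
</div>
<div class="other"><a href="/ignored">ignored</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "div.slide_favorsch a" matches every <a> nested inside
# a <div> whose class list contains "slide_favorsch"
keywords = [a.get_text(strip=True) for a in soup.select("div.slide_favorsch a")]

# select_one returns the first match, or None when nothing matches
first = soup.select_one("div.slide_favorsch a")

print(keywords)
print(first["href"])
```

The selector string is the same kind of selector shown in a browser's developer tools, so one copied from the page's markup can usually be passed to select() with little or no change.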

