Python Study: Day 4

Building a Job Scraper-1

What is Web Scraping

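Web scraping means extracting data from the web.
ex) When you paste an article URL into Facebook, the article's title and image are fetched and shown in the Facebook preview.
It is sometimes also called web index mining or data mining.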
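
As a rough illustration of the preview example above, the sketch below pulls a title and an image URL out of a page's Open Graph meta tags, which is the metadata Facebook reads when it builds a link preview. The sample HTML, the class name PreviewParser, and the example.com URL are made up for illustration, and it uses only the standard library's html.parser, so it is a simplified stand-in rather than what Facebook actually runs.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML instead.
SAMPLE_HTML = """
<html>
  <head>
    <meta property="og:title" content="Python jobs are on the rise">
    <meta property="og:image" content="https://example.com/cover.png">
  </head>
  <body><p>Article text...</p></body>
</html>
"""


class PreviewParser(HTMLParser):
    """Collects Open Graph <meta> tags such as og:title and og:image."""

    def __init__(self):
        super().__init__()
        self.preview = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        # Keep only Open Graph properties that carry a content value.
        if prop.startswith("og:") and "content" in attrs:
            self.preview[prop] = attrs["content"]


parser = PreviewParser()
parser.feed(SAMPLE_HTML)
print(parser.preview["og:title"])
print(parser.preview["og:image"])
```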

Navigating with Python

Install the requests module
Installation works a little differently on repl.it
Select Packages in the left menu -> search for requests -> select "Python HTTP for Humans" -> add package -> requests is installed on repl.it

Fetching HTML from Indeed

import requests

indeed_result = requests.get("https://www.indeed.com/jobs?q=python&limit=50")

print(indeed_result.text)
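
The query string in the URL above can also be assembled from a dictionary, which makes it easier to change the search term later. The helper names build_search_url and fetch_html below are hypothetical, and the 10-second timeout is an arbitrary choice; treat this as a minimal sketch, not the only way to call requests.

```python
from urllib.parse import urlencode

INDEED_URL = "https://www.indeed.com/jobs"


def build_search_url(query, limit=50):
    """Return the Indeed search URL for a query (hypothetical helper)."""
    return f"{INDEED_URL}?{urlencode({'q': query, 'limit': limit})}"


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    import requests  # third-party; imported here so the URL helper works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


url = build_search_url("python")
print(url)
# indeed_html = fetch_html(url)  # uncomment to actually hit the network
```

Checking the status before using the text helps avoid silently parsing an error page when the site rejects the request.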

Extracting information from HTML
Install the beautifulsoup4 package (a screen-scraping library)

import requests
from bs4 import BeautifulSoup

indeed_result = requests.get("https://www.indeed.com/jobs?q=python&limit=50")

indeed_soup = BeautifulSoup(indeed_result.text, "html.parser")

# Find the div element whose class name is "pagination"
pagination = indeed_soup.find("div", {"class" : "pagination"})

# Within that element, find every a tag
pages = pagination.find_all("a")

# Append the span inside each a element to the spans list
spans = []
for page in pages:
    spans.append(page.find("span"))

print(spans[-1])  # the last item only
print(spans[:-1])  # everything except the last item
print(spans[0:5])  # five items, indices 0 through 4
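
The slicing in the last three lines is easier to follow on a plain list. In the sketch below, page_labels is a made-up stand-in for the text of the pagination links (page numbers followed by a "Next" link); the real labels on Indeed may differ.

```python
# Hypothetical link labels, standing in for the text of the pagination links.
page_labels = ["2", "3", "4", "5", "Next"]

last_label = page_labels[-1]       # the final item only
number_labels = page_labels[:-1]   # every item except the final one
first_three = page_labels[0:3]     # items at indices 0, 1 and 2

# Convert the remaining labels to integers and take the largest page number.
page_numbers = [int(label) for label in number_labels]
max_page = max(page_numbers)

print(last_label)
print(number_labels)
print(first_three)
print(max_page)
```

Dropping the last item with [:-1] before converting to int matters here, because a label such as "Next" cannot be parsed as a number.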
