How to Develop a Simple Crawler in Python

In this example, we will learn how to implement a simple crawler program with Python.

Tips

First, import the required packages: requests and Beautiful Soup. Then set the URL to access, send the request, and parse the returned HTML.
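The find/find_all lookup used in the program can be tried first on an inline snippet. The HTML below is a simplified, made-up stand-in for Wikipedia's "On this day" markup, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML mimicking the structure the crawler expects.
html = """
<div id="mp-otd">
  <ul>
    <li>1903 - First controlled, sustained powered flight</li>
    <li>1969 - Apollo 11 lands on the Moon</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
# Same chained lookup as the full program: container div -> list -> items.
items = [li.text for li in soup.find(id='mp-otd').find('ul').find_all('li')]
print(items)
```

The same chain of `find(id=...)`, `find('ul')`, and `find_all('li')` is what the full program below applies to the live page.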

Source Code

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

import requests
from bs4 import BeautifulSoup
from pprint import pprint

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
}

# The id "mp-otd" is the "On this day" section of the Wikipedia Main Page.
url = 'https://en.wikipedia.org/wiki/Main_Page'
session = requests.Session()
soup = BeautifulSoup(session.get(url, headers=headers).content, 'html.parser')
result = soup.find(id="mp-otd").find('ul').find_all('li')

onThisDay = []  # store crawled results
for otd in result:
    onThisDay.append(otd.text)

pprint(onThisDay, width=90)

After running the code above, we see results like the following: a list of today's historical events, where the number at the beginning of each entry is the year and the remaining text is the event description.
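Each crawled entry can then be split into the year and the description. A small sketch using the standard-library re module, assuming entries follow the "year – event" shape described above (the sample strings here are made up, not real crawl output):

```python
import re

# Hypothetical sample entries in the "year - event" shape described above.
entries = [
    '1903 – First controlled, sustained powered flight',
    '1969 – Apollo 11 lands on the Moon',
]

events = []
for entry in entries:
    # Capture the leading year, then the event text after the dash.
    m = re.match(r'(\d{1,4})\s*[–-]\s*(.+)', entry)
    if m:
        events.append((int(m.group(1)), m.group(2)))

print(events)
```

This turns each plain string into a `(year, description)` pair, which is easier to sort or filter than the raw text.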
