Web Crawling Exercises

Question-1

Find all unique website URLs on the http://quotes.toscrape.com/ website using regular expressions.

  • Hint: The URLs are located within <div> HTML elements and start with http.
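
One possible approach is sketched below on a small, hypothetical HTML sample rather than the live page, so it is a starting point and not a finished solution. The sample markup, the example.com addresses, and the function name find_unique_urls are all assumptions made for illustration; the real page would first need to be downloaded.

```python
import re

# Hypothetical markup standing in for the downloaded page source.
sample_html = """
<div class="quote">
  <a href="/author/some-author">(about)</a>
  <a href="http://example.com/about">External page</a>
</div>
<div class="footer">
  <a href="https://example.org/quotes">Quotes</a>
  <a href="https://example.org/quotes">Quotes again</a>
</div>
"""

def find_unique_urls(page):
    """Return the sorted, de-duplicated absolute URLs found in href attributes."""
    # https? accepts both http and https; [^"]+ stops at the closing quote.
    urls = re.findall(r'href="(https?://[^"]+)"', page)
    # A set removes duplicates; sorting makes the result reproducible.
    return sorted(set(urls))

unique_urls = find_unique_urls(sample_html)
print(unique_urls)
```

Relative links such as /author/some-author are skipped because the pattern requires the value to start with http.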

Question-2

Find all unique font sizes on the http://quotes.toscrape.com/ website using regular expressions.

  • Hint: The font sizes are specified in <a> HTML elements.
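
As a rough sketch, the font sizes could be pulled out of inline style attributes as shown here. The sample markup assumes that each size appears as a font-size declaration in pixels on an <a> tag; the markup and the name find_unique_font_sizes are hypothetical, and the live page may differ.

```python
import re

# Hypothetical markup; assumes sizes are given as inline "font-size: NNpx".
sample_html = """
<span class="tag-item"><a class="tag" style="font-size: 28px" href="/tag/love/">love</a></span>
<span class="tag-item"><a class="tag" style="font-size: 26px" href="/tag/life/">life</a></span>
<span class="tag-item"><a class="tag" style="font-size: 28px" href="/tag/books/">books</a></span>
"""

def find_unique_font_sizes(page):
    """Return the sorted unique pixel sizes declared on <a> elements."""
    # <a\s[^>]* stays inside one opening <a ...> tag; (\d+) captures the number.
    sizes = re.findall(r'<a\s[^>]*font-size:\s*(\d+)px', page)
    # Convert to int so the values sort numerically rather than as text.
    return sorted({int(size) for size in sizes})

font_sizes = find_unique_font_sizes(sample_html)
print(font_sizes)
```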

Question-3

Find all the headlines and their corresponding URLs on the https://news.ycombinator.com/ website using regular expressions.
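
The sketch below pairs each headline with its URL using two capture groups. It assumes that every headline link sits directly inside a <span class="titleline"> element; that structure, the sample markup, and the name find_headlines are assumptions, and the site's markup can change at any time.

```python
import re
from html import unescape

# Hypothetical markup; assumes headline links sit inside <span class="titleline">.
sample_html = """
<span class="titleline"><a href="https://example.com/post">An example headline</a><span class="sitebit comhead"> (<a href="from?site=example.com">example.com</a>)</span></span>
<span class="titleline"><a href="item?id=1">Ask HN: A second example</a></span>
"""

def find_headlines(page):
    """Return a list of (headline, url) tuples in page order."""
    # Group 1 is the href value, group 2 the link text up to the first </a>.
    pattern = re.compile(r'<span class="titleline"><a href="([^"]+)"[^>]*>(.*?)</a>')
    # unescape turns entities such as &amp; back into plain characters.
    return [(unescape(title), url) for url, title in pattern.findall(page)]

headlines = find_headlines(sample_html)
for title, url in headlines:
    print(title, "->", url)
```

Anchoring the pattern on the enclosing span keeps the secondary site link (example.com in the sample) out of the results.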

Question-4

Find all book titles on the https://books.toscrape.com/ website using regular expressions.

  • Hint: Consider using BeautifulSoup's find_all() instead of re.findall() at some point.
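
A regex-only sketch is given below; it does not use find_all(), which would require the third-party BeautifulSoup package. It assumes that each book link inside an <h3> carries the full title in a title attribute while the visible link text may be shortened. The sample markup and the name find_book_titles are hypothetical.

```python
import re
from html import unescape

# Hypothetical markup; assumes the full title is stored in the title attribute.
sample_html = """
<article class="product_pod">
  <h3><a href="catalogue/example-book_1/index.html" title="An Example Book: With a Long Subtitle">An Example Book: With ...</a></h3>
</article>
<article class="product_pod">
  <h3><a href="catalogue/tom-and-jerry_2/index.html" title="Tom &amp; Jerry">Tom &amp; Jerry</a></h3>
</article>
"""

def find_book_titles(page):
    """Return the book titles taken from title attributes of <h3> links."""
    # Reading the attribute avoids the shortened text shown inside the link.
    raw_titles = re.findall(r'<h3>\s*<a[^>]*\stitle="([^"]*)"', page)
    # Decode HTML entities so "&amp;" becomes "&".
    return [unescape(title) for title in raw_titles]

book_titles = find_book_titles(sample_html)
print(book_titles)
```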

Question-5

Store all book categories on the https://books.toscrape.com/ website in a list using regular expressions.

  • Hint: Consider using BeautifulSoup's find_all() instead of re.findall() at some point.
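
The following sketch collects category names into a list with the re module alone. It assumes that category links point to paths beginning with catalogue/category/books/ and that the link text is surrounded by whitespace; both assumptions, the sample markup, and the name find_categories are hypothetical and should be checked against the actual page source.

```python
import re

# Hypothetical sidebar markup; assumes category hrefs start with
# "catalogue/category/books/" and that names are padded with whitespace.
sample_html = """
<ul class="nav nav-list">
  <li>
    <a href="catalogue/category/books_1/index.html">
      Books
    </a>
    <ul>
      <li>
        <a href="catalogue/category/books/travel_2/index.html">
          Travel
        </a>
      </li>
      <li>
        <a href="catalogue/category/books/historical-fiction_4/index.html">
          Historical Fiction
        </a>
      </li>
    </ul>
  </li>
</ul>
"""

def find_categories(page):
    """Return the category names as a list, in page order."""
    # \s* on both sides of the lazy group trims the surrounding whitespace.
    pattern = r'<a href="catalogue/category/books/[^"]+">\s*(.*?)\s*</a>'
    return re.findall(pattern, page)

categories = find_categories(sample_html)
print(categories)
```

The top-level "Books" link is left out because its path does not continue below books/, so only the individual categories end up in the list.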