Web Crawling Exercises

Question-1
Find all unique website URLs on the http://quotes.toscrape.com/ website using regular expressions.
Hint: The URLs are located within <div> HTML elements and start with http.
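One possible approach is sketched below. To stay runnable offline it works on a short hand-written HTML fragment rather than the downloaded page; `sample_html`, its example.com links, and the `href="..."` pattern are assumptions for illustration, and the real markup may differ.

```python
import re

# Hypothetical fragment standing in for the page source (assumption:
# on the real page you would first download the HTML as a string).
sample_html = """
<div class="footer">
  <a href="https://example.com/quotes">Quotes</a>
  <a href="https://example.org">Example</a>
  <a href="/login">Login</a>
  <a href="https://example.com/quotes">Quotes again</a>
</div>
"""

# Capture every href value that starts with "http"; relative links
# such as "/login" are skipped.
url_pattern = re.compile(r'href="(http[^"]+)"')

# A set drops duplicates; sorting gives a stable order.
unique_urls = sorted(set(url_pattern.findall(sample_html)))
print(unique_urls)
```

Converting the matches to a set is what makes the result unique; `re.findall` by itself returns every occurrence.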
Question-2
Find all unique font sizes on the http://quotes.toscrape.com/ website using regular expressions.
Hint: The font sizes are located within <a> HTML elements.
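A minimal sketch of the idea, assuming the font size appears as an inline `font-size: ...px` style on the <a> tag. The fragment in `sample_html` is hand-written for illustration and is not guaranteed to match the live page.

```python
import re

# Hypothetical fragment (assumption: sizes are given as inline styles in px).
sample_html = """
<span class="tag-item"><a class="tag" style="font-size: 28px" href="/tag/love/">love</a></span>
<span class="tag-item"><a class="tag" style="font-size: 26px" href="/tag/life/">life</a></span>
<span class="tag-item"><a class="tag" style="font-size: 26px" href="/tag/books/">books</a></span>
"""

# Stay inside a single <a ...> tag ([^>]* cannot cross ">") and capture
# the number that precedes "px".
font_size_pattern = re.compile(r'<a\b[^>]*font-size:\s*(\d+)px')

# A set comprehension removes repeated sizes; int() allows numeric sorting.
unique_sizes = sorted({int(size) for size in font_size_pattern.findall(sample_html)})
print(unique_sizes)
```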
Question-3
Find all the headlines and their corresponding URLs on the https://news.ycombinator.com/ website using regular expressions.
Hint: The headlines are contained within HTML elements.
Ensure that you extract 30 headlines along with their URLs.
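The sketch below shows how two capture groups can return a URL and a headline together. It assumes each headline link sits directly inside a `<span class="titleline">` element; that class name and the two sample rows are assumptions for illustration, so check the actual page source before relying on them.

```python
import html
import re

# Hypothetical rows (assumption: headline links are wrapped in
# <span class="titleline">; verify against the real page source).
sample_html = (
    '<span class="titleline"><a href="https://example.com/one">'
    'First &amp; foremost</a></span>\n'
    '<span class="titleline"><a href="item?id=1" rel="nofollow">'
    'Ask: second headline</a></span>\n'
)

# Group 1 is the URL, group 2 the headline text.
headline_pattern = re.compile(
    r'<span class="titleline"><a href="([^"]+)"[^>]*>(.*?)</a>'
)

# html.unescape turns entities such as &amp; back into plain characters.
headlines = [
    (html.unescape(title), url)
    for url, title in headline_pattern.findall(sample_html)
]
print(headlines)
```

On the live page, comparing `len(headlines)` with 30 is a quick way to confirm that the pattern caught every entry and nothing else.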
Question-4
Find all book titles on the https://books.toscrape.com/ website using regular expressions.
Hint: Consider using find_all() (a BeautifulSoup method) instead of findall() (from the re module) at some point.
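As a regex-only starting point, the sketch below reads titles from a `title` attribute, assuming each book link looks like `<h3><a ... title="...">`. The fragment and its titles are made up for illustration. The visible link text is shortened in the sample, which is why the attribute is used instead.

```python
import html
import re

# Hypothetical fragment (assumption: the full title is stored in the
# title attribute of the link inside each <h3>).
sample_html = """
<h3><a href="catalogue/sample-book-one_1/index.html" title="Sample Book One: A Very Long Title">Sample Book One: A ...</a></h3>
<h3><a href="catalogue/sample-book-two_2/index.html" title="Sample Book Two">Sample Book Two</a></h3>
"""

# Capture the attribute value rather than the (possibly truncated) link text.
title_pattern = re.compile(r'<h3><a [^>]*title="([^"]*)"')

book_titles = [html.unescape(t) for t in title_pattern.findall(sample_html)]
print(book_titles)
```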
Question-5
Store all book categories on the https://books.toscrape.com/ website in a list using regular expressions.
Hint: Consider using find_all() (a BeautifulSoup method) instead of findall() (from the re module) at some point.
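A rough regex-only sketch follows. It assumes category links point to paths beginning with `catalogue/category/books/` and that the link text is padded with whitespace; both the path and the fragment are assumptions for illustration.

```python
import re

# Hypothetical sidebar fragment (assumption: category names are link
# text surrounded by newlines and indentation).
sample_html = """
<ul class="nav nav-list">
  <li><a href="catalogue/category/books_1/index.html">
      Books
    </a>
    <ul>
      <li><a href="catalogue/category/books/travel_2/index.html">
          Travel
        </a></li>
      <li><a href="catalogue/category/books/historical-fiction_4/index.html">
          Historical Fiction
        </a></li>
    </ul>
  </li>
</ul>
"""

# \s* on both sides strips the padding; the lazy group keeps inner
# spaces, so multi-word names survive intact.
category_pattern = re.compile(
    r'href="catalogue/category/books/[^"]+">\s*([^<]+?)\s*</a>'
)

categories = category_pattern.findall(sample_html)  # findall already returns a list
print(categories)
```

The top-level "Books" link is skipped because its path is `books_1/`, which does not match the `books/` prefix in the pattern.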