Scraping the Web with Scrapy and Beautiful Soup
In previous chapters, we learned about the technologies behind web scraping, techniques for locating data, and how to use various Python libraries to scrape data from the web.
In this chapter, we will work hands-on with two popular Python libraries, Scrapy and Beautiful Soup. Scrapy is a web crawling framework for Python that organizes scraping work into projects. Beautiful Soup, on the other hand, deals with document or content parsing: it turns a document into a tree so that its content can be traversed and extracted effectively. Apart from this, both libraries provide a rich set of DOM-related features for navigating and selecting elements.
In particular, we will learn about the following topics in this chapter:
- Web parsing using Python
- Web scraping using Beautiful Soup
- Web scraping using Scrapy
- Deploying a web crawler
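
As a small preview of what parsing a document to traverse and extract content can look like, here is a minimal sketch using Beautiful Soup. The HTML snippet, the `books` id, and the `book` class are hypothetical and exist only for illustration, and the example assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; a real page would
# normally be downloaded first.
html = """
<html>
  <body>
    <h1>Books</h1>
    <ul id="books">
      <li class="book"><a href="/book/1">Learning Python</a></li>
      <li class="book"><a href="/book/2">Web Scraping Basics</a></li>
    </ul>
  </body>
</html>
"""

# Parse the markup into a tree using Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Traverse: attribute-style access returns the first <h1> element.
heading = soup.h1.get_text(strip=True)

# Extract: find the list by its id, then collect every link inside it.
book_list = soup.find("ul", id="books")
books = [
    {"title": a.get_text(strip=True), "url": a["href"]}
    for a in book_list.find_all("a")
]

print(heading)
print(books)
```

The sketch passes a string directly to the parser to stay self-contained. Scrapy approaches the same task from the other direction, handling requests and crawling as part of a project and handing each response to extraction code.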