Hands-On Web Scraping with Python

Extract quality data from the web using effective Python techniques

Product type: Paperback
Published: Oct 2023
Publisher: Packt
ISBN-13: 9781837636211
Length: 324 pages
Edition: 2nd Edition
Languages: Python
Tools: Pandas, BeautifulSoup
Concepts: Web Programming
Author: Anish Chapagain

Table of Contents

Preface
Part 1: Python and Web Scraping
Chapter 1: Web Scraping Fundamentals
Chapter 2: Python Programming for Data and Web
Part 2: Beginning Web Scraping
Chapter 3: Searching and Processing Web Documents
Chapter 4: Scraping Using PyQuery, a jQuery-Like Library for Python
Chapter 5: Scraping the Web with Scrapy and Beautiful Soup
Part 3: Advanced Scraping Concepts
Chapter 6: Working with the Secure Web
Chapter 7: Data Extraction Using Web APIs
Chapter 8: Using Selenium to Scrape the Web
Chapter 9: Using Regular Expressions and PDFs
Part 4: Advanced Data-Related Concepts
Chapter 10: Data Mining, Analysis, and Visualization
Chapter 11: Machine Learning and Web Scraping
Part 5: Conclusion
Chapter 12: After Scraping – Next Steps and Data Analysis
Index
Other Books You May Enjoy

Scraping the Web with Scrapy and Beautiful Soup

In previous chapters, we learned about web scraping-related technologies, data-finding techniques, and using various Python libraries to scrape data from the web.

In this chapter, we will work hands-on with two popular Python libraries, Scrapy and Beautiful Soup. Scrapy is a web crawling framework for Python that organizes scraping work into projects. Beautiful Soup, on the other hand, handles document parsing, which is normally done so that content can be traversed and extracted effectively. Both libraries also provide a rich set of DOM-related features.

In particular, we will learn about the following topics in this chapter:

Web parsing using Python
Web scraping using Beautiful Soup
Web scraping using Scrapy
Deploying a web crawler
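
As a rough illustration of what "parsing a document to traverse and extract content" means, the sketch below walks a small HTML string and collects its links. It deliberately uses only Python's standard-library html.parser module rather than Beautiful Soup or Scrapy, so it is a stand-in for the idea and not the chapter's own code. The names LinkCollector, extract_links, and SAMPLE_HTML are hypothetical and were chosen for this example only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently being read
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; anchors without
        # an href leave _href as None and are skipped.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a tracked <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as (text, href) pairs."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample document, used only for this illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Sample catalogue</h1>
  <ul>
    <li><a href="/books/1">First book</a></li>
    <li><a href="/books/2">Second <em>book</em></a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(f"{text} -> {href}")
```

Parsing libraries such as Beautiful Soup hide this event-by-event bookkeeping behind a searchable tree of elements, which is the main reason to prefer them for real scraping work.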

Authors (1)

Anish Chapagain is a software engineer with a passion for data science, artificial intelligence and its processes, and Python programming that began around 2007. He has worked on web scraping, data analysis, visualization, and reporting-related tasks and projects for more than 10 years, and also works as a freelancer. Anish previously worked as a trainer, web/software developer, team leader, and banker, roles in which he was exposed to data and gained further insight into topics such as data mining, data analysis, reporting, information processing, and knowledge discovery. He has an MSc in computer systems from Bangor University (United Kingdom) and an Executive MBA from Himalayan Whitehouse International College, Kathmandu, Nepal.
