Beautiful Soup is a Python library for pulling data out of HTML and XML files, most often as part of web scraping. It builds a parse tree from the document and provides a few simple methods and Pythonic idioms for navigating, searching, and modifying that tree, which makes extracting data straightforward.
The “BeautifulSoup4” you mentioned is Beautiful Soup 4, the current major version of the library, often shortened to BS4. It is published on PyPI as beautifulsoup4 and imported in code as bs4. To use it in your Python project, install it first with a package manager such as pip:
pip install beautifulsoup4
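To confirm the install worked, a quick check like the following can help. It is a minimal sketch that only assumes the package imports under the name bs4 and exposes a version string:

```python
import bs4

# bs4 is the import name; beautifulsoup4 is the name used with pip
version = bs4.__version__
print("Beautiful Soup version:", version)
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.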
Once you have Beautiful Soup installed, you can use it in your Python code to parse and manipulate HTML or XML documents. Here’s a simple example of how to use Beautiful Soup to parse an HTML document:
from bs4 import BeautifulSoup
# Sample HTML content
html_content = """
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<h1>Hello, Beautiful Soup!</h1>
<p>This is a sample HTML document.</p>
</body>
</html>
"""
# Create a Beautiful Soup object
soup = BeautifulSoup(html_content, 'html.parser')
# Extract title
title = soup.title
print("Title:", title.text)
# Extract the first paragraph
paragraph = soup.p
print("First Paragraph:", paragraph.text)
# Find all the paragraphs
paragraphs = soup.find_all('p')
for p in paragraphs:
    print("Paragraph:", p.text)
This code snippet demonstrates some basic operations with Beautiful Soup, such as accessing elements, extracting text, and searching for specific elements by tag name.
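Beyond tag names, searches can also filter on attributes or use CSS selectors, and the parse tree can be modified in place. The sketch below illustrates this under stated assumptions: the markup, the class names intro and note, and the example.com URL are made up for illustration and are not part of the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
html_content = """
<html>
<body>
<p class="intro">Welcome to the <a href="https://example.com/docs">docs</a>.</p>
<p class="note">First note.</p>
<p class="note">Second note.</p>
</body>
</html>
"""

soup = BeautifulSoup(html_content, "html.parser")

# Search by tag name plus CSS class (class_ avoids the Python keyword "class")
notes = [p.get_text() for p in soup.find_all("p", class_="note")]
print("Notes:", notes)

# The same search expressed as a CSS selector
selected = [p.get_text() for p in soup.select("p.note")]
print("Selected:", selected)

# Read an attribute; .get() returns None when the attribute is missing
link = soup.find("a")
href = link.get("href")
print("Link target:", href)

# Modify the tree: replace the link text, then add an attribute
link.string = "documentation"
link["target"] = "_blank"
intro_text = soup.find("p", class_="intro").get_text()
print("Intro:", intro_text)
```

find_all with class_ and select with a CSS selector return the same elements here, so the choice between them is mostly a matter of which syntax reads better for a given query.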
Beautiful Soup is widely used in web scraping projects for extracting data from web pages once their HTML has been downloaded.