How to Scrape Gumtree Data in Easy Steps

Crawlbase - Nov 6 - Dev Community

This blog was originally posted to Crawlbase Blog

Gumtree is one of the most popular online classifieds websites, where users can buy and sell products or services locally. Whether you're looking for cars, furniture, property, electronics, or even jobs, Gumtree has millions of listings that update regularly. With over 15 million unique monthly visitors and more than 1.5 million active ads at any time, Gumtree provides a wealth of data that can be used for price comparison, competitor analysis, or tracking trends.

In this blog, we will walk you through how to scrape Gumtree search listings and individual product pages using Python. We will also show how to store the data in CSV files for easy analysis. At the end, we’ll discuss how to optimize the process using Crawlbase Smart Proxy to avoid issues like IP blocking.

Let’s dive in!

Why Scrape Gumtree Data?

Scraping Gumtree data is useful for many purposes. As one of the leading online classifieds platforms, Gumtree connects buyers and sellers across a wide range of products. Here are some reasons to scrape Gumtree:

Image showing reasons to scrape Gumtree data, which are listed below

  • Market Trend Analysis: Track product prices and availability to follow the market.
  • Competitor Research: Monitor competitors' listings and pricing to stay ahead.
  • Identify Popular Products: Find trending items and high-demand products.
  • Informed Business Decisions: Use data to make buying and selling choices.
  • Price Tracking: Track price changes over time to find deals or trends.
  • User Behavior Insights: Analyze listings to see what users want.
  • Enhanced Marketing Strategies: Refine your marketing based on current trends.

In the following sections, we will show you how to effectively scrape Gumtree search listings and product pages.

Key Data Points to Extract from Gumtree

When scraping Gumtree, you need to know what data to grab. Here are the key data points to focus on:

  1. Product Title: The title of the product is usually in the main heading of the listing. This is the most important part.
  2. Price: The listing price is what the seller is asking for the product. Monitoring prices will help you work out the market value.
  3. Location: The location of the seller is usually in the listing. This is useful for understanding regional demand and supply.
  4. Description: The product description has all the details of the item: condition, features, and specs.
  5. Image URL: The image URL provides the visual representation and helps you gauge the condition and appeal of the product.
  6. Listing URL: The direct link to the product page is needed to get more details or contact the seller.
  7. Date Listed: The date the listing was posted helps you track how long the item has been available and can indicate demand.
  8. Seller's Username: The name of the seller can give you an idea of trustworthiness and reliability, especially if you're comparing multiple listings.

Setting Up Your Python Environment

Before you can start scraping Gumtree, you need to set up your Python environment by installing Python and the required libraries. This gives you the tools to send requests, extract data, and store it for analysis.

Installing Python and Required Libraries

First, make sure you have Python installed on your machine. If you don't, you can download it from the official Python website. Once installed, open a terminal or command prompt and install the required libraries with pip.

Here’s a list of the key libraries needed for scraping Gumtree:

  • Requests: To send HTTP requests and receive responses.
  • BeautifulSoup: For parsing HTML and extracting data.
  • Pandas: For organizing and saving data in CSV format.

Run the following command to install these libraries:

pip install requests beautifulsoup4 pandas
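
To confirm the installation worked, you can run a quick optional check that imports each library and prints its version:

import requests
import bs4
import pandas

print('requests:', requests.__version__)
print('beautifulsoup4:', bs4.__version__)
print('pandas:', pandas.__version__)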

Choosing an IDE

An Integrated Development Environment (IDE) makes coding easier and more efficient. Here are some popular IDEs for Python:

  • PyCharm: A powerful, full-featured IDE with smart code assistance and debugging tools.
  • Visual Studio Code: A lightweight code editor with a wide range of extensions for Python development.
  • Jupyter Notebook: Ideal for running code in smaller chunks, making it easier to test and debug.

Once your environment is set up, let's start scraping Gumtree listings. Next, we'll inspect the HTML structure to find the CSS selectors for the elements that hold the data we need.

Scraping Gumtree Search Listings

In this section, we will learn how to scrape search listings from Gumtree. We’ll inspect the HTML structure, write the scraper, handle pagination, and store the data in a CSV file.

Inspecting the HTML for CSS Selectors

To get data from Gumtree, we first need to find the HTML elements that contain the information. Open your browser’s Developer Tools and inspect a listing.

A screenshot showing Gumtree Search Results - HTML structure

Here are some key selectors:

  • Title: Found in a <div> tag with the attribute data-q="tile-title".
  • Price: Located in a <div> tag with the attribute data-testid="price".
  • Location: Found in a <div> tag with the attribute data-q="tile-location".
  • URL: The product link is within the <a> tag's href attribute, identified by the attribute data-q="search-result-anchor".

We will use these CSS selectors to extract the required data.
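
Before writing the full scraper, it can be worth sanity-checking these selectors against a live search page. The following sketch (using an example query URL; the selectors reflect Gumtree's markup at the time of writing and may change) simply counts how many elements each selector matches:

import requests
from bs4 import BeautifulSoup

url = 'https://www.gumtree.com/search?q=headset'  # example search query
headers = {'User-Agent': 'Mozilla/5.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')

selectors = {
    'result cards': 'article[data-q="search-result"]',
    'titles': 'div[data-q="tile-title"]',
    'prices': 'div[data-testid="price"]',
    'locations': 'div[data-q="tile-location"]',
}

for name, selector in selectors.items():
    # A count of 0 suggests the markup has changed
    print(f'{name}: {len(soup.select(selector))} matches')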

Writing the Search Listings Scraper

Let’s write a function that sends a request to Gumtree, extracts the required data, and returns it.

import requests
from bs4 import BeautifulSoup

def scrape_gumtree_search(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    listings = []

    for listing in soup.select('article[data-q="search-result"]'):
        title = listing.select_one('div[data-q="tile-title"]').text.strip()
        price = listing.select_one('div[data-testid="price"]').text.strip()
        location = listing.select_one('div[data-q="tile-location"]').text.strip()
        link = listing.select_one('a[data-q="search-result-anchor"]')['href']
        listings.append({
            'title': title,
            'price': price,
            'location': location,
            'URL': f'https://www.gumtree.com{link}'
        })

    return listings

This function extracts titles, prices, locations, and URLs from the search results page.

Handling Pagination in Gumtree

To scrape multiple pages, we need to handle pagination. The URL for subsequent pages usually contains a page parameter, such as ?page=2. We can modify the scraper to fetch data from multiple pages.

def scrape_gumtree_multiple_pages(base_url, max_pages):
    all_listings = []

    for page in range(1, max_pages + 1):
        url = f'{base_url}?page={page}'
        listings = scrape_gumtree_search(url)
        all_listings.extend(listings)

    return all_listings

This function iterates through a specified number of pages and collects the listings from each page.

Storing Data in a CSV File

To store the scraped data, we’ll use the pandas library to write the results into a CSV file.

import pandas as pd

def save_to_csv(data, filename):
    df = pd.DataFrame(data)
    df.to_csv(filename, index=False)

This function takes a list of listings and saves it into a CSV file with the specified filename.
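
If you scrape page by page, you may prefer to write results incrementally instead of holding everything in memory. Here is a small variant of save_to_csv (the name append_to_csv is our own, not from the tutorial) that appends rows and only writes the header once:

import os
import pandas as pd

def append_to_csv(data, filename):
    df = pd.DataFrame(data)
    # Write the header only if the file doesn't exist yet
    df.to_csv(filename, mode='a', index=False,
              header=not os.path.exists(filename))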

Complete Code Example

Here’s the complete code to scrape Gumtree search listings, handle pagination, and save the results to a CSV file.

import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_gumtree_search(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    listings = []

    for listing in soup.select('article[data-q="search-result"]'):
        title = listing.select_one('div[data-q="tile-title"]').text.strip()
        price = listing.select_one('div[data-testid="price"]').text.strip()
        location = listing.select_one('div[data-q="tile-location"]').text.strip()
        link = listing.select_one('a[data-q="search-result-anchor"]')['href']
        listings.append({
            'title': title,
            'price': price,
            'location': location,
            'URL': f'https://www.gumtree.com{link}'
        })

    return listings

def scrape_gumtree_multiple_pages(base_url, max_pages):
    all_listings = []

    for page in range(1, max_pages + 1):
        url = f'{base_url}?page={page}'
        listings = scrape_gumtree_search(url)
        all_listings.extend(listings)

    return all_listings

def save_to_csv(data, filename):
    df = pd.DataFrame(data)
    df.to_csv(filename, index=False)

def main():
    base_url = 'https://www.gumtree.com/search?q=headset'
    max_pages = 5
    listings = scrape_gumtree_multiple_pages(base_url, max_pages)
    save_to_csv(listings, 'gumtree_listings.csv')
    print(f'Scraped {len(listings)} listings and saved to gumtree_listings.csv')

if __name__ == '__main__':
    main()

This script scrapes Gumtree search listings for a product, handles pagination, and saves the data in a CSV file for further analysis.

gumtree_listings.csv Snapshot:

gumtree_listings.csv File Snapshot
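
One optional refinement before moving on: search result pages may repeat promoted listings across pages, so deduplicating by URL before saving can tidy the output. This is a small sketch built on pandas, assuming the 'URL' key used in the scraper above:

import pandas as pd

def save_unique_to_csv(data, filename):
    # Keep the first occurrence of each listing URL, drop the rest
    df = pd.DataFrame(data).drop_duplicates(subset='URL')
    df.to_csv(filename, index=False)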

Scraping Gumtree Product Pages

Now that we’ve scraped the search listings, the next step is to scrape individual product pages for more information. We will inspect the HTML structure of product pages, write the scraper, and save the data in a CSV file.

Inspecting the HTML for CSS Selectors

First, inspect the Gumtree product pages to find the HTML elements that contain the data. Open a product page in your browser and use the Developer Tools to find:

A screenshot showing Gumtree Product Pages - HTML structure

  • Product Title: Located in an <h1> tag with the attribute data-q="vip-title".
  • Price: Found inside an <h3> tag with the attribute data-q="ad-price".
  • Description: Located in a <p> tag with the attribute itemprop="description".
  • Seller Name: Inside an <h2> tag with the class seller-rating-block-name.
  • Product Image URL: Found in <img> tags within a <div> that has the attribute data-testid="carousel", with the image URL stored in the src attribute.

Writing the Product Page Scraper

We'll now create a function that takes a product page URL, fetches the page's HTML content, and extracts the required information.

def scrape_gumtree_product_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extracting product details
    title = soup.select_one('h1[data-q="vip-title"]').text.strip()
    price = soup.select_one('h3[data-q="ad-price"]').text.strip()
    description = soup.select_one('p[itemprop="description"]').text.strip()
    seller_name = soup.select_one('h2.seller-rating-block-name').text.strip()
    images_url = [img['src'] for img in soup.select('div[data-testid="carousel"] img') if 'src' in img.attrs]

    return {
        'title': title,
        'price': price,
        'description': description,
        'seller_name': seller_name,
        'images_url': images_url,
        'product_url': url
    }

This function sends a request to the product page URL, parses the HTML, and extracts the title, price, description, seller name, and product image URLs.
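
Note that the code above assumes every selector matches; on listings that lack one of these elements (for example, a missing seller rating block), select_one returns None and calling .text raises an AttributeError. A small defensive helper (our own addition, not part of the original scraper) avoids this:

def select_text(soup, selector, default=''):
    # Return stripped text for the first match, or a default if absent
    element = soup.select_one(selector)
    return element.text.strip() if element else default

You could then write, for example, seller_name = select_text(soup, 'h2.seller-rating-block-name').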

Storing Data in a CSV File

Once we have scraped the data, we will store it in a CSV file. We can reuse the save_to_csv function we used earlier for search listings.

import pandas as pd

def save_to_csv(data, filename):
    df = pd.DataFrame(data)
    df.to_csv(filename, index=False)

Complete Code Example

Here's the complete code to scrape product pages, extract the required details, and store them in a CSV file.

import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_gumtree_product_page(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extracting product details
    title = soup.select_one('h1[data-q="vip-title"]').text.strip()
    price = soup.select_one('h3[data-q="ad-price"]').text.strip()
    description = soup.select_one('p[itemprop="description"]').text.strip()
    seller_name = soup.select_one('h2.seller-rating-block-name').text.strip()
    images_url = [img['src'] for img in soup.select('div[data-testid="carousel"] img') if 'src' in img.attrs]

    return {
        'title': title,
        'price': price,
        'description': description,
        'seller_name': seller_name,
        'images_url': images_url,
        'product_url': url
    }

def save_to_csv(data, filename):
    df = pd.DataFrame(data)
    df.to_csv(filename, index=False)

def main():
    product_urls = [
        'https://www.gumtree.com/p/bmw/bmw-1-series-118d-sport-5dr-nav-/1488114476',
        'https://www.gumtree.com/p/kia/diesel-estate-12-months-mot-px-welcome-nationwide-delivery-available/1483456978',
        # Add more product URLs here
    ]

    product_data = []

    for url in product_urls:
        product_info = scrape_gumtree_product_page(url)
        product_data.append(product_info)

    save_to_csv(product_data, 'gumtree_product_data.csv')
    print(f'Scraped {len(product_data)} product pages and saved to gumtree_product_data.csv')

if __name__ == '__main__':
    main()

This script scrapes product details from individual Gumtree product pages and saves the extracted information in a CSV file. You can add more product URLs to the product_urls list to scrape multiple pages.

gumtree_product_data.csv Snapshot:

gumtree_product_data.csv File Snapshot

Optimizing Scraping with Crawlbase Smart Proxy

When scraping websites like Gumtree, you may run into rate limits or IP bans. To scrape smoothly and efficiently, use Crawlbase Smart Proxy. This service helps you bypass restrictions and improve your scraping.

Benefits of Crawlbase Smart Proxy

  1. Avoid IP Blocking: Crawlbase rotates IP addresses so your requests stay anonymous and you won't get blocked.
  2. CAPTCHA Handling: It handles CAPTCHA challenges for you so you can scrape without interruptions.
  3. Faster Scraping: By using multiple IPs, you can make requests quickly and gather data faster.
  4. Geolocation: Choose proxies from specific locations to scrape localized data and get more relevant results.

Integrating Crawlbase Smart Proxy

To use Crawlbase Smart Proxy in your Gumtree scraper, set up your requests to route through the proxy. Here's an example of how to do this:

import requests
from bs4 import BeautifulSoup

# Replace '_USER_TOKEN_' with your Crawlbase token
proxy_url = "http://_USER_TOKEN_@smartproxy.crawlbase.com:8012"
proxies = {"http": proxy_url, "https": proxy_url}

def scrape_gumtree_product_page(url):
    response = requests.get(url, proxies=proxies, verify=False)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extracting product details
        title = soup.select_one('h1[data-q="vip-title"]').text.strip()
        price = soup.select_one('h3[data-q="ad-price"]').text.strip()
        description = soup.select_one('p[itemprop="description"]').text.strip()
        seller_name = soup.select_one('h2.seller-rating-block-name').text.strip()
        images_url = [img['src'] for img in soup.select('div[data-testid="carousel"] img') if 'src' in img.attrs]

        return {
            'title': title,
            'price': price,
            'description': description,
            'seller_name': seller_name,
            'images_url': images_url,
            'product_url': url
        }
    else:
        print(f"Failed to retrieve data. Status code: {response.status_code}")
        return None

# Example usage
if __name__ == "__main__":
    product_url = "https://www.gumtree.com/product-page-url"  # Replace with the actual product URL
    product_data = scrape_gumtree_product_page(product_url)
    print(product_data)

In this code snippet, replace '_USER_TOKEN_' with your actual Crawlbase token, which you can get by creating an account on Crawlbase. The proxies dictionary routes your requests through the Crawlbase Smart Proxy, helping you avoid blocks and maintain fast scraping speeds.
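
Even through a proxy, individual requests can still fail transiently. As a minimal sketch (reusing the proxies dictionary from the snippet above; the retry count and delays are arbitrary choices, not Crawlbase recommendations), you can wrap the request in a retry loop with a growing backoff:

import time
import requests

def get_with_retries(url, proxies, max_retries=3, backoff=2.0):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, proxies=proxies, verify=False, timeout=30)
            if response.status_code == 200:
                return response
        except requests.RequestException as exc:
            print(f'Attempt {attempt + 1} failed: {exc}')
        # Wait progressively longer before the next attempt
        time.sleep(backoff * (attempt + 1))
    return None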

By optimizing your Gumtree scraping process with Crawlbase Smart Proxy, you can gather data more effectively and handle larger volumes without facing common web scraping issues.

Optimize Gumtree Scraping with Crawlbase

Scraping Gumtree data can be very useful for your projects. In this blog, we have shown how to scrape search listings and product pages using Python. By inspecting the HTML and using the Requests and BeautifulSoup libraries, you can extract useful data such as titles, prices, and descriptions.

Make sure your scraping runs smoothly by using tools like Crawlbase Smart Proxy. It will help you avoid IP blocks and maintain fast scraping speeds so you can focus on getting the data you need.

If you're interested in scraping other e-commerce platforms, feel free to explore the following comprehensive guides.

📜 How to Scrape Amazon
📜 How to Scrape Walmart
📜 How to Scrape AliExpress
📜 How to Scrape Houzz Data
📜 How to Scrape Tokopedia

Contact our support if you have any questions. Happy scraping!

Frequently Asked Questions

Q. Is it legal to scrape data from Gumtree?

Yes, it is generally legal to scrape publicly available Gumtree data as long as you comply with their terms of service. Always check the website's policies to make sure you're not breaking any rules, and use the scraped data responsibly and ethically.

Q. What data can I scrape from Gumtree?

You can scrape various types of data from Gumtree, including product titles, prices, descriptions, images, and seller information. This data can help you analyze market trends or compare prices across different listings.

Q. How can I avoid getting blocked while scraping?

To avoid getting blocked while scraping, consider using a rotating proxy service like Crawlbase Smart Proxy. This helps you manage IP addresses so your scraping looks like regular user behavior. Also, implement delays between requests to reduce the chances of getting blocked, as in the sketch below.
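
As a rough illustration of such delays (the helper function and its timing range are our own, not from the tutorial), you can jitter the pause between requests so the traffic pattern is less uniform:

import random
import time

def fetch_politely(urls, fetch):
    # 'fetch' can be any of the scraper functions shown earlier
    results = []
    for url in urls:
        results.append(fetch(url))
        # Random 1.5-4 second pause between requests
        time.sleep(random.uniform(1.5, 4.0))
    return results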
