Groupon Scraper: Find the Hottest Deals and Coupons with Python

Crawlbase - Aug 26 - Dev Community

This blog was originally posted to Crawlbase Blog

If you're looking for great deals on products, experiences, and coupons, Groupon is a top platform. With millions of active users and thousands of daily deals, Groupon helps people save money while enjoying activities like dining, travel, and shopping. By scraping Groupon, you can access valuable data on these deals, helping you stay updated on the latest offers or even build your own deal-tracking application.

In this blog, we’ll explore how to build a powerful Groupon Scraper in Python to find the hottest deals and coupons. Given that Groupon uses JavaScript to dynamically render its content, simple scraping methods won’t work efficiently. To handle this, we’ll leverage the Crawlbase Crawling API, which seamlessly deals with JavaScript rendering and other challenges.

Let’s dive in and learn how to scrape Groupon for deals and coupons, step by step.

Why Scrape Groupon Deals and Coupons?

Scraping Groupon deals and coupons helps you keep track of the newest discounts and offers. Groupon posts many deals each day, making it hard to check them all by hand. A good Groupon Scraper does this job for you, gathering and studying offers in areas like food, travel, electronics, and more.

Through Groupon Scraping, you can pull out essential info such as what the deal is, how much it costs, how big the discount is, and when it ends. This has benefits for businesses that want to watch what their rivals offer, developers creating a site that lists deals, or anyone who just wants to find the best bargains.

We aim to scrape Groupon deals and coupons efficiently, pulling out all the essential info while tackling issues like dynamically loaded content. Because Groupon relies on JavaScript to show its content, regular scraping methods struggle to get the data. This is where our solution, powered by the Crawlbase Crawling API, comes in handy. It lets us collect deals without breaking a sweat by getting around these common roadblocks.

In the following parts, we'll look at the key pieces of info to pull from Groupon and get our setup ready for a smooth data collection process.

Key Data Points to Extract from Groupon

When you're using a Groupon Scraper, you need to pinpoint the critical data that makes your scraping work count. Groupon has tons of deals in different categories, and pulling out the correct info can help you get the most from your scraping project. Here's what you should focus on when you scrape Groupon:

  1. Deal Titles: The name or title of a deal grabs attention first. It gives a quick idea of what's on offer.
  2. Deal Descriptions: In-depth descriptions offer more details about the product or service, helping people understand what the offer includes.
  3. Original and Discounted Prices: These play a crucial role in understanding the available savings. By getting both the original price and the discounted price, you can work out the percentage of savings.
  4. Discount Percentage: Many Groupon deals show the percentage of discounts right away. Getting this data point saves you time in figuring out the savings yourself.
  5. Deal Expiry Date: Knowing when a deal ends helps to filter out old offers. Getting the expiry date makes sure you look at active deals.
  6. Deal Location: Certain offers apply to specific areas. Getting location info lets you sort deals by region, which helps a lot with local marketing efforts.
  7. Deal Category: Groupon puts deals into groups like food, travel, electronics, and so on. Grabbing category details makes it simple to break down the deals for study or display.
  8. Ratings and Reviews: What customers say and how they score deals shows how popular and trustworthy an offer is. This info proves helpful in judging the quality of deals.

By zeroing in on these key bits of data, you can make sure your Groupon Scraping gives you relevant, usable info. The next parts will show you how to set up your tools and build a scraper that can pull deals from Groupon efficiently.
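
Before writing any scraper code, it can help to pin down a schema for these fields. Below is a minimal sketch using a Python dataclass; the field names are our own choice for illustration, not anything Groupon defines, and several fields are optional since not every deal exposes them.

from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class GrouponDeal:
    """One scraped deal. Field names are illustrative, not Groupon's."""
    title: str
    description: str = ""
    original_price: str = ""            # kept as raw text, e.g. "$79.99"
    discounted_price: str = ""
    discount_percentage: Optional[float] = None
    expiry_date: str = ""
    location: str = ""
    category: str = ""
    rating: Optional[float] = None

deal = GrouponDeal(title="Spa World", original_price="$40", discounted_price="$35")
print(asdict(deal))  # plain dict, ready to dump to JSON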

Crawlbase Crawling API for Groupon Scraping

Working on a Groupon Scraper project can be tough when you need to deal with dynamic content and JavaScript-loaded elements. Groupon's website uses a lot of JavaScript to show deals and offers, so simple HTTP requests alone won't get you the data you want. This is where the Crawlbase Crawling API comes in handy: it helps you extract data from Groupon without running into problems with JavaScript rendering, CAPTCHAs, or IP blocking.

Why Use the Crawlbase Crawling API?

  1. Handle JavaScript Rendering: The biggest hurdle when you grab deals from Groupon is handling content that JavaScript creates. Crawlbase's API renders the JavaScript for you, so the content is fully loaded before you extract it.
  2. Avoid IP Blocking and CAPTCHAs: If you scrape too much, Groupon might stop your IP or throw up CAPTCHAs. Crawlbase changes IPs on its own and beats CAPTCHAs, so you can keep pulling Groupon data non-stop.
  3. Easy Integration: You can add the Crawlbase Crawling API to your Python code without much trouble. This lets you focus on getting the data you need, while the API handles the tricky stuff in the background.
  4. Scalable Scraping: Crawlbase offers flexible choices to handle Groupon scraping projects of any size. You can use it to gather small datasets or to carry out large-scale data collection efforts.

Crawlbase Python Library

Crawlbase offers its own Python library to help its customers. You need an access token to authenticate when you use it. You can get this token after you create an account.

Here's an example function that shows how to use the Crawling API from the Crawlbase library to send requests.

from crawlbase import CrawlingAPI

# Initialize the API client with your Crawlbase access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def make_crawlbase_request(url):
    response = crawling_api.get(url)

    # pc_status is Crawlbase's own status code for the crawl
    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')
        return html_content
    else:
        print(f"Failed to fetch the page. Crawlbase status code: {response['headers']['pc_status']}")
        return None

Note: Crawlbase offers two token types: a Normal Token for static sites and a JavaScript (JS) Token for dynamic or browser-based requests. For Groupon, you'll need a JS Token. You can start with 1,000 free requests, no credit card needed. Check out the Crawlbase Crawling API docs here.

Next up, we'll walk you through setting up Python and building Groupon scrapers that use the Crawlbase Crawling API to handle JavaScript and other scraping challenges. Let's jump into the setup process.

Setting Up Your Python Environment

Before we start writing the Groupon Scraper, we need to create a solid Python setup. Follow the steps below.

Installing Python

First, you'll need Python on your computer to scrape Groupon. You can get the newest version of Python from python.org.

Setting Up a Virtual Environment

We suggest using a virtual environment to keep different projects from clashing. To make a virtual environment, run these commands:

# Create a virtual environment
python -m venv groupon_env

# Activate the virtual environment
# On Windows:
groupon_env\Scripts\activate

# On macOS/Linux:
source groupon_env/bin/activate

This keeps your project's dependencies separate and makes them easier to manage.

Installing Required Libraries

Now, install the required libraries inside the virtual environment:

pip install crawlbase beautifulsoup4

Here’s a brief overview of each library:

  • crawlbase: The main library for sending requests using the Crawlbase Crawling API, which handles JavaScript rendering needed to scrape Groupon.
  • beautifulsoup4: To parse and navigate through the HTML structure of Groupon pages.

Choosing the Right IDE

You can write your code in any text editor, but using an Integrated Development Environment (IDE) can make coding easier. Some popular IDEs include VS Code, PyCharm, and Jupyter Notebook. These tools have features that help you code better, like highlighting syntax, completing code, and finding bugs. These features come in handy when you're building a Groupon Scraper.

Now that you've set up your environment and have your tools ready, you can start writing the scraper. In the next section, we'll create a Groupon deals scraper.

Scraping Groupon Deals

In this part, we'll explain how to get deals from Groupon with Python and the Crawlbase Crawling API. Groupon uses JavaScript rendering and scroll-based pagination, so simple scraping methods don't work. We'll use Crawlbase's Crawling API, which handles JavaScript and scroll pagination without a hitch.

The URL we’ll scrape is: https://www.groupon.com/local/washington-dc

Inspecting the HTML Structure

Before writing the code, it’s crucial to inspect the HTML structure of Groupon’s deals page. This helps you determine the correct CSS selectors needed to extract the data.

Visit the URL: Open the URL in your browser.
Open Developer Tools: Right-click and select "Inspect" to open Developer Tools.

Identify Key Elements: Groupon deal listings are typically found within <div> elements with the class cui-content. Each deal has the following details (a quick way to sanity-check these selectors follows the list):

  • Title: Found within a <div> tag with the class cui-udc-title.
  • Link: The link is contained within the href attribute of the <a> tag.
  • Original Price: Displayed in a <div> with the class cui-price-original.
  • Discount Price: Displayed in a <div> with the class cui-price-discount.
  • Location: Optional, usually in a <span> with the class cui-location-name.
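
Before wiring these selectors into a full scraper, it's worth confirming they actually match the page you fetched, since Groupon can change class names at any time. A minimal check, assuming you've saved a rendered copy of the page to deals.html (a hypothetical local file):

from bs4 import BeautifulSoup

# Load a locally saved, fully rendered copy of the deals page (hypothetical file)
with open('deals.html', encoding='utf-8') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

cards = soup.find_all('div', class_='cui-content')
print(f"Found {len(cards)} deal cards")  # 0 means the selectors need re-checking

if cards:
    title = cards[0].find('div', class_='cui-udc-title')
    print("Sample title:", title.text.strip() if title else "<no title element>")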

Writing the Groupon Scraper

We'll begin by coding a simple function to get the deal info from the page. We'll use the Crawlbase Crawling API to handle dynamic content loading because Groupon relies on JavaScript for rendering.

Here’s the code:

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import json

# Initialize CrawlingAPI with your access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_groupon_deals(base_url):
    options = {
        'ajax_wait': 'true',   # wait for asynchronous (AJAX) content to load
        'page_wait': '5000'    # give the page 5 seconds to finish rendering
    }

    response = crawling_api.get(base_url, options)
    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')

        soup = BeautifulSoup(html_content, 'html.parser')
        deals = soup.find_all('div', class_='cui-content')
        all_deals = []

        for deal in deals:
            title = deal.find('div', class_='cui-udc-title').text.strip() if deal.find('div', class_='cui-udc-title') else ''
            link = deal.find('a')['href'] if deal.find('a') else ''
            original_price = deal.find('div', class_='cui-price-original').text.strip() if deal.find('div', class_='cui-price-original') else ''
            discounted_price = deal.find('div', class_='cui-price-discount').text.strip().encode("ascii", "ignore").decode("utf-8") if deal.find('div', class_='cui-price-discount') else ''
            location = deal.find('span', class_='cui-location-name').text.strip() if deal.find('span', class_='cui-location-name') else ''

            all_deals.append({
                'title': title,
                'original_price': original_price,
                'discounted_price': discounted_price,
                'link': link,
                'location': location
            })

        return all_deals
    else:
        print(f"Failed to fetch data. Status code: {response['headers']['pc_status']}")
        return None

The options parameter includes settings like ajax_wait for handling asynchronous content loading and page_wait to wait 5 seconds before scraping, allowing all elements to load properly. You can read about Crawlbase Crawling API parameters here.
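
Even with these waits, a request can occasionally come back with a non-200 pc_status. One practical hedge is a small retry wrapper around the API call; the sketch below is our own addition (the retry count and delay are arbitrary), not part of the Crawlbase library:

import time
from crawlbase import CrawlingAPI

crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def fetch_with_retries(url, options, max_retries=3, delay_seconds=5):
    """Retry a Crawling API request a few times before giving up (illustrative)."""
    for attempt in range(1, max_retries + 1):
        response = crawling_api.get(url, options)
        if response['headers']['pc_status'] == '200':
            return response['body'].decode('utf-8')
        print(f"Attempt {attempt} failed (pc_status {response['headers']['pc_status']})")
        time.sleep(delay_seconds)
    return None  # all retries exhausted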

Handling Pagination

Groupon uses scroll-based pagination to load additional deals dynamically. To capture all the deals, we’ll leverage the scroll and scroll_interval options in the Crawlbase Crawling API.

  • scroll=true: Enables scroll-based pagination.
  • scroll_interval=10: Sets the scroll duration to 10 seconds (the maximum allowed is 60).

Here’s how you can integrate it:

def scrape_groupon_with_pagination(url):
    options = {
        'ajax_wait': 'true',
        'scroll': 'true',           # enable scroll-based pagination
        'scroll_interval': '10'     # scroll for 10 seconds before capturing
    }

    response = crawling_api.get(url, options)
    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')

        soup = BeautifulSoup(html_content, 'html.parser')
        deals = soup.find_all('div', class_='cui-content')
        all_deals = []

        for deal in deals:
            title = deal.find('div', class_='cui-udc-title').text.strip() if deal.find('div', class_='cui-udc-title') else ''
            link = deal.find('a')['href'] if deal.find('a') else ''
            original_price = deal.find('div', class_='cui-price-original').text.strip() if deal.find('div', class_='cui-price-original') else ''
            discounted_price = deal.find('div', class_='cui-price-discount').text.strip().encode("ascii", "ignore").decode("utf-8") if deal.find('div', class_='cui-price-discount') else ''
            location = deal.find('span', class_='cui-location-name').text.strip() if deal.find('span', class_='cui-location-name') else ''

            all_deals.append({
                'title': title,
                'original_price': original_price,
                'discounted_price': discounted_price,
                'link': link,
                'location': location
            })

        return all_deals
    else:
        print(f"Failed to fetch data. Status code: {response['headers']['pc_status']}")
        return None

In this function, we’ve added scroll-based pagination handling using Crawlbase’s options, ensuring as many deals as possible are captured.
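
One thing to watch for: depending on how the page re-renders while scrolling, the same deal card can occasionally appear more than once in the captured HTML. A small post-processing step that deduplicates by link (our own addition, not something the API requires) keeps the output clean:

def dedupe_deals(deals):
    """Drop repeated deal entries, keyed by link (illustrative helper)."""
    seen = set()
    unique = []
    for deal in deals:
        key = deal['link'] or deal['title']  # fall back to title if link is empty
        if key not in seen:
            seen.add(key)
            unique.append(deal)
    return unique

# Example: all_deals = dedupe_deals(all_deals) before returning from the scraper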

Storing Data in a JSON File

Once you’ve collected the data, it’s easy to store it in a JSON file:

import json

def save_to_json(data, filename='groupon_deals.json'):
    with open(filename, 'w') as file:
        json.dump(data, file, indent=4)
    print(f"Data saved to {filename}")

# Example usage after scraping
if deals:
    save_to_json(deals)

Complete Code Example

Here’s the full code combining everything discussed:

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import json

# Initialize CrawlingAPI with your access token
crawling_api = CrawlingAPI({'token': 'CRAWLBASE_TOKEN'})

def scrape_groupon_with_pagination(url):
    options = {
        'ajax_wait': 'true',
        'scroll': 'true',
        'scroll_interval': '60'     # 60 seconds is the maximum allowed
    }

    response = crawling_api.get(url, options)
    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')

        soup = BeautifulSoup(html_content, 'html.parser')
        deals = soup.find_all('div', class_='cui-content')
        all_deals = []

        for deal in deals:
            title = deal.find('div', class_='cui-udc-title').text.strip() if deal.find('div', class_='cui-udc-title') else ''
            link = deal.find('a')['href'] if deal.find('a') else ''
            original_price = deal.find('div', class_='cui-price-original').text.strip() if deal.find('div', class_='cui-price-original') else ''
            discounted_price = deal.find('div', class_='cui-price-discount').text.strip().encode("ascii", "ignore").decode("utf-8") if deal.find('div', class_='cui-price-discount') else ''
            location = deal.find('span', class_='cui-location-name').text.strip() if deal.find('span', class_='cui-location-name') else ''

            all_deals.append({
                'title': title,
                'original_price': original_price,
                'discounted_price': discounted_price,
                'link': link,
                'location': location
            })

        return all_deals
    else:
        print(f"Failed to fetch data. Status code: {response['headers']['pc_status']}")
        return None

def save_to_json(data, filename='groupon_deals.json'):
    with open(filename, 'w') as file:
        json.dump(data, file, indent=4)
    print(f"Data saved to {filename}")

if __name__ == "__main__":
    url = 'https://www.groupon.com/local/washington-dc'
    deals = scrape_groupon_with_pagination(url)

    if deals:
        save_to_json(deals)

Test the Scraper:

Create a new file named groupon_deals_scraper.py, copy the code provided into this file, and save it. Run the script using the following command:

python groupon_deals_scraper.py

You should see output similar to the example below in the groupon_deals.json file.

[
    {
        "title": "Chimney Pro",
        "original_price": "$400",
        "discounted_price": "$69",
        "link": "https://www.groupon.com/deals/chimney-pro-1-6",
        "location": ""
    },
    {
        "title": "Spa World",
        "original_price": "$40",
        "discounted_price": "$35",
        "link": "https://www.groupon.com/deals/spa-world-26",
        "location": "Centreville"
    },
    {
        "title": "Kings Dominion",
        "original_price": "$79.99",
        "discounted_price": "$42.99",
        "link": "https://www.groupon.com/deals/gl-kings-dominion-amusement-park",
        "location": "Kings Dominion"
    },
    {
        "title": "30% Off First 5 Weeks + Free Shipping (Blue Apron Coupon)",
        "original_price": "",
        "discounted_price": "",
        "link": "https://www.groupon.com/deals/cpn-blueapron-q3sl",
        "location": ""
    },
    {
        "title": "Valvoline Instant Oil Change - VA",
        "original_price": "$50.99",
        "discounted_price": "$39.99",
        "link": "https://www.groupon.com/deals/valvoline-instant-oil-change-dc-4",
        "location": "Multiple Locations"
    },
    .... more
]
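
Since the scraper stores prices as raw strings like "$79.99", you can derive the discount percentage (data point 4 from earlier) yourself. A minimal sketch; the parsing assumes simple $-prefixed US prices and would need adjusting for other formats:

def discount_percentage(original, discounted):
    """Percent saved, computed from price strings like '$79.99'."""
    try:
        orig = float(original.replace('$', '').replace(',', ''))
        disc = float(discounted.replace('$', '').replace(',', ''))
    except ValueError:
        return None  # missing or non-numeric price
    if orig <= 0:
        return None
    return round((orig - disc) / orig * 100, 1)

print(discount_percentage('$79.99', '$42.99'))  # 46.3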

Scraping Groupon Coupons

In this part, we'll learn how to get coupons from Groupon with Python and the Crawlbase Crawling API. Groupon's coupon page is structured a bit differently from its deals page, so we need to inspect its HTML separately. We'll use the Crawlbase API to get each coupon's title, callout, description, and type.

We’ll scrape this URL: https://www.groupon.com/coupons/amazon

Inspecting the HTML Structure

To scrape Groupon coupons effectively, it’s essential to identify the key HTML elements that hold the data:

Visit the URL: Open the URL in your browser.

Open Developer Tools: Right-click on the webpage and choose "Inspect" to open the Developer Tools.

Locate the Coupon Containers: Groupon’s coupon listings are usually within <div> tags with the class coupon-offer-tile. Each coupon block contains:

  • Title: Found inside an <h2> element with the class coupon-tile-title.
  • Callout: A short highlight such as "Up to 80% Off", found within a <div> with the class coupon-tile-callout.
  • Description: Usually found in a <p> with the class coupon-tile-description.
  • Coupon Type: Found inside a <span> tag with the class coupon-tile-type.

Writing the Groupon Coupon Scraper

We’ll write a function that uses the Crawlbase Crawling API to handle dynamic content rendering while scraping the coupon data. Here’s the implementation:

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import json

# Initialize the Crawlbase CrawlingAPI with your access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_groupon_coupons(url):
    options = {
        'ajax_wait': 'true',   # wait for asynchronous content
        'page_wait': '5000'    # give the page 5 seconds to render
    }

    response = crawling_api.get(url, options)

    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')

        soup = BeautifulSoup(html_content, 'html.parser')
        coupons = soup.select('li.coupons-list-row > div.coupon-offer-tile')

        scraped_coupons = []
        for coupon in coupons:
            title = coupon.find('h2', class_='coupon-tile-title').text.strip().encode("ascii", "ignore").decode("utf-8") if coupon.find('h2', class_='coupon-tile-title') else ''
            callout = coupon.find('div', class_='coupon-tile-callout').text.strip() if coupon.find('div', class_='coupon-tile-callout') else ''
            description = coupon.find('p', class_='coupon-tile-description').text.strip().encode("ascii", "ignore").decode("utf-8") if coupon.find('p', class_='coupon-tile-description') else ''
            coupon_type = coupon.find('span', class_='coupon-tile-type').text.strip() if coupon.find('span', class_='coupon-tile-type') else ''

            scraped_coupons.append({
                'title': title,
                'callout': callout,
                'description': description,
                'type': coupon_type
            })

        return scraped_coupons
    else:
        print(f"Failed to retrieve data. Status code: {response['headers']['pc_status']}")
        return None

Storing Data in a JSON File

Once you have the coupon data, you can store it in a JSON file for easy access and analysis:

def save_coupons_to_json(data, filename='groupon_coupons.json'):
    with open(filename, 'w') as file:
        json.dump(data, file, indent=4)
    print(f"Coupon data saved to {filename}")

Complete Code Example

Here is the complete code for scraping Groupon coupons:

from crawlbase import CrawlingAPI
from bs4 import BeautifulSoup
import json

# Initialize the Crawlbase CrawlingAPI with your access token
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_groupon_coupons(url):
    options = {
        'ajax_wait': 'true',   # wait for asynchronous content
        'page_wait': '5000'    # give the page 5 seconds to render
    }

    response = crawling_api.get(url, options)

    if response['headers']['pc_status'] == '200':
        html_content = response['body'].decode('utf-8')

        soup = BeautifulSoup(html_content, 'html.parser')
        coupons = soup.select('li.coupons-list-row > div.coupon-offer-tile')

        scraped_coupons = []
        for coupon in coupons:
            title = coupon.find('h2', class_='coupon-tile-title').text.strip().encode("ascii", "ignore").decode("utf-8") if coupon.find('h2', class_='coupon-tile-title') else ''
            callout = coupon.find('div', class_='coupon-tile-callout').text.strip() if coupon.find('div', class_='coupon-tile-callout') else ''
            description = coupon.find('p', class_='coupon-tile-description').text.strip().encode("ascii", "ignore").decode("utf-8") if coupon.find('p', class_='coupon-tile-description') else ''
            coupon_type = coupon.find('span', class_='coupon-tile-type').text.strip() if coupon.find('span', class_='coupon-tile-type') else ''

            scraped_coupons.append({
                'title': title,
                'callout': callout,
                'description': description,
                'type': coupon_type
            })

        return scraped_coupons
    else:
        print(f"Failed to retrieve data. Status code: {response['headers']['pc_status']}")
        return None

def save_coupons_to_json(data, filename='groupon_coupons.json'):
    with open(filename, 'w') as file:
        json.dump(data, file, indent=4)
    print(f"Coupon data saved to {filename}")

if __name__ == "__main__":
    url = 'https://www.groupon.com/coupons/amazon'
    coupons = scrape_groupon_coupons(url)

    if coupons:
        save_coupons_to_json(coupons)

Test the Scraper:

Save the code to a file named groupon_coupons_scraper.py. Run the script using the following command:

python groupon_coupons_scraper.py

After running the script, you should find the coupon data saved in a JSON file named groupon_coupons.json.

[
    {
        "title": "Amazon Promo Code",
        "callout": "Promo Code",
        "description": "Click here and save with coupons and promo codes on household, beauty, and, well, EVERYTHING else Amazon sells",
        "type": "Coupon Code"
    },
    {
        "title": "Amazon Prime Exclusive Promo Codes",
        "callout": "Up to 80% Off",
        "description": "A ton of Amazon promo codes, coupons, and more are right this way. New deals added daily!",
        "type": "Coupon Code"
    },
    {
        "title": "Up to 65% OFF Amazon Promo Code",
        "callout": "Up to 65% Off",
        "description": "Save up to 65% on daily  deals. Click here for the most up-to-date listings and availability.",
        "type": "Coupon Code"
    },
    {
        "title": "Spend $50, Save 15%",
        "callout": "15% Off",
        "description": "This one is dead simple. Spend $50 on Amazon products and take 15% off your purchase total. No promo code required!",
        "type": "Promo"
    },
    {
        "title": " & above | Amazon Promo & Coupon Codes",
        "callout": "Promo Code",
        "description": "A big bunch of vetted and accurate promo codes for a huge variety of top-rated products. This is the real deal!",
        "type": "Coupon Code"
    },
    .... more
]
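
The callout field often encodes the discount as text like "Up to 80% Off". If you want it as a number for sorting or filtering, a small regex pass works; this is our own post-processing step, and callouts without a percentage simply return None:

import re

def percent_from_callout(callout):
    """Pull the first percentage out of callout text like 'Up to 80% Off'."""
    match = re.search(r'(\d+(?:\.\d+)?)\s*%', callout)
    return float(match.group(1)) if match else None

print(percent_from_callout('Up to 80% Off'))  # 80.0
print(percent_from_callout('Promo Code'))     # None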

Final Thoughts

Building a Groupon scraper helps you stay in the loop about the best deals and coupons. Python and the Crawlbase Crawling API let you scrape Groupon pages without much trouble. You can handle dynamic content and pull out useful data.

This guide showed you how to set up your environment, write scrapers for Groupon deals and coupons, handle pagination, and save your data. Whether you want to track deals in a specific place or find the newest coupons, a well-designed Groupon scraper can automate the process.

If you're looking to expand your web scraping capabilities, consider exploring the following guides on scraping other popular websites.

📜 How to Scrape Google Finance
📜 How to Scrape Google News
📜 How to Scrape Google Scholar Results
📜 How to Scrape Google Search Results
📜 How to Scrape Google Maps
📜 How to Scrape Yahoo Finance
📜 How to Scrape Zillow

If you have any questions or feedback, our support team is always available to assist you on your web scraping journey. Happy Scraping!

Frequently Asked Questions

Q. Is scraping Groupon legal?

Scraping Groupon for personal use generally doesn't break the rules if you stick to what the site allows, but make sure to review Groupon's terms to confirm that what you're doing is okay. If you want to scrape Groupon data for commercial purposes, get the site's permission first so you don't get into trouble.

Q. Why use the Crawlbase Crawling API instead of simpler methods?

Groupon depends a lot on JavaScript to show content. Regular scraping tools like requests and BeautifulSoup can't handle this. The Crawlbase Crawling API helps get around these problems. It lets you grab deals and coupons even when there's JavaScript and you need to scroll to see more items.

Q. How can I store scraped Groupon data?

You have options to keep Groupon data you've scraped in different formats like JSON, CSV, or even a database. In this guide, we've focused on saving data in a JSON file because it's easy to handle and works well for most projects. JSON also keeps the structure of the data intact, which makes it simple to analyze later.
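
If you'd rather have CSV, Python's standard library covers it without extra dependencies. A minimal sketch for the deals data, assuming the dictionary keys used earlier in this guide:

import csv

def save_to_csv(deals, filename='groupon_deals.csv'):
    """Write the scraped deal dictionaries to a CSV file (illustrative)."""
    if not deals:
        return
    fieldnames = ['title', 'original_price', 'discounted_price', 'link', 'location']
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(deals)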
