When working on web scraping or automation, encountering HTTP errors can be frustrating. HTTP error 406 is one that indicates a mismatch between the content the client requests and the content the server can provide.
In this article, we’ll explore what HTTP 406 means, the common causes behind it, and whether it could be used as a blocking strategy. We’ll also dive into how Scrapfly can help you bypass this error effectively.
What is HTTP Error 406?
406 Not Acceptable occurs when the server is unable to deliver a response in a format that matches the criteria defined by the client's Accept- headers. Essentially, the server understands the request, but it cannot find a response that fits the content types or formats that the client is willing to accept.
What are HTTP 406 Error Causes?
The most common cause of a 406 error is misconfigured Accept- headers. These headers tell the server what content types the client expects in the response, such as:
- Accept: Specifies the expected media type, like application/json or text/html.
- Accept-Language: Indicates the preferred languages for the response, e.g., en-US.
- Accept-Encoding: Defines the compression formats that the client can handle, like gzip or deflate.
If the server cannot provide a response that matches the specified Accept- headers, it will return a 406 status code.
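To make the matching step concrete, here is a minimal sketch of how a server could decide between a normal response and a 406. The function names (parse_accept, matches, negotiate) are hypothetical and not taken from any framework, and the logic is simplified: it handles q-values and wildcards but ignores media type parameters and the full precedence rules. Real servers may also ignore the header and send a default representation instead of a 406.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        entries.append((media_range.lower(), q))
    # sorted() is stable, so equal weights keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def matches(pattern: str, media_type: str) -> bool:
    """True if an Accept pattern such as text/* or */* covers media_type."""
    if pattern == "*/*":
        return True
    p_type, _, p_sub = pattern.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return p_type == m_type and p_sub in ("*", m_sub)


def negotiate(accept_header: str, available: list[str]) -> tuple[int, Optional[str]]:
    """Return (status, chosen_type); status is 406 when nothing matches."""
    ranges = parse_accept(accept_header or "*/*")  # no header means "anything"
    refused = {pattern for pattern, q in ranges if q <= 0}  # q=0 is an explicit refusal
    for pattern, q in ranges:
        if q <= 0:
            continue
        for media_type in available:
            if media_type not in refused and matches(pattern, media_type):
                return 200, media_type
    return 406, None


# A client that only accepts JSON, talking to a server that only has HTML
print(negotiate("application/json", ["text/html"]))
# The same server, with a client that also accepts any text type
print(negotiate("application/json, text/*;q=0.5", ["text/html"]))
```

The first call ends in a 406 because no available type satisfies the header; the second succeeds because text/* covers text/html.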
Practical Example
Let's explore how to configure headers, specifically Accept- headers, in two common tools: cURL and Python's httpx library.
curl -H "Accept: application/json" -H "Accept-Language: en-US" https://httpbin.dev/json
import httpx
url = "https://httpbin.dev/json"
headers = {
    "Accept": "application/json",  # Expecting JSON response
    "Accept-Language": "en-US",  # Preferring English
}
response = httpx.get(url, headers=headers)
print(response.status_code)
print(response.text)
In both examples, the client is requesting a response in application/json format and prefers the response language in en-US. If the server cannot match these criteria, a 406 error might occur.
To avoid 406 errors, ensure that your Accept- headers are set appropriately for the resource you're trying to access.
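One way to set these headers defensively is to list the preferred types first and end with a low-priority wildcard, so the server still has something it is allowed to return. The sketch below assumes a hypothetical helper, build_accept_header, and the q-value steps it uses are an arbitrary choice, not a standard.

```python
def build_accept_header(preferred: list[str], wildcard: bool = True) -> str:
    """Build an Accept value that ranks types in order and optionally ends with */*."""
    parts = []
    for index, media_type in enumerate(preferred):
        # The first entry keeps the implicit q=1.0; later ones step down by 0.1
        q = max(1.0 - 0.1 * index, 0.1)
        parts.append(media_type if index == 0 else f"{media_type};q={q:.1f}")
    if wildcard:
        # Lowest-priority catch-all so the server can fall back to any type
        parts.append("*/*;q=0.1")
    return ", ".join(parts)


headers = {
    "Accept": build_accept_header(["application/json", "text/html"]),
    "Accept-Language": "en-US, en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
}
print(headers["Accept"])
```

The resulting dictionary can be passed as the headers argument of httpx.get in the same way as in the example above. Dropping the wildcard (wildcard=False) makes the request strict again, which is useful when a wrong content type would break your parser.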
Can 406 Mean Blocking?
Although HTTP error 406 typically points to misconfigured Accept- headers, error codes are not always used consistently by websites. In rare cases, a website might misconfigure its responses or use 406 as a way to block certain requests.
While it’s unlikely that a 406 error code means you’re being blocked, it’s still good practice to test the request with rotating proxies or by adjusting the requested content type. If blocking does turn out to be the cause, these two popular tools can help:
- curl-impersonate - can enhance cURL client fingerprint to mimic a real web browser.
- undetected-chromedriver - can improve your Selenium scrapers to resist browser fingerprinting.
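Testing the request with an adjusted content type can be scripted. The sketch below tries a few Accept values in turn and stops at the first one that does not produce a 406. The candidate list, the probe_accept function, and the fake_send stand-in are all hypothetical; in a real scraper, send could wrap a call such as httpx.get(url, headers=h).status_code.

```python
from typing import Callable, Optional, Sequence

# Hypothetical candidates, ordered from most specific to most permissive
ACCEPT_CANDIDATES = (
    "application/json",
    "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "*/*",
)


def probe_accept(
    send: Callable[[dict], int],
    candidates: Sequence[str] = ACCEPT_CANDIDATES,
) -> tuple[Optional[str], int]:
    """Try each Accept value until the status code is no longer 406.

    `send` is any callable that takes a headers dict and returns a status code.
    """
    for accept in candidates:
        status = send({"Accept": accept})
        if status != 406:
            return accept, status  # this Accept value got past negotiation
    # Every variant was refused, so content types are probably not the issue
    return None, 406


def fake_send(headers: dict) -> int:
    """Stand-in for a server that only serves HTML (no network access needed)."""
    accept = headers["Accept"]
    return 200 if "text/html" in accept or accept == "*/*" else 406


print(probe_accept(fake_send))
```

If even the */* variant still returns 406, the response is less likely to be a genuine negotiation failure, and trying a different proxy or client fingerprint is a reasonable next step.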
Bypass 406 Blocks with Scrapfly
It is unlikely for a 406 error to mean you are being blocked. But if it does, Scrapfly will handle it for you!
Scrapfly provides web scraping, screenshot, and extraction APIs for data collection at scale.
- Anti-bot protection bypass - scrape web pages without blocking!
- Rotating residential proxies - prevent IP address and geographic blocks.
- JavaScript rendering - scrape dynamic web pages through cloud browsers.
- Full browser automation - control browsers to scroll, input and click on objects.
- Format conversion - scrape as HTML, JSON, Text, or Markdown.
- Python and TypeScript SDKs, as well as Scrapy and no-code tool integrations.
It takes Scrapfly several full-time engineers to maintain this system, so you don't have to!
Summary
HTTP 406 errors are caused by a mismatch between the Accept- headers sent by the client and the formats the server can deliver. While unlikely, these errors can sometimes be used as a blocking mechanism. Using Scrapfly’s advanced tools, including proxy rotation and customizable requests, you can bypass 406 blocks and keep your web scraping running smoothly.