Get HTTP response codes with Python

petercour - Jul 16 '19 - Dev Community

You can get HTTP response codes with Python. This is a handy way to find broken URLs or redirects.

The web is dynamic. It's ever changing. Today the popular browsers are Chrome and Firefox. It used to be Netscape and Internet Explorer.

Because it's ever changing, you want to test a website for broken links (404), redirects and other errors. You can do this in Python.

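To speed up testing, use threading. Threads give you "parallel" execution. Parallel is in quotes because, in standard CPython, the global interpreter lock lets only one thread run Python code at a time. The threads still overlap while they wait on the network, which is where a URL checker spends most of its time.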

So what does that look like in code?

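#!/usr/bin/python3
import time
import urllib.error
import urllib.request
from threading import Thread

class GetUrlThread(Thread):
    """Fetch one URL in its own thread and print its HTTP status code."""

    def __init__(self, url):
        super().__init__()
        self.url = url

    def run(self):
        try:
            # urlopen follows redirects, so this is the final response's code
            with urllib.request.urlopen(self.url, timeout=10) as resp:
                print(self.url, resp.getcode())
        except urllib.error.HTTPError as err:
            # 4xx and 5xx responses are raised as exceptions, not returned
            print(self.url, err.code)
        except OSError as err:
            # No HTTP response at all (DNS failure, refused connection, timeout)
            print(self.url, "failed:", err)

def get_responses():
    urls = ['https://dev.to', 'https://www.ebay.com', 'https://www.github.com']
    start = time.time()
    threads = []
    for url in urls:
        t = GetUrlThread(url)
        threads.append(t)
        t.start()
    # Wait for every thread to finish before reporting the total time
    for t in threads:
        t.join()
    print("Elapsed time: %s" % (time.time()-start))

get_responses()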

Run the script to test every URL in urls:

https://www.github.com 200
https://dev.to 200
https://www.ebay.com 200
Elapsed time: 0.496950626373291
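
The same fan-out can also be written with a thread pool from the standard library's concurrent.futures, which takes care of starting and joining the threads. This is only a minimal sketch, not the method used above: fetch_status and check_urls are hypothetical names, the 10-second timeout and max_workers=8 are assumptions, and a URL that produces no HTTP response at all is reported as None.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is on err.code
        return err.code
    except OSError:
        # URLError and timeouts are OSError subclasses: no HTTP response
        return None


def check_urls(urls, fetch=fetch_status, max_workers=8):
    """Map each URL to its status code, fetching them in worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        return dict(zip(urls, pool.map(fetch, urls)))
```

Calling check_urls(['https://dev.to', 'https://www.github.com']) would return a dict of URL to status code. The fetch parameter exists so the function can be tried with a stand-in instead of real network requests.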

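A response of 200 means everything is okay. If a page is not found, the server returns 404. More generally, 2xx codes mean success, 3xx codes are redirects, 4xx codes are client errors such as a missing page, and 5xx codes are server errors. Note that urllib.request.urlopen follows redirects automatically and raises urllib.error.HTTPError for 4xx and 5xx responses, so a checker has to catch that exception to report those codes.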
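
To make a report easier to read, the numeric code can be turned into a category and a standard reason phrase. The sketch below is one possible way to do that: describe_status is a hypothetical helper, the category labels are my own wording, and the phrases come from the standard library's http.HTTPStatus.

```python
from http import HTTPStatus


def describe_status(code):
    """Return (category, phrase) for an HTTP status code."""
    if 100 <= code < 200:
        category = "informational"
    elif 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    try:
        # HTTPStatus only knows registered codes and raises ValueError otherwise
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"
    return category, phrase


for code in (200, 301, 404, 500):
    print(code, *describe_status(code))
```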
