The Adventures of Blink #20: Facial Recognition with Python

Ben Link - Apr 25 - Dev Community

Today we're going to detect a face using AI in Python!

I wanted to go on this adventure because it's always impressed me how folks could build such things... but I never knew how they did it.

Over the past few weeks I've been watching videos and reading articles about how ML works, and I happened across some tools that make it super easy to set up. It also dovetails nicely with my personal project from last summer: learning Python. I'd written most of my code in Java & Javascript and had never really given Python a try, and when I finally sat down to focus on it, I absolutely fell in love with it!

Why I'm amazed by this project

  1. It's tiny. Less than 60 lines of code. And that little bit of logic does SO MUCH... connect to a camera, detect a face in an image, verify that it's the same face as a reference image... it's bonkers to me that we live in a time where it's possible to do all that with so little effort. Yeah, I know there are mountains of libraries underneath the project... but the fact that I could set it all up and make it work for free, in less than an hour... like I said, it's bonkers.

  2. It's free. So many of the things we all want to explore in AI/ML are being monetized... yeah, you can play with ChatGPT on OpenAI's website for free, but if you want the newer model or want to connect to it from code, you have to pay to play. This facial recognition app was a super fun project to do at no cost - if your laptop runs Python 3, you can try this out without spending a penny.

  3. It's simple. This is likely a testament to Python's language design, but it is SO easy to work with very complex concepts. Perhaps that's why it takes so little code to write - the patterns are so easy to read that even if you didn't know what was going on, you could make a pretty accurate guess from a quick skim.

TL;DR version - YouTube!

If reading isn't your bag, well... I wonder what you're doing on a blog to begin with 😜 but don't fear, I've got you a fancy YouTube version of this post! Like and subscribe to tell me you want more of these!

How AI Works

To get started with our understanding, we have to talk about Vectors.

A vector is just what we learned about way back in math & physics classes... think of it as a coordinate system for your data.

What makes ML work is that these vectors come in many, many dimensions. This "plots" the data we input to our model in that n-dimensional space, and we can then use the vectors to figure out what concepts are "close to" other data. This is a gross simplification of the idea, but enough for us to easily grasp that we can now search for "similarity" rather than "exact matching".

You'll often hear these data-derived vectors referred to as "embeddings". A pre-trained "model" is what turns your raw data into embeddings, and once you have them, comparing two pieces of data is just a matter of comparing two vectors.
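To make that concrete, here's a tiny sketch with made-up, toy vectors (real face embeddings have far more dimensions - Facenet's, for example, have 128) showing how "closeness" between two embeddings can be measured with cosine similarity:

import numpy as np

def cosine_similarity(a, b):
    """Return how similar two embedding vectors are (1.0 = pointing the same way)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" - these numbers are invented for illustration
face_a = [0.12, 0.80, 0.33, 0.05]
face_b = [0.10, 0.78, 0.35, 0.07]   # nearly the same direction as face_a
face_c = [0.90, 0.02, 0.10, 0.60]   # points somewhere very different

print(cosine_similarity(face_a, face_b))  # close to 1.0 -> "similar"
print(cosine_similarity(face_a, face_c))  # much lower -> "not similar"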

Quick note on Similarity

Similarity is important because real-world data rarely match exactly. In the case of image processing, imagine that you took two pictures of the Lincoln Memorial in Washington, D.C. Even if you snapped them from the same position, at the same angle, with all the same settings on the same camera, the images will still have some variation. Maybe a cloud moved and changed the sunlight intensity... or a tremor from a passing tour bus jostled the camera ever so slightly. It's practically impossible to get two exact-match photographs (short of copy/pasting one, of course).

Thus if I wanted to know whether two pictures both contain the Lincoln Memorial... that's really hard to code! I'd have to account for all that potential variation... and imagine if the images were taken from different angles or in different lighting! The code to solve that problem just got much, much harder.

This is where those vectors and searching for similarity come into the discussion: if I train a model to recognize the Lincoln Memorial from various angles, in various lighting, with various devices... I can eventually bring in any random photo and the model will have learned enough patterns to determine whether the new photo contains the monument. It does this by converting my image into vectors and checking whether those vectors are close to the Lincoln Memorial vectors it has already learned. It can then give me a score for how likely it is that the new image contains the monument, and I can set a threshold for how high that score must be before I trust it.
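That last step - deciding whether the score is high enough - is just a comparison against a cutoff you pick. Here's a hypothetical sketch (the function name, the 0.7 threshold, and the idea of keeping several reference embeddings are my own illustration, not something from the project):

import numpy as np

def similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_like_lincoln_memorial(candidate, reference_embeddings, threshold=0.7):
    """Accept the candidate only if its best similarity score clears our chosen cutoff."""
    best_score = max(similarity(candidate, ref) for ref in reference_embeddings)
    return best_score >= threshold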

Sounds daunting, doesn't it?

(Image: Anthony Hopkins in The Mask of Zorro)

Well... it isn't. You see, the model has already been created and is freely available. So all that's left to do is compare the data points! Let's dive into some Python and see how we can use it:

Here's a Git Repo you can use!

(Leave me a star if you use it and like it! 😀)

Structure of the code

# Import libraries

# Set up reference image variables & camera device access

# Infinite loop
#  Every once in a while, grab a frame
#  Check to see if that frame matches the reference image

# Quit/Cleanup

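Filling that outline in, a minimal synchronous sketch might look something like this (the reference.jpg filename, the window title, and the exact text placement are my assumptions - the repo linked above is the real source of truth):

import cv2
from deepface import DeepFace

# Reference image we want to match against (use your own photo here)
reference_img = cv2.imread("reference.jpg")

# 0 is usually the default webcam
cap = cv2.VideoCapture(0)

counter = 0
face_match = False

def check_face(frame):
    global face_match
    try:
        result = DeepFace.verify(frame, reference_img, model_name="Facenet")
        face_match = result["verified"]
    except ValueError:
        # DeepFace raises ValueError when it can't find a face in the frame
        face_match = False

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Only run the (expensive) verification every 30th frame
    if counter % 30 == 0:
        check_face(frame.copy())
    counter += 1

    label = "MATCH!" if face_match else "NO MATCH!!!"
    color = (0, 255, 0) if face_match else (0, 0, 255)   # BGR: green / red
    cv2.putText(frame, label, (20, 450), cv2.FONT_HERSHEY_SIMPLEX, 2, color, 3)
    cv2.imshow("Face Recognition", frame)

    # Quit when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()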

Some interesting points from the code:

OpenCV uses BGR, not RGB, as its color channel order, so pay close attention to your docs!

# (0, 0, 255) means Blue=0, Green=0, Red=255 - i.e. red text
cv2.putText(frame, "NO MATCH!!!", (20,450), cv2.FONT_HERSHEY_SIMPLEX, 2,(0,0,255),3)

Note that this example will be RED, not BLUE!
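For reference, a few common colors in BGR order (the "MATCH!" text and coordinates here just mirror the line above as an illustration):

# Colors as (Blue, Green, Red) tuples
RED   = (0, 0, 255)
GREEN = (0, 255, 0)
BLUE  = (255, 0, 0)

cv2.putText(frame, "MATCH!", (20, 450), cv2.FONT_HERSHEY_SIMPLEX, 2, GREEN, 3)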

def check_face(frame):
    global face_match
    try:
        # Ask DeepFace whether the captured frame shows the same person
        # as the reference image
        result = DeepFace.verify(frame, reference_img, model_name="Facenet")
        face_match = result['verified']
    except ValueError:
        # DeepFace raises ValueError when it can't find a face in the frame
        face_match = False

This is all it takes to compare two images. All the heavy lifting we talked about is handled inside the DeepFace library - it's perfectly ok to not have to build it all yourself! 😁
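If you're curious what verify() hands back, you can also call it directly on two image files and poke at the result - in the versions I've used it's a dict with keys like 'verified', 'distance', and 'threshold' (exact contents can vary between deepface releases, so check your docs):

from deepface import DeepFace

# Compare two image files directly (the file names are placeholders)
result = DeepFace.verify("photo1.jpg", "photo2.jpg", model_name="Facenet")

print(result["verified"])   # True/False - did the model decide these are the same face?
print(result["distance"])   # how far apart the two embeddings are
print(result["threshold"])  # the cutoff distance the model used to decide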

if counter % 30 == 0:

In my example we only run this every 30th frame. If you forget to throttle this you're likely to spin your computer out of control validating every single frame that passes! Verify responsibly!
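One optional refinement - my own suggestion, not necessarily how the linked repo does it - is to kick off check_face in a background thread on those 30th frames so the preview window never stalls while DeepFace is working; the global face_match flag makes this easy, since the worker thread just updates it when it finishes:

import threading

# Inside the main video loop, instead of calling check_face(frame) directly:
if counter % 30 == 0:
    # frame.copy() hands the worker its own buffer, so the loop can keep
    # overwriting `frame` without affecting the check in progress
    threading.Thread(target=check_face, args=(frame.copy(),), daemon=True).start()
counter += 1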

Wrapping up

I hope you enjoyed this little peek into the facial recognition capabilities that are available to you as a programmer. You can make some amazing things happen with very little work! What will you build, now that you understand the basics?
