Did you know you can send and receive media using the Twilio API for WhatsApp? When I found out, I wanted to make something fun with it, so why not combine it with AWS Rekognition to work out if I look like any celebrities?
By the end of this post, you'll know how to build an app that lets you send an image to a WhatsApp number, download the image, analyse the image with the AWS Rekognition API and respond to say whether there are any celebrities in the picture.
What you'll need
To build this application you'll need a few things:
- A Twilio account, sign up for a free one here
- An AWS account
- Ruby and Bundler installed
- ngrok to help us test our webhooks
Got all that? Let's get started then.
Application basics
When Twilio receives a WhatsApp message it will send an HTTP request, a webhook, to a URL we provide. We need to build an application that can receive those webhooks, process the image using the AWS Rekognition service and then send a message back in the response to Twilio.
Create yourself a directory to build your application in and initialize a new Gemfile with Bundler:
mkdir celebrity-spotting
cd celebrity-spotting
bundle init
Open up the Gemfile and add the gems we're going to use for this application:
# frozen_string_literal: true
source "https://rubygems.org"
gem "sinatra", require: "sinatra/base"
gem "aws-sdk"
gem "envyable"
gem "down"
gem "twilio-ruby"
We're going to use Sinatra as the web framework to receive the incoming webhooks from Twilio. We'll need the AWS SDK to communicate with the Rekognition service. Envyable is to store our credentials in environment variables in development. Down is a gem that makes it really easy to download files. And the twilio-ruby gem will be used to generate TwiML so that we can communicate back to Twilio in the response.
Run bundle install to install the gems, then create the other files we'll need for this app: app.rb, config.ru and config/env.yml. That's the preparation complete, so let's start building the application.
Building the app
We'll use config.ru to load and run the application. Add the following code to config.ru:
require "bundler"
Bundler.require
Envyable.load("./config/env.yml") unless ENV["RACK_ENV"] == "production"
require "./app.rb"
run CelebritySpotting
This requires all the dependencies defined in the Gemfile, loads our config into the environment using Envyable and then loads and runs the application. Next, let's create the CelebritySpotting app.
Open app.rb and create a new class:
class CelebritySpotting < Sinatra::Base
end
We need a path to an endpoint that we can provide as our webhook URL. By default Twilio makes a POST request, so our endpoint will respond to POST requests:
class CelebritySpotting < Sinatra::Base
  post "/messages" do
  end
end
We're going to be returning TwiML, so we'll create a new Twilio::TwiML::MessagingResponse and set the content type header to application/xml:
class CelebritySpotting < Sinatra::Base
  post "/messages" do
    content_type "application/xml"
    twiml = Twilio::TwiML::MessagingResponse.new
  end
end
To make sure this is working so far, let's add a message, return the TwiML as XML and test it out:
class CelebritySpotting < Sinatra::Base
  post "/messages" do
    content_type "application/xml"
    twiml = Twilio::TwiML::MessagingResponse.new
    twiml.message body: "Hello! Just testing here."
    twiml.to_xml
  end
end
Start the application on the command line with:
bundle exec rackup
The application will start on http://localhost:9292. There's no interface, so we can test it using curl to see if it is acting correctly.
$ curl -d "" http://localhost:9292/messages
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Message>Hello! Just testing here.</Message>
</Response>
We can see that the message is being returned in the TwiML so let's hook it up to the Twilio API for WhatsApp.
Connecting to the Twilio API for WhatsApp
Twilio provides a sandbox to test your WhatsApp integrations without waiting for a Twilio number to be approved by WhatsApp. Log in to your Twilio console and follow the instructions to set up your WhatsApp sandbox.
Once you have it set up, you need to define a webhook URL so that you can configure your WhatsApp sandbox number.
Our app currently runs on our own machine, so we need to tunnel down to it from the public internet. That's where ngrok comes in. Start ngrok by running:
ngrok http 9292
Executing this command will give you a public URL that looks like https://RANDOM_STRING.ngrok.io. Take that ngrok URL, add the /messages path to it and enter it in your WhatsApp sandbox settings as the URL to call when a message comes in from WhatsApp.
Save your settings for the WhatsApp sandbox and send the sandbox number a message. You should get your testing message back.
We have WhatsApp connected and we can send messages back and forth. This builds the foundation to work with the included images and analyse them with AWS Rekognition.
Receiving and downloading images
Earlier we included the Down gem in the application. We're going to use it to download the images sent to our WhatsApp number.
Returning to app.rb, we're going to test whether our incoming message has any images and, if it does, download the first one.
Twilio sends all the information we need in the body of the webhook request. We're going to look for the NumMedia parameter to tell whether there is any media. If there is, the image URL will be in the MediaUrl0 parameter.
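To make the shape of those parameters concrete, here is a minimal sketch that runs outside Sinatra. The first_media_url helper and the sample request body are my own illustrations, not part of the app or of Twilio's API, and the media URL is a made-up placeholder.

```ruby
require "uri"

# Hypothetical helper: returns the URL of the first attached media item,
# or nil when the message carries no media. `params` is a Hash of
# webhook parameters, like the one Sinatra exposes as `params`.
def first_media_url(params)
  return nil unless params["NumMedia"].to_i > 0

  params["MediaUrl0"]
end

# Made-up form-encoded webhook body, for illustration only.
body = "NumMedia=1&MediaUrl0=https%3A%2F%2Fexample.com%2Fmedia%2Fabc&Body="
params = URI.decode_www_form(body).to_h

puts first_media_url(params)
p first_media_url({ "NumMedia" => "0" })
```

The values arrive as strings, which is why the count goes through to_i before the comparison.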
With that MediaUrl0 parameter we can use Down to download the image. When you download an image with Down it gives you a Tempfile. We can read that file or the various properties of it.
Once we are done with the tempfile we should close and unlink it with the close! method so that it doesn't just hang around the operating system. We also need to handle the case when no image is sent; for this we can reply with a message asking for a picture.
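If the begin/ensure pattern is new to you, this small sketch shows the clean-up behaviour on its own. It creates a Tempfile directly as a stand-in for the one Down would return, so it runs without a network call. The with_tempfile helper is hypothetical and only exists for this illustration.

```ruby
require "tempfile"

# Hypothetical helper: writes `contents` to a Tempfile, yields it, and
# always removes it afterwards.
def with_tempfile(contents)
  tempfile = Tempfile.new("image")
  tempfile.write(contents)
  tempfile.rewind
  begin
    yield tempfile
  ensure
    # close! closes the file and deletes it from disk, even if the
    # block raised an error.
    tempfile.close!
  end
end

path = nil
size = with_tempfile("not really an image") do |file|
  path = file.path
  file.size
end

puts size              # 19
puts File.exist?(path) # false
```

The ensure clause is what guarantees the file is deleted, whether the work in the middle succeeds or fails.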
Delete the testing message and add the following code:
post "/messages" do
  content_type "application/xml"
  twiml = Twilio::TwiML::MessagingResponse.new
  if params["NumMedia"].to_i > 0
    tempfile = Down.download(params["MediaUrl0"])
    begin
      twiml.message body: "Thanks for the image! It's #{tempfile.size} bytes large."
    ensure
      tempfile.close!
    end
  else
    twiml.message body: "I can't look for celebrities if you don't send me a picture!"
  end
  twiml.to_xml
end
Restart your app and send yourself a couple more test messages with and without images and make sure the result is what you expect.
Now it's time to start searching for celebrities in the images, so let's dig into AWS Rekognition.
AWS Rekognition
Before we make any API calls to AWS we'll need to get an access key and secret. In your AWS console, create a user with the AmazonRekognitionFullAccess policy.
There are many ways to create users and give them permissions within AWS. The following is one way that will give you an API user that can access the Rekognition service.
Start in the AWS console home and search for and select IAM in the "Find Services" box.
In the IAM section, click on the "Users" menu in the left navigation, then click the "Add user" button.
Give your user a name, check the box for "Programmatic access", and then click "Next: Permissions".
Choose "Attach existing policies directly" and you will see a table of policies. Search for the policies for "Rekognition". You will see three policies; select the AmazonRekognitionFullAccess policy, with the description "Access to all Amazon Rekognition APIs".
Now click "Next" until you see the success message.
On the success page you will see your "Access key ID" and "Secret access key". Save them both in config/env.yml along with an AWS region where Rekognition is available, like "us-east-1". If you want to find out more about this process, check out the documentation on authentication and access control for Rekognition.
AWS_ACCESS_KEY_ID: YOUR_KEY_ID
AWS_SECRET_ACCESS_KEY: YOUR_SECRET_KEY
AWS_REGION: us-east-1
Now, to spot celebrities in our pictures we need to create a client to use the AWS API and send the image to the recognizing celebrities endpoint. Within the begin block add the following code:
begin
  client = Aws::Rekognition::Client.new
  response = client.recognize_celebrities image: { bytes: tempfile.read }
ensure
  tempfile.close!
end
The Ruby AWS SDK automatically picks up your credentials from the environment. We then read the image we downloaded and send it as bytes to the recognize_celebrities method of the client.
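Because the credentials are read from the environment, a missing value usually only shows up as an error on the first API call. As a rough sketch, you could check for the three variables from config/env.yml when the app boots. The missing_aws_config helper and the REQUIRED_AWS_KEYS constant are names I've made up for this example; they are not part of the AWS SDK.

```ruby
# The environment variables this post stores in config/env.yml.
REQUIRED_AWS_KEYS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION].freeze

# Hypothetical helper: returns the required keys that are missing or
# blank in `env` (ENV by default, or any Hash-like object).
def missing_aws_config(env = ENV)
  REQUIRED_AWS_KEYS.select { |key| env[key].to_s.strip.empty? }
end

missing = missing_aws_config
warn "Missing AWS config: #{missing.join(', ')}" unless missing.empty?
```

Passing the environment in as an argument keeps the check easy to try with a plain Hash.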
The response will have all the details about the faces that were detected and whether they are likely to be celebrities. You can then build up your response however you like. I chose to report on the celebrities in the picture if there were any and, if there weren't, to report back how many faces were detected:
if response.celebrity_faces.any?
  if response.celebrity_faces.count == 1
    celebrity = response.celebrity_faces.first
    twiml.message body: "Ooh, I am #{celebrity.match_confidence}% confident this looks like #{celebrity.name}."
  else
    twiml.message body: "I found #{response.celebrity_faces.count} celebrities in this picture. Looks like #{to_sentence(response.celebrity_faces.map { |face| face.name })} are in the picture."
  end
else
  case response.unrecognized_faces.count
  when 0
    twiml.message body: "I couldn't find any faces in that picture. Maybe try another pic?"
  when 1
    twiml.message body: "I found 1 face in that picture, but it didn't look like any celebrity I'm afraid."
  else
    twiml.message body: "I found #{response.unrecognized_faces.count} faces in that picture, but none of them look like celebrities."
  end
end
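One optional tweak: match_confidence is a floating point percentage, so interpolating it directly can put a long run of decimals in the message. Here is a small sketch of rounding it first. The Celebrity struct is only a stand-in for the objects the AWS SDK returns, modelling just the two fields used here, and single_match_message is a hypothetical helper.

```ruby
# Stand-in for one entry of response.celebrity_faces. The real objects
# come from the AWS SDK; this struct exists only for the example.
Celebrity = Struct.new(:name, :match_confidence)

# Hypothetical helper: builds the single-celebrity reply with the
# confidence rounded to one decimal place.
def single_match_message(celebrity)
  confidence = celebrity.match_confidence.round(1)
  "Ooh, I am #{confidence}% confident this looks like #{celebrity.name}."
end

puts single_match_message(Celebrity.new("Example Person", 87.456))
# Ooh, I am 87.5% confident this looks like Example Person.
```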
I also added a short helper function here to turn a list of names into a readable sentence:
def to_sentence(array)
  # A single name is returned as-is; an empty list becomes an empty string.
  return array.first.to_s if array.length <= 1
  "#{array[0..-2].join(", ")} and #{array[-1]}"
end
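To see what the helper produces, here is a standalone copy you can run in irb. This copy returns a lone name bare via array.first.to_s, and the names are just placeholders.

```ruby
# Joins a list of names into a readable sentence fragment.
def to_sentence(array)
  return array.first.to_s if array.length <= 1

  "#{array[0..-2].join(", ")} and #{array[-1]}"
end

puts to_sentence(["Ada", "Grace"])          # Ada and Grace
puts to_sentence(["Ada", "Grace", "Linus"]) # Ada, Grace and Linus
```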
Restart your app once more and send an image to the WhatsApp number. It turned out I didn't look enough like any celebrities to get a match from Rekognition so I thought I'd try with some celebrities too. I sent myself a few celebrity pictures, like this one, to see the results.
There are a few more than that, Rekognition!
WhatsApp, Images, AWS, and celebrities
In this post we've seen how to receive images sent to a WhatsApp number using the Twilio API for WhatsApp, download the images with Down and then search for celebrities in them using AWS Rekognition. You can see all the code from this post in this GitHub repo.
This is just the start though. Rekognition gives you a bunch of tools for analysing images, including recognising objects and scenes, text, and even nude or suggestive content.
This is a small Sinatra app, but you could implement this in Rails too. Downloading images and calling the Rekognition APIs can take quite a while, so you might want to move those API calls into a background job with ActiveJob and respond using the REST API instead. It is worth considering response times, as Twilio webhooks will only wait for 15 seconds before they time out.
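To give a feel for that hand-off without pulling in Rails, here is a rough sketch that uses a plain Ruby thread and Queue in place of ActiveJob. Everything in it is an assumption for illustration: SlowWorkQueue is a made-up class, analyse stands in for the download and Rekognition call, and deliver stands in for sending the reply through the Twilio REST API. An in-process thread also loses queued work if the app restarts, which a real job backend would not.

```ruby
# Hypothetical in-process work queue. The webhook handler enqueues a job
# and returns straight away; a worker thread does the slow part later.
class SlowWorkQueue
  def initialize(analyse:, deliver:)
    @jobs = Queue.new
    @worker = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (job = @jobs.pop)
        deliver.call(job[:to], analyse.call(job[:media_url]))
      end
    end
  end

  # Called from the webhook handler; returns immediately.
  def enqueue(to:, media_url:)
    @jobs << { to: to, media_url: media_url }
  end

  # Stop accepting jobs and wait for queued work to finish.
  def shutdown
    @jobs.close
    @worker.join
  end
end

sent = []
queue = SlowWorkQueue.new(
  analyse: ->(url) { "analysed #{url}" },      # placeholder for Rekognition
  deliver: ->(to, text) { sent << [to, text] } # placeholder for the REST API
)
queue.enqueue(to: "whatsapp:+15550000000", media_url: "https://example.com/media/abc")
queue.shutdown
p sent
```

With this shape, the webhook route would only enqueue the job and return an empty TwiML response, keeping it well inside the 15 second limit.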
Have you built anything cool with image analysis? I'd love to hear about your image hacks in the comments or on Twitter at @philnash.