The only way to build the web for everyone is to make web apps, including the related media (audio, images, videos), as accessible as possible for your entire audience.

Visual media is instrumental in conveying information. Images pass on information in picture format. Videos take that to the next level. Concise videos, in particular, attract attention and effectively tell stories.

However, video is only partially accessible to visually impaired users. The same goes for deaf and hard-of-hearing people, who can absorb only part of the content, and for those who speak a language different from that of the content.

A solution to make images accessible is to add alt text, but what about the audio in videos? You add subtitles and transcripts, which are also welcomed by those who are, say, watching a video next to a sleeping partner or who don’t want to wake a child.

With Cloudinary, you can enable people with hearing or visual challenges to engage with video and audio. This tutorial shows you how.

Acquiring the Prerequisites

To follow the steps in this tutorial, you need the following:

A grasp of the basics of JavaScript.
Adeptness with Node.js and Cloudinary.
An ability to integrate Cloudinary into Node.js apps.
A Cloudinary account. Sign up for a free account if you don’t have one.

Getting Started

As a start, upload a video, such as this one from YouTube. Follow these steps:

Download the video to your computer.
Create a project with a basic front end and back end to support media upload to the back end, e.g., to a Node.js server with Multer.

Note: To avoid storing copies of uploaded videos, upload them to Cloudinary with the Cloudinary upload widget (https://cloudinary.com/documentation/upload_widget).

Your back end contains this Cloudinary configuration and API route:

const multer = require('multer')
const express = require('express')
const cors = require('cors')
const cloudinary = require('cloudinary').v2

require('dotenv').config()
const upload = multer({ dest: 'uploads/' })

cloudinary.config({
  cloud_name: process.env.CLOUD_NAME,
  api_key: process.env.API_KEY,
  api_secret: process.env.API_SECRET,
})

const app = express()

app.use(cors())

app.use(express.json())
app.post('/video/upload', upload.single('video'), uploadVideo)

function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video1',
      resource_type: 'video'
    },
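    (err) => {
      // Report upload failures instead of always claiming success
      if (err) {
        return res.status(500).json({ message: 'Video upload failed' })
      }
      res.json({ message: 'Successfully uploaded video' })
    }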
  )
}

Install the dependencies and save the correct environment variables in a .env file.

Replace the variables CLOUD_NAME, API_KEY, and API_SECRET with the values from your account’s dashboard.
On the front end, send the video to the back end with a file input so that the server can upload it to Cloudinary.
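
As a rough sketch of that front-end step, the selected file could be wrapped in a FormData object and posted to the /video/upload route. The field name video matches upload.single('video') on the back end; the base URL http://localhost:3000, the function name sendVideo, and the injectable fetchImpl parameter are assumptions made for illustration.

```javascript
// Hypothetical base URL; point it at wherever your Express server listens.
const API_BASE = 'http://localhost:3000'

// Posts a File or Blob from a file input to the back end's upload route.
// fetchImpl is injectable so the function can be exercised without a server.
async function sendVideo(file, fetchImpl = fetch) {
  const form = new FormData()
  // 'video' must match the field name passed to upload.single() on the server.
  form.append('video', file, file.name || 'video.mp4')

  const response = await fetchImpl(`${API_BASE}/video/upload`, {
    method: 'POST',
    body: form,
  })
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`)
  }
  return response.json()
}
```

In a browser, you might call sendVideo(input.files[0]) from the change handler of a file input that accepts video files.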

Improving Video Accessibility

Cloudinary supports metadata for resources, including tags and subtitles for video. You can fetch videos from Cloudinary with integrated subtitles, which must originate from existing transcripts. That’s similar to a media player, which needs to be told where to find the subtitle file for the video you’re watching.

Manually generating tags and subtitles can be tedious. A much more efficient alternative is to generate them through Cloudinary in these two steps:

Create transcripts in various languages to cater to those who are hard of hearing or who don’t speak the video’s language.
Generate and display tags that relate to the video for visually impaired viewers, including those who use screen readers to determine the video’s relevance.

Leveraging the Google AI Video Transcription Add-On

In conjunction with Google’s Speech-to-Text API, Cloudinary’s Google AI Video Transcription add-on automatically generates transcripts for videos. As a result, when uploading or updating a video with Cloudinary’s API, you can create transcripts in the same folder as the video.

Here are the steps:

Activate the add-on for your account. A free plan is available.
Add the raw_convert option, described in the Upload API reference, to the Cloudinary upload method. raw_convert asynchronously generates a file based on the uploaded file.

With the google_speech value, Google creates a transcript for the uploaded video. Here’s how:

function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video2',
      resource_type: 'video',
      raw_convert: 'google_speech'
    },
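    (err) => {
      // Report upload failures instead of always claiming success
      if (err) {
        return res.status(500).json({ message: 'Video upload failed' })
      }
      res.json({ message: 'Successfully uploaded video' })
    }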
  )
}

Note: The videos/video2 value for public_id identifies the video with subtitles. Assign any value you like and note it down for later use.
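
Because the transcript is generated asynchronously, the upload callback fires before the file exists. Here is a minimal sketch of checking the request's status. It assumes the upload response reports the status under info.raw_convert.google_speech.status, which you should verify against an actual response from your account. transcriptStatus is a hypothetical helper name, and the sample object is hand-written rather than real API output.

```javascript
// Reads the add-on status from an upload response, if present.
// Assumed shape: { info: { raw_convert: { google_speech: { status: 'pending' } } } }
function transcriptStatus(uploadResult) {
  return uploadResult?.info?.raw_convert?.google_speech?.status ?? 'unknown'
}

// Hand-written sample response, for illustration only.
const sampleResult = {
  public_id: 'videos/video2',
  info: { raw_convert: { google_speech: { status: 'pending' } } },
}

console.log(transcriptStatus(sampleResult)) // pending
console.log(transcriptStatus({})) // unknown
```

To use it, accept the (error, result) arguments in the upload callback and pass result to the helper.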

Go back to the front end and upload the same video.

Cloudinary then generates another file in your account’s Media Library:

The video2.transcript file reads as follows in a code editor:

The above JSON structure shows that the line “If you only have 24 hours in a day, your success is dependent upon how you use the 24” is displayed between 0.1 and 7.3 seconds in the video.
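
To make that timing concrete, here is a small sketch that derives the start and end time of each transcript line from its first and last word. It assumes each entry carries a transcript string and a words array with start_time and end_time in seconds. The sample below is hand-written: only the sentence and the 0.1 and 7.3 values come from the text above, and toCues is a hypothetical helper name.

```javascript
// Hand-written excerpt in the assumed .transcript shape. The words between
// the first and last are omitted here; a real file lists every word.
const sampleTranscript = [
  {
    transcript:
      'If you only have 24 hours in a day, your success is dependent upon how you use the 24',
    confidence: 0.9, // illustrative value
    words: [
      { word: 'If', start_time: 0.1, end_time: 0.4 }, // end_time is illustrative
      { word: '24', start_time: 6.9, end_time: 7.3 }, // start_time is illustrative
    ],
  },
]

// Returns one { text, start, end } cue per transcript line that has words.
function toCues(transcript) {
  return transcript
    .filter((line) => Array.isArray(line.words) && line.words.length > 0)
    .map((line) => ({
      text: line.transcript.trim(),
      start: line.words[0].start_time,
      end: line.words[line.words.length - 1].end_time,
    }))
}

console.log(toCues(sampleTranscript))
```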

You can also generate the following:

Other standard subtitle formats like SubRip (SRT) and WebVTT (VTT), which are supported by other media players.
Other transcriptions in different languages, which would make the video’s audio accessible for more viewers. French, for example, has this raw_convert value:

...
  raw_convert: 'google_speech:fr:BE'
...

That code generates a .transcript file in French. fr:BE denotes the language and region, Belgian French in this case. Google supports numerous languages and dialects.
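
The raw_convert variants can be assembled with a small helper, sketched below. It assumes that extra subtitle formats are requested with srt and vtt suffixes (as in google_speech:srt:vtt) and that a language code can follow them; the exact suffix order is an assumption to check against the add-on's documentation. buildRawConvert is a hypothetical name.

```javascript
// Builds a raw_convert value such as 'google_speech:srt:vtt'.
// formats: extra subtitle formats to request (assumed suffixes: 'srt', 'vtt').
// language: language code in the form used in this tutorial, e.g. 'fr:BE'.
function buildRawConvert({ formats = [], language } = {}) {
  return ['google_speech', ...formats, language].filter(Boolean).join(':')
}

console.log(buildRawConvert()) // google_speech
console.log(buildRawConvert({ formats: ['srt', 'vtt'] })) // google_speech:srt:vtt
console.log(buildRawConvert({ language: 'fr:BE' })) // google_speech:fr:BE
```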

Adding Subtitles to Videos

Next, add subtitles to videos on request with video transformations. To do so, add a route on the back end that returns the uploaded video, transformed with the generated .transcript file:

app.get('/video', getVideo)

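function getVideo(req, res) {
  // resource_type is required here because the Admin API defaults to images
  cloudinary.api.resource('videos/video2', { resource_type: 'video' }, (err, result) => {
    // Errors arrive through the callback, so handle them here
    if (err) {
      console.log({ err })
      return res.status(500).json({ message: 'Could not fetch video details' })
    }
    const video = cloudinary.video('videos/video2', {
      resource_type: 'video',
      type: 'upload',
      transformation: [
        {
          overlay: {
            resource_type: 'subtitles',
            public_id: 'videos/video2.transcript',
          },
        },
        { flags: 'layer_apply' },
      ],
    })
    res.json({
      ...result,
      // Add the controls attribute to the generated <video> tag
      videoElem: video.replace(/poster=/, 'controls poster='),
    })
  })
}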

A few explanations:

In the transformation property, you’ve added an overlay of the subtitles resource type and specified the path to that transcript file.

The return value of the cloudinary.video() method is in this format:

<video poster='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.jpg'>
  <source src='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.webm' type='video/webm'>
  <source src='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.mp4' type='video/mp4'>
  <source src='http://res.cloudinary.com/dillionmegida/video/upload/l_subtitles:videos:motivational-video.transcript/fl_layer_apply/v1/videos/motivational-video.ogv' type='video/ogg'>
</video>

You’ve replaced poster= with the string controls poster=, which adds the controls attribute to the video element, as shown here:

The Get Video button at the top makes a get request to the back end, grabs the video element, and renders it on the user interface.
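
A minimal sketch of what such a button handler could do is shown below. It assumes the /video route returns JSON with a videoElem string, as in the back-end code above; showVideo, the default base URL, and the injectable fetchImpl parameter are hypothetical and exist only to keep the example self-contained.

```javascript
// Fetches the /video route and injects the returned <video> markup.
// container is any element-like object with an innerHTML property.
async function showVideo(container, baseUrl = 'http://localhost:3000', fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/video`)
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }
  const data = await response.json()
  // videoElem is the HTML string built by cloudinary.video() on the back end.
  container.innerHTML = data.videoElem
  return data
}
```

In a page, you could attach it to the button's click event and pass the element that should hold the player.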

Your video is now more accessible, complete with subtitles. If you’ve specified a different language for the transcript, the subtitles are in that language.
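
The source URLs in the generated markup follow a visible pattern: an l_subtitles overlay in which slashes in the transcript's public ID become colons, followed by fl_layer_apply. The sketch below rebuilds such a URL by hand to show how the pieces fit together. subtitledVideoUrl is a hypothetical helper, demo is a placeholder cloud name, and in practice cloudinary.video() produces these URLs for you.

```javascript
// Builds a delivery URL with a subtitle overlay, following the pattern
// l_subtitles:<transcript id>/fl_layer_apply seen in the generated markup.
function subtitledVideoUrl(cloudName, publicId, transcriptId, format = 'mp4') {
  // Slashes in the overlay's public ID are written as colons.
  const overlay = `l_subtitles:${transcriptId.replace(/\//g, ':')}`
  return `https://res.cloudinary.com/${cloudName}/video/upload/${overlay}/fl_layer_apply/v1/${publicId}.${format}`
}

console.log(subtitledVideoUrl('demo', 'videos/video2', 'videos/video2.transcript'))
// https://res.cloudinary.com/demo/video/upload/l_subtitles:videos:video2.transcript/fl_layer_apply/v1/videos/video2.mp4
```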

Capitalizing on Google’s Automatic Video-Tagging Capability

Besides categorizing or grouping your resources, tags also show viewers a video’s category or related topics before they start watching. That information greatly helps people with poor vision.

To manually add tags to a video:

Click the video’s Manage button and then click the Metadata tab:

Input the tags:

Such a manual process is tedious and time-consuming. Automate it with Google’s automatic video-tagging capability instead. Follow the steps below.

Activate the Google Video Tagging add-on. A free plan is available.

Update the uploadVideo function in the back end:

function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video3',
      resource_type: 'video',
      raw_convert: 'google_speech',
      categorization: 'google_video_tagging',
      auto_tagging: 0.7,
    },
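    (err) => {
      // Report upload failures instead of always claiming success
      if (err) {
        return res.status(500).json({ message: 'Video upload failed' })
      }
      res.json({ message: 'Successfully uploaded video' })
    }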
  )
}

The categorization property sets up add-ons that automatically generate the video’s tags.

The value you set for the auto_tagging property is a confidence threshold: the degree of certainty that a label relates to the resource. auto_tagging applies only tags whose confidence level is higher than the one specified. A threshold close to 1 yields highly specific keywords, but only a few. In the code above, 0.7 serves as a compromise between the relevance and the number of tags.
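
The effect of the threshold can be illustrated locally with a short sketch. The labels and confidence values below are made up, the { tag, confidence } shape is an assumption, and tagsAbove is a hypothetical helper that only mimics the filtering Cloudinary performs on its side.

```javascript
// Hand-written sample: one entry per detected label (values are illustrative).
const detected = [
  { tag: 'car', confidence: 0.95 },
  { tag: 'road', confidence: 0.72 },
  { tag: 'tree', confidence: 0.41 },
]

// Keeps only labels whose confidence is higher than the threshold.
function tagsAbove(labels, threshold) {
  return labels
    .filter((label) => label.confidence > threshold)
    .map((label) => label.tag)
}

console.log(tagsAbove(detected, 0.7)) // [ 'car', 'road' ]
console.log(tagsAbove(detected, 0.9)) // [ 'car' ]
```

Raising the threshold trades quantity for precision, which is why a middle value such as 0.7 is a reasonable starting point.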

Since the add-on generates tags asynchronously, they might take a while to appear.

Refresh the screen after a while and you’ll see these results:

Depending on the video’s context, the generated tags might or might not be meaningful for a particular viewer. Nonetheless, the tags always describe the images in the video, such as “cars” and “environments.”

Displaying a Video’s Related Tags

Now obtain the video from Cloudinary by updating the getVideo function in the back end to read as follows:

...
    cloudinary.api.resource('videos/video3', { resource_type: 'video' }, (err, result) => {
...

The response in your browser’s Network tab (or in Postman or any other API client) looks like this:

You can display video tags any way you desire, for example:

The tags might not be completely accurate, so feel free to manually edit them in the dashboard or add other tags. For this video, you could add the tag “motivational quotes,” for example.
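
One possible way to display the tags is as a labeled list that screen readers announce as a group. The sketch below assumes the resource details include a tags array of strings; renderTags and escapeHtml are hypothetical helpers, and the aria-label wording is only a suggestion.

```javascript
// Escapes characters that are significant in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Renders tags as a labeled list so assistive technology reads them together.
function renderTags(tags) {
  const items = tags.map((tag) => `<li>${escapeHtml(tag)}</li>`).join('')
  return `<ul aria-label="Video tags">${items}</ul>`
}

console.log(renderTags(['cars', 'environments']))
// <ul aria-label="Video tags"><li>cars</li><li>environments</li></ul>
```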

Adding Translations With the Google Translation Add-On

The tags you just generated are accessible to English-speaking viewers only. With the Google Translation add-on, which you can use during image upload or in conjunction with automatic video tagging, you can add translations.

Follow these steps:

Activate the add-on and select the free plan:

Update the uploadVideo function to use the Google Translation add-on with the Google auto-tagging feature for video:

function uploadVideo(req, res) {
  cloudinary.uploader.upload(
    req.file.path,
    {
      public_id: 'videos/video4',
      resource_type: 'video',
      raw_convert: 'google_speech',
      categorization: 'google_video_tagging:en:fr',
      auto_tagging: 0.7,
    },
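    (err) => {
      // Report upload failures instead of always claiming success
      if (err) {
        return res.status(500).json({ message: 'Video upload failed' })
      }
      res.json({ message: 'Successfully uploaded video' })
    }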
  )
}

The suffix :en:fr in the categorization property tells the add-on to generate tags, save them in English and French, and display them in the Cloudinary dashboard:

A look at the resource details through the API yields the following:

The add-on’s data populates the info property with properties in this flow:

categorization → google_video_tagging → data

Here, the array of generated tags contains a tag object with the en (for the English translation) and fr (for the French translation) properties.

Ultimately, by leveraging this add-on, you can display tags that match the viewer’s location or language.
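
As a sketch of that idea, the helper below picks tag names in the viewer's language from the data array and falls back to English when a translation is missing. The entry shape follows the description above (a tag object with en and fr properties); the sample values, the confidence field, and the name tagsForLanguage are assumptions.

```javascript
// Assumed shape of info.categorization.google_video_tagging.data when
// requested with the :en:fr suffix (values are illustrative).
const sampleData = [
  { tag: { en: 'car', fr: 'voiture' }, confidence: 0.95 },
  { tag: { en: 'road', fr: 'route' }, confidence: 0.8 },
]

// Returns tag names in the requested language, falling back to English.
function tagsForLanguage(data, language, fallback = 'en') {
  return data
    .map((entry) => entry.tag[language] ?? entry.tag[fallback])
    .filter(Boolean)
}

console.log(tagsForLanguage(sampleData, 'fr')) // [ 'voiture', 'route' ]
console.log(tagsForLanguage(sampleData, 'de')) // [ 'car', 'road' ]
```

On the front end, the language argument could come from the first two letters of navigator.language.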

Summing Up

It’s crucial that web apps that contain media are accessible to all, especially your target audience.

You’ve now learned how to use Cloudinary’s add-ons to improve video accessibility by adding subtitles and automatically generating and displaying the related tags—all in multiple languages as you desire.

As a result, your video can reach a broader audience, including people with hearing or vision impairments, those who speak other languages, and even those who enjoy watching video with the audio muted.

Cloudinary offers many other robust and effective add-ons. Do check them out.
