August 19th, 2024 · 2 min read

Recording audio from a web browser is more challenging than it might seem at first glance. While the browser's abstraction from the hardware it's running on has its benefits, it can make it difficult to communicate with certain peripherals - e.g. a user's connected microphone. Luckily for us modern developers, the Web Audio API and the MediaStream API came along over a decade ago and solved many of these problems.

The Web Audio API is a powerful tool for manipulating audio in the browser. It allows developers to analyze, synthesize, and manipulate audio in real time using some simple JavaScript. The MediaStream API allows developers to open streams of media content from many sources, including the microphone. In this article, we will look at how to use the Web Audio API and the MediaStream API to capture microphone audio in any modern web browser.

Setting up a basic HTML page
First, let's create a basic HTML page that we can use to control audio capture from the microphone. Create a new file called index.html and add the following code:

<!DOCTYPE html>
<html>
<head>
  <title>Microphone Capture Demo</title>
</head>
<body>
  <button id="start-button">Start Capture</button>
  <button id="stop-button">Stop Capture</button>
  <script src="main.js"></script>
</body>
</html>

Capturing audio from the microphone

Now that we have our HTML page, let's create the main JavaScript file to capture microphone audio. Create a new file called main.js and add the following code:

const startButton = document.getElementById('start-button');
const stopButton = document.getElementById('stop-button');


let audioContext;
let micStreamAudioSourceNode;
let audioWorkletNode;


startButton.addEventListener('click', async () => {
  // Check if the browser supports the required APIs
  if (!window.AudioContext || 
      !window.MediaStreamAudioSourceNode || 
      !window.AudioWorkletNode) {
    alert('Your browser does not support the required APIs');
    return;
  }


  // Request access to the user's microphone
  const micStream = await navigator
      .mediaDevices
      .getUserMedia({ audio: true });


  // Create the audio context and a source node for the microphone stream
  audioContext = new AudioContext();
  micStreamAudioSourceNode = audioContext
      .createMediaStreamSource(micStream);


  // Create and connect AudioWorkletNode 
  // for processing the audio stream
  await audioContext
      .audioWorklet
      .addModule("my-audio-processor.js");
  audioWorkletNode = new AudioWorkletNode(
      audioContext,
      'my-audio-processor');
  micStreamAudioSourceNode.connect(audioWorkletNode);
});


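stopButton.addEventListener('click', () => {
  // Nothing to stop if capture was never started
  if (!micStreamAudioSourceNode) return;

  // Disconnect the graph, release the microphone and close the context
  micStreamAudioSourceNode.disconnect();
  micStreamAudioSourceNode.mediaStream
      .getTracks()
      .forEach((track) => track.stop());
  audioContext.close();
});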

With this code, we are able to capture microphone audio using the Web Audio API and the MediaStream API. When the user clicks the Start Capture button, we request access to the user's microphone. Once access is granted, we create an AudioContext and build an audio processing graph: a MediaStreamAudioSourceNode feeds the microphone audio into an AudioWorkletNode, which processes it.
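The click handler above does not handle the case where getUserMedia rejects, for example when the user denies the permission prompt. The following is a minimal sketch of one way to handle it. The helper names describeMicrophoneError and requestMicrophone and the message wording are assumptions made for illustration; the error names are the ones defined for getUserMedia.

```javascript
// Map an error thrown by getUserMedia() to a message for the user.
// The helper name and the message wording are illustrative.
function describeMicrophoneError(error) {
  switch (error && error.name) {
    case 'NotAllowedError':
      return 'Microphone access was denied. Check the site permissions.';
    case 'NotFoundError':
      return 'No microphone was found on this device.';
    case 'NotReadableError':
      return 'The microphone could not be read. It may be in use elsewhere.';
    default:
      return 'Could not start audio capture.';
  }
}

// Hypothetical wrapper: resolves to a MediaStream, or reports the
// problem through the `report` callback and resolves to null.
async function requestMicrophone(mediaDevices, report) {
  try {
    return await mediaDevices.getUserMedia({ audio: true });
  } catch (error) {
    report(describeMicrophoneError(error));
    return null;
  }
}
```

In main.js this could be called as `const micStream = await requestMicrophone(navigator.mediaDevices, alert);`, returning early from the click handler when the result is null.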

The Web Audio API and MediaStream API are supported on Google Chrome, Firefox, Safari, Microsoft Edge and Opera. A host of mobile web browsers are also supported.
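Since support still varies between browser versions, it can help to tell the user exactly which API is missing. The sketch below is one possible approach; the function name findMissingAudioApis is an assumption, and the global object is passed in as a parameter so the check can be exercised outside a browser. Note that navigator.mediaDevices is only exposed in secure contexts (HTTPS or localhost), so it will also show up as missing on a page served over plain HTTP.

```javascript
// Return the names of the required APIs that are missing from the
// given global object (pass `window` when running in a browser).
function findMissingAudioApis(globalObject) {
  const missing = [];
  if (typeof globalObject.AudioContext !== 'function') {
    missing.push('AudioContext');
  }
  if (typeof globalObject.AudioWorkletNode !== 'function') {
    missing.push('AudioWorkletNode');
  }
  const mediaDevices =
      globalObject.navigator && globalObject.navigator.mediaDevices;
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== 'function') {
    missing.push('navigator.mediaDevices.getUserMedia');
  }
  return missing;
}
```

For example, `findMissingAudioApis(window)` returns an empty array when everything needed by main.js is available.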

Processing the captured audio data
Now that we have set up the basic infrastructure for capturing microphone audio, we can start processing the real-time audio data. To do this, we will need to define the behaviour of the AudioWorkletNode with an AudioWorkletProcessor implementation of our own.

Create a new file called my-audio-processor.js and add the following code:

class MyAudioProcessor extends AudioWorkletProcessor {
    process(inputs, outputs, parameters) {
        // Get the input audio data from the first channel
        const inputData = inputs[0][0];

        // Do something with the audio data
        // ...

        return true;
    }
}

registerProcessor('my-audio-processor', MyAudioProcessor);

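In the process function that we've defined, we can access the input audio data and perform various operations on it. For example, we can measure the signal level or buffer the audio to send to a speech recognition engine. To analyze the frequency spectrum, we would instead connect the Web Audio API's AnalyserNode to the source node in main.js, since audio nodes cannot be created inside an AudioWorkletProcessor.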
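As a sketch of what "doing something with the audio data" might look like, the two helpers below work on a Float32Array such as inputData. The names computeRms and floatTo16BitPcm are assumptions made for this example. The first measures the signal level; the second converts samples to 16-bit PCM, a format many speech engines expect.

```javascript
// Root-mean-square level of a block of samples in the range [-1, 1].
function computeRms(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// Convert floating-point samples to signed 16-bit PCM.
// Values outside [-1, 1] are clamped before scaling.
function floatTo16BitPcm(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return pcm;
}
```

Inside process, it is worth checking that inputData is defined before calling either helper, because the input has no channels while nothing is connected to the node.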

With this final addition, we can now capture real-time microphone audio from the HTML page we created earlier.
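The process function receives audio in small blocks (128 samples per call under the current Web Audio specification), while speech engines usually want larger, fixed-length frames. Below is a minimal sketch of a buffer that bridges the two. The class name FrameAccumulator and the frame length passed to it are assumptions made for this example.

```javascript
// Collects incoming sample blocks and emits fixed-length frames.
class FrameAccumulator {
  constructor(frameLength) {
    this.frameLength = frameLength;
    this.buffer = new Float32Array(frameLength);
    this.filled = 0; // number of samples currently held in the buffer
  }

  // Append a Float32Array of samples and return an array containing
  // every frame that was completed by this call (possibly none).
  push(samples) {
    const frames = [];
    let offset = 0;
    while (offset < samples.length) {
      const count = Math.min(
          this.frameLength - this.filled,
          samples.length - offset);
      this.buffer.set(samples.subarray(offset, offset + count), this.filled);
      this.filled += count;
      offset += count;
      if (this.filled === this.frameLength) {
        frames.push(this.buffer.slice()); // copy, the buffer is reused
        this.filled = 0;
      }
    }
    return frames;
  }
}
```

A processor could create one accumulator in its constructor, call `push(inputData)` from process, and hand each completed frame to the main thread with `this.port.postMessage(frame)`.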

Capturing audio from the browser on Easy Mode

Now, you might be thinking, "this approach seems complicated and limited (e.g. can't easily choose the sample rate of the incoming audio, heavy audio processing on the real-time audio thread seems bad, etc.)", and you would be right. That's why we created the Picovoice Audio Recorders.

At Picovoice, we ran into a multitude of challenges getting audio from the web browser for speech recognition. We require specific audio properties for our speech recognition engines, and, since all of our audio processing happens in the browser, we want the processing to happen on a worker thread. We found ourselves building out a complex array of utility functions to help, which we eventually merged into an open-source library: Picovoice Web Voice Processor.

With Web Voice Processor imported, our main.js file would look like this:

import { WebVoiceProcessor } from '@picovoice/web-voice-processor';

const startButton = document.getElementById('start-button');
const stopButton = document.getElementById('stop-button');

const engine = {
    onmessage: function(e) {
        switch (e.data.command) {
            case 'process': {
                const inputData = e.data.inputFrame;
                // do something with the audio
                break;
            }
        }
    }
};

startButton.addEventListener('click', async () => {
    // Once WebVoiceProcessor has at least one engine
    // subscribed, audio capture begins
    await WebVoiceProcessor.subscribe(engine);
});

stopButton.addEventListener('click', async () => {
    // Once WebVoiceProcessor no longer has engines
    // subscribed, audio capture stops
    await WebVoiceProcessor.unsubscribe(engine);
});

In addition to simplifying the audio capture process, Web Voice Processor adds options for resampling the input audio, selecting the audio device to record with, and running audio processing on a worker thread.
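To give a sense of what resampling involves, here is a deliberately simple sketch that downsamples a block of audio using linear interpolation. It is not the resampler that Web Voice Processor uses; the function name downsampleLinear is an assumption, and a production resampler would also apply a low-pass filter first to avoid aliasing.

```javascript
// Downsample `samples` from inputRate to outputRate (both in Hz) by
// linearly interpolating between neighbouring input samples.
function downsampleLinear(samples, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError('outputRate must not exceed inputRate');
  }
  const ratio = inputRate / outputRate;
  const outputLength = Math.floor(samples.length / ratio);
  const output = new Float32Array(outputLength);
  for (let i = 0; i < outputLength; i++) {
    const position = i * ratio; // fractional index into the input
    const index = Math.floor(position);
    const fraction = position - index;
    const next = Math.min(index + 1, samples.length - 1);
    output[i] = samples[index] * (1 - fraction) + samples[next] * fraction;
  }
  return output;
}
```

For example, `downsampleLinear(frame, 48000, 16000)` keeps roughly one sample out of every three.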

It takes less than 90 seconds to start recording audio from a web browser.

Explore
The Web Voice Processor is open-source and available on GitHub. There is also a demo in the repository that explores more of the features of the library.
