Decifer — Generate transcripts from audio using Flutter and Deepgram

Souvik Biswas · Apr 2 '22 · Dev Community

Overview

Decifer is a cross-platform mobile app that helps you generate transcripts either from a voice recording or from an uploaded audio file.

Transcription Playback

Try out the app: https://play.google.com/store/apps/details?id=com.souvikbiswas.deepgram_transcribe

Typically, using the Deepgram API would require you to maintain a server, but I have made this project totally serverless. Read on to learn more.

Here's a brief demo of the entire app in action:

Submission Category:

Analytics Ambassadors

Link to Code on GitHub

The entire app is open source - try it out, and feel free to contribute to this project 😉:

GitHub: sbis04 / decifer

Generate your audio transcripts with ease.


Blog post about this project: https://dev.to/sbis04/decifer-generate-transcripts-with-ease-5hl3

Try out the app: https://appdistribution.firebase.dev/i/a57e37b2fda28351

A cross-platform mobile app that helps you to generate transcripts either from a voice recording or by uploading an audio file. The project uses a totally serverless architecture.


Project Description

The primary features of the app are as follows:

  • Generate transcripts from audio recordings or uploaded audio files using the Deepgram API.
  • Cloud sync of transcripts across multiple devices using the same account.
  • Confidence map view of the transcript.
  • Export as PDF and share with anyone.

Architecture

I'm using a totally serverless architecture for this project 🤯; let's have a look at how it works:

Decifer architecture

The mobile app is created using Flutter which is integrated with Firebase. I have used Firebase Cloud Functions to deploy the backend code required for communicating with the Deepgram API.

Firebase Cloud Functions lets you run backend code in a serverless architecture.

I have deployed the following function to Firebase:



const functions = require("firebase-functions");
const {Deepgram} = require("@deepgram/sdk");

// Callable function: receives an audio URL from the app and returns the
// transcript (as WebVTT) along with per-utterance confidence scores.
exports.getTranscription = functions.https.onCall(async (data, context) => {
  try {
    const deepgram = new Deepgram(process.env.DEEPGRAM_API_KEY);
    const audioSource = {
      url: data.url,
    };

    // Transcribe the pre-recorded audio. Requesting utterances gives us a
    // confidence score for each spoken segment.
    const response = await deepgram.transcription.preRecorded(audioSource, {
      punctuate: true,
      utterances: true,
    });

    console.log(response.results.utterances.length);

    // Collect the confidence value of each utterance.
    const confidenceList = [];
    for (let i = 0; i < response.results.utterances.length; i++) {
      confidenceList.push(response.results.utterances[i].confidence);
    }

    // Convert the transcript to WebVTT for synchronized playback in the app.
    const webvttTranscript = response.toWebVTT();

    const finalTranscript = {
      transcript: webvttTranscript,
      confidences: confidenceList,
    };

    // Return the result as a JSON string that the app can parse.
    const finalTranscriptJSON = JSON.stringify(finalTranscript);
    console.log(finalTranscriptJSON);

    return finalTranscriptJSON;
  } catch (error) {
    console.error(`Unable to transcribe. Error ${error}`);
    throw new functions.https.HttpsError("aborted", "Could not transcribe");
  }
});



The getTranscription function takes an audio URL, generates the transcript using the Deepgram API along with the respective confidence scores, and returns the data as a JSON string that can be parsed within the app.
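To illustrate what the app receives, here is a sketch of unpacking that payload on the client side. The sample response below is hypothetical, but it follows the shape the function returns: a JSON string containing a WebVTT transcript and a confidences array. (In the real app, the call itself goes through the Flutter cloud_functions plugin.)

```javascript
// Hypothetical sample of what getTranscription returns. Note that it is a
// JSON *string*, so the client must parse it before use.
const finalTranscriptJSON = JSON.stringify({
  transcript:
    "WEBVTT\n\n1\n00:00:00.000 --> 00:00:02.500\nHello and welcome to Decifer.",
  confidences: [0.94],
});

// Parse the envelope back into an object with `transcript` and `confidences`.
const payload = JSON.parse(finalTranscriptJSON);

// One confidence value per utterance, aligned with the WebVTT cues.
console.log(payload.confidences.length);
console.log(payload.transcript.startsWith("WEBVTT"));
```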

App screens

The Flutter application consists of the following pages/screens:

  • Login Page
  • Register Page
  • Dashboard Page
  • Record Page
  • Upload Page
  • Transcription Page

The Login and Register pages handle user authentication. Authentication creates a unique account for each user, which is required for storing the generated transcripts in Firestore and enabling cloud sync.

Register Page

The Dashboard Page displays a list of all the transcripts currently present on the user's account. It also has two buttons - one for navigating to the Record Page and the other for navigating to the Upload Page.

Dashboard Page

The Record Page lets you record audio using the device microphone and then transcribe it using Deepgram. You always have the option to re-record if you think the last recording wasn't good.

Record Page

From the Upload Page, you can choose any audio file present on your device and generate the transcript of it.

Upload Page

The Transcription Page is where the entire transcript can be viewed. It offers synchronized audio-transcript playback, highlighting the part of the transcript that corresponds to the audio currently playing.
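The synchronized playback boils down to parsing the WebVTT cues into (start, end, text) triples and highlighting whichever cue contains the current playback position. The app implements this in Dart; below is a minimal JavaScript sketch of the same idea (the helper names are mine, not from the project):

```javascript
// Convert a WebVTT timestamp like "00:00:04.500" to seconds.
function toSeconds(ts) {
  const [h, m, s] = ts.split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// Parse a WebVTT document into an array of {start, end, text} cues.
function parseWebVTT(vtt) {
  const cues = [];
  // Cues are separated by blank lines; the first block is the WEBVTT header.
  for (const block of vtt.split(/\n\n+/).slice(1)) {
    const lines = block.trim().split("\n");
    // The timing line looks like "00:00:00.000 --> 00:00:02.500".
    const timingLine = lines.find((l) => l.includes("-->"));
    if (!timingLine) continue;
    const [start, end] = timingLine.split("-->").map((t) => toSeconds(t.trim()));
    const text = lines.slice(lines.indexOf(timingLine) + 1).join(" ");
    cues.push({start, end, text});
  }
  return cues;
}

// Given the current playback position, find the cue to highlight.
function activeCue(cues, positionSeconds) {
  return cues.find((c) => positionSeconds >= c.start && positionSeconds < c.end);
}

const cues = parseWebVTT(
    "WEBVTT\n\n1\n00:00:00.000 --> 00:00:02.500\nHello there.\n\n" +
    "2\n00:00:02.500 --> 00:00:05.000\nWelcome to Decifer.");
```

On every position update from the audio player, the UI just re-runs activeCue and restyles the matching span of text.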

Transcription Page

You can also see a confidence map for each part of the transcript, which shows how accurate that part of the transcription is - darker means higher confidence.
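As a sketch of that idea, a confidence value in [0, 1] can be mapped to a shade by scaling a color's opacity. This mapping is my assumption of how such a view can be built, not the app's exact implementation:

```javascript
// Map a confidence score in [0, 1] to an rgba() color: higher confidence
// produces a more opaque (darker-looking) shade. Hypothetical helper,
// not taken from the Decifer codebase.
function confidenceColor(confidence) {
  const clamped = Math.min(1, Math.max(0, confidence));
  return `rgba(0, 128, 0, ${clamped.toFixed(2)})`;
}
```

Each utterance's text span then gets its background set to confidenceColor(confidence) using the confidences array returned by the Cloud Function.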

Confidence Map

You can also easily print the generated transcript or share it as a PDF.

Export transcript

Deepgram

Overview of my Deepgram dashboard (completed the mission, Get a Transcript via API or SDK):

Deepgram Overview

Usage analytics of the Deepgram API:

Deepgram Usage Analytics

Log of one of the API calls for transcribing from audio:

Deepgram Logs
