In this video, I show how to fine-tune a state-of-the-art Conformer model for audio keyword classification and build a Gradio Space to showcase it. I also quickly test the model with distorted audio to see how resilient it is.
Dataset: https://huggingface.co/datasets/speech_commands
Base model: https://huggingface.co/facebook/wav2vec2-conformer-rel-pos-large
Fine-tuned model: https://huggingface.co/juliensimon/wav2vec2-conformer-rel-pos-large-finetuned-speech-commands
Space: https://huggingface.co/spaces/juliensimon/keyword-spotting
Notebook: https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/keyword-spotting
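
For anyone who wants to try a distortion test like the one mentioned above, here is a minimal sketch. It assumes the distortion is white Gaussian noise mixed in at a chosen signal-to-noise ratio, which may not be what the video uses. The `add_noise` helper and the synthetic 440 Hz tone are illustrative placeholders, not code from the notebook.

```python
import numpy as np


def add_noise(waveform: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Return a copy of `waveform` with white Gaussian noise at roughly `snr_db` dB SNR.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Average power of the clean signal
    signal_power = np.mean(waveform.astype(np.float64) ** 2)
    # Noise power needed to reach the requested signal-to-noise ratio
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return (waveform + noise).astype(np.float32)


# One second of a 440 Hz tone at 16 kHz, standing in for a real recording
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
clean = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# Lower SNR values mean heavier distortion
noisy = add_noise(clean, snr_db=10.0)
```

To check resilience, the noisy waveform can be passed to the fine-tuned model in place of the clean one, and the predicted labels compared as the SNR is lowered.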