Using Serenade to Code By Voice
The emergence of voice-controlled devices like Siri, Amazon Echo, and Google Home has simplified tasks that previously required a keyboard. But we've only just begun to inhabit a world filled with devices that we control in different ways, one where we can accomplish more by what we say than by what we do. Serenade aims to tackle an interesting problem: writing code through speech.
Serenade is voice-to-code software that plugs into several popular IDEs, like VS Code and IntelliJ. Its makers claim that it lets you code "using natural speech," and with support for nearly a dozen languages, it's an intriguing proposition. The ability to code through voice commands is more than just a sci-fi fantasy. For programmers with carpal tunnel syndrome, or other more severe physical ailments, coding by voice can preserve access to a creative endeavor they enjoy, and to a living.
Is Serenade actually viable? In this post, we'll put the program to the test. My colleague Garen and I will attempt to build a Node.js site, with a simple HTML and CSS frontend, using only our voices. Our site will be nothing more than a button that, when clicked, displays a random emoji. To truly immerse ourselves in the hands-free experience, we'll also attempt to deploy the app to Heroku using their CLI. Let's get started!
Set up the server
As with most new projects, we'd like to create a new directory, change into it, and install our package dependencies. Since we can only use Serenade within the IDEs it supports, we must resort to macOS's Dictation feature in the terminal, which is lacking and unable to comprehend programmer speak. We must explicitly spell out Unix commands, or the program won't understand them. Furthermore, there doesn't seem to be any way to enforce lowercase, which spoils the look of our directory names. Perhaps most frustrating of all is the inability to speak filename conventions:
To move things along, we just typed npm i express --save-dev on the command line, created a new file called server.js, and opened up Visual Studio Code. Hands 1, Voice 0.
Serenade runs on your computer and detects your IDE in order to integrate with its functionality. After installing and activating the app, you can pretty much get started by issuing explicit directives. If Serenade doesn't understand a command, you can choose from a list of visual options to resolve the ambiguity. There are several tutorials available to help you begin learning the directives. Here's the first try:
The first attempt at using Serenade
There's a lot to like about how Serenade works: it knows common IDE commands like "delete word," and in general, if you speak a sentence, it knows where to add spaces and semicolons. However, keep in mind that Serenade has three different ways to add text:
- insert, which is a general way to describe a line
- add, which specifically recognizes language features like classes and functions
- type, which enters raw, plain text
Knowing which one to use can be a bit confusing, but after spending some time with the program (and thoroughly reading the documentation), it is possible to move at a somewhat faster clip. However, phrases are still misunderstood from time to time:
Moving a little faster as we learn the syntax
There seems to be some confusion about where to place parentheses and arguments. On occasion, it can feel like we are fighting against the app's syntax, rather than the app understanding us. Hands 2, Voice 0.
Write the HTML and CSS
Let's assume we did manage to get a basic server going:
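const express = require('express');
const path = require('path');
const app = express();
// Serve main.css and main.js, which index.html references
app.use(express.static(__dirname));
// Send the page itself; sendFile needs an absolute path
app.get('/', function(req, res) {
  res.sendFile(path.join(__dirname, 'index.html'));
});
// Heroku supplies the port through the PORT environment variable
app.listen(process.env.PORT || 8080);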
We'll now write a file called index.html to represent the entirety of our design. HTML is meant to be a structured language; how does it fare?
HTML for the win
Success! Writing HTML is a breeze. We were worried there would be some mistakes with tag placement, especially after adding the attributes, but the alignment and nesting are well understood. Hands 2, Voice 1.
Let's move on to the CSS:
CSS also works well
This is an example of when a close understanding of the underlying language's grammar really matters; we never knew the bracketed CSS definitions were specifically called "rulesets." Still, at this point, we've started to understand how Serenade works, including its nifty capability to move directly to a line. Hands 2, Voice 2.
Write the client-side JavaScript
Our final HTML and CSS will end up looking like this:
<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" type="text/css" href="/main.css">
    <script src="main.js"></script>
  </head>
  <body>
    <h1>hello!</h1>
    <p id="container"></p>
    <button id="emoji">Emoji!</button>
  </body>
</html>
h1 {
  color: blue;
  margin-top: 1em;
}
#container {
  text-align: center;
}
To avoid running our JavaScript before the page's elements exist, we need to wait for the DOMContentLoaded event to fire before setting up any event listeners. Capitalization matters here, and we couldn't get Serenade to recognize the right spelling:
Capitalizations are a challenge
Still, either through our increased proficiency with Serenade, or because the code itself is relatively straightforward, it was easy to add the event handling code. As programmers are not infallible, we intentionally tried to make mistakes to see if Serenade could recover. For example, we missed some indentation:
We'll call this one a draw.
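For reference, here's a minimal sketch of what the finished main.js could look like, based on the element ids in our HTML. The emoji list and the pickRandomEmoji helper are names made up for illustration, and the typeof document guard is only there so the helper can also be exercised outside a browser; treat it as one possible shape rather than the exact code we dictated.

```javascript
// main.js: a sketch of the client-side script.
// EMOJIS and pickRandomEmoji are hypothetical names chosen for this example.
const EMOJIS = ['😀', '🎉', '🚀', '🌈', '🍕'];

// Return one element of `list`. `random` is injectable so the choice
// can be made deterministic when testing.
function pickRandomEmoji(list, random = Math.random) {
  return list[Math.floor(random() * list.length)];
}

// Only wire up the page when running in a browser.
if (typeof document !== 'undefined') {
  // The event name is case-sensitive: 'DOMContentLoaded'.
  document.addEventListener('DOMContentLoaded', function () {
    const button = document.getElementById('emoji');
    const container = document.getElementById('container');

    // Show a new random emoji every time the button is clicked.
    button.addEventListener('click', function () {
      container.textContent = pickRandomEmoji(EMOJIS);
    });
  });
}
```

Keeping the random pick in its own function means the only DOM-specific code is the listener wiring, which is the part that tripped up Serenade's capitalization.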
Deploy to Heroku
For the final piece of our test, we need to deploy to Heroku. Heroku reads a Procfile to learn how to start the app, and ours is rather short:
web: node server.js
Unfortunately, once again, the obstacle here is switching to the Dictation app, which simply cannot understand the commands required to push this application up to Heroku:
With a final clinch at the end: Hands 3, Voice 2.
Conclusion
Serenade is pretty fun to use. Navigating the IDE is solid, and explicitly speaking out the proper names for programming concepts makes one feel like a wise sage. However, it requires a specific understanding of the various commands, which doesn't feel like speaking English; it feels like we're speaking specific instructions to a computer, rather than a person we're pair programming with. Still, we'd recommend it as a system to augment your current two-handed coding style, rather than a full-on replacement. With practice and patience, it could be a useful tool for developers who otherwise can't code. And as a nice touch, the company provides free one-on-one training to anyone who wants to get started quickly.
Another issue we haven't mentioned so far is that, like any good programmer, we frequently relied on the internet to look up specific APIs. This means that we spent the majority of our time navigating away from the IDE (and Serenade), which tested Dictation's coping capabilities. We're not quite at the point of fully composing applications with voice (and there's a long way to go before we can deploy them), but we're getting closer.