This blog post was written for Twilio and originally published on the Twilio blog.
Data visualizations are handy ways to examine and think about data. Observable is a Jupyter Notebook-like tool that makes it easy to quickly run JavaScript code in cells so you can see what you're doing in real time.
This post will go over how to make an interactive bar chart showing Taylor Swift's most-used words from her lyrics with Observable using D3.js. In the meantime you can view the completed notebook and visualization here, and you can fork and edit it yourself.
Brief Intro to Observable
You can think of each different cell as a function. Cells come in two primary forms:
Expressions. Expression cells are the most concise and are meant for simple definitions. In Observable, outside of a closure, you don't need a var/const/let keyword.
Blocks. Block cells are encompassed by curly braces and include more complex code that might contain local variables and loops.
Because local variables like arr above cannot be referenced by other cells, many Observable notebooks put different definitions and functions in their own cells. This post does the same: every code snippet should go in its own cell, and after adding the code to a cell you should run it by typing shift-return.
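As a rough illustration of the two cell forms, here is a plain-JavaScript sketch. The names total and squares are hypothetical, the Observable cell syntax appears only in the comments, and an immediately-invoked function stands in for a block cell.

```javascript
// Expression cell. In Observable you would write just:  total = 1 + 2
const total = 1 + 2;

// Block cell. In Observable you would write:
//   squares = { const arr = []; ...; return arr; }
// An immediately-invoked function is the closest plain-JavaScript analog.
const squares = (() => {
  const arr = []; // local variable: not visible to other cells
  for (let i = 1; i <= 3; i++) {
    arr.push(i * i);
  }
  return arr; // only the returned value becomes the cell's value
})();

console.log(total, squares);
```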
For a more detailed introduction to Observable, check out this Notebook.
Setup
Download this dataset of Taylor Swift lyrics and then make an Observable account if you do not have one already. Once you have an account, make a new notebook by clicking the New button in the top-right corner.
To get started, hover your mouse near the left of a cell. You should see a plus sign like this:
Import the dataset from your machine by clicking the plus sign beneath the existing stock markdown cell, clicking into an Observable cell, and then pressing shift-command-u on Mac. Then select the file you wish to import (don't forget to unzip it!). In the cell you selected, you should then see something like:
FileAttachment("tswiftlyrics.csv")
Your file name can be different. You can run the cell by clicking the right-facing triangle on the right end of the Run cell button or by typing shift-return, both of which would return the following:
To see the actual data from the CSV, append .text() to the code and run it, like so:
FileAttachment("tswiftlyrics.csv").text()
You can also see that a file was imported in that cell because there is that file symbol on the right. We see the data includes the artist for each song (Taylor Swift), the album name, the track title, track number on the album, the lyric, the line the lyric is on, and the year the song came out.
Now click the plus sign on the left of the cell to insert a new cell which will hold a comment. We can do that with markdown:
md`#### Require d3`
Insert a new cell and add the following to require D3.js.
d3 = {
  const d3 = require("d3-dsv@1", "d3@5","d3-scale@3","d3-scale-chromatic@1", "d3-shape@1", "d3-array@2")
  return d3
}
In Observable notebooks you cannot require just any npm package: you can only use packages that expose their modules via UMD or AMD. Usually, if you can include the module from unpkg.com via CDN in a webpage, you can use it in Observable.
Now we read the CSV file and call csvParse to parse the input string (the contents of our CSV file). The second argument is a row function that keeps only the lyric column, so this returns an array of objects, one per parsed row.
data = {
  const text = await FileAttachment(<your-imported-taylor-swift-file-name.csv>).text();
  return d3.csvParse(text, ({lyric}) => ({
    lyric: lyric
  }));
}
If you run and expand that cell you can see the output, which contains only the lyrics from the CSV file:
In a new cell make an empty array to add the words from the lyrics to:
lyrics = []
In a new cell add the following to loop through our data array and add each lyric to the lyrics array.
data.forEach(lyric => lyrics.push(lyric.lyric));
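As an aside, the same list can be built without mutating an array defined in another cell by using map. The sketch below assumes a hypothetical sampleData array in place of the parsed CSV rows, and lyricsFromMap is an illustrative name that does not appear in the original notebook.

```javascript
// Hypothetical stand-in for the array returned by d3.csvParse.
const sampleData = [
  { lyric: "first sample line" },
  { lyric: "second sample line" }
];

// map returns a new array holding each row's lyric property.
const lyricsFromMap = sampleData.map(row => row.lyric);

console.log(lyricsFromMap);
```

In Observable this could be written as a single cell, such as lyrics = data.map(row => row.lyric), in place of the empty-array cell and the forEach cell.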
You can see the modified lyrics array in a new cell:
Clean up the Lyrics
Observable does not let us reassign variables because "Named cells are declarations, not assignments." If you were to try to reset or reassign the lyrics variable you would get this error because cell names must be unique:
To analyze the most-used words from Taylor's lyrics, in a new cell let's join the array into a single string, use a regex to strip punctuation and digits, and convert everything to lowercase.
newLyrics = lyrics.join(' ').replace(/[.,\/#!""'$%\?^&\*;:{}=\-_`~()0-9]/g,"").toLowerCase()
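To see what the cleanup does, here is a small sketch that applies the same character class to a hypothetical sample line (not taken from the dataset). The names punctuationAndDigits, sample, cleaned, and tokens are illustrative only.

```javascript
// Same character class as the newLyrics cell: punctuation and digits.
const punctuationAndDigits = /[.,\/#!""'$%\?^&\*;:{}=\-_`~()0-9]/g;

// Hypothetical sample line, not taken from the dataset.
const sample = "Hello, it's 2 a.m.!";

const cleaned = sample.replace(punctuationAndDigits, "").toLowerCase();

// Removing the "2" leaves two adjacent spaces, so split on runs of
// whitespace and drop empty strings to avoid counting "" as a word.
const tokens = cleaned.split(/\s+/).filter(Boolean);

console.log(JSON.stringify(cleaned), tokens);
```

Splitting on a single space, as the later cells do, keeps an empty string wherever a stripped character sat between two spaces. Whether that matters depends on the data, but filtering empties is a cheap safeguard.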
After we clean up the lyrics, let's remove stopwords from the lyrics string. Most of these words were taken from a list of NLTK stop words and do not really say much: they're sort-of "scaffolding-y." In a new cell add
stopwords = ['i','me','my','myself','we','our','ours','ourselves','you','your','yours','yourself','yourselves','he','him','his','himself','she','her','hers','herself','it','its','itself','they','them','their','theirs','themselves','what','which','who','whom','this','that','these','those','am','is','are','was','were','be','been','being','have','has','had','having','do','does','did','doing','a','an','the','and','but','if','or','because','as','until','while','of','at','by','for','with','about','against','between','into','through','during','before','after','above','below','to','from','up','down','in','out','on','off','over','under','again','further','then','once','here','there','when','where','why','how','all','any','both','each','few','more','most','other','some','such','no','nor','not','only','own','same','so','than','too','very','s','t','can','will','just','don','should','now', 'im', 'ill', 'let', 'said', 'thats', 'oh', 'say', 'see', 'yeah', 'youre', 'ey', 'cant', 'dont', 'cause']
To remove these stopwords from the lyrics add this function to a new cell.
remove_stopwords = function(str) {
  const res = []
  const words = str.split(' ')
  for (let i = 0; i < words.length; i++) {
    // strip any remaining periods before checking the stopword list
    const word_clean = words[i].split(".").join("")
    if (!stopwords.includes(word_clean)) {
      res.push(word_clean)
    }
  }
  return res.join(' ')
}
Now we make a new variable in a new cell by calling the remove_stopwords function.
lyrics_no_stopwords = remove_stopwords(newLyrics)
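For comparison, the same filtering can be sketched with a Set and filter, which avoids scanning the whole stopword array for every word. The short sampleStopwords list and the name removeStopwordsWithSet are assumptions for illustration; the notebook's own stopwords array is much longer.

```javascript
// A short, hypothetical stopword list; the notebook's stopwords array is much longer.
const sampleStopwords = new Set(["we", "will", "the", "again"]);

// Keep only the words that are not in the stopword set.
const removeStopwordsWithSet = str =>
  str
    .split(" ")
    .filter(word => !sampleStopwords.has(word))
    .join(" ");

console.log(removeStopwordsWithSet("we will sing the song again"));
```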
Get String Frequency for each Lyric
To get the number of occurrences for each word in the lyrics, add this code to a new cell using [reduce](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce).
strFrequency = function (stringArr) { //es6 way of getting frequencies of words
  return stringArr.reduce((count, word) => {
    count[word] = (count[word] || 0) + 1;
    return count;
  }, {})
}
Then we call that strFrequency function and assign the output to a new variable obj.
obj = strFrequency(lyrics_no_stopwords.split(' '))
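One edge case worth knowing about: with a plain {} accumulator, a word such as "constructor" finds an inherited property instead of undefined, so its count comes out wrong. The sketch below is a variant under that assumption, starting from an object with no prototype. The name countWords and the sample words are hypothetical.

```javascript
// Count occurrences with reduce, as in the strFrequency cell, but start from
// an object with no prototype so that a word such as "constructor" cannot
// collide with an inherited property.
const countWords = words =>
  words.reduce((count, word) => {
    count[word] = (count[word] || 0) + 1;
    return count;
  }, Object.create(null));

// Hypothetical sample words.
const counts = countWords(["love", "story", "love", "constructor"]);

console.log(counts.love, counts.story, counts.constructor);
```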
If you run the cell you would see something like this:
Sort our Word Frequencies
Because this is a JavaScript object we can't just call sort(). To sort our frequencies, add this code to a new cell. It converts the object to an array of [word, count] entries, sorts them from greatest to least, and converts the result back into an object.
sortedObj = Object.fromEntries(
  // b[1] - a[1] sorts by count in descending order (greatest to least)
  Object.entries(obj).sort( (a,b) => b[1] - a[1] )
)
Running the cell would show the following output:
In a new cell, keep only the first x-number (in this case, 30) of items of the object, converting each entry into an object with lyric and freq keys so the values are easy to access.
final = Object.entries(sortedObj).map(([lyric, freq]) => ({lyric, freq})).slice(0,30);
Running the cell you can see that final is an array, slightly different from sortedObj above.
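The sort-then-trim steps can be sketched together on a small, hypothetical set of counts. The names sampleCounts and topWords are illustrative only; the sketch sorts largest first and then keeps the top n entries.

```javascript
// Hypothetical word counts standing in for obj.
const sampleCounts = { love: 5, never: 2, time: 9, back: 4 };

// Sort entries by count, largest first, then keep the top n as {lyric, freq} objects.
const topWords = (counts, n) =>
  Object.entries(counts)
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([lyric, freq]) => ({ lyric, freq }));

console.log(topWords(sampleCounts, 2));
```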
Make our Chart
We need to set some attributes of our chart. In a new cell add
margin = ({top: 20, right: 0, bottom: 30, left: 40})
followed by another new cell with
height = 500
Now we create our x-values in a new cell with d3.scaleBand(), which maps our domain (each word in the final array) onto a range of values, the minimum and maximum extents of the bands.
x = d3.scaleBand()
.domain(final.map(d => d.lyric))
.rangeRound([margin.left, width - margin.right])
.padding(0.1)
Our y-values are made in a similar manner in a new cell:
y = d3.scaleLinear()
.domain([0, d3.max(final, d => d.freq)])
.range([height - margin.bottom, margin.top])
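To make the y scale's arithmetic concrete, here is a hand-rolled stand-in for a linear scale. It is not D3's implementation, only the same interpolation written out. The linearScale helper and the maxFreq value of 100 are assumptions for illustration.

```javascript
// Hand-rolled stand-in for a linear scale: maps [d0, d1] onto [r0, r1].
const linearScale = ([d0, d1], [r0, r1]) => value =>
  r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);

const height = 500;
const margin = { top: 20, right: 0, bottom: 30, left: 40 };
const maxFreq = 100; // hypothetical largest frequency

// Frequency 0 maps to the bottom of the plot area, maxFreq to the top.
const y = linearScale([0, maxFreq], [height - margin.bottom, margin.top]);

console.log(y(0), y(maxFreq), y(50));
// Bar height for a frequency of 50, as computed in the chart cell: y(0) - y(freq)
console.log(y(0) - y(50));
```

The range is listed bottom-first because SVG y-coordinates grow downward, so larger frequencies map to smaller y values.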
To style and display our axes, we must define them as functions translating them into the appropriate location according to the set orientation. In two separate cells include the following:
xAxis = g => g
.attr("transform", `translate(0,${height - margin.bottom})`)
.call(d3.axisBottom(x).tickSizeOuter(0))
yAxis = g => g
.call(d3.axisLeft(y).ticks(15))
.call(g => g.select(".domain").remove())
Now to add a title to the y-axis add the following code to a new cell.
yTitle = g => g.append("text")
.attr("font-family", "sans-serif")
.attr("font-size", 10)
.attr("y", 10)
.text("Frequency")
Now we call these by making our chart in a new cell. We create an SVG object, using the viewBox attribute to set the position and dimension. Then we append a g element (which is not unique to D3.js, as it is used to group SVG shapes together), creating rectangles from our lyric data and setting the lyric as the x-value for each rectangle and the frequency of the lyric as the y-value for each rectangle. We also set some style attributes and then call our xAxis, yAxis, and yTitle.
{
const svg = d3.create("svg")
.attr("viewBox", [0, 0, width, height]);
svg.append("g")
.selectAll("rect")
.data(final)
.enter().append("rect")
.attr('x', d => x(d.lyric))
.attr('y', d => y(d.freq))
.attr('width', x.bandwidth())
.attr('height', d => y(0) - y(d.freq))
.attr("fill", "#CEBEDE")
.attr("stroke", "#FFB9EC")
.attr("stroke-width", 1)
svg.append("g")
.call(xAxis);
svg.append("g")
.call(yAxis);
svg.call(yTitle);
return svg.node();
}
Running that cell should output this chart. Tada!
Add Interactivity to the Bar Chart
Beneath the yAxis cell, add a new cell to contain a tooltip, which is displayed when a user hovers their cursor over a rectangle. We set different style elements to be hex colors related to Taylor Swift albums and other CSS-like properties.
tooltip = d3.select("body")
.append("div")
.style("position", "absolute")
.style("font-family", "'Open Sans', sans-serif")
.style("font-size", "15px")
.style("z-index", "10")
.style("background-color", "#A7CDFA")
.style("color", "#B380BA")
.style("border", "solid")
.style("border-color", "#A89ED6")
.style("padding", "5px")
.style("border-radius", "2px")
.style("visibility", "hidden");
Now edit the chart cell from before by adding the following tooltip code. On a mouseover event the tooltip is displayed and shows the word with how frequently the word appears in Taylor Swift songs. When the mouse moves while hovering over a rectangle in the bar chart, so does the tooltip and its text.
{
const svg = d3.create("svg")
.attr("viewBox", [0, 0, width, height]);
// Call tooltip
tooltip;
svg.append("g")
.selectAll("rect")
.data(final)
.enter().append("rect")
.attr('x', d => x(d.lyric))
.attr('y', d => y(d.freq))
.attr('width', x.bandwidth())
.attr('height', d => y(0) - y(d.freq))
.attr("fill", "#CEBEDE")
.attr("stroke", "#FFB9EC")
.attr("stroke-width", 1)
.on("mouseover", function(d) {
tooltip.style("visibility", "visible").text(d.lyric + ": " + d.freq);
d3.select(this).attr("fill", "#FDE5BD");
})
.on("mousemove", d => tooltip.style("top", (d3.event.pageY-10)+"px").style("left",(d3.event.pageX+10)+"px").text(d.lyric + ": " + d.freq))
.on("mouseout", function(d) {
tooltip.style("visibility", "hidden");
d3.select(this)
.attr("fill", "#CEBEDE")
});
svg.append("g")
.call(xAxis);
svg.append("g")
.call(yAxis);
svg.call(yTitle);
return svg.node();
}
You should see:
Tada! Now if you hover over a bar, you can see the exact value. If you want to see the complete code you can play around with the published Observable notebook here.
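The tooltip logic in the handlers boils down to two small computations: the label text and the position offset from the pointer. The sketch below pulls them out as plain functions so they can be checked without a browser. The helper names tooltipText and tooltipPosition, and the sample datum and coordinates, are hypothetical.

```javascript
// Text shown in the tooltip for one datum of the form {lyric, freq}.
const tooltipText = d => `${d.lyric}: ${d.freq}`;

// Offset the tooltip from the pointer, mirroring the mousemove handler:
// 10px above and 10px to the right of the cursor.
const tooltipPosition = (pageX, pageY) => ({
  top: `${pageY - 10}px`,
  left: `${pageX + 10}px`
});

// Hypothetical datum and pointer coordinates.
console.log(tooltipText({ lyric: "time", freq: 9 }));
console.log(tooltipPosition(200, 120));
```

Note that the handlers above read the pointer position from d3.event, which exists in D3 v5 (the version this notebook requires). D3 v6 and later pass the event as the first argument to the listener instead, so the handlers would need adjusting after an upgrade.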
What's next for data visualizations?
You don't need to use Observable notebooks to make data visualizations in JavaScript: you can use D3.js and other data visualization libraries in your preferred text editor too, and then display them in a webpage. However, Observable is a handy tool that lets you view code output quickly and can help make building and sharing demos easier. You can use other datasets as well, such as the different datasets here on Kaggle, and be sure to ask yourself these 5 questions before working with a dataset. Let me know online what you're building!
- GitHub: elizabethsiegle
- Twitter: @lizziepika
- email: lsiegle@twilio.com