While I polish my programming skills I also like to learn languages. I learned English (my native language is Portuguese), and now I'm learning Japanese because of my love for Japanese culture. Yesterday I had the idea to use NodeJS to automate a rather boring task that was hindering my learning. Let's start from the beginning.
Background
I have been learning Japanese since 2015. It has been a long journey and I am still far from fluency, but I am at a stage where I can read manga (Japanese comics) with relative ease and books with the assistance of a dictionary. This week I started a new book and decided to give another chance to Anki, a very powerful flashcard application that is very popular among Japanese learners, although it can be used to learn virtually anything. I had already used it before in the same way: I read the book with a dictionary open, and every word I don't know I add to a .txt file to import into Anki afterwards, and then start the memorization process. However, there is a problem, which is probably what made me stop using Anki before. Let's go into that.
Problem
Anki has an import feature where you can make a .txt file declaring both sides of the flashcard separated by a semicolon, like this:
傍ら;side, edge, beside, besides, nearby, while (doing)
飢える;to starve, to thirst, to be hungry
へたり込む;to sit down hard, to sink down to the floor
払い除ける;to ward off, to brush away, to fling off, to drive away
But you have to create this file somehow, and at first I did it manually. I took note of all the words I didn't know, a maximum of 50 per day so I wouldn't have too much to learn at once, and after that I went to the dictionary and copied the meaning to the other side of the flashcard. Furthermore, in Japanese there are three types of characters: hiragana, katakana and kanji. Simply put, kanji represent ideas (for example, 愛 means love), while hiragana and katakana represent sounds and are used to describe how kanji must be read. Using 愛 as an example, its reading in hiragana is あい, which in our alphabet is written as ai. For a more detailed explanation you can refer to Wikipedia, which has a very good summary of how it works. Therefore, it is also important to remember how words with kanji are read, so I had to make another file like the one below, with the words and their readings separated by a semicolon:
傍ら;かたわら
飢える;うえる
へたり込む;へたりこむ
払い除ける;はらいのける
The problem is that this manual task is very boring and time consuming. I had to copy every word, look it up in the dictionary, copy the meaning and the reading, and after that import everything into Anki. The dictionary is digital, so it was a matter of Ctrl+C and Ctrl+V, but it still took 30 minutes or so to get 50 words ready. It is also error prone, since I can confuse a reading with a meaning, paste it into the wrong file, or mix up words and meanings by putting them in the wrong rows. I had to do something to improve this experience and make reading fun again, so I came up with the idea of writing a script to do it for me.
Solution
Since it was a relatively simple script, I decided to take this opportunity to practice NodeJS, which I'm learning right now. However, it is not as simple as it looks, since it is necessary to have a dictionary to feed the application. Luckily, I had a dictionary sitting in DynamoDB tables that I created for another project, with Lambda and API Gateway to access it. Hopefully, in the near future I can talk about that project as well, but for now assume that the script has access to an API that returns the words found for the term given as a parameter, like this: example.com?term=愛.
With this major problem solved, it was just a matter of calling the API, parsing the response and writing the files (the loop that ties everything together is sketched at the end of this section). The entire script was made using just three libraries:
- axios: HTTP client library used to call the API. I had very good experiences with it in the past, since it seems much more straightforward than the others I have tried.
- fs: standard library for file I/O in NodeJS.
- progress: shows a progress bar so the script feels more responsive while the work is being done.
First I declared some variables: I read the content of the input file, a file with one word per line, split it and stored the terms in an array to be used later. The variables that will store the results are also declared:
const fs = require('fs');

let input = fs.readFileSync('input.txt', {encoding: 'utf8'});
let terms = input.split('\r\n');
let outputReading = "";
let outputMeaning = "";
Then I created an axios instance and the function that I use to call the API and get the word I want:
const axios = require('axios');

const instance = axios.create({
    baseURL: "https://api.example.com",
    headers: {'x-api-key': "xxxxxxxxxx"}
});

async function getWord(term){
    const response = await instance.get("/dictionary", {params: {term: term}});
    return response.data.body[0];
}
In this function I call the API and return the first result from the body of the response. The body is an array with the possible results for the search. A simplified description of the schema is as follows:
{
"statusCode": 200,
"body": [
{
"Id": 1,
"kanji": [],
"kana": [],
"sense": [
{
"gloss":[]
}
]
}
]
}
The response has more elements detailing the entire word, but what is important for the problem I was trying to solve is the following:
- kana: an array with all the readings of the word. A word can have more than one reading, but the first one in the array is the most popular and generally the one I'm looking for.
- sense: an array with the meanings and their information: part of speech, dialect, related words, antonyms, etc. A word can have different meanings, and one meaning can contain several glosses that are synonyms of each other.
- gloss: the synonyms are stored here, also in an array.
All the objects stored in the arrays mentioned above have a text field where the information we are interested in is stored. Going back to our previous example with the word 愛, this is what the response looks like in a summarized way:
{
"statusCode": 200,
"body": [{
"kanji": [{
"common": 1,
"text": "愛",
"tags": []
}],
"kana": [{
"appliesToKanji": ["*"],
"text": "あい",
"common": 1,
"tags": []
}],
"Id": 1150410,
"sense": [{
"gloss": [{
"lang": "eng",
"text": "love"
}, {
"lang": "eng",
"text": "affection"
}, {
"lang": "eng",
"text": "care"
}]
}, {
"gloss": [{
"lang": "eng",
"text": "attachment"
}, {
"lang": "eng",
"text": "craving"
}, {
"lang": "eng",
"text": "desire"
}]
}]
}]
}
After getting the response, to handle it and produce the result in the format I want, I created two functions, one for the meanings and one for the readings. Below is the handleMeanings function as an example:
function handleMeanings(term, word){
    let meaningsArray = [];
    // Collect the text of every gloss from every sense of the word
    for(const sense of word.sense){
        for(const gloss of sense.gloss){
            meaningsArray.push(gloss.text);
        }
    }
    // Join everything into a single line in Anki's "term;meanings" format
    let joinedMeanings = meaningsArray.join(", ");
    let result = term + ";" + joinedMeanings + "\r\n";
    return result;
}
For each sense I iterate through its gloss list and push each text to an array, then I join everything. Pretty simple, and it's just what I want.
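The readings handler follows the same idea but is even simpler. I won't go through it line by line; a minimal sketch, assuming the first kana entry is the reading I want (as explained above), would be:

function handleReadings(term, word){
    // Assume the first kana entry is the most common reading of the word
    let reading = word.kana[0].text;
    return term + ";" + reading + "\r\n";
}

Finally, everything is tied together in a loop that goes through the terms, calls the API, accumulates the two outputs and writes the files, ticking the progress bar along the way. This is a simplified sketch (the output file names are just examples):

const ProgressBar = require('progress');

async function main(){
    const bar = new ProgressBar(':current/:total :bar', { total: terms.length });

    for(const term of terms){
        const word = await getWord(term);
        outputMeaning += handleMeanings(term, word);
        outputReading += handleReadings(term, word);
        bar.tick();
    }

    // One file for the meanings deck and one for the readings deck
    fs.writeFileSync('meanings.txt', outputMeaning);
    fs.writeFileSync('readings.txt', outputReading);
}

main();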
Conclusion
For the people who saw the title and the "scary" image and thought it was something much more complex, I'm sorry. It was very simple and even anticlimactic, but it is really helping me keep up with my studies. Now the problem is doing all the reviews. I will try my best! :D
If you think something can be coded better, please let me know. NodeJS is still new to me!