While I polish my programming skills I also like to learn languages. I learned English (my native language is Portuguese), and now I'm learning Japanese because of my love for Japanese culture. Yesterday I had the idea to use NodeJS to automate a rather boring task that was hindering my learning. Let's start from the beginning.
Background
I have been learning Japanese since 2015. It has been a long journey and I am still far from fluency, but I am at a stage where I can read manga (Japanese comics) with relative ease and books with the assistance of a dictionary. This week I started a new book and decided to give another chance to Anki, a very powerful flashcard application that is very popular among Japanese learners, although it can be used to learn virtually anything. I had already used it before in the same way: I read the book with a dictionary open, and every word I don't know I add to a .txt file to import into Anki afterwards, and then start the memorization process. However, there is a problem, which is probably what made me stop using Anki before. Let's go into that.
Problem
Anki has an import feature where you can make a .txt file declaring both sides of the flashcard separated by a semicolon, like this:
傍ら;side, edge, beside, besides, nearby, while (doing)
飢える;to starve, to thirst, to be hungry
へたり込む;to sit down hard, to sink down to the floor
払い除ける;to ward off, to brush away, to fling off, to drive away
But you have to create this file somehow, and at first I did it manually. I took note of all the words I didn't know, a maximum of 50 per day so I wouldn't have too much to learn at once, and after that I went to the dictionary and copied the meaning to the other side of the flashcard. Furthermore, in Japanese there are three types of characters: hiragana, katakana and kanji. Simply put, kanji represent ideas (for example, 愛 means love), while hiragana and katakana represent sounds and are used to describe how kanji must be read. Using 愛 as an example, its reading in hiragana is あい, which in our alphabet is written as ai. For a more detailed explanation you can refer to Wikipedia, which has a very good summary of how it works. Therefore, it is also important to remember how words with kanji are read, so I had to make another file like the one below, with the words and their readings separated by a semicolon:
傍ら;かたわら
飢える;うえる
へたり込む;へたりこむ
払い除ける;はらいのける
The problem is that this manual task is very boring and time consuming. I had to copy every word, look it up in the dictionary, copy the meaning and the reading, and after that import everything into Anki. The dictionary is digital, so it was a matter of Ctrl+C and Ctrl+V, but it still took 30 minutes or so to get 50 words ready. It is also error prone, since I can confuse a reading with a meaning, paste it into the wrong file, or mix up words and meanings by putting them in the wrong rows. I had to do something to improve this experience and make reading fun again, so I came up with the idea of writing a script to do it for me.
Solution
Since it was a relatively simple script, I decided to take this opportunity to practice NodeJS, which I'm learning right now. However, it is not as simple as it looks, since it is necessary to have a dictionary to feed the application. Luckily, I had a dictionary sitting in DynamoDB tables that I created for another project, with Lambda and API Gateway to access it. Hopefully, in the near future I can talk about that project as well, but for now assume that the script has access to an API that returns the words found for the term given as a parameter, like this: example.com?term=愛.
With this major problem solved, it was just a matter of calling the API, parsing the response and writing the files (the loop that ties everything together is sketched at the end of this section). The entire script was made using just three libraries:
- axios: HTTP client library used to call the API. I had very good experiences with it in the past, since it seems much more straightforward than the others I have tried.
- fs: standard library for file I/O in NodeJS.
- progress: shows a progress bar so the script feels more responsive while the work is being done.
First I declared some variables: I read the content of the input file, a file with one word per line, split it and stored the terms in an array to be used later. The variables that will store the results are also declared:
const fs = require('fs');

let input = fs.readFileSync('input.txt', {encoding: 'utf8'});
let terms = input.split('\r\n');
let outputReading = "";
let outputMeaning = "";
Then I created an axios instance and the function that I use to call the API and get the word I want:
const axios = require('axios');

const instance = axios.create({
    baseURL: "https://api.example.com",
    headers: {'x-api-key': "xxxxxxxxxx"}
});

async function getWord(term){
    const response = await instance.get("/dictionary", {params: {term: term}});
    return response.data.body[0];
}
In this function I call the API and return the first result from the body of the response. The body is an array with the possible results for the search. A simplified description of the schema is as follows:
{
"statusCode": 200,
"body": [
{
"Id": 1,
"kanji": [],
"kana": [],
"sense": [
{
"gloss":[]
}
]
}
]
}
The response has more elements detailing the entire word, but what is important for the problem I was trying to solve is the following:
- kana: an array with all the readings of the word. A word can have more than one reading, but the first one in the array is the most popular and generally the one I'm looking for.
- sense: an array with the meanings and their information: part of speech, dialect, related words, antonyms, etc. A word can have different meanings, and one meaning can contain several glosses that are synonyms of each other.
- gloss: the synonyms are stored here, also in an array.
All the objects stored in the arrays mentioned above have a text field where the information we are interested in is stored. Going back to our previous example with the word 愛, this is what the response looks like in a summarized way:
{
"statusCode": 200,
"body": [{
"kanji": [{
"common": 1,
"text": "愛",
"tags": []
}],
"kana": [{
"appliesToKanji": ["*"],
"text": "あい",
"common": 1,
"tags": []
}],
"Id": 1150410,
"sense": [{
"gloss": [{
"lang": "eng",
"text": "love"
}, {
"lang": "eng",
"text": "affection"
}, {
"lang": "eng",
"text": "care"
}]
}, {
"gloss": [{
"lang": "eng",
"text": "attachment"
}, {
"lang": "eng",
"text": "craving"
}, {
"lang": "eng",
"text": "desire"
}]
}]
}]
}
After getting the response, to handle it and produce the result in the format I want, I created two functions, one for the meanings and one for the readings. Below is the handleMeanings function as an example:
function handleMeanings(term, word){
    let meaningsArray = [];
    // Collect the text of every gloss from every sense of the word
    for(const sense of word.sense){
        for(const gloss of sense.gloss){
            meaningsArray.push(gloss.text);
        }
    }
    // Join everything into a single line in Anki's "term;meanings" format
    let joinedMeanings = meaningsArray.join(", ");
    let result = term + ";" + joinedMeanings + "\r\n";
    return result;
}
For each sense I iterate through its gloss list and push each text to an array, then I join everything. Pretty simple, and it's just what I want.
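The readings handler follows the same idea but is even simpler. I won't go through it line by line; a minimal sketch, assuming the first kana entry is the reading I want (as explained above), would be:

function handleReadings(term, word){
    // Assume the first kana entry is the most common reading of the word
    let reading = word.kana[0].text;
    return term + ";" + reading + "\r\n";
}

Finally, everything is tied together in a loop that goes through the terms, calls the API, accumulates the two outputs and writes the files, ticking the progress bar along the way. This is a simplified sketch (the output file names are just examples):

const ProgressBar = require('progress');

async function main(){
    const bar = new ProgressBar(':current/:total :bar', { total: terms.length });

    for(const term of terms){
        const word = await getWord(term);
        outputMeaning += handleMeanings(term, word);
        outputReading += handleReadings(term, word);
        bar.tick();
    }

    // One file for the meanings deck and one for the readings deck
    fs.writeFileSync('meanings.txt', outputMeaning);
    fs.writeFileSync('readings.txt', outputReading);
}

main();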
Conclusion
For the people who saw the title and the "scary" image and thought it was something much more complex, I'm sorry. It was very simple and even anticlimactic, but it is really helping me keep up with my studies. Now the problem is doing all the reviews. I will try my best! :D
If you think something can be coded better, please let me know. NodeJS is still new to me!