Parallel processing plays a vital role in compute-heavy applications. For example, consider an application that determines whether a given number is prime. The straightforward approach is trial division: you test every candidate divisor from 2 up to the square root of the number, which becomes time-consuming and compute-heavy for large inputs.
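As a rough sketch of the check described above, trial division only needs to test divisors up to the square root of the number. The function name isPrime is illustrative and does not come from any library:

```javascript
// Trial division: n is prime if no integer in [2, sqrt(n)] divides it.
// isPrime is a hypothetical helper name used only for this sketch.
function isPrime(n) {
  if (!Number.isInteger(n) || n < 2) return false;
  if (n % 2 === 0) return n === 2; // 2 is the only even prime
  // Only odd candidates remain; stop once d * d exceeds n.
  for (let d = 3; d * d <= n; d += 2) {
    if (n % d === 0) return false;
  }
  return true;
}

console.log(isPrime(97)); // true
console.log(isPrime(91)); // false (91 = 7 * 13)
```

The loop runs on the calling thread from start to finish, which is exactly why a large input can hold up everything else.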
So, if you're building such compute-heavy apps on Node.js, you'll be blocking the running thread for a potentially long time. Due to Node.js's single-threaded nature, compute-heavy operations that do not involve I/O will cause the application to halt until this task is finished.
Therefore, there's a chance that you'll stay away from Node.js when building software that needs to perform such tasks. However, Node.js has introduced the concept of Worker Threads and Child Processes to help with parallel processing in your Node.js app so that you can execute specific processes in parallel. In this article, we will understand both concepts and discuss when it would be useful to employ each of them.
Node.js Worker Threads
What are worker threads in Node.js?
Node.js is capable of handling I/O operations efficiently. However, when it runs into any compute-heavy operation, it causes the primary event loop to freeze up.
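To see the freeze in practice, here is a minimal sketch. The helper busyWork is a hypothetical stand-in for any CPU-bound task; it simply spins for the requested number of milliseconds. A timer scheduled for 10 ms cannot fire until the synchronous work returns:

```javascript
// busyWork is a hypothetical stand-in for a CPU-bound task:
// it keeps the thread busy for roughly durationMs milliseconds.
function busyWork(durationMs) {
  const start = Date.now();
  let iterations = 0;
  while (Date.now() - start < durationMs) {
    iterations++;
  }
  return iterations;
}

const scheduledAt = Date.now();

// Asked to run after 10 ms, but the callback can only run once
// the event loop is free again.
setTimeout(() => {
  const delay = Date.now() - scheduledAt;
  console.log(`Timer fired after ${delay} ms (requested 10 ms)`);
}, 10);

// Blocks the event loop for about 200 ms, delaying the timer above.
busyWork(200);
```

On the main thread of a server, the same delay would apply to every incoming request, not just to a timer.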
Figure: The Node.js event loop
When Node.js encounters an asynchronous operation, it offloads it, either to the operating system or to its internal thread pool. However, when it needs to run a compute-heavy operation, it performs it on its primary thread, which causes the app to block until the operation has finished. Therefore, to mitigate this issue, Node.js introduced the concept of Worker Threads to help offload CPU-intensive operations from the primary event loop so that developers can run multiple threads in parallel in a non-blocking manner.
It does this by spinning up an isolated Node.js context that contains its own Node.js runtime, event loop, and event queue, and that runs in its own V8 instance. This context executes separately from the primary event loop, which leaves the primary event loop free to handle other work.
Figure: Worker threads in Node.js
As shown above, Node.js creates independent runtimes as Worker Threads, where each thread executes independently of other threads and communicates its results and status to the parent thread through a messaging channel. This allows the parent thread to continue performing its functions as usual (without being blocked). By doing so, you're able to achieve multi-threading in Node.js.
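The messaging channel described above can be sketched roughly as follows. To keep the example in a single file, the worker's source is passed as a string with the eval option; in a real project it would more likely live in its own file. The names workerSource and sumInWorker are hypothetical:

```javascript
const { Worker } = require("node:worker_threads");

// Worker code, inlined as a string (eval: true) only to keep this sketch in one file.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  let sum = 0;
  for (let i = 1; i <= workerData.limit; i++) sum += i;
  // Report the result to the parent thread over the message channel.
  parentPort.postMessage({ limit: workerData.limit, sum });
`;

// Hypothetical helper: runs the summation in a worker and resolves with its message.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { limit },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

sumInWorker(1000).then((result) => console.log("From worker:", result));
console.log("Main thread keeps running while the worker computes");
```

The second log line is printed before the worker's result arrives, because the parent thread never waits synchronously for the worker.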
What are the benefits of using Worker Threads in Node.js?
As you can see, using worker threads can be very beneficial for CPU-intensive applications. In fact, it has several advantages:
- Improved performance: You can offload compute-heavy operations to worker threads, which frees up the primary thread and keeps your app responsive enough to serve more requests.
- Improved parallelism: If you have a large task that can be split into subtasks, you can use worker threads to execute them in parallel. For example, to determine whether a very large number is prime, you could have each worker thread check for divisors in its own range (2 to 100,000 in WT1, 100,001 to 200,000 in WT2, and so on). On a multi-core machine, this can speed up your algorithm and result in faster responses.
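The chunking idea from the second bullet can be sketched as below. All function names are hypothetical, and the chunks are checked one after another here purely to keep the sketch short; in the parallel version, each chunk would be handed to its own worker thread instead:

```javascript
// Split the inclusive range [start, end] into at most `parts` non-overlapping chunks.
function splitRange(start, end, parts) {
  const size = Math.ceil((end - start + 1) / parts);
  const chunks = [];
  for (let lo = start; lo <= end; lo += size) {
    chunks.push({ start: lo, end: Math.min(lo + size - 1, end) });
  }
  return chunks;
}

// The unit of work a single worker would receive: look for a divisor in one chunk.
function hasDivisorInRange(n, start, end) {
  for (let d = Math.max(2, start); d <= end; d++) {
    if (n % d === 0 && d !== n) return true;
  }
  return false;
}

// Sequential stand-in for the parallel version: n is prime if no chunk finds a divisor.
function isPrimeChunked(n, parts) {
  if (n < 2) return false;
  const limit = Math.floor(Math.sqrt(n));
  if (limit < 2) return true; // n is 2 or 3
  const chunks = splitRange(2, limit, parts);
  return !chunks.some((c) => hasDivisorInRange(n, c.start, c.end));
}

console.log(splitRange(2, 100, 4));
console.log(isPrimeChunked(10007, 4)); // true
console.log(isPrimeChunked(10001, 4)); // false (10001 = 73 * 137)
```

Because the chunks do not overlap and do not depend on each other, they can be evaluated in any order, which is what makes the task a good fit for worker threads.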
When should you use Worker Threads in Node.js?
If you think about it, you should only use Worker Threads to run compute-heavy operations in isolation from the parent thread.
It's pointless to run I/O operations in a worker thread, as Node.js already handles them asynchronously without blocking the primary thread. So, consider using worker threads when you've got a compute-heavy operation that you need to execute in an isolated environment.
How can you build a Worker Thread in Node.js?
If all of this sounds appealing to you, let's look at how we can implement a Worker Thread in Node.js. Consider the snippet below:
const {
  Worker,
  isMainThread,
  parentPort,
  workerData,
} = require("worker_threads");
const { generatePrimes } = require("./prime");

const threads = new Set();
const number = 999999;

// Split the range 0..number into roughly equal parts, one per thread.
const breakIntoParts = (number, threadCount = 1) => {
  const parts = [];
  const chunkSize = Math.ceil(number / threadCount);
  for (let i = 0; i < number; i += chunkSize) {
    const end = Math.min(i + chunkSize, number);
    parts.push({ start: i, end });
  }
  return parts;
};

if (isMainThread) {
  // Main thread: start one worker per part, each running this same file.
  const parts = breakIntoParts(number, 5);
  parts.forEach((part) => {
    threads.add(
      new Worker(__filename, {
        workerData: {
          start: part.start,
          end: part.end,
        },
      })
    );
  });

  threads.forEach((thread) => {
    thread.on("error", (err) => {
      throw err;
    });
    thread.on("exit", () => {
      threads.delete(thread);
      console.log(`Thread exiting, ${threads.size} running...`);
    });
    thread.on("message", (msg) => {
      console.log(msg);
    });
  });
} else {
  // Worker thread: find the primes in the assigned range and report back.
  const primes = generatePrimes(workerData.start, workerData.end);
  parentPort.postMessage(
    `Primes from - ${workerData.start} to ${workerData.end}: ${primes}`
  );
}
The snippet above showcases an ideal scenario in which you can utilize worker threads. To build a worker thread, you'll need to import Worker, isMainThread, parentPort, and workerData from the worker_threads module. These definitions will be used to create the worker thread.
I've created an algorithm that finds all the prime numbers in a given range. It splits the range into different parts (five parts in the example above) in the main thread and then creates a Worker Thread using new Worker() to handle each part. Because each worker is started with __filename, it runs this same file, but with isMainThread set to false. The worker thread therefore executes the else block, which finds the prime numbers in the range assigned to that worker thread, and finally sends the result back to the parent (main) thread by using parentPort.postMessage().
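The snippet imports generatePrimes from ./prime, a module that is not shown in this article. One possible sketch of that file follows. It assumes a half-open range (start included, end excluded), which fits the way breakIntoParts makes one part's end equal to the next part's start; the real module may behave differently:

```javascript
// prime.js (hypothetical contents; the article does not show this file)

// Trial division up to the square root of n.
function isPrime(n) {
  if (n < 2) return false;
  if (n % 2 === 0) return n === 2;
  for (let d = 3; d * d <= n; d += 2) {
    if (n % d === 0) return false;
  }
  return true;
}

// Assumption: returns every prime p with start <= p < end.
function generatePrimes(start, end) {
  const primes = [];
  for (let n = Math.max(2, start); n < end; n++) {
    if (isPrime(n)) primes.push(n);
  }
  return primes;
}

module.exports = { generatePrimes };

console.log(generatePrimes(0, 20)); // [ 2, 3, 5, 7, 11, 13, 17, 19 ]
```

With a half-open range, adjacent parts never test the same number twice, so no prime is reported by two workers.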
Node.js Child Processes
What are child processes in Node.js?
Child processes are different from worker threads. While worker threads provide an isolated event loop and V8 runtime in the same process, child processes are separate instances of the entire Node.js runtime. Each child process has its own memory space and communicates with the main process through IPC (inter-process communication) techniques such as message passing or piped streams (or, less directly, through files, a database, TCP/UDP sockets, etc.).
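A rough sketch of message-based IPC between a parent and a child process is shown below. To stay in one file, the child's code is passed to a second Node.js process with the -e flag, and an IPC channel is requested through the stdio option of spawn(). The names childSource and doubleInChild are hypothetical:

```javascript
const { spawn } = require("node:child_process");

// Code executed by the child process, inlined only to keep the sketch in one file.
const childSource = `
  process.on("message", (msg) => {
    // Reply over the IPC channel, then close it so the child can exit.
    process.send({ doubled: msg.value * 2 }, () => process.disconnect());
  });
`;

// Hypothetical helper: sends a value to a child process and resolves with its reply.
function doubleInChild(value) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", childSource], {
      // The fourth entry asks Node.js to set up an IPC channel with the child.
      stdio: ["ignore", "inherit", "inherit", "ipc"],
    });
    child.once("message", resolve);
    child.once("error", reject);
    child.send({ value });
  });
}

doubleInChild(21).then((reply) => console.log("Reply from child:", reply));
```

Unlike worker threads, the two sides here share no memory at all: every message is serialized, sent across the process boundary, and rebuilt on the other side.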
What are the benefits of using Child Processes in Node.js?
Using child processes in your Node.js applications brings about a lot of benefits:
- Improved isolation: Each child process runs in its own memory space, providing isolation from the main process. This is advantageous for tasks that may have resource conflicts or dependencies that need to be separated.
- Improved scalability: Child processes distribute tasks among multiple processes, which lets you take advantage of multi-core systems and handle more concurrent requests.
- Improved robustness: If the child process crashes for some reason, it will not crash your main process along with it.
- Running external programs: Child processes let you run external programs or scripts as separate processes. This is useful for scenarios where you need to interact with other executables.
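The robustness point above can be illustrated with a small sketch in which the child deliberately throws an uncaught error. The helper name runCrashingChild is hypothetical, and the child's code is inlined with the -e flag for brevity:

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: starts a child that crashes and resolves with its exit code.
function runCrashingChild() {
  return new Promise((resolve, reject) => {
    const child = spawn(
      process.execPath,
      ["-e", 'throw new Error("simulated crash in child")'],
      { stdio: "ignore" } // discard the child's stack trace output
    );
    child.once("error", reject);
    child.once("exit", (code) => resolve(code));
  });
}

runCrashingChild().then((code) => {
  // The parent observes the failure as an exit code and carries on.
  console.log(`Child exited with code ${code}; parent is still running`);
});
```

The parent only learns about the crash through the exit event, so it can decide whether to log it, retry, or start a replacement process.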
When should you use Child Processes in Node.js?
So, now you know the benefits child processes bring to the picture. It's important to understand when you should use child processes in Node.js. Based on my experience, I'd recommend using a child process when you want to execute an external program in Node.js.
My recent experience included a scenario where I had to run an external executable from within my Node.js service. An external binary can't run inside the Node.js process's own thread; it has to be launched as a separate process. So, I used a child process to execute the binary.
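A minimal sketch of running an external executable is shown below. Since the binary from that scenario is not named here, the example uses the Node.js executable itself (process.execPath) with the --version flag as a stand-in; runExecutable is a hypothetical helper name:

```javascript
const { execFile } = require("node:child_process");

// Hypothetical helper: runs an executable and resolves with its trimmed stdout.
function runExecutable(file, args) {
  return new Promise((resolve, reject) => {
    execFile(file, args, (error, stdout) => {
      if (error) reject(error);
      else resolve(stdout.trim());
    });
  });
}

// Stand-in for a real external binary: ask the Node.js executable for its version.
runExecutable(process.execPath, ["--version"]).then((version) => {
  console.log(`External program printed: ${version}`);
});
```

execFile() launches the file directly without going through a shell, which is usually a reasonable default when the arguments come from your own code.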
How can you build Child Processes in Node.js?
Well, now the fun part. How do you build a child process? There are several ways to create a child process in Node.js (using methods like spawn(), fork(), exec(), and execFile()) and, as always, reading the docs is advisable to get the full picture, but the simplest case of creating child processes is as simple as the script shown below:
const { spawn } = require('child_process');

const child = spawn('node', ['child.js']);

child.stdout.on('data', (data) => {
  console.log(`Child process stdout: ${data}`);
});

child.on('close', (code) => {
  console.log(`Child process exited with code ${code}`);
});
All you have to do is import the spawn() method from the child_process module and then call it with the command to run and an array of its arguments. In our example, we're running the node command on a file named child.js.
The output of the child process is delivered through data events on its stdout stream, while the close handler runs once the process has terminated.
Of course, this is a very minimal and contrived example of using child processes, but it is included here just to illustrate the concept.
How do you choose between worker threads and child processes?
Well, now that you know what child processes and worker threads are, it's important to know when to use either of these techniques. Neither of them is a silver bullet that fits all cases. Both approaches work well for specific conditions.
Use worker threads when:
- You're running CPU-intensive tasks. Worker threads keep that computation off the main thread without the cost of starting a whole new process.
- Your tasks require shared memory and efficient communication between threads. Worker threads have built-in support for shared memory and a messaging system for communication.
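The shared-memory support mentioned above can be sketched with a SharedArrayBuffer that several workers increment through Atomics. The worker source is inlined with the eval option to keep the example in one file, and the names workerSource and countWithWorkers are hypothetical:

```javascript
const { Worker } = require("node:worker_threads");

// Worker code: increments the shared counter using atomic operations.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1);
  }
  parentPort.postMessage("done");
`;

// Hypothetical helper: resolves with the counter value after all workers finish.
function countWithWorkers(workerCount, increments) {
  // A SharedArrayBuffer passed as workerData is shared, not copied.
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);

  const jobs = Array.from({ length: workerCount }, () =>
    new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { shared, increments },
      });
      worker.once("message", resolve);
      worker.once("error", reject);
    })
  );

  return Promise.all(jobs).then(() => Atomics.load(counter, 0));
}

countWithWorkers(4, 1000).then((total) => console.log(total)); // 4000
```

Atomics.add() is used instead of a plain increment because several threads write to the same memory location at the same time.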
Use child processes when:
- You're running tasks that need to be isolated and run independently, especially if they involve external programs or scripts. Each child process runs in its own memory space.
- You need to communicate between processes using IPC mechanisms, such as standard input/output streams, messaging, or events. Child processes are well-suited for this purpose.
Wrapping up
Parallel processing is becoming a vital aspect of modern system design, especially when building applications that deal with very large datasets or compute-intensive tasks. Therefore, it's important to consider Worker Threads and Child Processes when building such apps with Node.js.
If your system isn't designed with the right parallel processing technique, it could perform poorly by exhausting resources, since spawning threads and processes itself consumes a significant amount of memory and CPU time.
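One common way to limit that cost is to cap how many workers or child processes run at the same time, for example at the number of CPU cores. The sketch below shows the idea with a hypothetical runWithLimit helper; the jobs are simple timers standing in for functions that would start a worker or a child process:

```javascript
const os = require("node:os");

// Hypothetical helper: runs async task functions with at most `limit` in flight.
// The default limit (number of CPU cores) is an assumption, not a universal rule.
async function runWithLimit(tasks, limit = Math.max(1, os.cpus().length)) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each runner pulls the next pending task until none are left.
  async function runner() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const runnerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}

// Stand-in jobs: each would normally start a worker thread or child process.
const jobs = [1, 2, 3, 4, 5, 6].map((id) => () =>
  new Promise((resolve) => setTimeout(() => resolve(`job ${id} done`), 10))
);

runWithLimit(jobs, 2).then((results) => console.log(results));
```

Results are stored by index, so they come back in the same order as the tasks regardless of which one finishes first.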
Therefore, it's important for software engineers and architects to verify requirements clearly and select the right tool based on the information presented in this article.
Additionally, you can use tools like Amplication to bootstrap your Node.js applications easily and focus on these parallel processing techniques instead of wasting time on (re)building all the boilerplate code for your Node.js services.