How to Use Every Core on Your Machine Using NodeJS
Each job takes seconds to complete, which is expensive in the long run. Now 3000 jobs finish in less than a minute! This is the final result.
Background
You have probably used other languages that have developer-friendly ways to multitask complex jobs. Unfortunately, doing this in JavaScript has always been complicated.
For the longest time, JavaScript and NodeJS were limited by the event loop. Code executes asynchronously, but not in true parallel fashion. However, that changed with the release of worker threads in NodeJS.
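For example, the built-in worker_threads module lets a script run on a separate thread and post its result back to the main thread (the file names here are just for illustration):

```js
// main.js: spawn a worker that runs on another thread.
const { Worker } = require("worker_threads");

const worker = new Worker("./heavy-job.js", { workerData: 40 });
worker.on("message", (result) => console.log("result:", result));
```

```js
// heavy-job.js: CPU-bound work stays off the main thread.
const { parentPort, workerData } = require("worker_threads");

let total = 0;
for (let i = 0; i < workerData * 1e6; i++) total += i; // simulated heavy work
parentPort.postMessage(total);
```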
After discovering this concept, I immediately wanted to test its full capability. Unfortunately, the existing libraries are overly complex and/or lack true parallel capabilities.
Goal
I want a package that is perfect for small projects. Something that provides a job queue without relying on databases or the filesystem, while delivering obvious performance benefits.
Problem
Many packages are half-baked implementations of concurrency. For example, some packages have code that looks like this.
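Roughly, the pattern is to chop the job list into chunks and await each chunk with Promise.all. The sketch below uses placeholder job functions and is not any specific package's code:

```js
// Naive "pool": run the jobs in chunks of THREAD_COUNT.
const THREAD_COUNT = 8;

async function runAll(jobs) {
  const results = [];
  for (let i = 0; i < jobs.length; i += THREAD_COUNT) {
    const chunk = jobs.slice(i, i + THREAD_COUNT);
    // Every chunk waits for its slowest job before the next chunk starts.
    results.push(...(await Promise.all(chunk.map((job) => job()))));
  }
  return results;
}
```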
The above code is incorrect because it leaves out some common edge cases:
- What happens if the pool must terminate abruptly?
- What happens if the number of jobs is fewer than the thread count?
- What if one job takes significantly longer than the others?
The last question is the nail in the coffin. If most jobs take 2 seconds to process, but one takes 3 hours, then the entire pool must wait for 3 hours until all the workers are freed up.
Some libraries work around this problem by spawning additional workers, but that means the developer lacks full control over the number of workers. The pool should be deterministic.
Initial Solutions
Since Promise.all blocks until every job settles, I immediately thought that Promise.any or Promise.race must be the answer to true parallelism, but I was wrong. Actually, no Promise method alone is sufficient for multitasking.
Still, between the two, Promise.race looked like the stronger candidate, because Promise.any is flawed: it must successfully complete at least one promise, or wait for all of them to fail. What happens if every job fails except one that takes 3 hours? Again, the entire pool must wait 3 hours before that job completes or an AggregateError is thrown.
Unfortunately, Promise.race is not the correct solution either. Sure, it solves the problem of hanging workers, but there is another edge case: how do you retrieve the results from multiple workers if the quickest promise is the only one handled? After all, quick is not always right.
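For example, a race only ever delivers the first promise to settle; every other result is silently dropped at the call site:

```js
const fast = new Promise((resolve) => setTimeout(() => resolve("fast"), 10));
const slow = new Promise((resolve) => setTimeout(() => resolve("slow"), 50));

Promise.race([fast, slow]).then((winner) => {
  console.log(winner); // "fast"; the value of `slow` never arrives here
});
```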
Jobs Hold the Thread
The solution to the Promise.race problem is the workers themselves. It does not matter when the promise resolves, because the worker keeps running in the background.
My solution: every worker takes a thread id from the pool, and when it finishes executing, it gives the id back. This allows the pool to dynamically allocate threads.
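A minimal sketch of that idea with plain worker_threads (illustrative only, not jpool's actual source) might look like this:

```js
// Each job borrows a thread id; when its worker finishes, the id goes back.
const { Worker } = require("worker_threads");

async function runPool(jobs, threadCount, workerFile) {
  const results = new Array(jobs.length);
  const freeIds = Array.from({ length: threadCount }, (_, i) => i);
  const waiters = []; // jobs waiting for a free thread id

  // Borrow an id now if one is free, otherwise wait in line for one.
  const takeId = () =>
    freeIds.length > 0
      ? Promise.resolve(freeIds.pop())
      : new Promise((resolve) => waiters.push(resolve));

  // Hand the id straight to the next waiting job, or return it to the pool.
  const giveBack = (id) => {
    const waiter = waiters.shift();
    if (waiter) waiter(id);
    else freeIds.push(id);
  };

  // Note: in this sketch a single failed job rejects the whole run.
  await Promise.all(
    jobs.map(async (jobData, index) => {
      const id = await takeId();
      try {
        results[index] = await new Promise((resolve, reject) => {
          const worker = new Worker(workerFile, { workerData: jobData });
          worker.once("message", resolve);
          worker.once("error", reject);
        });
      } finally {
        giveBack(id); // free the id even if this job failed
      }
    })
  );
  return results;
}
```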
Halting
The last goal is halting all pool execution. Even if there is a 3-hour-long job running, it must halt immediately. Honestly, this was more difficult to figure out than the promise problems above.
My first instinct was to reject the promise, but this is problematic. I noticed that passing a reason through the reject call means Promise.race can only resolve one reason, yet waiting for every reason puts me right back at the drawing board.
Even worse, rejecting the promise lets the main event loop terminate, but the workers turn into zombies! Three hours later, worker output is still clogging your terminal!
Thankfully, I made a discovery: the pool must explicitly terminate each worker. This makes the termination process completely deterministic, so no data is compromised. The halt promise resolves after the job promise race settles.
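In plain worker_threads terms, the idea looks roughly like this sketch (not jpool's internals): keep a handle on every active worker and call terminate() on each one.

```js
// Sketch: halt a pool by explicitly terminating every active worker.
// `workers` is assumed to be the array of Worker instances the pool tracks.
async function halt(workers) {
  // worker.terminate() returns a promise that resolves once the worker's
  // event loop has stopped, so no zombie output clogs the terminal later.
  await Promise.all(workers.map((worker) => worker.terminate()));
}
```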
Project Success!
All the tests pass and I met my goals! The pool of workers executes jobs asynchronously without any external tools. It's on NPM. If you are interested in how to use the library, keep reading!
npm install jpool
Features
The number of threads is variable, and all states are deterministic. A job will either pass, fail, or halt. This allows the pool to gracefully shut down or quit abruptly without zombies or runaway processes.
Basic Example (Main.js)
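The example below is only an illustrative sketch of how such a pool is typically wired up; the actual jpool names and options may differ, so treat every identifier here as a placeholder.

```js
// main.js (illustrative placeholder, not the documented jpool API)
const { Pool } = require("jpool"); // hypothetical export name

// Hypothetical options: thread count and the job script to run.
const pool = new Pool({ threads: 8, workerFile: "./job.js" });

const jobs = Array.from({ length: 3000 }, (_, i) => ({ id: i }));

pool.run(jobs).then((results) => {
  console.log("finished", results.length, "jobs");
});
```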
Cont. Example (Job.js)
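And a matching job script, again only a sketch built on worker_threads primitives (the real jpool job contract may look different):

```js
// job.js (illustrative placeholder)
const { parentPort, workerData } = require("worker_threads");

// Pretend each job takes a couple of seconds, then report back.
setTimeout(() => {
  parentPort.postMessage({ id: workerData.id, ok: true });
}, 2000);
```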
See the Difference!
Each terminal window is processing the same set of jobs. From left to right, the programs use 1, 8, and 256 workers. Threads increase memory usage, but the benefits are worth it!
The end
The documentation needs work; otherwise, the package seems stable for v1.0.0. If you want to help, I am accepting PRs. Thank you for reading!