Quick tip: Using WebAssembly to implement Pearson Product-Moment Correlation in SingleStoreDB

Akmal Chaudhri - Feb 22 '23 - - Dev Community

Abstract

In this short article, we'll use WebAssembly to extend SingleStoreDB with Pearson Product-Moment Correlation.

Introduction

Continuing our short series of articles on statistical computations, we'll focus on another example and see how we can utilise WebAssembly with SingleStoreDB.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use Stats Demo Group as our Workspace Group Name and stats-demo as our Workspace Name.

Once we've created our database in the following steps, we'll make a note of our password and host name.

Create a Database

In our SingleStoreDB Cloud account, we'll use the SQL Editor to create a new database, as follows:



CREATE DATABASE IF NOT EXISTS test;


Enter fullscreen mode Exit fullscreen mode

Setup local Wasm development environment

We'll follow the steps described in the previous article to quickly create a local Wasm development environment. We'll also install and use the pushwasm tool.

Next, let's clone the following GitHub repo:



git clone https://github.com/singlestore-labs/singlestoredb-statistics


Enter fullscreen mode Exit fullscreen mode

Compile

We'll now change to the singlestoredb-statistics/correlation directory and build the code, as follows:



cargo build --target wasm32-wasi --release


Enter fullscreen mode Exit fullscreen mode

Deploy

Once the code is built, we'll create an environment variable:



export SINGLESTOREDB_CONNSTRING="mysql://admin:<password>@<host>:3306/test"


Enter fullscreen mode Exit fullscreen mode

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.

Next, we'll use pushwasm to load the Wasm modules into SingleStoreDB, one-by-one:



pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_init

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_iter

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_merge

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_term

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corr2_termd

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_init

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_iter

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_merge

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name corrmat_term

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name vec_pack_f64

pushwasm udf --force --conn $SINGLESTOREDB_CONNSTRING --wit ./correlation.wit --wasm ./target/wasm32-wasi/release/correlation.wasm --name vec_unpack_f64


Enter fullscreen mode Exit fullscreen mode

All the Wasm UDFs should be successfully created.

Load and run SQL

In the file correlation.sql, we'll replace the one occurrence of blob with longblob.

We'll use a MySQL CLI client to connect to SingleStoreDB:



mysql --local-infile -u admin -h <host> -P 3306 --default-auth=mysql_native_password -p


Enter fullscreen mode Exit fullscreen mode

We'll replace the <host> with the value from our SingleStoreDB Cloud account.

Once connected, we'll switch to the test database:



USE test;


Enter fullscreen mode Exit fullscreen mode

We'll then execute the SQL statements from the correlation.sql file, as follows:



SOURCE correlation.sql


Enter fullscreen mode Exit fullscreen mode

This will create and load data into the iris table, and create some additional functions.

We can quickly check the table:



SELECT * FROM iris LIMIT 5;


Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:



+--------------+-------------+--------------+-------------+---------+
| sepal_length | sepal_width | petal_length | petal_width | species |
+--------------+-------------+--------------+-------------+---------+
|          4.7 |         3.2 |          1.3 |         0.2 | setosa  |
|            5 |         3.4 |          1.5 |         0.2 | setosa  |
|          5.4 |         3.9 |          1.3 |         0.4 | setosa  |
|          5.4 |         3.4 |          1.7 |         0.2 | setosa  |
|            5 |           3 |          1.6 |         0.2 | setosa  |
+--------------+-------------+--------------+-------------+---------+


Enter fullscreen mode Exit fullscreen mode

There are two aggregate functions:

  1. corr2d(): Returns the correlation coefficient between two variables
  2. corr2(): Returns more details of the relationship between two variables, including linear regression

Run Wasm in the database

We can test the Wasm functions with some examples from the GitHub repo.

  • Compute the correlation between sepal_width and sepal_length.


SELECT species, corr2d(sepal_width, sepal_length)
FROM iris
GROUP BY species;


Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:



+------------+-----------------------------------+
| species    | corr2d(sepal_width, sepal_length) |
+------------+-----------------------------------+
| virginica  |                0.4572278163941236 |
| versicolor |                 0.525910717282782 |
| setosa     |                0.7467803732639016 |
+------------+-----------------------------------+


Enter fullscreen mode Exit fullscreen mode
  • Compute the correlation between sepal_width and sepal_length and return the results as JSON.


SELECT species, corr2(sepal_width, sepal_length)
FROM iris
GROUP BY species;


Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:



+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| virginica  | {"b0":3.9068364663867405,"b1":0.9015344766688834,"corr":0.4572278163941236,"n":50,"nmiss":0,"r2":0.2090572760845384,"sse":15.670790000392019,"x_avg":2.9739999999999993,"y_avg":6.587999999999999} |
| versicolor | {"b0":3.53973471502592,"b1":0.8650777202072483,"corr":0.525910717282782,"n":50,"nmiss":0,"r2":0.2765820825528903,"sse":9.444365595856668,"x_avg":2.7700000000000005,"y_avg":5.935999999999998}     |
| setosa     | {"b0":2.6446596755601486,"b1":0.6908543956816411,"corr":0.7467803732639016,"n":50,"nmiss":0,"r2":0.5576809258921721,"sse":2.6929269869833408,"x_avg":3.418000000000002,"y_avg":5.005999999999999}  |
+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


Enter fullscreen mode Exit fullscreen mode
  • Compute the matrix of correlation coefficients between sepal_length, sepal_width, and petal_width.


SELECT corrmat(vec_pack_f64([sepal_length, sepal_width, petal_width]))
FROM iris;


Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:



+-----------------------------------------------------------------------+
| corrmat(vec_pack_f64([sepal_length, sepal_width, petal_width])) |
+-----------------------------------------------------------------------+
| [1,-0.10936924995068921,1,0.81795363336916715,-0.35654408961381734,1] |
+-----------------------------------------------------------------------+

Enter fullscreen mode Exit fullscreen mode




Summary

In this short article, we have seen how we can extend SingleStoreDB with Wasm to provide support for statistical correlation.

Acknowledgements

I thank Oliver Schabenberger for his work on the Wasm modules and the code examples and documentation in the GitHub repo.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player