Quick tip: Ibis, Pandas and SingleStoreDB

Akmal Chaudhri - Mar 10 '23 - - Dev Community

Abstract

The Ibis Project provides the power of Python Analytics with SQL. In this short article, we'll see how to use Ibis to connect to SingleStoreDB Cloud and run a few commands to demonstrate the integration.

The notebook file used in this article is available on GitHub.

Introduction

In previous articles, we've discussed various options to connect to SingleStoreDB. These have included using, for example:

Another connection option is the Ibis backend being developed by SingleStore Labs. In this short article, we'll test this implementation with some commands.

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use the Free Shared Tier and take the default names for the Workspace and Database.

Notebook

First, we'll create some imports:

import ibis
import pandas as pd

ibis.options.interactive = True
Enter fullscreen mode Exit fullscreen mode

We'll now read the iris.csv file and look at the data:

url = "https://gist.githubusercontent.com/VeryFatBoy/9af771d443f5ec4dd6eec8d69a062638/raw/c03ef25a97f23a48ee408ac02114195b663a2364/iris.csv"

iris_df = pd.read_csv(url)

iris_df.head(5)
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
Enter fullscreen mode Exit fullscreen mode

Next, let's check the Pandas Dataframe:

iris_df.info()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
Enter fullscreen mode Exit fullscreen mode

We'll now create a connection to SingleStoreDB:

conn = ibis.singlestoredb.connect(connection_url)
Enter fullscreen mode Exit fullscreen mode

Now we'll create a new table in SingleStoreDB using the Pandas Dataframe and then look at the data:

iris_tbl = conn.create_table("iris", iris_df, force = True)

iris_tbl.head(5)
Enter fullscreen mode Exit fullscreen mode

The result should be similar to the following:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ sepal_length ┃ sepal_width ┃ petal_length ┃ petal_width ┃ species         ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ float64      │ float64     │ float64      │ float64     │ string          │
├──────────────┼─────────────┼──────────────┼─────────────┼─────────────────┤
│          4.9 │         3.0 │          1.4 │         0.2 │ Iris-setosa     │
│          5.0 │         3.4 │          1.5 │         0.2 │ Iris-setosa     │
│          4.9 │         3.1 │          1.5 │         0.1 │ Iris-setosa     │
│          4.6 │         3.6 │          1.0 │         0.2 │ Iris-setosa     │
│          7.0 │         3.2 │          4.7 │         1.4 │ Iris-versicolor │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────────────┘
Enter fullscreen mode Exit fullscreen mode

We can get some further details:

iris_tbl.info()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

┏━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┓
┃ name         ┃ type    ┃ nullable ┃ nulls ┃ non_nulls ┃ null_frac ┃ pos  ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━┩
│ string       │ string  │ boolean  │ int64 │ int64     │ float64   │ int8 │
├──────────────┼─────────┼──────────┼───────┼───────────┼───────────┼──────┤
│ sepal_length │ float64 │ True     │     0 │       150 │       0.0 │    0 │
│ sepal_width  │ float64 │ True     │     0 │       150 │       0.0 │    1 │
│ petal_length │ float64 │ True     │     0 │       150 │       0.0 │    2 │
│ petal_width  │ float64 │ True     │     0 │       150 │       0.0 │    3 │
│ species      │ string  │ True     │     0 │       150 │       0.0 │    4 │
└──────────────┴─────────┴──────────┴───────┴───────────┴───────────┴──────┘
Enter fullscreen mode Exit fullscreen mode

Basic schema information can also be viewed:

iris_tbl.schema()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

ibis.Schema {
  sepal_length  float64
  sepal_width   float64
  petal_length  float64
  petal_width   float64
  species       string
}
Enter fullscreen mode Exit fullscreen mode

We can also find the details of the CREATE TABLE statement:

conn.show.create_table("iris")
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

|  Name | CreateTable                                                                                                                                                                                                                                                                                                                                                                                         |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  iris | CREATE TABLE `iris` ( `sepal_length` double DEFAULT NULL, `sepal_width` double DEFAULT NULL, `petal_length` double DEFAULT NULL, `petal_width` double DEFAULT NULL, `species` text CHARACTER SET utf8 COLLATE utf8_general_ci, SORT KEY `__UNORDERED` () , SHARD KEY () ) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES' |
Enter fullscreen mode Exit fullscreen mode

We can count the number of different species:

iris_tbl.species.value_counts()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ species         ┃ species_count ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string          │ int64         │
├─────────────────┼───────────────┤
│ Iris-virginica  │            50 │
│ Iris-setosa     │            50 │
│ Iris-versicolor │            50 │
└─────────────────┴───────────────┘
Enter fullscreen mode Exit fullscreen mode

Filtering is also possible. Here, for example, we want to limit the results by Iris-versicolor where the petal_length is greater than 4.5:

res = iris_tbl[iris_tbl.species.like("Iris-versicolor")][iris_tbl.petal_length > 4.5]
Enter fullscreen mode Exit fullscreen mode

Using the filter we can get a count:

res.count()
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

14
Enter fullscreen mode Exit fullscreen mode

Sorting is also possible. This is in ascending order for petal_length:

res.order_by("petal_length")
Enter fullscreen mode Exit fullscreen mode

The result should be as follows:

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ sepal_length ┃ sepal_width ┃ petal_length ┃ petal_width ┃ species         ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ float64      │ float64     │ float64      │ float64     │ string          │
├──────────────┼─────────────┼──────────────┼─────────────┼─────────────────┤
│          6.6 │         2.9 │          4.6 │         1.3 │ Iris-versicolor │
│          6.5 │         2.8 │          4.6 │         1.5 │ Iris-versicolor │
│          6.1 │         3.0 │          4.6 │         1.4 │ Iris-versicolor │
│          6.3 │         3.3 │          4.7 │         1.6 │ Iris-versicolor │
│          6.1 │         2.8 │          4.7 │         1.2 │ Iris-versicolor │
│          7.0 │         3.2 │          4.7 │         1.4 │ Iris-versicolor │
│          6.1 │         2.9 │          4.7 │         1.4 │ Iris-versicolor │
│          6.7 │         3.1 │          4.7 │         1.5 │ Iris-versicolor │
│          5.9 │         3.2 │          4.8 │         1.8 │ Iris-versicolor │
│          6.8 │         2.8 │          4.8 │         1.4 │ Iris-versicolor │
│            … │           … │            … │           … │ …               │
└──────────────┴─────────────┴──────────────┴─────────────┴─────────────────┘
Enter fullscreen mode Exit fullscreen mode

Finally, we can convert the results back into a Pandas Dataframe, for additional processing, as follows:

another_iris_df = res.execute()
Enter fullscreen mode Exit fullscreen mode

Summary

The Ibis Project provides another way to work with SingleStoreDB using Python and Pandas. The SingleStore Labs GitHub repo contains example notebooks that use other popular datasets. Check it out.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player