So, you want to have every change made to the data in a Salesforce object stored... forever?
Let's explore how we used out-of-the-box tools and a little code to build a time machine, and what it looked like after I ran it on test data for several months.
First, a bit about our test data environment: we are using a test org with randomly generated accounts, plus opportunities associated with those accounts. We also have a service that runs every few minutes and makes changes to a small, random subset of those accounts.
The first step was to set up a collector app. This app emits all of the changes as messages into Kafka. The steps (with a CLI sketch after this list) were:
- Set up a Heroku App in a Private Space
- Add a Heroku Private Postgres Database
- Add Heroku Connect
- Choose and Configure the objects to sync in Heroku Connect
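If you are scripting this with the Heroku CLI, it looks roughly like the following. The app, space, and plan names are placeholders, and your plan choices will likely differ - treat this as a sketch of the steps above, not the exact commands we ran.

# Create the collector app inside an existing Private Space (names are placeholders)
heroku apps:create collector-app --space my-private-space

# Attach a Heroku Postgres database on a Private Space plan
heroku addons:create heroku-postgresql:private-0 --app collector-app

# Add Heroku Connect, then map the Account and Opportunity objects in its dashboard
heroku addons:create herokuconnect --app collector-app

Heroku Connect's object and field mappings are configured in its dashboard after the add-on is provisioned.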
We then needed to stream the changes somewhere (again, a CLI sketch follows this list):
- Add a Heroku Private Kafka cluster to the app
- Configure the Streaming Data Connector to point at the Heroku Connect tables
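From the CLI, that looks roughly like this. The plan name, Kafka add-on name, and table list are placeholders, and the data:connectors commands come from Heroku's data connectors CLI plugin (heroku plugins:install @heroku-cli/plugin-data-connectors):

# Add Apache Kafka on Heroku, using a Private Space plan (plan name is a placeholder)
heroku addons:create heroku-kafka:private-standard-1 --app collector-app

# Point a streaming data connector at the Heroku Connect tables,
# using the Postgres add-on as the source and the Kafka add-on as the store
heroku data:connectors:create \
  --source postgresql-rigid-10449 \
  --store kafka-placeholder-00000 \
  --table salesforce.account \
  --table salesforce.opportunity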
We were then mirroring our Salesforce data into Heroku Postgres, performing CDC on the data, and streaming the changes into Kafka.
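Each change arrives on a Kafka topic as a change event. Heroku's Streaming Data Connectors are built on Debezium, so an update event looks roughly like the following (heavily trimmed here - the exact envelope and field list depend on your connector configuration):

{
  "payload": {
    "op": "u",
    "ts_ms": 1600000000000,
    "before": { "sfid": "001f400001MDeYzAAL", "score__c": 1, ... },
    "after":  { "sfid": "001f400001MDeYzAAL", "score__c": 15, ... },
    "source": { "schema": "salesforce", "table": "account", ... }
  }
}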
We decided to use Node to listen to the event stream in Kafka, take each change, and insert it as a new row into a table in Postgres.
- Created another Heroku App in the Private Space
- Share the Kafka add-on with the new app
- Add a Heroku Private Postgres database
- Create a table in the Postgres database, ensuring that the constraints allow duplicate values in fields like External_id__c and sfid
- Write code that listens to Kafka for events and inserts each event as a new record into the TARDIS ..um.. Postgres (a sketch follows below)
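Here is a minimal sketch of what that listener could look like in Node. It assumes the kafkajs and pg packages and the standard Heroku config vars (KAFKA_URL, KAFKA_CLIENT_CERT, KAFKA_CLIENT_CERT_KEY, KAFKA_TRUSTED_CERT, DATABASE_URL); the topic name and the column list are placeholders for whatever your connector and table actually use, and this is not the exact code we ran.

// listener.js - consume change events from Kafka and append them to Postgres (sketch)
const { Kafka } = require('kafkajs');
const { Pool } = require('pg');

// Apache Kafka on Heroku uses SSL with a client cert/key supplied as config vars
const kafka = new Kafka({
  clientId: 'time-machine-listener',
  brokers: process.env.KAFKA_URL.replace(/kafka\+ssl:\/\//g, '').split(','),
  ssl: {
    cert: process.env.KAFKA_CLIENT_CERT,
    key: process.env.KAFKA_CLIENT_CERT_KEY,
    ca: process.env.KAFKA_TRUSTED_CERT,
  },
});

const pool = new Pool({ connectionString: process.env.DATABASE_URL, ssl: { rejectUnauthorized: false } });

async function main() {
  const consumer = kafka.consumer({ groupId: 'time-machine' });
  await consumer.connect();
  // Topic name is a placeholder - use the topic your streaming data connector created
  await consumer.subscribe({ topic: 'connector.salesforce.account' });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value.toString());
      const row = event.payload && event.payload.after;
      if (!row) return; // skip events without an "after" image (e.g. deletes)

      // Append-only insert: no unique constraint on sfid or External_id__c,
      // so every change simply becomes another record
      await pool.query(
        'INSERT INTO account (name, phone, score__c, sfid, external_id__c) VALUES ($1, $2, $3, $4, $5)',
        [row.name, row.phone, row.score__c, row.sfid, row.external_id__c]
      );
    },
  });
}

main().catch((err) => { console.error(err); process.exit(1); });

In this sketch the table's id column is just an auto-incrementing primary key, which is what lets us order the change history by id later on.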
This is not only scalable to many objects in a single Salesforce org, it's also scalable to many Salesforce orgs - including Work.com, Service Cloud, and Sales Cloud. Not only can you easily provide visibility across orgs, but you are capturing all changes, so you can report on how records have evolved over time, as well as how records relate to each other in different orgs.
What are the results now that we have all of this data flowing?
Looking at the collector app first: we have just over 500,000 accounts in total, and we are using only about 350 MB of storage. Picking an account at random, we can see that we have the most up-to-date info for that account in Postgres.
~ ☯ heroku pg:psql --app ggn-pg2k-test
--> Connecting to postgresql-rigid-10449
psql (12.4)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
ggn-pg2k-test::DATABASE=> SELECT pg_size_pretty(pg_total_relation_size('"salesforce"."account"'));
pg_size_pretty
----------------
349 MB
(1 row)
ggn-pg2k-test::DATABASE=> select count(*) from salesforce.account;
count
--------
501646
(1 row)
ggn-pg2k-test::DATABASE=> select name, sfid, score__c from salesforce.account
where External_id__c = '7f4acaad-6c77-4342-a533-83bb580aaf681586878813' ORDER BY score__c;
name | sfid | score__c
--------------------------+--------------------+----------
Borer, Braun and Gutmann | 001f400001MDeYzAAL | 2
(1 row)
Switching over to the time machine app, we have a total of 8,857,770 records for the same data source - nearly 18 stored changes per record, on average. Looking in detail at a single account, we can see a total of 21 changes on it.
The total data storage for all of the records and changes comes to just over 4 GB.
~ ☯ heroku pg:psql --app pg2k-gregs-drain
--> Connecting to postgresql-rugged-63570
psql (12.4, server 12.3 (Ubuntu 12.3-1.pgdg16.04+1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
pg2k-gregs-drain::DATABASE=> SELECT pg_size_pretty(pg_total_relation_size('"account"'));
pg_size_pretty
----------------
4274 MB
(1 row)
pg2k-gregs-drain::DATABASE=> select count(*) from account;
count
---------
8857770
(1 row)
pg2k-gregs-drain::DATABASE=> select name, phone, score__c, sfid, id from account
where External_id__c = '7f4acaad-6c77-4342-a533-83bb580aaf681586878813' ORDER BY id;
name | phone | score__c | sfid | id
--------------------------+---------------------+----------+--------------------+---------
Borer, Braun and Gutmann | 5555551212 | 1 | 001f400001MDeYzAAL | 33351
Borer, Braun and Gutmann | 967-160-7974 x6487 | 15 | 001f400001MDeYzAAL | 34165
Borer, Braun and Gutmann | 967-160-7974 x6487 | 15 | 001f400001MDeYzAAL | 34699
Borer, Braun and Gutmann | 967-160-7974 x6487 | 15 | 001f400001MDeYzAAL | 35523
Borer, Braun and Gutmann | 967-160-7974 x6487 | 15 | 001f400001MDeYzAAL | 376946
Borer, Braun and Gutmann | 967-160-7974 x6487 | 15 | 001f400001MDeYzAAL | 971802
Borer, Braun and Gutmann | 967-160-7974 x6487 | 15 | 001f400001MDeYzAAL | 1041819
Borer, Braun and Gutmann | 5555551212 | 15 | 001f400001MDeYzAAL | 3006621
Borer, Braun and Gutmann | 1-151-183-4668 x379 | 5 | 001f400001MDeYzAAL | 3006628
Borer, Braun and Gutmann | 1-151-183-4668 x379 | 5 | 001f400001MDeYzAAL | 3006634
Borer, Braun and Gutmann | 1-151-183-4668 x379 | 5 | 001f400001MDeYzAAL | 3006648
Borer, Braun and Gutmann | 5555551212 | 5 | 001f400001MDeYzAAL | 5140095
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 5140108
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 5140114
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 5140120
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 5390250
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 5620956
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 7002900
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 7031300
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 7788503
Borer, Braun and Gutmann | 504.976.9652 x047 | 2 | 001f400001MDeYzAAL | 7906202
(21 rows)
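Because every change is retained, reporting on how a record evolved is just a query away. As a sketch (using the columns shown above, and assuming id is a plain auto-incrementing key), a window function lines each change up against the previous one:

SELECT id,
       score__c,
       LAG(score__c) OVER (PARTITION BY sfid ORDER BY id) AS previous_score,
       phone,
       LAG(phone)    OVER (PARTITION BY sfid ORDER BY id) AS previous_phone
FROM account
WHERE sfid = '001f400001MDeYzAAL'
ORDER BY id;

Wrapping that in a subquery and filtering on score__c IS DISTINCT FROM previous_score would return only the rows where that field actually changed.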
This pattern can be used for Salesforce objects via Heroku Connect, or for any table that you have in Heroku Postgres. As you can see, it is super easy to set up - the only code you have to write is the listener that grabs all of the updates and stores them in your target database.
This is really the power of the Heroku platform - using its tools to reduce the amount of work needed to accomplish powerful things, with data as well as with traditional apps.