Skip to main content

Data operations

Snaplet has four operations for manipulating the data in a snapshot:

  • Transform: Make existing data suitable for development by transforming the original value into a new one
  • Exclude: Remove data in specific tables
  • Reduce (Subset): Capture a subset of data whilst keeping referential integrity intact
  • Generate: Seed values when you don't have any data

These operations are defined as code via config files and JavaScript functions. This gives a development team control over the shape of their data, and introduces a "gitops style workflow" for database snapshot management.

In this guide we're going to focus on transforming and excluding data.

Transforming data

In the previous step Snaplet generated a .snaplet/transform.ts file, where it identified columns that may contain personally identifiable information (PII), and associated a JavaScript function to those columns so that the values in the snasphot are anonymized.

The JavaScript functions are mapped to the structure of your database. As an example, if you have a User table with an email column you can transform the original value to a new one with the following:

// .snaplet/transform.ts
export const config = () => {
return {
public: {
User: ({ row }) => {
return {
email: 'user_' + row.id + '@example.org',
};
},
},
};
};

The function assigned to public.User receives the existing row values in the row variable. Here we used the id value to create a new email address value: "user_1@example.org", "user_2@example.org", etc...

Better fake values with Copycat

Copycat is our open-source library for generating fake-data that includes templates for names, addresses, phone numbers and many other common transformations!

It produces deterministic values, so for any given input it'll always produce the exact same output! For example:

copycat.email('a-real-email@domain.com'); // => beth.cranshaw@example.org
copycat.email('a-real-email@domain.com'); // => beth.cranshaw@example.org
copycat.email('1'); // => jane.maplemoth@example.org

Having predictable, deterministic values is helpful when coding or testing, as your transformed values are always handled consistently.

Exclude

Databases often have tables that contain loads of machine generated data, like logs, that aren't really necessary or helpful during development. Since the code doesn't operate against this data, it can be safely excluded. Associating a false value to a table will prevent Snaplet from copying data. Snaplet will still create the table's structure but skip the data, speeding up both snapshot capture and restoration.

// .snaplet/transform.ts
export const config = () => {
return {
public: {
AuditLog: false,
},
};
};

Reduce (Subset)

When creating a representative snapshot of your database to code against, you will typically need to capture only a small portion of the data in that database. Snaplet lets you capture a subset of your data when creating a snapshot, which reduces your snapshot's size and by extension, the time spent uploading and downloading snapshots. This is especially useful if you're connecting directly to a production or staging database that is many GBs in size. For the purpose of this getting started guide, we won't be subsetting your snapshot, but you can read more about subsetting your database snapshots here.

Debug transformations with "live preview"

Using JavaScript functions to tranform your data gives you an incredible amount of flexibility, but that flexibility may come at the cost of introducing unintentional bugs. Snaplet provides a live preview environment via the snaplet proxy command to debug transformations. When you boot up the Snaplet proxy it connects to your database, reads the transform.ts file and waits for a client connection. Then, you can connect to the proxy with your SQL tool, and validate your transformations in real time. This allows you to test and change your JavaScript transformations in real-time without any changes to the original database.

Other data operations...

In this chapter we covered transforming and excluding data, but Snaplet can also reduce (subset) and generate data. Read more about those data operations in our data operations reference.