
Transforming your data

With Sundial, you transform raw data into clean, structured datasets using modular SQL models. Define each step of your transformation logic in SQL, and Sundial handles dependencies, execution order, and testing automatically. Built for reliability and collaboration, Sundial lets you iterate quickly, document clearly, and scale confidently — turning messy data into trusted insights.
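The dependency handling and execution ordering described above can be pictured as a topological sort over the model graph. A minimal sketch, using Python's standard-library `graphlib`; the model names and their dependencies here are hypothetical, not Sundial's:

```python
# Sketch of dependency-ordered execution: each model lists the upstream
# tables it reads from, and models run only after their dependencies.
from graphlib import TopologicalSorter

deps = {
    "raw_orders": set(),                          # source, no dependencies
    "raw_users": set(),                           # source, no dependencies
    "clean_orders": {"raw_orders"},               # view over a source
    "active_users": {"raw_users", "clean_orders"},  # view over views/sources
}

# static_order() yields a valid execution order for the whole graph.
order = list(TopologicalSorter(deps).static_order())
```

Sources always surface before the views that read them, so every model sees fully built inputs.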

Tables

A Table is the fundamental primitive for representing data in Sundial: data organized in rows and columns, with a defined schema that specifies the name and data type of each column.

Each table has a unique name, a schema defining its columns, optional data quality tests, and lineage tracking showing upstream and downstream dependencies.
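One way to picture that description is as a small record type. This is an illustrative sketch only, not Sundial's actual internal representation; the field names are assumptions:

```python
# Illustrative model of a Table: a unique name, a column schema,
# optional data quality tests, and upstream lineage.
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    schema: dict[str, str]                      # column name -> data type
    tests: list[str] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)

orders = Table(
    name="user_orders",
    schema={"order_id": "INTEGER", "user_id": "INTEGER", "amount": "REAL"},
    tests=["order_id IS NOT NULL"],
    upstream=["raw_orders"],
)
```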

There are two types of tables:

  • Sources — Tables linked to external warehouses or data lakes.
  • SQL Views — Tables computed on demand at query time, or pre-computed and stored as materialized views.

Together, these let you define even the most complex business logic and data transformations in a standard, repeatable way, optimizing for both performance and storage costs while keeping a consistent interface for working with your data.
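The two flavors of SQL View can be sketched in plain SQL. Below, an on-demand view and a pre-computed copy are built side by side with Python's `sqlite3` (SQLite has no native materialized views, so `CREATE TABLE AS` stands in for materialization); the table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])

# On demand: recomputed every time it is queried.
conn.execute("""CREATE VIEW revenue_by_user AS
                SELECT user_id, SUM(amount) AS revenue
                FROM raw_orders GROUP BY user_id""")

# Pre-computed: stored once for faster downstream reads.
conn.execute("""CREATE TABLE revenue_by_user_mat AS
                SELECT * FROM revenue_by_user""")

view_rows = conn.execute(
    "SELECT * FROM revenue_by_user ORDER BY user_id").fetchall()
mat_rows = conn.execute(
    "SELECT * FROM revenue_by_user_mat ORDER BY user_id").fetchall()
# Both yield the same result set; only when and where it is
# computed differs.
```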

Naming tables

Table names can only contain alphanumeric characters and underscores, and cannot begin with a number. Each name must be unique within your workspace.

Valid names: user_orders, active_users_2023, revenue_by_month
Invalid names: 2023_users, user-orders, revenue.by.month

A good practice is to name tables based on what data they represent, not how they achieve it. For example, prefer active_users over join_users_and_orders.
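The naming rules above amount to a simple regular expression: alphanumeric characters and underscores only, with no leading digit. A sketch of a validator (the helper name is hypothetical):

```python
import re

# Alphanumerics and underscores only; the first character may not be a digit.
NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_table_name(name: str) -> bool:
    """Check a candidate table name against the naming rules."""
    return bool(NAME_RE.match(name))
```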

How it works

The transformation workflow follows these stages:

  1. Add a Source — Register your raw data from a warehouse, data lake, or external system. Sources make it easy to understand how data is structured and how it feeds downstream models.

  2. Declare transformations using SQL Views — Write modular SQL models to clean, join, filter, and enrich your raw data. Views are layered, reusable, and version-controlled so teams can collaborate and build on each other's work.

  3. Materialize — Pre-compute and store your views as physical tables for faster downstream queries.

  4. Schedule — Configure automated refreshes to keep your materialized views up to date.

  5. Backfill — Reprocess historical data when business logic changes or data quality issues are discovered.

  6. Maintain your pipeline — Add data quality tests, trace lineage, monitor freshness, and preview pending changes to keep your pipeline robust and transparent.
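The data quality tests in step 6 boil down to queries that count rule violations. A minimal sketch of a not-null check using `sqlite3`; the table, column, and helper name are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_orders (order_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO user_orders VALUES (?, ?)",
                 [(1, 10), (2, None), (3, 12)])

def not_null_violations(conn, table, column):
    """Count rows where the column violates a NOT NULL expectation."""
    (count,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()
    return count

failures = not_null_violations(conn, "user_orders", "user_id")
# One row has a NULL user_id, so the check reports one violation.
```

A test passes when its violation count is zero; a nonzero count flags the model for attention before downstream consumers are affected.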

Reference

For detailed specifications on each concept, see the ETL Reference:

  • Testing
  • Lineage
