Right Test, Right Place: how dbt can improve Tableau development

14 February 2025
One of the things I really like about dbt is its testing capabilities, which can prevent goofs from making it into the pipeline. In this blog post, let's take a whistle-stop tour of the different kinds of testing dbt offers, and how they can make analytics engineering and visualisation a smoother experience for all.

Photo by Hans Reniers on Unsplash

As Tableau developers, we have all been in a situation where it took a month to get access to a data source, only to find, upon getting down to work, that something isn't quite right. This could be particular columns being the wrong type, particular categories being missing, or even whole years being absent. We're at the business end of the data pipeline, and errors like this not only disrupt our work - and as the face of data to the stakeholder, we're often the face of the error! - but also take time to trace back to the source.

In a previous role as a test manager, I was also on the other end of this problem - taking ages to get a data source through development and into production, keeping it stable, to a good level of finesse… and then a couple of weeks later, the business tells me something is missing. Why couldn't I have known that three months ago?? 💥

Sound familiar?

If so, dbt may be able to help! One of the things I really like about dbt is its testing capabilities, which can stop these goofs from making it into the pipeline in the first place.

Out-of-the-box tests: Ingestion and Raw Data

dbt has four quick data tests available out of the box, plus the ability to write custom data tests (which we'll talk about further down). The four are unique, not_null, accepted_values, and relationships. They're useful all the way through the data pipeline, but particularly at the ingestion and raw data phases, because a failure here probably indicates something pretty big is going wrong. (There's a sketch of how all four might be declared after the list below.)

  • unique tests the assumption that our ID columns are at the level of detail we think they are - often we simply trust the first column that looks like it changes with every row! Many-to-many matches for a given value are the source of the dreaded asterisk in Tableau, where selecting a filter item produces multiple results when we were only expecting one.

  • not_null checks whether any null values are present in a column. Null, zero, and blank are conceptually different - the latter two do actually count as values (a number and a string, respectively) while null does not. This is especially useful in preventing that other little gotcha in Tableau, particularly around maps - the null values indicator in the bottom corner of the view. Whether or not nulls can or should be expected should be part of any requirements gathering.

  • accepted_values tests that each value in the column matches one from a given list (the list can also contain non-strings, like integers or booleans). I find this most useful in Tableau for ensuring that any hierarchies I've made in the visualisation don't end up drifting away from the data, and also that there hasn't been a change in the business that hasn't been communicated properly to the data team - like new products or markets!

  • relationships tests that all values in the table being tested (a "child") have a corresponding record in another, pre-existing table (a "parent"). Missing customers or suppliers have been the source of several wild-goose chases in my data career, because they could represent a customer who would get a poor experience from not being included in the business systems properly, or, very rarely, something untoward (e.g. invoices missing purchase orders are a fraud warning sign).
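To make this concrete, here's a minimal sketch of how all four tests might be declared against a model in a schema file (using the `tests:` key; newer dbt versions also accept `data_tests:`). The model and column names - orders, order_id, status, customer_id, customers - are illustrative rather than taken from any real project:

```yaml
# models/schema.yml -- illustrative model and column names
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique      # one row per order ID - protects against the dreaded asterisk
          - not_null    # no missing IDs
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:   # every order must point at a known customer
              to: ref('customers')
              field: customer_id
```

Running `dbt test` will then flag any rows which break these assumptions before they ever reach a Tableau data source.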

dbt_utils: Transformational Testing

dbt is a versatile product because of its ecosystem of additional tools, one of which is the dbt_utils package - not included with core installations, but arguably an essential addition to any deployment of dbt. dbt_utils is a mini toolbox of macros and tests that belong in any data testing toolkit, but they're particularly useful in the T of our ETL/ELT pipeline - where the heavy lifting of getting the data into shape is going on, but also where bumps and bruises can occur. (You'll find a sketch of these three tests after the list below.)

  • equal_rowcount does what it says on the tin - do both of these relations have the same number of rows? This is great when creating datasets which are aggregated to complete time periods, such as financial or rail periods. Organisations working like this will often have a date table in their models to accommodate their particular way of doing things, and this test can ensure that only completed periods are pushed to an end-state data source.

  • recency helps us answer that great, frequent question: "when was this last refreshed?". The recency test checks that the data contains records at least as recent as a given interval (hour, day, etc.) before the current timestamp (i.e. when the test is run). In a very stable pipeline, this is a critical test for end-stage models, because a failure may indicate an issue upstream which needs sorting.

  • relationships_where builds on relationships from the core data tests, but with a little twist - it allows you to filter out records where you know of a pre-existing nuance in the data that isn't worth failing a test over. With clients, I most often see this with the inevitable test records that have somehow ended up in production data - they might be required for testing further downstream, so can't be removed, but are unlikely to have corresponding records in other production tables.
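As promised, here's a sketch of how those three dbt_utils tests might look in a schema file. The model names (fct_sales_by_period, dim_financial_periods, dim_customers) and the `TEST%` convention for test records are assumptions for illustration, not a recommendation:

```yaml
# models/marts/schema.yml -- illustrative names
version: 2

models:
  - name: fct_sales_by_period
    tests:
      # the distilled fact should have one row per completed period
      - dbt_utils.equal_rowcount:
          compare_model: ref('dim_financial_periods')
      # fail if the newest load is more than a day old at test time
      - dbt_utils.recency:
          datepart: day
          field: loaded_at
          interval: 1
    columns:
      - name: customer_id
        tests:
          # like relationships, but ignoring known test records
          - dbt_utils.relationships_where:
              to: ref('dim_customers')
              field: customer_id
              from_condition: "customer_id not like 'TEST%'"
```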

Custom Schema Data Tests: Refining Data Sources

The tests we’ve talked about above are drawn from the core of data as a practice - checking values and structures. As we move further down the pipeline and closer to the visualisation, we’re more likely to need tests that reflect business requirements rather than data requirements, which is where custom tests come in.

  • Singular data tests are super simple and easy to add in - they're just SELECT statements which return any records that fail the test, such as integer values greater than a threshold you set, or coordinates outside a geographic range specified as latitude and longitude figures. These are drawn from business requirements as one-offs for that particular data stream, stored in the tests directory, and generally pretty easy for people to interpret (there's a sketch of one after this list).

  • Generic data tests are a hybrid - they're designed around your business needs, but work like the core tests because they are parameterised, which means the same block can be re-used across multiple models! This is great when you have organisation-wide, but nonetheless very custom, requirements to keep all your data in line - ensuring that certain ID numbers follow a particular pattern, for example, or that fields contain particular characters (email addresses must have an @ in them). These home-brew generic tests help your analytics engineers work more efficiently, which gets more data into the hands of your analysts! You can find out more about how to build these in dbt's documentation, and there's a second sketch below to give you a flavour.
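First, a sketch of a singular test - just a SELECT that returns failing rows, saved as its own file in the tests directory. The store_locations model and its latitude/longitude columns are hypothetical:

```sql
-- tests/assert_coordinates_in_range.sql
-- Singular test: any rows returned count as failures, zero rows = pass.
select *
from {{ ref('store_locations') }}
where latitude not between -90 and 90
   or longitude not between -180 and 180
```

And a sketch of a generic test - the test name, its pattern argument, and the email example are all hypothetical, but the {% test %} block is the standard dbt structure for generic tests:

```sql
-- tests/generic/test_matches_pattern.sql
-- Generic test: parameterised, so it can be re-used across models and columns.
{% test matches_pattern(model, column_name, pattern) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} not like '{{ pattern }}'

{% endtest %}
```

Once it exists, it can be applied in a schema file just like any core test:

```yaml
# models/schema.yml -- applying the home-brew generic test
version: 2

models:
  - name: customers
    columns:
      - name: email_address
        tests:
          - matches_pattern:
              pattern: '%@%'
```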

I hope this has given you some inspiration on how to utilise dbt to make life as an analyst or engineer easier - a little bit goes a long way to ensuring that we set each other up for success!

Author:
Louisa O'Brien