22 October 2014
I start this post with a very theoretical definition of the weighted average, where instead of each of the data points contributing equally to the final average (as with the simple arithmetic mean), some data points contribute more than others. These ’contributions’ are called weights.That is clear but how does it link to Tableau and aggregating data? Let me start from a bit farther away....What is aggregation?Before defining ’aggregation’, let’s see when we use it. Our data sources usually / optimally contain disaggregated data, meaning each row in the data file refers to the smallest individual entity we have information about and we see different measures at this level, let it be:- a single product variant’s sales value in a country,- a respondent’s evaluation score to a specific market research question or- the duration of a distinct call that a call center received, etc.Tableau’s power is in its ability to ’roll-up’ this highly granular, item level data to higher, more and more topline level. Sticking to the earlier examples these higher levels may be- a brand’s sales value in the country (made up of its brand variants),- a brand’s sales value in a region (made up of its product variants’ performances in the countries within the region) or- the total duration of all the calls the call center received within a specific day. In these cases we are no longer analyzing individual rows in our dataset but combining them along similarities (like the brand they belong to or the region they belong to or the day they belong to).This process of viewing numeric values (measures) at higher, more and more summarized levels of the data is ’aggregation’. So when aggregating we move from the very details to summary while disaggregating means we split the results back to row level, technically speaking.Aggregation has several types. ’Type’ refers to what we calculate from the numeric values when we roll-up the data. Is it the sum, average, median, etc.? Tableau offers various aggregation types but this post is not about discussing these options.A calculated field’s results can be:
- calculated at row level and then those row level results can be ’rolled-up’/aggregated or
- the factors in the formula can be aggregated first to a higher level where the calculation is done.