Joining data tables in Tableau and Alteryx

5 February 2015
How does a join work?A join operation combines fields from two or more tables in a database and enables us to analyse the resulting data set. The result of a join is a horizontally 'extended' data table as the join adds fields (columns) compared to the starting stage. Understanding the different join types is essential for using Tableau. We at The Information Lab are also advocates of Alteryx, an analytics platform providing intuitive workflow for data blending and advanced analytics. As a powerful data personalization tool, Alteryx of course offers all the data joining possibilities.Whenever I start to learn or explain a concept, I try to find a simple example or analogy and walk through that. Usually what works in small and simple also works on large and complex. Hence we will overview the join types on two small data tables. These are two sheets in an Excel file named PlayerGeography. The first table (Players) contains pool billiard players and their home countries while the other (Geography) lists countries and the respective regions and continents.Players table                                       Geography tablePlayers_table                 Geography_tableWe will join these tables by matching the Country field in the Players table with the Geography_Country field in the Geography table and arrive to a more informative dataset, not only showing player names and their countries but the region and continent the country belongs to. The join could be performed several ways, depending on the logic we want to follow. Let’s walk through these join types and see how they are performed in Tableau (8.2) and Alteryx.A) Inner JoinIt combines only the common records from the tables. The below image is a visual representation of an Inner Join on table 1 and table 2. This image may be familiar from set theory and Venn diagrams.Inner join imageHow can we do this in Tableau? The inner join operation is readily available in the data connection editor window after we dragged the two tables onto the canvas. As Tableau shows us the resulting dataset right away at the bottom of the window, we see that in fact we arrive to the intersection of those two tables.Inner joinThe Join tool in Alteryx has two input nodes, a left and a right part. Notice that it has three output nodes, the L / J / R ones, L standing for a Left Excluding Join, J = Inner Join and R = Right Excluding Join.Alteryx join toolThis modular setup requires a little bit different thinking compared to Tableau but in case of an inner join the middle output node (J) is just what we need.Alteryx inner joinB) Left JoinThis will include all records from the first table (Players) and all matching records from the second table.Left join imageThis is straightforward to setup in Tableau. It seems that Alex Pagulayan's country is not listed in the Geography table.Left outer joinWe arrive to the first point where we have to adjust to the slightly different logic of Alteryx. Actually, when working with Alteryx you realize that this is even more user friendly. The three output nodes of the Join tool (L/J/R) altogether add up to a Full Outer Join. So the L and J nodes have to be unioned for a Left Join.Alteryx left joinC) Right JoinA 'Right Join' returns all records from the second data table (Geography) and the matching records from the first table.Right join imageOur example is built on an Excel file, where a Right Join is only available upon using the Legacy Connection. The default Tableau connection does not rely on the Microsoft Jet data engine driver but in some cases (like a Right Join) require the capabilities of the JET driver. The legacy connection can be accessed when we open the Excel data source file in Tableau.Legacy ConnectionFor more information on the Legacy and default Excel connections, please visit this link.With a Legacy connection our task in Tableau becomes simple:Right outer joinWe do not see Alex Pagulayan in the resulting data set as Canada is not included in the Geography table. The Alteryx solution is also unambiguous. Building on the example of the Left Join, the Right Join is merely the union of the J and R output nodes from the Join tool.Alteryx right joinD) Full Outer JoinThis type of join returns all records from the Players table and all records from the Geography table.Full outer join imageWe hit the first instance in Tableau when performing a join is not merely a matter of dragging data tables to the canvas and selecting the required join type. For a Full Outer Join a custom SQL statement has to be written.Full outer join not availableFull outer joinLater on for a Left Excluding Join and a Right Excluding Join we will also have to write custom SQL statements and a Full Outer Join syntax has three parts. It is built from the Left Excluding Join + the Right Excluding Join + the Inner Join. The 'excluding' joins are achieved by adding the 'Where' clause at the end of the Left or Right Join SQL blocks, thus excluding all those records that do not comply with the 'Where' clause.Tableau's profile is visual analytics, data visualisation, being able to ask questions from your data very quickly. It is a front-end tool. Alteryx is a back-end tool, enabling fast and flexible data personalization, data cleaning, etc. So we expect a solution of just a few clicks on a Full Outer Join from Alteryx.Alteryx full outer joinE) Left Excluding JoinThis operation returns only those records from the first table that do not have a match in the second table.Left excluding join imageIn our example this will leave only Alex Pagulayan's record in the result set. Well, he met no matching performance in 2004 when he won the world pool championship.Let's see the custom SQL for Tableau's data connection:Left join minus intersectionIn Alteryx the desired output is simply node 'L' of the Join tool.Alteryx left join minus intersectionF) Right Excluding JoinThis is nothing special after the getting familiar with the Left Excluding Join. A mirrored image, in every respect.Right excluding join imageThis still requires a custom SQL script in Tableau.Right join minus intersectionThe equivalent in Alteryx:Alteryx right join minus intersectionG) Outer Excluding JoinOuter excluding join imageIn Tableau we can produce this set with a custom SQL statement that unions the left and right parts of the desired set.Outer join minus intersectionIn Alteryx we have to union the outputs from the L and R nodes.Alteryx full outer join minus intersectionWe have discussed joining on only one key field but even if the join is executed on a combination of fields, the logic remains unaffected.Finally, let's review the various join type solutions in Tableau's custom SQL and Alteryx:Joins - Tableau custom SQL and AlteryxThe topic may have been a bit dry if you are not a data geek as me but I have included a name who is not a pool player. Finding that person in the list may be a playful challenge but to accomplish that you have to 'Inner Join' the 'Players' table with a list of professional pool players.... (OK, Google will just take you there, too.)
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Information Lab and data industry
Subscribe now
© 2024 The Information Lab