5 February 2015
How does a join work?A join operation combines fields from two or more tables in a database and enables us to analyse the resulting data set. The result of a join is a horizontally 'extended' data table as the join adds fields (columns) compared to the starting stage. Understanding the different join types is essential for using Tableau. We at The Information Lab are also advocates of Alteryx, an analytics platform providing intuitive workflow for data blending and advanced analytics. As a powerful data personalization tool, Alteryx of course offers all the data joining possibilities.Whenever I start to learn or explain a concept, I try to find a simple example or analogy and walk through that. Usually what works in small and simple also works on large and complex. Hence we will overview the join types on two small data tables. These are two sheets in an Excel file named PlayerGeography. The first table (Players) contains pool billiard players and their home countries while the other (Geography) lists countries and the respective regions and continents.Players table Geography table We will join these tables by matching the Country field in the Players table with the Geography_Country field in the Geography table and arrive to a more informative dataset, not only showing player names and their countries but the region and continent the country belongs to. The join could be performed several ways, depending on the logic we want to follow. Let’s walk through these join types and see how they are performed in Tableau (8.2) and Alteryx.A) Inner JoinIt combines only the common records from the tables. The below image is a visual representation of an Inner Join on table 1 and table 2. This image may be familiar from set theory and Venn diagrams.How can we do this in Tableau? The inner join operation is readily available in the data connection editor window after we dragged the two tables onto the canvas. As Tableau shows us the resulting dataset right away at the bottom of the window, we see that in fact we arrive to the intersection of those two tables.The Join tool in Alteryx has two input nodes, a left and a right part. Notice that it has three output nodes, the L / J / R ones, L standing for a Left Excluding Join, J = Inner Join and R = Right Excluding Join.This modular setup requires a little bit different thinking compared to Tableau but in case of an inner join the middle output node (J) is just what we need.B) Left JoinThis will include all records from the first table (Players) and all matching records from the second table.This is straightforward to setup in Tableau. It seems that Alex Pagulayan's country is not listed in the Geography table.We arrive to the first point where we have to adjust to the slightly different logic of Alteryx. Actually, when working with Alteryx you realize that this is even more user friendly. The three output nodes of the Join tool (L/J/R) altogether add up to a Full Outer Join. So the L and J nodes have to be unioned for a Left Join.C) Right JoinA 'Right Join' returns all records from the second data table (Geography) and the matching records from the first table.Our example is built on an Excel file, where a Right Join is only available upon using the Legacy Connection. The default Tableau connection does not rely on the Microsoft Jet data engine driver but in some cases (like a Right Join) require the capabilities of the JET driver. The legacy connection can be accessed when we open the Excel data source file in Tableau.For more information on the Legacy and default Excel connections, please visit this link.With a Legacy connection our task in Tableau becomes simple:We do not see Alex Pagulayan in the resulting data set as Canada is not included in the Geography table. The Alteryx solution is also unambiguous. Building on the example of the Left Join, the Right Join is merely the union of the J and R output nodes from the Join tool.D) Full Outer JoinThis type of join returns all records from the Players table and all records from the Geography table.We hit the first instance in Tableau when performing a join is not merely a matter of dragging data tables to the canvas and selecting the required join type. For a Full Outer Join a custom SQL statement has to be written.Later on for a Left Excluding Join and a Right Excluding Join we will also have to write custom SQL statements and a Full Outer Join syntax has three parts. It is built from the Left Excluding Join + the Right Excluding Join + the Inner Join. The 'excluding' joins are achieved by adding the 'Where' clause at the end of the Left or Right Join SQL blocks, thus excluding all those records that do not comply with the 'Where' clause.Tableau's profile is visual analytics, data visualisation, being able to ask questions from your data very quickly. It is a front-end tool. Alteryx is a back-end tool, enabling fast and flexible data personalization, data cleaning, etc. So we expect a solution of just a few clicks on a Full Outer Join from Alteryx.E) Left Excluding JoinThis operation returns only those records from the first table that do not have a match in the second table.In our example this will leave only Alex Pagulayan's record in the result set. Well, he met no matching performance in 2004 when he won the world pool championship.Let's see the custom SQL for Tableau's data connection:In Alteryx the desired output is simply node 'L' of the Join tool.F) Right Excluding JoinThis is nothing special after the getting familiar with the Left Excluding Join. A mirrored image, in every respect.This still requires a custom SQL script in Tableau.The equivalent in Alteryx:G) Outer Excluding JoinIn Tableau we can produce this set with a custom SQL statement that unions the left and right parts of the desired set.In Alteryx we have to union the outputs from the L and R nodes.We have discussed joining on only one key field but even if the join is executed on a combination of fields, the logic remains unaffected.Finally, let's review the various join type solutions in Tableau's custom SQL and Alteryx:The topic may have been a bit dry if you are not a data geek as me but I have included a name who is not a pool player. Finding that person in the list may be a playful challenge but to accomplish that you have to 'Inner Join' the 'Players' table with a list of professional pool players.... (OK, Google will just take you there, too.)