Sometimes the data you want to analyse and visualise is nice and accessible in your corporate data warehouse, cleansed, in the right format, and all the fields you need are there – a data nirvana. More often, the data that you have access to needs some TLC; a bit pre-processing before analysis can start or some re-shaping to get the most out of it. If you follow this blog and others on Alteryx, you’ll see how the tool can be used to easily complete these types of tasks.
But what if the data you need isn’t readily available; it isn’t in spreadsheets or a database, it’s only accessible via the web. How can you get hold of it in a way that’s easy to repeat and update, easy to combine with your own data and easy to share the methods and techniques with colleagues? All without embarking on a lengthly IT let development project, of course!
Today we start a new series of blogs on using Alteryx to access web-based APIs. We’ll start off with a relatively easy one to work with, a website called Quandl, but as later posts dive into more complex APIs you’ll see that they all share a commonality in that we’ll be repeatedly using the Alteryx Download tool, followed by a series of tools that parse out what gets downloaded.
Quandl is a website that hosts numerical data. All sorts of numerical data. Their mission is ‘to make all the numerical data in the world available’ on their website. If you haven’t ever heard of Quandl, take 5 minutes out of reading this post and have a look around their site.
Quandl allows their users (free account registration is required for some datasets) to access data either directly through their site, or by downloading it via their API. We’re going to see how Alteryx can use this API to bring data from Quandl into your data stream.
The first thing we need to do is get the API address of the data we want to use. I want to fetch Tableau’s quarterly revenues so to get the API address for this data set, I simply click on one of the API buttons in the top right of the screen.
I’m going to pick the JSON option as I know Alteryx has a tool that will parse that format out for me nice and quick. Clicking the JSON button brings up a new URL:
https://www.quandl.com/api/v1/datasets/SF1/DATA_REVENUE_ARQ.json?auth_token=EVwa8c1T47yPjNqd15z2
This is the URL I need to use in Alteryx! So the next stop is to open up Alteryx and give this URL to a download tool. I can do that using a Text Input tool. Attach a browse tool to the output of the download tool and run the module.
The data from the URL is in the ‘DownloadData’ column. It looks pretty ugly, but it’s still in JSON format so parse it out by attaching a JSON Parse tool.
The result is two new columns – a JSON_Name column, and a JSON_ValueString column. The JSON_ValueString is all the good stuff, the JSON_Name is just for labels. Furthermore, we have our actual data mixed up with some metadata at the top of the set of results, much of which we don’t really need, AND within the data section the dates are all mixed up with the revenue figures. It looks a bit of a mess BUT now that we have the data coming thru we can get to work with manipulating it.
We need to accomplish the following:
- Isolate the actual data from the meta data
- ‘Pivot’ the data so that we have 1 column for dates and 1 column for revenue
- Tack back on some of the metadata – i.e. column names
The first one is easy – use a filter tool to only bring thru rows where the JSON_Name starts with ‘data’. The second task is slightly more tricky. We can use the Cross Tab tool, but we need a way to tell Alteryx what makes a new columns and what makes a new row. If you inspect the JSON_Name column a little more closely, then you’ll see that the items follow a pattern of data.[X].[Y], where [X] is the record number and [Y] id the desired column number. If we can break these items out then we can use them as columns to feed the Cross Tab tool. How can we do this? The Text To Columns tool, using the full stop / period as a delimiter makes short work of this!
OK, now lets feed the output of this into a Cross Tab tool, and use the new column JSON_Name2 (or [X] as we identified it above; the record ID) as the grouping field, JSON_Name3 (or [Y]; the column number) as the Header Field, and JSON_ValueString as the Data Field.
The output is looking close to what we need! A quick Select tool can ditch the now unneeded JSON_Name2 field, and could potentially rename the columns and we’d be done. However, it’d be fantastic if we could reuse this workflow for other Quandl data, including ones where the number of columns, or the names of fields are different from ‘Date’ and ‘Revenue’. Lets look at the metadata again and see if we can make use of that to rename our fields.
There are 2 rows in the metadata section that have a JSON_Name starting with ‘column_names’. Lets isolate them and use the Dynamic Rename tool to fix our column naming issue. You’ll find the Dynamic Rename tool in the Developer section.
Exploring the full set of options available in a Dynamic Rename tool are a little out of scope for this blog post, but if you want a more comprehensive tour of the options, check out this page from Alteryx: http://www.alteryx.com/community/blogs/engine-works/new-50-dynamic-rename-tool
Configure the Dynamic Rename tool so that the actual data stream connects to the L input and the column names meta data connects to the R input and set the properties so that:
- Rename Mode is set to: ‘Take Field Names from Right Input Rows’
- Old Field Name from Column is set to: –Use Positional Rename-
- New Field Name from Column is set to: JSON_ValueString.
Attach a browse, run and our result should be perfect!
OK, that was quite a lot of work for just 7 rows of output, BUT this template can now be used for any JSON API call to Quandl! Check it out, search for something else on Quandl, grab the JSON API address, change the Text Input tool and rerun – Bingo!
Want to be even more whizzy? Lets change this workflow into a Macro. Right click the Text Input tool, and select ‘Convert to Macro Input’, then replace the Browse tool at the end for a Macro Output and Save the workflow as Quandl.yxmc. Now you can use this process over and over again simply by bringing in 1 tool into your workflow!
Stay tuned for another example of connecting Alteryx to an API. If you have any questions or comments, please use the form below or get in touch using info@theinformationlab.co.uk