New Feature Friday was an awesome initiative started internally by our CTO to ensure we are, as a company, across any new tools and features released within our field. Last week on our call we explored Trifacta.
What is Trifacta?
Trifacta advertises itself as an ‘intelligence service which allows anyone to explore, clean and prepare structured and unstructured data for analysis, reporting and machine learning.’ In many ways it is a tool similar to Alteryx, and the reason it has landed on our radar is that Alteryx confirmed the purchase of Trifacta in January 2022.
The key difference between Trifacta and Alteryx is that Trifacta runs on cloud infrastructure only; there is no desktop version. It is operated within the browser and bears similarity to Alteryx's Cloud Designer.
Trifacta Pricing
In years gone by Trifacta had been a well-regarded, free-to-use tool. However, with product evolution and a move to a cloud infrastructure, they have changed to a pay-to-use model.
Pricing is categorised by the type of subscription:
- Starter: $80/Month
- Professional: $400/Month
- Enterprise: 'Quoted amount' but estimated at ~$1000 / Month
On top of the subscription price, Trifacta includes personalised provisioning with each user paying $0.60 / vCPU hour.
There is an option for an annual subscription in which the user saves 20% on the monthly price.
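To get a feel for how the subscription and compute charges combine, here is a rough sketch of the arithmetic using the figures quoted above. The function name and the example usage of 50 vCPU hours are our own, and we are assuming the 20% annual saving applies to the subscription only and not to the vCPU charge; Trifacta's actual billing may differ.

```python
# Illustrative only: prices are the figures quoted in this post.
MONTHLY_PRICE_USD = {"starter": 80.0, "professional": 400.0}
VCPU_HOUR_RATE_USD = 0.60
ANNUAL_DISCOUNT = 0.20


def estimate_monthly_cost(edition, vcpu_hours, annual=False):
    """Estimate one month's bill in USD for an edition and its compute usage."""
    subscription = MONTHLY_PRICE_USD[edition]
    if annual:
        # Assumption: the 20% saving applies to the subscription, not to compute.
        subscription *= 1 - ANNUAL_DISCOUNT
    return round(subscription + vcpu_hours * VCPU_HOUR_RATE_USD, 2)


# Hypothetical usage of 50 vCPU hours in a month.
print(estimate_monthly_cost("starter", vcpu_hours=50))
print(estimate_monthly_cost("professional", vcpu_hours=50, annual=True))
```

Enterprise is left out of the lookup because its price is quoted per customer.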
Here is a rundown of the differences between editions:
The key takeaways here are that scheduling is available from Professional upwards and that Enterprise offers API connectivity.
A free 30 day trial is available, so sign up and give it a try!
Giving Trifacta a go
As a group we decided to create a standard ‘workflow’ in which we would ingest data into the tool, carry out data cleansing and transformations before outputting to a database. This is a classic use case for a tool like Trifacta and we wanted to consider the similarities or differences between Trifacta and its now parent company, Alteryx.
Connecting to data
This is done in a separate window to the rest of the flow. Data can be imported from the file browser, AWS/Google or a series of different data connections. At first glance it looks as if there are a multitude of different platforms to connect to, from FreshDesk, Slack and lots more - it all looks very exciting. However, it quickly becomes apparent that a lot of the connections listed have a button next to them labelled I’m Interested. What initially looks like a vast list is in truth much shorter, with only around 10 different connections possible at present. Also, lots of the sources are Import Only, meaning you can import data from Google Sheets, for example, but would have no facility to output back into Google Sheets. Custom connectors also did not appear to be available.
It was noted that with the recent acquisition by Alteryx, there is a chance these data connections might start to come to fruition through knowledge sharing. The list of connections is an ambitious one, and if these can be developed into robust connections, it would provide great scope to their users.
Recipes
In the world of Trifacta, the series of steps to create the data transformation is called a recipe. This way of organising the flow is more like Tableau Prep. Within each step you can create a list of different functions, like splitting, renaming or changing the data type.
When a new recipe is added, Trifacta gives a suggestions pane with common jobs such as filters (where clause) or the creation of sets/groups (If statement). When clicking a suggestion Trifacta presents a nice piece of UI in which affected rows are highlighted to give a visual representation of the outcome of the suggested step.
Filtering the data-set and creating these subsets was relatively simple and intuitive. We then moved on to creating a join to another data-set. When adding a join step, a pane appears on the right and Trifacta offers other sources to connect to. These would be other imported data sources, or other sheets within the Excel file we had uploaded. We then get suggestions on the fields to join on and a nice preview pane to show the outcome of the join. After the join is complete there is the functionality to keep or remove fields, reminiscent of the select tool in Alteryx. Unfortunately, renaming fields is not possible within the join step and requires the creation of a new step.
Another feature we noted whilst creating the recipe was that a step can be added by right-clicking a column and selecting an option such as rename or convert to uppercase. Steps can also be dragged to change their order - very Tableau Prep like!
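For readers who think in code, the filter, join and keep/rename steps can be sketched in plain Python as follows. This is only an analogy for what the recipe does, not how Trifacta implements it; the sample data, field names and the threshold of 50 are all made up for illustration.

```python
# Hypothetical sample data standing in for two sheets of an uploaded Excel file.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 90.0},
    {"order_id": 4, "customer_id": 12, "amount": 15.0},
]
customers = [
    {"customer_id": 10, "customer_name": "Acme"},
    {"customer_id": 11, "customer_name": "Birch"},
    {"customer_id": 12, "customer_name": "Cobalt"},
]

# Filter step (the "where clause"): keep orders of 50 or more.
large_orders = [row for row in orders if row["amount"] >= 50]

# Join step: a left join on customer_id using a lookup keyed on that field.
customers_by_id = {c["customer_id"]: c for c in customers}
joined = [{**row, **customers_by_id.get(row["customer_id"], {})} for row in large_orders]

# Keep only the fields we want, then rename one of them as a separate step.
KEEP_AND_RENAME = {
    "order_id": "order_id",
    "customer_name": "customer_name",
    "amount": "order_amount",
}
result = [{new: row.get(old) for old, new in KEEP_AND_RENAME.items()} for row in joined]

for row in result:
    print(row)
```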
Outputting the data
The default option here is a CSV file, but options to output to databases such as Snowflake, SQL and a few others were available. We set up a connection to our SQL server in a new window relatively painlessly. When adding a connection, any field names that are invalid (no ‘/’ allowed!) create an error. This is a little clunky, as we had to go back to another window and fix the invalid field names before setting up the connection again. There is also an option here to change the running environment between Spark and Photon, one of which is more suitable for smaller data-sets (I forget which one exactly!). It was not immediately clear to us what this choice meant for the vCPU hour charge.
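The invalid field name issue is the kind of thing that can be tidied up before the output step. Below is a minimal sketch of the idea in Python. The only rule we actually hit was that ‘/’ is not allowed, so restricting names to letters, digits and underscores is our own conservative assumption, and the function name is hypothetical.

```python
import re


def sanitise_field_names(names):
    """Replace runs of disallowed characters with '_' and keep names unique."""
    cleaned = []
    seen = {}
    for name in names:
        # Assumption: only letters, digits and underscores are safe in a field name.
        candidate = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_") or "field"
        count = seen.get(candidate, 0)
        seen[candidate] = count + 1
        # Append a numeric suffix if the cleaned name has already been used.
        cleaned.append(candidate if count == 0 else f"{candidate}_{count + 1}")
    return cleaned


print(sanitise_field_names(["Order/Date", "Price (GBP)", "Order Date", "id"]))
```

A suffix is added when two headers collapse to the same cleaned name, since duplicate field names would most likely cause a different error at the database.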
Recipe configuration
Scheduling appeared to be a slick part of Trifacta. It seemed there was an option to send a message on Slack or send emails to notify users that the flow is complete. It looked to us like the flow can be triggered by database updates or even Slack notifications. The general integration with other apps seems to be a really exciting feature for Trifacta.
Collaboration with other users looks like a real plus for Trifacta. We set up a second user and shared our flow. An email was received instantly with the option to open it within Trifacta.
There is a nice view that shows the storage and the compute time used by the flow.
Other thoughts
- The option to sample the data seemed flawed, with a non-representative sample being produced.
- When adding more to the sample, a message pops up warning that it may crash the browser.
- The option to filter by boolean was strange: it was not clear whether we needed to type in the word True/False or 1/0. We would like some more instruction for the user.
- Macros are possible!
  - There seems to be a macro gallery which requires a separate log-in.
  - A macro was available that dynamically changes field headers, which would have helped us with our earlier issue of invalid field header names.
- Suggested recipes seem strong
  - Before creating a recipe, Trifacta suggests common recipes, which essentially gives us a pre-made set of steps that could be re-used, saving the user time in creating the recipe themselves.
What will the acquisition mean for Alteryx?
At the end of the call we held a hypothetical discussion around how Alteryx may change following this acquisition:
- Alteryx cloud improvement - Moving to a cloud infrastructure seems to be a logical step for applications. There is no doubt the Alteryx Cloud Designer needs to see some improvement. This acquisition should allow knowledge sharing between the companies and we hope to see significant development to the Cloud Designer in the near future.
- More connectors - Trifacta seem to have great ambition to allow a host of data connections. We would hope to see the same connections available across both platforms.
- Integration with other platforms - This is a real strength of Trifacta and it would be great to see this emulated on the Alteryx platform.
- Scheduling improvements - One of the most exciting features we saw was the schedule triggers that Trifacta appears to offer. It would be fantastic to see scheduling develop further in the Alteryx world. At the moment schedules are limited to a recurring schedule; there isn’t an option for a workflow to be triggered by an event, which is something we would hope to see in the wake of the Trifacta acquisition.