Learn SQL the Effective Way with Data School New York

2 September 2022

By way of introduction, I am Will, this year’s Tableau Iron Viz Champion and a recent addition to the core consultancy team at The Information Lab. I’m a firm believer that good data preparation is key to creating effective data visualisations (something I discussed in my recent Tableau Prep User Group talk - which you can find here if you wish to dig a little deeper). And so, when I was recently given the opportunity to visit the Data School New York to train a cohort in SQL I leapt at the chance to pass on my SQL knowledge!

Throughout my career, data manipulation with SQL is one of the strongest skills I have developed. Projects such as working with the BBC Archives holding metadata, WorldRemit’s CRM database, and answering users' questions on Stack Overflow have all helped to further my understanding and strengthen my skills in this area.

Over the last 8 years every data analyst job I’ve had or hired for required SQL, and despite all the ETL tools available on the market, data analysts will still be asked for SQL proficiency. A sample of Glassdoor job ads for New York Data Analysts showed half wanted SQL skills, and, of those who asked for Tableau, 2 out 3 also asked for SQL.

For the New York Data Schoolers, I felt it was important that the team understood the core concepts, but were also able to delve straight into writing SQL queries and battling some of the real-world problems I had faced so that they were well-prepared for future client projects.

Getting started with SQL

When we learn SQL we all start with:

SELECT *FROM orders;

This will return us all records from the orders table (* = all columns).

This works well in the classroom to display a few thousand rows, but in any organisation, this query could return millions of records:

This wastes time, returning the query and contributing queue times for other queries to execute. As a result, this inefficiency is best avoided.
In the pay-per-query platforms (like Snowflake or Amazon Redshift) this is very expensive!
We’re never going to check every single record out of 1+ million rows anyway!

Solution? Add a limit!

SELECT *FROM ordersLIMIT 100;

Adding a limit here gives a good snapshot of the table data (just 100 rows), quickly and cost-effectively. While you are exploring the data with SQL be sure to add a limit.

Exploring data with SQL

When checking for data quality issues, such as typos in product names, a query such as this could be useful:

SELECT DISTINCT product_nameFROM ordersORDER BY product_name ASCLIMIT 100;

We can’t always assume our data is stored in a consistent format, but we can check this using SELECT DISTINCT to return us the unique product names. using ORDER BY allows us to sort the data alphabetically (ASC = ascending) to help us spot any typos or products with different cases.

We can use filters such as the WHERE clause to help transform the data to answer the question:

SELECT *FROM ordersWHERE city = ‘New York’LIMIT 100;

Aggregating the data allows us to reveal insights

SELECTproduct_name,SUM(sales) as ‘sales’FROM ordersWHERE city = ‘New York’GROUP BY product_name;

This gives us the total sales for each product we sold in New York (N.B. we drop LIMIT here).

Combining datasets with SQL

It’s rare in business that you’ll only work with one table, so after getting acquainted with querying the team dived into JOINs.

SELECTO.product_name,R.return_date,COUNT(O.product_name) as ‘total_returned’FROM orders as OINNER JOIN returns as R on O.order_id=R.order_idWHERE O.city = ‘New York’GROUP BYO.product_name,R.return_date;

This returns a count of returned products by day in New York.

Crucially for the team, they needed to understand the operations SQL performs, which was best shown when I asked them to explain the different outputs of the following UNION queries:

SELECT order_id FROM ordersUNIONSELECT order_id FROM returns

Vs.

SELECT order_id FROM ordersUNION ALLSELECT order_id FROM returns

The key distinction here is this: UNION returned a DISTINCT list of order_ids from the orders & returns tables, whilst UNION ALL returned a long list of all order_ids from both tables containing duplicates.

Putting it all into practice

With this knowledge acquired the New York Data School became SQL super sleuths and were tasked with solving a murder. The team were left to tackle The SQL Murder Mystery.

As their coach, it was great to see how the team approached this task. The class were very collaborative, helping each other out where there were struggles but also independent enough to go it alone, with some exploring their own ideas and then feeding that knowledge back to the cohort.

Starting to code is not an easy task and I was impressed with how the New York Data Schoolers took to the challenge in a professional and collaborative way. The skills they displayed mean they have a great grounding in SQL for their future placements and consulting projects.

My tips for getting started with SQL

Learning to code is challenging, SQL is a more friendly coding language than most but it still takes time to become a fluent SQL user. Here are a few tips from experience that helped me progress my skills with SQL.

1. Know the jargon

Understanding what the different SQL operations are doing is invaluable when planning your data transformation, debugging (finding issues) in your queries, and effective Google searching. e.g. understanding the difference between UNION and UNION ALL is very important for building an efficient SQL script.

2. Stack Overflow

Stack Overflow(https://stackoverflow.com/) is a coding community where users can post their coding issues and other users will help solve them. It is a great problem-solving resource to:

Check your errors.
Work out how to perform a data transformation.
Correct use of a technique.

3. Practice regularly

Schedule time to practice SQL on a regular basis, I’ve recommended that our New York start tackling SQL problems on CodeWars. Even spending 10 minutes a day to try and solve a problem can help to embed the skills you have learned.

Author:

Will Sutton

View More Posts