
If you work with a relational database and want to combine data from more than one table, SQL’s JOIN clause is your best buddy. Understanding how and when to use this robust and flexible clause is a must-have skill for any data analyst, database admin, or data scientist. This guide teaches the fundamentals of SQL Joins using Databricks, a unified analytics platform.
What is a SQL Join?
In SQL, a Join operation is used to combine rows from two or more tables, based on a related column between them. The most common types of SQL Joins are: Inner Join, Left Join, Right Join and Full Outer Join.
Setting Up Your Databricks Environment
Before you start playing with SQL Joins, confirm that you have created and set up your Databricks environment. Let us assume we have two tables: Customers and Orders.
    -- Key constraints in Databricks are informational (not enforced)
    -- and assume Unity Catalog tables.
    CREATE TABLE Customers(
      ID      INT PRIMARY KEY NOT NULL,
      NAME    STRING NOT NULL,   -- STRING is the Databricks text type
      AGE     INT NOT NULL,
      ADDRESS CHAR(50)
    );

    CREATE TABLE Orders(
      OID         INT PRIMARY KEY NOT NULL,
      DATE        DATE NOT NULL,
      CUSTOMER_ID INT REFERENCES Customers(ID),  -- nullable link to Customers
      AMOUNT      INT NOT NULL
    );
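To have something to join, the two tables need a few rows. The statements below are a minimal sketch with made-up sample data (the names, addresses, dates, and amounts are all hypothetical). One customer has no orders and one order has no customer, so the join types return visibly different results.

```sql
-- Hypothetical sample rows; values follow the column order of the CREATE TABLE statements.
-- Customers columns: ID, NAME, AGE, ADDRESS
INSERT INTO Customers VALUES
  (1, 'Alice', 30, '12 Oak St'),
  (2, 'Bob',   45, '9 Elm Ave'),
  (3, 'Carol', 27, NULL);          -- Carol will have no orders

-- Orders columns: OID, DATE, CUSTOMER_ID, AMOUNT
INSERT INTO Orders VALUES
  (101, DATE'2024-01-15', 1,    250),
  (102, DATE'2024-02-03', 1,    120),
  (103, DATE'2024-02-10', 2,    75),
  (104, DATE'2024-03-01', NULL, 300);  -- order with no linked customer
```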
Inner Join
The INNER JOIN keyword selects records that have matching values in both tables; rows without a match on either side are left out. Let’s see how we can join the Customers and Orders tables on the matching customer IDs.
    SELECT Customers.ID, Customers.NAME, Orders.AMOUNT
    FROM Customers
    INNER JOIN Orders
    ON Customers.ID = Orders.CUSTOMER_ID;
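As a self-contained sketch of the same inner join, the query below replaces the real tables with a handful of hypothetical inline rows (built with `VALUES` inside common table expressions), so it can be run in a notebook cell without creating anything. The short aliases `c` and `o` are an optional convenience, not something the join requires.

```sql
-- Hypothetical inline rows standing in for the Customers and Orders tables.
WITH Customers AS (
  SELECT * FROM VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol') AS t(ID, NAME)
),
Orders AS (
  SELECT * FROM VALUES (101, 1, 250), (102, 1, 120), (103, 2, 75), (104, NULL, 300)
    AS t(OID, CUSTOMER_ID, AMOUNT)
)
SELECT c.ID, c.NAME, o.AMOUNT
FROM Customers AS c
INNER JOIN Orders AS o
  ON c.ID = o.CUSTOMER_ID
ORDER BY c.ID, o.OID;

-- Rows returned for this sample data:
--   1  Alice  250
--   1  Alice  120
--   2  Bob    75
-- Carol (no orders) and order 104 (no customer) have no match, so neither appears.
```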
Left Join (or Left Outer Join)
The LEFT JOIN keyword returns all records from the left table (Customers), and the matched records from the right table (Orders). If a customer has no matching order, the row is still returned, with NULL in the columns that come from Orders.
    SELECT Customers.ID, Customers.NAME, Orders.AMOUNT
    FROM Customers
    LEFT JOIN Orders
    ON Customers.ID = Orders.CUSTOMER_ID;
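The NULLs that a left join produces for unmatched rows are useful in their own right. One common pattern, sketched here with hypothetical inline rows rather than the real tables, filters on those NULLs to list the customers who have never placed an order.

```sql
-- Hypothetical inline rows standing in for the Customers and Orders tables.
WITH Customers AS (
  SELECT * FROM VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol') AS t(ID, NAME)
),
Orders AS (
  SELECT * FROM VALUES (101, 1, 250), (102, 1, 120), (103, 2, 75), (104, NULL, 300)
    AS t(OID, CUSTOMER_ID, AMOUNT)
)
SELECT c.ID, c.NAME
FROM Customers AS c
LEFT JOIN Orders AS o
  ON c.ID = o.CUSTOMER_ID
WHERE o.OID IS NULL;   -- keep only customers for whom no order row matched

-- Row returned for this sample data:
--   3  Carol
```

The filter tests `o.OID` because an order row that really matched always has an order ID, so a NULL there can only come from the join finding no match.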
Right Join (or Right Outer Join)
Similar to a Left Join, a RIGHT JOIN returns all records from the right table (Orders), and the matched records from the left table (Customers). If an order has no matching customer, the row is still returned, with NULL in the columns that come from Customers.
    SELECT Customers.ID, Customers.NAME, Orders.AMOUNT
    FROM Customers
    RIGHT JOIN Orders
    ON Customers.ID = Orders.CUSTOMER_ID;
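A right join can always be rewritten as a left join with the two tables swapped, which some people find easier to read. The sketch below uses hypothetical inline rows and the swapped LEFT JOIN form. It returns the same rows as the RIGHT JOIN query above would for the same data.

```sql
-- Hypothetical inline rows standing in for the Customers and Orders tables.
WITH Customers AS (
  SELECT * FROM VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol') AS t(ID, NAME)
),
Orders AS (
  SELECT * FROM VALUES (101, 1, 250), (102, 1, 120), (103, 2, 75), (104, NULL, 300)
    AS t(OID, CUSTOMER_ID, AMOUNT)
)
-- Orders is now the left table, so every order is kept.
SELECT c.ID, c.NAME, o.AMOUNT
FROM Orders AS o
LEFT JOIN Customers AS c
  ON c.ID = o.CUSTOMER_ID
ORDER BY o.OID;

-- Rows returned for this sample data:
--   1     Alice  250
--   1     Alice  120
--   2     Bob    75
--   NULL  NULL   300   (order 104 has no customer)
-- Carol has no orders, so she does not appear.
```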
Full Outer Join
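The FULL OUTER JOIN keyword returns all records from both the left (Customers) and the right (Orders) table. Rows that match are combined; rows without a match on the other side are still returned, with NULL in the columns from the table that had no match.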
    SELECT Customers.ID, Customers.NAME, Orders.AMOUNT
    FROM Customers
    FULL OUTER JOIN Orders
    ON Customers.ID = Orders.CUSTOMER_ID;
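Because a full outer join mixes matched and unmatched rows, it can help to label which is which. The following sketch uses hypothetical inline rows and adds an illustrative `match_status` column (a name invented for this example) by checking which side came back NULL.

```sql
-- Hypothetical inline rows standing in for the Customers and Orders tables.
WITH Customers AS (
  SELECT * FROM VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol') AS t(ID, NAME)
),
Orders AS (
  SELECT * FROM VALUES (101, 1, 250), (102, 1, 120), (103, 2, 75), (104, NULL, 300)
    AS t(OID, CUSTOMER_ID, AMOUNT)
)
SELECT
  c.ID,
  c.NAME,
  o.OID,
  o.AMOUNT,
  CASE
    WHEN o.OID IS NULL THEN 'customer only'  -- no order matched this customer
    WHEN c.ID  IS NULL THEN 'order only'     -- no customer matched this order
    ELSE 'matched'
  END AS match_status
FROM Customers AS c
FULL OUTER JOIN Orders AS o
  ON c.ID = o.CUSTOMER_ID;

-- Rows returned for this sample data (row order may vary):
--   1     Alice  101   250   matched
--   1     Alice  102   120   matched
--   2     Bob    103   75    matched
--   3     Carol  NULL  NULL  customer only
--   NULL  NULL   104   300   order only
```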
Conclusion
Mastering joins in SQL can seem complex, especially if you are dealing with multiple tables and large data sets, but it doesn’t have to be. With practice and patience, you’ll soon feel comfortable combining data in a variety of ways. Remember, the power of a data analyst lies in their ability to extract meaningful insights from data, and SQL Joins in Databricks are a critical tool in your arsenal.