
For an instructor lead, in-depth look at learning SQL click below.
Welcome! SQL is a powerful language for manipulating and exploring data, and Databricks is an industry-standard tool that makes this process even easier. In this blog post, we’ll go over a few exercises that will help you get up to speed with using SQL in Databricks.
Exercise 1: Finding Records in a Table
Often, the first task in data exploration is simply finding the right data. We can use a simple SELECT statement to display data from a table. The ‘*’ symbol represents all columns in the table.
1 2 3 |
SELECT * FROM Employees |
This SQL code will display all columns and records from the table named ‘Employees’.
Exercise 2: Filtering Records
Let’s say we only want to find employees who work in the IT department. SQL offers the WHERE clause for this purpose:
1 2 3 |
SELECT * FROM Employees WHERE Department = 'IT' |
This SQL code will only display employees who work in the IT department.
Exercise 3: Summarizing Data
SQL can also summarize data with GROUP BY and COUNT functions. For instance, we can see the number of employees in each department:
1 2 3 |
SELECT Department, COUNT(*) as EmployeeCount FROM Employees GROUP BY Department |
This SQL code groups employees by their department and shows the number of employees in each department.
Exercise 4: Joining Tables
Many times, you’ll need to combine tables to see a complete picture. You can use JOIN command to achieve this, say we want to know the department name from the department table:
1 2 3 4 5 |
SELECT E.EmployeeName, D.DepartmentName FROM Employees E JOIN Departments D ON E.DepartmentID = D.DepartmentID |
This SQL code will join the Employees table and Departments table on the DepartmentID field, and display the employee names with their department names.
Wrap Up
SQL is an incredibly powerful tool for data exploration, and Databricks makes it easy and efficient. With these exercises, you’re well on your way to becoming a pro at SQL for data exploration in Databricks.