
Databricks SQL provides a platform for managing and executing SQL workloads in Databricks. With Databricks SQL, one can run simple queries, create visualizations, and build dashboards on structured and semi-structured data.
Getting Started
To run SQL commands in Databricks, we first need to connect to a compute resource: a SQL warehouse in Databricks SQL, or a cluster when working from a notebook.
Here is a sample SQL command to select all data from a hypothetical ‘students’ table:
    SELECT * FROM students;
Basic SQL Queries
SQL queries allow us to retrieve specific data from our database. Let’s dive into some examples.
1. WHERE Clause:
The WHERE clause is used to filter records based on specific conditions.
    SELECT * FROM students WHERE grade = 'A';
This query will return all the rows from the ‘students’ table where the grade is ‘A’.
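To try the filter without a Databricks workspace, here is a minimal sketch that runs the same query locally. It assumes Python's built-in sqlite3 module as a stand-in for Databricks, and the column layout (id, name, grade) and the sample rows are hypothetical, invented only for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a local stand-in for Databricks.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; the article does not define the table's columns.
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", "A"), (2, "Ben", "B"), (3, "Cy", "A")],
)

# Same filter as in the article; ORDER BY is added only to make the row order deterministic.
a_students = conn.execute(
    "SELECT * FROM students WHERE grade = 'A' ORDER BY id"
).fetchall()

print(a_students)  # [(1, 'Ana', 'A'), (3, 'Cy', 'A')]
conn.close()
```

Only the two rows whose grade is 'A' come back; Ben's row is filtered out by the WHERE condition.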
2. GROUP BY Statement:
The GROUP BY statement groups rows that share the same values in the specified columns, so that aggregate functions such as COUNT can be computed once per group.
    SELECT grade, COUNT(*) FROM students GROUP BY grade;
This returns the count of students in each grade.
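The grouping can be checked on a handful of rows. The sketch below again assumes SQLite (via Python's sqlite3 module) as a local stand-in for Databricks, with hypothetical sample data; the alias num_students and the ORDER BY clause are additions for readability and are not part of the query above.

```python
import sqlite3

# In-memory SQLite database, used here only as a local stand-in for Databricks.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, invented for illustration.
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", "A"), (2, "Ben", "B"), (3, "Cy", "A"), (4, "Dee", "C")],
)

# One output row per distinct grade, with the number of students in that group.
grade_counts = conn.execute(
    "SELECT grade, COUNT(*) AS num_students "
    "FROM students GROUP BY grade ORDER BY grade"
).fetchall()

print(grade_counts)  # [('A', 2), ('B', 1), ('C', 1)]
conn.close()
```

Four input rows collapse into three output rows because two students share the grade 'A'.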
Fancy SQL
Let’s quickly look into some more advanced techniques such as joins and subqueries:
1. INNER JOIN:
INNER JOIN is used to combine rows from two or more tables, based on a related column between them.
    SELECT students.name, grades.grade FROM students INNER JOIN grades ON students.id = grades.id;
This returns each student’s name alongside their grade, matching rows from the ‘students’ and ‘grades’ tables on the ‘id’ column they share. Students with no matching row in ‘grades’ are left out of the result.
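To see how an inner join treats unmatched rows, here is a small sketch under the same assumptions as before: SQLite via Python's sqlite3 module stands in for Databricks, and both tables hold hypothetical sample rows. One student deliberately has no entry in the grades table.

```python
import sqlite3

# In-memory SQLite database, used here only as a local stand-in for Databricks.
conn = sqlite3.connect(":memory:")

# Hypothetical schemas and rows; student 3 has no row in grades on purpose.
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE grades (id INTEGER, grade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cy")],
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [(1, "A"), (2, "B")],
)

# Same join as in the article; ORDER BY is added only for a deterministic row order.
joined = conn.execute(
    "SELECT students.name, grades.grade "
    "FROM students INNER JOIN grades ON students.id = grades.id "
    "ORDER BY students.id"
).fetchall()

print(joined)  # [('Ana', 'A'), ('Ben', 'B')]
conn.close()
```

Cy does not appear in the output because an inner join keeps only rows that have a match on both sides.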
2. Subquery:
A subquery is a SQL query nested inside a larger query.
    SELECT * FROM students WHERE id IN (SELECT id FROM grades WHERE grade = 'A');
This returns every row from the ‘students’ table whose id appears among the ids in the ‘grades’ table that have a grade of ‘A’.
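One way to read a subquery is to run its inner part on its own first. The sketch below does that, again assuming SQLite via Python's sqlite3 module as a local stand-in for Databricks and using hypothetical sample rows; it shows the ids the inner query produces and then the rows the full query returns.

```python
import sqlite3

# In-memory SQLite database, used here only as a local stand-in for Databricks.
conn = sqlite3.connect(":memory:")

# Hypothetical schemas and rows, invented for illustration.
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE grades (id INTEGER, grade TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cy")],
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "A")],
)

# Inner query on its own: the ids that have a grade of 'A'.
a_ids = conn.execute(
    "SELECT id FROM grades WHERE grade = 'A' ORDER BY id"
).fetchall()

# Full query from the article: keep students whose id is in that set.
top_students = conn.execute(
    "SELECT * FROM students "
    "WHERE id IN (SELECT id FROM grades WHERE grade = 'A') "
    "ORDER BY id"
).fetchall()

print(a_ids)         # [(1,), (3,)]
print(top_students)  # [(1, 'Ana'), (3, 'Cy')]
conn.close()
```

The outer query never reads the grade column itself; it only checks whether each student's id is in the list produced by the inner query.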
Conclusion
These are the fundamentals of using SQL with Databricks. The real power of SQL reveals itself as your queries grow more complex and you start truly leveraging the interactive and analytical power of Databricks.
