
Duplicates can appear in your SQL results for various reasons. Your data set might contain identical values, or your JOIN statements might produce repeated records. Or perhaps your subquery is returning multiple instances of the same data point. Whatever the case might be, duplicates can distort your analytics, for example by inflating counts and totals, and make result sets larger than they need to be. The DISTINCT keyword in SQL helps avoid these complications.
What is the DISTINCT Keyword?
The DISTINCT keyword is used in SQL SELECT statements to prevent duplicate rows from appearing in the query results. This is especially helpful when querying data across multiple tables where records could potentially be duplicated.
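To illustrate the multi-table case, here is a minimal sketch. The Customers and Orders tables, their columns, and the sample rows are hypothetical and chosen only for illustration. A customer with several orders appears once per order after the JOIN, and DISTINCT collapses those repeats.

```sql
-- Hypothetical tables and data, assumed only for this illustration.
CREATE TABLE Customers (
    CustomerId INT,
    Name       VARCHAR(50)
);

CREATE TABLE Orders (
    OrderId    INT,
    CustomerId INT
);

INSERT INTO Customers (CustomerId, Name) VALUES
    (1, 'Ana'),
    (2, 'Ben'),
    (3, 'Chloe');

INSERT INTO Orders (OrderId, CustomerId) VALUES
    (101, 1),
    (102, 1),
    (103, 2);

-- Without DISTINCT: Ana is returned twice, once per matching order.
SELECT c.CustomerId, c.Name
FROM Customers c
JOIN Orders o ON o.CustomerId = c.CustomerId;

-- With DISTINCT: each customer who has placed an order is returned once.
SELECT DISTINCT c.CustomerId, c.Name
FROM Customers c
JOIN Orders o ON o.CustomerId = c.CustomerId
ORDER BY c.CustomerId;
-- 1 | Ana
-- 2 | Ben
```

The ORDER BY is there only to make the output order predictable; DISTINCT by itself does not guarantee any particular ordering.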
Basic usage of DISTINCT
Place the DISTINCT keyword in the SELECT clause of your SQL query, immediately after SELECT and before the column name(s). It applies to every column in the list, so rows are compared on all of the selected columns together.
    SELECT DISTINCT column_name1, column_name2, ...
    FROM table_name;
Example of Using DISTINCT
Assume we have a “Students” table with a “Course” column that contains numerous cases of the same course being listed. To get a list of all unique courses, we use the DISTINCT keyword as follows:
    SELECT DISTINCT Course
    FROM Students;
This SQL statement will return all unique “Course” values from the “Students” table, thus eliminating any duplicate entries.
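To see the effect on concrete data, here is a small self-contained sketch. The StudentName column and the sample rows are assumptions added for illustration; the article only specifies a “Students” table with a “Course” column.

```sql
-- Hypothetical sample data; StudentName and all rows are assumptions.
CREATE TABLE Students (
    StudentName VARCHAR(50),
    Course      VARCHAR(50)
);

INSERT INTO Students (StudentName, Course) VALUES
    ('Ana',   'Biology'),
    ('Ben',   'Chemistry'),
    ('Chloe', 'Biology'),
    ('Dev',   'Physics'),
    ('Elif',  'Chemistry');

-- Without DISTINCT: five rows, one per student, with repeated courses.
SELECT Course
FROM Students;

-- With DISTINCT: one row per unique course.
SELECT DISTINCT Course
FROM Students
ORDER BY Course;
-- Biology
-- Chemistry
-- Physics
```

The ORDER BY clause is optional and is included here only so the result order is predictable.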
DISTINCT for Multiple Columns
If you want to select unique combinations across two columns, you can certainly do so with DISTINCT. In this case, the DISTINCT keyword operates on the entire tuple – if the combined values are unique, that record will appear in our query result.
    SELECT DISTINCT column1, column2
    FROM table_name;
Even if column1 and column2 have duplicate values individually, as long as their combination is unique, the row will be selected in the result set.
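A short sketch of this behavior follows, using a hypothetical Enrollments table whose name, columns, and rows are assumed purely for illustration. Individual values such as 'Biology' and 'Fall' repeat in the output, but each combination of the two columns appears only once.

```sql
-- Hypothetical table and data, assumed only for this illustration.
CREATE TABLE Enrollments (
    Course   VARCHAR(50),
    Semester VARCHAR(20)
);

INSERT INTO Enrollments (Course, Semester) VALUES
    ('Biology',   'Fall'),
    ('Biology',   'Fall'),    -- exact duplicate of the row above
    ('Biology',   'Spring'),
    ('Chemistry', 'Fall'),
    ('Chemistry', 'Fall');    -- exact duplicate of the row above

-- DISTINCT compares the (Course, Semester) pair as a whole.
SELECT DISTINCT Course, Semester
FROM Enrollments
ORDER BY Course, Semester;
-- Biology   | Fall
-- Biology   | Spring
-- Chemistry | Fall
```

Five input rows become three: the two exact duplicates are removed, while ('Biology', 'Spring') stays because its combination differs from ('Biology', 'Fall').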
Conclusion
The DISTINCT keyword is a powerful tool in SQL for eliminating duplicate rows from query results and keeping them concise. It does not change the stored data; it only affects what the query returns. Whether you’re dealing with a small database or handling large volumes of data across multiple tables, understanding how and when to use the DISTINCT keyword is crucial for accurate data analytics.
