Understanding the DISTINCT Keyword in SQL: Eliminating Duplicate Results

Duplicates can appear in your SQL results for various reasons. Your data set might contain identical values, your JOIN statements might produce repeated records, or your subquery might return multiple instances of the same data point. Whatever the cause, duplicates can distort your analytics and bloat result sets with redundant rows. The DISTINCT keyword in SQL helps you avoid these complications.

What is the DISTINCT Keyword?

The DISTINCT keyword is used in SQL SELECT statements to remove duplicate rows from the query results. Two rows count as duplicates when every selected column holds the same value. This is especially helpful when joining multiple tables, because a single record can match several rows in another table and therefore appear more than once.
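To see how a JOIN can repeat values, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The Customers and Orders tables, their columns, and the sample rows are hypothetical, invented only for this illustration. The SQL itself is standard and should behave the same way in most databases.

```python
import sqlite3


def customers_with_orders(distinct: bool):
    """Return names of customers who have placed at least one order.

    Customers and Orders are hypothetical tables created only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO Customers VALUES (1, 'Ada'), (2, 'Grace');
        -- Ada has three orders, Grace has one.
        INSERT INTO Orders (customer_id) VALUES (1), (1), (1), (2);
        """
    )
    # The keyword is a fixed string chosen by this code, not user input.
    keyword = "DISTINCT " if distinct else ""
    # The JOIN yields one row per matching order, so Ada appears three times
    # unless DISTINCT collapses the repeated names.
    rows = conn.execute(
        f"SELECT {keyword}c.name "
        "FROM Customers c JOIN Orders o ON o.customer_id = c.id "
        "ORDER BY c.name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(customers_with_orders(distinct=False))  # ['Ada', 'Ada', 'Ada', 'Grace']
print(customers_with_orders(distinct=True))   # ['Ada', 'Grace']
```

The only difference between the two queries is the DISTINCT keyword; sqlite3 is used here simply because it ships with Python.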

Basic Usage of DISTINCT

DISTINCT goes in the SELECT clause of your query, immediately after the SELECT keyword and before the column name(s) whose unique values you want to return.

Example of Using DISTINCT

Assume we have a “Students” table whose “Course” column lists the same course many times. To get a list of all unique courses, we use the DISTINCT keyword as follows:

SELECT DISTINCT Course FROM Students;

This SQL statement will return all unique “Course” values from the “Students” table, thus eliminating any duplicate entries.
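Here is a runnable sketch of that query using Python's built-in sqlite3 module and an in-memory database. The Name column and the sample rows are invented for illustration; only the Students table and Course column come from the example above. An ORDER BY is added purely so the output is predictable, since SQL does not guarantee row order without one.

```python
import sqlite3

# Hypothetical sample data: several students share the same course.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, Course TEXT)")
conn.executemany(
    "INSERT INTO Students (Name, Course) VALUES (?, ?)",
    [
        ("Alice", "Math"),
        ("Bob", "History"),
        ("Carol", "Math"),
        ("Dave", "Physics"),
        ("Erin", "History"),
    ],
)

# Without DISTINCT, every row's Course comes back, repeats included.
all_courses = [c for (c,) in conn.execute(
    "SELECT Course FROM Students ORDER BY Course")]

# With DISTINCT, each course appears exactly once.
unique_courses = [c for (c,) in conn.execute(
    "SELECT DISTINCT Course FROM Students ORDER BY Course")]

print(all_courses)     # ['History', 'History', 'Math', 'Math', 'Physics']
print(unique_courses)  # ['History', 'Math', 'Physics']
conn.close()
```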

DISTINCT for Multiple Columns

If you want unique combinations across two or more columns, list all of them after DISTINCT:

SELECT DISTINCT column1, column2 FROM table_name;

Here DISTINCT applies to the entire row of selected values, not to each column separately: each unique combination of column1 and column2 appears exactly once in the result.

Even if column1 and column2 each contain duplicate values on their own, every distinct combination of the two still appears in the result set.
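The sketch below makes the multi-column behavior concrete with Python's sqlite3 module. The Enrollments table, its Course and Term columns, and the sample rows are hypothetical stand-ins for column1 and column2; ORDER BY is included only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each row records a student's enrollment in a course for a term.
conn.execute("CREATE TABLE Enrollments (Student TEXT, Course TEXT, Term TEXT)")
conn.executemany(
    "INSERT INTO Enrollments VALUES (?, ?, ?)",
    [
        ("Alice", "Math", "Fall"),
        ("Bob", "Math", "Fall"),      # same (Course, Term) pair as Alice
        ("Carol", "Math", "Spring"),  # Course repeats, but the pair is new
        ("Dave", "History", "Fall"),  # Term repeats, but the pair is new
    ],
)

# DISTINCT compares the whole (Course, Term) pair, not each column separately.
pairs = conn.execute(
    "SELECT DISTINCT Course, Term FROM Enrollments ORDER BY Course, Term"
).fetchall()

print(pairs)  # [('History', 'Fall'), ('Math', 'Fall'), ('Math', 'Spring')]
conn.close()
```

Four input rows collapse to three because only Alice's and Bob's rows share the same (Course, Term) pair; "Math" and "Fall" each repeat individually, yet their other pairings survive.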

Conclusion

The DISTINCT keyword is a simple but powerful SQL tool for removing duplicate rows from query results and keeping those results clean and concise. Whether you’re working with a small database or querying large volumes of data across multiple tables, knowing how and when to use DISTINCT is essential for accurate data analytics.
