What is DISTINCT keyword?
The `DISTINCT` keyword in SQL is used to eliminate duplicate rows from the result set of a `SELECT` statement. When applied to one or more columns, it ensures that each row in the output is unique based on the values in the specified columns.
What is DISTINCT?
The DISTINCT keyword is a fundamental clause used with the SELECT statement in SQL. Its primary purpose is to return only unique values in the specified columns from a table. If a column contains multiple identical entries, DISTINCT will ensure that only one instance of that entry is included in the final result set.
Syntax
The basic syntax for using the DISTINCT keyword is straightforward:
SELECT DISTINCT column1, column2, ...
FROM table_name;
You place DISTINCT immediately after SELECT and before the column names you wish to retrieve unique values from.
How DISTINCT Works
When you use DISTINCT, the database system processes the data in the following way:
- It first retrieves all rows that match the
WHEREclause (if any). - Then, it evaluates the expressions in the
SELECTlist (e.g.,column1,column2). - Finally, it compares the resulting rows and removes any that are exact duplicates across all specified
DISTINCTcolumns, presenting only one occurrence of each unique combination.
DISTINCT with Multiple Columns
When you specify multiple columns with DISTINCT, the keyword considers the *combination* of values across all selected columns to determine uniqueness. A row is considered a duplicate only if *all* specified columns have identical values as another row.
SELECT DISTINCT department_id, city
FROM employees;
In this example, the query will return unique pairs of department_id and city. If two employees are in department_id = 10 and city = 'New York', only one such pair will appear in the result set.
DISTINCT vs. GROUP BY
While DISTINCT is used for removing duplicate rows, GROUP BY is used to group rows that have the same values in specified columns into summary rows. Both can achieve similar results in simple cases (e.g., getting unique values from a single column), but GROUP BY is more powerful for aggregation (e.g., COUNT, SUM, AVG) on grouped data. DISTINCT primarily focuses on presenting unique row combinations without necessarily performing aggregations.
Use Cases
- Retrieving a list of all unique categories or types from a product table.
- Finding all unique customer IDs that have placed orders.
- Identifying all distinct cities where employees are located.
- Getting a list of unique values in a lookup table.
Important Considerations
- Performance: Using
DISTINCTcan impact performance, especially on large tables, as it requires sorting the result set to identify and remove duplicates. - NULL Values: In SQL,
NULLvalues are treated as equal for the purpose ofDISTINCT. If you have multipleNULLentries in aDISTINCTcolumn, only oneNULLwill be returned. - All Columns: If you use
SELECT DISTINCT *, it will return unique rows based on the combination of values in *all* columns of the table.