🗄️ SQL Q34 / 127

What is DISTINCT keyword?

AI-Powered Answer ✓ Answered

The `DISTINCT` keyword in SQL is used to eliminate duplicate rows from the result set of a `SELECT` statement. When applied to one or more columns, it ensures that each row in the output is unique based on the values in the specified columns.

What is DISTINCT?

The DISTINCT keyword is a fundamental clause used with the SELECT statement in SQL. Its primary purpose is to return only unique values in the specified columns from a table. If a column contains multiple identical entries, DISTINCT will ensure that only one instance of that entry is included in the final result set.

Syntax

The basic syntax for using the DISTINCT keyword is straightforward:

sql
SELECT DISTINCT column1, column2, ...
FROM table_name;

You place DISTINCT immediately after SELECT and before the column names you wish to retrieve unique values from.

How DISTINCT Works

When you use DISTINCT, the database system processes the data in the following way:

  • It first retrieves all rows that match the WHERE clause (if any).
  • Then, it evaluates the expressions in the SELECT list (e.g., column1, column2).
  • Finally, it compares the resulting rows and removes any that are exact duplicates across all specified DISTINCT columns, presenting only one occurrence of each unique combination.

DISTINCT with Multiple Columns

When you specify multiple columns with DISTINCT, the keyword considers the *combination* of values across all selected columns to determine uniqueness. A row is considered a duplicate only if *all* specified columns have identical values as another row.

sql
SELECT DISTINCT department_id, city
FROM employees;

In this example, the query will return unique pairs of department_id and city. If two employees are in department_id = 10 and city = 'New York', only one such pair will appear in the result set.

DISTINCT vs. GROUP BY

While DISTINCT is used for removing duplicate rows, GROUP BY is used to group rows that have the same values in specified columns into summary rows. Both can achieve similar results in simple cases (e.g., getting unique values from a single column), but GROUP BY is more powerful for aggregation (e.g., COUNT, SUM, AVG) on grouped data. DISTINCT primarily focuses on presenting unique row combinations without necessarily performing aggregations.

Use Cases

  • Retrieving a list of all unique categories or types from a product table.
  • Finding all unique customer IDs that have placed orders.
  • Identifying all distinct cities where employees are located.
  • Getting a list of unique values in a lookup table.

Important Considerations

  • Performance: Using DISTINCT can impact performance, especially on large tables, as it requires sorting the result set to identify and remove duplicates.
  • NULL Values: In SQL, NULL values are treated as equal for the purpose of DISTINCT. If you have multiple NULL entries in a DISTINCT column, only one NULL will be returned.
  • All Columns: If you use SELECT DISTINCT *, it will return unique rows based on the combination of values in *all* columns of the table.