SQL DISTINCT

Summary: in this tutorial, you will learn how to use the SQL DISTINCT operator to remove duplicates from a result set.

Introduction to the SQL DISTINCT operator

A primary key ensures that a table has no duplicate rows. However, when you use the SELECT statement to query only some of the columns in a table, the result set may contain duplicates.

To remove duplicates from a result set, you use the DISTINCT operator in the SELECT clause as follows:

SELECT DISTINCT column1, column2, ... FROM table1;

If you specify one column after the DISTINCT operator, the database system uses the values in that column to evaluate duplicates. If you specify two or more columns, the database system uses the combination of values in these columns for the duplicate check.
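As a minimal sketch of how the column list changes what counts as a duplicate, the script below builds a small hypothetical table, orders_demo (an assumption for illustration, not part of the sample database), and runs DISTINCT first on one column and then on two.

```sql
-- Hypothetical demo table; the name and rows are assumptions for illustration.
CREATE TABLE orders_demo (
    customer VARCHAR(20),
    product  VARCHAR(20)
);

INSERT INTO orders_demo (customer, product) VALUES ('Ann', 'Pen');
INSERT INTO orders_demo (customer, product) VALUES ('Ann', 'Pen');
INSERT INTO orders_demo (customer, product) VALUES ('Ann', 'Ink');
INSERT INTO orders_demo (customer, product) VALUES ('Bob', 'Pen');

-- One column: duplicates are judged on customer alone.
-- Returns 2 rows: Ann and Bob.
SELECT DISTINCT customer FROM orders_demo;

-- Two columns: duplicates are judged on the (customer, product) pair.
-- Returns 3 rows: (Ann, Pen), (Ann, Ink), (Bob, Pen), in no guaranteed order.
SELECT DISTINCT customer, product FROM orders_demo;
```

Only the second ('Ann', 'Pen') row disappears in the two-column query, because it is the only row whose full combination of values repeats.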

One way a database system can remove the duplicates is to sort the result set by every column specified in the SELECT clause and then scan the sorted rows from top to bottom, discarding duplicates that are next to each other. Many systems can use other strategies, such as hashing, and the choice is up to the query optimizer. Either way, if the result set is large, the extra work may reduce the performance of the query.
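The sort-and-scan idea can be imitated in plain SQL to make it concrete. The sketch below is only an illustration, not how any particular database system implements DISTINCT. It assumes a hypothetical table, salary_demo, and a system that supports the LAG window function.

```sql
-- Hypothetical demo table; not part of the sample database.
CREATE TABLE salary_demo (salary INTEGER NOT NULL);

INSERT INTO salary_demo (salary) VALUES (9000);
INSERT INTO salary_demo (salary) VALUES (17000);
INSERT INTO salary_demo (salary) VALUES (9000);
INSERT INTO salary_demo (salary) VALUES (17000);

-- Step 1: order the rows so that equal values become neighbors.
-- Step 2: keep a row only when it differs from the row just before it.
SELECT s.salary
FROM (
    SELECT salary,
           LAG(salary) OVER (ORDER BY salary) AS previous_salary
    FROM salary_demo
) AS s
WHERE s.previous_salary IS NULL          -- first row has no predecessor
   OR s.salary <> s.previous_salary;     -- value changed, so it is new

-- Returns the same rows as: SELECT DISTINCT salary FROM salary_demo;
```

The column is declared NOT NULL so that previous_salary is NULL only for the first row in sort order.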

SQL DISTINCT examples

We will use the employees table in the sample database to demonstrate how the DISTINCT operator works.

1) Using SQL DISTINCT on one column example

The following statement retrieves the salary data from the employees table and sorts it in descending order:

SELECT salary FROM employees ORDER BY salary DESC;

The result set contains duplicate salary values, for example 17,000, because two or more employees have the same salary. To remove the duplicates, you add the DISTINCT operator to the SELECT clause as follows:

SELECT DISTINCT salary FROM employees ORDER BY salary DESC;

Now all duplicates are removed from the result set.
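If you do not have the sample database at hand, the following sketch reproduces the effect on a small stand-in table. The table employees_demo and its rows are assumptions made up for illustration.

```sql
-- Hypothetical stand-in for the employees table; rows are assumptions.
CREATE TABLE employees_demo (
    employee_id INTEGER PRIMARY KEY,
    salary      INTEGER
);

INSERT INTO employees_demo (employee_id, salary) VALUES (1, 24000);
INSERT INTO employees_demo (employee_id, salary) VALUES (2, 17000);
INSERT INTO employees_demo (employee_id, salary) VALUES (3, 17000);
INSERT INTO employees_demo (employee_id, salary) VALUES (4, 9000);

-- Without DISTINCT: 4 rows (24000, 17000, 17000, 9000).
SELECT salary FROM employees_demo ORDER BY salary DESC;

-- With DISTINCT: 3 rows (24000, 17000, 9000).
SELECT DISTINCT salary FROM employees_demo ORDER BY salary DESC;
```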

2) Using SQL DISTINCT on multiple columns example

The following statement retrieves the job id and salary from the employees table.

SELECT job_id, salary FROM employees ORDER BY job_id, salary DESC;

Now if you add the DISTINCT operator to the SELECT clause, the database system uses the values in both the job_id and salary columns to evaluate duplicates. Where several rows share the same job_id and salary, it keeps only one of them:

SELECT DISTINCT job_id, salary FROM employees ORDER BY job_id, salary DESC;
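To see which rows survive, here is a small sketch on a hypothetical table, job_salary_demo, whose name and rows are assumptions for illustration.

```sql
-- Hypothetical demo table; not part of the sample database.
CREATE TABLE job_salary_demo (
    employee_id INTEGER PRIMARY KEY,
    job_id      INTEGER,
    salary      INTEGER
);

INSERT INTO job_salary_demo (employee_id, job_id, salary) VALUES (1, 1, 8300);
INSERT INTO job_salary_demo (employee_id, job_id, salary) VALUES (2, 1, 8300);
INSERT INTO job_salary_demo (employee_id, job_id, salary) VALUES (3, 1, 7700);
INSERT INTO job_salary_demo (employee_id, job_id, salary) VALUES (4, 2, 8300);

-- Employees 1 and 2 share the same (job_id, salary) pair, so only one
-- copy of (1, 8300) is kept.
-- Returns 3 rows: (1, 8300), (1, 7700), (2, 8300).
SELECT DISTINCT job_id, salary
FROM job_salary_demo
ORDER BY job_id, salary DESC;
```

The row (2, 8300) stays even though 8300 already appears for job 1, because the check uses the combination of both columns rather than the salary alone.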

SQL DISTINCT and NULL values

In the database world, NULL is special. NULL values are used as markers to indicate that the information is missing or not applicable.

For this reason, NULL cannot be compared to any value with the regular comparison operators. Even the comparison NULL = NULL does not evaluate to true. So if you have two or more NULL values in a column, does the database system consider them the same or distinct?

Typically, the DISTINCT operator treats all NULL values as the same. As a result, the DISTINCT operator keeps only one NULL value and removes the others from the result set.

For example, the following statement returns the distinct phone numbers of employees.

SELECT DISTINCT phone_number FROM employees;

Notice that it returned only one NULL value.
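A minimal sketch of this behavior, using a hypothetical table phone_demo with made-up rows (an assumption for illustration, not data from the sample database):

```sql
-- Hypothetical demo table; names and rows are assumptions.
CREATE TABLE phone_demo (
    employee_id  INTEGER PRIMARY KEY,
    phone_number VARCHAR(20)
);

INSERT INTO phone_demo (employee_id, phone_number) VALUES (1, '515.123.4567');
INSERT INTO phone_demo (employee_id, phone_number) VALUES (2, NULL);
INSERT INTO phone_demo (employee_id, phone_number) VALUES (3, NULL);
INSERT INTO phone_demo (employee_id, phone_number) VALUES (4, '515.123.4567');

-- In a system that treats NULLs as duplicates of each other for DISTINCT,
-- this returns 2 rows: one NULL and one '515.123.4567'.
SELECT DISTINCT phone_number FROM phone_demo;
```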

In this tutorial, you have learned how to use the SQL DISTINCT operator to remove duplicate rows from a result set.
