PostgreSQL SELECT DISTINCT


Summary: in this tutorial, you will learn how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from a result set returned by a query.

Introduction to PostgreSQL SELECT DISTINCT clause

The DISTINCT clause is used in the SELECT statement to remove duplicate rows from a result set. It keeps one row for each group of duplicates. The DISTINCT clause can be applied to one or more columns in the select list of the SELECT statement.

The following illustrates the syntax of the DISTINCT clause:

SELECT DISTINCT column1 FROM table_name;

In this statement, the values in the column1 column are used to identify duplicates.

If you specify multiple columns, the DISTINCT clause identifies duplicates based on the combination of values in these columns.

SELECT DISTINCT column1, column2 FROM table_name;

In this case, the combination of values in both the column1 and column2 columns is used to identify duplicates.
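To see the difference between one-column and multi-column evaluation without creating a table, the sketch below runs DISTINCT over an inline VALUES list. The alias t and the column names a and b are hypothetical and exist only for this illustration.

```sql
-- Hypothetical inline data: the pair (1, 'x') appears twice.
-- DISTINCT on one column: only the values of a are compared.
SELECT DISTINCT a
FROM (VALUES (1, 'x'), (1, 'x'), (1, 'y'), (2, 'x')) AS t(a, b)
ORDER BY a;
-- Two rows remain: 1 and 2.

-- DISTINCT on two columns: the combination (a, b) is compared.
SELECT DISTINCT a, b
FROM (VALUES (1, 'x'), (1, 'x'), (1, 'y'), (2, 'x')) AS t(a, b)
ORDER BY a, b;
-- Three rows remain: (1, 'x'), (1, 'y'), and (2, 'x').
```

Adding a column to the select list can therefore increase the number of rows returned, because fewer rows count as duplicates of each other.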

PostgreSQL also provides DISTINCT ON (expression) to keep the “first” row of each group of duplicates, using the following syntax:

SELECT DISTINCT ON (column1) column1, column2 FROM table_name ORDER BY column1, column2;

Without an ORDER BY clause, the order of rows returned by the SELECT statement is unspecified; therefore, the “first” row of each group of duplicates is also unspecified.

It is good practice to always use the ORDER BY clause with DISTINCT ON (expression) to make the result set predictable.

Notice that the DISTINCT ON expression must match the leftmost expression in the ORDER BY clause.
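The leftmost-expression rule can be sketched with inline data. The alias t and the columns a and b are hypothetical names used only here; the second query is left commented out so that the block runs cleanly.

```sql
-- Hypothetical inline data with two rows for a = 1.
SELECT DISTINCT ON (a) a, b
FROM (VALUES (1, 'y'), (1, 'x'), (2, 'z')) AS t(a, b)
ORDER BY a, b;
-- For a = 1, 'x' sorts before 'y', so the rows kept are (1, 'x') and (2, 'z').

-- Reordering the sort keys breaks the rule. Uncommenting the query below
-- raises: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
-- SELECT DISTINCT ON (a) a, b
-- FROM (VALUES (1, 'y'), (1, 'x'), (2, 'z')) AS t(a, b)
-- ORDER BY b, a;
```

The expressions after the leftmost one in ORDER BY decide which row counts as “first” within each group.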

PostgreSQL SELECT DISTINCT examples

Let’s create a new table called distinct_demo and insert data into it for practicing the DISTINCT clause.

Note that you will learn how to create a table and insert data into it in a subsequent tutorial. For now, simply execute the statements below in psql or pgAdmin.

First, use the following CREATE TABLE statement to create the distinct_demo table, which consists of three columns: id, bcolor, and fcolor.

CREATE TABLE distinct_demo (
    id serial NOT NULL PRIMARY KEY,
    bcolor VARCHAR,
    fcolor VARCHAR
);

Second, insert some rows into the distinct_demo table using the following INSERT statement:

INSERT INTO distinct_demo (bcolor, fcolor)
VALUES
    ('red', 'red'),
    ('red', 'red'),
    ('red', NULL),
    (NULL, 'red'),
    ('red', 'green'),
    ('red', 'blue'),
    ('green', 'red'),
    ('green', 'blue'),
    ('green', 'green'),
    ('blue', 'red'),
    ('blue', 'green'),
    ('blue', 'blue');

Third, query the data from the distinct_demo table using the SELECT statement:

SELECT id, bcolor, fcolor FROM distinct_demo;

PostgreSQL DISTINCT one column example

The following statement selects the unique values in the bcolor column from the distinct_demo table and sorts the result set alphabetically using the ORDER BY clause.

SELECT DISTINCT bcolor FROM distinct_demo ORDER BY bcolor;
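To see how many rows collapse into each distinct value, a GROUP BY query can serve as a rough cross-check. This sketch assumes the distinct_demo table holds exactly the twelve rows inserted above; the column alias row_count is a name chosen for illustration.

```sql
-- Assumes distinct_demo contains only the twelve rows inserted earlier.
-- Each output row corresponds to one value returned by SELECT DISTINCT bcolor.
SELECT bcolor, COUNT(*) AS row_count
FROM distinct_demo
GROUP BY bcolor
ORDER BY bcolor;
-- Under that assumption: blue 3, green 3, red 5, NULL 1.
```

Note that NULL is treated as a single distinct value here, and that it sorts last because ascending ORDER BY places NULLs last by default in PostgreSQL.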

PostgreSQL DISTINCT multiple columns

The following statement demonstrates how to use the DISTINCT clause on multiple columns:

SELECT DISTINCT bcolor, fcolor FROM distinct_demo ORDER BY bcolor, fcolor;

Because we specified both bcolor and fcolor columns in the SELECT DISTINCT clause, PostgreSQL combined the values in both bcolor and fcolor columns to evaluate the uniqueness of the rows.

The query returns the unique combinations of bcolor and fcolor from the distinct_demo table. Notice that the distinct_demo table has two rows with the value red in both the bcolor and fcolor columns. When we applied DISTINCT to both columns, one of these rows was removed from the result set because it is a duplicate.
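One way to confirm which combinations are duplicated is to group by both columns and keep only the groups with more than one row. This is a sketch that assumes the distinct_demo table contains exactly the rows inserted above; the alias occurrences is a name chosen for illustration.

```sql
-- Assumes distinct_demo contains only the twelve rows inserted earlier.
-- Lists every (bcolor, fcolor) combination that appears more than once.
SELECT bcolor, fcolor, COUNT(*) AS occurrences
FROM distinct_demo
GROUP BY bcolor, fcolor
HAVING COUNT(*) > 1;
-- Under that assumption, one row is returned: red, red, 2.
```

The rows ('red', NULL) and (NULL, 'red') do not appear because each of those combinations occurs only once.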

PostgreSQL DISTINCT ON example

The following statement sorts the result set by bcolor and fcolor, and then, for each group of rows with the same bcolor value, keeps the first row in the returned result set.

SELECT DISTINCT ON (bcolor) bcolor, fcolor FROM distinct_demo ORDER BY bcolor, fcolor;

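The query returns four rows, one for each distinct bcolor value: (blue, blue), (green, blue), (red, blue), and (NULL, red). The NULL group comes last because ascending ORDER BY places NULLs last by default.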
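As a variation on the DISTINCT ON query above, changing the second sort key changes which row is kept for each group. The sketch below assumes the distinct_demo table from this tutorial and keeps the row with the smallest id for each bcolor value; including id in the select list is an addition made only for illustration.

```sql
-- Assumes the distinct_demo table created and populated above.
-- bcolor stays the leftmost ORDER BY expression, as DISTINCT ON requires;
-- id then decides which row counts as "first" within each bcolor group.
SELECT DISTINCT ON (bcolor) id, bcolor, fcolor
FROM distinct_demo
ORDER BY bcolor, id;
```

The result still has one row per distinct bcolor value, but the fcolor shown is whatever that lowest-id row happens to hold rather than the alphabetically first one.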

In this tutorial, you have learned how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from the result set of a query.
