SQL GROUP BY

SQL GROUP BY

 

SQL GROUP BY



Summary: in this tutorial, you will learn how to use the SQL GROUP BY clause to group rows based on one or more columns.

Introduction to SQL GROUP BY clause

Grouping is one of the most important tasks that you have to deal with while working with databases. To group rows into groups, you use the GROUP BY clause.

The GROUP BY clause is an optional clause of the SELECT statement that combines rows into groups based on matching values in specified columns. One row is returned for each group.

You often use the GROUP BY in conjunction with an aggregate function such as MIN, MAX, AVG, SUM, or COUNT to calculate a measure that provides the information for each group.

The following illustrates the syntax of the GROUP BY clause.

SELECT column1, column2, AGGREGATE_FUNCTION (column3) FROM table1 GROUP BY column1, column2;

It is not mandatory to include an aggregate function in the SELECT clause. However, if you use an aggregate function, it will calculate the summary value for each group.

If you want to filter the rows before grouping, you add a WHERE clause. However, to filter groups, you use the HAVING clause.

It is important to emphasize that the WHERE clause is applied before rows are grouped whereas the HAVING clause is applied after rows are grouped. In other words, the WHERE clause is applied to rows whereas the HAVING clause is applied to groups.

To sort the groups, you add the ORDER BY clause after the GROUP BY clause.

The columns that appear in the GROUP BY clause are called grouping columns. If a grouping column contains NULL values, all NULL values are summarized into a single group because the GROUP BY clause considers NULL values are equal.

SQL GROUP BY examples

We will use the employees and departments tables in the sample database to demonstrate how the GROUP BY clause works.

To find the headcount of each department, you group the employees by the department_id column, and apply the COUNT function to each group as the following query:

SELECT department_id, COUNT(employee_id) headcount FROM employees GROUP BY department_id;

SQL GROUP BY with INNER JOIN example

To get the department name, you join the employees table with the departments table as follows:

SELECT e.department_id, department_name, COUNT(employee_id) headcount FROM employees e INNER JOIN departments d ON d.department_id = e.department_id GROUP BY e.department_id;

SQL GROUP BY with ORDER BY example

To sort the departments by headcount, you add an ORDER BY clause as the following statement:

SELECT e.department_id, department_name, COUNT(employee_id) headcount FROM employees e INNER JOIN departments d ON d.department_id = e.department_id GROUP BY e.department_id ORDER BY headcount DESC;

Note that you can use either the headcount alias or the COUNT(employee_id) in the ORDER BY clause.

SQL GROUP BY with HAVING example

To find the department whose headcount is greater than 5, you use the HAVING clause like the following query:

SELECT e.department_id, department_name, COUNT(employee_id) headcount FROM employees e INNER JOIN departments d ON d.department_id = e.department_id GROUP BY e.department_id HAVING headcount > 5 ORDER BY headcount DESC;

SQL GROUP BY with MIN, MAX, and AVG example

The following query returns the minimum, maximum, and average salary of employees in each department.

SELECT e.department_id, department_name, MIN(salary) min_salary, MAX(salary) max_salary, ROUND(AVG(salary), 2) average_salary FROM employees e INNER JOIN departments d ON d.department_id = e.department_id GROUP BY e.department_id;

SQL GROUP BY with SUM function example

To get the total salary per department, you apply the SUM function to the salary column and group employees by the department_id column as follows:

SELECT e.department_id, department_name, SUM(salary) total_salary FROM employees e INNER JOIN departments d ON d.department_id = e.department_id GROUP BY e.department_id;

SQL GROUP BY multiple columns

So far, you have seen that we have grouped all employees by one column. For example, the following clause

GROUP BY department_id

place all rows with the same values in the department_id a column in one group.

How about grouping employees by values in both department_id and job_id columns?

GROUP BY department_id, job_id

This clause will group all employees with the same values in both department_id and job_id columns in one group.

The following statement groups row with the same values in both department_id and job_id columns in the same group then return the rows for each of these groups.

SELECT e.department_id, department_name, e.job_id, job_title, COUNT(employee_id) FROM employees e INNER JOIN departments d ON d.department_id = e.department_id INNER JOIN jobs j ON j.job_id = e.job_id GROUP BY e.department_id , e.job_id;

Department 2, 3, and 5 appear more than one.

This is because these departments have employees who hold different jobs. For example, in the shipping department, there are 2 employees holding the shipping clerk job, 1 employee holding the stock clerk job, and 4 employees holding the stock manager job.

SQL GROUP BY and DISTINCT

If you use the GROUP BY clause without an aggregate function, the GROUP BY clause behaves like the DISTINCT operator.

The following gets the phone numbers of employees and also groups rows by the phone numbers.

SELECT phone_number FROM employees GROUP BY phone_number;

Notice that the phone numbers are sorted.

The following statement also retrieves the phone numbers but instead of using the GROUP BY clause, it uses the DISTINCT operator.

SELECT DISTINCT phone_number FROM employees;

The result set is the same except that the one returned by the DISTINCT the operator is not sorted.

In this tutorial, we have shown you how to use the GROUP BY clause to summarize rows into groups and apply the aggregate function to each group.

Reactions

Post a Comment

0 Comments

close