SQL GROUP BY
The GROUP BY statement in SQL is used to arrange identical data into groups. It is typically used with aggregate functions (like COUNT, SUM, AVG, MAX, MIN) to perform calculations on each group of data.
Key Points
Grouping Data:
- Combines rows with the same values in the specified columns into a single group.
Aggregate Functions:
- Performs calculations for each group, such as finding totals, averages, or counts.
Order of Execution:
GROUP BYis executed afterWHEREand beforeORDER BY.
Columns in SELECT:
- Any column in the
SELECTclause that is not part of an aggregate function must be in theGROUP BYclause.
- Any column in the
Syntax
column_name: Column(s) to group by.aggregate_function: Function applied to each group (e.g.,SUM,COUNT).
Examples
1. Basic Example: GROUP BY Single Column
- Groups rows by department and counts the number of employees in each department.
2. GROUP BY Multiple Columns
- Groups rows by both department and job title, and counts the number of employees in each group.
3. GROUP BY with WHERE Clause
- Filters rows where
Salary > 30000before grouping them by department and calculating the average salary.
4. GROUP BY with ORDER BY
- Groups rows by department calculates the total salary for each department, and sorts the results in descending order of total salary.
Use Cases
Summarizing Data:
- Calculate totals, averages, and other aggregates for specific groups.
Data Segmentation:
- Analyze subsets of data grouped by categories like departments, regions, or periods.
Counting Occurrences:
- Determine the frequency of specific values, e.g., the number of employees in each role.
GROUP BY with HAVING Clause
The HAVING clause is used to filter grouped data after the GROUP BY operation. It is similar to WHERE, but works on grouped data.
Example: GROUP BY with HAVING
- Group rows by department count employees in each group, and filter to include only departments with more than 5 employees.
Common Errors
Non-Aggregated Columns in SELECT:
- The error occurs when a column in the
SELECTclause is not part ofGROUP BYor an aggregate function.
- The error occurs when a column in the
Incorrect Placement of HAVING:
HAVINGmust come afterGROUP BY.
Advanced Examples
1. GROUP BY with Aliased Columns
- Groups rows by the year extracted from
OrderDateand counts orders per year.
2. GROUP BY with JOIN
- Joins
EmployeesandDepartments, groups data by department name, and counts employees in each department.
Best Practices
Include All Non-Aggregate Columns:
- Ensure all columns in the
SELECTclause are either aggregated or included inGROUP BY.
- Ensure all columns in the
**Avoid Using SELECT ***:
- Specify the columns explicitly to ensure clarity and proper grouping.
Use Descriptive Aliases:
- Name aggregated columns meaningfully using
ASfor better readability.
- Name aggregated columns meaningfully using
Conclusion
The GROUP BY statement is a powerful tool for summarizing and analyzing grouped data in SQL. It works seamlessly with aggregate functions and provides valuable insights into datasets. Understanding how to combine it with HAVING, JOIN, and ORDER BY unlocks its full potential.

