SQL Distinct

When working with large datasets, it is common to encounter duplicate data within tables. SQL’s SELECT DISTINCT command offers a powerful tool for filtering out duplicate data, returning only unique results. In this article, we’ll explore the concept of DISTINCT and how to use it to filter data.

We’ll assume that we have a table named “customers” that contains duplicate data. The table has five columns: “customer_id”, “first_name”, “last_name”, “email”, and “phone_number”. Here’s a sample of the data in the “customers” table:


customer_id | first_name | last_name | email             | phone_number
------------|------------|-----------|-------------------|-------------
1           | John       | Smith     | johnsmith@abc.com | 555-1234
2           | Jane       | Doe       | janedoe@abc.com   | 555-5678
3           | John       | Smith     | jsmith@abc.com    | 555-1234
4           | Bob        | Johnson   | bobj@abc.com      | 555-9999
5           | Jane       | Doe       | janed@abc.com     | 555-5678

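As you can see, the “first_name”, “last_name”, and “phone_number” columns contain duplicates: rows 1 and 3 share the same name and phone number, as do rows 2 and 5. The “customer_id” and “email” values are all different.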
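
To follow along, the sample table could be created with something like the script below. This is only a sketch: the column types and the primary key are assumptions, since the article gives just the column names, and the multi-row INSERT form is not accepted by every database.

```sql
-- Hypothetical schema: the types and the primary key are assumed,
-- only the column names come from the sample above.
CREATE TABLE customers (
    customer_id  INTEGER PRIMARY KEY,
    first_name   VARCHAR(50),
    last_name    VARCHAR(50),
    email        VARCHAR(100),
    phone_number VARCHAR(20)
);

-- The five sample rows shown in the table above.
INSERT INTO customers (customer_id, first_name, last_name, email, phone_number)
VALUES
    (1, 'John', 'Smith',   'johnsmith@abc.com', '555-1234'),
    (2, 'Jane', 'Doe',     'janedoe@abc.com',   '555-5678'),
    (3, 'John', 'Smith',   'jsmith@abc.com',    '555-1234'),
    (4, 'Bob',  'Johnson', 'bobj@abc.com',      '555-9999'),
    (5, 'Jane', 'Doe',     'janed@abc.com',     '555-5678');
```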

Using SELECT DISTINCT to Filter Duplicate Data
To leave duplicate values out of a query’s results, we can use the SELECT DISTINCT command. It only affects what the query returns; the rows stored in the table are not changed. The syntax for using SELECT DISTINCT is as follows:


SELECT DISTINCT column_name
FROM table_name;

This command will return only the unique values of the specified column from the specified table. If multiple columns are specified, the command will return unique combinations of values for those columns.
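
As an illustration of the multi-column case, a query against the sample “customers” table might look like the following. It assumes the table holds exactly the five rows shown earlier.

```sql
-- Unique (first_name, last_name) combinations from the sample table.
SELECT DISTINCT first_name, last_name
FROM customers;

-- With the five sample rows, this yields three combinations
-- (row order is not guaranteed without ORDER BY):
--   John | Smith
--   Jane | Doe
--   Bob  | Johnson
```

Adding “email” to the column list would bring back all five rows, because every email address in the sample is different and so every combination is unique.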

To filter out duplicate data from the “customers” table, we can use the following commands:


SELECT DISTINCT first_name
FROM customers;

SELECT DISTINCT last_name
FROM customers;

SELECT DISTINCT email
FROM customers;

SELECT DISTINCT phone_number
FROM customers;

Each of these commands will return a list of unique values for the specified column. For example, running the first command would return the following:


first_name
----------
John
Jane
Bob

This command returns only the unique values in the “first_name” column, which are “John”, “Jane”, and “Bob”.
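
One caveat: SQL does not promise any particular row order unless the query asks for one, so the three names may come back in a different sequence than shown. If a stable order matters, an ORDER BY clause can be added, as in this small sketch against the same sample table:

```sql
-- Unique first names, sorted alphabetically for a predictable order.
SELECT DISTINCT first_name
FROM customers
ORDER BY first_name;

-- With the sample rows, the names come back as: Bob, Jane, John
```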

SQL SELECT DISTINCT Summary
SQL’s SELECT DISTINCT command filters duplicate values out of a query’s results. By specifying a column name after SELECT DISTINCT, we can return only the unique values for that column; with several columns, we get the unique combinations of their values. This is particularly useful when working with large datasets or when trying to identify the unique values within a column.
