I had an interesting situation where I wanted to count DISTINCT rows with variable values.
Imagine a table like this:
| Customer | City | VIP |
|---|---|---|
| Acme | Sydney | true |
| Bally | Sydney | false |
| Bally | Melbourne | true |
| Costco | Melbourne | false |
| David Jones | Sydney | true |
| David Jones | Melbourne | true |
| Embraer | Sydney | false |
| Fastway | Sydney | true |
Now, how would you count the number of distinct customers in each city by VIP status?
I’m looking for this result:
| City | VIPs | Not VIPs |
|---|---|---|
| Melbourne | 2 | 1 |
| Sydney | 3 | 2 |
You can’t just group by City and use COUNT(DISTINCT customer) because it has to be divided by VIP too. That’s when I realised that the DISTINCT keyword looks at the following expression rather than just a column name.
Here’s the code that can be used:
select city, count(distinct case when vip then customer else null end) as VIPs, count(distinct case when not vip then customer else null end) as Not_VIPs from table group by city
The COUNT automatically ignores the null values, so it’s only count the number of distinct cities that are/are not VIPs.
That’s a very different way to use DISTINCT.