SQL Brain

This is your Brain on SQL

  • About this Blog

    I use PostgreSQL and love it a lot.

    Whenever I discover or conjure something neat, I post it here!

Passing a variable number of parameters to a function in PostgreSQL

Posted by John on November 29, 2010

I had a situation where I wanted to pass a variable number of key/value pairs to an SQL function, which would then insert the values into parent/child tables. After a bit of research, I found that arrays were the ideal choice.

Take a look at this code:

CREATE OR REPLACE FUNCTION insert_stuff ( in_array TEXT[2][] ) RETURNS void AS $$
BEGIN
  -- Loop over the outer dimension: one iteration per key/value pair
  FOR i IN array_lower(in_array, 1)..array_upper(in_array, 1) LOOP
    INSERT INTO stuff (key, value)
      VALUES( in_array[i][1], in_array[i][2]);
  END LOOP;
END;
$$ LANGUAGE plpgsql;

The parameter being passed to the function is in_array TEXT[2][], which means an array of 2-element text arrays. (PostgreSQL treats the declared array dimensions as documentation only, so any text array is accepted at run time.) Here are some samples of calling the function:

select insert_stuff( ARRAY[['firstname','Fred']] );
select insert_stuff( ARRAY[['firstname','Fred'], ['food','Apple']] );
select insert_stuff( ARRAY[['country','Australia'], ['city','Sydney'], ['weather', 'Cloudy']] );
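
For reference, here's a minimal table the function could insert into (a hypothetical definition, since the post doesn't show the real one), plus a quick check of the result:

CREATE TABLE stuff (
  key   text,
  value text
);

select insert_stuff( ARRAY[['firstname','Fred'], ['food','Apple']] );

select key, value from stuff;
-- key       | value
-- firstname | Fred
-- food      | Apple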

The result is the ability to pass any number of elements into the function. Yippee!

Posted in Functions | 1 Comment »

Using DISTINCT with CASE

Posted by John on September 23, 2010

I had an interesting situation where I wanted to count DISTINCT values, with the counts split by the value of another column.

Imagine a table like this:

Customer     City       VIP
-----------  ---------  -----
Acme         Sydney     true
Bally        Sydney     false
Bally        Melbourne  true
Costco       Melbourne  false
David Jones  Sydney     true
David Jones  Melbourne  true
Embraer      Sydney     false
Fastway      Sydney     true
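
In SQL terms, a minimal version of that table might look like this (hypothetical names, matching the data above):

CREATE TABLE customers (
  customer text,
  city     text,
  vip      boolean
);

INSERT INTO customers VALUES
  ('Acme', 'Sydney', true),
  ('Bally', 'Sydney', false),
  ('Bally', 'Melbourne', true),
  ('Costco', 'Melbourne', false),
  ('David Jones', 'Sydney', true),
  ('David Jones', 'Melbourne', true),
  ('Embraer', 'Sydney', false),
  ('Fastway', 'Sydney', true);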

Now, how would you count the number of distinct customers in each city by VIP status?

I’m looking for this result:

City       VIPs  Not VIPs
---------  ----  --------
Melbourne  2     1
Sydney     3     2

You can’t just group by City and use COUNT(DISTINCT customer), because the count also has to be split by VIP status. That’s when I realised that DISTINCT applies to the whole expression that follows it, not just to a column name.

Here’s the code that can be used:

select
  city,
  count(distinct case when     vip then customer else null end) as VIPs,
  count(distinct case when not vip then customer else null end) as Not_VIPs
from customers
group by city

The COUNT automatically ignores the null values, so each expression only counts the distinct customers that are (or are not) VIPs.

That’s a very different way to use DISTINCT.

Posted in Clever Queries | 1 Comment »

Concatenate rows with a PostgreSQL aggregate function

Posted by John on February 8, 2010

Aggregates are a nifty feature of PostgreSQL that let you write your own functions that operate over a number of rows, similar to the built-in MIN, MAX, COUNT and AVG.

Basically, a set of rows (typically generated via GROUP BY) is fed through the aggregate function one row at a time, with an optional final call once the set of rows is complete.

  • MIN works by keeping track of the smallest value passed in
  • COUNT works by incrementing a counter for each row
  • AVG works like SUM, then divides by the row count at the end

I had a need to concatenate strings from multiple rows into a single string, which I wanted to delimit by blank lines, commas or whatever. For example, let’s say I had some rows:

NAME
----
John
David
Mary
Sally

I wanted to be able to call smoosh(name, ', ') and get the result: John, David, Mary, Sally

The SMOOSH function below does just that:

-- SMOOSH: combines two strings with the supplied delimiter
-- eg smoosh('fred', 'joe', ', ') ==> 'fred, joe'
-- The STRICT modifier automatically handles NULL inputs

CREATE FUNCTION smoosh(text, text, text) RETURNS text
  VOLATILE STRICT
AS $$
BEGIN
  -- $1 is the running result, $2 the next value, $3 the delimiter
  IF character_length($1) > 0 THEN
    RETURN $1 || $3 || $2;
  ELSE
    RETURN $2;
  END IF;
END;
$$ LANGUAGE plpgsql;

-- Combine text with supplied delimiter
-- eg smoosh(name, ', ')
CREATE AGGREGATE smoosh(text, text)
(
  SFUNC = smoosh,
  STYPE = text
);

-- TEST HERE
/*
select smoosh(i::text, ', ')
FROM
  (select generate_series(1, 3) as i UNION select null as i UNION select 4 as i) foo
*/

Aggregates take a bit of brain power to interpret, but basically:

  • The Smoosh FUNCTION concatenates two input strings, with a supplied delimiter in the middle
  • The Smoosh AGGREGATE just tells PostgreSQL to call the Smoosh FUNCTION for each row, and that the running state is of type text
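
For example, against the name rows above (assuming a hypothetical people table holding them; note that the result order depends on the order the rows arrive in):

select smoosh(name, ', ') from people;
-- ==> John, David, Mary, Sally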

Neat, eh?!

Posted in Functions | Leave a Comment »

PostgreSQL concat operator doesn’t like NULLs

Posted by John on January 31, 2010

I just found out that the || concatenation operator returns NULL if one of the inputs is NULL.

select '1' || null
==> null

Fortunately, Berend Tober added a user comment to the PostgreSQL documentation with a solution… rewrite the || operator!

CREATE OR REPLACE FUNCTION public.textcat_null(text, text)
  RETURNS text AS
$$
SELECT textcat(COALESCE($1, ''), COALESCE($2, ''));
$$
  LANGUAGE sql VOLATILE;

CREATE OPERATOR ||(
  PROCEDURE = public.textcat_null,
  LEFTARG = text,
  RIGHTARG = text);

select '1' || '2'
==> 12
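
And the case that used to disappear into a NULL:

select '1' || null
==> 1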

Works like a charm, thanks Berend!

Posted in Uncategorized | Leave a Comment »

Comment your SQL queries to track down performance hogs!

Posted by John on December 3, 2009

Here’s a tip…

Start your SQL queries with a comment that uniquely identifies the query or its source. For example, I put lots of SQL queries on wiki pages, so I include the Unique ID of the wiki page:

-- Page 19323
select id from customer
where ...

Then if a SQL query runs wild, I can check the commands running on my server, and the page ID is visible at the start of the query. I can then track down the bad SQL!
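
For example, on PostgreSQL 9.2 and later (where pg_stat_activity exposes pid and query columns), something like this picks the tagged queries out of the crowd:

select pid, now() - query_start as runtime, query
from pg_stat_activity
where query like '-- Page%'
order by runtime desc;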

Posted in Performance | Leave a Comment »

Avoid SQL Divide-by-zero errors by using NULLIF

Posted by John on October 6, 2009

I picked up this tip while reading SQL Hacks.

Let’s say you’re receiving occasional divide by zero errors:

select
  count(name) / count(*) as "Average"
from xyz

This will result in a division-by-zero error if there are no rows returned.

It can be avoided by using the NULLIF function, which returns a null value if the two passed-in expressions are equal:

select
  count(name) / NULLIF( count(*), 0)
from xyz

So, if count(*) = 0, a null value is returned. The resulting division therefore equates to null. While that isn’t a particularly useful result, it’s better than getting an error!
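
And if a zero reads better than a null in the output, one more (optional) wrapper does it:

select
  coalesce( count(name) / NULLIF( count(*), 0), 0) as "Average"
from xyz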

Posted in Clever Queries | Leave a Comment »

Outer Joins are faster than INs

Posted by John on August 21, 2009

I had a situation where I had to count Trial Licenses based on whether the person is an existing Customer.

Think of it as two tables:

  • license table
  • customer table

For each license, I had to look up the customer table, producing counts of how many matched and didn’t match.

As a first pass, I did something like this:

select
  to_char(created, 'YYYY-MM') as month,
  sum(case when owner in (select owner from customer) then 1 else 0 end) as "customer count",
  sum(case when owner in (select owner from customer) then 0 else 1 end) as "non-customer count",
  count(*) as "total"
from license
group by month

Performance was bad because of the repeated calls to the customer table. I could have split it into two queries so that the sub-select went into the WHERE, but I’d have to stitch the results back together since I wanted a count of both customers and non-customers.

Then I tried it via a LEFT OUTER JOIN, something like this:

select
  to_char(created, 'YYYY-MM') as month,
  count(customer.owner) as "customer count",
  count(*) - count(customer.owner) as "non-customer count",
  count(*) as "total"
from license LEFT OUTER JOIN customer USING (owner)
group by month

Performance improved dramatically! The outer join works so much faster than the repeated IN sub-selects, at least in PostgreSQL.
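
If you want to see why on your own data, it's worth comparing the plans; prefixing either query with EXPLAIN ANALYZE shows where the time goes. A minimal sketch, against the same two tables:

explain analyze
select count(customer.owner) as "customer count"
from license LEFT OUTER JOIN customer USING (owner);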

Posted in Joins, Performance | Leave a Comment »

Using logarithms for grouping large numbers

Posted by John on August 20, 2009

I had a bit of fun today. I had to show a summary of Revenue from some companies that I had downloaded from Jigsaw. (Jigsaw is a community site for gathering and sharing Company and Contact information. They’ve got a free API for accessing company information, too!)

The revenue data was fairly random, like this:

5000000
891900000
5000000
150000000
500000
30000000
5000000
1000000000
5000000

Even grouping the data wasn’t much better:

revenue       count  
------------  -----  
5000000       2543   
500000        1636   
30000000      1128   
75000000      432    
150000000     339    
1000000000    338    
350000000     252    
750000000     156    
1200000000    11     
1100000000    10     
1300000000    9      
103630000000  9      
2100000000    4
etc.

So, what would be the best way to present this information?

There are too many possible values to group into units of $1 million. But perhaps grouping into orders of magnitude would work:

select
  to_char(power(10, ceiling(log(revenue / 1000000))), 'FM$999,999,999"m"') as revenue,
  count(*)
from jigsaw
where
  revenue > 1000000
group by ceiling(log(revenue / 1000000))
order by ceiling(log(revenue / 1000000))

This results in:

revenue      count  
-----------  -----  
$1m          1      
$10m         2552   
$100m        1642   
$1,000m      1287   
$10,000m     234    
$100,000m    184    
$1,000,000m  24     

Much nicer!
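
For the curious, here's how a single value lands in its bucket, using 891,900,000 from the sample data:

select power(10, ceiling(log(891900000 / 1000000)));
-- 891900000 / 1000000 = 891 (integer division)
-- log(891) is roughly 2.95, ceiling gives 3
-- power(10, 3) = 1000, i.e. the $1,000m bucket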

Posted in Clever Queries | Leave a Comment »