Recently, a colleague asked me a question about a PostgreSQL construction for which they found that it would silently ignore part of SQL, and whether it's truly doing that.
This is a mini setup that shows the issue:
create table test( id int primary key, f1 text );
insert into test select id, chr(ascii('a')+id) from generate_series(25,0,-1) id;;
This creates a table 'test' with two fields, and fills the fields with a number and a character from 'z' to 'a'.
If I execute the following SQL:
select * from test order by f1 limit 1;
I get:
postgres=# select * from test order by f1 limit 1;
id | f1
----+----
0 | a
(1 row)
Here's the issue:
prepare prep(text, int) as select * from test order by $1 limit $2;
And then execute the prepared statement:
postgres=# execute prep('f1',1);
id | f1
----+----
25 | z
(1 row)
These are two different results for seemingly the same statement. I actually didn't need to do anything, they also found the issue:
For the statement, the execution plan is:
postgres=# explain select * from test order by f1 limit 1;
QUERY PLAN
-----------------------------------------------------------------
Limit (cost=1.39..1.39 rows=1 width=6)
-> Sort (cost=1.39..1.46 rows=26 width=6)
Sort Key: f1
-> Seq Scan on test (cost=0.00..1.26 rows=26 width=6)
(4 rows)
- The select performs an unfiltered scan of a field for which is no index, and thus it performs a seq scan.
- Because the seq scan does not hold ordering guarantees that an ascending or descending index can, it has to perform a sort for the order by.
- The limit rowsource then takes one value because of limit 1.
For the prepared statement the execution plan is:
postgres=# explain execute prep('f1',1);
QUERY PLAN
------------------------------------------------------------
Limit (cost=0.00..0.15 rows=3 width=38)
-> Seq Scan on test (cost=0.00..1.26 rows=26 width=38)
(2 rows)
- The select performs an unfiltered scan of a field for which there is no index, and thus it performs a seq scan.
- Limit does take the first value.
- The sort step is missing, and thus the first value that it found from the seq scan is shown, which is 'z', but could be anything, because the seq scan does not guarantee ordering.
I also found some sources on the internet explaining that this is because of the bind parameters, probably the best one being from the PostgreSQL mail list: https://www.postgresql.org/message-id/1421875206968-5834948.post@n5.nabble.com
The essence of that explanation is that for the order by the parameter is used as an identifier, whilst parameters are values, and that therefore this doesn't work. However, what I am puzzled about is why this succeeds in creating a prepared statement, and why it all works, but silently skips the order by.
I hope to hear any opinions or reasons for ignoring it, or from things missing from this small investigation.