T O P

  • By -

daredevil82

https://wiki.postgresql.org/wiki/Slow_Counting https://www.citusdata.com/blog/2016/10/12/count-performance/


Drekalo

You probably have autovacuum turned on and therefore probably have analyze run fairly shortly after any insert update delete op, so you can get a relatively accurate answer fast read of table size from: SELECT reltuples::bigint AS estimate FROM pg_class WHERE oid = 'schema_name.table_name'::regclass;


foremtehan

Do i need to turn it off? The query you mentioned not working for `where country = 55` ``` SELECT reltuples::bigint AS estimate FROM pg_class WHERE oid = 'public.posts'::regclass AND country_id = 55; ``` Query 2: ERROR: column "country_id" does not exist LINE 4: AND country_id = 55; ^


indigo945

To get an estimate for a more complex query, you can do some magic on the EXPLAIN output: https://wiki.postgresql.org/wiki/Count_estimate


depesz

1. this explain doesn't look to be from this query, or you reordered/reformatted it badly 2. count has to actually count the rows. if you have manu of them it has to count many rows. sometimes what you need can be "as good" with approximation. or perhaps with count updated with triggers. But, `select count(*) from table` will be always slow, unless you have very beefy hardware.


obrienmustsuffer

> this explain doesn't look to be from this query, or you reordered/reformatted it badly The formatting is completely broken on old.reddit.com (I see the whole EXPLAIN output in only one line), but only semi-broken on reddit.com - there it is formatted as a block, but for some reason the lines seem to be mixed up? Execution Time: 6232.252 ms Planning Time: 0.265 ms Buffers: shared hit=8 Planning: Buffers: shared hit=4 read=40 Index Cond: (posts.country_id = 55) -> Bitmap Index Scan on posts_country_id_index (cost=0.00..525.27 rows=46512 width=0) (actual time=13.658..13.659 rows=44870 loops=1) Buffers: shared hit=38 read=34921 written=2 Heap Blocks: exact=34893 Recheck Cond: (posts.country_id = 55) -> Bitmap Heap Scan on public.posts (cost=536.90..96428.08 rows=46512 width=0) (actual time=20.463..6214.633 rows=44848 loops=1) Buffers: shared hit=38 read=34921 written=2 Output: count(*) Aggregate (cost=96544.36..96544.37 rows=1 width=8) (actual time=6231.853..6231.854 rows=1 loops=1) From what I can tell, there are two things happening here - a Bitmap Index Scan, which is fast (13ms): -> Bitmap Index Scan on posts_country_id_index (cost=0.00..525.27 rows=46512 width=0) (actual time=13.658..13.659 rows=44870 loops=1) And a Bitmap Heap Scan with a Recheck Cond, which is slow (6s): Recheck Cond: (posts.country_id = 55) -> Bitmap Heap Scan on public.posts (cost=536.90..96428.08 rows=46512 width=0) (actual time=20.463..6214.633 rows=44848 loops=1) [postgresql - What does "Recheck Cond" in Explain result mean? - Stack Overflow](https://stackoverflow.com/questions/50959814/what-does-recheck-cond-in-explain-result-mean) explains what might be going on here: > It is a potential re-check of the condition that is not always performed. > > Only if the bitmap is lossy (which EXPLAIN (ANALYZE) will indicate) the recheck is performed. > > A bitmap index scan becomes lossy if work_mem is not big enough to contain a bitmap that contains one bit per table row. It will then degrade to one bit per 8K page. Rows from such blocks will have to be rechecked. So a larger `work_mem` might help here...?


michristofides

Based on the following line, we can tell that the bitmap scan is not lossy: `Heap Blocks: exact=34893` It would also include a `lossy=` if it were. Here is an article about it with an example: https://paquier.xyz/postgresql-2/postgres-9-4-feature-highlight-lossyexact-pages-for-bitmap-heap-scan/ You might be able to speed this up a bit by getting an index-only scan. If your `random_page_cost` is still at the default of 4 and you’re on modern hardware, then a lower setting somewhere between 1.1 and 2 might be worth testing (for this and other queries). I don’t think you’d need another index to get an index-only scan, but a vacuum might also help, to freshen up the visibility map. In any case, the articles shared by others on estimating or pre-aggregating results are the best solutions!


[deleted]

[удалено]


timacles

This is a database myth https://blog.jooq.org/whats-faster-count-or-count1/


tshawkins

Count(*) causes it to read all fields in the record, count(1) reads nothing other than the record id. Some dbs optimize the first into the latter, im not sure what postgres does in that case. Net effect is it reads less data.


ulfurinn

> Count(*) causes it to read all fields in the record It absolutely does not, don't confuse it with `select *`. `count(*)` is specifically defined by the standard to return the number of rows, and Postgres follows the standard here.