6

The author of Bitmap Scan described the difference between Bitmap Heap Scan and Index Scan:

A plain indexscan fetches one tuple-pointer at a time from the index, and immediately visits that tuple in the table. A bitmap scan fetches all the tuple-pointers from the index in one go, sorts them using an in-memory "bitmap" data structure, and then visits the table tuples in physical tuple-location order. The bitmap scan improves locality of reference to the table at the cost of more bookkeeping overhead to manage the "bitmap" data structure --- and at the cost that the data is no longer retrieved in index order, which doesn't matter for your query but would matter if you said ORDER BY.

Questions:

  1. Why does it sort the fetched tuple-pointers again, when the index is already sorted?

  2. How does it sort with bitmap? I know what bitmap is, but I don't understand how it can be used for sorting.

  3. Why is it faster than Index Scan when fetching a moderately large percentage of the table? On the contrary, it seems to add quite a bit of computing to the process.

2 Answers 2

6

The backbone of Postgres storage is made up of data pages of 8 kilobytes in a typical installation. Each data page typically holds many tuples. Read the details of physical storage in the manual.

The "bitmap" in a bitmap scan is a way to collect tuple pointers in buckets of data pages. Index sort order is necessarily lost in this process in favor of physical sort order. In "lossy mode" (which only occurs if the result is so huge or workmem so small that even the tiny bitmap won't fit) only block numbers are kept and corresponding tuple indexes discarded.

After that, each data page is only visited once from storage to extract (possibly) multiple tuples and in physical sequence, which also matters for some types of storage. In lossy mode, tuples from each identified page have to be filtered by rechecking the index condition; else tuples can be retrieved directly using collected tuple indexes.

In an index scan, each page may have to be visited multiple times if multiple tuples end up to be stored in the same data page. The actual process is even more sophisticated. Related:

To your questions:

  1. The sorting of the index is lost due to collecting hits and reading them out data page by data page.

  2. Hence, the result has to be sorted again, in an added sort step - if the sort order is required (by ORDER BY, for instance).

  3. The number of data pages that have to be read is the most prominent factor for overall performance. Bitmap index scans reduce that number to a minimum. With faster storage, the benefit of bitmap index scans gets smaller. That's why accurate cost settings are crucial for the query planner to make good decisions.

4
  • 1) What do you mean by "collecting hits"? 2) From the description by the author, it sorts the tuple-pointers regardless of the requirement of sort order(Notice the caveat at the end of the quote). 3) The benefits you mentioned apply to lossy mode. What about lossless mode where individual tuples instead of whole data page are visited?
    – netok
    Apr 12, 2019 at 22:58
  • 1) "Collecting hits" = add ctid (or just block number in "lossy mode") of qualifying index tuples to result bitmap. 2) You seem to be misreading that part of Tom Lane's post. Index sort order is always lost in bitmap index scans. 3) Postgres always has to read whole data pages from storage, that's the atomic unit. It can then address a tuple within the data page directly if the tuple index is also known (not just the block number as in "lossy mode"). The benefits of visiting each data page once is there in any case. I clarified some more. Please follow the added links for more details. Apr 12, 2019 at 23:17
  • Thanks for the link, it is very useful. 1) Can you elaborate how it causes the lost of sorting? 2) If the query doesn't require any order in the result, will there be any sorting operation in bitmap scan? If yes, what is it sorted by?
    – netok
    Apr 13, 2019 at 14:28
  • 1. I think the original quote is already very clear on that and I elaborated some more. 2.If the query does not require any particular sorting, the only "sorting" is by physical data pages, so you get results in arbitrary order - which may or may not still in insert order, to some extent. Apr 13, 2019 at 22:20
0

A bitmap is a data structure used by Postgres to temporarily store the row pointers it's found on an index scan.

If Postgres doesn't use a bitmap, then for every row pointer it finds in the index, it needs to immediately go to fetch the row from the table. This can be slow if rows are in different data pages*. Using a bitmap, Postgres stores the row pointers in memory grouped by data page, and then it can fetch the rows faster going page by page. The downside is that it needs additional memory to maintain the bitmap. In fact, if the index scan returns too many row pointers, creating the exact bitmap might be impractical (too much space). In those situations Postgres can use a "lossy bitmap", that only keeps track of the data pages and not the specific row pointers. At the time of fetching the actual rows using a lossy bitmap, Postgres needs to read the entire data pages and recheck the conditions to find the matching rows.

To clarify the OP questions, bitmaps do not sort row pointers based on the ORDER BY criteria and the original order in which row pointers were found in the index is lost because the bitmap groups them by data page, that's why an extra step for sorting is needed after.


* This statement assumes that reading data that is physically distant takes longer than reading data that is physically closer. This might not apply to all modern storage devices.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.