
Book query optimizations #3767

Merged (3 commits) Jan 1, 2025
Conversation

@mikiher (Contributor) commented Jan 1, 2025

Brief summary

This adds indices that significantly improve library database query time when sorting by size or duration.

Which issue is fixed?

There's no issue. I encountered this while working on PR #3726.

In-depth Description

The following query in libraryItemsBookFilters.js is where most of the book fetch time is spent, and its performance depends heavily on sortOrder. When the user scrolls quickly through the library page, this can choke the server if several page requests are handled concurrently.

    const { rows: books, count } = await Database.bookModel.findAndCountAll({
      where: bookWhere,
      distinct: true,
      attributes: bookAttributes,
      replacements,
      include: [
        {
          model: Database.libraryItemModel,
          required: true,
          where: libraryItemWhere,
          include: libraryItemIncludes
        },
        seriesInclude,
        authorInclude,
        ...bookIncludes
      ],
      order: sortOrder,
      subQuery: false,
      limit: limit || null,
      offset
    })

I fixed a couple of the easy cases by introducing indices on size and duration.
This leads to a significant improvement in query performance when paging through the library sorted by size or duration.

I only dealt with size and duration for now because they were the easiest. Other sort orders are either already covered by existing indices, or they're trickier to optimize because of the existing database schema (e.g. sorting by author). Maybe I'll try to improve them in a future PR.

How have you tested this?

I tested by scrolling down the library page, issuing 35 consecutive page requests with no parallel fetching.

Sorting by size

Current:

Histogram {
  min: 78,
  max: 1048,
  mean: 588.3714285714286,
  exceeds: 0,
  stddev: 305.3454330252669,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 78,
    50 => 549,
    75 => 870,
    87.5 => 983,
    93.75 => 1019,
    96.875 => 1023,
    98.4375 => 1048,
    100 => 1048
  }
}

After adding index on (library_id, media_type, size) on table libraryItems:

Histogram {
  min: 37,
  max: 70,
  mean: 46.82857142857143,
  exceeds: 0,
  stddev: 5.940109254818776,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 37,
    50 => 46,
    75 => 49,
    87.5 => 53,
    93.75 => 55,
    96.875 => 55,
    98.4375 => 70,
    100 => 70
  }
}

Sorting by duration

Current:

histogram: Histogram {
  min: 72,
  max: 258,
  mean: 162.82857142857142,
  exceeds: 0,
  stddev: 46.34497088714818,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 72,
    50 => 154,
    75 => 182,
    87.5 => 238,
    93.75 => 254,
    96.875 => 254,
    98.4375 => 258,
    100 => 258
  }
}

After adding index on duration on table books:

Histogram {
  min: 25,
  max: 61,
  mean: 30.542857142857144,
  exceeds: 0,
  stddev: 6.030128438067621,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 25,
    50 => 29,
    75 => 32,
    87.5 => 34,
    93.75 => 37,
    96.875 => 37,
    98.4375 => 61,
    100 => 61
  }
}

As you can see, the indices improve performance significantly.

One thing to note: even with the added index, query performance slowly degrades as the offset increases.
This is because the larger the offset, the more records must be examined to satisfy the query (a known issue with offset-based pagination). There's a technique to deal with this (keyset pagination), but it is trickier to implement. If you think it's worth the effort, I'll look into it.
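For reference, keyset pagination replaces `OFFSET` with a predicate on the last row of the previous page: instead of skipping N rows, the query asks for rows whose (sort key, id) tuple comes after the last one seen. A minimal, self-contained sketch of the idea over an in-memory list (in the real query this predicate would become a WHERE clause such as `(size, id) > (:lastSize, :lastId)`; the function and field names here are illustrative):

```javascript
// Keyset pagination sketch: `rows` must already be sorted by (size, id).
// `lastKey` is the (size, id) of the final row on the previous page.
function keysetPage(rows, lastKey, limit) {
  const after = lastKey
    ? rows.filter(r => r.size > lastKey.size ||
                       (r.size === lastKey.size && r.id > lastKey.id))
    : rows // first page: no predicate
  return after.slice(0, limit)
}

const rows = [
  { id: 1, size: 10 }, { id: 2, size: 10 },
  { id: 3, size: 20 }, { id: 4, size: 30 }
]
const page1 = keysetPage(rows, null, 2)        // rows with id 1 and 2
const last = page1[page1.length - 1]
const page2 = keysetPage(rows, last, 2)        // rows with id 3 and 4
```

With an index on (size, id), the database can seek directly to the predicate boundary, so page cost stays constant regardless of how deep the user has scrolled. The trade-off is that clients must carry a cursor instead of a page number.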

@mikiher mikiher marked this pull request as ready for review January 1, 2025 07:26
@advplyr (Owner) commented Jan 1, 2025

How are you generating that histogram data?

I've been working on cleaning out the old data model objects recently and the only remaining big ones are LibraryItem, Book, Podcast and PlaybackSession.
For a while now I've been looking forward to optimizing the API endpoints once we're no longer stuck having to convert everything to the old library item object.
That could potentially be a nice performance improvement also.

Thanks!

@advplyr advplyr merged commit 8c4d0c5 into advplyr:master Jan 1, 2025
5 checks passed
@mikiher (Contributor, Author) commented Jan 1, 2025

perf_hooks.createHistogram

@mikiher mikiher deleted the book-query-optimizations branch January 5, 2025 04:36