
Book query optimizations #3767

Merged (3 commits) Jan 1, 2025
Conversation

@mikiher (Contributor) commented Jan 1, 2025

Brief summary

This adds indices that significantly improve library database query time when sorting by size or duration.

Which issue is fixed?

There's no issue. I encountered this while working on PR #3726.

In-depth Description

The following query in libraryItemsBookFilters.js is where most of the book fetch time is spent, and its performance depends heavily on sortOrder. When the user scrolls quickly through the library page, this can choke the server if several page requests are handled concurrently.

    const { rows: books, count } = await Database.bookModel.findAndCountAll({
      where: bookWhere,
      distinct: true,
      attributes: bookAttributes,
      replacements,
      include: [
        {
          model: Database.libraryItemModel,
          required: true,
          where: libraryItemWhere,
          include: libraryItemIncludes
        },
        seriesInclude,
        authorInclude,
        ...bookIncludes
      ],
      order: sortOrder,
      subQuery: false,
      limit: limit || null,
      offset
    })

I fixed a couple of the easy cases by introducing indices on size and duration.
This leads to a significant improvement in query performance when paging through the library sorted by size or duration.

I only dealt with size and duration for now because they were the easiest. Other sort orders are either already covered by existing indices, or they're trickier to optimize because of the existing database schema (e.g. sorting by author). Maybe I'll try to improve them in a future PR.

How have you tested this?

I tested by scrolling down the library page, issuing 35 consecutive page requests with no parallel fetching.

Sorting by size

Current:

Histogram {
  min: 78,
  max: 1048,
  mean: 588.3714285714286,
  exceeds: 0,
  stddev: 305.3454330252669,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 78,
    50 => 549,
    75 => 870,
    87.5 => 983,
    93.75 => 1019,
    96.875 => 1023,
    98.4375 => 1048,
    100 => 1048
  }
}

After adding index on (library_id, media_type, size) on table libraryItems:

Histogram {
  min: 37,
  max: 70,
  mean: 46.82857142857143,
  exceeds: 0,
  stddev: 5.940109254818776,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 37,
    50 => 46,
    75 => 49,
    87.5 => 53,
    93.75 => 55,
    96.875 => 55,
    98.4375 => 70,
    100 => 70
  }
}

Sorting by duration

Current:

histogram: Histogram {
  min: 72,
  max: 258,
  mean: 162.82857142857142,
  exceeds: 0,
  stddev: 46.34497088714818,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 72,
    50 => 154,
    75 => 182,
    87.5 => 238,
    93.75 => 254,
    96.875 => 254,
    98.4375 => 258,
    100 => 258
  }
}

After adding index on duration on table books:

Histogram {
  min: 25,
  max: 61,
  mean: 30.542857142857144,
  exceeds: 0,
  stddev: 6.030128438067621,
  count: 35,
  percentiles: SafeMap(8) [Map] {
    0 => 25,
    50 => 29,
    75 => 32,
    87.5 => 34,
    93.75 => 37,
    96.875 => 37,
    98.4375 => 61,
    100 => 61
  }
}

As you can see, the indices improve performance significantly.

One thing to note: even with the added index, query performance slowly degrades as the offset increases.
This is because the larger the offset, the more records must be examined to satisfy the query (a known issue with offset-based pagination). There's a technique to deal with this (keyset pagination), but it is trickier to implement. If you think it's worth the effort, I'll look into it.
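For reference, keyset pagination replaces `OFFSET` with a predicate on the last row of the previous page: instead of skipping N rows, the query asks for rows whose (sort key, id) tuple comes after the last one seen. A minimal, self-contained sketch of the idea over an in-memory list (in the real query this predicate would become a WHERE clause such as `(size, id) > (:lastSize, :lastId)`; the function and field names here are illustrative):

```javascript
// Keyset pagination sketch: `rows` must already be sorted by (size, id).
// `lastKey` is the (size, id) of the final row on the previous page.
function keysetPage(rows, lastKey, limit) {
  const after = lastKey
    ? rows.filter(r => r.size > lastKey.size ||
                       (r.size === lastKey.size && r.id > lastKey.id))
    : rows // first page: no predicate
  return after.slice(0, limit)
}

const rows = [
  { id: 1, size: 10 }, { id: 2, size: 10 },
  { id: 3, size: 20 }, { id: 4, size: 30 }
]
const page1 = keysetPage(rows, null, 2)        // rows with id 1 and 2
const last = page1[page1.length - 1]
const page2 = keysetPage(rows, last, 2)        // rows with id 3 and 4
```

With an index on (size, id), the database can seek directly to the predicate boundary, so page cost stays constant regardless of how deep the user has scrolled. The trade-off is that clients must carry a cursor instead of a page number.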

@mikiher mikiher marked this pull request as ready for review January 1, 2025 07:26
@advplyr (Owner) commented Jan 1, 2025

How are you generating that histogram data?

I've been working on cleaning out the old data model objects recently and the only remaining big ones are LibraryItem, Book, Podcast and PlaybackSession.
For a while now I've been looking forward to optimizing the API endpoints once we're no longer stuck having to convert everything to the old library item object.
That could potentially be a nice performance improvement also.

Thanks!

@advplyr advplyr merged commit 8c4d0c5 into advplyr:master Jan 1, 2025
5 checks passed
@mikiher (Contributor, Author) commented Jan 1, 2025

perf_hooks.createHistogram

@mikiher mikiher deleted the book-query-optimizations branch January 5, 2025 04:36