Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(image-optimizer): use previously cached image cache entry if upstream image is identical #67257

Open
wants to merge 2 commits into
base: canary
Choose a base branch
from

Conversation

kahlstrm
Copy link

@kahlstrm kahlstrm commented Jun 27, 2024

What?

  • Reduces redundant image optimizations if the upstream image is unchanged, there is no need to run the optimization again as you can just use the previously optimized image.
  • Changes getHash-function to return base64url instead of custom variant of standard base64. This can be removed if deemed not necessary.
  • Moves all image ETag calculation logic (and introduce getImageEtag-function to do it) to happen inside image-optimizer for better consistency.

Why?

Currently when an image is requested and it becomes stale, the server will trigger a full image optimization for that image, using a significant spike in CPU-usage for routes that have multiple images (e.g. an image carousel).

How?

By calculating and storing the upstream etag in the cache entry, we can utilize the previously cached entry to check if the upstream image has remained the same, making it possible to reuse the previously cached entry again, as the image optimization would produce the same results anyways.

@ijjk
Copy link
Member

ijjk commented Jun 27, 2024

Allow CI Workflow Run

  • approve CI run for commit: f406464

Note: this should only be enabled once the PR is ready to go and can only be enabled by a maintainer

isStatic: boolean,
xCache: XCacheHeader,
imagesConfig: ImageConfigComplete,
maxAge: number,
isDev: boolean
) {
const contentType = getContentType(extension)
const etag = getHash([buffer])
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are storing the etag already, shouldn't we use that instead of recalculating the etag? This can be reverted to use getImageTag.

@kahlstrm kahlstrm force-pushed the feat/use-previous-cache-entry-if-same-upstream-image branch from bda2f1a to a1f627d Compare June 29, 2024 17:07
@kahlstrm
Copy link
Author

kahlstrm commented Jul 3, 2024

@styfle Hi, I hate to be that guy who randomly pings maintainers, but could you look at this to check if this is a wanted feature? For instance, our site becomes slow and unresponsive after a stale cache.

We have minimumCacheTTL configured for an hour, but many people might not know about it, and IMO, this would benefit people in general.

@ijjk
Copy link
Member

ijjk commented Jul 3, 2024

Failing test suites

Commit: a1f627d

TURBOPACK=1 pnpm test-dev test/development/acceptance/ReactRefreshRequire.test.ts (turbopack)

  • ReactRefreshRequire > re-runs accepted modules
  • ReactRefreshRequire > propagates a hot update to closest accepted module
  • ReactRefreshRequire > propagates a module that stops accepting in next version
Expand output

● ReactRefreshRequire › re-runs accepted modules

expect(received).toEqual(expected) // deep equality

Expected: ["init FooV1", "init BarV1"]
Received: undefined

  33 |       `require('./foo'); export default function Noop() { return null; };`
  34 |     )
> 35 |     expect(await session.evaluate(() => (window as any).log)).toEqual([
     |                                                               ^
  36 |       'init FooV1',
  37 |       'init BarV1',
  38 |     ])

  at Object.toEqual (development/acceptance/ReactRefreshRequire.test.ts:35:63)

● ReactRefreshRequire › propagates a hot update to closest accepted module

expect(received).toEqual(expected) // deep equality

- Expected  - 0
+ Received  + 2

  Array [
+   "init FooV1",
+   "init FooV1",
    "init BarV4",
  ]

  161 |       `
  162 |     )
> 163 |     expect(await session.evaluate(() => (window as any).log)).toEqual([
      |                                                               ^
  164 |       'init BarV4',
  165 |     ])
  166 |

  at Object.toEqual (development/acceptance/ReactRefreshRequire.test.ts:163:63)

● ReactRefreshRequire › propagates a module that stops accepting in next version

expect(received).toEqual(expected) // deep equality

- Expected  - 0
+ Received  + 1

  Array [
+   "init FooV1",
    "init BarV2",
    "init FooV1",
  ]

  442 |     // Since the export list changed, we have to re-run both the parent
  443 |     // and the child.
> 444 |     expect(await session.evaluate(() => (window as any).log)).toEqual([
      |                                                               ^
  445 |       'init BarV2',
  446 |       'init FooV1',
  447 |     ])

  at Object.toEqual (development/acceptance/ReactRefreshRequire.test.ts:444:63)

Read more about building and testing Next.js in contributing.md.

pnpm test-dev test/e2e/app-dir/params-hooks-compat/index.test.ts

  • app-dir - params hooks compat > should only access search params with useSearchParams
Expand output

● app-dir - params hooks compat › should only access search params with useSearchParams

expect(received).toEqual(expected) // deep equality

- Expected  - 3
+ Received  + 1

- Object {
-   "q": "pages",
- }
+ Object {}

  17 |
  18 |     expect(appSearchparamsJSON).toEqual({ q: 'app' })
> 19 |     expect(pagesSearchparamsJSON).toEqual({ q: 'pages' })
     |                                   ^
  20 |   })
  21 |
  22 |   it('should only access path params with useParams', async () => {

  at Object.toEqual (e2e/app-dir/params-hooks-compat/index.test.ts:19:35)

Read more about building and testing Next.js in contributing.md.

'"url" parameter is valid but image type is not allowed'
)
}
if (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this svg logic moved?

Copy link
Author

@kahlstrm kahlstrm Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this a leftover, or don't remember anymore why I moved it, seems like there's no point in doing that. Reverted this now

return (
previousCacheEntry?.value?.kind === 'IMAGE' &&
// If the cached image is optimized
previousCacheEntry.value.upstreamEtag !== previousCacheEntry.value.etag &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why upstreamEtag should not be equal to etag?

This seems like it will always return true when not equal and therefore it will never actually revalidate, which seems wrong to me.

Copy link
Author

@kahlstrm kahlstrm Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this PR tries to do is prevent hitting this specific path, with optimizeImage is the expensive operation.

upstreamEtag represents the original source image's etag (currently calculated via the same function getImageEtag), whereas the cache entry's etag represents the resulting image's etag, i.e. in the case image optimization is done, the etag will be the hash of that optimized image instead of the upstream image.

This condition exists to cause a revalidation on cases where the optimization did not happen previously (in this case we fallback to using upstreamBuffer and upstreamEtag for the image), so this will fail early on those cases and cause a new optimization attempt. Although now thinking back at it the likelyhood of the image optimization ending up with the same outcome is quite likely anyways.

When this condition is true, we'll check the next condition, comparing the previous upstream image's etag against the current upstream image's etag to determine if we should revalidate or not.

Copy link
Member

@styfle styfle Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization not happening could be because of a fallback (such as an error was thrown) or it could be because the image was vector form (svg doesnt need to be optimized).

Either way, we still need to revalidate those images because the upstream response could have changed since the previous request. So I suggest removing line 463 and that way the following line (465) can actually compare the new upstream etag.

Copy link
Author

@kahlstrm kahlstrm Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not quite following, as the function is shouldUsePreviouslyCachedEntry, all of the conditions inside it need to return true for the revalidation not to occur.

This is how I see it:

  • If this comparison is false, we will revalidate the image
  • If this comparison is true, we will do the etag comparison (line 465) to determine if we want to revalidate the image.

By revalidate, I mean that the previous entry is not used and we'll fallback to previous behaviour

Am I misunderstanding something? What do you mean by revalidation? I'm fine with removing this line, but don't fully understand the reasoning 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what is the scenario where the line is needed? Is there a test for this?

I can't see your link anymore because you force pushed.

(note: you don't need to force push because we squash on merging)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see now. I read through this again and I see that the line is handling the case when svg or animated. I think the comment should mention this.

// If the cached image is optimized
previousCacheEntry.value.upstreamEtag !== previousCacheEntry.value.etag &&
// and the upstream etag is the same as the previous cache entry's
getImageEtag(upstreamImage.buffer) === previousCacheEntry.value.upstreamEtag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could improve performance here by using the etag from the upstream response header and fallback to the hash if its missing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of using the upstream response header at first, but didn't like how the code turned out, or because of the weak validation syntax (it can start with W/, the etag value would need to be base64-encoded correctly to prevent any errors.

What I had previously was alter ImageUpstream to include etag and then get it from the response and have the hash as fallback. Didn't quite like the implementation then, but writing an ETag extractor helper function would probably make it a bit nicer and make it possible to write tests for it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this now along with tests, not 100% happy with the tests but they should still be sufficient I think.

@kahlstrm kahlstrm force-pushed the feat/use-previous-cache-entry-if-same-upstream-image branch from a1f627d to 39d6d5a Compare July 3, 2024 23:51
@kahlstrm kahlstrm force-pushed the feat/use-previous-cache-entry-if-same-upstream-image branch 3 times, most recently from 54d55bd to 3c09907 Compare July 21, 2024 00:00
@kahlstrm kahlstrm requested a review from styfle July 21, 2024 00:01
@kahlstrm kahlstrm force-pushed the feat/use-previous-cache-entry-if-same-upstream-image branch from 3c09907 to fba6061 Compare August 6, 2024 00:14
it('should return base64url encoded etag if etag is provided', () => {
const etag = 'sample-etag'
const result = extractEtag(etag, Buffer.from(''))
expect(result).toMatch(/^([a-z]|[A-Z]|[0-9]|-|_)+$/)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use exact string comparison instead of regex here?

it('should not return weak etag identifier if etag is provided', () => {
const etag = 'W/"sample-etag"'
const result = extractEtag(etag, Buffer.from(''))
expect(result).not.toMatch(/^W\//)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can we use exact string comparison instead of regex here?

@kahlstrm kahlstrm force-pushed the feat/use-previous-cache-entry-if-same-upstream-image branch 2 times, most recently from f2acee3 to cd61981 Compare August 24, 2024 07:39
@kahlstrm kahlstrm requested a review from styfle August 24, 2024 16:34
@kahlstrm kahlstrm force-pushed the feat/use-previous-cache-entry-if-same-upstream-image branch from cd61981 to f406464 Compare August 24, 2024 16:35
etag: string | null | undefined,
imageBuffer: Buffer
) {
if (!!etag) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!!etag) {
if (etag) {

Comment on lines +16 to +17
expect(result).not.toContain('/')
expect(result).toMatch('Vy8ic2FtcGxlLWV0YWci')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expect(result).not.toContain('/')
expect(result).toMatch('Vy8ic2FtcGxlLWV0YWci')
expect(result).toEqual('Vy8ic2FtcGxlLWV0YWci')

it('should return base64url encoded etag if etag is provided', () => {
const etag = 'sample-etag'
const result = extractEtag(etag, Buffer.from(''))
expect(result).toMatch('c2FtcGxlLWV0YWc')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expect(result).toMatch('c2FtcGxlLWV0YWc')
expect(result).toEqual('c2FtcGxlLWV0YWc')

const maxAge = getMaxAge(imageUpstream.cacheControl)

if (shouldUsePreviouslyCachedEntry(imageUpstream, previousCacheEntry)) {
// Return the cached optimized image
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Return the cached optimized image

You can remove this comment.

Alternatively, you can change shouldUsePreviouslyCachedEntry() from returning a boolean to returning the result and naming it something like getPreviouslyCachedOrNull()

// Return the cached optimized image
return {
buffer: previousCacheEntry.value.buffer,
contentType: paramsResult.mimeType,
Copy link
Member

@styfle styfle Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the contentType is correct here?

We have a lot of logic to determine the contentType below so perhaps we should move the cached entry check below this:

let contentType: string
if (mimeType) {
contentType = mimeType
} else if (
upstreamType?.startsWith('image/') &&
getExtension(upstreamType) &&
upstreamType !== WEBP &&
upstreamType !== AVIF
) {
contentType = upstreamType
} else {
contentType = JPEG
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants