-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stability] Survive vips sigabrt #75
Conversation
@@ -0,0 +1,90 @@ | |||
package main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a binary to stress test the lib in the vectorDB path
@@ -29,11 +29,9 @@ func main() { | |||
sourceConfig.Rank = 0 | |||
sourceConfig.WorldSize = 1 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a binary to stress test the lib in the filesystem path
@@ -161,12 +167,8 @@ type DatagoClient struct { | |||
|
|||
// GetClient is a constructor for the DatagoClient, given a JSON configuration string | |||
func GetClient(config DatagoConfig) *DatagoClient { | |||
// Make sure that the GC is run more often than usual | |||
// VIPS will allocate a lot of memory and we want to make sure that it's released as soon as possible | |||
os.Setenv("GOGC", "10") // Default is 100, we're running it when heap is 10% larger than the last GC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was ugly, not required when we properly release the buffers (and it doesn't crash on us)
// Initialize the vips library | ||
err := os.Setenv("VIPS_DISC_THRESHOLD", "5g") | ||
err := os.Setenv("VIPS_DISC_THRESHOLD", "10g") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in other terms: never go do disk. Could be changed depending on where this runs, 2TB servers for Photoroom so ram is fine
} | ||
|
||
err = img.AutoRotate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not required anymore, can be done at load time, a little cleaner
pkg/serdes.go
Outdated
// Decode the image payload using vips | ||
img, err := vips.NewImageFromBuffer(buffer) | ||
// Decode the image payload using vips, using bulletproof settings | ||
importParams := vips.NewImportParams() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit and visible load settings, didn't make anything more stable but still a good thing probably
// If the image is 4 channels, we need to drop the alpha channel | ||
if img.Bands() == 4 { | ||
err = img.Flatten(&vips.Color{R: 255, G: 255, B: 255}) // Flatten with white background | ||
if err != nil { | ||
fmt.Println("Error flattening image:", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code style, reporting these messages directly in the error, nicer
return nil, -1., fmt.Errorf("error converting to sRGB: %w", err) | ||
} | ||
} else { | ||
// // FIXME: maybe that we could recover these still. By default throws an error, sRGB and PNG not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaner conversion, vips would complain that we were doing something illegal for these images so skipping it now
// Optionally crop and resize the image on the fly. Save the aspect ratio in the process for future use | ||
originalWidth, originalHeight := img.Width(), img.Height() | ||
|
||
if transform != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the crux of the stability fix when direct-to-cluster, the subsequent vips call can trigger a sigabrt and I couldn't find any way to catch this case beforehand.
Flipping this around, this can happen but we recover from it. Guess is that some network transmission was broken and we didn't catch it as we should have
if err != nil { | ||
return nil, -1., err | ||
} | ||
imgBytes, _, err = img.ExportPng(&vips.PngExportParams{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as in the loading case, be explicit above the params used here instead of using ones which would depend on the library defaults
// Define bit depth de facto, not exposed in the vips interface | ||
bitDepth = len(imgBytes) / (width * height * channels) * 8 // 8 bits per byte | ||
} | ||
|
||
defer img.Close() // release vips buffers when done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this helps a lot in high throuhgput situations, lowers the RAM use and just generally makes sense, we're done with this vips intermediate at this point
client_config.SourceConfig = source_config | ||
client = datago.GetClient(client_config) | ||
|
||
client_config = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previous implementation was buggy, in that the update to the config object in the python space wouldn't be reflected in go. This json/dict approach is readable and not broken
012c695
to
e8b826a
Compare
can still crash eventually |
be2d5e6
to
678303c
Compare
- now reproed the issue, looks like it's around the autorotate part - fixing this hopefully
678303c
to
e957b2e
Compare
can still be made to hard crash, when trying long enough. |
reviving this for testing |
494df94
to
d04f7ff
Compare
For some payloads vips will sigabrt on a function call, without an obvious way to catch this beforehand (initial loadbuffer doesn't error out, image characteristics have nothing specific). Happening very often on a given cluster, guess is that network could be less reliable and broken packets would not be caught somehow.
Adding a relatively ugly but seemingly reliable defer(recover) pattern which will skip this payload for now, testing overnight. Not sure how to unit test this
Note that this PR enables img.Close()-ing the vips image when done, which lowers the RAM use quite dramatically (not waiting for the go GC)