Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add histogram calculation for RasterBand #468

Merged
merged 7 commits into from
Dec 2, 2023

Conversation

spadarian
Copy link
Contributor

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

This PR exposes 2 GDAL methods to calculate band histograms (default and user-defined).

src/raster/rasterband.rs Outdated Show resolved Hide resolved
self.max
}
/// Histogram values for each bucket
pub fn histogram(&self) -> &[i32] {
Copy link
Contributor

@metasim metasim Nov 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider naming buckets or counts? With this name, you'll end up with things like this:

let rb: Rasterband<'_> = ...;
let histogram = rb.get_histogram(...);

for count in histogram.histogram() { // <--- awkward.
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also found it weird but I was just being consistent with GDAL naming. Happy to change it if we are not aiming for that.

pub struct Histogram {
min: f64,
max: f64,
n_buckets: i32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this usize? Sometimes GDAL uses signed values when it doesn't make semantic sense, and there's precedence for rectifying this in Rust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't contributed for a while and from memory we were using isize. I do agree that we could rectify that.

How far do we want to take it? I could also use a NonZeroUsize, which could be also useful when selecting a band from a dataset... (@lnicola)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol... yeah, I've argued for NonZeroUsize, but creating them is really annoying/verbose.

When I see isize or i32 I immediately ask myself "What does a negative value mean?". If it has meaning, we should document it. If it doesn't, then I'd at least try in the bindings to coerce it into an unsigned type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wish constructing a NonZeroUsize from a constant/literal was possible without unsafe or bloating the binary. 😔

Comment on lines 962 to 963
histogram: *mut i32,
histogram_vec: Option<Vec<i32>>,
Copy link
Contributor

@metasim metasim Nov 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this storing the counts twice? Who owns histogram?

It would really help to document why this is neccessary. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry. I should document that. I used that double storage mechanisms because with GDALGetDefaultHistogram C owns histogram and with GDALGetRasterHistogram Rust does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a restating of my idea from before. Does this capture the semantics? I should probably just compile and step through the tests, but I'm admittedly confused if I've gotten the borrowed vs owned cases switched...

pub struct Histogram {
    min: f64,
    max: f64,
    n_buckets: i32,
    counts: HistCounts
}

enum HistCounts {
   Borrowed(*mut libc::c_void),
   Owned(Vec<i32>)
}

...

impl Drop for HistCounts {
    fn drop(&mut self) {
        match self {
            Self::Borrowed(p) => ???;
            Self::Owned(v) => ???;
        }
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries mate. I'm pushing those changes now.

src/raster/rasterband.rs Outdated Show resolved Hide resolved
/// # Arguments
///
/// * `force` - If `true`, force the computation. If `false` and no default histogram is available, the method will return None
pub fn get_default_histogram(&self, force: bool) -> Result<Option<Histogram>> {
Copy link
Contributor

@metasim metasim Nov 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For coherence with other methods/functions, I'd consider collapsing these two methods into a single one accepting a new HistogramOptions with a Default implementation setting the GDALGetDefaultHistogram settings.

@lnicola Do you agree with me? Even though there are separate GDALGetDefaultHistogram and GDALGetRasterHistogram binding functions, we could be more idiomatic with the HistogramOptions approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer thin wrappers to avoid learning a new API but I'm happy to make it more idiomatic if you guys want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Let's hear what some of the other contributors think. I'm on the extreme of "let's not leak C-language workarounds" and "let's leverage Rust to improve ergonomics". Others, I think, are more in your camp, so let's let someone else make the call. We've had related debates here.

src/raster/rasterband.rs Outdated Show resolved Hide resolved
max: f64,
n_buckets: i32,
histogram: *mut i32,
histogram_vec: Option<Vec<i32>>,
Copy link
Contributor

@metasim metasim Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think the counts should be usize or u32, regardless of what GDAL C returns, but not sure that's a consensus opinion. @lnicola @jdroenner your opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that @lnicola did something similar with raster index, so probably it makes sense to do it like that moving forward.

CHANGES.md Show resolved Hide resolved
@spadarian
Copy link
Contributor Author

I've incorporated some of the changes suggested by @metasim, including:

  • Double storage mechanism for Borrowed and Owned histogram counts (makes sense and it is hidden from the user)
  • Changing isize to usize since we recently changed that for raster indexing

I did not merge the calls for GDALGetDefaultHistogram and GDALGetRasterHistogram since I think having a API more or less consistent with the C code is desirable. Open to change it if that is where the project is going @lnicola @jdroenner

src/raster/rasterband.rs Outdated Show resolved Hide resolved
@metasim
Copy link
Contributor

metasim commented Nov 29, 2023

@spadarian Take a look at this PR against your repo for an alternative naming and associated documentation, which I think clarifies things for a future contributor.

The other option (assuming VSIFree is required instead of using drop), is to have default_histogram copy the allocated array into the same Vec as used by histogram, then VSIFree the array. I personally like this approach, but I would expect some to balk at the extra copy. Makes no difference to the user tho.

src/raster/rasterband.rs Outdated Show resolved Hide resolved
src/raster/rasterband.rs Outdated Show resolved Hide resolved
src/raster/rasterband.rs Outdated Show resolved Hide resolved
CHANGES.md Outdated Show resolved Hide resolved
src/raster/rasterband.rs Outdated Show resolved Hide resolved
src/raster/rasterband.rs Outdated Show resolved Hide resolved
@lnicola
Copy link
Member

lnicola commented Nov 29, 2023

We should also add set_default_histogram, but it's fine to do it in another PR.

@spadarian
Copy link
Contributor Author

I addressed all the comments... but CI failed on a unrelated test ( raster::processing::dem::slope::tests::test_slope)

@lnicola
Copy link
Member

lnicola commented Nov 29, 2023

Error: CplError { class: 3, number: 1, msg: "/__w/gdal/gdal/target/dem-hills-slope.tiff, band 1: IReadBlock failed at X offset 0, Y offset 0: TIFFReadEncodedStrip() failed." }

@lnicola
Copy link
Member

lnicola commented Nov 29, 2023

Ah, there's a race between tpi and slope. I can try to fix it tomorrow.

Should we use /vsimem/ here?

@metasim
Copy link
Contributor

metasim commented Nov 30, 2023

@spadarian Will you accept this? spadarian#12

Thanks for merging! :)

@lnicola
Copy link
Member

lnicola commented Nov 30, 2023

@metasim isn't that already merged?

@metasim
Copy link
Contributor

metasim commented Nov 30, 2023

@metasim isn't that already merged?

Oops! Yes. Sorry. Was looking at a non-refreshed page.

@rouault
Copy link
Contributor

rouault commented Nov 30, 2023

little robustness fix in spadarian#13

@spadarian
Copy link
Contributor Author

Rebased @lnicola

src/raster/rasterband.rs Outdated Show resolved Hide resolved
/// Pointer to GDAL allocated array and lenght of histogram counts.
///
/// Requires freeing with [`VSIFree`][gdal_sys::VSIFree].
GdalAllocated(*mut u64, usize),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@metasim this seems to use u64 consistently for the counts, did you have some concerns about that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sounds fine. I just had comments about it to make sure we used the .*Ex variants in boths cases.

src/raster/rasterband.rs Outdated Show resolved Hide resolved
@lnicola lnicola merged commit d279c9f into georust:master Dec 2, 2023
9 checks passed
@spadarian spadarian deleted the histogram branch December 2, 2023 18:17
@lnicola lnicola mentioned this pull request Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants