Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reclassify raster #197

Merged
merged 96 commits into from
Feb 22, 2024
Merged

Add reclassify raster #197

merged 96 commits into from
Feb 22, 2024

Conversation

lehtonenp
Copy link
Collaborator

@lehtonenp lehtonenp commented Oct 10, 2023

This PR for reclassifying raster data functions.

@lehtonenp
Copy link
Collaborator Author

@nialov, I have made the requested changes to the reclassify methods and tests. Is there anything more to add?

Copy link
Collaborator

@nialov nialov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only few small things:

  • Remove tests/data/small_raster.tif.aux.xml file. It was probably just added by opening the raster in e.g. ArcGIS or QGIS.
  • Check that when you use the masked array you do not use np.nan... methods such as np.nanmax. Instead use np.ma.max and others from np.ma.

After these, LGTM!

mask = band == nan_value
masked_array = np.ma.masked_array(data=band, mask=mask)

median_value = np.nanmedian(masked_array)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
median_value = np.nanmedian(masked_array)
median_value = np.ma.median(masked_array)

Comment on lines 287 to 288
min_tail = np.nanmin(tail_values)
max_tail = np.nanmax(tail_values)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
min_tail = np.nanmin(tail_values)
max_tail = np.nanmax(tail_values)
min_tail = np.ma.min(tail_values)
max_tail = np.ma.max(tail_values)

from eis_toolkit.exceptions import InvalidParameterValueException


def _bands_non_negative(band: Sequence):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _bands_non_negative(band: Sequence):
def _check_bands_non_negative(band: Sequence):

Copy link
Collaborator

@nmaarnio nmaarnio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I took a quick look at this and wrote some comments. I would consider renaming this module to "reclassify.py" and the tool names to reclassify_<strategy> or reclassify_with_<strategy>. Any thoughts about these renamings @lehtonenp and @nialov ?

Comment on lines 10 to 12
def _check_bands_non_negative(band: Sequence):
if any(n < 0 for n in band):
raise InvalidParameterValueException("The list bands contains negative values.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend taking this out and just using check_raster_bands found in utilities -> checks -> raster module.

def raster_with_manual_breaks( # type: ignore[no-any-unimported]
raster: rasterio.io.DatasetReader,
breaks: Sequence[int],
bands: Sequence[int],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not have any default value even thought the docstring claims so.

Suggested change
bands: Sequence[int],
bands: Optional[Sequence[int]] = None,

This applies to all tools in this module.


@beartype
def raster_with_geometrical_intervals( # type: ignore[no-any-unimported]
raster: rasterio.io.DatasetReader, number_of_classes: int, nan_value: Union[int, float], bands: Sequence[int]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the input is a raster DatasetReader, I don't think a nan_value parameter is strictly necessary. The raster should have nodata value defined in its metadata/profile and that should be used in the mask. But if we want to let the user to work with a faulty raster and override nodata in these kind tools, I would atleast change the name to nodata_value and the type Number (imported like this: from numbers import Number) which we have used for cases that allow floats and ints.

eis_toolkit/raster_processing/reclassify_raster.py Outdated Show resolved Hide resolved
@lehtonenp
Copy link
Collaborator Author

lehtonenp commented Feb 15, 2024

Hey, I took a quick look at this and wrote some comments. I would consider renaming this module to "reclassify.py" and the tool names to reclassify_<strategy> or reclassify_with_<strategy>. Any thoughts about these renamings @lehtonenp and @nialov ?

@nmaarnio, reclassify_with_<strategy> is a suitable name. I can change the function naming.

Copy link
Collaborator

@nmaarnio nmaarnio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now!

Since @nialov was the main reviewer of this PR, let's wait for his approval before merging.

A small note not regarding the implementation itself:I guess the original comment of this PR that "It is not to be merged to the EIS toolkit code base." could be removed at this point

Copy link
Collaborator

@nialov nialov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Very well coded functionality!

@nmaarnio
Copy link
Collaborator

I actually encountered a few issues when implementing CLI functions for these tools just now. check_raster_bands expects the bands parameter to be a Sequence. Now None can be delivered to it. Furthermore, I realized that the functions are returning sequence of 2D Numpy arrays, whereas a 3D Numpy array is how mulitband raster data is usually represented (and can be easily saved with Rasterio).

I ended up creating a suggestion for refactored version of the public relcassification functions. Here cases where bands is None or empty sequence are handled, 3D Numpy array allocated and returned. Feel free to modify this or disagree, I just crafted it while thinking of these issues!

@beartype
def reclassify_with_manual_breaks(  # type: ignore[no-any-unimported]
    raster: rasterio.io.DatasetReader,
    breaks: Sequence[int],
    bands: Optional[Sequence[int]] = None,
) -> Tuple[np.ndarray, dict]:
    """Classify raster with manual breaks.

    If bands are not given, all bands are used for classification.

    Args:
        raster: Raster to be classified.
        breaks: List of break values for the classification.
        bands: Selected bands from multiband raster. Indexing begins from one. Defaults to None.

    Returns:
        Raster classified with manual breaks and metadata.

    Raises:
        InvalidParameterValueException: Bands contain negative values.
    """

    if bands is None or len(bands) == 0:
        bands = range(1, raster.count + 1)
    else:
        check_raster_bands(raster, bands)

    out_image = np.empty((len(bands), raster.height, raster.width))
    out_meta = raster.meta.copy()

    for i, band in enumerate(bands):
        band_data = raster.read(band)
        out_image[i] = _reclassify_with_manual_breaks(band_data, breaks)

    return out_image, out_meta

@nmaarnio
Copy link
Collaborator

Some changes to poetry.lock and pyproject.toml had slipped to this PR in some merge, I removed them. Looks good for merging now!

@nmaarnio nmaarnio merged commit 43f8919 into master Feb 22, 2024
6 checks passed
@nmaarnio nmaarnio deleted the add-reclassify-raster branch February 22, 2024 06:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants