Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

287 add local morans i #305

Merged
merged 17 commits into from
Feb 20, 2024
Merged

287 add local morans i #305

merged 17 commits into from
Feb 20, 2024

Conversation

tomiturunen1
Copy link
Collaborator

Added Local Moran's I function

@tomiturunen1 tomiturunen1 self-assigned this Jan 30, 2024
Copy link
Collaborator

@nmaarnio nmaarnio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, even if I put quite many comments, looks solid overall! Most of my comments relate to cosmetic stuff and are straightforward changes

w = libpysal.weights.KNN.from_dataframe(gdf, k=k)
else:
raise ValueError("Invalid weight_type. Use 'queen' or 'knn'.")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, we should row-standardize the weights to minimize bias. I am not sure if it is default/automatically done, but you could add this row to be sure:

Suggested change
weights.transform = 'R'

elif weight_type == "knn":
w = libpysal.weights.KNN.from_dataframe(gdf, k=k)
else:
raise ValueError("Invalid weight_type. Use 'queen' or 'knn'.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use InvalidParameterValueException from our custom exceptions here.

gdf: gpd.GeoDataFrame,
column: str,
weight_type: Literal["queen", "knn"] = "queen",
k: Union[int, None] = 2,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make remove the option to have None here. If the user decides to use knn, k must be defined, and if they don't use knn they can just ignore this parameter.

I am not entirely sure about this, but I would increase the default value of k a bit, perhaps to 4. Looking at our example geochemical dataset, it would make sense to me that k should be higher than 2, although I don't know how representative our example dataset is. Of course the users can increase k as they wish, if they know what they're doing..

Comment on lines 44 to 47
"""Execute Local Moran's I calculation for the area.

Args:
gdf: The geodataframe that contains the area to be examined with local morans I.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit nitpicky comment, but I would change the wording here from "area" to "data".

Suggested change
"""Execute Local Moran's I calculation for the area.
Args:
gdf: The geodataframe that contains the area to be examined with local morans I.
"""Calculate Local Moran's I statistics for input data.
Args:
gdf: Geodataframe containing the input data.

Geodataframe containing the calculations.

Raises:
EmptyDataFrameException if input geodataframe is empty.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EmptyDataFrameException if input geodataframe is empty.
EmptyDataFrameException: The input geodataframe is empty.

Comment on lines 48 to 51
column: The column to be used in the analysis.
weight_type: The type of spatial weights matrix to be used. Defaults to "queen".
k: Number of nearest neighbors for the KNN weights matrix. Defaults to 2.
permutations: Number of permutations for significance testing. Defaults to 999.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add checks for the column, k and permutations inputs. So check that column is in the input geodataframe, k is at least 1, and permutations is at least 100 (technically, it only needs to be a positive number, but I think we could set the limit somewhere around 100. smaller values probably lead to too inaccurate p-values)

moran_loc = Moran_Local(gdf[column], w, permutations=permutations)

gdf[f"{column}_local_moran_I"] = moran_loc.Is
gdf[f"{column}_p_value"] = moran_loc.p_sim
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this column could be f"{column}_local_moran_I_p_value" or f"{column}_local_moran_I_p", so that it is clear which statistic's p-value it is.

Comment on lines 23 to 24
if len(gdf[column]) != len(w.weights):
raise ValueError("Dimension mismatch between data and weights matrix.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this mismatch happen? Anyway, change the exception to some eis_toolkit exception.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly, we haven't made unit tests so far like they have been implemented here. Meaning that the target values are calculated during the test using the underlying library functions. You know more about testing @nialov , could you comment this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good that you added a notebook for example! However, I think we could use here our geochemical example data that is now in GitHub in the master branch. I think it's preferable to demonstrate using our geological data if convenient and possible, and I believe one of the primary uses for this Local Moran's I might be to inspect spatial autocorrelation of point geochemical data.

So the dataset is called "IOCG_CLB_Till_Geochem_reg_511p.gpkg" and should be found directly under the remote folder. For the target column, any of the mineral columns will probably do.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebook's data is "geographical" as I was not comfortable with using example data I do not fully understand. Hence, I used GDP of African countries to demonstrate the correctness of the Local Moran's I.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's absolutely fine. I have also used similar example datasets for some functions. This improvement just came to my mind, since we added the geochem data recently to master and I have personally used Local Moran's I for similar data exploration in my previous job.

Copy link
Collaborator

@lehtonenp lehtonenp Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in the context of EIS it would be better to use the geochem data. I just do not know what columns (maybe any) and which parameters for weight_type and k are feasible for this analysis.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I think you could use for example "Li_ppm_551", so concentration of Lithium.

@nmaarnio
Copy link
Collaborator

nmaarnio commented Feb 6, 2024

I guess you have done all the updates now? Looks good to me otherwise, but I still think it could be better to compare results with static values rather than use the underlying library within the test. @nialov do you have opinions regarding this?

@nmaarnio
Copy link
Collaborator

@tomiturunen1 are you still working on this? If not, let me know and I can do the test adjustments and we'll get to merge this.

@nialov
Copy link
Collaborator

nialov commented Feb 16, 2024

I guess you have done all the updates now? Looks good to me otherwise, but I still think it could be better to compare results with static values rather than use the underlying library within the test. @nialov do you have opinions regarding this?

Static values yes! The current implementation after the recent changes looks about right.

Copy link
Collaborator

@nmaarnio nmaarnio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep looks fine now, merging!

@nmaarnio nmaarnio merged commit 9bb87ca into master Feb 20, 2024
6 checks passed
@nmaarnio nmaarnio deleted the 287-Add-Local-Morans-I branch February 20, 2024 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants