log vs arithmetic differences and curve CI's #165

Open
dmcglinn opened this issue Aug 20, 2017 · 7 comments

Comments

@dmcglinn
Member

dmcglinn commented Aug 20, 2017

This comment was submitted to me by Niv de Malach. Below I have also posted my replies, each email as its own entry.

I started using your MOBR package (just the tutorial from Rpubs) and found it wonderful. Since I read the EL paper by Chase & Knight I thought that I should analyze data using this approach so this package will save me a lot of time.
Still, I have some queries about the package

  1. I do not understand why effect size is calculated as the difference in richness and not log richness. In most studies effect size is the log response ratio (equivalent to the difference in log richness). Note that calculating the absolute difference means that a five-species difference counts as an equal effect at both the square-meter scale and the square-kilometer scale (that is why meta-analyses use the log response ratio, which is the relative difference). Another limitation of the absolute difference approach is that it cannot be compared to most species-area curves, which are plotted on a log-log scale.
    While it is possible to show the log scale for richness using the "plot_rarefaction" command, I didn't find a way to do it for the other commands (get_mob_stats, get_delta_stats). I would highly appreciate it if you could help me with this (I don't mind digging into the code and changing it).
  2. I guess all the plots showing results of rarefactions are based on (arithmetic) means. Is it possible to add some confidence interval\SE to assess their accuracy? It would be even better if the package returned the actual points, which would allow using regression, since assessing curves based on their means instead of the real values might be biased (https://en.wikipedia.org/wiki/Ecological_fallacy).
    I apologize for all the annoying comments; if you are not the one writing the code, please refer me to the right person. Importantly, besides these comments, everything else in the package is just awesome!
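
To make the scale argument in point 1 concrete, here is a minimal Python sketch comparing the two effect sizes. The richness values and function names are hypothetical and chosen only for illustration; nothing here is part of the mobr package.

```python
import math

def absolute_difference(s_treatment, s_control):
    """Arithmetic effect size: difference in species richness."""
    return s_treatment - s_control

def log_response_ratio(s_treatment, s_control):
    """Log response ratio, identical to log(S_treatment) - log(S_control)."""
    return math.log(s_treatment / s_control)

# Hypothetical richness values (treatment, control) at two spatial scales.
scales = {
    "square meter": (15, 10),
    "square kilometer": (205, 200),
}

for label, (s_t, s_c) in scales.items():
    diff = absolute_difference(s_t, s_c)
    lrr = log_response_ratio(s_t, s_c)
    print(f"{label}: absolute difference = {diff}, log response ratio = {lrr:.3f}")
```

Both scales show the same absolute difference of five species, while the log response ratio is much larger at the fine scale, where five species is a large fraction of the control richness.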
@dmcglinn
Member Author

Hey Niv,

Thank you so much for your very nice email. I am really glad that you are finding the package useful. Your questions are great and very natural. Do you mind if I post your questions and my reply on github (https://github.com/MoBiodiv/mobr/issues)?

  1. We (Xiao Xiao, Jon Chase, Brian McGill, Nick Gotelli, and others) went back and forth on whether to use arithmetic or log differences. I agree with the reasons you listed as good arguments for why log differences in S are meaningful; however, I believe we decided to go with arithmetic differences primarily because 1) our theoretical construct of how N, SAD, and aggregation effects influence S is additive rather than multiplicative, 2) arithmetic differences are easier to grasp and communicate (e.g., the treatment increased richness by 3 species), 3) our target audience (folks that do experiments) would seem to be more interested in arithmetic differences, and 4) we carried out a simulation experiment that validated our ability to accurately detect (low type I and II error) N, SAD, and aggregation effects using arithmetic differences (we did not check if log differences were also valid). It would be possible to implement log differences in the package, but it would require changing code in many different functions, so it is currently low on my priority list. If you wanted to attempt a patch and submit a PR, I would be happy to review it and then extend coauthorship to you on the R package.

  2. The package produces different kinds of rarefaction curves so you may have to be more specific about which curve you are referring to. In the spatial rarefaction curve we do calculate the mean of all possible starting points. A parametric confidence interval is likely not possible to derive. One could use 95% quantiles to visualize uncertainty but I would not use this approach as a formal test of differences between two curves. We developed a null model for this purpose which compares each treatment to complete spatial randomness and then compares the degree of difference between the observed and null curves. The raw spatial rarefaction points could be returned fairly easily but right now only their average is returned by the function rarefaction.
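
As an illustration of point 2, the following Python sketch keeps one accumulation curve per starting plot and summarizes them with a mean and 95% quantiles. It assumes plots are accumulated in order of increasing distance from the starting plot, which is an assumption about the procedure rather than a description of the package's code. The data and the helper names (`spatial_curves`, `quantile`, `summarize`) are hypothetical.

```python
import math
from statistics import mean

def spatial_curves(coords, plot_species):
    """Return one richness accumulation curve per starting plot.

    Assumption: plots are added nearest-first from the starting plot.
    """
    curves = []
    for start in range(len(coords)):
        order = sorted(
            range(len(coords)),
            key=lambda j: math.dist(coords[start], coords[j]),
        )
        seen, curve = set(), []
        for j in order:
            seen |= plot_species[j]
            curve.append(len(seen))
        curves.append(curve)
    return curves

def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def summarize(curves, lower=0.025, upper=0.975):
    """Mean and quantile band of richness at each number of plots."""
    summary = []
    for k in range(len(curves[0])):
        column = [curve[k] for curve in curves]
        summary.append((mean(column), quantile(column, lower), quantile(column, upper)))
    return summary

# Hypothetical data: four plots on a transect with their species sets.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
plot_species = [{"a", "b"}, {"b", "c"}, {"c"}, {"d", "e"}]

curves = spatial_curves(coords, plot_species)
for n_plots, (avg, low, high) in enumerate(summarize(curves), start=1):
    print(f"{n_plots} plots: mean S = {avg}, 95% quantiles = ({low}, {high})")
```

The quantile band only visualizes spread among starting plots; as noted above, it is not a formal test of differences between two curves.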

Let me know if you still have questions and if you feel OK having this conversation publicly on github, where my coauthors and others can more easily chime in. Also, we can tag PRs to your questions during future development.

@dmcglinn
Member Author

Hi Dan,
Thanks a lot for the detailed answer...

  1. I am not sure what is your 'theoretical construct'...
  2. All the curves in this package are means of many randomizations (where each time a different sampling unit\individual is 'sampled' first). Obviously taking the average is leading to 'loss of information' (and I am not sure whether the arithmetic mean, the geometric mean or the median would be the best choice). Therefore I suggested showing the results of all the simulations (e.g. small bright-colored open circles) in addition to the curves. Another possibility is showing some measure of the dispersion from the mean (e.g. SD\SE\quantiles\CI).
    In my view, null models are essential but they cannot fully substitute the information of such measures so both are required for interpretation of the results.
    It seems to me that your view is similar, since in all other comparisons in this package (e.g. PIE, density) there are box plots (i.e. quantiles) and not only the significance level of the randomization tests.

I understand that my suggestions are probably not on the top of your priority list, but maybe they will be in the future... Anyway thanks for building this package

best,
Niv

P.S. You can post any part of this discussion on github...

@dmcglinn
Member Author

Sorry, I was a bit vague about our theoretical construct. It is simply that changes in species richness between two communities are driven by changes in the SAD, the number of individuals (N), or spatial patchiness, and that these components can be additively partitioned out of differences in species richness.

The individual-based and non-spatial rarefaction curves (i.e., the curves used to derive N and SAD effects) are analytically derived based on the mean expected value. CIs have been derived by other authors and we could build those into the package, but as of yet we have no plans for this. The spatial rarefaction curve is deterministic (not stochastic), with the exception that we average over all possible starting plots. Also, we have a draft of a paper describing the package circulating with coauthors that we hope to submit soon. If you would like, I'm happy to send it to you.
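
For readers unfamiliar with analytical rarefaction, here is a minimal Python sketch of the classic hypergeometric expectation for individual-based rarefaction, E[S_n] = sum over species of 1 - C(N - N_i, n) / C(N, n). Whether the package uses exactly this expression is an assumption; the abundance vector and the function name `expected_richness` are hypothetical.

```python
from math import comb

def expected_richness(abundances, n):
    """Expected number of species in a random draw of n individuals
    (without replacement) from a community with the given abundances."""
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of individuals")
    denom = comb(total, n)
    # comb(total - a, n) is 0 when n exceeds total - a, so a species that
    # cannot be missed contributes exactly 1.
    return sum(1 - comb(total - a, n) / denom for a in abundances)

# Hypothetical community: three species, 16 individuals in total.
community = [10, 5, 1]
for n in (1, 4, 8, 16):
    print(f"n = {n}: expected S = {expected_richness(community, n):.3f}")
```

Because the expectation is computed in closed form, no randomization is involved, which is why this kind of curve has no simulation scatter to plot.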

Dan

@rueuntal
Contributor

@dmcglinn thanks for responding to the question. It's super awesome to hear that other folks are using our package (and finding it helpful)! Totally made my day.

I think you've already provided great answers, but I have an idea regarding Niv's first point, which could be very easy to implement (maybe without even changing our code). Imagine that at a particular scale the treatment has 18 species while the control has 20. So the treatment reduces S by 2, or 10% (which I think is what Niv meant by log-difference). Our analysis could show that 1 of the lost species is due to aggregation, -7 is due to SAD (i.e., the SAD is actually more even in the treatment), and 8 is due to N. As you said, the framework is completely additive, and we have 2 = 1 + (-7) + 8.

Now, if someone like Niv wants to know the log-difference, wouldn't it be the same as setting the control as the baseline and directly comparing those delta-S values to the control? I.e., instead of talking about absolute numbers we could say aggregation reduces S by 5%, SAD increases S by 35%, N reduces S by 40%, and the net change is a 10% reduction.

Does that make sense or am I over-simplifying the question?
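
The arithmetic in this suggestion can be sketched in a few lines of Python. The numbers are the ones from the example above, written as signed changes in S (treatment minus control); the function name `relative_effects` is hypothetical and not part of the package.

```python
import math

def relative_effects(control_richness, effects):
    """Express additive delta-S components as fractions of control richness."""
    return {name: delta / control_richness for name, delta in effects.items()}

control_s = 20
treatment_s = 18

# Signed change in S attributed to each component (negative = species lost).
effects = {"aggregation": -1, "SAD": 7, "N": -8}

relative = relative_effects(control_s, effects)
net_relative = sum(relative.values())

for name, fraction in relative.items():
    print(f"{name}: {fraction:+.0%}")
print(f"net change: {net_relative:+.0%}")

# For comparison, the log response ratio of the two richness values.
log_ratio = math.log(treatment_s / control_s)
print(f"log response ratio: {log_ratio:.3f}")
```

The components stay additive after dividing by the control richness, so they still sum to the net relative change. Note that the relative change and the log response ratio are close but not identical: they agree only approximately, and the gap grows as the effect gets larger.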

@NivDeMalach

Great idea! this is what I was looking for...

@dmcglinn
Member Author

hey @rueuntal that does seem like a really simple elegant solution! That is pretty similar to what our proportional stacked bar plots already do but I think it is worth implementing. I'll try to carve out time to add this week.

@rueuntal
Contributor

Sounds great @dmcglinn thanks!

@dmcglinn dmcglinn mentioned this issue Sep 16, 2020
31 tasks