Two custom scaler methods are defined and demonstrated in scaler_demo.ipynb.
DerivativeRank allows setting a suitable tradeoff between fixing the distribution of a feature and preserving information from the original data. It is essentially a discrete derivative of rank scaling (e.g. pandas rank, or similar to sklearn's QuantileTransformer), but compared to those it preserves more of the information in the original data. While rank or quantile transforms produce a uniform distribution (or close to uniform when values repeat), DerivativeRank produces a distribution somewhere between uniform and the original. The parameter d, the order of the derivative, controls how close the resulting distribution is to the original versus the uniform distribution.
Implementation as a recursive function:
- the array of values is sorted and consecutive deltas are taken; this is repeated recursively d times
- at the deepest level, the differences are replaced by the constant value 1
- the result is integrated back d times (cumulative sum), each time inverting the sorting permutation applied at the corresponding delta step, to obtain the scaled array of values
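The steps above can be sketched in NumPy as follows. This is a minimal sketch, not the notebook's actual code: the function name `derivative_rank` and the base-case convention (integrated values starting at 0, so d=1 reduces to plain 0-based ranks) are assumptions.

```python
import numpy as np

def derivative_rank(x, d):
    """Sketch of a DerivativeRank scaler: take sorted consecutive deltas
    d times, replace the deepest deltas with 1, then integrate back,
    inverting each sorting permutation on the way up."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")      # sorting permutation
    deltas = np.diff(x[order])                # n-1 consecutive gaps
    if d == 1:
        deltas = np.ones_like(deltas)         # base case: constant differences
    else:
        deltas = derivative_rank(deltas, d - 1)
    out = np.empty_like(x)
    # integrate back (prepend 0, cumulative sum) and undo the sorting
    out[order] = np.concatenate(([0.0], np.cumsum(deltas)))
    return out
```

With d=1 the output is just the 0-based rank of each value (uniform spacing); with larger d the gaps themselves are rank-scaled recursively, so more of the original spacing structure survives.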
An extension of the logarithm to the whole real line, for feature-scaling purposes. The log-scaler has one parameter that controls its behaviour around zero.
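One common way to extend the logarithm to all reals is a signed log1p; the sketch below uses that form, with a scale parameter c as the behaviour-around-zero knob. Both the name `symlog` and the exact formula are assumptions; the notebook's definition may differ.

```python
import numpy as np

def symlog(x, c=1.0):
    """Sketch of a log-scaler extended to all reals (assumed form):
    sign(x) * log(1 + |x|/c). Near zero it is approximately linear
    (~x/c); far from zero it behaves like a signed logarithm.
    c sets where the transition between the two regimes happens."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x) / c)
```

The function is odd (symlog(-x) == -symlog(x)) and strictly monotonic, so it preserves ordering while compressing large magnitudes of either sign.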