129 add MLP #252
Conversation
…s, add utility func
eis_toolkit/prediction/mlp.py
Outdated
dropout_rate: Optional[Number] = None,
early_stopping: bool = True,
es_patience: int = 5,
metrics: Optional[Sequence[Literal["accuracy", "precision", "recall"]]] = ["accuracy"],
Add F1Score
F1Score does not seem to be listed, at least not here: https://keras.io/api/metrics/
But it is one of the most used metrics for evaluating classification problems 😉
For classification, I would definitely go with:
- accuracy
- precision
- recall
- f1 score
Yeah, I was puzzled why I couldn't find it in the docs, since it should be a commonly used metric. Odd that it didn't show up on the site I looked at, but I added it now!
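For reference, a minimal sketch of how the metric strings could be mapped to Keras metric objects, assuming a recent TF/Keras version where tf.keras.metrics.F1Score is available (older versions only had it via tensorflow-addons); the helper name here is made up for illustration, not the PR's actual code:

import tensorflow as tf


def _get_metrics(names):
    # Hypothetical helper: map metric name strings to Keras metric instances.
    mapping = {
        "accuracy": "accuracy",  # resolved by Keras from the string
        "precision": tf.keras.metrics.Precision(),
        "recall": tf.keras.metrics.Recall(),
        "f1_score": tf.keras.metrics.F1Score(),  # requires a recent TF/Keras release
    }
    return [mapping[name] for name in names]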
neurons: Sequence[int] = [16],
validation_split: Optional[float] = 0.2,
activation: Literal["relu"] = "relu",
output_neurons: int = 1,
For classification, we need at least two classes, not one?!
For binary classification, isn't one output neuron what we want? That's what I've understood. Related to this, should we have sigmoid as the default last activation and binary_crossentropy as the loss function if we make binary classification the default mode?
To my (limited) knowledge:
Using softmax always requires the desired number of classes, i.e. the number of classes you want as an output. Whether binary or multiclass does not matter here. With the softmax function, you get a probability for each class (2 or more). You have to check at which position the highest probability occurs (argmax) and then, depending on the position in the label data, you're able to assign/get the correct class.
The opposite way, using sigmoid, you need exactly one output unit (or neuron). The output is a single probability, which can be interpreted as belonging either to the one class or to the other. You can assign the classes by rounding up/down: if probability <= 0.5, it's class zero (the first class); if probability > 0.5, it's the other.
To conclude, the one-unit output is fine, but I would bring the defaults to a consistent level: either go binary with softmax and 2 output neurons, or with sigmoid and 1 output neuron, setting the loss to binary_crossentropy. Alternatively, keep the loss at categorical_crossentropy and the output activation at softmax, but then set the classes to at least two (or maybe None, since we do not know the user's case, if multi-class).
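A minimal sketch of the two consistent default combinations described above (assuming a plain Keras Sequential model; feature count and layer sizes are placeholders):

import tensorflow as tf

n_features = 10  # placeholder: number of input features / evidence layers

# Option A: 1 output neuron, sigmoid activation, binary_crossentropy loss
binary_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
binary_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Option B: at least 2 output neurons, softmax activation, categorical_crossentropy loss
n_classes = 2
multiclass_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
multiclass_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])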
Okay, got it 👍. Sounds like we have two valid options; I will go for the 1 output neuron, binary_crossentropy, sigmoid combo for the defaults now, if you think that's okay.
It is 😉
eis_toolkit/prediction/mlp.py
Outdated
if dropout_rate is not None:
    model.add(keras.layers.Dropout(dropout_rate))

model.add(keras.layers.Flatten())
I think the Flatten is not needed, because we're not working with real images and convolution layers here. If building the sequence with Dense layers, the Flatten must be inserted right before the first Dense layer to forward propagate the data. Just make sure that the input tensor is of rank 1, meaning a 1-dim array with shape (X,). If the data are already in "good" shape, no flattening is needed.
Example:
Read the rasters in advance and then create a stack from them:
raster_stack = np.stack((raster_1, raster_2, raster_3), axis=-1)
raster_stack = raster_stack.squeeze()
For the model sequence, either use layers.Flatten within the sequence (you can keep the functional API approach):
model = tf.keras.Sequential([
    # Flattening
    tf.keras.layers.Flatten(input_shape=raster_stack.shape),
    # Dense layers
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])
or reshape it manually like:
reshaped = raster_stack.reshape(-1, raster_stack.shape[-1])

model = tf.keras.Sequential([
    # Input layer (data already flattened by the reshape above)
    tf.keras.layers.Input(shape=(reshaped.shape[-1],)),
    # Dense layers
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])
Okay, thanks. I think I just copied the Flatten layer from Luca's code where I saw it – perhaps it ended up there accidentally. I removed it now. I think we can assume that people who use EIS for scripting / notebooks can prepare their data. We could add some sort of check for the input shape if you think that's a good idea. For plugin users, we can make a reshaping/preparation function that we call in the CLI before passing data to these functions.
Sounds good to me. I think for non-advanced users, the "shaping" should be done invisibly under the hood. Shape errors are the most common errors in this kind of work.
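A minimal sketch of such an input shape check (a hypothetical helper, only to illustrate the idea):

import numpy as np


def _check_input_shape(X: np.ndarray) -> None:
    # Hypothetical check: sample data is expected as a 2D array (n_samples, n_features).
    if X.ndim != 2:
        raise ValueError(
            f"Expected input of shape (n_samples, n_features), got {X.shape}. "
            "Reshape e.g. with X.reshape(-1, X.shape[-1]) before training."
        )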
eis_toolkit/prediction/mlp.py
Outdated
def train_MLP_regressor(
    X: np.ndarray,
    y: np.ndarray,
    neurons: Sequence[int] = [16],
I think it's difficult to set this up with defaults. However, we usually take n * 2 + 1 as a starting point for the number of neurons in a hidden layer, where n is the number of evidence layers.
Do you suggest that we remove the default for this parameter completely? I guess at least for QGIS plugin users we could offer this n * 2 + 1 default as a choice, calculated from the inputs.
Then, let's remove it and provide the necessary "dynamic defaults" within the QGIS Plugin 🐱👤
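A minimal sketch of how such a dynamic default could be derived on the plugin side (hypothetical helper, following the n * 2 + 1 rule of thumb above):

def default_hidden_neurons(n_evidence_layers: int) -> list:
    # Rule of thumb discussed above: one hidden layer with n * 2 + 1 neurons.
    return [n_evidence_layers * 2 + 1]


# e.g. 5 evidence layers -> one hidden layer with 11 neurons
assert default_hidden_neurons(5) == [11]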
eis_toolkit/prediction/mlp.py
Outdated
X,
y,
epochs=epochs,
validation_split=validation_split if validation_split else 0,
There's also a validation_data parameter that can be used if you want to provide a "fixed" validation data set for training. The TF version shuffles during each run or epoch, as far as I understood it. When providing a prepared set for validation, this would not be the case, comparable to the simple sklearn split.
I think that's an optional thing, but we could think about integrating it as an alternative.
The documentation says: if both validation_data and validation_split are provided, validation_data will override validation_split.
So it should be easy to implement.
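A minimal sketch of how that could look in the training call (variable names are placeholders, not the PR's actual code):

# Use an explicit validation set if one is given; otherwise fall back to validation_split.
history = model.fit(
    X,
    y,
    epochs=epochs,
    validation_data=(X_valid, y_valid) if X_valid is not None else None,
    validation_split=validation_split if validation_split else 0,
)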
Yeah, good idea. I added this too. Do you think we want to include this option to provide already-split validation data for the Sklearn ML models too?
Okay, got it now 😉 As far as I know, sk-learn does not allow explicit validation datasets within their model.fit functions. The .fit call, or rather its arguments, might depend on the method, but I did not find any parameter that takes an extra validation data set. I would say those functions are fine as they are right now. And the .score function, which comes separately after fitting, is already fed with the test portion of the data.
Just to clarify: the .fit function name is the common agreement for naming the model training process across different libraries, but it differs from one library to another - so we can't simply add the same parameters from tf into the sk-learn stuff.
Yes, I was just thinking that we could provide an option in the functions we offer to give a separate validation dataset. When defined, it would override calling the split_data inside _train_and_validate_sklearn_model and directly assign X_valid and y_valid from the given dataset. But I don't see a need to add this, just thought I'd ask. Anyway, it is separate from this MLP issue.
Hey Niko,
finally got through and left some comments. Looks pretty good overall; some parts may need a bit of adjustment/additions. We could either chat or speak about it next week (or at the beginning of the new year), if you want to.
Cheers,
Micha
- Added optimizers
- Added activation functions
- Removed Flatten layer
- Added regression loss functions
- Added validation data parameter
- Added random state / seed
Hi Micha! Thanks for the review. I made corrections/additions already for most of the issues, and for a few I had a question/comment. We can have a chat if there's still more to discuss about these points, and even if not we could have another modeling meeting in January.
Hi Niko 🐱👤 made some comments here. If you have questions or comments that need to be addressed right now, we can have a shorter chat this week 🙂 Otherwise, I think it would be a good idea to meet anyway in January and maybe go through questions/stuff that arose or might be unclear.
- Remove default for neurons parameter
- Add F1-score metric for classifier
- Add model data input shape checking
- Make sigmoid and binary crossentropy defaults
- Adjust docstrings
Hi, I finished implementing the changes you requested, so it should be ready for another round of review!
y = one_hot_encode(Y_IRIS).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model, history = train_MLP_classifier(
You added the random_seed to the functions, but do not use it for testing. I would add it here in the function call and check the metrics against the "fixed" score, rounded to some decimals.
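A minimal sketch of what that seeded test could look like (the keyword name follows the comment above, and the expected score is a placeholder, not a real result):

model, history = train_MLP_classifier(
    X_train,
    y_train,
    random_seed=42,  # assumed parameter name
)
_, accuracy = model.evaluate(X_test, y_test)
# Placeholder value - the real expected score must be taken from a seeded reference run.
assert round(accuracy, 2) == 0.97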
Ah, forgot to modify the tests, good catch 👍
Left some comments and ideas 😉
Ready for another round 🔍 EDIT: Just noticed that the Windows test failed. Apparently it could not import the legacy solvers. I don't have time to look into this now, but I can do so later.
Hehe, okay 🐱👤 Will continue with that tomorrow, since I have to leave now. Just for your planning 😉
Did some investigations on this. Simply changing the import fixes it. I've tested this issue on Windows Conda and Docker setups and it worked fine in both environments.
I think it looks quite good overall. Tests are also running when changing the imports. If it also runs for the OSX tests, let's merge it 😃
🐱🏍
Added an MLP implementation that uses Keras/TensorFlow.
Includes:
When PR #210 is merged, the private functions of this PR (input check and keras optimizer getter) can be moved to model_utils. Currently, probabilistic MLP is not implemented; this can be done in a later PR, on top of this implementation.
Cross-validation is not implemented either.
There might be some mistakes in the allowed parameters/parameter combinations, so it would be good if @msmiyels or someone else could try to spot these. Also, the exposed parameter space can be expanded or shrunk as you see fit.