
129 add MLP #252

Merged: 13 commits merged into master from 129-add-MLP on Jan 3, 2024
Conversation

Collaborator @nmaarnio commented Dec 7, 2023

Added an MLP implementation that uses Keras/TensorFlow.

Includes:

  • Separated classifier and regressor implementations
  • Checks for inputs
  • Docs
  • Tests

When PR #210 is merged, the private functions of this PR (input check and keras optimizer getter) can be moved to model_utils.

Currently, probabilistic MLP is not implemented. This can be done in a later PR, on top of this implementation.

Cross-validation is not implemented either.

There might be some mistakes in the allowed parameters/parameter combinations, so it would be good if @msmiyels or someone else could try to spot these. Also, the exposed parameter space can be expanded or shrunk as you see fit.

@nmaarnio linked an issue Dec 7, 2023 that may be closed by this pull request
@nmaarnio added labels Dec 7, 2023: modelling (Features related to modelling functions), enhancement (New feature or request)
    dropout_rate: Optional[Number] = None,
    early_stopping: bool = True,
    es_patience: int = 5,
    metrics: Optional[Sequence[Literal["accuracy", "precision", "recall"]]] = ["accuracy"],
Collaborator:

Add F1Score

Collaborator Author:

F1 score does not seem to be listed, at least not here: https://keras.io/api/metrics/

Collaborator:

But it is one of the most used metrics for the evaluation of classification problems 😉

See the TensorFlow docs for F1.

For classification, I would definitely go with:

  • accuracy
  • precision
  • recall
  • f1 score

Collaborator Author:

Yeah, I was puzzled why I couldn't find it in the docs, since it should be a commonly used metric. Odd that it didn't show up on the site I looked at, but I added it now!
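A minimal sketch of wiring these metrics into a Keras model, assuming TF >= 2.13 where F1Score ships as a built-in metric (earlier versions provided it via tensorflow_addons instead); the tiny model and feature count are hypothetical:

import tensorflow as tf

# Hypothetical minimal model; 5 input features are assumed for illustration
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# F1Score is a built-in Keras metric from TF 2.13 onward (assumption: TF >= 2.13)
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.Precision(),
        tf.keras.metrics.Recall(),
        tf.keras.metrics.F1Score(threshold=0.5),
    ],
)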

    neurons: Sequence[int] = [16],
    validation_split: Optional[float] = 0.2,
    activation: Literal["relu"] = "relu",
    output_neurons: int = 1,
Collaborator:

For classification, we need at least two classes, not one?!

Collaborator Author (@nmaarnio), Dec 18, 2023:

For binary classification, isn't one output class what we want? That's what I've understood. Related to this, should we have sigmoid as the default last activation and binary_crossentropy as the loss function if we make binary classification the default mode?

Collaborator (@msmiyels), Dec 18, 2023:

To my (limited) knowledge:

Using softmax always requires the desired number of classes, i.e. the number of classes you want as an output. Either binary or multiclass, it does not matter here. With the softmax function, you'll get a probability for each class (2 or more). You have to check at which position the highest probability occurs (argmax) and then, depending on the position in the label data, you're able to assign/get the correct class.

In the opposite way, using sigmoid, you need exactly one output unit (or neuron). The output is a probability (one value), which can be interpreted as belonging either to the one or to the other class. You can assign the classes by rounding up/down: if probability <= 0.5, it's class zero (or the first class); if probability > 0.5, it's the other.

To conclude, the one-unit output is fine, but I would bring the defaults to a consistent level: either binary with softmax:2 or sigmoid:1, setting the loss to binary_crossentropy. Alternatively, keep the loss at categorical_crossentropy and the output activation at softmax, but then set the classes to at least two (or maybe None, since we do not know the user's case if it is multi-class).
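A minimal sketch of the two consistent default combinations described above; layer sizes and feature count are hypothetical:

import tensorflow as tf

n_features = 5  # hypothetical number of input features

# Option 1: sigmoid with one output unit and binary crossentropy
binary_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # single probability output
])
binary_model.compile(optimizer="adam", loss="binary_crossentropy")

# Option 2: softmax with one unit per class and categorical crossentropy
multiclass_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # one probability per class
])
multiclass_model.compile(optimizer="adam", loss="categorical_crossentropy")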

Collaborator Author (@nmaarnio), Dec 19, 2023:

Okay, got it 👍. Sounds like we have two valid options; I will go with the 1 output class, binary crossentropy, sigmoid combo for the defaults now, if you think that's okay.

Collaborator:

It is 😉

if dropout_rate is not None:
    model.add(keras.layers.Dropout(dropout_rate))

model.add(keras.layers.Flatten())
Collaborator (@msmiyels), Dec 13, 2023:

I think the Flatten is not needed, because we're not working with real images and convolution layers here. If building the sequence with Dense layers, the Flatten must be inserted right before the first Dense layer to forward-propagate the data. Just make sure that the input tensor is of rank 1, meaning a 1-dim array with shape (X,). If the data are already in "good" shape, no flattening is needed.

Example:

Read rasters in advance and then create a stack from them:

import numpy as np

# Stack single-band rasters along a new last axis and drop singleton dimensions
raster_stack = np.stack((raster_1, raster_2, raster_3), axis=-1)
raster_stack = raster_stack.squeeze()

For the model sequence, you can use layers.Flatten within the sequence while keeping the Sequential API approach:

import tensorflow as tf

model = tf.keras.Sequential([
    # Flattening
    tf.keras.layers.Flatten(input_shape=raster_stack.shape),

    # Dense layers
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax'),
])

or reshape it manually like:

reshaped = raster_stack.reshape(-1, raster_stack.shape[-1])

model = tf.keras.Sequential([
    # Input is already flat: one row per pixel, one column per raster
    tf.keras.layers.Input(shape=(reshaped.shape[-1],)),

    # Dense layers
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax'),
])

Collaborator Author (@nmaarnio), Dec 18, 2023:

Okay, thanks. I think I just copied the Flatten layer from Luca's code where I saw it – perhaps it got there accidentally. I removed it now. I think we can assume that people who use EIS for scripting/notebooks can prepare their data. We could add some sort of check for the input shape if you think that's a good idea. For plugin users, we can make a reshaping/preparation function that we call in the CLI before passing data to these functions.

Collaborator (@msmiyels), Dec 18, 2023:

Sounds good to me. I think for non-advanced users, the "shaping" should be done invisibly under the hood. Shape errors are the most common errors in this kind of work.
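A minimal sketch of the kind of under-the-hood shape check discussed here; the helper name is hypothetical:

import numpy as np

def _check_input_shape(X: np.ndarray) -> None:
    # Hypothetical helper: Dense layers expect samples as rows, features as columns
    if X.ndim != 2:
        raise ValueError(
            f"Expected input of shape (samples, features), got a {X.ndim}-dimensional array."
        )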

def train_MLP_regressor(
    X: np.ndarray,
    y: np.ndarray,
    neurons: Sequence[int] = [16],
Collaborator (@msmiyels), Dec 14, 2023:

I think it's difficult to set this up with defaults. However, we usually take n * 2 + 1 as the starting point for the number of neurons in a hidden layer, where n is the number of evidence layers.
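A minimal sketch of the n * 2 + 1 heuristic, with hypothetical toy data:

import numpy as np

# Toy feature matrix standing in for stacked evidence layers (hypothetical data)
X = np.random.rand(100, 5)  # 100 samples, n = 5 evidence layers

# Starting point for a single hidden layer, per the heuristic above
n = X.shape[-1]
default_neurons = [2 * n + 1]  # -> [11]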

Collaborator Author:

Do you suggest that we remove the default for this parameter completely? I guess at least for QGIS plugin users we could offer this n * 2 + 1 default, calculated from the inputs.

Collaborator:

Then, let's remove and provide the necessary "dynamic defaults" within the QGIS Plugin 🐱‍👤

    X,
    y,
    epochs=epochs,
    validation_split=validation_split if validation_split else 0,
Collaborator:

There's also a validation_data parameter that can be used if you want to provide a "fixed" validation data set for training. The TF version shuffles during each run or epoch, as far as I understood it. When providing a prepared set for validation, this would not be the case, comparable to the simple sklearn split.

I think that's an optional thing, but we could consider integrating it as an alternative.

The documentation says:
If both validation_data and validation_split are provided, validation_data will override validation_split.

So it should be easy to implement.
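A minimal sketch of the precedence described in the docs; the data, model, and sizes are hypothetical:

import numpy as np
import tensorflow as tf

# Hypothetical pre-split data: 80 training and 20 validation samples, 5 features
X_train, y_train = np.random.rand(80, 5), np.random.randint(0, 2, (80, 1)).astype("float32")
X_valid, y_valid = np.random.rand(20, 5), np.random.randint(0, 2, (20, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# If both are given, validation_data overrides validation_split
history = model.fit(
    X_train, y_train,
    epochs=5,
    validation_data=(X_valid, y_valid),  # fixed, pre-split validation set
    # validation_split=0.2,              # alternative: split off part of X_train
)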

Collaborator Author:

Yeah, good idea. I added this too. Do you think we want to include this option to provide already-split validation data for the sklearn ML models too?

Collaborator:

Okay, got it now 😉 As far as I know, sklearn does not allow explicit validation datasets within its model.fit functions. The .fit call, or rather its arguments, might depend on the method, but I did not find any parameter that takes an extra validation data set. I would say those functions are fine as they are right now. And the .score function, which comes separately after fitting, is already fed with the test portion of the data.

Just to clarify: the .fit function name is the common convention for calling the model training process across different libraries, but its signature differs from one library to another - so we can't simply add the same parameters from TF into the sklearn stuff.

Collaborator Author:

Yes, I was just thinking that we could provide an option in the functions we offer to give a separate validation dataset. When defined, it would override calling split_data inside _train_and_validate_sklearn_model and directly assign X_valid and y_valid from the given dataset. But I don't see a need to add this, I just thought I'd ask. Anyway, it is separate from this MLP issue.

Collaborator @msmiyels left a comment:

Hey Niko,

finally got through and left some comments. Looks pretty good overall; some parts may need a bit of adjustment/additions. We could chat or speak about it next week if you want to (or at the beginning of the new year).

Cheers,
Micha

- Added optimizers
- Added activation functions
- Removed Flatten layer
- Added regression loss functions
- Added validation data parameter
- Added random state / seed
Collaborator Author @nmaarnio:

Hi Micha!

Thanks for the review. I made corrections/additions for most of the issues already, and for a few I had a question/comment. We can have a chat if there's still more to discuss about these points, and even if not, we could have another modeling meeting in January.

@nialov nialov mentioned this pull request Dec 18, 2023
Collaborator @msmiyels:

Hi Niko 🐱‍👤

I made some comments here. If you have questions or comments that need to be addressed right now, we can have a short chat this week 🙂 Otherwise, I think it would be a good idea to meet in January anyway and go through any questions or other things that arose or might be unclear.

- Remove default for neurons parameter
- Add F1-score metric for classifier
- Add model data input shape checking
- Make sigmoid and binary crossentropy defaults
- Adjust docstrings
Collaborator Author @nmaarnio:

Hi,

I finished implementing the changes you requested, so this should be ready for another round of review!

y = one_hot_encode(Y_IRIS).toarray()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model, history = train_MLP_classifier(
Collaborator:

You added the random_seed to the functions but don't use it for testing. I would add it here in the function call and check the metrics against a "fixed" score, rounded to some number of decimals.
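A hedged sketch of the suggested test pattern, reusing X_train/y_train from the snippet above; the expected score is a placeholder, not a value measured for this PR:

import pytest

def test_mlp_classifier_is_reproducible():
    # A fixed seed should make the training run deterministic
    model, history = train_MLP_classifier(X_train, y_train, random_seed=42)
    accuracy = history.history["accuracy"][-1]
    # Placeholder expected value; would be pinned after one deterministic run
    assert accuracy == pytest.approx(0.95, abs=0.01)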

Collaborator Author:

Ah, forgot to modify the tests, good catch 👍

Collaborator @msmiyels left a comment:

Left some comments and ideas 😉

Collaborator Author @nmaarnio commented Dec 21, 2023:

Ready for another round 🔍

EDIT: Just noticed that the Windows test failed. Apparently it could not import the legacy solvers. I don't have time to look into this now, but can do later.

Collaborator @msmiyels:

> Ready for another round 🔍

Hehe, okay 🐱‍👤 Will continue with that tomorrow, since I have to leave now. Just for your planning 😉

Collaborator @msmiyels commented Dec 22, 2023:

> Ready for another round 🔍
>
> EDIT: Just noticed that the Windows test failed. Apparently it could not import the legacy solvers. I don't have time to look into this now, but can do later.

Did some investigating on this. Simply change the import from from keras.optimizers.legacy import SGD, Adam, ... to from tensorflow.keras.optimizers.legacy import SGD, Adam, .... This works for both versions, with and without legacy. If you import them separately, you'll also see that the "newer" ones are imported as a "_v2" version (but that does not matter here).

I've tested this issue on Windows Conda and Docker setups, and it worked fine in both environments.
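A minimal sketch of the import change, with a hedged fallback for TF versions that predate the legacy namespace (the fallback is an assumption, not part of the fix described above):

# Works on recent TF versions (roughly 2.11+), where the original
# optimizer classes live under the .legacy namespace
try:
    from tensorflow.keras.optimizers.legacy import SGD, Adam
except ImportError:
    # Older TF versions have no .legacy namespace; fall back to the plain optimizers
    from tensorflow.keras.optimizers import SGD, Adam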

Collaborator @msmiyels left a comment:

I think it looks quite good overall. Tests are also running when changing the imports. If it also runs for the OSX tests, let's merge it 😃

Collaborator @msmiyels left a comment:

🐱‍🏍

@nmaarnio nmaarnio merged commit 19f4d1b into master Jan 3, 2024
6 checks passed
@nmaarnio nmaarnio deleted the 129-add-MLP branch January 3, 2024 11:45
Labels
enhancement (New feature or request), modelling (Features related to modelling functions)

Development
Successfully merging this pull request may close these issues:

Add MLP (multilayer perceptron)