
Ransomware production version #1176

Open · wants to merge 2 commits into base: branch-23.11

Conversation

elishahaim

In this version, we updated the ransomware detection pipeline for the production environment.

@elishahaim requested review from a team as code owners September 7, 2023 16:34
@copy-pr-bot

copy-pr-bot bot commented Sep 7, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

@mdemoret-nv left a comment

Below are some general comments that need to be fixed:

  • The .idea directory and all of its contents should not be included in the PR. If we need to add to the .gitignore to prevent this in the future, we can.
  • Many find/replace errors changing common -> common2 need to be reverted.
  • The example changed the input source from a file to Kafka, but the README was not updated to describe setting up Kafka and seeding the data into the message broker.
  • CI needs to be passing

In addition, @bsuryadevara can you review the changes to the ransomware example stages?
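For the first bullet, a sketch of the `.gitignore` addition (assuming the repository does not already ignore IDE directories):

```
# JetBrains IDE project files (PyCharm, etc.)
.idea/
```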

@@ -404,7 +404,7 @@ def get_merge_target():

def determine_merge_commit(current_branch="HEAD"):
"""
When running outside of CI, this will estimate the target merge commit hash of `current_branch` by finding a common
When running outside of CI, this will estimate the target merge commit hash of `current_branch` by finding a common2
Contributor

Find/replace error

@@ -416,7 +416,7 @@ def determine_merge_commit(current_branch="HEAD"):
Returns
-------
str
The common commit hash ID
The common2 commit hash ID
Contributor

Find/replace error

@@ -43,7 +43,7 @@ docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/model
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--model-control-mode=explicit \
--load-model ransomw-model-short-rf
--load-model ransomware_model_tl
Contributor

All of our models use hyphens instead of underscores. Can you rename the model to ransomware-model-tl?

Comment on lines -1 to -14
# SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Contributor

Why was this copyright removed?

@@ -0,0 +1,6 @@
{
Contributor

Question: What is this file used for?

Author

@bsuryadevara I found this file in the previous model. Do you know if we need it?

Contributor

@elishahaim It is generated by MLflow; it's not needed.

@@ -60,7 +60,7 @@ def __init__(
n_workers: int = 2,
threads_per_worker: int = 2,
):
self._client = Client(threads_per_worker=threads_per_worker, n_workers=n_workers)
# self._client = Client(threads_per_worker=threads_per_worker, n_workers=n_workers)
Contributor

Excess comment should be removed.


extract_func = self._fe.extract_features
combine_func = FeatureExtractor.combine_features
df['PID_Process'] = df.PID.astype(str) + '_'# + df.Process
Contributor

Excess comment

Contributor

@elishahaim do we need to concatenate _ to PID_Process even though the Process part is commented out?

Author

I need to verify whether we really need the underscore.

Author

@elishahaim Oct 18, 2023

@bsuryadevara I checked it, and we can remove it:
df['PID_Process'] = df.PID.astype(str)

# Close dask client when pipeline initiates shutdown
self._client.close()
pass
# self._client.close()
Contributor

More excess comments

Contributor

Can you please remove the on_completed function as we are not using Dask for the create_features stage?

"""
This class extends PreprocessBaseStage and process the features that are derived from Appshield data.
This class extends PreprocessBaseStage and process the features that aree derived from Appshield data.
Contributor

Typo

offset=0,
count=snapshot_df_size)
current_time = datetime.datetime.now()
print(f"Preprocessing snapshot sequence: {sequence} is completed at time: {current_time.strftime('%Y-%m-%d %H:%M:%S.%f')}")
Contributor

Use the logging module instead of print()
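A minimal sketch of the suggested change (the function name and message format here are illustrative, based on the diff above):

```python
import logging

logger = logging.getLogger(__name__)

def on_snapshot_preprocessed(sequence: int) -> None:
    # logging adds the timestamp itself via the formatter's %(asctime)s,
    # so the manual datetime.now() call is no longer needed
    logger.info("Preprocessing snapshot sequence: %s completed", sequence)
```

The log level and format can then be configured once at pipeline startup (e.g. via `logging.basicConfig`) instead of editing the stage.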

@mdemoret-nv
Contributor

/ok to test

@mdemoret-nv added the labels non-breaking (Non-breaking change) and improvement (Improvement to existing functionality) on Sep 19, 2023
@bsuryadevara
Contributor


@mdemoret-nv sure, will review

@mdemoret-nv
Contributor

If the AppShieldSourceStage is no longer being used anywhere, let's consider removing it and any tests for that stage.

"""
return (MessageMeta, )

def supports_cpp_node(self):
Contributor

Could you add return type annotation to all public functions?
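A sketch of what the annotations could look like (MessageMeta and the class here are stand-ins for the Morpheus types in the diff, for illustration only):

```python
import typing

class MessageMeta:
    """Stand-in for morpheus.messages.MessageMeta, for illustration only."""

class ExampleStage:
    def output_types(self) -> typing.Tuple[type, ...]:
        # Tuple of message types this stage emits
        return (MessageMeta, )

    def supports_cpp_node(self) -> bool:
        # Python-only stage, so no C++ node implementation
        return False
```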

multiple_snapshots.setdefault(source, []).append(scan_id)
return multiple_snapshots

def _hold_plugin_df(self, source, scan_id, plugin, plugin_df):
Contributor

Would it be possible to consider adding a time parameter to retain the plugin dataframes in memory?

For instance, when the pipeline receives snapshot 1, plugin 1 processes it, and plugin 2 waits for plugin 3 to bundle its data and push it to the "create features" stage. However, if plugin 3 fails to ingest data into the pipeline for whatever reason, the memory allocated to plugin 1 and plugin 2 will never be released.

With the addition of a time parameter, we can ensure that plugin 1 and plugin 2 remain in memory only for the duration of the pipeline context, improving memory management and resource efficiency.
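One possible shape for such a time parameter (all names here are an illustrative sketch, not the actual stage API):

```python
import time

class PluginDFCache:
    """Holds per-snapshot plugin dataframes and drops entries older than
    max_age_sec. Illustrative sketch only; not the actual Morpheus stage."""

    def __init__(self, max_age_sec: float = 60.0):
        self._max_age_sec = max_age_sec
        # (source, scan_id) -> (first_seen_timestamp, {plugin: df})
        self._store = {}

    def hold(self, source, scan_id, plugin, df):
        _, plugins = self._store.setdefault(
            (source, scan_id), (time.monotonic(), {}))
        plugins[plugin] = df
        self._expire()

    def _expire(self):
        # Drop snapshots that have waited longer than the retention window,
        # e.g. because a sibling plugin never ingested its data
        now = time.monotonic()
        stale = [key for key, (ts, _) in self._store.items()
                 if now - ts > self._max_age_sec]
        for key in stale:
            del self._store[key]
```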

Author

Hi @bsuryadevara,
I think we are already removing the old snapshots.
I printed the existing snapshots in the source (a dictionary held in memory), and it looks like we already clean it whenever a new snapshot is added.

Contributor

Do we handle missed sequence snapshots? For example, if snapshots s1 and s2 are received but s3 is skipped and s4 arrives instead, will the workflow retain those snapshots in memory?
All we have to do is add a check condition: if a sequence is found to be missing, initiate the cleanup process for the previous snapshots stored in memory.

Author

def _clean_snapshots(self, source, scan_id):
    # Copy the keys first; deleting from a dict while iterating its
    # keys view raises a RuntimeError
    for scan_id_exist in list(source.keys()):
        if scan_id > scan_id_exist + 2:
            del source[scan_id_exist]

def _hold_plugin_df(self, source, scan_id, plugin, plugin_df):
    if source not in self._plugin_df_dict:
        self._plugin_df_dict[source] = {}
    source = self._plugin_df_dict[source]

    if scan_id not in source:
        source[scan_id] = {}
    snapshot = source[scan_id]

    if plugin not in snapshot:
        snapshot[plugin] = plugin_df
    else:
        snapshot[plugin] = pd.concat([snapshot[plugin], plugin_df])

    self._clean_snapshots(source, scan_id)

Comment on lines +142 to +147
metas = []

for source, df in x.items():
# Now make a AppShieldMessageMeta with the source name
meta = AppShieldMessageMeta(df, source)
metas.append(meta)
Contributor

Suggested change
metas = []
for source, df in x.items():
# Now make a AppShieldMessageMeta with the source name
meta = AppShieldMessageMeta(df, source)
metas.append(meta)
metas = [AppShieldMessageMeta(df, source) for source, df in x.items()]



# Amount of files path in handles files
file_paths = x[x.Type == 'File'].Name.str.lower()
file_paths = x.Name.str.lower()#x[x.Type == 'File'].Name.str.lower()
Contributor

Excess comment

Comment on lines 48 to 59
@click.option(
"--n_dask_workers",
default=6,
default=1,
type=click.IntRange(min=1),
help="Number of dask workers.",
)
@click.option(
"--threads_per_dask_worker",
default=2,
default=1,
type=click.IntRange(min=1),
help="Number of threads per each dask worker.",
)
Contributor

Can we remove this option, since Dask is not being used?

@@ -108,20 +110,13 @@ Options:
--server_url TEXT Tritonserver url [required]
--sliding_window INTEGER RANGE Sliding window to be used for model input
request [x>=1]
--input_glob TEXT Input glob pattern to match files to read.
--input_topic TEXT Input Kafka topic for receiving the
Contributor

Could you remove the Dask-related options?

@@ -72,10 +72,12 @@ Run the following from the `examples/ransomware_detection` directory to start th
```bash
python run.py --server_url=localhost:8001 \
--sliding_window=3 \
--model_name=ransomw-model-short-rf \
--model_name=ransomware_model_tl \
Contributor

Could we include some high-level information about the model, including details about the dataset used for training? Additionally, it would be greatly beneficial to explain how to generate the dataset for training the models or specify the required data for running the inference pipeline.

It would also be valuable to create a notebook that demonstrates how to train the model on a sample dataset and run it through the pipeline to showcase its ransomware detection capabilities.

Finally, explaining the output structure generated by the pipeline would greatly improve the comprehensibility of the documentation.

snapshot_ids: typing.List[int],
source_pid_process: str,
snapshot_df: pd.DataFrame):
def _rollover_pending_snapshots(self, source_pid_process: str, snapshots_dict):
Contributor

Could we also consider adding a time duration to retain the pending rollover snapshots in memory? Without this, they will remain in memory waiting for the sequence (in case the missing sequences were never ingested into the pipeline).

Author

Hi @bsuryadevara,
I added a function that checks whether the scan_ids are consecutively ordered, e.g. [1, 2, 3].
I think we don't have pending snapshots in memory (I printed them, and they were always cleaned).

Author

I have seen in the past that old snapshots can cause an anomaly, i.e. if we restart the pipeline while there are snapshots we haven't yet read from Kafka. So it can be problematic.

Labels
improvement (Improvement to existing functionality), non-breaking (Non-breaking change)
Projects
Status: Review - Changes Requested