
New Turbinia LLM analyzer, LLM lib interface and LLM lib implementation for VertexAI #1441

Merged: 21 commits into google:master on Feb 29, 2024

Conversation

@sa3eed3ed (Contributor) commented on Feb 21, 2024

New Turbinia LLM analyzer, LLM lib interface and LLM lib implementation for VertexAI

Please assign to @hacktobeer for review; he is aware of this work.

  • New LLM lib interface (a hedged sketch follows this list)
  • LLM lib for Vertex AI (using the Gemini 1.0 Pro model)
  • Interface can be extended or implemented for other LLM providers
  • New configs for Vertex AI
  • LLM_PROVIDER config value can be used to choose the LLM provider (currently only Vertex AI)
  • New Job to analyze history, log and config files using an LLM
  • New evidence type (ExportedFileArtifactLLM) for FileArtifactExtractionTask to avoid redundant processing of artifacts between the LLM analyzer and other analyzers using the same artifacts
  • Files to analyze are extracted using FileArtifactExtractionTask, i.e. all artifacts supported by image_exporter.py are supported
  • Tested end to end using evidence/artifact_disk.dd
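
To make the interface/provider split above concrete, here is a minimal sketch; the class and method names below are illustrative assumptions, not the PR's actual code (the real implementation lives in turbinia/lib/llm_libs/llm_lib_base.py and turbinia/lib/llm_libs/vertex_ai_lib.py, reviewed further down):

    # A sketch only: class and method names are assumptions, not the PR's code.
    from abc import ABC, abstractmethod

    class LLMLibBase(ABC):
      """Base interface each LLM provider library implements."""

      @abstractmethod
      def prompt(self, prompt_text: str) -> str:
        """Sends a prompt to the provider and returns the model response."""

    class VertexAILib(LLMLibBase):
      """Vertex AI implementation backed by the Gemini 1.0 Pro model."""

      def prompt(self, prompt_text: str) -> str:
        # Illustrative Vertex AI SDK usage; real code would first call
        # vertexai.init(project=..., location=...) and may tune parameters.
        from vertexai.generative_models import GenerativeModel
        model = GenerativeModel('gemini-1.0-pro')
        return model.generate_content(prompt_text).text

    def get_llm_lib(provider: str) -> LLMLibBase:
      """Maps the LLM_PROVIDER config value to a concrete library."""
      if provider == 'vertexai':
        return VertexAILib()
      raise ValueError(f'Unsupported LLM provider: {provider}')

The point of the design is that provider-specific SDK calls stay behind one interface, so supporting another provider means one new subclass plus a new LLM_PROVIDER value.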

@hacktobeer self-requested a review on February 21, 2024 at 17:53

@hacktobeer (Collaborator)

Excellent @sa3eed3ed - I have assigned myself and will review before EOW.

@hacktobeer (Collaborator) left a review

Initial pass. PTAL.

Review comments were left on:

  • pyproject.toml (outdated)
  • turbinia/jobs/llm_artifacts_analyzer.py
  • turbinia/lib/llm_libs/llm_lib_base.py (outdated)
  • turbinia/lib/llm_libs/vertex_ai_lib.py
@jleaniz (Collaborator) commented on Feb 23, 2024

Drive-by comment: could we specify a minimum version of the new dependencies in pyproject.toml with "^x.y.z" instead of "*"? That way we are less likely to run into dependency breakages down the line. There's also an open PR that will remove most GCP library dependencies from the Turbinia code base. From what I can tell, the vertexAI package only depends on google-api-core which would be kept anyway so it's not a problem.

@sa3eed3ed (Contributor, Author)

> Drive-by comment: could we specify a minimum version of the new dependencies in pyproject.toml with "^x.y.z" instead of "*"? That way we are less likely to run into dependency breakages down the line. There's also an open PR that will remove most GCP library dependencies from the Turbinia code base. From what I can tell, the vertexAI package only depends on google-api-core which would be kept anyway so it's not a problem.

Done, added a version. I thought even if google-api-core is removed from pyproject.toml, the poetry.lock file would still have all the deps needed by the vertexAI package.

@jleaniz (Collaborator) commented on Feb 23, 2024

> Drive-by comment: could we specify a minimum version of the new dependencies in pyproject.toml with "^x.y.z" instead of "*"? That way we are less likely to run into dependency breakages down the line. There's also an open PR that will remove most GCP library dependencies from the Turbinia code base. From what I can tell, the vertexAI package only depends on google-api-core which would be kept anyway so it's not a problem.

> Done, added a version. I thought even if google-api-core is removed from pyproject.toml, the poetry.lock file would still have all the deps needed by the vertexAI package.

Yes, it will have the dependencies. My point was just to add a version, nothing else is needed. :) The core lib is included in libcloudforensics' dependencies as well, which is already in the toml file.

@hacktobeer (Collaborator)

Thanks @sa3eed3ed. I have reviewed and tested the PR; it looks pretty cool, and I'm looking forward to getting more real-life results! I have no other review comments.
Example output for others following along:

* LLMAnalyzerTask (/evidence/002ef2465f6b46c1a63d2ad93c783a02/1708894370-9c15b072c9bd49f8b5e13fd04b4fbcad-FileArtifactExtractionTask/export/etc/redis/redis.conf): **Summary:** Redis configuration file contains default bind address of "0.0.0.0", allowing remote clients to connect without authentication.

* LLMAnalyzerTask (/evidence/002ef2465f6b46c1a63d2ad93c783a02/1708894292-8b20bc2016f14f16b5d5bbd8ee39b278-FileArtifactExtractionTask/export/home/dummyuser/.jupyter/jupyter_notebook_config.py): **Summary:** Jupyter Notebook server is exposed to the internet with weak security settings, allowing unauthorized access, remote code execution, and potential compromise of sensitive data.

* LLMAnalyzerTask (/evidence/002ef2465f6b46c1a63d2ad93c783a02/1708894416-d4f26a75a3124996bf90723c51c501a3-FileArtifactExtractionTask/export/etc/ssh/sshd_config): **SSH configuration allows weak ciphers, root login, password authentication, and empty passwords, posing a high security risk.**

@hacktobeer (Collaborator)

@aarontp - before I merge this can I get your opinion on the inclusion of this analyser in all triage recipes?

@hacktobeer (Collaborator) left a review

LGTM

@hacktobeer (Collaborator)

For future ideas regarding this analyser:

  • bundling output reports (this is more generic and applies to other analysers as well, e.g. in case we get disk images from GKE nodes with tons of containers)
  • adding/removing the analyser from any triage recipe depending on real-world output results
  • making module configuration parameters configurable in e.g. recipes

@aarontp (Member) left a review

Cool analysis task! I just left a drive-by comment about potentially consolidating at least the extraction tasks.

Review comments were left on:

  • turbinia/workers/analysis/llm_analyzer.py
  • turbinia/jobs/llm_artifacts_analyzer.py (outdated)
@aarontp (Member) commented on Feb 27, 2024

> @aarontp - before I merge this can I get your opinion on the inclusion of this analyser in all triage recipes?

Do we have any data about how long it takes to run on a typical input disk? Assuming it doesn't take too long to run, generally I would say it makes sense to include it anywhere we include the other analysis tasks. At the moment those are not in the triage recipes as defined by the triage-* recipes here: https://github.com/google/turbinia/tree/master/turbinia/config/recipes, but we do have them in the disk-related dftimewolf recipes, so we could include it in the Turbinia recipes used by those. (I can't remember if those disk-related dftimewolf recipes are currently just using the default recipe or if there is a dedicated recipe, but we do have a goal of making every dftimewolf recipe use a corresponding Turbinia recipe this year.)

@berggren (Collaborator) left a review

Drive-by comment, sorry for freelancing :)

Review comments were left on:

  • poetry.lock (outdated)
  • turbinia/lib/llm_libs/llm_client.py (outdated)
  • turbinia/workers/analysis/llm_analyzer.py (outdated)
  • turbinia/workers/analysis/llm_analyzer.py
@hacktobeer (Collaborator)

> @aarontp - before I merge this can I get your opinion on the inclusion of this analyser in all triage recipes?

> Do we have any data about how long it takes to run on a typical input disk? Assuming it doesn't take too long to run, generally I would say it makes sense to include it anywhere we are including the other analysis tasks, which at the moment are not in the triage recipes as defined by the triage-* recipes ....

It's fast, faster than plaso. FileExtraction is fast as the artifact definitions are pretty specific, and VertexAI calling is fast as well. It will be done faster than the plaso task that is run in parallel.

@sa3eed3ed (Contributor, Author)

> @aarontp - before I merge this can I get your opinion on the inclusion of this analyser in all triage recipes?

> Do we have any data about how long it takes to run on a typical input disk? Assuming it doesn't take too long to run, generally I would say it makes sense to include it anywhere we are including the other analysis tasks, which at the moment are not in the triage recipes as defined by the triage-* recipes ....

> It's fast, faster than plaso. FileExtraction is fast as the artifact definitions are pretty specific, and VertexAI calling is fast as well. It will be done faster than the plaso task that is run in parallel.

Removed from the triage recipes.

@hacktobeer self-requested a review on February 29, 2024 at 10:01

@hacktobeer (Collaborator)

Ran local tests and it looks good. One final nit:
Can you add the below to the configuration template, turbinia/config/turbinia_config_tmpl.py?

}, {
    'job': 'LLMAnalysisJob',
    'programs': [],
    'docker_image': None,
    'timeout': 600
}, {
    'job': 'LLMArtifactsExtractionJob',
    'programs': [],
    'docker_image': None,
    'timeout': 600

After that I'll do a final check if the e2e tests run fine and will approve/merge
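
For anyone wondering where that fragment slots in: Turbinia's configuration template keeps per-job settings in a DEPENDENCIES list, so the entries above become two more dicts in that list. A hedged sketch, with the surrounding entries elided:

    # Sketch of the relevant part of turbinia/config/turbinia_config_tmpl.py;
    # the existing job entries before and after are elided.
    DEPENDENCIES = [
        # ... existing job entries ...
        {
            'job': 'LLMAnalysisJob',
            'programs': [],
            'docker_image': None,
            'timeout': 600
        },
        {
            'job': 'LLMArtifactsExtractionJob',
            'programs': [],
            'docker_image': None,
            'timeout': 600
        },
    ]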

@sa3eed3ed (Contributor, Author)

> Ran local tests and it looks good. One final nit: can you add the below to the configuration template, turbinia/config/turbinia_config_tmpl.py? [...]

Done. I made the timeout 3600, matching the default:

timeout_default = 3600

I don't expect it to take an hour, and there seem to be many other jobs with longer timeouts, but if you think this might be problematic feel free to amend.

@hacktobeer (Collaborator)

The local e2e tests (with the API key added) ran fine. I am going to approve and merge; we can tune based on real-world usage results.
@sa3eed3ed Thank you very much for this awesome contribution. I am looking forward to tuning this based on the results!

@hacktobeer merged commit dbfe4cb into google:master on Feb 29, 2024
5 checks passed
jleaniz pushed a commit to jleaniz/turbinia that referenced this pull request on Mar 18, 2024:
…on for VertexAI (google#1441)