[BUG]: main CLI crashes around monitor_stage.py (24.03.01 runtime image) #1626

Closed
2 tasks done
pdmack opened this issue Apr 15, 2024 · 0 comments
pdmack commented Apr 15, 2024

Version

24.03.01

Which installation method(s) does this occur on?

Docker, Kubernetes

Describe the bug.

A test pipeline that has worked with previous releases of Morpheus now crashes, possibly at the monitor stage.

Minimum reproducible example

morpheus --log_level=DEBUG run --num_threads=2 --edge_buffer_size=4 --pipeline_batch_size=8196 --model_max_batch_size=32 --use_cpp=True pipeline-nlp --model_seq_length=128 --labels_file=data/labels_phishing.txt from-file --filename=/common/data/email.jsonlines monitor --description 'FromFile Rate' --smoothing=0.001 deserialize preprocess --vocab_hash_file=data/bert-base-uncased-hash.txt --truncation=True --do_lower_case=True --add_special_tokens=False monitor --description 'Preprocess Rate' inf-triton --model_name=phishing-bert-onnx --server_url=ai-engine:8000 --force_convert_inputs=True monitor --description 'Inference Rate' --smoothing=0.001 --unit inf add-class --label=is_phishing --threshold=0.7 serialize to-file --filename=/common/data/output/phishing-bert-onnx-output.jsonlines --overwrite

Relevant log output

Parameter, 'labels_file', with relative path, 'data/labels_phishing.txt', does not exist. Using package relative location: '/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/data/labels_phishing.txt'
Configuring Pipeline via CLI
Loaded labels file. Current labels: [['not_phishing', 'is_phishing']]
Module 'FileBatcher' was successfully registered with 'morpheus' namespace.
Module 'FileToDF' was successfully registered with 'morpheus' namespace.
Module 'FilterCmFailed' was successfully registered with 'morpheus' namespace.
Module 'FilterControlMessage' was successfully registered with 'morpheus' namespace.
Module 'FilterDetections' was successfully registered with 'morpheus' namespace.
Module 'FromControlMessage' was successfully registered with 'morpheus' namespace.
Module 'MLFlowModelWriter' was successfully registered with 'morpheus' namespace.
Module 'PayloadBatcher' was successfully registered with 'morpheus' namespace.
Module 'Serialize' was successfully registered with 'morpheus' namespace.
Module 'ToControlMessage' was successfully registered with 'morpheus' namespace.
Module 'WriteToElasticsearch' was successfully registered with 'morpheus' namespace.
Module 'WriteToFile' was successfully registered with 'morpheus' namespace.
Module 'deserialize' was successfully registered with 'morpheus' namespace.
Parameter, 'vocab_hash_file', with relative path, 'data/bert-base-uncased-hash.txt', does not exist. Using package relative location: '/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/data/bert-base-uncased-hash.txt'
====Pipeline Pre-build====
====Pre-Building Segment: linear_segment_0====
====Pre-Building Segment Complete!====
====Pipeline Pre-build Complete!====
====Registering Pipeline====
Starting pipeline via CLI... Ctrl+C to Quit
====Building Pipeline====
====Building Pipeline Complete!====
====Registering Pipeline Complete!====
Config:
{
 "ae": null,
 "class_labels": [
   "not_phishing",
   "is_phishing"
 ],
 "debug": false,
 "edge_buffer_size": 4,
 "feature_length": 128,
 "fil": null,
 "log_config_file": null,
 "log_level": 10,
 "mode": "NLP",
 "model_max_batch_size": 32,
 "num_threads": 2,
 "pipeline_batch_size": 8196,
 "plugins": []
}
E20240415 16:16:06.705006    46 builder_definition.cpp:283] Exception during segment initializer. Segment name: linear_segment_0, Segment Rank: 0. Exception message:
RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr<morpheus::MultiResponseMessage>
At:
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
CPP Enabled: True
====Starting Pipeline====
E20240415 16:16:06.706972    46 service.cpp:40] Must call Service::call_in_destructor to ensure service is cleaned up before being destroyed
E20240415 16:16:06.707026    46 controller.cpp:62] exception caught while performing update - this is fatal - issuing kill
====Pipeline Started====
====Building Segment: linear_segment_0====
E20240415 16:16:06.707924    46 context.cpp:124] rank: 0; size: 1; tid: 140427135669824; fid: 0x7fb7b8040f00: set_exception issued; issuing kill to current runnable. Exception msg: RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr<morpheus::MultiResponseMessage>
At:
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
E20240415 16:16:06.707991    46 manager.cpp:87] error detected on controller
E20240415 16:16:06.708143    39 runner.cpp:189] Runner::await_join - an exception was caught while awaiting on one or more contexts/instances - rethrowing
Added source: <from-file-0; FileSourceStage(filename=/common/data/email.jsonlines, iterative=False, file_type=FileTypes.Auto, repeat=1, filter_null=True, parser_kwargs={})>
 └─> morpheus.MessageMeta
E20240415 16:16:06.708204    39 service.cpp:224] Service[pipeline::Manager]: caught exception in service_await_join: RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr<morpheus::MultiResponseMessage>
At:
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
E20240415 16:16:06.708288    39 service.cpp:224] Service[ExecutorDefinition]: caught exception in service_await_join: RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr<morpheus::MultiResponseMessage>
At:
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
 /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
Added stage: <monitor-1; MonitorStage(description=FromFile Rate, smoothing=0.001, unit=messages, delayed_start=False, determine_count_fn=None, log_level=LogLevels.INFO)>
 └─ morpheus.MessageMeta -> morpheus.MessageMeta
Module 'deserialize' with namespace 'morpheus' is successfully loaded.
Added stage: <deserialize-2; DeserializeStage(ensure_sliceable_index=True, message_type=<class 'morpheus.messages.multi_message.MultiMessage'>, task_type=None, task_payload=None)>
 └─ morpheus.MessageMeta -> morpheus.MultiMessage
Added stage: <preprocess-nlp-3; PreprocessNLPStage(vocab_hash_file=/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/data/bert-base-uncased-hash.txt, truncation=True, do_lower_case=True, add_special_tokens=False, stride=-1, column=data)>
 └─ morpheus.MultiMessage -> morpheus.MultiInferenceMessage
Added stage: <monitor-4; MonitorStage(description=Preprocess Rate, smoothing=0.05, unit=messages, delayed_start=False, determine_count_fn=None, log_level=LogLevels.INFO)>
 └─ morpheus.MultiInferenceMessage -> morpheus.MultiInferenceMessage
Added stage: <inference-5; TritonInferenceStage(model_name=phishing-bert-onnx, server_url=ai-engine:8000, force_convert_inputs=True, use_shared_memory=False, needs_logits=None, inout_mapping={}, input_mapping={}, output_mapping={})>
 └─ morpheus.MultiInferenceMessage -> morpheus.MultiResponseMessage
Exception occurred in pipeline. Rethrowing
Traceback (most recent call last):
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 405, in post_start
   await executor.join_async()
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 317, in inner_build
   stage.build(builder)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
   dep.build(builder, do_propagate=do_propagate)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
   dep.build(builder, do_propagate=do_propagate)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
   dep.build(builder, do_propagate=do_propagate)
 [Previous line repeated 3 more times]
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 391, in build
   out_ports_nodes = self._build(builder=builder, input_nodes=in_ports_nodes)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py", line 81, in _build
   return [self._build_single(builder, input_nodes[0])]
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py", line 131, in _build_single
   builder.make_edge(input_node, node)
RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr<morpheus::MultiResponseMessage>
Traceback (most recent call last):
 File "/opt/conda/envs/morpheus/bin/morpheus", line 11, in <module>
   sys.exit(run_cli())
====Pipeline Complete====
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/cli/run.py", line 20, in run_cli
   cli(obj={}, auto_envvar_prefix='MORPHEUS', show_default=True, prog_name="morpheus")
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
   return self.main(*args, **kwargs)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1078, in main
   rv = self.invoke(ctx)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
   return _process_result(sub_ctx.command.invoke(sub_ctx))
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
   return _process_result(sub_ctx.command.invoke(sub_ctx))
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1720, in invoke
   return _process_result(rv)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1657, in _process_result
   value = ctx.invoke(self._result_callback, value, **ctx.params)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 783, in invoke
   return __callback(*args, **kwargs)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
   return f(get_current_context(), *args, **kwargs)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/cli/commands.py", line 644, in post_pipeline
   pipeline.run()
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 651, in run
   asyncio.run(self.run_async())
 File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/runners.py", line 44, in run
   return loop.run_until_complete(main)
 File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
   return future.result()
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 632, in run_async
   await self.join()
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 449, in join
   await self._post_start_future
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 405, in post_start
   await executor.join_async()
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 317, in inner_build
   stage.build(builder)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
   dep.build(builder, do_propagate=do_propagate)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
   dep.build(builder, do_propagate=do_propagate)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
   dep.build(builder, do_propagate=do_propagate)
 [Previous line repeated 3 more times]
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 391, in build
   out_ports_nodes = self._build(builder=builder, input_nodes=in_ports_nodes)
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py", line 81, in _build
   return [self._build_single(builder, input_nodes[0])]
 File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py", line 131, in _build_single
   builder.make_edge(input_node, node)
RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr<morpheus::MultiResponseMessage>

Full env printout

[Paste the results of print_env.sh here, it will be hidden by default]

Other/Misc.

@drobison00 commented:

The error is caused by trying to create an edge between the generic Python wrapper type that Morpheus uses (mrc::pymrc::PyObjectHolder) and a C++ shared pointer (std::shared_ptr<morpheus::MultiResponseMessage>). If this is really the conversion we want, you can add a C++ declaration in messages/module.cpp.
Something like this, but with MultiResponseMessage in place of MessageMeta:

    mrc::edge::EdgeConnector<std::shared_ptr<morpheus::MessageMeta>, mrc::pymrc::PyObjectHolder>::register_converter();
    mrc::edge::EdgeConnector<mrc::pymrc::PyObjectHolder, std::shared_ptr<morpheus::MessageMeta>>::register_converter();
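
To illustrate why the missing declaration produces this particular error, here is a toy Python model of a converter table keyed by (source type, sink type). It is only a sketch of the idea, not MRC's implementation: the class `EdgeConverterRegistry`, its methods, and the string type names are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[object], object]


class EdgeConverterRegistry:
    """Toy stand-in for a typed edge-converter table (hypothetical, not MRC's API)."""

    def __init__(self) -> None:
        self._converters: Dict[Tuple[str, str], Converter] = {}

    def register_converter(self, source: str, sink: str,
                           fn: Converter = lambda value: value) -> None:
        # Record that values of type `source` may flow into a `sink` port.
        self._converters[(source, sink)] = fn

    def make_edge(self, source: str, sink: str) -> Converter:
        # Identical types connect directly; otherwise a converter must exist.
        if source == sink:
            return lambda value: value
        try:
            return self._converters[(source, sink)]
        except KeyError:
            raise RuntimeError(
                f"No conversion found from {source} to {sink}") from None


# Hypothetical type names standing in for the C++ types in the log.
PY = "PyObjectHolder"
META = "shared_ptr<MessageMeta>"
RESP = "shared_ptr<MultiResponseMessage>"

registry = EdgeConverterRegistry()

# Only the MessageMeta pair is registered, mirroring the snippet above.
registry.register_converter(META, PY)
registry.register_converter(PY, META)

try:
    registry.make_edge(PY, RESP)
    edge_error = None
except RuntimeError as exc:
    edge_error = str(exc)

# Registering both directions for the response type lets the edge be built.
registry.register_converter(PY, RESP)
registry.register_converter(RESP, PY)
fixed_edge = registry.make_edge(PY, RESP)
```

In this model the first `make_edge` call fails in the same way as the monitor stage in the log, and it succeeds once the pair for the response type has been registered.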

Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
pdmack added the bug label on Apr 15, 2024
dagardner-nv self-assigned this on Apr 15, 2024
rapids-bot bot pushed a commit that referenced this issue Apr 24, 2024
* PR #659 inadvertently excluded the monitor stage from several of the end-to-end pipeline tests.
* Adds an environment variable `MORPHEUS_MONITOR_ALWAYS_ENABLED` which, when set, forces the monitor stage to always be enabled.
* Adds an auto-use fixture `monitor_stage_always_enabled` which ensures the environment variable is set.
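
As a rough sketch of how such an override could work, the snippet below uses only the standard library. The helper `monitor_enabled`, its `otherwise_enabled` parameter, the value `"1"`, and the rule that any set value counts as enabled are assumptions for illustration; only the variable name and the fixture name come from the description above.

```python
import os
from contextlib import contextmanager
from unittest import mock

ENV_VAR = "MORPHEUS_MONITOR_ALWAYS_ENABLED"


def monitor_enabled(otherwise_enabled: bool, environ=None) -> bool:
    """Hypothetical check: the variable, when set, overrides the normal decision."""
    environ = os.environ if environ is None else environ
    return otherwise_enabled or ENV_VAR in environ


@contextmanager
def monitor_stage_always_enabled(value: str = "1"):
    """Set the variable for the duration of a block, then restore the environment."""
    with mock.patch.dict(os.environ, {ENV_VAR: value}):
        yield


# Without the variable, the (hypothetical) normal decision is used.
disabled_without_override = monitor_enabled(False, environ={})

# Inside the block, the monitor stage is forced on.
with monitor_stage_always_enabled():
    enabled_with_override = monitor_enabled(False)
```

In a pytest suite, the same body could be wrapped in a fixture declared with `@pytest.fixture(autouse=True)` so that every test runs with the variable set.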

Requires nv-morpheus/MRC#473 to be merged first

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Christopher Harris (https://github.com/cwharris)
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1629