
Deploying Spark fails with insufficient max JVM memory #981

Open

jakubadamek opened this issue Mar 8, 2024 · 2 comments

@jakubadamek (Collaborator)

I was testing a simple developer setup for pipelines:

# Clone the repo and make the DWH directory accessible.
cd ~/gits
git clone https://github.com/google/fhir-data-pipes.git
cd fhir-data-pipes/
chmod a+rx docker/dwh

# Bring up the HAPI FHIR source server and verify it responds.
cd docker/
docker network create cloudbuild
docker-compose -f hapi-compose.yml up --force-recreate -d
curl -H "Content-Type: application/json; charset=utf-8" 'http://localhost:8091/fhir/Patient' -v

# Upload the small Synthea sample data set to the HAPI server.
cd ..
python3 -m venv venv
venv/bin/pip install -r ./synthea-hiv/uploader/requirements.txt
venv/bin/python3 synthea-hiv/uploader/main.py HAPI http://localhost:8091/fhir --input_dir ./synthea-hiv/sample_data_small/

This finished with some errors, which I ignored: https://paste.googleplex.com/5983969268465664

docker-compose -f docker/compose-controller-spark-sql-single.yaml up --force-recreate

This failed with an error due to insufficient JVM max memory: https://paste.googleplex.com/4827520718864384

I fixed this by editing docker/.env:
JAVA_OPTS=-Xms10g -Xmx10g

Should we make this change permanent?

@bashir2 (Collaborator)

bashir2 commented Mar 8, 2024

Yes, this is a known issue, but we prefer not to increase the default memory significantly. To see the reasoning, please take a look at where JAVA_OPTS is set here and the comments above it. So I am guessing that you are running this on a machine with a lot of cores (how many?). We cannot change the JVM memory configuration in the controller code, but we can check in advance and fail early if we know that the large number of threads is going to fail the pipeline later on.
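A minimal sketch of what that early check could look like; the method name, the per-thread heap estimate, and the error message are assumptions for illustration, not the actual fhir-data-pipes controller code:

// Hypothetical early check: fail fast if the configured max heap is unlikely
// to support the requested number of worker threads.
static void failEarlyIfHeapTooSmall(int numThreads) {
  // Assumed rough per-thread heap budget; the real value would need to be measured.
  final long estimatedBytesPerThread = 512L * 1024 * 1024;
  long maxHeap = Runtime.getRuntime().maxMemory();
  long required = (long) numThreads * estimatedBytesPerThread;
  if (maxHeap < required) {
    throw new IllegalStateException(
        String.format(
            "Max JVM heap %d MB is below the ~%d MB estimated for %d threads; "
                + "increase -Xmx in JAVA_OPTS or lower numThreads.",
            maxHeap >> 20, required >> 20, numThreads));
  }
}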

Another option is to limit the number of threads dynamically, i.e., even if numThreads is set to use all cores (e.g., here), we can override it and set it to a smaller number for which the provided memory is enough. That way the pipeline won't fail but may suffer performance-wise; @chandrashekar-s WDYT about this capping change?

@chandrashekar-s (Collaborator)

chandrashekar-s commented Mar 11, 2024


Dynamically capping the number of threads is a good idea when the configured memory is insufficient and numThreads is set to use all cores (or some other optimistic value). That way we don't fail the application, but we should also warn the user about the capping in the logs.
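A rough sketch of that capping-with-a-warning idea, reusing the same assumed per-thread estimate as above; the method name and the use of an SLF4J logger are assumptions, not the controller's actual API:

import org.slf4j.Logger;

// Hypothetical capping logic: shrink numThreads to fit the available heap and
// log a warning instead of failing the pipeline.
static int capThreadsToHeap(int requestedThreads, Logger logger) {
  final long estimatedBytesPerThread = 512L * 1024 * 1024;  // assumed budget
  long maxHeap = Runtime.getRuntime().maxMemory();
  int affordableThreads = (int) Math.max(1, maxHeap / estimatedBytesPerThread);
  if (affordableThreads < requestedThreads) {
    logger.warn(
        "Capping numThreads from {} to {} because max heap is only {} MB; "
            + "increase -Xmx in JAVA_OPTS to use more threads.",
        requestedThreads, affordableThreads, maxHeap >> 20);
    return affordableThreads;
  }
  return requestedThreads;
}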
