
macOS in parallel mode: log messages in child processes swallowed #67

Open
dongbohu opened this issue Feb 24, 2020 · 2 comments

dongbohu commented Feb 24, 2020

On a macOS test box (Mojave, 10.14.6, running Python 3.8.1), when parallel data processing is enabled with the -parallel n option (where n > 0), most of the log messages generated by the child processes disappear from the main console.

For example, when running the test case in examples/gwas/gwas.yml, here is the output generated by cimr processor -process -yaml-file gwas.yml -chunksize 500 -parallel 4:

[2020-02-24 17:00:03 root:INFO    ]  processing .../cimr/examples/gwas/gwas.yml
[2020-02-24 17:00:03 root:INFO    ]  file exists in submitted_data/gwas/
[2020-02-24 17:00:03 root:INFO    ]  md5sum verified.
[2020-02-24 17:00:03 root:INFO    ]  following columns will be renamed: 
[2020-02-24 17:00:03 root:INFO    ]  dict_keys(['panel_variant_id', 'chromosome', 'position', 'variant_id', 'effect_allele', 'non_effect_allele', 'effect_size', 'standard_error', 'zscore', 'pvalue', 'imputation_status'])
[2020-02-24 17:00:03 root:INFO    ]  making dir: processed_data/gwas/
[2020-02-24 17:00:03 root:INFO    ]  found submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:03 root:INFO    ]  loading submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:03 root:INFO    ]  chunksize: 0.0005 million
[2020-02-24 17:00:03 root:INFO    ]  number of parallel processes: 4
WARNING:root: chunk #2: chromosome id needs to be checked.
WARNING:root: chunk #1: chromosome id needs to be checked.
WARNING:root: chunk #3: chromosome id needs to be checked.
WARNING:root: chunk #4: chromosome id needs to be checked.
WARNING:root: chunk #5: chromosome id needs to be checked.
[2020-02-24 17:00:04 root:INFO    ]  combine chunk output files ...
[2020-02-24 17:00:04 root:INFO    ]  finished processing submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:04 root:INFO    ]  output has been saved as processed_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:04 root:INFO    ]  ------------------------------------------------------------

[2020-02-24 17:00:04 root:INFO    ]  reading a local catalog file catalog.txt.
[2020-02-24 17:00:04 root:INFO    ]  context_id is not provided.
[2020-02-24 17:00:04 root:INFO    ]  updating catalog.txt for submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:00:04 root:INFO    ]  ------------------------------------------------------------

Only these five log lines from the child processes appear in the output above:

WARNING:root: chunk #2: chromosome id needs to be checked.
WARNING:root: chunk #1: chromosome id needs to be checked.
WARNING:root: chunk #3: chromosome id needs to be checked.
WARNING:root: chunk #4: chromosome id needs to be checked.
WARNING:root: chunk #5: chromosome id needs to be checked.

But if we rerun the command in non-parallel mode:
cimr processor -process -yaml-file gwas.yml -chunksize 500 -parallel 0
then the log output becomes:

[2020-02-24 17:02:40 root:INFO    ]  processing .../cimr/examples/gwas/gwas.yml
[2020-02-24 17:02:40 root:INFO    ]  file exists in submitted_data/gwas/
[2020-02-24 17:02:40 root:INFO    ]  md5sum verified.
[2020-02-24 17:02:40 root:INFO    ]  following columns will be renamed: 
[2020-02-24 17:02:40 root:INFO    ]  dict_keys(['panel_variant_id', 'chromosome', 'position', 'variant_id', 'effect_allele', 'non_effect_allele', 'effect_size', 'standard_error', 'zscore', 'pvalue', 'imputation_status'])
[2020-02-24 17:02:40 root:INFO    ]  making dir: processed_data/gwas/
[2020-02-24 17:02:40 root:INFO    ]  found submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  loading submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  chunksize: 0.0005 million
[2020-02-24 17:02:40 root:INFO    ]  number of parallel processes: 0
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: chromosome(s) included: ['chr1' 'chr10' 'chr11' 'chr12' 'chr13']
[2020-02-24 17:02:40 root:WARNING ]  chunk #1: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 426
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #1: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: chromosome(s) included: ['chr14' 'chr15' 'chr16' 'chr17' 'chr18']
[2020-02-24 17:02:40 root:WARNING ]  chunk #2: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 434
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #2: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: chromosome(s) included: ['chr19' 'chr2' 'chr20' 'chr21' 'chr22']
[2020-02-24 17:02:40 root:WARNING ]  chunk #3: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 441
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #3: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: chromosome(s) included: ['chr3' 'chr4' 'chr5' 'chr6' 'chr7']
[2020-02-24 17:02:40 root:WARNING ]  chunk #4: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 500
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 428
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #4: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: start processing ...
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: renaming columns based on column dictionary.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: chromosome(s) included: ['chr8' 'chr9']
[2020-02-24 17:02:40 root:WARNING ]  chunk #5: chromosome id needs to be checked.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: finished checking chromosome ids.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: variant_id has been standardized.
[2020-02-24 17:02:40 root:INFO    ]  variant counts before filtering palindromic variants: 200
[2020-02-24 17:02:40 root:INFO    ]  variant counts after filtering palindromic variants: 167
[2020-02-24 17:02:40 root:INFO    ]  data contains z-score column.
[2020-02-24 17:02:40 root:INFO    ]  pvalue is numeric.
[2020-02-24 17:02:40 root:INFO    ]  zscore is numeric.
[2020-02-24 17:02:40 root:INFO    ]  effect_size is numeric.
[2020-02-24 17:02:40 root:INFO    ]  standard_error is numeric.
[2020-02-24 17:02:40 root:INFO    ]  pvalue only contains values between 0 and 1.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: dropping duplicate columns.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: reordering processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: writing processed data.
[2020-02-24 17:02:40 root:INFO    ]  chunk #5: ---------------------- DONE ----------------------
[2020-02-24 17:02:40 root:INFO    ]  finished processing submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  output has been saved as processed_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  ------------------------------------------------------------

[2020-02-24 17:02:40 root:INFO    ]  reading a local catalog file catalog.txt.
[2020-02-24 17:02:40 root:INFO    ]  context_id is not provided.
[2020-02-24 17:02:40 root:INFO    ]  updating catalog.txt for submitted_data/gwas/gwas.txt.gz.
[2020-02-24 17:02:40 root:INFO    ]  ------------------------------------------------------------

All lines from:

[2020-02-24 17:02:40 root:INFO    ]  chunk #1: start processing ...

until:

[2020-02-24 17:02:40 root:INFO    ]  finished processing submitted_data/gwas/gwas.txt.gz

are produced by the per-chunk processing that runs in the child processes in parallel mode; these are the messages that get swallowed.

This issue doesn't happen on Linux boxes (Ubuntu and Debian, Python 3.6.x, 3.7.x, and 3.8.1).
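
The symptom can also be reproduced outside cimr with a minimal script (the worker function below is only illustrative, not cimr code):

import logging
import multiprocessing as mp

def worker(i):
    # Mimics the per-chunk logging calls: one INFO line and one WARNING line.
    logging.info('chunk #%d: start processing ...', i)
    logging.warning('chunk #%d: chromosome id needs to be checked.', i)

if __name__ == '__main__':
    logging.basicConfig(
        level=logging.INFO,
        format='[%(asctime)s %(name)s:%(levelname)-8s]  %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
    )
    with mp.Pool(4) as pool:
        pool.map(worker, range(1, 6))

On macOS with Python 3.8, the workers' INFO lines should be swallowed and only bare WARNING:root: lines printed; on Linux, all lines appear with the configured format.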

@dongbohu (Contributor, Author) commented:

This article from the Python Logging Cookbook discusses logging when using the multiprocessing module: https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes
It makes sense that logging in a child process is independent of logging in the parent process, but why does this only happen on macOS?
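
A likely explanation (an assumption, not yet verified against the cimr code): Python 3.8 changed the default multiprocessing start method on macOS from fork to spawn, while Linux still defaults to fork. With spawn, each worker starts in a fresh interpreter whose root logger has no handlers and sits at the default WARNING level, so INFO messages are dropped and warnings fall back to the implicit basicConfig() format WARNING:root:..., which matches the output above. If that is the cause, one possible workaround is to (re)configure logging in every worker through a Pool initializer; the sketch below uses hypothetical names (setup_logging, process_chunk), not existing cimr functions:

import logging
import multiprocessing as mp

def setup_logging(level=logging.INFO):
    # Runs in the parent and once in every worker, so it covers both the
    # "fork" and "spawn" start methods; basicConfig() is a no-op in a
    # process that already inherited handlers via fork.
    logging.basicConfig(
        level=level,
        format='[%(asctime)s %(name)s:%(levelname)-8s]  %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
    )

def process_chunk(chunk_id):
    logging.info('chunk #%d: start processing ...', chunk_id)
    # ... the actual per-chunk work would go here ...

if __name__ == '__main__':
    setup_logging()
    with mp.Pool(processes=4, initializer=setup_logging) as pool:
        pool.map(process_chunk, range(1, 6))

The cookbook's QueueHandler/QueueListener recipe is a more general alternative that routes all records through the parent's handlers. Forcing multiprocessing.set_start_method('fork') on macOS would also restore the old behavior, but fork is considered unsafe with some macOS system libraries, which is why the default was changed in 3.8.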

@dongbohu changed the title from "Log messages garbled on macOS in parallel mode" to "macOS in parallel mode: log messages in child processes swallowed" on Feb 25, 2020
@dongbohu (Contributor, Author) commented:

This issue is also confirmed on macOS 10.13.6 (High Sierra) running Python 3.8.1.
