Error: vector memory exhausted (limit reached?) #51

Open
intheravine opened this issue Dec 14, 2018 · 3 comments

Comments

@intheravine

Error: vector memory exhausted (limit reached?)

I’m getting the above error when trying to stringdist_left_join two tables: the left table has 185K rows and the right table has 4.37M rows. The R session never appears to use more than 6GB of memory (according to Activity Monitor), and I’m on a machine with 32GB of memory that has roughly 10GB available when the vector memory exhausted error arises. I’ve followed various recommendations to increase R_MAX_VSIZE to a large number - 700GB, as shown in the Sys.getenv() output below. All this to say, it appears that stringdist_left_join does not pay attention to R_MAX_VSIZE. Is there some other setting I can change to use more of the available memory on my machine?
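For context, a minimal sketch of the kind of call involved, with hypothetical table and column names (left_tbl, right_tbl, name) and placeholder parameters rather than the actual data from this issue. Note that R_MAX_VSIZE is an environment variable read when R starts, so it is usually set in ~/.Renviron rather than from within the session:

library(fuzzyjoin)

# Set in ~/.Renviron so it takes effect at startup (value is illustrative):
# R_MAX_VSIZE=700GB

joined <- stringdist_left_join(
  left_tbl,         # ~185K rows (hypothetical name)
  right_tbl,        # ~4.37M rows (hypothetical name)
  by = "name",      # hypothetical join column
  max_dist = 2,     # placeholder distance threshold
  method = "lv"     # Levenshtein distance
)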

Sys.getenv()

Apple_PubSub_Socket_Render          /private/tmp/com.apple.launchd.sSrL33I64Z/Render
COLUMNS                             80
COMMAND_MODE                        unix2003
DISPLAY                             /private/tmp/com.apple.launchd.tTt2eLd6xQ/org.macosforge.xquartz:0
DYLD_FALLBACK_LIBRARY_PATH          /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/jre/lib/server
DYLD_LIBRARY_PATH                   /Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/jre/lib/server
EDITOR                              vi
HOME                                /Users/geoffreysnyder
LD_LIBRARY_PATH                     :@JAVA_LD@
LINES                               24
LN_S                                ln -s
LOGNAME                             geoffreysnyder
MAKE                                make
PAGER                               /usr/bin/less
PATH                                /usr/local/bin:/usr/local/mysql/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:~/Library/Python/3.7/bin
PWD                                 /Users/geoffreysnyder/repos/Data_Load/code
R_ARCH                              
R_BROWSER                           /usr/bin/open
R_BZIPCMD                           /usr/bin/bzip2
R_DOC_DIR                           /Library/Frameworks/R.framework/Resources/doc
R_GZIPCMD                           /usr/bin/gzip
R_HOME                              /Library/Frameworks/R.framework/Resources
R_INCLUDE_DIR                       /Library/Frameworks/R.framework/Resources/include
R_LIBS_SITE                         
R_LIBS_USER                         ~/Library/R/3.5/library
R_MAX_VSIZE                         700GB
R_PAPERSIZE                         a4
R_PDFVIEWER                         /usr/bin/open
R_PLATFORM                          x86_64-apple-darwin15.6.0
R_PRINTCMD                          lpr
R_QPDF                              /Library/Frameworks/R.framework/Resources/bin/qpdf
R_RD4PDF                            times,inconsolata,hyper
R_SESSION_TMPDIR                    /var/folders/xw/402kc2hc8xl82d008k8x64f00000gq/T//RtmpJdct7Y
R_SHARE_DIR                         /Library/Frameworks/R.framework/Resources/share
R_SYSTEM_ABI                        osx,gcc,gxx,gfortran,?
R_TEXI2DVICMD                       /usr/local/bin/texi2dvi
R_UNZIPCMD                          /usr/bin/unzip
R_ZIPCMD                            /usr/bin/zip
SECURITYSESSIONID                   186a8
SED                                 /usr/bin/sed
SHELL                               /bin/zsh
SHLVL                               0
SSH_AUTH_SOCK                       /private/tmp/com.apple.launchd.UNOOV1wxev/Listeners
SUBLIMEREPL_AC_IP                   127.0.0.1
SUBLIMEREPL_AC_PORT                 None
TAR                                 /usr/bin/tar
TMPDIR                              /var/folders/xw/402kc2hc8xl82d008k8x64f00000gq/T/
TZ                                  America/Los_Angeles
USER                                geoffreysnyder
XPC_FLAGS                           0x0
XPC_SERVICE_NAME                    0
__CF_USER_TEXT_ENCODING             0x1F7:0x0:0x0

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2  RJDBC_0.2-7.1   rJava_0.9-10    DBI_1.0.0       fuzzyjoin_0.1.4 readr_1.2.0     dplyr_0.7.8    
[8] lubridate_1.7.4 stringr_1.3.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0       tidyr_0.8.2      assertthat_0.2.0 R6_2.3.0         magrittr_1.5     pillar_1.2.3    
 [7] rlang_0.3.0.1    stringi_1.2.4    tools_3.5.1      glue_1.3.0       purrr_0.2.5      hms_0.4.2.9000  
[13] compiler_3.5.1   pkgconfig_2.0.2  bindr_0.1.1      tidyselect_0.2.5 tibble_1.4.2    
@markbneal

An observation from my experience: I was doing a fuzzy join and ran out of RAM, even though the largest dataframe was only 200,000 rows. I subsetted the two dataframes by a common identifier and looped across the list of identifiers, doing the fuzzy join for each subset - this ran very quickly. Maybe someone could check the efficiency of the code on larger examples? I'm assuming that making a reprex for big-data examples is a hassle.
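
A rough sketch of that workaround, assuming hypothetical dataframes df_a and df_b that share an exact-match column group_id and a fuzzy key column name (names and thresholds are placeholders, not taken from the comment above):

library(dplyr)
library(fuzzyjoin)

ids <- intersect(unique(df_a$group_id), unique(df_b$group_id))

pieces <- lapply(ids, function(id) {
  # Only the rows for one identifier are fuzzy-matched at a time,
  # which keeps each intermediate comparison small.
  stringdist_left_join(
    filter(df_a, group_id == id),
    filter(df_b, group_id == id),
    by = "name",
    max_dist = 2
  )
})

result <- bind_rows(pieces)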

@aranryan

aranryan commented Feb 2, 2021

Similar to markbneal above, I was doing my first fuzzy join and ran into a vector memory exhausted error. I was doing it through a purrr::map step, joining a dataframe with about 50,000 rows onto individual rows of a dataframe with 5,000 rows. My solution was to rewrite it as a for loop.
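
A hedged sketch of what that for-loop rewrite might look like, with hypothetical dataframes small_df (~5,000 rows) and big_df (~50,000 rows) and a placeholder key column name:

library(fuzzyjoin)

out <- vector("list", nrow(small_df))

for (i in seq_len(nrow(small_df))) {
  # Join one left-hand row at a time so only a small intermediate
  # result has to fit in memory on each iteration.
  out[[i]] <- stringdist_left_join(
    small_df[i, ],
    big_df,
    by = "name",
    max_dist = 2
  )
}

result <- do.call(rbind, out)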

@Erinaceida

Very similar here: I was doing a fuzzy_join of a 43MB file to a 68KB one, and at its peak R used 12GB of RAM (almost 300 times more than the individual objects!)
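
That kind of blow-up is plausible because a fuzzy join generally has to compare every left key against every right key, so peak memory tracks the number of candidate pairs rather than the input file sizes. A back-of-the-envelope illustration with made-up row counts (not the actual files from this comment):

n_left  <- 200000              # rows in the larger table (illustrative)
n_right <- 1000                # rows in the smaller table (illustrative)
pairs   <- n_left * n_right    # every pair of keys may be compared
pairs * 8 / 1024^3             # ~1.5 GB just for one numeric value per pair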
