
Offline downloads record limit #413

Open

nielsklazenga opened this issue Jun 8, 2021 · 3 comments

Labels
biocache-service Issues related to biocache service
question Further information is requested

Comments

@nielsklazenga

I did a download for the following query, https://biocache-test.ala.org.au/occurrences/search?&q=*&fq=data_resource_uid%3Adr376&disableAllQualityFilters=true, which contained only 500,000 rows.

The same download in the Biocache, https://biocache.ala.org.au/occurrences/search?&q=*&fq=data_resource_uid%3Adr376&disableAllQualityFilters=true, gives me all 994,654 records.

The Biocache Store has slightly more records than LA Pipelines, but not that many more.

I have never done such a big download before, but I can see myself doing bigger downloads in the future. Is the lower record limit on purpose?

@javier-molina javier-molina added the question Further information is requested label Jun 9, 2021
@javier-molina

@nickdos will be able to add more details if needed, but my understanding is that the limit is there for two reasons:

  1. Some users issue a big download without realising whether they actually need that much data.
  2. Big downloads have a performance hit on the system, so it is important to have a guard like this in place to maintain service responsiveness and availability. Our second cut will include improvements in this area: Implement downloads with SOLR streaming web services #367. @nielsklazenga If #367 does not allow us to raise the limit enough to be useful when you need it, the first thing that comes to mind is implementing a power-user role with higher allowances.

@nielsklazenga
Author

@javier-molina, I was just observing the difference with the old system. If the 500,000-record download limit is intended, that is perfectly fine. It might be good to issue a warning and not even start the download if the query yields more than 500,000 records (again, not a show-stopper).

If I am going to need a 500,000+ record download, it is for a very specific thing (all plant records from the VBA) and will not happen more than once a year, so I can make special arrangements. In future, a power user role or something with API keys might be a good idea.
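The warning-before-download suggestion could be sketched as a pre-flight count check: ask the search endpoint for the record total first (requesting zero rows is a common way to get counts from the biocache JSON API) and refuse to start the download when the total exceeds the limit. This is a minimal illustration only; the endpoint path, parameter names, helper names, and the 500,000 limit are assumptions taken from this thread, not a confirmed biocache-service contract.

```python
# Hypothetical pre-flight check for the download limit discussed above.
import urllib.parse

DOWNLOAD_LIMIT = 500_000  # limit observed in this issue; configurable in practice

def build_count_url(base_url: str, q: str, fq: str) -> str:
    """Build a search URL that asks for zero rows, i.e. only the record count."""
    params = {"q": q, "fq": fq, "pageSize": 0}
    return f"{base_url}/occurrences/search?{urllib.parse.urlencode(params)}"

def may_download(total_records: int, limit: int = DOWNLOAD_LIMIT) -> bool:
    """Return True when the query result fits under the download limit."""
    return total_records <= limit
```

Under this sketch, the 994,654-record query from this issue would be rejected up front rather than silently truncated at 500,000.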


nickdos commented Jun 9, 2021

I thought the limit was going to be higher than 500,000. @peggynewman is best placed to advise on this. My understanding is that we want users to be able to download the single largest dataset, but not the "whole ALA". So the limit needs to be something like the eBird or BirdLife number of records.

@javier-molina javier-molina added the post-v1.1 Not required for version 1 or v1.1 release label Jun 10, 2021
@javier-molina javier-molina added biocache-service Issues related to biocache service and removed post-v1.1 Not required for version 1 or v1.1 release labels Sep 16, 2021