RecordDeletor accepts HTML files as key sources #243

Open
ansell opened this issue Apr 11, 2018 · 0 comments

ansell commented Apr 11, 2018

Yesterday, RecordDeletor accepted an HTML file as its key source because the input source was misspecified. As a result, 334 records were erroneously deleted, and there is no log indicating which records were affected.

Given the disastrous results of using an invalid key file, RecordDeletor must be fixed to perform a sanity check on the complete file before processing any delete instructions.
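A minimal sketch of what such a check could look like, written in Python for brevity rather than in the project's own language. It assumes every non-blank line of the key file is a single canonical UUID (an assumption based on the OccurrenceDAO warning in the log, which refers to a uuid); `validate_key_lines` and `InvalidKeyFileError` are hypothetical names, not part of RecordDeletor.

```python
import re
from typing import Iterable, List

# Assumption: record keys are canonical, hyphenated UUIDs.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


class InvalidKeyFileError(ValueError):
    """Raised when a key file contains anything other than valid keys."""


def validate_key_lines(lines: Iterable[str]) -> List[str]:
    """Check the whole file and return its keys, or raise on the first bad line.

    Nothing is returned until every line has passed, so a caller
    cannot start deleting from a partially valid file.
    """
    keys = []
    for line_number, raw in enumerate(lines, start=1):
        candidate = raw.strip()
        if not candidate:
            continue  # tolerate blank lines such as a trailing newline
        if UUID_PATTERN.fullmatch(candidate) is None:
            # Truncate the offending text so an HTML page does not flood the log.
            raise InvalidKeyFileError(
                f"line {line_number} is not a valid key: {candidate[:60]!r}"
            )
        keys.append(candidate)
    if not keys:
        raise InvalidKeyFileError("key file contains no keys")
    return keys
```

With a check of this shape, the downloaded Dropbox page would be rejected at its first line (`<!DOCTYPE html>...`) before any delete is issued. Writing the validated key list to the log before deleting would also leave a record of which records were affected.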

2018-04-11 15:48:03,915 INFO : [RecordDeletor] - Using file name: https://www.dropbox.com/s/redacted/redacted.csv?dl=0
2018-04-11 15:48:03,916 INFO : [RecordDeletor] - downloading remote file.. https://www.dropbox.com/s/redacted/redacted.csv?dl=0
2018-04-11 15:48:05,544 INFO : [RecordDeletor] - Creating file: /data/tmp/delete_row_key_file.csv
2018-04-11 15:48:06,558 INFO : [RecordDeletor] - Deleting ID : <!DOCTYPE html><html xml:lang="en" class="maestro" xmlns="http://www.w3.org/1999/xhtml"><head><script nonce="6btDux7OucsAG9RXvyNs">
2018-04-11 15:48:06,587 WARN : [OccurrenceDAO] - Unable to find record in occurrence store with uuid: <!DOCTYPE html><html xml:lang="en" class="maestro" xmlns="http://www.w3.org/1999/xhtml"><head><script nonce="6btDux7OucsAG9RXvyNs">
..... [Very worrying lines follow, in which everything is passed to Cassandra in a query without any verification. The final line confirms the worst-case scenario: Cassandra actually deleted 334 records as a result of processing the HTML file as a key file.]
2018-04-11 15:48:08,278 INFO : [RecordDeletor] - Records deleted from index : 334
ansell added a commit that referenced this issue Apr 12, 2018
Initial attempt to fix the lack of checks in FileDelete for invalid files, such as error pages when people send in URLs, instead of pushing all of the lines through to the database and hoping they all error out.

Signed-off-by: Peter Ansell <[email protected]>