Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Custom query validation throwing error with sql files ending with semicolon(;) #591

Merged
merged 6 commits into from
Oct 10, 2022

Conversation

celiajmd
Copy link
Contributor

Closes issue #565

  • Remove the semicolon if this semicolon is at the end of the string, or only '\n' and/or spaces is after it.

@celiajmd
Copy link
Contributor Author

/gcbrun

@@ -450,6 +450,9 @@ def get_query_from_file(self, filename):
try:
file = open(filename, "r")
query = file.read()
query_semicolon = query[query.rfind(";") :]
if set(query_semicolon).issubset({";", " ", "\n"}):
query = query[0 : query.rfind(";")]
except IOError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we just do an query = query.rstrip(';')? Or would there be some functionality missing if we did that

Copy link
Contributor

@Raniksingh Raniksingh Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both rfind and rstrip does the job. If we need to use it for including complex scenarios then rfind has that flexibility, as it gets the index values which can be used later. I don't foresee any such use case though.
Happy to have more thoughts on this and agree to Neha's point to make it simpler and flexible.

For E2E : We have used 'split' function which basically splits the input query and removes semicolon.

Copy link
Contributor Author

@celiajmd celiajmd Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the second thought!

I tired query = query.rstrip(';') in the first place, but it still gives the error: google.api_core.exceptions.BadRequest: 400 Syntax error: Expected ")" but got ";" at [2:11]

The test query is:

SELECT * FROM `celiaji-dvt-dev.source.trips1` 
LIMIT 1000;

When using a ValueError to print out the query after the

            raise ValueError(
                "query is. "
                f"query is: {query}"
            )

We can see in the error message, the semicolon is still there:

raise ValueError(
ValueError: query is SELECT * FROM `celiaji-dvt-dev.source.trips1` 
LIMIT 1000;

One assumption is that the query string has multiple lines, and the semicolon only appears at the last line, so that the rstrip(';') hasn't been able to detect or delete it in this case. But, I did test rstrip(';') on the query string alone, and it works. I am not sure if it is because there are other functions in DVT which may interfere the result.

So I switched to rfind(). rfind() can make sure to find the last meaningless semicolon, by checking if the right side of this semicolon are all just spaces and enters.

I didn't choose split() is because I wanted to save the semicolon which may be meaningful in a query string, such as select * from table a where text = "abc;"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python file.read() method automatically appends a newline at the end of the string returned. Therefore, we should expect a newline and trim the newline/whitespace and semicolon accordingly:
query = query.rstrip(';\n'))

Using rstrip will ensure we are only removing trailing semicolons and not those in the query

Copy link
Collaborator

@nehanene15 nehanene15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we could make this even simpler.

@celiajmd
Copy link
Contributor Author

/gcbrun

@celiajmd celiajmd merged commit 16a89ac into develop Oct 10, 2022
@celiajmd celiajmd deleted the issue565-remove-semicolon branch October 10, 2022 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants