Any plan on adding some support for regular expressions to the language? #241
Comments
After time https://github.com/google/starlark-go/tree/wip-skylark-time
I imagine the basic API would look something like the Go
We might also want variants that return the indices (not text) of each match. |
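For readers unfamiliar with the distinction, here is a minimal sketch of the two flavours using Go's standard `regexp` package: one call returns the matched text, the other returns byte offsets. The helper names `findAll` and `findAllIndex` are hypothetical and only illustrate the shape; they are not a proposed Starlark API.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns the text of every non-overlapping match of pattern in s.
// (Hypothetical helper name, for illustration only.)
func findAll(pattern, s string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re.FindAllString(s, -1), nil
}

// findAllIndex returns the [start, end) byte offsets of every match
// instead of the matched text.
func findAllIndex(pattern, s string) ([][]int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re.FindAllStringIndex(s, -1), nil
}

func main() {
	texts, _ := findAll(`[0-9]+`, "a1 b22 c333")
	spans, _ := findAllIndex(`[0-9]+`, "a1 b22 c333")
	fmt.Println(texts) // [1 22 333]
	fmt.Println(spans) // [[1 2] [4 6] [8 11]]
}
```

The index variant lets a caller slice the original string itself, which avoids allocating a copy of each match.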
Any update on this, yet? |
Sorry, no. |
For what it's worth, I stumbled on https://godoc.org/github.com/qri-io/starlib/re |
@b5 What about this one? Ready for it? :-) |
I think this would be great to get to attack next given the roll we're on with the
My answers to these questions:
That said, I'd love to get input from others. @alandonovan, would you accept a package if we could get one to pass muster? What's your bandwidth like for review and/or design input? |
I actually implemented this for the Java version using re2. It works really well for our use case. You can see it here:
It exposed a bug in re2j implementation (PR is coming soon-ish). |
Interesting, thanks. It's funny, I wrote RE2/J a decade back for use in the regexp package of a Google in-house embedded configuration language that had implementations in multiple languages and needed a consistent regexp interface. That description now applies equally to Starlark, and it's using RE2/J for the same reason. BTW, although RE2/J is asymptotically faster than j.u.regex, in practice it's usually significantly slower. Not that that really matters here. It appears that you are using Starlark as a "secure" embedded language within your VGS product. By design, Starlark is hermetic in many ways, but we've never claimed that it is secure for running untrusted code. Scripts can easily cause denial of service by exhausting all memory, or by hash flooding. Also, RE2/J can be induced to run out of stack when parsing a deep expression such as "(((...(((". (The Go version is somewhat less susceptible to such attacks because Go stacks can grow 100x larger than JVM threads.) |
Yes, I'm aware of this (actually noticed this discussion re: hash flooding in the Fnv32 hash function discussions in the bytes PR :D). We've enlisted Trail of Bits for a code audit (and we'll be happy to contribute whatever findings upstream) and what you've mentioned so far was on our roadmap. This isn't in production yet, not until we've checked all the bells and whistles, but it's actually a pretty neat idea if it works for what we want to use it for.
I was not aware of this problem, that's good to know, thank you! |
@b5 please, let me know if you cannot work on this feature because if so, no worries, I can do it alone this time 😄 |
@essobedo sounds great! I'd be delighted to jump on peer review if you can write the implementation! I think it'll get done much quicker that way 😊 |
@b5 If so, may I rely on what you have done for https://github.com/qri-io/starlib/commits/master/re? |
Some preliminary questions:
|
Absolutely. I see two ways to get started:
Either works for me, just let me know which you'd prefer @essobedo.
Very much agreed here. The PR we file should use RE2 syntax and link to current spec in package documentation.
Noted. Bonus points if we can provide a clear error when a user attempts to construct a regex with
I think we should go the interpreted language route, at least at first. It's a non-breaking change to add regex compilation later. Python's My question for you @adonovan: can we compile expressions on
I need to do a little more digging on this & report back.
Noted! Another thing to consider long-term is a cache of compiled regular expressions. My question here is more about where it should be stored, and about establishing a policy for how packages should handle something like a cache. I'm assuming it should be stored on the Starlark thread. I don't think this is needed for an initial implementation, but it would be great to get some initial guidance. |
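To make the cache question concrete, here is a minimal sketch of what such a cache could look like in plain Go, with no Starlark dependency. The `patternCache` type and its methods are hypothetical. One way to scope it per thread would be starlark-go's `Thread.SetLocal` and `Thread.Local`, but that placement is an assumption and not a decision reached here.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// patternCache memoizes compiled patterns by their source text.
// (Hypothetical type; unbounded, so a real one would need an eviction policy.)
type patternCache struct {
	mu sync.Mutex
	m  map[string]*regexp.Regexp
}

func newPatternCache() *patternCache {
	return &patternCache{m: make(map[string]*regexp.Regexp)}
}

// get returns the cached compiled pattern, compiling it on first use.
func (c *patternCache) get(pattern string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.m[pattern]; ok {
		return re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err // failures are not cached
	}
	c.m[pattern] = re
	return re, nil
}

func main() {
	cache := newPatternCache()
	first, _ := cache.get(`[a-z]+`)
	second, _ := cache.get(`[a-z]+`)
	fmt.Println(first == second) // true: the second call reuses the compiled pattern
}
```

Sharing the compiled value is safe because `*regexp.Regexp` is safe for concurrent use by multiple goroutines.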
Option #2 would be great, if that works for you. |
I'm not sure I understand what you're getting at. Patterns are immutable, so their freeze method is a no-op. And compilation can fail, so you need to report the error immediately. Also, you can compile and run a pattern in the same module, before freeze comes into play. |
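A minimal sketch of that point, using only Go's standard library: compilation happens eagerly and surfaces its error at once, and because the resulting value never changes, its `Freeze` method has nothing to do. The `pattern` type and `compile` function are hypothetical, and a real `starlark.Value` would also need `Truth` and `Hash`, omitted here to keep the example free of external imports.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern wraps a compiled regular expression. It has no mutable state.
// (Hypothetical type mirroring part of the starlark.Value method set.)
type pattern struct {
	re *regexp.Regexp
}

// compile reports a malformed pattern immediately rather than deferring
// the error to first use.
func compile(src string) (*pattern, error) {
	re, err := regexp.Compile(src)
	if err != nil {
		return nil, fmt.Errorf("compile: %v", err)
	}
	return &pattern{re: re}, nil
}

func (p *pattern) String() string { return fmt.Sprintf("<pattern %q>", p.re.String()) }
func (p *pattern) Type() string   { return "pattern" }

// Freeze is a no-op: the value is immutable from the moment it is created.
func (p *pattern) Freeze() {}

func main() {
	p, err := compile(`a+b`)
	if err != nil {
		panic(err)
	}
	p.Freeze()
	fmt.Println(p.Type(), p.re.MatchString("xaab")) // pattern true

	_, err = compile(`a(`)
	fmt.Println(err != nil) // true: the bad pattern is rejected at compile time
}
```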
Your confusion answers my question 😄. I was having trouble building a mental model of the point where compilation is happening |
Before moving forward, I think we should double check that this is the API we'd like to target for an initial package. This is @adonovan's suggested API, and the one I think we should go with:
I made a mistake in my earlier comment regarding |
This has many parts:
One can definitely implement the first 2 with minimum friction, but as I find Starlark's strength to be in its Turing-incompleteness guarantees I am wondering if this was / is being discussed as an addition to the language.