-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added arguments to mapper. #186
base: master
Are you sure you want to change the base?
Conversation
This is a backward-incompatible change that changes the signature of all mappers from (row) to (row, ...args), and Map(name, f) to Map(name, f, ...args) where args are passed by the driver to the executors. This allows for more complex map operations where the maps depend on intermediary operations / arguments.
Thanks for the PR! How about adding a new type of Mapper, e.g., MapperWithArgs, to avoid incompatible changes? |
And what would be the signature of |
Yes, that seems cleaner. |
Looking at the code, this seems a significant change: dependencies on Mapper's signature:
Do we create an equivalent function for each of these with |
yes. a big internal change. Actually I am also thinking to provide mapper and reducer objects that can have internal states. This is something related, and can be re-engineered together. |
I am not sure about the side-effects of adding state to map reducers: doesn't map-reduce architecture relies on mappers being pure functions for many of the optimizations? e.g., if a mapper has a state, that state has to be shared among the executors, which requires a lock to the state on the master and all the consequent communication between executors. I would consider your idea to deserve an issue on its own. Coming back to the mapper arguments: I propose to add arguments to the mappers, which is the natural way to support dynamic mappers in a compiled language. Could you propose a design for this? The design I proposed is backward compatible. Are you fine with adding all new methods for "argumented" mappers? |
Yes, your proposal looks fine, even though it adds another set of functions.
|
This is a backward-incompatible change that modifies the signature of all mappers from
(row)
to(row, ...args)
andMap(name, f)
toMap(name, f, ...args)
, whereargs
are passed by the driver to the executors.This allows for more complex map operations where the maps depend on intermediary operations / arguments known to the driver.
I used
json
for serializing/deserializing the arguments, as per comments in #185. The tests are currently failing because marshaling/unmarshaling changes the argument's types (int64
->float64
). In general, I believe that we want to keep the types as part of the serialization / de-serialization, which hints that the json may not be sufficient for our needs. However. I do not have a strong opinion on this: @chrislusf , what do you think?I decided to use
"github.com/stretchr/testify/assert"
for the testing, but I have no strong opinions about it. My experience is that it is easy to tell when something is not equal.