Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added arguments to mapper. #186

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jorgecarleitao
Copy link

@jorgecarleitao jorgecarleitao commented Jan 30, 2020

This is a backward-incompatible change that modifies the signature of all mappers from (row) to (row, ...args) and Map(name, f) to Map(name, f, ...args), where args are passed by the driver to the executors.

This allows for more complex map operations where the maps depend on intermediary operations / arguments known to the driver.

I used json for serializing/deserializing the arguments, as per comments in #185. The tests are currently failing because marshaling/unmarshaling changes the argument's types (int64 -> float64). In general, I believe that we want to keep the types as part of the serialization / de-serialization, which hints that the json may not be sufficient for our needs. However. I do not have a strong opinion on this: @chrislusf , what do you think?

I decided to use "github.com/stretchr/testify/assert" for the testing, but I have no strong opinions about it. My experience is that it is easy to tell when something is not equal.

This is a backward-incompatible change that changes the signature
of all mappers from (row) to (row, ...args), and Map(name, f) to
Map(name, f, ...args) where args are passed by the driver to
the executors.

This allows for more complex map operations where the maps depend
on intermediary operations / arguments.
@chrislusf
Copy link
Owner

Thanks for the PR!

How about adding a new type of Mapper, e.g., MapperWithArgs, to avoid incompatible changes?

@jorgecarleitao
Copy link
Author

And what would be the signature of Map? Would we add a MapWithArgs also?

@chrislusf
Copy link
Owner

And what would be the signature of Map? Would we add a MapWithArgs also?

Yes, that seems cleaner.

@jorgecarleitao
Copy link
Author

jorgecarleitao commented Jan 30, 2020

Looking at the code, this seems a significant change: dependencies on Mapper's signature:

  • Map
  • RegisterMapper
  • GetMapper
  • mappers map[MapperId]MapperObject
  • type MapperObject struct
  • gleamTaskOption
  • processMapper
  • doProcessMapper

Do we create an equivalent function for each of these with *WithArgs, or do we make their arguments interface{}?

@chrislusf
Copy link
Owner

yes. a big internal change.

Actually I am also thinking to provide mapper and reducer objects that can have internal states. This is something related, and can be re-engineered together.

@jorgecarleitao
Copy link
Author

I am not sure about the side-effects of adding state to map reducers: doesn't map-reduce architecture relies on mappers being pure functions for many of the optimizations? e.g., if a mapper has a state, that state has to be shared among the executors, which requires a lock to the state on the master and all the consequent communication between executors.

I would consider your idea to deserve an issue on its own.

Coming back to the mapper arguments: I propose to add arguments to the mappers, which is the natural way to support dynamic mappers in a compiled language. Could you propose a design for this? The design I proposed is backward compatible. Are you fine with adding all new methods for "argumented" mappers?

@chrislusf
Copy link
Owner

chrislusf commented Feb 1, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants