Add support for MongoDB as JobRepository #877

Open
spring-projects-issues opened this issue May 29, 2018 · 11 comments

Comments

@spring-projects-issues
Collaborator

Ramin Zare opened BATCH-2727 and commented

Adding support for storing Job instances in Mongo instead of JDBC.


Issue Links:

  • BATCH-2836 Add Support For MongoDB or Api To Be Implemented To Store JobRepository in Any Arbitrary Store
    ("is duplicated by")
  • BATCH-1596 Support for NoSQL database persistence
@rcardin

rcardin commented Sep 3, 2021

Hey, guys. Some friends and I would be interested in implementing this feature. If so, could someone on the Spring Batch team give us some hints?

@fmbenhassine
Contributor

@rcardin Thank you for your interest in helping! You would need to implement four DAO interfaces (JobInstanceDao, JobExecutionDao, StepExecutionDao, ExecutionContextDao) and a factory bean that extends AbstractJobRepositoryFactoryBean. The abstract factory bean already takes care of creating a transactional proxy (which would be based on a MongoTransactionManager) around a SimpleJobRepository with the four DAOs. I created the initial stubs here while evaluating another issue (point 5 of #3942). For the implementation logic of the DAOs, you can take inspiration from the JDBC-based ones.

You are welcome to contribute if you want. I would love to help if you need support on this!

@rcardin

rcardin commented Sep 7, 2021

Hey @benas. First, thanks for pointing us to your stubs. They're handy. We started looking at the code and immediately noticed that the main business classes, such as JobExecution and JobInstance, extend a class called Entity. This class forces every object to have an ID of type Long.

However, as you know, Mongo doesn't generate IDs of type Long, and there is no simple way to generate an auto-incremented identifier. I suppose this stems from the fact that the business models and the persistence models are collapsed into the same classes.

Would you suggest a conservative approach, working around the unique-identifier generation problem with one of these tricks, or a more disruptive approach that revisits the models?

@fmbenhassine
Contributor

I think you are hitting an issue similar to #1317. The implications of this choice touch not only the domain model (which is designed around sequences), but also other core parts of the framework, like:

  • All APIs related to these entities are built around Long (see the input parameters and return types of methods like JobExplorer#getJobExecution, JobOperator#getRunningExecutions, etc.)
  • The logic of getting the last job/step executions (used in restarts) is based on max(ID)
  • All JDBC DAOs use a DataFieldMaxValueIncrementer from Spring Framework, which is designed to increment the maximum value of a data store field. Incrementing a value implies a sequence, even if the values are, for example, of type String. The real point is that an ordering is required for the previous point to work
  • JSR-352 APIs use the long type for IDs. While we are planning to deprecate the support for JSR-352 (in Remove JSR-352 implementation #3894), we need to make sure the version that introduces the required changes here is compatible with the JSR (or delay this feature to a version in which the JSR implementation is removed).
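
To illustrate why the ordering matters, here is a minimal, framework-free sketch (not Spring Batch code) of picking the "last" execution by max(ID). The `ExecutionRecord` type and the `LastExecutionLookup` helper are hypothetical. With Long IDs the numeric maximum is the most recent execution; with String IDs the natural ordering is lexicographic, so "9" sorts after "10" and the same max() logic would pick the wrong execution.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified execution record (not a Spring Batch class).
record ExecutionRecord(Long id, String status) {}

class LastExecutionLookup {

    // Mirrors the "max(ID)" idea: the execution with the highest numeric ID is the latest one.
    static Optional<ExecutionRecord> lastByNumericId(List<ExecutionRecord> executions) {
        return executions.stream().max(Comparator.comparing(ExecutionRecord::id));
    }

    // Same idea applied to String IDs: String ordering is lexicographic,
    // so "9" compares greater than "10" and the "last" execution is wrong.
    static Optional<String> lastByStringId(List<String> ids) {
        return ids.stream().max(Comparator.naturalOrder());
    }
}
```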

As you can see, the impact of changing the type of the Entity ID is quite substantial. While I'm not against reviewing the model, I would like to take time to evaluate such a change (or see it in a fork). That said, I think mimicking sequences with Mongo counter documents might work here (a single document for all three sequences won't work, but three separate documents, or even separate collections, one per sequence, might). Do you want to try the conservative approach with a quick prototype? If we find any blockers, we can consider the disruptive approach. What do you think?
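
The counter-document idea can be sketched without a MongoDB dependency. In this hypothetical snippet, a `ConcurrentHashMap` stands in for a counters collection with one document per sequence (the sequence names are illustrative assumptions), and `merge` plays the role of an atomic `$inc` with upsert; in MongoDB itself this would typically be a `findOneAndUpdate` call returning the updated document. Each sequence is incremented independently, so the IDs stay increasing Long values.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for a counters collection: one entry ("document") per sequence.
// Against MongoDB, nextValue would typically be an atomic findOneAndUpdate with $inc and upsert.
class CounterSequences {

    // Hypothetical sequence names: one counter per ID sequence.
    static final String JOB_INSTANCE_SEQ = "BATCH_JOB_INSTANCE_SEQ";
    static final String JOB_EXECUTION_SEQ = "BATCH_JOB_EXECUTION_SEQ";
    static final String STEP_EXECUTION_SEQ = "BATCH_STEP_EXECUTION_SEQ";

    private final ConcurrentMap<String, Long> counters = new ConcurrentHashMap<>();

    // Atomically increments the named counter (starting at 1 if absent) and returns the new value.
    long nextValue(String sequenceName) {
        return counters.merge(sequenceName, 1L, Long::sum);
    }
}
```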

@rcardin

rcardin commented Sep 8, 2021

@benas, thanks for the detailed response. We completely agree with you. We will implement a prototype that mimics sequences using Mongo features.

fmbenhassine added this to the 5.0.0-M6 milestone Aug 31, 2022
fmbenhassine modified the milestones: 5.0.0-M6, 5.0.0-M7 Sep 21, 2022
fmbenhassine modified the milestones: 5.0.0-M7, 5.0.0-M8 Oct 4, 2022
fmbenhassine modified the milestones: 5.0.0-M8, 5.0.0 Oct 12, 2022
fmbenhassine modified the milestones: 5.0.0, 5.1.0 Nov 24, 2022
fmbenhassine changed the title from "Support Mongo for JobRepository [BATCH-2727]" to "Add support for MongoDB as JobRepository" Sep 27, 2023
fmbenhassine removed the status: waiting-for-triage label Sep 28, 2023
fmbenhassine removed this from the 5.1.0 milestone Oct 26, 2023
fmbenhassine added a commit to spring-projects-experimental/spring-batch-experimental that referenced this issue Oct 26, 2023
@fmbenhassine
Contributor

@rcardin I managed to create a PoC based on counter documents, as described in https://www.mongodb.com/blog/post/generating-globally-unique-identifiers-for-use-with-mongodb (section "Use a single counter document to generate unique identifiers one at a time") [*].

While this avoided the need to change the type of entity IDs to something other than Long, I noticed that the current domain model is not suitable for persistence in a non-relational data store (due to the lack of default constructors, the presence of circular dependencies such as job execution <-> step execution, etc.). Therefore, I believe we need a persistence model suited to such a target store. The persistence model does not have to be the same as the domain model, and could be designed from the ground up for persistence:

  • Provide default constructors and getters/setters suitable for persistence (Java records are a good option)
  • No circular dependencies
  • For non-relational databases, the model does not have to be normalized. In fact, the job parameters and the execution context could be embedded in the enclosing documents (=> no unnecessary and expensive joins!)
  • Not every domain type needs an equivalent in the persistence model (JobParameters, for example)
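
A persistence model along these lines could look like the following Java records. This is a hypothetical sketch, not the PoC's actual classes: a job execution document embeds its job parameters and execution context as plain values, and a step execution refers to its job execution only by ID, so there is no circular reference. Records give the canonical constructor and accessors a mapping layer needs.

```java
import java.time.LocalDateTime;
import java.util.Map;

// Hypothetical persistence types, decoupled from the domain model.

// A job parameter stored as plain values and embedded in the execution document (no join needed).
record JobParameterDoc(String value, String type, boolean identifying) {}

// A job execution document embedding its parameters and execution context.
record JobExecutionDoc(long jobExecutionId, long jobInstanceId, String status, LocalDateTime createTime,
        Map<String, JobParameterDoc> jobParameters, Map<String, Object> executionContext) {}

// A step execution references its job execution by ID only: no back-reference object, so no cycle.
record StepExecutionDoc(long stepExecutionId, long jobExecutionId, String stepName, String status,
        Map<String, Object> executionContext) {}
```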

The persistence model in the aforementioned PoC tries to cover all these points and was designed to be usable by other non-relational solutions (i.e., no MongoDB-specific annotations like @Transient or APIs like ObjectId).

With that in place, we now need a way to convert entities from the domain model to the persistence model and vice versa, without impacting the framework's logic. This is also done in the PoC; see the converter package.
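
The conversion step could be sketched as follows. The domain classes here are hypothetical, heavily simplified stand-ins (not Spring Batch's JobExecution/StepExecution) with the same kind of two-way link; the hypothetical `ExecutionConverter` flattens them into records that reference the parent by ID, and rebuilds the link when reading back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified domain classes with a two-way (circular) association.
class DomainJobExecution {
    final long id;
    final List<DomainStepExecution> stepExecutions = new ArrayList<>();

    DomainJobExecution(long id) { this.id = id; }

    // Creates a step execution linked in both directions.
    DomainStepExecution addStepExecution(long stepId, String stepName) {
        DomainStepExecution step = new DomainStepExecution(stepId, stepName, this);
        stepExecutions.add(step);
        return step;
    }
}

class DomainStepExecution {
    final long id;
    final String stepName;
    final DomainJobExecution jobExecution; // back-reference to the parent

    DomainStepExecution(long id, String stepName, DomainJobExecution jobExecution) {
        this.id = id;
        this.stepName = stepName;
        this.jobExecution = jobExecution;
    }
}

// Persistence-side records: the child points to its parent by ID only.
record JobExecutionRecord(long id) {}
record StepExecutionRecord(long id, String stepName, long jobExecutionId) {}

class ExecutionConverter {

    static JobExecutionRecord toRecord(DomainJobExecution job) {
        return new JobExecutionRecord(job.id);
    }

    static StepExecutionRecord toRecord(DomainStepExecution step) {
        return new StepExecutionRecord(step.id, step.stepName, step.jobExecution.id);
    }

    // Rebuilds the domain graph, restoring the job <-> step links from the stored parent IDs.
    static DomainJobExecution toDomain(JobExecutionRecord job, List<StepExecutionRecord> steps) {
        DomainJobExecution domainJob = new DomainJobExecution(job.id());
        for (StepExecutionRecord step : steps) {
            if (step.jobExecutionId() == job.id()) {
                domainJob.addStepExecution(step.id(), step.stepName());
            }
        }
        return domainJob;
    }
}
```

The framework's logic keeps working with the linked domain objects, while only the flat records reach the store.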

That said, and while the PoC seems to work, I think the "disruptive approach" (i.e., changing the Entity ID type to something other than Long and ordering entities by creation date to remove the need for sequences) should work as well. I explained this here. But this is a separate discussion, and definitely one for a major Spring Batch version (if this option is retained).
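
For comparison, the ordering logic of the "disruptive approach" might look like this minimal sketch. It assumes each execution carries a creation timestamp and a String ID (a random UUID here, purely as an example); the `ExecutionDoc` and `LatestByCreateTime` names are hypothetical. The latest execution is chosen by creation time rather than by max(ID), so the IDs no longer need to be sequential.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical execution document with a non-sequential String ID.
record ExecutionDoc(String id, Instant createTime) {

    static ExecutionDoc create(Instant createTime) {
        return new ExecutionDoc(UUID.randomUUID().toString(), createTime);
    }
}

class LatestByCreateTime {

    // The "last" execution is the one created most recently; the ID plays no part in ordering.
    static Optional<ExecutionDoc> latest(List<ExecutionDoc> executions) {
        return executions.stream().max(Comparator.comparing(ExecutionDoc::createTime));
    }
}
```

One caveat of this design: two executions created within the same clock tick would tie, so a tiebreaker (or a sufficiently precise timestamp) would be needed.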

If you are interested, I would be grateful if you could give the experimental feature a try and share your feedback! Thanks in advance.

[*]: the potential contention drawback mentioned in the disclaimer does not really apply to Spring Batch, given how infrequently batch jobs are typically launched.

@N4SoftwareNinja

Hi everyone,

Are there any plans to bring the PoC by @fmbenhassine into the main Spring Batch project soon? We would be interested in using this in production.

@Ivan-flowers

Hey, we would also be interested in MongoDB support for the JobRepository.

@fmbenhassine, are there any plans or a timeline for releasing it?

@bsreddy125

Hello Spring team,

Are there any plans to bring the experimental PoC into the main Spring Batch project soon?
@fmbenhassine We would be interested in using this in production.

@fmbenhassine
Contributor

Apologies for the late reply on this. It's great to see some interest and feedback, thank you all! I am eager to plan this feature for the next release, but I want to hear from those who tried the PoC if there are any major issues (minor issues can be fixed in subsequent patch releases).

If there are no major issues, I will plan it for the upcoming 5.2 release.

@bsreddy125

bsreddy125 commented Sep 13, 2024

@fmbenhassine Everything works well; I didn't see any major issues with the PoC.

Projects
None yet
Development

No branches or pull requests

6 participants