HANA source connector with Incrementing mode is missing some messages #106
Comments
@srkpers I may be wrong, but I suspect the problem occurs because the timestamp values are not strictly incrementing and may contain duplicates. A series of records with the same timestamp value may be inserted into the source table at different times rather than all at once. In that case, the first poll by the connector may fetch n records whose last timestamp value is inc_col = t, and the following fetch with the where-clause where inc_col > t will miss the remaining records with timestamp inc_col = t. If this is indeed what is happening and the incrementing column's values are not strictly increasing, we could think about the following options.
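To make the suspected failure mode concrete, here is a minimal sketch that reproduces it with an in-memory SQLite table standing in for the HANA source table. The table name `src`, the `poll` helper, and the query shape are assumptions for illustration only; they are not the connector's actual code or SQL.

```python
import sqlite3


def poll(conn, last_seen, batch_max_rows):
    """Fetch rows strictly newer than the boundary (hypothetical query shape)."""
    return conn.execute(
        "SELECT id, inc_col FROM src WHERE inc_col > ? "
        "ORDER BY inc_col, id LIMIT ?",
        (last_seen, batch_max_rows),
    ).fetchall()


def simulate_duplicate_timestamps():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, inc_col TEXT)")
    t = "2021-01-01 00:00:00.000001"

    # Two rows with timestamp t are visible when the first poll runs.
    conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, t), (2, t)])
    first = poll(conn, "", 100)
    last_seen = first[-1][1]  # the boundary becomes t

    # A third row with the SAME timestamp t arrives after the first poll,
    # followed by a row with a newer timestamp.
    conn.execute("INSERT INTO src VALUES (?, ?)", (3, t))
    conn.execute("INSERT INTO src VALUES (?, ?)", (4, "2021-01-01 00:00:00.000002"))
    second = poll(conn, last_seen, 100)

    fetched = [row[0] for row in first + second]
    all_ids = [row[0] for row in conn.execute("SELECT id FROM src ORDER BY id")]
    missing = [i for i in all_ids if i not in fetched]
    return fetched, missing
```

In this sketch, row 3 shares the boundary timestamp and is never selected by `inc_col > t`, which matches the behavior described above.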
@elakito |
Each poll by the source connector updates its incrementing boundary value. Therefore, if records with older timestamp values are inserted into the table later, those records won't be read. So, none of the three options will work for such a source table. In other words, if you don't have a column whose values increase monotonically with the physical time, you cannot use the incrementing mode. Your option would be #105, which will be updated with more info.
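The boundary behavior described here can be sketched in a few lines of plain Python. The list `table` stands in for the incrementing column of the source table, and `poll` is a hypothetical helper, not the connector's implementation.

```python
def poll(table, boundary):
    """Return values above the boundary and the advanced boundary."""
    batch = sorted(v for v in table if v > boundary)
    new_boundary = batch[-1] if batch else boundary
    return batch, new_boundary


def simulate_late_insert():
    table = [10, 20, 30]
    seen, boundary = poll(table, 0)  # boundary advances to 30

    # A record with an OLDER incrementing value is inserted after the poll,
    # together with a genuinely newer one.
    table.append(25)
    table.append(40)

    batch, boundary = poll(table, boundary)
    seen += batch
    missing = sorted(set(table) - set(seen))
    return seen, missing
```

Once the boundary has moved to 30, the value 25 can never satisfy the `> boundary` condition, so no choice of comparison operator or batch size recovers it.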
@elakito |
@srkpers Maybe the above fix regarding the incremental query using timestamp values has solved this problem. Could you try it again?
@elakito |
@srkpers The mentioned problem affects timestamp-based incremental queries in general, and it definitely affected your scenario as well unless your system's timezone was set to UTC. But since you also mentioned that you observed the problem when using a plain sequencing column, there could be another cause for the missing records.
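The thread does not spell out the mechanics of the timezone problem, so the following only illustrates the general hazard, under the assumption that a boundary timestamp gets rendered as a zone-less string in the system's local zone while the column values are in UTC. The function names and the fixed UTC+2 offset are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S.%f"


def render_boundary(instant, zone):
    """Format an absolute instant as a zone-less timestamp string in `zone`."""
    return instant.astimezone(zone).strftime(FMT)


def boundary_shift_seconds(zone):
    """Seconds by which the rendered boundary differs from its UTC rendering."""
    instant = datetime(2021, 1, 1, 12, 0, 0, 123456, tzinfo=timezone.utc)
    as_utc = datetime.strptime(render_boundary(instant, timezone.utc), FMT)
    as_local = datetime.strptime(render_boundary(instant, zone), FMT)
    return (as_local - as_utc).total_seconds()


# Assumed local zone for the illustration: a fixed UTC+2 offset.
utc_plus_2 = timezone(timedelta(hours=2))
shift = boundary_shift_seconds(utc_plus_2)
```

Under this assumption, a system running at UTC+2 would compare against a boundary two hours ahead of the intended one, so rows in that window would be skipped, while a system set to UTC would see no shift.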
We have run several tests to replicate messages from a HANA table into a Kafka topic using the HANA source connector with an incrementing column (timestamp-based with microsecond precision). At random, we notice that the number of rows in the HANA table does not match the number of messages in the Kafka topic. Over time, as more rows are inserted in HANA, the difference from the number of messages in the topic increases.
It appears that the select statement the connector runs to fetch data from the HANA table has some issue and is skipping some rows. We are not sure where exactly the issue is.
For this testing we are using a HANA table which has 22 partitions and more than 4 billion rows. We create the connector offset ahead of time, before launching the connector, so that we get the messages from a certain date/timestamp onwards; otherwise it would start replicating the entire table.
When there is no activity or very low activity, the rows in HANA and the messages in the topic match, but over time, when there is more activity, the counts diverge.
We tested with 22 Kafka partitions and 22 tasks, and in another test used just 1 partition and 1 task. Basically, we tried multiple combinations with different tasks, partitions, polling interval, batch max rows, etc., but the issue is still there.
Any input on what can be done?