Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

net_plugin bugfix for privacy test corner case #10349

Merged
merged 2 commits into from
May 11, 2021

Conversation

dimas1185
Copy link
Contributor

Change Description

There is a flaky bug (appears in at least 30-50% of runs) in net_plugin that reproduces in my newly added test for feature privacy. For now I have workaround node restart to avoid it.
What happens?
topology: [Prod1, Prod2] -> Node1 -> Node2
on some point via security group removal we cause Node1 (and hence Node2) to stop receive any blocks. After a while we are adding node1 back and it starts to receive blocks again.
it getting new blocks and sends those to Node2.
Shortly it gets unlink-able block exception because of gap, sends handshake, gets notice and enters lib catchup state [sync 1 (head < lib received)].
Once in sync it sends handshake to everyone including Node2
Node2 enters head catchup state [sync 3 (head < head received)] as head/lib sent by node1 was smaller than those sent by prod1/prod2
Then it receives one of most recent blocks (probably queued earlier), gets unlinkable exception, sends handshake and after back and forth notice exchange receives lib notice. it responses with sync_request_message to it without breaking current head catchup. Issue happening here because of it is doing this while still in head catchup state.
why?
Node1 getting sync_request_message and overwrites peer_requested structure that contains range of blocks to be send. Node2 saved in its state that it wants HEAD and once it will get it it will send handshake.
However because of this interrupted LIB catchup Node1 now thinks that Node2 requires different range and since LIB < HEAD, it doesn’t send requested HEAD block.
So Node2 is waiting for few more blocks to send handshake
Node1 thinks it sent everything and waits for handshake before moving Node2 back to in sync to continue to send fresh blocks.

FIX:
Do not overwrite peer_requested structure if it contains valid range, instead extend it to have maximum end_block (either existing or newly requested). This looks easiest fix not to change protocol. Discussed with @brianjohnson5972 and @heifner

Change Type

Select ONE:

  • Documentation
  • Stability bug fix
  • Other
  • Other - special case

Testing Changes

Select ANY that apply:

  • New Tests
  • Existing Tests
  • Test Framework
  • CI System
  • Other

Consensus Changes

  • Consensus Changes

API Changes

  • API Changes

Documentation Additions

  • Documentation Additions

Copy link
Contributor

@heifner heifner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be back-ported to 2.0.x, 2.1.x

@dimas1185 dimas1185 merged commit e78eb42 into develop May 11, 2021
@dimas1185 dimas1185 deleted the net_plugin-privacy-bugfix branch May 11, 2021 15:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants