In some scenarios, the ‘sendSnapshot’ does not really transfer a Snapshot #5240

Closed
developer-zc opened this issue Jan 11, 2023 · 4 comments · Fixed by #5420
Labels: affects/none, process/fixed, severity/none, type/bug

@developer-zc

Describe the bug (required)
As the title says.

Your Environments (required)

  • nebula 3.1 ~ 3.4

How To Reproduce (required)

Steps to reproduce the behavior:
Suppose we have a space with 1 partition and a replica factor of 3: partA (leader), partB, and partC.

  1. Initially, the storaged instance that holds partC is stopped. Write some graph data to partA and partB;
  2. Wait until some expired WAL files have been cleaned, then stop partA and partB, and start partC;
  3. Start partA (or partB). The sendSnapshot operation is then triggered and appears to complete normally, but what is actually transmitted is the state machine's real-time data, not data read from a snapshot;

Underlying reason

  1. PartA has just started and no log has been committed in the current term, so RaftPart::commitInThisTerm_ is false;
  2. During NebulaStore::GetSnapshot, NebulaStore::checkLeader is invoked and returns false, because RaftPart::commitInThisTerm_ == false;
  3. As a result, NebulaStore::GetSnapshot returns nullptr;
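
The chain above can be sketched with a small stand-alone model. This is only an illustration under assumptions, not the actual NebulaGraph code: the struct layouts, the `isLeader` field, the `Snapshot` placeholder, and the function signatures are hypothetical; only the names RaftPart, commitInThisTerm_, checkLeader, and GetSnapshot come from this issue.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified stand-ins for the real classes.
struct RaftPart {
  bool isLeader = false;
  bool commitInThisTerm_ = false;  // set once a log is committed in the current term
};

struct Snapshot {};  // placeholder for a storage-engine snapshot handle

struct NebulaStore {
  // Leader check as described in the issue: being leader is not enough,
  // a log must also have been committed in this term.
  static bool checkLeader(const RaftPart& part) {
    return part.isLeader && part.commitInThisTerm_;
  }

  // Returns nullptr when the leader check fails, so the caller gets
  // no snapshot to pin its reads to.
  static std::unique_ptr<Snapshot> GetSnapshot(const RaftPart& part) {
    if (!checkLeader(part)) {
      return nullptr;
    }
    return std::unique_ptr<Snapshot>(new Snapshot());
  }
};

// Usage: a leader that has just restarted has not committed in its term yet.
inline void demoRestartedLeader() {
  RaftPart partA;
  partA.isLeader = true;
  assert(NebulaStore::GetSnapshot(partA) == nullptr);
}
```

In this model, a freshly restarted leader gets nullptr back, which matches step 3 above.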

Proposal

  1. The existing NebulaStore::checkLeader logic is appropriate for read and write operations;
  2. A separate checkLeader may be needed for sendSnapshot, one that does not require commitInThisTerm_ to be true.
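
One possible shape of the proposal, as a rough sketch only. The function name `checkLeaderForSnapshot`, the `isLeader` field, and the free-function signatures are hypothetical and are not taken from the NebulaGraph code base or from the eventual fix.

```cpp
#include <cassert>

// Hypothetical, simplified view of the Raft state relevant to the check.
struct RaftPart {
  bool isLeader;
  bool commitInThisTerm_;
};

// Check for reads and writes: the leader must also have committed
// a log in its current term.
inline bool checkLeader(const RaftPart& part) {
  return part.isLeader && part.commitInThisTerm_;
}

// Hypothetical relaxed check for sendSnapshot: leadership alone is enough,
// commitInThisTerm_ is deliberately not consulted.
inline bool checkLeaderForSnapshot(const RaftPart& part) {
  return part.isLeader;
}

// Usage: a just-restarted leader fails the strict check but passes the relaxed one.
inline void demoChecks() {
  RaftPart partA = {true, false};
  assert(!checkLeader(partA));
  assert(checkLeaderForSnapshot(partA));
}
```

Keeping the two checks separate leaves the read/write path unchanged while letting the snapshot path obtain a real snapshot right after a restart.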
@developer-zc developer-zc added the type/bug Type: something is unexpected label Jan 11, 2023
@github-actions github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Jan 11, 2023
@critical27
Contributor

critical27 commented Jan 12, 2023

Good catch. What you described is correct except for one point: if GetSnapshot returns nullptr, there may be inconsistency between several reads, but each Seek in RocksDB still reads from a snapshot. So it may not be strictly real-time data.
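
To illustrate the distinction in this comment, here is a toy multi-version store. It is not RocksDB and does not use any RocksDB API; `ToyStore`, `readBoth`, and everything else in it are hypothetical. It only shows why several reads that each pin their own point in time can return a mixed view, while reads that share one snapshot cannot.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy multi-version store (NOT RocksDB): every write gets a sequence number,
// and a read at sequence S sees the newest version with seq <= S.
class ToyStore {
 public:
  void put(const std::string& key, const std::string& value) {
    versions_[key].push_back(std::make_pair(++latest_, value));
  }

  uint64_t latest() const { return latest_; }

  std::string get(const std::string& key, uint64_t seq) const {
    std::string result;  // stays empty if no version is visible at seq
    auto it = versions_.find(key);
    if (it == versions_.end()) return result;
    for (const auto& entry : it->second) {
      // Versions are appended in increasing sequence order.
      if (entry.first <= seq) result = entry.second;
    }
    return result;
  }

 private:
  uint64_t latest_ = 0;
  std::map<std::string, std::vector<std::pair<uint64_t, std::string>>> versions_;
};

// Reads keys "a" and "b", running `between` (e.g. a concurrent write) in the
// middle. With a shared snapshot both reads use the same sequence number;
// without one, each read pins whatever is latest when it starts.
template <typename F>
std::pair<std::string, std::string> readBoth(const ToyStore& store,
                                             bool sharedSnapshot, F between) {
  const uint64_t snap = store.latest();
  std::string a = store.get("a", snap);
  between();
  const uint64_t seqForB = sharedSnapshot ? snap : store.latest();
  std::string b = store.get("b", seqForB);
  return std::make_pair(a, b);
}

// Usage: without a shared snapshot the two reads straddle the write.
inline void demoMixedView() {
  ToyStore store;
  store.put("a", "a1");
  store.put("b", "b1");
  auto view = readBoth(store, false, [&] {
    store.put("a", "a2");
    store.put("b", "b2");
  });
  assert(view.first == "a1" && view.second == "b2");
}
```

In the toy model, each individual read is internally consistent, but only the shared-snapshot variant gives a consistent view across reads.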

@wey-gu
Contributor

wey-gu commented Jan 17, 2023

Dear @developer-zc ,

We would like to send you a gift for the "good catch" NebulaGraph community award. Would you mind emailing lisa.liu@vesoft.com with an address where you can receive the shipment?

Thanks and welcome to the community!

cc @QingZ11 @lisahui

@flymysql
Contributor

2. commitInThisTerm_

I have a problem that may be caused by this, but how can I solve it?

#5347

@liwenhui-soul
Contributor

yes, will fix it later

5 participants