Interesting. This PR: is pretty close to the same thing. |
Without the lock, influxdb attempts to take multiple snapshots at the same time. If you comment those lines out, rebuild influxdb, and then run it & influxdb_stress (described above) at the same time...you'll see the messages in the log.
OK, got it. I was thinking that might be it from the title of the PR. I guess I might have put the lock inside TakeSnapshot(). This is kind of a matter of taste though -- it depends on how one thinks about TakeSnapshot().
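For illustration, here is a minimal sketch of the "lock inside TakeSnapshot()" variant. The `snapshotter` type, its fields, and `runConcurrent` are hypothetical stand-ins and are not the actual raft server code; the point is only that when the method takes the mutex itself, callers cannot forget it and two snapshots cannot overlap.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshotter is a hypothetical stand-in for the raft server.
// It is NOT the real type from _vendor/raft/server.go.
type snapshotter struct {
	mu         sync.Mutex
	inProgress bool // true while a snapshot is being taken
	taken      int  // number of completed snapshots
	overlaps   int  // number of times a snapshot started while another was running
}

// TakeSnapshot acquires the lock itself, so every caller is serialized
// without having to remember to lock around the call.
func (s *snapshotter) TakeSnapshot() {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.inProgress {
		s.overlaps++ // cannot happen while the lock is held
	}
	s.inProgress = true
	s.taken++ // placeholder for the real snapshot work
	s.inProgress = false
}

// runConcurrent calls TakeSnapshot from n goroutines and reports
// how many snapshots completed and how many overlapped.
func runConcurrent(n int) (taken, overlaps int) {
	s := &snapshotter{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.TakeSnapshot()
		}()
	}
	wg.Wait()
	return s.taken, s.overlaps
}

func main() {
	taken, overlaps := runConcurrent(50)
	fmt.Printf("taken=%d overlaps=%d\n", taken, overlaps)
}
```

The trade-off is the one mentioned above: locking at the call site keeps TakeSnapshot() usable from code that already holds the lock, while locking inside makes the method safe by default.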
|
@otoolep it was related. There were actually two spots that needed to be reset to nil. |
|
Do you know why this |
|
@jvshahid no but I think it's a separate issue. |
|
Agree, I was just wondering why it's failing to save the state in the first place. I'll merge this in a bit. |
The TakeSnapshot() function in _vendor/raft/server.go was failing at s.stateMachine.Save() and exiting without resetting s.pendingSnapshot = nil. That left the raft server in an invalid state. s.stateMachine.Save() was failing with the error message "gob: encodeReflectValue: nil element", which comes from the Save() function in cluster/cluster_configuration.go. This is a separate issue.
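A rough sketch of the failure mode and one way to guard against it is shown below. The `server` and `snapshot` types, the `save` field, and the `failingSave`/`okSave` helpers are hypothetical and only mimic the shape of the problem; the actual fix in this PR resets the field at two explicit spots rather than with `defer`.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshot and server are hypothetical, simplified stand-ins for the
// raft types; they are not the real definitions.
type snapshot struct{ index uint64 }

type server struct {
	pendingSnapshot *snapshot
	save            func() ([]byte, error) // stands in for stateMachine.Save()
}

// TakeSnapshot refuses to start while a snapshot is pending, and clears
// the pending marker on every exit path, including the error path.
func (s *server) TakeSnapshot() error {
	if s.pendingSnapshot != nil {
		return errors.New("snapshot: last snapshot is not finished")
	}
	s.pendingSnapshot = &snapshot{index: 1}
	// Without this reset, a failed save would leave pendingSnapshot set
	// and every later call would be rejected by the check above.
	defer func() { s.pendingSnapshot = nil }()

	if _, err := s.save(); err != nil {
		return err
	}
	return nil
}

// failingSave simulates the state machine failing to serialize its state.
func failingSave() ([]byte, error) {
	return nil, errors.New("simulated save failure")
}

// okSave simulates a successful save.
func okSave() ([]byte, error) {
	return []byte("state"), nil
}

func main() {
	s := &server{save: failingSave}
	fmt.Println("first attempt:", s.TakeSnapshot())

	// The server is still usable after the failure.
	s.save = okSave
	fmt.Println("second attempt:", s.TakeSnapshot())
}
```

Using `defer` here is a design choice for the sketch: it covers any error return added later, at the cost of hiding the reset from readers who expect it next to each `return`.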