Commit 73e195c

Shanzita authored and pmcfadin committed
CASSANDRA-13342: Document failure reason codes in native protocol v5 spec
1 parent f79e8b3 commit 73e195c

1 file changed

Lines changed: 79 additions & 0 deletions

doc/native_protocol_v5.spec

@@ -1359,6 +1359,82 @@ Table of Contents
  for the failure. The map is encoded starting with an [int] n
  followed by n pairs of <endpoint><failure_code> where
  <endpoint> is an [inetaddr] and <failure_code> is a [short].
+ The currently defined <failure_code> values are:
+ - 0x0000 UNKNOWN: the failure reason is
+ unknown. This code is also used when the
+ coordinator receives an unrecognized code
+ from a replica running a newer version, for
+ forward compatibility. Clients should handle
+ unknown failure codes gracefully.
+ - 0x0001 READ_TOO_MANY_TOMBSTONES: the read
+ scanned too many tombstones on the replica
+ and was aborted. The threshold is configured
+ by tombstone_failure_threshold in
+ cassandra.yaml (default: 100000).
+ - 0x0002 TIMEOUT: the replica did not respond
+ to the coordinator within the expected
+ timeout for this operation. This is a
+ per-replica failure reported within the
+ <reason_map>; it differs from Read_timeout
+ (0x1200) and Write_timeout (0x1100), which
+ indicate that the overall consistency level
+ could not be met.
+ - 0x0003 INCOMPATIBLE_SCHEMA: the replica
+ could not process the request because its
+ schema differs from the coordinator's (e.g.
+ a column or table referenced in the request
+ is not recognized). This can occur during
+ rolling upgrades or when schema changes have
+ not yet propagated to all nodes.
+ - 0x0004 READ_SIZE: the amount of data read
+ on the replica exceeded a configured size
+ threshold and the read was aborted. The
+ thresholds are local_read_size_fail_threshold
+ and row_index_read_size_fail_threshold in
+ cassandra.yaml (both disabled by default;
+ require read_thresholds_enabled: true).
+ - 0x0005 NODE_DOWN: the replica was known to
+ be down when the coordinator was assembling
+ the request. The replica was never contacted.
+ - 0x0006 INDEX_NOT_AVAILABLE: a secondary
+ index required by the query is not yet
+ available on the replica. This can occur
+ after a node restart, before the index has
+ completed its initial build or recovery.
+ - 0x0007 READ_TOO_MANY_INDEXES: the read
+ referenced more SSTable indexes via SAI
+ (Storage-Attached Indexes) than the
+ configured guardrail allows. The threshold
+ is sai_sstable_indexes_per_query_fail_threshold
+ in cassandra.yaml.
+ - 0x0008 NOT_CMS: the request was sent to a
+ node that is not a member of the Cluster
+ Metadata Service (CMS), but the operation
+ requires a CMS member.
+ - 0x0009 INVALID_ROUTING: the replica does
+ not own the token or token range for the
+ requested data according to its current
+ cluster metadata. This typically indicates
+ that the coordinator's topology view is
+ stale.
+ - 0x000A COORDINATOR_BEHIND: the coordinator's
+ cluster metadata is behind the replica's
+ (e.g. it references a table that has been
+ dropped, or does not yet know about a table
+ that has been created). The coordinator
+ needs to update its metadata before retrying.
+ - 0x000B RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM:
+ the operation was attempted using one
+ transaction system (Accord or Paxos) but
+ must be retried using the other, because
+ the data is managed by or conflicts with
+ operations on the other system.
+ - 0x01F7 INDEX_BUILD_IN_PROGRESS: a secondary
+ index required by the query is currently
+ being rebuilt on the replica and cannot
+ serve reads until the build completes. This
+ is a more specific form of
+ INDEX_NOT_AVAILABLE (0x0006).
  <data_present> is a single byte. If its value is 0, it means
  the replica that was asked for data had not
  responded. Otherwise, the value is != 0.
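
Review note: the code table added in this hunk, together with its guidance that clients should handle unknown failure codes gracefully, could be mirrored on the client side roughly as follows. This is a minimal sketch and not taken from any driver; the names `FailureCode` and `failure_code_from_wire` are hypothetical, and only the numeric values and code names come from the diff.

```python
from enum import IntEnum


class FailureCode(IntEnum):
    """<failure_code> values listed in the hunk above (a [short] on the wire)."""
    UNKNOWN = 0x0000
    READ_TOO_MANY_TOMBSTONES = 0x0001
    TIMEOUT = 0x0002
    INCOMPATIBLE_SCHEMA = 0x0003
    READ_SIZE = 0x0004
    NODE_DOWN = 0x0005
    INDEX_NOT_AVAILABLE = 0x0006
    READ_TOO_MANY_INDEXES = 0x0007
    NOT_CMS = 0x0008
    INVALID_ROUTING = 0x0009
    COORDINATOR_BEHIND = 0x000A
    RETRY_ON_DIFFERENT_TRANSACTION_SYSTEM = 0x000B
    INDEX_BUILD_IN_PROGRESS = 0x01F7


def failure_code_from_wire(value: int) -> FailureCode:
    """Map a raw [short] to a FailureCode (hypothetical helper).

    Values that are not in the table fall back to UNKNOWN instead of
    raising, which is one way to handle codes from newer servers.
    """
    try:
        return FailureCode(value)
    except ValueError:
        return FailureCode.UNKNOWN
```

Falling back to UNKNOWN keeps an older client working against a newer server, at the cost of hiding the raw value; a client that wants to log the unrecognized code would need to keep the integer alongside the enum.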
@@ -1385,6 +1461,9 @@ Table of Contents
  for the failure. The map is encoded starting with an [int] n
  followed by n pairs of <endpoint><failure_code> where
  <endpoint> is an [inetaddr] and <failure_code> is a [short].
+ See the <failure_code> values defined in
+ Read_failure (0x1300) above. The same set of
+ codes applies to write failures.
  <writeType> is a [string] that describes the type of the write
  that failed. The value of that string can be one
  of:
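
Review note: the reason map encoding quoted in the context lines of both hunks (an [int] n followed by n pairs of <endpoint><failure_code>) can be decoded along these lines. This is a sketch under stated assumptions, since the notation section of the spec is not part of this diff: [int] is assumed to be a 4-byte signed big-endian integer, [short] a 2-byte unsigned big-endian integer, and [inetaddr] one length byte followed by that many address bytes (4 for IPv4, 16 for IPv6). The function name `decode_reason_map` is hypothetical.

```python
import ipaddress
import struct


def decode_reason_map(buf: bytes, offset: int = 0):
    """Decode a reason map; return ({ip_address: failure_code}, next_offset).

    Assumed layouts (not shown in this diff): [int] = 4-byte signed
    big-endian, [short] = 2-byte unsigned big-endian, [inetaddr] = one
    length byte followed by the raw address bytes.
    """
    (n,) = struct.unpack_from(">i", buf, offset)  # [int]: number of pairs
    offset += 4
    reasons = {}
    for _ in range(n):
        addr_len = buf[offset]  # [inetaddr] length byte (4 or 16)
        offset += 1
        addr = ipaddress.ip_address(buf[offset:offset + addr_len])
        offset += addr_len
        (code,) = struct.unpack_from(">H", buf, offset)  # [short] failure code
        offset += 2
        reasons[addr] = code
    return reasons, offset


# Usage: two replicas, one reporting 0x0001 and one reporting 0x0005.
payload = (
    struct.pack(">i", 2)
    + bytes([4, 10, 0, 0, 1]) + struct.pack(">H", 0x0001)
    + bytes([4, 10, 0, 0, 2]) + struct.pack(">H", 0x0005)
)
reasons, end = decode_reason_map(payload)
```

Returning the next offset lets a caller continue with whatever follows the map in the error body, such as <data_present> for read failures or <writeType> for write failures.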
