Python client 19.2.1: undocumented string hosts entries are parsed unsafely with strtok on a Python Unicode buffer
Summary
The Python client documentation shows hosts as tuples like [("127.0.0.1", 3000)], but the constructor also accepts bare string entries such as "host:3000". That string path is parsed unsafely: it casts away const from PyUnicode_AsUTF8(py_host) and passes the resulting pointer to strtok().
This means the client silently accepts an undocumented config shape and parses it by mutating Python-owned string storage: strtok() overwrites each delimiter it finds with a NUL byte in the buffer it is given. Even when the shorthand appears to work, that path is not safe.
Environment
- Aerospike Python client 19.2.1
- Reproduced while investigating startup behavior on Debian Bookworm with Python 3.11.2
- Also relevant on Debian Trixie with Python 3.13.5 because the parser logic is in the shared Python extension code
Minimal Reproduction
from aerospike import Client

# A bare "host:port" string instead of the documented ("host", port) tuple.
conf = {
    "hosts": ["seed1.example:3000"],
}
client = Client(conf)
Expected Behavior
One of these should happen:
- only the documented tuple form should be accepted, with a clear validation error for strings
- or string entries should be explicitly documented and parsed safely without mutating Python-owned buffers
Actual Behavior
- The undocumented string form is silently accepted.
- The parser calls strtok((char *)PyUnicode_AsUTF8(py_host), ":").
- It then reparses from a duplicate string and uses atoi() for the port with no strict validation.
- IPv6 literals cannot be expressed in this form, because the parser splits on every colon.
Impact
- Undefined behavior from mutating the buffer returned by PyUnicode_AsUTF8().
- Silent acceptance of an undocumented config shape.
- Weak validation of the port component.
- Hard-to-debug, environment-sensitive behavior when users rely on "host:port" shorthand.
Technical Analysis
The documented, tuple-based path is straightforward: the constructor reads a tuple, copies the address string, and requires the port to be an integer.
The string path is materially different:
- it accepts a bare Unicode string instead of a tuple
- it casts away const from PyUnicode_AsUTF8()
- it tokenizes that pointer with strtok()
- it then tokenizes a duplicated copy again
- it uses atoi() for the port without strict syntax checking
That makes the string shorthand qualitatively less safe than the documented tuple form.
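To make the weak validation concrete, here is a rough Python model of colon tokenizing followed by an atoi()-style conversion. It is not the client's code: the helper names atoi_like and tokenize_like_strtok are made up for illustration, and the assumption that the first token becomes the address and the second becomes the port is a simplification of the path described above. Integer overflow, which is undefined for the real atoi(), is not modelled.

```python
import re

# Model of C atoi(): skip ASCII whitespace, take an optional sign and the
# leading digits, ignore everything after them, return 0 if there are no digits.
_ATOI = re.compile(r"\s*([+-]?\d+)", re.ASCII)


def atoi_like(text):
    """Hypothetical stand-in for atoi(); never raises on malformed input."""
    match = _ATOI.match(text)
    return int(match.group(1)) if match else 0


def tokenize_like_strtok(entry):
    """Hypothetical model: split on ':' the way strtok() would.

    strtok() skips runs of delimiters, so empty fields never become tokens.
    Assumption: token 0 is the address and token 1 is the port.
    """
    tokens = [tok for tok in entry.split(":") if tok]
    address = tokens[0] if tokens else None
    port = atoi_like(tokens[1]) if len(tokens) > 1 else None
    return address, port


if __name__ == "__main__":
    for entry in (
        "seed1.example:3000",
        "seed1.example:3000abc",   # trailing junk is silently dropped
        "seed1.example:abc",       # non-numeric port
        "[2001:db8::1]:3000",      # bracketed IPv6 literal
    ):
        print(repr(entry), "->", tokenize_like_strtok(entry))
```

Under this model, trailing junk after the port digits goes unnoticed, a non-numeric port turns into 0, and an IPv6 literal is cut at its first colon, which matches the failure modes listed under Actual Behavior.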
Relevant Source Locations
Verified against the extracted 19.2.1 source tree.
src/main/aerospike.c:51-57 documents hosts as tuple entries like [("127.0.0.1", 3000)].
src/main/client/type.c:758-784 shows the tuple path, including integer validation for the port.
src/main/client/type.c:787-794 shows the undocumented string path and the strtok((char *)PyUnicode_AsUTF8(py_host), ":") call.
src/main/client/type.c:795-806 adds the parsed host to the config if any address string was produced.
Suggested Fixes
- Reject bare string entries in hosts with a clear error message, keeping only the documented tuple form.
- Or, if string entries are intended to be supported, parse them from a copied buffer without mutating Python-owned storage.
- Replace atoi() with strict validation.
- Explicitly define whether IPv6 literals are supported in this input form.
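If string entries are kept, it may help to pin down the intended semantics before touching the C code. The following is a minimal reference sketch in Python, not a proposed implementation for the extension. The function name parse_host_entry is hypothetical, and the choices it makes (bracketed IPv6 literals accepted, unbracketed ones rejected as ambiguous, ports limited to 1-65535) are assumptions the maintainers would need to confirm.

```python
import ipaddress


def parse_host_entry(entry: str) -> tuple[str, int]:
    """Strictly parse 'host:port' or '[ipv6]:port' into (address, port).

    Hypothetical reference semantics; raises ValueError on anything else.
    The input string is never modified.
    """
    if not isinstance(entry, str):
        raise TypeError("host entry must be a string")

    if entry.startswith("["):
        # Bracketed IPv6 literal: the port follows the closing bracket.
        address, sep, port_text = entry[1:].partition("]:")
        if not sep:
            raise ValueError(f"expected '[ipv6]:port', got {entry!r}")
        ipaddress.IPv6Address(address)  # raises a ValueError subclass if invalid
    else:
        address, sep, port_text = entry.rpartition(":")
        if not sep or not address:
            raise ValueError(f"expected 'host:port', got {entry!r}")
        if ":" in address:
            # Assumption: unbracketed IPv6 is ambiguous and therefore rejected.
            raise ValueError(f"IPv6 literals must be bracketed: {entry!r}")

    # Digits only: no sign, no whitespace, no trailing junk, ASCII only.
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"port is not a decimal integer: {port_text!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return address, port
```

Until the client is fixed, a function like this can also serve as a caller-side workaround: convert each "host:port" string to a tuple first, so that only the documented tuple form reaches the constructor.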
Likely Fix Scope
- Primary fix surface is the Python wrapper constructor in src/main/client/type.c.
- The lowest-risk fix is to reject bare string hosts entries outright and enforce the documented tuple form.
- If compatibility requires keeping string support, the parser should be rewritten locally in that same constructor path to operate on copied buffers, validate ports strictly, and define IPv6 behavior explicitly.
- Risk is low if the undocumented shorthand is rejected, and low to medium if compatibility parsing is retained.
- The most useful regression tests would cover documented tuple inputs, invalid string inputs, port validation failures, and any explicitly supported IPv6 forms.
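The reject-outright option and the regression cases above can be sketched in Python as follows. This is only a model of the desired behavior: validate_hosts is a hypothetical stand-in for the check the constructor would perform in C, the case table is illustrative, and only two-element tuples are modelled (any additional tuple elements the client supports are out of scope here).

```python
def validate_hosts(hosts):
    """Hypothetical check: accept only (address, port) tuples in a list."""
    if not isinstance(hosts, list):
        raise TypeError("hosts must be a list")
    for i, entry in enumerate(hosts):
        if isinstance(entry, str):
            raise TypeError(
                f"hosts[{i}]: expected an (address, port) tuple, "
                f"got the string {entry!r}"
            )
        if not (isinstance(entry, tuple) and len(entry) == 2):
            raise TypeError(f"hosts[{i}]: expected an (address, port) tuple")
        address, port = entry
        if not isinstance(address, str):
            raise TypeError(f"hosts[{i}]: address must be a string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(port, bool) or not isinstance(port, int):
            raise TypeError(f"hosts[{i}]: port must be an integer")
    return hosts


# (input, should_be_accepted) pairs mirroring the suggested regression tests.
REGRESSION_CASES = [
    ([("127.0.0.1", 3000)], True),         # documented tuple form
    ([("::1", 3000)], True),               # IPv6 address in tuple form
    (["seed1.example:3000"], False),       # undocumented string shorthand
    ([("seed1.example", "3000")], False),  # port given as a string
    ([("seed1.example", True)], False),    # bool is not a valid port
]


def run_regression_cases():
    """Return the cases whose outcome differs from the expectation."""
    failures = []
    for hosts, should_accept in REGRESSION_CASES:
        try:
            validate_hosts(hosts)
            accepted = True
        except TypeError:
            accepted = False
        if accepted != should_accept:
            failures.append(hosts)
    return failures
```

An empty list from run_regression_cases() means every case behaved as expected. Real regression tests would exercise the client constructor itself rather than this stand-in.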
Notes
We stumbled upon this bug during a search for the cause of a mysterious slow-down on client start-up which we eventually traced to some problems with the shared memory feature. I will post separate issues for that.