Add Portable flag for cross-platform filter serialization #49
Giulio2002 wants to merge 1 commit into FastFilter:master
Conversation
Thanks. Currently, we do not have any serialization or deserialization functions, so the problem that you describe is not present in the library. Thus far, the library has not told you how to share the data: the filters are simple arrays. Users who want to support big-endian platforms (which are vanishingly rare these days) should obviously consider the issue when designing their data interchange. If you use protobuf (for example), it will handle endianness automatically. We are going to add helper functions that can be used to serialize/deserialize the data for users who do not want to roll their own, or who do not want to use protobuf or some other existing data interchange format. The library itself does not care about endianness.

We were doing it by writing the whole thing to a file. Every in-memory representation is already serialized, and yes, you do care about endianness: when you do some operations, the underlying byte memory is ordered with respect to the CPU endianness. In any case, your PR is also fine and probably has less moat, so thanks.

@Giulio2002 Can you elaborate on your business application and how you are encountering big endian hardware? You are working with IBM mainframes? |
Problem
When distributing pre-built xor/binary fuse filters to users (e.g., as part of a database, blocklist, or other data file), the serialized filter data may be created on a machine with different endianness than the target machine. Since the library stores multi-byte fingerprints (uint16, uint32) in native byte order, a filter built on a little-endian x86 machine will produce incorrect results when loaded on a big-endian PowerPC or SPARC machine.
This is problematic for use cases like:
The rationale for defaulting to little-endian (LE) is that LE CPUs are more common than big-endian (BE) ones.
Solution
Add a Portable bool field to the BinaryFuse[T] and Xor8 structs. When Portable=true:
This ensures filters created with Portable=true can be serialized once and used correctly on any platform.
Usage
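To illustrate what "serialized once and used on any platform" means in practice, here is a stand-alone sketch that fixes the on-disk byte order to little-endian. It is not the code from this PR and does not use the library: `sketchFilter`, `marshalLE`, and `unmarshalLE` are hypothetical names, and the layout (seed, fingerprint count, fingerprints) is an assumption made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// sketchFilter is a hypothetical stand-in for a filter with 16-bit
// fingerprints. It is not the library's BinaryFuse[T] or Xor8 type.
type sketchFilter struct {
	Seed         uint64
	Fingerprints []uint16
}

// marshalLE writes seed, fingerprint count, and fingerprints in
// little-endian order, regardless of the CPU running the code.
func (f *sketchFilter) marshalLE() ([]byte, error) {
	var buf bytes.Buffer
	fields := []any{f.Seed, uint32(len(f.Fingerprints)), f.Fingerprints}
	for _, v := range fields {
		if err := binary.Write(&buf, binary.LittleEndian, v); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// unmarshalLE reverses marshalLE on any platform.
func unmarshalLE(data []byte) (*sketchFilter, error) {
	r := bytes.NewReader(data)
	f := &sketchFilter{}
	if err := binary.Read(r, binary.LittleEndian, &f.Seed); err != nil {
		return nil, err
	}
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	// Reject a count that claims more fingerprints than the input holds.
	if uint64(n)*2 > uint64(r.Len()) {
		return nil, fmt.Errorf("truncated input: want %d fingerprints", n)
	}
	f.Fingerprints = make([]uint16, n)
	if err := binary.Read(r, binary.LittleEndian, f.Fingerprints); err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	src := &sketchFilter{Seed: 42, Fingerprints: []uint16{0x1234, 0xABCD}}
	data, err := src.marshalLE()
	if err != nil {
		panic(err)
	}
	dst, err := unmarshalLE(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(dst.Seed == src.Seed, len(dst.Fingerprints))
}
```

Because both functions name the byte order explicitly, the bytes produced on an x86 machine decode to the same fingerprints on a big-endian machine. The cost is a byte swap per multi-byte value on big-endian hosts.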
Backward Compatibility
This change is fully backward compatible:
Portable field defaults to false, preserving existing behavior