[RFC] More flexible floating-point type specification #185

@hameerabbasi

Motivation

There are a large number of floating-point types in DLPack, especially 4-bit and 8-bit variants. Defining a new FP type for DLPack exchange currently requires modifying DLPack itself, and this RFC aims to relax that limitation.

MLIR moved to a more flexible system at the beginning of last year (discussion, PR), and I feel we should follow in its footsteps.

@seberg commented in a different issue that a way forward without ABI breakage would be a requirement to do this short-term, and I agree.

Proposal

This RFC proposes two changes that, taken together, allow one to specify many¹ floating-point types.

  1. The introduction of two new enum values in the DLDataTypeCode enum: kDLFlexibleFloat and kDLFlexibleComplex.
  2. Turning the DLDataType struct into a (C-style) union. The existing layout would become the first variant of the union; the second variant is specified below.
typedef struct {
  /*!
   * \brief Type code of base types.
   * We keep it uint8_t instead of DLDataTypeCode for minimal memory
   * footprint, but the value should be one of DLDataTypeCode enum values.
   * */
  uint8_t code;
  /*!
   * \brief Number of bits minus one. Common bit widths are 2, 4, 8, 16, 32, 64,
   * which are stored as 1, 3, 7, 15, 31, 63.
   * For complex types, this counts the bits of each component rather than of the whole value.
   */
  uint8_t bitsm1;
  /*!
   * \brief Number of mantissa bits.
   * Implies that the number of exponent bits is bits - 1 - mantissa in the common case.
   * mantissa = 0 specifies an exponent-only format (one example is float8_e8m0fnu).
   */
  uint8_t mantissa;
  /*!
   * \brief The flags associated with this FP type.
   * Beginning with the LSB:
   * Bit 0: Finite bit, often specified by f
   * Bit 1: NaN bit, often specified by n
   * Bit 2: (Signed) zero bit, often specified by z
   * Bit 3: Unsigned zero bit, often specified by uz
   * Bits 5-7: Reserved.
   */
  uint8_t flags;
  /*! \brief Specifies the number of lanes in the type, used for vector types.
   * The number of lanes is two raised to this value, e.g., a value of 5 means 32 lanes.
   */
  uint8_t p2lanes;
} DLDataTypeFloatingSpec;

This variant is active only when code is one of the two new enum values above. An existing enum value and its equivalent flexible encoding would be considered aliases of each other, but implementations are not required to support the latter if they already support the former.
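To make the dispatch concrete, here is a minimal sketch of how a consumer might read the second variant. Everything not named in the proposal is an assumption made for illustration: the union name DLDataTypeUnion and its members basic and fp, the placeholder values 100 and 101 for the new enum entries, the kFlag* mask names, and the helper functions. The encoding of float8_e4m3fn shown at the end is one reading of the field descriptions above, not a settled mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Existing DLPack layout, repeated so the sketch compiles on its own. */
typedef struct {
  uint8_t code;
  uint8_t bits;
  uint16_t lanes;
} DLDataType;

/* Proposed second variant, fields as listed in this RFC. */
typedef struct {
  uint8_t code;
  uint8_t bitsm1;
  uint8_t mantissa;
  uint8_t flags;
  uint8_t p2lanes;
} DLDataTypeFloatingSpec;

/* ASSUMPTION: the union and member names are hypothetical. */
typedef union {
  DLDataType basic;
  DLDataTypeFloatingSpec fp;
} DLDataTypeUnion;

/* ASSUMPTION: placeholder values; the RFC does not assign numbers. */
enum { kDLFlexibleFloat = 100, kDLFlexibleComplex = 101 };

/* Hypothetical mask names for flag bits 0-3 as listed in the struct comment. */
enum {
  kFlagFinite = 1u << 0,
  kFlagNaN = 1u << 1,
  kFlagZero = 1u << 2,
  kFlagUnsignedZero = 1u << 3
};

/* The second variant is only meaningful for the two new codes. */
static int is_flexible(uint8_t code) {
  return code == kDLFlexibleFloat || code == kDLFlexibleComplex;
}

static unsigned total_bits(const DLDataTypeFloatingSpec *s) {
  return (unsigned)s->bitsm1 + 1u;
}

/* Common case only: one sign bit, the rest split between exponent and mantissa. */
static unsigned exponent_bits(const DLDataTypeFloatingSpec *s) {
  assert(s->mantissa < total_bits(s));
  return total_bits(s) - 1u - s->mantissa;
}

/* Lanes are stored as a base-2 logarithm; this sketch supports up to 2^63. */
static uint64_t lane_count(const DLDataTypeFloatingSpec *s) {
  assert(s->p2lanes < 64);
  return UINT64_C(1) << s->p2lanes;
}

/* Illustrative encoding of a scalar float8_e4m3fn: 8 bits, 3 mantissa bits, f and n flags. */
static DLDataTypeUnion make_float8_e4m3fn(void) {
  DLDataTypeUnion u = {0};
  u.fp = (DLDataTypeFloatingSpec){.code = kDLFlexibleFloat,
                                  .bitsm1 = 7,
                                  .mantissa = 3,
                                  .flags = kFlagFinite | kFlagNaN,
                                  .p2lanes = 0};
  return u;
}
```

A consumer would first check is_flexible(u.basic.code), which is valid because both variants begin with the same uint8_t code field, and only then interpret the remaining bytes through u.fp.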

Open Questions

The float8_e4m3b11fnuz type is not covered by this proposal in its current form. I'm currently debating how best to address that. Options are:

  1. Leave it out of scope, requiring its own enum value.
  2. Split p2lanes into two 4-bit fields, one for p2lanes and the other for bias.
  3. Use the reserved bits for bias instead.
  4. Use a combination to increase the bits available to bias.
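As a rough illustration of option 2, the sketch below splits the 8-bit p2lanes field into two 4-bit halves. The choice of the low nibble for p2lanes and the high nibble for bias, the decision to store the bias as a plain unsigned value, and the helper names are all assumptions made for this example. Four bits would hold the bias of 11 used by float8_e4m3b11fnuz, but would cap both the stored bias and the lane exponent at 15.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: low nibble = log2(lanes), high nibble = exponent bias. */
static uint8_t pack_lanes_bias(uint8_t p2lanes, uint8_t bias) {
  assert(p2lanes < 16 && bias < 16); /* each half has only 4 bits */
  return (uint8_t)((bias << 4) | p2lanes);
}

/* Recover the lane exponent from the low nibble. */
static uint8_t unpack_p2lanes(uint8_t packed) {
  return (uint8_t)(packed & 0x0F);
}

/* Recover the bias from the high nibble. */
static uint8_t unpack_bias(uint8_t packed) {
  return (uint8_t)(packed >> 4);
}
```

With this layout, a scalar type with bias 11 would store 0xB0 in the shared byte.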

¹ I would consider non-IEEE-style types such as POSIT and TAKUM to be explicitly excluded from the scope of this proposal, perhaps to be addressed by future flags.

cc @seberg and @hpkfft as they showed interest.
