Description of Issue
The pyproject.toml specifies that this library is compatible with protobuf >=3.20.0. However, the library relies on the including_default_value_fields keyword argument of the json_format._Printer class from google.protobuf. As of protobuf version 5.26.1, including_default_value_fields has been replaced by always_print_fields_with_no_presence (commit 7d43131), so passing the old argument raises a TypeError.
Recreate Example
poetry install
poetry add protobuf==5.26.1
# test.py
from pyspark.sql.session import SparkSession
from example.example_pb2 import SimpleMessage
from pbspark import from_protobuf
from pbspark import to_protobuf
spark = SparkSession.builder.getOrCreate()
example = SimpleMessage(name="hello", quantity=5, measure=12.3)
data = [{"value": example.SerializeToString()}]
df_encoded = spark.createDataFrame(data)
df_decoded = df_encoded.select(from_protobuf(df_encoded.value, SimpleMessage).alias("value"))
df_expanded = df_decoded.select("value.*")
df_expanded.show()
# +-----+--------+-------+
# | name|quantity|measure|
# +-----+--------+-------+
# |hello|       5|   12.3|
# +-----+--------+-------+
df_reencoded = df_decoded.select(to_protobuf(df_decoded.value, SimpleMessage).alias("value"))
# run below
poetry run python test.py
Traceback (most recent call last):
File "/home/project/test.py", line 14, in <module>
df_expanded.show()
File "/usr/local/lib/python3.12/site-packages/pyspark/sql/dataframe.py", line 947, in show
print(self._show_string(n, truncate, vertical))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pyspark/sql/dataframe.py", line 965, in _show_string
return self._jdf.showString(n, 20, vertical)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pyspark/errors/exceptions/captured.py", line 185, in deco
raise converted from None
pyspark.errors.exceptions.captured.PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "/home/project/pbspark/_proto.py", line 343, in decoder
return self.message_to_dict(
^^^^^^^^^^^^^^^^^^^^^
File "/home/project/pbspark/_proto.py", line 227, in message_to_dict
printer = _Printer(
^^^^^^^^^
File "/home/project/pbspark/_proto.py", line 79, in __init__
super().__init__(**kwargs)
TypeError: _Printer.__init__() got an unexpected keyword argument 'including_default_value_fields'
Are there plans to update this library to work with newer versions of protobuf? Would the maintainers be opposed to me making the changes on a separate branch?
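One possible way to support both protobuf ranges is to check which keyword the printer's constructor accepts and pass only that one. The following is a minimal sketch, not the library's actual code: the helper name default_fields_kwargs and the OldPrinter/NewPrinter stand-in classes are hypothetical and exist only so that the example runs without protobuf installed.

```python
import inspect


def default_fields_kwargs(printer_cls, enabled=True):
    """Return the 'print default values' keyword that printer_cls accepts.

    Hypothetical helper: inspects the constructor signature and returns a
    one-item dict keyed by whichever of the two argument names is present.
    """
    params = inspect.signature(printer_cls.__init__).parameters
    # Name used by newer protobuf releases (per the issue description).
    if "always_print_fields_with_no_presence" in params:
        return {"always_print_fields_with_no_presence": enabled}
    # Name used by older protobuf releases.
    if "including_default_value_fields" in params:
        return {"including_default_value_fields": enabled}
    raise TypeError(
        f"{printer_cls.__name__} accepts neither default-value keyword"
    )


# Hypothetical stand-ins that mimic only the relevant keyword of the old
# and new json_format._Printer constructors.
class OldPrinter:
    def __init__(self, including_default_value_fields=False):
        self.flag = including_default_value_fields


class NewPrinter:
    def __init__(self, always_print_fields_with_no_presence=False):
        self.flag = always_print_fields_with_no_presence


if __name__ == "__main__":
    for cls in (OldPrinter, NewPrinter):
        kwargs = default_fields_kwargs(cls)
        printer = cls(**kwargs)  # works for either signature
        print(cls.__name__, kwargs, printer.flag)
```

In the library, the inspection would presumably need to target google.protobuf.json_format._Printer itself rather than a subclass whose __init__ takes **kwargs, since **kwargs hides the real parameter names. The two options may also not behave identically for every field kind (for example, fields with explicit presence), so the decoded output should be compared across protobuf versions before relying on this approach.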