Error in to_data_frame() when feeding numpy matrix as label #100

I am trying to reproduce the basic autoencoder example from the Keras blog, using Elephas to run it on Spark:

https://blog.keras.io/building-autoencoders-in-keras.html

```
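# Imports used by the snippet below
from random import randint

import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model
from pyspark import SparkConf, SparkContext
from elephas.ml.adapter import to_data_frame

# Define basic parameters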
encoding_dim = 32
batch_size = 32
epochs = 1

# Build model
input_img = Input(shape=(784,))
encoded = Dense(encoding_dim, activation='relu')(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)
autoencoder = Model(input_img, decoded)
encoder = Model(input_img, encoded)
encoded_input = Input(shape=(encoding_dim,))
decoded_layer = autoencoder.layers[-1]
decoder = Model(encoded_input, decoded_layer(encoded_input))
autoencoder.summary()  # summary() prints the table itself and returns None


# Load data
(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.reshape(60000, 784).astype('float32') / 255.
x_test = x_test.reshape(10000, 784).astype('float32') / 255.
plt.imshow(x_train[randint(0,60000-1),:].reshape(28,28))
plt.gray()
plt.show()

print(x_train.shape, 'train samples')
print(x_test.shape, 'test samples')

# Create Spark context
conf = SparkConf().setAppName('Mnist_Spark_MLP').setMaster('local[8]')
sc = SparkContext(conf=conf)

# Build DataFrame from numpy features and labels
# (for an autoencoder the labels are the inputs themselves, so each label is a 784-vector)
test_df = to_data_frame(sc, x_test, x_test, categorical=False)

```

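This raises a `TypeError`. I believe the cause is that `to_labeled_point()` passes each label to `LabeledPoint`, which calls `float(label)` and therefore only accepts scalar labels. For an autoencoder the label array is a matrix, so each label is a 784-dimensional vector rather than a scalar: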

```
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "/Users/***/ams/px-seed-model/scripts/elephas_ae_mnist.py", line 79, in <module>
    test_df = to_data_frame(sc, x_test, x_test, categorical=False)
  File "/Users/***/ams/px-seed-model/pinpoint/src/elephas/elephas/ml/adapter.py", line 11, in to_data_frame
    lp_rdd = to_labeled_point(sc, features, labels, categorical)
  File "/Users/***/ams/px-seed-model/pinpoint/src/elephas/elephas/utils/rdd_utils.py", line 38, in to_labeled_point
    lp = LabeledPoint(y, to_vector(x))
  File "/Users/***/ams/px-seed-model/pinpoint/lib/python3.6/site-packages/pyspark/mllib/regression.py", line 53, in __init__
    self.label = float(label)
TypeError: only size-1 arrays can be converted to Python scalars

```
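The failing step can be isolated without Spark or Elephas. The following is a minimal sketch, assuming only NumPy: the helper `can_be_label` is a hypothetical name introduced here to mimic the `float(label)` conversion shown in the traceback, and the random array merely stands in for `x_test`.

```python
import numpy as np


def can_be_label(y):
    """Return True if float(y) succeeds.

    Hypothetical helper: it mimics the float(label) call that the
    traceback shows inside LabeledPoint.__init__.
    """
    try:
        float(y)
        return True
    except TypeError:
        return False


# Stand-in for x_test: 4 flattened "images" of 784 pixels each
x_test = np.random.rand(4, 784).astype('float32')
# Ordinary classification-style labels: one scalar per sample
scalar_labels = np.arange(4)

# One row of the matrix is a 784-element array, so float() rejects it
vector_label_ok = can_be_label(x_test[0])
# A single scalar label converts without problems
scalar_label_ok = can_be_label(scalar_labels[0])

print(vector_label_ok, scalar_label_ok)
```

If this reading is right, any call to `to_data_frame()` whose label array has more than one value per sample would hit the same error, regardless of the model.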

