When I run the code, I get an error immediately.
Below is all the output I got. Any idea what is happening?
R-RNNEM/InputWrapper1/Conv/weights:0 [4, 4, 1, 16]
R-RNNEM/InputWrapper1/Conv/biases:0 [16]
R-RNNEM/LayerNormI1/LayerNorm/beta:0 [16]
R-RNNEM/LayerNormI1/LayerNorm/gamma:0 [16]
R-RNNEM/InputWrapper2/Conv/weights:0 [4, 4, 16, 32]
R-RNNEM/InputWrapper2/Conv/biases:0 [32]
R-RNNEM/LayerNormI2/LayerNorm/beta:0 [32]
R-RNNEM/LayerNormI2/LayerNorm/gamma:0 [32]
R-RNNEM/InputWrapper3/Conv/weights:0 [4, 4, 32, 64]
R-RNNEM/InputWrapper3/Conv/biases:0 [64]
R-RNNEM/LayerNormI3/LayerNorm/beta:0 [64]
R-RNNEM/LayerNormI3/LayerNorm/gamma:0 [64]
R-RNNEM/InputWrapper5/fully_connected/weights:0 [4096, 512]
R-RNNEM/InputWrapper5/fully_connected/biases:0 [512]
R-RNNEM/LayerNormI5/LayerNorm/beta:0 [512]
R-RNNEM/LayerNormI5/LayerNorm/gamma:0 [512]
R-RNNEM/basic_rnn_cell/kernel:0 [762, 250]
R-RNNEM/basic_rnn_cell/bias:0 [250]
R-RNNEM/LayerNormR0/LayerNorm/beta:0 [250]
R-RNNEM/LayerNormR0/LayerNorm/gamma:0 [250]
R-RNNEM/OutputWrapper0/fully_connected/weights:0 [250, 512]
R-RNNEM/OutputWrapper0/fully_connected/biases:0 [512]
R-RNNEM/LayerNormO0/LayerNorm/beta:0 [512]
R-RNNEM/LayerNormO0/LayerNorm/gamma:0 [512]
R-RNNEM/OutputWrapper1/fully_connected/weights:0 [512, 4096]
R-RNNEM/OutputWrapper1/fully_connected/biases:0 [4096]
R-RNNEM/LayerNormO1/LayerNorm/beta:0 [4096]
R-RNNEM/LayerNormO1/LayerNorm/gamma:0 [4096]
R-RNNEM/OutputWrapper3/Conv/weights:0 [4, 4, 64, 32]
R-RNNEM/OutputWrapper3/Conv/biases:0 [32]
R-RNNEM/LayerNormO3/LayerNorm/beta:0 [32]
R-RNNEM/LayerNormO3/LayerNorm/gamma:0 [32]
R-RNNEM/OutputWrapper4/Conv/weights:0 [4, 4, 32, 16]
R-RNNEM/OutputWrapper4/Conv/biases:0 [16]
R-RNNEM/LayerNormO4/LayerNorm/beta:0 [16]
R-RNNEM/LayerNormO4/LayerNorm/gamma:0 [16]
R-RNNEM/OutputWrapper5/Conv/weights:0 [4, 4, 16, 1]
R-RNNEM/OutputWrapper5/Conv/biases:0 [1]
4611827 total variables
INFO:tensorflow:Error reported to Coordinator: <class 'RuntimeError'>, Attempted to use a closed Session.
INFO - tensorflow - Error reported to Coordinator: <class 'RuntimeError'>, Attempted to use a closed Session.
ERROR - R-NEM - Failed after 0:01:31!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,4,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, R-RNNEM/OutputWrapper5/Conv/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: train/total_loss/truediv/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_46718_train/total_loss/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent calls WITHOUT Sacred internals):
File "nem.py", line 749, in run
log_dict = run_epoch(session, train_inputs, train_graph, debug_graph, training['debug_samples'], "train_e{}".format(epoch), train_op=train_op)
File "nem.py", line 370, in run_epoch
out = session.run(fetches=fetches)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,4,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, R-RNNEM/OutputWrapper5/Conv/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: train/total_loss/truediv/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_46718_train/total_loss/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D', defined at:
File "nem.py", line 691, in <module>
@ex.automain
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/experiment.py", line 137, in automain
self.run_commandline()
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/experiment.py", line 260, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/experiment.py", line 209, in run
run()
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/run.py", line 221, in __call__
self.result = self.main_function(*args)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "nem.py", line 718, in run
train_op, train_graph, valid_graph, debug_graph = build_graphs(train_inputs.output, valid_inputs.output)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "nem.py", line 296, in build_graphs
actions=train_inputs.get('actions', None)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "nem.py", line 214, in build_graph
collisions=collisions, actions=actions)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "/home/hehaodele/repos/Relational-NEM/nem_model.py", line 309, in static_nem_iterations
hidden_state, output = nem_cell(inputs, hidden_state)
File "/home/hehaodele/repos/Relational-NEM/nem_model.py", line 161, in __call__
preds, h_new = self.run_inner_rnn(masked_deltas, h_old)
File "/home/hehaodele/repos/Relational-NEM/nem_model.py", line 112, in run_inner_rnn
preds, h_new = self.cell(reshaped_masked_deltas, h_old)
File "/home/hehaodele/repos/Relational-NEM/network.py", line 222, in __call__
output, res_state = self._cell(inputs, state)
File "/home/hehaodele/repos/Relational-NEM/network.py", line 252, in __call__
output, res_state = self._cell(inputs, state)
File "/home/hehaodele/repos/Relational-NEM/network.py", line 194, in __call__
projected = slim.layers.conv2d(resized, self._spec['size'], self._spec['kernel'], activation_fn=None)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1049, in convolution
outputs = layer.apply(inputs)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 828, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 717, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 168, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 868, in __call__
return self.conv_op(inp, filter)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 520, in __call__
return self.call(inp, filter)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 204, in __call__
name=self.name)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 956, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,16,4,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, R-RNNEM/OutputWrapper5/Conv/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: train/total_loss/truediv/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_46718_train/total_loss/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
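As a sanity check on the numbers in the output above, here is a rough sketch that recomputes the parameter count from the printed variable shapes and compares the weight memory with the allocation that failed. The shapes are copied from the listing; the 4-bytes-per-value figure is an assumption based on the "type float" in the OOM message, and the helper name `count_parameters` is only for illustration.

```python
from math import prod

# Shapes copied from the variable listing printed at startup.
SHAPES = [
    [4, 4, 1, 16], [16], [16], [16],        # InputWrapper1 + LayerNormI1
    [4, 4, 16, 32], [32], [32], [32],       # InputWrapper2 + LayerNormI2
    [4, 4, 32, 64], [64], [64], [64],       # InputWrapper3 + LayerNormI3
    [4096, 512], [512], [512], [512],       # InputWrapper5 + LayerNormI5
    [762, 250], [250], [250], [250],        # basic_rnn_cell + LayerNormR0
    [250, 512], [512], [512], [512],        # OutputWrapper0 + LayerNormO0
    [512, 4096], [4096], [4096], [4096],    # OutputWrapper1 + LayerNormO1
    [4, 4, 64, 32], [32], [32], [32],       # OutputWrapper3 + LayerNormO3
    [4, 4, 32, 16], [16], [16], [16],       # OutputWrapper4 + LayerNormO4
    [4, 4, 16, 1], [1],                     # OutputWrapper5
]

# Assumption: every value is float32 (the OOM message reports "type float").
BYTES_PER_FLOAT32 = 4


def count_parameters(shapes):
    """Sum the number of elements over all variable shapes."""
    return sum(prod(shape) for shape in shapes)


total = count_parameters(SHAPES)
weights_mib = total * BYTES_PER_FLOAT32 / 2**20

# Size of the tensor whose allocation failed: shape[1,16,4,4].
failed_alloc_bytes = prod([1, 16, 4, 4]) * BYTES_PER_FLOAT32

print(total)                   # matches the "4611827 total variables" line
print(round(weights_mib, 1))   # weight memory in MiB
print(failed_alloc_bytes)      # bytes requested by the failing allocation
```

Under these assumptions the weights take only about 17.6 MiB and the failing allocation is 1024 bytes, so the model parameters alone do not seem to explain the OOM. My guess, which I have not confirmed, is that the GPU memory is used up by something else, for example the activations of the many unrolled steps (the failing node is in `step_18`) or another process on the same GPU.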
Thanks!