Error 'Attempted to use a closed Session.' #3

@hehaodele

Hi,

When I run the following command:

python nem.py with dataset.balls4mass64 network.r_nem nem.k=5

I get an error right away.
The full output is below. Any idea what is happening?

R-RNNEM/InputWrapper1/Conv/weights:0 [4, 4, 1, 16]
R-RNNEM/InputWrapper1/Conv/biases:0 [16]
R-RNNEM/LayerNormI1/LayerNorm/beta:0 [16]
R-RNNEM/LayerNormI1/LayerNorm/gamma:0 [16]
R-RNNEM/InputWrapper2/Conv/weights:0 [4, 4, 16, 32]
R-RNNEM/InputWrapper2/Conv/biases:0 [32]
R-RNNEM/LayerNormI2/LayerNorm/beta:0 [32]
R-RNNEM/LayerNormI2/LayerNorm/gamma:0 [32]
R-RNNEM/InputWrapper3/Conv/weights:0 [4, 4, 32, 64]
R-RNNEM/InputWrapper3/Conv/biases:0 [64]
R-RNNEM/LayerNormI3/LayerNorm/beta:0 [64]
R-RNNEM/LayerNormI3/LayerNorm/gamma:0 [64]
R-RNNEM/InputWrapper5/fully_connected/weights:0 [4096, 512]
R-RNNEM/InputWrapper5/fully_connected/biases:0 [512]
R-RNNEM/LayerNormI5/LayerNorm/beta:0 [512]
R-RNNEM/LayerNormI5/LayerNorm/gamma:0 [512]
R-RNNEM/basic_rnn_cell/kernel:0 [762, 250]
R-RNNEM/basic_rnn_cell/bias:0 [250]
R-RNNEM/LayerNormR0/LayerNorm/beta:0 [250]
R-RNNEM/LayerNormR0/LayerNorm/gamma:0 [250]
R-RNNEM/OutputWrapper0/fully_connected/weights:0 [250, 512]
R-RNNEM/OutputWrapper0/fully_connected/biases:0 [512]
R-RNNEM/LayerNormO0/LayerNorm/beta:0 [512]
R-RNNEM/LayerNormO0/LayerNorm/gamma:0 [512]
R-RNNEM/OutputWrapper1/fully_connected/weights:0 [512, 4096]
R-RNNEM/OutputWrapper1/fully_connected/biases:0 [4096]
R-RNNEM/LayerNormO1/LayerNorm/beta:0 [4096]
R-RNNEM/LayerNormO1/LayerNorm/gamma:0 [4096]
R-RNNEM/OutputWrapper3/Conv/weights:0 [4, 4, 64, 32]
R-RNNEM/OutputWrapper3/Conv/biases:0 [32]
R-RNNEM/LayerNormO3/LayerNorm/beta:0 [32]
R-RNNEM/LayerNormO3/LayerNorm/gamma:0 [32]
R-RNNEM/OutputWrapper4/Conv/weights:0 [4, 4, 32, 16]
R-RNNEM/OutputWrapper4/Conv/biases:0 [16]
R-RNNEM/LayerNormO4/LayerNorm/beta:0 [16]
R-RNNEM/LayerNormO4/LayerNorm/gamma:0 [16]
R-RNNEM/OutputWrapper5/Conv/weights:0 [4, 4, 16, 1]
R-RNNEM/OutputWrapper5/Conv/biases:0 [1]
4611827 total variables
INFO:tensorflow:Error reported to Coordinator: <class 'RuntimeError'>, Attempted to use a closed Session.
INFO - tensorflow - Error reported to Coordinator: <class 'RuntimeError'>, Attempted to use a closed Session.
ERROR - R-NEM - Failed after 0:01:31!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,4,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, R-RNNEM/OutputWrapper5/Conv/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Node: train/total_loss/truediv/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_46718_train/total_loss/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "nem.py", line 749, in run
    log_dict = run_epoch(session, train_inputs, train_graph, debug_graph, training['debug_samples'], "train_e{}".format(epoch), train_op=train_op)
  File "nem.py", line 370, in run_epoch
    out = session.run(fetches=fetches)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,4,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, R-RNNEM/OutputWrapper5/Conv/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Node: train/total_loss/truediv/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_46718_train/total_loss/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D', defined at:
  File "nem.py", line 691, in <module>
    @ex.automain
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/experiment.py", line 137, in automain
    self.run_commandline()
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/experiment.py", line 260, in run_commandline
    return self.run(cmd_name, config_updates, named_configs, {}, args)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/experiment.py", line 209, in run
    run()
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/run.py", line 221, in __call__
    self.result = self.main_function(*args)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "nem.py", line 718, in run
    train_op, train_graph, valid_graph, debug_graph = build_graphs(train_inputs.output, valid_inputs.output)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "nem.py", line 296, in build_graphs
    actions=train_inputs.get('actions', None)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "nem.py", line 214, in build_graph
    collisions=collisions, actions=actions)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "/home/hehaodele/repos/Relational-NEM/nem_model.py", line 309, in static_nem_iterations
    hidden_state, output = nem_cell(inputs, hidden_state)
  File "/home/hehaodele/repos/Relational-NEM/nem_model.py", line 161, in __call__
    preds, h_new = self.run_inner_rnn(masked_deltas, h_old)
  File "/home/hehaodele/repos/Relational-NEM/nem_model.py", line 112, in run_inner_rnn
    preds, h_new = self.cell(reshaped_masked_deltas, h_old)
  File "/home/hehaodele/repos/Relational-NEM/network.py", line 222, in __call__
    output, res_state = self._cell(inputs, state)
  File "/home/hehaodele/repos/Relational-NEM/network.py", line 252, in __call__
    output, res_state = self._cell(inputs, state)
  File "/home/hehaodele/repos/Relational-NEM/network.py", line 194, in __call__
    projected = slim.layers.conv2d(resized, self._spec['size'], self._spec['kernel'], activation_fn=None)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1049, in convolution
    outputs = layer.apply(inputs)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 828, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 717, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 168, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 868, in __call__
    return self.conv_op(inp, filter)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 520, in __call__
    return self.call(inp, filter)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 204, in __call__
    name=self.name)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 956, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/hehaodele/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,16,4,4] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[Node: train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](train/R-RNNEM/step_18/OutputWrapper5/Conv/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, R-RNNEM/OutputWrapper5/Conv/weights/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Node: train/total_loss/truediv/_29 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_46718_train/total_loss/truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
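A side note on the numbers in the OOM message: the tensor that failed to allocate is very small. The sketch below is only a sanity check, assuming that "type float" means 4-byte float32 (DT_FLOAT); the helper name `tensor_bytes` is made up for illustration and is not part of nem.py or TensorFlow.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Return the raw size in bytes of a dense tensor with the given shape.

    bytes_per_element defaults to 4, which assumes float32 (DT_FLOAT).
    """
    # Multiply all dimensions together to get the element count.
    num_elements = reduce(mul, shape, 1)
    return num_elements * bytes_per_element


# Shape taken from the log line: "OOM when allocating tensor with shape[1,16,4,4]"
failed_shape = [1, 16, 4, 4]
size = tensor_bytes(failed_shape)
print("elements:", size // 4, "bytes:", size)
```

Since this request is only about 1 KiB, it seems likely that the GPU memory was already almost completely used by the time step_18 of the unrolled graph was reached, rather than this particular tensor being the problem.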

Thanks!
