Commit 680488a

Merge pull request #11 from ixaxaar/debug

Temporal debugging of memory

Russi Chatterjee authored Nov 12, 2017
2 parents c8bfa6c + e618417

Showing 9 changed files with 276 additions and 197 deletions.
100 changes: 77 additions & 23 deletions README.md
```bash
pip install dnc
```

**Parameters**:

Following are the constructor parameters:

| Argument | Default | Description |
| --- | --- | --- |
| input_size | `None` | Size of the input vectors |
| hidden_size | `None` | Size of hidden units |
| rnn_type | `'lstm'` | Type of recurrent cells used in the controller |
| num_layers | `1` | Number of layers of recurrent units in the controller |
| num_hidden_layers | `2` | Number of hidden layers per layer of the controller |
| bias | `True` | Bias |
| batch_first | `True` | Whether data is fed batch first |
| dropout | `0` | Dropout between layers in the controller |
| bidirectional | `False` | If the controller is bidirectional (not yet implemented) |
| nr_cells | `5` | Number of memory cells |
| read_heads | `2` | Number of read heads |
| cell_size | `10` | Size of each memory cell |
| nonlinearity | `'tanh'` | If using 'rnn' as `rnn_type`, non-linearity of the RNNs |
| gpu_id | `-1` | ID of the GPU, -1 for CPU |
| independent_linears | `False` | Whether to use independent linear units to derive interface vector |
| share_memory | `True` | Whether to share memory between controller layers |

Following are the forward pass parameters:

| Argument | Default | Description |
| --- | --- | --- |
| input | - | The input vector `(B*T*X)` or `(T*B*X)`, depending on `batch_first` |
| hidden | `(None,None,None)` | Hidden states `(controller hidden, memory hidden, read vectors)` |
| reset_experience | `False` | Whether to reset memory |
| pass_through_memory | `True` | Whether to pass through memory |
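
The returned hidden tuple can be fed back into the next call to carry memory across chunks of a sequence; `reset_experience=True` starts from a blank memory. A minimal sketch, assuming the `rnn` instance constructed in the example below:

```python
import torch

# Process one long sequence in two chunks, carrying DNC state across calls.
hidden = (None, None, None)
chunk1 = torch.randn(8, 50, 64)  # (batch, time, input) with batch_first=True
chunk2 = torch.randn(8, 50, 64)

out1, hidden = rnn(chunk1, hidden, reset_experience=True)   # start with fresh memory
out2, hidden = rnn(chunk2, hidden, reset_experience=False)  # continue with the same memory
```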


### Example usage:

```python
import torch
from dnc import DNC

rnn = DNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  gpu_id=0
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```

### Debugging:

The `debug` option causes the network to also return its memory hidden vectors (numpy `ndarray`s) for the first batch of each forward step.
These vectors can be analyzed or visualized, for example with visdom.

```python
import torch
from dnc import DNC

rnn = DNC(
input_size=64,
hidden_size=128,
rnn_type='lstm',
num_layers=4,
nr_cells=100,
cell_size=32,
read_heads=4,
batch_first=True,
gpu_id=0,
debug=True
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors), debug_memory = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```

Memory vectors returned by the forward pass (each a `np.ndarray`):

| Key | Y axis (dimensions) | X axis (dimensions) |
| --- | --- | --- |
| `debug_memory['memory']` | layer * time | nr_cells * cell_size |
| `debug_memory['link_matrix']` | layer * time | nr_cells * nr_cells |
| `debug_memory['precedence']` | layer * time | nr_cells |
| `debug_memory['read_weights']` | layer * time | read_heads * nr_cells |
| `debug_memory['write_weights']` | layer * time | nr_cells |
| `debug_memory['usage_vector']` | layer * time | nr_cells |
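
Any of these matrices can be rendered as a heatmap. A minimal sketch (not the repo's own plotting code), assuming the `debug_memory` dict from the example above and a running Visdom server:

```python
import visdom

viz = visdom.Visdom()

# Plot the (layer * time) x (nr_cells * cell_size) memory matrix as a heatmap.
viz.heatmap(
  debug_memory['memory'],
  opts=dict(title='DNC memory (rows: layer * time, cols: nr_cells * cell_size)'),
)
```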

## Example copy task

The copy task, as described in the original paper, is included in the repo.

From the project root:
```bash
python ./tasks/copy_task.py -cuda 0 -optim rmsprop -batch_size 32 -mem_slot 64 # (original implementation)

python ./tasks/copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -nlayer 1 -nhlayer 2 -mem_slot 32 -batch_size 32 -optim adam # (faster convergence)
```

For the full set of options, see:
```bash
python ./tasks/copy_task.py --help
```

The copy task can be used to debug memory using [Visdom](https://github.com/facebookresearch/visdom).

Install Visdom and start its server:

```bash
pip install visdom
python -m visdom.server
```

Then execute the copy task:

```bash
python ./tasks/copy_task.py -cuda 0
```

The visdom dashboard shows memory as a heatmap for batch 0 every `-summarize_freq` iterations:

![Visdom dashboard](./docs/dnc-mem-debug.png)


## General noteworthy stuff

1. DNCs converge with the Adam and RMSProp optimizers; SGD generally causes them to diverge (a training-step sketch follows below).
2. Using a large batch size (> 100, 1000 recommended) prevents gradients from becoming `NaN`.
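
A minimal sketch of a single training step reflecting both notes, assuming the (non-debug) `rnn` from the example usage above; the random target and MSE loss are placeholders, not the copy-task objective:

```python
import torch
import torch.nn.functional as F
import torch.optim as optim

optimizer = optim.Adam(rnn.parameters(), lr=0.001)  # Adam rather than SGD

x = torch.randn(1000, 4, 64)  # large batch size (first dim, batch_first=True)

optimizer.zero_grad()
output, (chx, mhx, rv) = rnn(x, (None, None, None), reset_experience=True)
loss = F.mse_loss(output, torch.randn_like(output))  # placeholder target
loss.backward()
optimizer.step()
```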

Repos referred to for the creation of this repo:

