Add Video Question Answering model JSFusion #61
base: master
Conversation
(frames,), filename = tensors
resnet_output = self.pool(frames)
resnet_output = resnet_output.view(resnet_output.shape[0], resnet_output.shape[1])
# TODO handle the case when resnet_output.shape[0] < num_frames (fill zeros)
This comment points to the branch for computing the mask in https://github.com/jsjason/jsfusion-pytorch/blob/master/infer.py#L177, which I think can be simplified.
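For the TODO in the diff above, a minimal zero-fill sketch could look like the following (the helper name `pad_to_num_frames` is hypothetical; it only assumes `resnet_output` is the 2-D tensor produced by the `view` call above):

```python
import torch

def pad_to_num_frames(resnet_output, num_frames):
    # Hypothetical helper for the TODO: when the clip yields fewer rows
    # than num_frames, append zero rows so downstream code always sees
    # a (num_frames, feature_dim) tensor.
    actual = resnet_output.shape[0]
    if actual < num_frames:
        pad = torch.zeros(num_frames - actual, resnet_output.shape[1],
                          dtype=resnet_output.dtype,
                          device=resnet_output.device)
        resnet_output = torch.cat([resnet_output, pad], dim=0)
    return resnet_output
```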
models/jsfusion/model.py (outdated)
# frames: (40, 3, 224, 224)

filename = os.path.basename(file_path)
out = (frames, filename)
I think this should be `((frames,), filename)`. The first item needs to be a tuple of tensors, not a single tensor.
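A one-line sketch of the suggested fix, using the same variables as the diff above:

```python
out = ((frames,), filename)  # first element is a tuple of tensors, per the comment above
```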
conv3 = self.conv3(bn2)
relu3 = self.relu3(conv3)
bn3 = self.bn3(relu3)
I've seen a lot of open PyTorch code that does something like the following:

x = self.conv3(x)
x = self.relu3(x)
x = self.bn3(x)

I'm not sure what the real intention of this is, but I'm assuming it's for reducing the memory footprint? If our implementation takes up a lot of memory, then I guess we could change our code to follow this pattern, too. I'm not sure how this affects inference time.
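If the motivation is indeed memory, one common way to express the same three ops is an `nn.Sequential` block (a sketch only; the channel sizes below are placeholder assumptions, not the PR's actual layer definitions):

```python
import torch
import torch.nn as nn

# Sketch: conv3 -> relu3 -> bn3 as one block, so intermediates are not
# kept under separate named locals. The 256 -> 512 channel sizes are
# placeholders for illustration.
block3 = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.BatchNorm2d(512),
)

x = torch.randn(1, 256, 28, 28)   # dummy input for a quick shape check
with torch.no_grad():             # inference: intermediates can be freed eagerly
    y = block3(x)
print(y.shape)  # torch.Size([1, 512, 28, 28])
```

Rebinding a single `x`, as in the snippet above, lets each intermediate be freed as soon as the next op returns; this mainly matters under `torch.no_grad()`, since during training autograd keeps the intermediates alive for the backward pass anyway.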
cut_mask_indices = [i for i in range(cut_mask.shape[1]) if i % 2 == 1 and i < cut_mask.shape[1] - 1]
cut_mask_indices = torch.tensor(cut_mask_indices)
cut_mask_indices = cut_mask_indices.to(device=self.device, non_blocking=True)
If this part takes long, we could perhaps pre-allocate a tensor for every possible mask length and simply fetch the correct `cut_mask_indices` tensor at runtime. The default max length of 40 isn't THAT long, so this shouldn't take up much memory.
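A sketch of that pre-allocation idea, assuming a max mask length of 40 as mentioned above (the class name and `get` method are made up for illustration):

```python
import torch

class MaskIndexCache:
    """Hypothetical cache: one cut_mask_indices tensor per possible mask length."""

    def __init__(self, device, max_len=40):
        self.cache = []
        for length in range(max_len + 1):
            idx = [i for i in range(length) if i % 2 == 1 and i < length - 1]
            self.cache.append(torch.tensor(idx, dtype=torch.long, device=device))

    def get(self, length):
        # Fetch the precomputed indices instead of rebuilding the list and
        # copying it to the device on every call.
        return self.cache[length]
```

At runtime the model would then call something like `cache.get(cut_mask.shape[1])` instead of reconstructing the index list and calling `.to(device)` each time.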
# sentences: np.ndarray shape=(Bx5xL) dtype=int32
# sentence_masks: np.ndarray shape=(Bx5xL) dtype=float32
sentences, sentence_masks = self.parse_sentences(self.word2idx, mc_path, self.num_frames)
If this segment turns out to be a bottleneck, I guess we could construct a very complex pipeline in which two steps (e.g., `ResNetFeatureExtractor` and `SentenceParser`) run concurrently and their results are aggregated in the final step, `MCModel`... maybe next year?
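For the record, a rough sketch of that two-branch pipeline (the `ThreadPoolExecutor` wiring is my assumption, not a design in this PR; the three step names come from the comment above):

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(frames, mc_path, feature_extractor, sentence_parser, mc_model):
    # Run the two independent steps concurrently, then aggregate their
    # results in the final step (MCModel in the comment above).
    with ThreadPoolExecutor(max_workers=2) as pool:
        feat_future = pool.submit(feature_extractor, frames)
        sent_future = pool.submit(sentence_parser, mc_path)
        features = feat_future.result()
        sentences, sentence_masks = sent_future.result()
    return mc_model(features, sentences, sentence_masks)
```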
This PR adds a video QA model named JSFusion (Yu et al. ECCV 2018). Most of the code has been taken from @jsjason's implementation (https://github.com/jsjason/jsfusion-pytorch).
Changes specifically made for our system:
- `torch.nn.Module.register_buffer` is used if unavailable.

How to run
Although the code still has a lot of room for optimization, I'd like to parallelize the review process by asking questions from my side in the meantime.