JinhwanSul/mrsumtest

Mr. Sum: Large-scale Video Summarization Dataset and Benchmark

Mr. Sum is a large-scale video summarization dataset containing 31,892 videos selected from the YouTube-8M dataset, with reliable frame importance score labels aggregated from 50,000+ users per video.

Most Replayed Statistics for Summarization

Example 1: AC Sparta Praha - Top 10 goals, season 2013/2014

[GIFs 1 to 4: the four most viewed scenes]

The four most viewed scenes in the "AC Sparta Praha" video (Link) all show players scoring goals.

Example 2: Best bicycle kick goals in Peru

[GIFs 1 to 4: the four most viewed scenes]

The four most viewed scenes in the above video all show players scoring goals with amazing bicycle kicks. (Link)

Example 3: Neo - 'The One' | The Matrix

[GIFs 1 to 3: the three most viewed scenes]

In the most viewed scene, marked as 1 in the video, Neo is shot as soon as he meets Agent Smith. In the second most viewed scene, marked as 2, several Agent Smiths shoot at Neo, and Neo reaches out his hand to block the bullets. Lastly, in the third most viewed scene, marked as 3, Neo engages in combat with Agent Smith. (Link)

Update

  • 2023.06.07, Repository created.

Getting Started

  1. Download the YouTube-8M dataset.

  2. Download mrsum.h5 and metadata.csv and place them under the dataset folder.


Complete Mr.Sum Dataset

A complete mrsum.h5 needs the following four fields.

  1. features: Video frame features from YouTube-8M dataset.
  2. gtscore: Most replayed statistics normalized to scores between 0 and 1.
  3. change_points: Shot boundary information obtained using the Kernel Temporal Segmentation algorithm.
  4. gtsummary: Ground truth summary obtained by solving a 0/1 knapsack problem over shots.

We provide three fields, gtscore, change_points, and gtsummary, inside mrsum.h5.
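To illustrate how gtsummary relates to gtscore and change_points, here is a minimal sketch of shot selection with a 0/1 knapsack. It is not the code used to build mrsum.h5. It assumes that each shot's value is the mean of its frame scores, that each shot's weight is its length in frames, that change_points holds inclusive (start, end) frame index pairs, and that the summary budget is 15% of the video length. The function names and the `budget_ratio` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def knapsack_select(values: Sequence[float], weights: Sequence[int], capacity: int) -> List[int]:
    """Return the indices of the items that maximize total value within the capacity."""
    n = len(values)
    # best[i][c] = best total value using only the first i items with capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if weight <= c and best[i - 1][c - weight] + value > best[i][c]:
                best[i][c] = best[i - 1][c - weight] + value  # option 2: take it

    # Walk the table backwards to recover which items were taken.
    chosen = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return sorted(chosen)


def summary_from_scores(
    gtscore: Sequence[float],
    change_points: Sequence[Tuple[int, int]],
    budget_ratio: float = 0.15,  # assumed summary length budget, not taken from the README
) -> List[int]:
    """Build a binary per-frame summary by selecting whole shots."""
    n_frames = len(gtscore)
    lengths = [end - start + 1 for start, end in change_points]
    # Assumption: a shot's value is the mean importance of its frames.
    shot_scores = [
        sum(gtscore[start:end + 1]) / (end - start + 1) for start, end in change_points
    ]
    capacity = int(n_frames * budget_ratio)
    summary = [0] * n_frames
    for idx in knapsack_select(shot_scores, lengths, capacity):
        start, end = change_points[idx]
        for t in range(start, end + 1):
            summary[t] = 1
    return summary
```

For example, with four 5-frame shots and `budget_ratio=0.5`, the sketch marks the frames of the two highest-scoring shots with 1 and all other frames with 0.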

After downloading the YouTube-8M dataset, you can add the features field using

python preprocess/preprocess.py
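After preprocessing, it can be useful to check that all four fields are present. The sketch below uses h5py to walk the file and collect the names of every stored dataset, so it does not depend on how mrsum.h5 is organized internally. The helper names are hypothetical, and the check only confirms that each field appears somewhere in the file, not that every video has it.

```python
import h5py

REQUIRED_FIELDS = ("features", "gtscore", "change_points", "gtsummary")


def fields_present(h5_path):
    """Collect the leaf names of all datasets stored anywhere in the file."""
    found = set()

    def visit(name, obj):
        # name is the full path inside the file, e.g. "<group>/gtscore"
        if isinstance(obj, h5py.Dataset):
            found.add(name.split("/")[-1])

    with h5py.File(h5_path, "r") as f:
        f.visititems(visit)
    return found


def missing_fields(h5_path):
    """Return the required field names that do not appear anywhere in the file."""
    present = fields_present(h5_path)
    return [name for name in REQUIRED_FIELDS if name not in present]
```

Assuming the file was placed at dataset/mrsum.h5, `missing_fields("dataset/mrsum.h5")` should return an empty list once the features field has been added.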

Please read DATASET.md for more details about Mr.Sum.


Apply Mr.Sum on your summarization model

We provide sample code for training and evaluating summarization models on Mr.Sum.

Summarization model developers can test their own models by implementing PyTorch models under the model/networks folder.

We provide the SimpleMLP summarization model as an example.
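As a rough idea of what a model under model/networks might look like, here is a minimal PyTorch sketch of a per-frame importance scorer. It is not the repository's SimpleMLP. The class name `FrameScorer`, the hidden size, the 1024-dimensional input (the size of YouTube-8M frame-level RGB features, assumed here to be what the features field holds), and the MSE loss against gtscore are all assumptions made for illustration.

```python
import torch
from torch import nn


class FrameScorer(nn.Module):
    """Hypothetical per-frame importance scorer (not the repository's SimpleMLP)."""

    def __init__(self, feature_dim=1024, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # keep predictions in [0, 1], the same range as gtscore
        )

    def forward(self, features):
        # features: (batch, n_frames, feature_dim) -> scores: (batch, n_frames)
        return self.net(features).squeeze(-1)


# Random tensors stand in for real features and labels.
model = FrameScorer()
features = torch.randn(8, 30, 1024)  # 8 videos, 30 frames each
gtscore = torch.rand(8, 30)          # target importance scores in [0, 1]

# Assumed training objective: regress the per-frame scores with MSE.
loss = nn.functional.mse_loss(model(features), gtscore)
loss.backward()
```

The sigmoid output is a design choice that matches the 0 to 1 range of gtscore. A real model would also need to handle videos of different lengths within a batch.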

You can train your model on the Mr.Sum dataset using the command below. Modify the configuration to your taste!

python main.py --train True --batch_size 8 --epochs 50 --tag exp1

We referred to the code from PGL-SUM.


License

This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, following the YouTube-8M dataset. All Mr.Sum dataset users must comply with the YouTube Terms of Service and the YouTube API Services Terms of Service.

