A Heroku-based dashboard displaying statistics about trending YouTube videos, together with an exploratory data analysis in R.

hanrach/youtube_data_dashboard

 
 


Group 2: Exploring YouTube Statistics in Canada

Project repo

Rachel Han & Marion Nyberg

Dataset

Our dataset is Daily trending videos on YouTube.

Preliminary exploratory data analysis can be found here.

Click here for the final report.

To knit the documents above in R, run the following commands in order first:

make data/youtube_data.csv
make data/youtube_processed.csv

Dashboard App

You can find our interactive app to visualize the data here.

Usage

To do the full data analysis and produce the report, follow the instructions below.

Option 1

  1. Clone this repo: git clone https://github.com/STAT547-UBC-2019-20/group_2_youtube

  2. Install the following packages:

  • kableExtra
  • tidyverse
  • ggplot2
  • knitr
  • broom
  • here
  • glue
  • corrplot
  • docopt
  • rmarkdown
  3. Run the following scripts, in the order given, from the base group_2_youtube directory in a terminal:
# Download data from the web
Rscript scripts/load_data.R --data_url="https://raw.githubusercontent.com/hanrach/youtube_dataset/master/CAvideos.csv"

# Clean data
Rscript scripts/process_data.R --data_path="data/youtube_data.csv" --save_path="data/youtube_processed.csv"

# Create images from exploratory data analysis
Rscript scripts/eda.R --image_path="images/"

# Perform regression on data and save the model
Rscript scripts/analysis.R --data_path="data/youtube_processed.csv"

# Create a final report in HTML and PDF format
Rscript scripts/knit.R --final_report="docs/finalreport.Rmd"
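
Step 2 lists the packages but not how to install them. One possible approach, sketched here under the assumptions that Rscript is on your PATH and that the CRAN mirror https://cloud.r-project.org is acceptable, is to build a single install.packages() call from the list. The helper names r_vector and install_expr are illustrative and not part of this repo.

```shell
# Packages from step 2 of Option 1.
PACKAGES="kableExtra tidyverse ggplot2 knitr broom here glue corrplot docopt rmarkdown"

# Print the arguments as an R character vector, e.g. c("knitr","broom").
r_vector() {
  out=""
  for pkg in "$@"; do
    out="${out:+$out,}\"$pkg\""
  done
  printf 'c(%s)' "$out"
}

# Print an R expression that installs only the packages not yet installed.
# The CRAN mirror URL is an assumption; substitute your preferred mirror.
install_expr() {
  printf 'pkgs <- %s; ' "$(r_vector "$@")"
  printf 'to_install <- setdiff(pkgs, rownames(installed.packages())); '
  printf 'if (length(to_install) > 0) install.packages(to_install, repos = "https://cloud.r-project.org")'
}

# Show the expression; nothing is installed until it is passed to Rscript.
install_expr $PACKAGES
echo
```

To actually install the packages, pass the expression to R with Rscript -e "$(install_expr $PACKAGES)". Printing the expression first makes it easy to review what will be run.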

Option 2

  • Make sure you have GNU Make on your machine.
  • Make sure you are in the base directory group_2_youtube.
  • You can choose to run the following make commands in order:
# load data
make data/youtube_data.csv 

# clean data
make data/youtube_processed.csv 

# eda
make images/views_likes.png images/corr_plot.png images/num_vids_category.png images/top10_mean_views_likes.png 

# analysis
make rds/lm.rds rds/glm.rds images/lm_status_views.png images/pois_status_views.png 
		
# knit final report
make docs/finalreport.html docs/finalreport.pdf 
  • Or you can simply run make all to execute the above commands all at once.

  • You can run make clean to delete all the files in the subdirectories except the scripts.
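
Each make target above is a file, so one quick way to see how far the pipeline has progressed is to check which of those files exist. The following is a minimal sketch and not part of this repo: missing_outputs is a hypothetical helper, and the file list is copied from the make commands above.

```shell
# Files produced by the make targets listed above.
TARGETS="data/youtube_data.csv data/youtube_processed.csv
images/views_likes.png images/corr_plot.png images/num_vids_category.png
images/top10_mean_views_likes.png rds/lm.rds rds/glm.rds
images/lm_status_views.png images/pois_status_views.png
docs/finalreport.html docs/finalreport.pdf"

# Print every target that does not exist under the given base directory.
missing_outputs() {
  base="$1"
  for f in $TARGETS; do
    [ -f "$base/$f" ] || printf '%s\n' "$f"
  done
}

# Report on the current directory (expected to be group_2_youtube).
missing="$(missing_outputs .)"
if [ -z "$missing" ]; then
  echo "All pipeline outputs are present."
else
  echo "Missing outputs:"
  echo "$missing"
fi
```

Run it from the base directory after any step; the files it reports as missing correspond to the make targets that still need to be built.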

Tests

The tests check that all the dependencies are loaded. They are most useful when you run each step incrementally instead of running make all.

  • Run test_dir("tests/testthat") (a function from the testthat package) from the base directory group_2_youtube in the RStudio console. All the tests should fail at first, since the directories are clean.

  • After all the steps, all the tests should pass.
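
The same check can also be run from a terminal instead of the RStudio console. This is a minimal sketch, assuming the testthat package is installed and Rscript is on your PATH; test_expr and run_tests are illustrative names, not part of this repo.

```shell
# Print the R expression that runs every test file in a directory.
test_expr() {
  printf 'testthat::test_dir("%s")' "$1"
}

# Run the tests with Rscript, or explain why that is not possible.
run_tests() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 127
  fi
  Rscript -e "$(test_expr "$1")"
}

# Show the command that run_tests would execute from the base directory.
echo "Rscript -e '$(test_expr tests/testthat)'"
```

Calling run_tests tests/testthat from the base directory group_2_youtube runs the suite once before the first step and again after the last one.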

Dashboard proposal

Description: This app, titled “Canadian YouTube Statistics”, will allow users to explore the relationships between YouTube video categories and view, comment, and like/dislike counts. The landing page will show a bar chart of the number of videos per category. Using a dropdown list, users will be able to filter a scatter plot by video category to visualise the relationship between view count and likes/dislikes for a category of interest (e.g. ‘Music’). Likes and dislikes will be colour coded, and regression lines will show how each relates to view count. A second plot will let users filter by video category in the same way and explore the relationship between comment count and likes/dislikes, again with colour coding and regression lines. Two range sliders will also be available for both plots, allowing users to view relationships over a specific range of like/dislike counts.

Usage: Mary is a psychologist who is trying to understand how personality may correlate with a person's level of engagement on social media. To do this, she is using video category as a proxy for personality. She wants to determine whether watching a specific video category makes a viewer more or less likely to comment or press like/dislike. When Mary visits the ‘Canadian YouTube Statistics’ app, she will be able to see which video categories are the most popular and then visualise the relationships between the number of likes/dislikes, comment count, and view count. Using a dropdown, she will be able to filter the scatter plots to view the relationships specific to each video category. She will also be able to rescale the scatter plots to view the data for a specific range of like/dislike counts, e.g. the relationships for videos with fewer than 1000 likes. When she does this, she notices that the correlation between comment count and view count is highest for the ‘Entertainment’ category, so she hypothesizes that people who watch videos in this category may have a more outspoken personality.

Dashboard draft sketch

Link to Draft app

The proposal has been implemented; see Dashboard App above.

