From 93f403c91f5abf517e94834f2a5944f1b3737c6e Mon Sep 17 00:00:00 2001
From: "Luke W. Johnston"
Date: Sat, 30 Nov 2024 19:01:21 +0100
Subject: [PATCH] docs(project): revised project work to have clearer tasks
 (#35)

---
 appendix/project.qmd  | 77 ++++++++++++++++++++++++++-----------------
 preamble/syllabus.qmd |  7 +++-
 2 files changed, 53 insertions(+), 31 deletions(-)

diff --git a/appendix/project.qmd b/appendix/project.qmd
index 95f14b4..1495111 100644
--- a/appendix/project.qmd
+++ b/appendix/project.qmd
@@ -2,11 +2,20 @@
 To maximize how much you learn and how much you will retain, you as a
 group will take what you learn in the course and apply it to create a
-reproducible project. This project ...
+reproducible project within a server environment. This project ...
+
+We will have created a project folder to work in on the server, as well
+as assigned you and your teammate(s) to the project. Within your
+project, you will also be given a specific set of outputs to create (a
+figure, a table, and a basic report), based on the data we provide to
+you. The outputs will not require using all of the data given.
+
+
 During the last session of the course you will work on this assignment.
-In the last \~20 minutes of this session, the lead instructor will ...
-and re-generate your report to check that it is reproducible.
+In the last \~20 minutes of this session, the lead instructor will go
+into your projects and re-generate your report to check that it is
+reproducible.
 
 ## Specific tasks
 
@@ -16,33 +25,31 @@
 quickly start collaborating together on the project.
 
 Your specific tasks are:
 
-Sequence of steps for project:
-
-- Starting point:
-    - Learning how to identify what file storage format (e.g. csv or
-      SAS dataset) there are and knowing how convert those files into
-      more efficient formats (like Parquet or a SQL database)
-    - Give them a few server environment types, and the same data but
-      with different starting formats.
And then they figure out the
-      next steps based on that information
-    - Multiple data is big enough to prevent doing it normal way (1 Gb
-      or larger?)
-- Explaining why the original data format might not be ideal and then
-  converting the data into more efficient format
-- Identify what the desired sample is for the dataset, only select and
-  filter data they need for analysis
-- Split the data into smaller chunk to prototype code (running code on
-  all the data later)
-- Run basic analysis (descriptive statistics)... Not modeling
-- Implement some code to run with parallel processing
-- Identifying which format data or items can be downloaded, and
-  converting that to that format
-
-Assumptions:
-
-- Assume they have taken the intermediate course (need to know
-  functionals and function-based workflows), and either have read or
-  taken the advanced course or are familiar enough with targets
+1. Review the outputs we want you to create as well as the specific
+   data needed for creating them. Keep these in mind for later tasks.
+2. Identify what resources you have available in the server
+   environment, such as the number of cores and amount of memory. Use
+   this information to guide your coding.
+3. Look into the ... folder that contains the raw data you will use for
+   this project. Identify which file storage format the data is saved
+   in and convert it into a more efficient format if necessary.
+   Depending on which format it is, save the converted data into the
+   ... folder. Make use of parallel processing to speed this up (using
+   `{furrr}`).
+4. Read in a subset of the data that only contains the columns and rows
+   you need for your outputs. Randomly keep a slice of this data to use
+   for prototyping code later.
+5. Working backwards, write out a set of (empty) functions that provide
+   the sequence of steps that will create a specific output. Begin
+   filling in these functions, making sure they work before adding them
+   to the `{targets}` pipeline.
Write the functions and `{targets}`
+   pipeline so that it will run things in parallel.
+6. Incorporate the outputs into a report, and include that in the
+   `{targets}` pipeline.
+7. Comment out the line of code that randomly keeps a slice of the data
+   and then run the `{targets}` pipeline with `targets::tar_make()`.
+8. The project will be complete if you can regenerate all the outputs
+   using only the `{targets}` pipeline.
 
 ## Quick "checklist" for a good project
 
@@ -50,7 +57,17 @@ Assumptions:
 
 What we expect you to do for the group project:
 
+- Use parallel processing.
+- Use a `{targets}` pipeline.
+- Use functional programming (including creating your own functions).
+- Use an efficient file storage format.
+
 What we don't expect:
 
+- Complicated analyses.
+- Complicated figures or tables.
+- Processing that isn't specifically related to creating the assigned
+  output.
+
 Essentially, the group project is a way to reinforce what you learned
 during the course, but in a more relaxed and collaborative setting.

diff --git a/preamble/syllabus.qmd b/preamble/syllabus.qmd
index 59012d5..3c6716f 100644
--- a/preamble/syllabus.qmd
+++ b/preamble/syllabus.qmd
@@ -69,7 +69,12 @@
 To help manage expectations and develop the material for this course,
 we make a few assumptions about *who you are* as a participant in the
 course:
 
-- Assumptions
+Assumptions:
+
+- This course builds on the content of the intermediate course
+  (specifically, using functionals and function-based workflows) and
+  the advanced course (specifically, using `{targets}` to build
+  pipelines).
 
 While we have these assumptions to help focus the content of the
 course, if you have an interest in learning R but don't fit any of the
 above
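As an illustration of how the revised tasks fit together, a `_targets.R` file along these lines could tie the pieces into one pipeline. This is a minimal sketch, not the course's solution: the folder path and all helper function names (`convert_to_parquet()`, `select_analysis_data()`, and so on) are hypothetical placeholders, since the patch deliberately elides the real project folder names, and `{crew}` is only one way to get `{targets}` to run things in parallel.

```r
# _targets.R -- a minimal sketch of the pipeline described in the tasks.
# All paths and helper function names are hypothetical placeholders;
# use the folders and outputs assigned to your project.
library(targets)
library(tarchetypes)

tar_option_set(
  packages = c("arrow", "dplyr"),
  # Run independent targets on parallel workers via {crew}.
  controller = crew::crew_controller_local(workers = 2)
)

# Load the functions you wrote (task 5), e.g. from R/functions.R.
tar_source()

list(
  # Task 3: convert the raw files to a more efficient format; inside
  # convert_to_parquet(), furrr::future_map() could handle the files
  # in parallel.
  tar_target(raw_files, list.files("data/raw", full.names = TRUE)),
  tar_target(parquet_files, convert_to_parquet(raw_files)),
  # Task 4: read only the columns and rows needed for the outputs.
  tar_target(analysis_data, select_analysis_data(parquet_files)),
  # Tasks 5-6: build each assigned output, then the report using them.
  tar_target(descriptives, calculate_descriptives(analysis_data)),
  tar_target(figure, plot_figure(analysis_data)),
  tar_quarto(report, "doc/report.qmd")
)
```

Running `targets::tar_make()` (task 7) would then rebuild only the targets whose code or upstream data changed, which is what makes the final reproducibility check in task 8 cheap to verify.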