
Commit

fixes
StoneT2000 committed Apr 25, 2024
1 parent 4f6dc75 commit 9b06ff0
Showing 2 changed files with 19 additions and 19 deletions.
14 changes: 7 additions & 7 deletions lectures/html/L6_off_policy.html
@@ -166,7 +166,7 @@ <h1 class="nt">TD-based Q Function Learning</h1>
<li>In continuous deterministic Q-learning, the update target becomes $r+\gamma Q(s', \pi_{\phi}(s'))$.</li>
<li>In the literature, the policy network is also known as the "actor" and the value network as the "critic".</li>
</ul>
<img src="./L16/ddpg_networks.png" width="80%"/>
<img src="./L6/ddpg_networks.png" width="80%"/>
</div>
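A minimal sketch of how the continuous deterministic update target $r+\gamma Q(s', \pi_{\phi}(s'))$ above could be computed in PyTorch; the network sizes and the names actor/critic/obs_dim/act_dim are illustrative assumptions, and the target copies of both networks used in practice are omitted for brevity.

    # Sketch: DDPG-style critic target r + gamma * Q(s', pi_phi(s')).
    # Architectures and dimensions are illustrative, not the course's code.
    import torch
    import torch.nn as nn

    obs_dim, act_dim, gamma = 8, 2, 0.99
    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
    critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def critic_target(r, s_next, done):
        # No gradient flows through the target.
        with torch.no_grad():
            a_next = actor(s_next)                                # pi_phi(s')
            q_next = critic(torch.cat([s_next, a_next], dim=-1))  # Q(s', pi_phi(s'))
            return r + gamma * (1.0 - done) * q_next.squeeze(-1)

    s_next, r, done = torch.randn(32, obs_dim), torch.randn(32), torch.zeros(32)
    y = critic_target(r, s_next, done)  # regression target for the critic ("actor" = policy net)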

<!-- ################################################################### -->
@@ -341,10 +341,10 @@ <h1 class="nt"> Issue: Rare Beneficial Samples <br/>in the Replay Buffer </h1>
<li> Examples: Montezuma's Revenge, Blind Cliffwalk, any long-horizon sparse-reward problem </li>
<div class="row">
<div class="column" style="flex: 10%">
<img src="./L16/MR.png" width="75%"></img>
<img src="./L6/MR.png" width="75%"></img>
</div>
<div class="column" style="flex: 10%">
<img src="./L16/Blind_Cliffwalk.png" width="100%"></img>
<img src="./L6/Blind_Cliffwalk.png" width="100%"></img>
</div>
</div>
</ul>
@@ -358,7 +358,7 @@ <h1 class="nt"> Blind Cliffwalk </h1>
<li> Episode is terminated whenever the agent takes the $\color{red}{\text{wrong}}$ action. </li>
<li> Agent will get reward $1$ after taking $n$ $\color{black}{\text{right}}$ actions. </li>
<div style=margin-top:10px>
<img src="./L16/Blind_Cliffwalk.png" width="75%"/>
<img src="./L6/Blind_Cliffwalk.png" width="75%"/>
</div>
</ul>
<div class="credit"><a href="https://arxiv.org/pdf/1511.05952.pdf"> Schaul et. al, Prioritized experience replay (ICLR 2016)</a> </div>
@@ -373,7 +373,7 @@ <h1 class="nt"> Analysis with Q-Learning</h1>
current state. </li>

<div style=margin-top:10px>
<img src="./L16/Blind_Cliffwalk_QL.png" width="35%"/>
<img src="./L6/Blind_Cliffwalk_QL.png" width="35%"/>
</div>
</ul>
<div class="credit"><a href="https://arxiv.org/pdf/1511.05952.pdf"> Schaul et. al, Prioritized experience replay (ICLR 2016)</a> </div>
@@ -485,7 +485,7 @@ <h1 class="nt"> Value Network with Discrete Distribution </h1>
<li> $Q_\th(s, a) = \sum_i p_{\th, i}(s, a) z_i$. </li>
<li> Update rules for $Q$, where $x_t$ is the state at time step $t$.
<div style=margin-top:10px>
<img src="./L16/DRL.png" width="42%"/>
<img src="./L6/DRL.png" width="42%"/>
</div>
</li>
</ul>
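A short sketch of reading the Q-value off a categorical value distribution over fixed atoms (C51-style), i.e. the expectation $\sum_i p_i z_i$ from the bullet above; the atom count, value range, and the placeholder network output are illustrative assumptions.

    # Sketch: Q(s, a) as the expectation of a categorical distribution over fixed atoms z_i.
    import numpy as np

    n_atoms, v_min, v_max = 51, -10.0, 10.0
    z = np.linspace(v_min, v_max, n_atoms)             # support z_1, ..., z_N

    def q_from_distribution(logits):
        # logits: (n_actions, n_atoms) unnormalized scores for one state
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)              # p_{theta,i}(s, a)
        return p @ z                                    # expected return per action

    logits = np.random.randn(4, n_atoms)                # placeholder network output, 4 actions
    q_values = q_from_distribution(logits)
    greedy_action = int(np.argmax(q_values))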
@@ -573,7 +573,7 @@ <h1 class="nt"> Ablation study of tricks in Rainbow </h1>
<ul>
<li> Prioritized replay, multi-step learning, and distributional RL are the most important tricks in Rainbow. </li>
<div style=margin-top:10px>
<img src="./L16/Rainbow.png" width="42%"/>
<img src="./L6/Rainbow.png" width="42%"/>
</div>
</ul>
<div class="credit"><a href="https://arxiv.org/pdf/1710.02298.pdf">Hessel et. al, Rainbow: Combining Improvements in Deep Reinforcement Learning (AAAI 2018)</a> </div>
24 changes: 12 additions & 12 deletions lectures/html/L7_exploration.html
@@ -67,7 +67,7 @@ <h1 class="nt">Motivation to Explore vs Exploit</h1>
<li>A new restaurant is always a risk (unknown food quality, service, etc.)</li>
<li>But without going to the new restaurant you never know! How do we balance this?</li>
</ul>
<img src="./SP24_L7/exploration_vs_exploitation.png" alt="" width="50%">
<img src="./L7/exploration_vs_exploitation.png" alt="" width="50%">
</div>
<div class="step slide">
<h1 class="nt">Why Exploration is Difficult</h1>
@@ -80,15 +80,15 @@ <h1 class="nt">Why Exploration is Difficult</h1>
</ul>
</li>
<li>Even exploration in a low-dimensional space can be tricky when there are "alleys": there is only a low probability of passing through the small gaps needed to explore other states:</li>
<img src="./SP24_L7/simple_2d_map.png" alt="" width="30%" />
<img src="./L7/simple_2d_map.png" alt="" width="30%" />
</ul>
</div>
<div class="step slide">
<h1 class="nt">Exploration to Escape Local Minima in Reward</h1>
<ul>
<li>Suppose your dense reward for the environment below is the negative Euclidean distance to the flag. The return-maximizing sequence of actions is to go through the small gap and reach the flag (the global optimum).</li>
<li>But you will never know to do that unless you explore, and with this dense reward function your trained agent will likely headbutt into the blue wall (a local optimum).</li>
<img src="./SP24_L7/simple_2d_map_headbutt.png" alt="" width="30%" />
<img src="./L7/simple_2d_map_headbutt.png" alt="" width="30%" />
</ul>
</div>
<div class="step slide">
@@ -100,7 +100,7 @@ <h1 class="nt">Knowing what to explore is critical</h1>
</ul>
<ul class="substep">
<li >The agent will become a couch potato and stare at the TV all day.</li>
<img src="./SP24_L7/the-noisy-TV-problem.gif" alt="" width="100%">
<img src="./L7/the-noisy-TV-problem.gif" alt="" width="100%">
</ul>
</div>

@@ -145,8 +145,8 @@ <h1 class="nt">Multi-Armed Bandits</h1>
<li>In this game you have a few slot machines (arms) and can choose one to pull. You then receive a reward sampled from that arm's unknown distribution.</li>
</ul>
<div style="display: flex">
<img src="./SP24_L7/slots.png" alt="" width="30%" style="display:inline-block">
<img src="./SP24_L7/slot-dist.png" alt="" width="30%" style="display:inline-block">
<img src="./L7/slots.png" alt="" width="30%" style="display:inline-block">
<img src="./L7/slot-dist.png" alt="" width="30%" style="display:inline-block">
</div>
</div>
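A small simulation of the slot-machine game just described: each arm has an unknown reward distribution, and an $\epsilon$-greedy player trades off exploring arms against exploiting its current best estimate. The arm means, $\epsilon$, and horizon are illustrative assumptions.

    # Sketch: multi-armed bandit with epsilon-greedy action selection.
    import numpy as np

    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.5, 0.7])    # unknown to the player
    K, T, epsilon = len(true_means), 5000, 0.1

    counts, estimates, total_reward = np.zeros(K), np.zeros(K), 0.0
    for t in range(T):
        if rng.random() < epsilon:
            a = rng.integers(K)                # explore: pull a random arm
        else:
            a = int(np.argmax(estimates))      # exploit: pull the best-looking arm
        r = rng.normal(true_means[a], 1.0)     # reward sampled from the unknown distribution
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]   # incremental mean update
        total_reward += r

    regret_estimate = T * true_means.max() - total_reward  # shortfall vs. always pulling the best arm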
<!-- ################################################################### -->
@@ -184,7 +184,7 @@ <h1 class="nt">Multi-Armed Bandits</h1>
<li>
Goal is to maximize cumulative reward $\sum_{t=1}^T r_t$
</li>
<img src="./SP24_L7/slot-dist.png" alt="" width="30%">
<img src="./L7/slot-dist.png" alt="" width="30%">
</li>
</ul>
</div>
@@ -275,7 +275,7 @@ <h1 class="nt">Total Regret Decomposition</h1>
<!-- ################################################################### -->
<div class="step slide">
<h1 class="et">Desirable Total Regret Behavior</h1>
<img src="./SP24_L7/regret_as_function_of_time.png" width="1000" height="650"></img>
<img src="./L7/regret_as_function_of_time.png" width="1000" height="650"></img>

<ul>
<li>What can you infer from this figure?</li>
@@ -331,7 +331,7 @@ <h1 class="vt">Decaying $\epsilon$-Greedy Algorithm</h1>
<div class="step slide">
<h1 class="et">The Principle of Optimism in the Face of Uncertainty</h1>
<div class="row" style="flex: 0%">
<img src="./SP24_L7/optimism_before.png" width="900" height="600" />
<img src="./L7/optimism_before.png" width="900" height="600" />
</div>
<ul>

@@ -568,7 +568,7 @@ <h1 class="nt">Counting via Hashing: Autoencoders</h1>
<li>$\mathcal{L}(\{s_n\}_{n=1}^N) = \underbrace{-\frac{1}{N} \sum_{n=1}^N \log p(s_n)}_\text{reconstruction loss} + \underbrace{\frac{1}{N} \frac{\lambda}{K} \sum_{n=1}^N\sum_{i=1}^K \min \big \{ (1-b_i(s_n))^2, b_i(s_n)^2 \big\}}_\text{sigmoid activation being closer to binary}$</li>
<!-- <li>Intuition for autoencoder: Effectively mapping high dimensional state to lower dimensions (a code), and then decoding that into a binary </li> -->

<img src="./SP24_L7/autoencoder.png" width="30%">
<img src="./L7/autoencoder.png" width="30%">
</ul>
<div class="credit"><a href="https://arxiv.org/abs/1611.04717">Tang et. al, # Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning</a></div>
<div class="credit"><a href="https://lilianweng.github.io/posts/2020-06-07-exploration-drl/">Lilliang Weng, Exploration Strategies in Deep Reinforcement Learning</a></div>
@@ -643,7 +643,7 @@ <h1 class="nt">Quick Refresher on GMM</h1>
<li>Training a Gaussian Mixture Model initializes $k$ different Gaussians and fits them to the data. Suppose, for example, that our state space has 2 dimensions, as below.</li>
<li>The GMM is our density model and generates the probability $p_t(s)$ of seeing some input state.</li>
<li>It is typically optimized via Expectation Maximization (EM).</li>
<img src="./SP24_L7/gaussian_mixture_model.gif"/>
<img src="./L7/gaussian_mixture_model.gif"/>

</ul>
<div class="ack">Gif from <a href="https://brilliant.org/wiki/gaussian-mixture-model/">https://brilliant.org/wiki/gaussian-mixture-model/</a>, which is also a easy resource to learn how EM works.</div>
@@ -884,7 +884,7 @@ <h1 class="nt">Random Network Distillation Performance</h1>
</div>
<div class="step slide">
<h1 class="nt">Random Network Distillation Performance</h1>
<img src="./SP24_L7/rnd_fig.png" width="65%"/>
<img src="./L7/rnd_fig.png" width="65%"/>
<div class="credit"><a href="https://arxiv.org/pdf/1606.01868.pdf">Burda et. al, Random Network Distillation</a></div>
</div>
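The slides here only report performance, so as a reminder of the mechanism behind the figure: Random Network Distillation trains a predictor network to match a fixed, randomly initialized target network, and uses the prediction error on a state as the intrinsic (novelty) reward. The sketch below follows that idea; architectures, dimensions, and the learning rate are illustrative assumptions.

    # Sketch: Random Network Distillation intrinsic reward.
    import torch
    import torch.nn as nn

    obs_dim, feat_dim = 8, 32
    target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
    predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
    for p in target.parameters():
        p.requires_grad_(False)                 # the random target network is never trained

    opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

    def intrinsic_reward_and_update(states):
        with torch.no_grad():
            phi = target(states)
        pred = predictor(states)
        err = ((pred - phi) ** 2).mean(dim=-1)  # per-state prediction error
        opt.zero_grad()
        err.mean().backward()                   # distill the random target into the predictor
        opt.step()
        return err.detach()                     # novel states -> large error -> large bonus

    bonus = intrinsic_reward_and_update(torch.randn(16, obs_dim))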

