
Commit

Built site for gh-pages
Soma Dhavala committed Sep 12, 2024
1 parent 4f55c06 commit 6a345d0
Showing 4 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
3ad1f044
a483289d
2 changes: 1 addition & 1 deletion lectures/w07-l01.html
@@ -429,7 +429,7 @@ <h3 class="anchored" data-anchor-id="how-many">How many?</h3>
<section id="a-statistical-approach" class="level4">
<h4 class="anchored" data-anchor-id="a-statistical-approach">A Statistical Approach</h4>
<p>Let us simplify and consider a <em>regression problem</em>. We are trying to learn a function <span class="math inline">\(f: [0,1] \rightarrow R\)</span>. Imagine you are fitting a decision tree to approximate this function. A decision tree partitions the input space, and in each partition certain statistics, such as the mean and quantiles, are computed. For a given instance, the prediction is, for example, the mean of the responses of all examples belonging to that partition. So, we can divide or cluster or partition the training data into <span class="math inline">\(K\)</span> subsets and compute some statistic in each of these subsets. If unlabelled data is available, running a clustering algorithm will give an idea about <span class="math inline">\(K\)</span>. If we assume that the labels (responses) of the k-th partition, denoted by <span class="math inline">\(y_k\)</span>, follow <span class="math inline">\(N(\mu_k, \sigma^2)\)</span>, we can estimate the <span class="math inline">\((1-\alpha)\)</span> level prediction interval (PI) for a new response from the k-th partition as <span class="math display">\[\bar{y}_k \pm z_{1-\frac{\alpha}{2}}\sigma \sqrt{1+\frac{1}{N}}\]</span></p>
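As a concrete illustration of the partition-then-estimate idea, here is a minimal sketch in Python, assuming scikit-learn and SciPy are available. The synthetic data, the fixed choice of K, and the noise level are illustrative assumptions, not values from the lecture:

```python
# Minimal sketch: partition the inputs by clustering, then compute a
# per-partition mean and PI half-width. All numbers below are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 1))                       # inputs on [0, 1]
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, 500)  # noisy responses

# Cluster the inputs to get K partitions (K fixed here for brevity;
# in practice one would scan K, e.g. with silhouette scores).
K = 5
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X)

alpha, sigma = 0.05, 0.1
z = norm.ppf(1 - alpha / 2)
for k in range(K):
    yk = y[labels == k]
    n = len(yk)
    half_width = z * sigma * np.sqrt(1 + 1 / n)  # PI half-width in partition k
    print(f"partition {k}: n={n}, mean={yk.mean():.3f}, PI half-width={half_width:.3f}")
```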
<p>where <span class="math inline">\(\bar{y}_k\)</span> is the sample mean, <span class="math inline">\(N\)</span> is the sample size, <span class="math inline">\(\sigma^2\)</span> is the noise variance, and <span class="math inline">\(\alpha\)</span> controls the confidence level (or type-1 error of the corresponding hypothesis test). So the “design inputs” needed to arrive at a sample size are <span class="math inline">\(\alpha\)</span> and <span class="math inline">\(\sigma^2\)</span>. Sometimes, the precision needed for the estimate can be asserted in terms of the width of the interval (PIW). In this case, PIW is given as <span class="math inline">\(PIW = 2z_{1-\frac{\alpha}{2}}\sigma \sqrt{1+\frac{1}{N}}\)</span>. For large <span class="math inline">\(N\)</span>, <span class="math inline">\(PIW \approx 2z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{N}}\)</span>. Now, we can express sample size as a function of <span class="math inline">\(PIW\)</span> and <span class="math inline">\(\alpha\)</span> as <span class="math inline">\(N \approx \left(\frac{2\sigma z_{1-\frac{\alpha}{2}}}{PIW} \right)^2\)</span>. If there are <span class="math inline">\(K\)</span> partitions, we need to estimate that many <span class="math inline">\(\mu_k\)</span>s. So, the total sample size will be <span class="math inline">\(NK\)</span>, assuming all partitions have the same variance; if not, it is not hard to update the formula. In some ways, the model complexity is captured by <span class="math inline">\(K\)</span>. In general, one does not know these numbers in advance and has to make an educated guess based on domain knowledge and refine the design inputs as the data collection drive is set in motion.</p>
<p>where <span class="math inline">\(\bar{y}_k\)</span> is the sample mean, <span class="math inline">\(N\)</span> is the sample size, <span class="math inline">\(\sigma^2\)</span> is the noise variance, and <span class="math inline">\(\alpha\)</span> controls the confidence level (or type-1 error of the corresponding hypothesis test). So the “design inputs” needed to arrive at a sample size are <span class="math inline">\(\alpha\)</span> and <span class="math inline">\(\sigma^2\)</span>. Sometimes, the precision needed for the estimate can be asserted in terms of the width of the interval (PIW). In this case, PIW is given as <span class="math inline">\(PIW = 2z_{1-\frac{\alpha}{2}}\sigma \sqrt{1+\frac{1}{N}}\)</span>. Now, we can express sample size as a function of <span class="math inline">\(PIW\)</span> and <span class="math inline">\(\alpha\)</span> as <span class="math inline">\(N = \left( \left(\frac{PIW}{2z_{1-\frac{\alpha}{2}}\sigma}\right)^{2}-1 \right)^{-1}\)</span>, which is valid when <span class="math inline">\(PIW\)</span> exceeds the limiting width <span class="math inline">\(2z_{1-\frac{\alpha}{2}}\sigma\)</span>. If there are <span class="math inline">\(K\)</span> partitions, we need to estimate that many <span class="math inline">\(\mu_k\)</span>s. So, the total sample size will be <span class="math inline">\(NK\)</span>, assuming all partitions have the same variance; if not, it is not hard to update the formula. In some ways, the model complexity is captured by <span class="math inline">\(K\)</span>. In general, one does not know these numbers in advance and has to make an educated guess based on domain knowledge and refine the design inputs as the data collection drive is set in motion.</p>
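The sample-size formula is straightforward to operationalize. A minimal sketch, assuming SciPy for the normal quantile; the specific design inputs (alpha, sigma, PIW, K) below are made-up values for illustration:

```python
# Invert PIW = 2 * z * sigma * sqrt(1 + 1/N) for N, per the formula above.
import numpy as np
from scipy.stats import norm

def samples_per_partition(piw: float, alpha: float, sigma: float) -> int:
    z = norm.ppf(1 - alpha / 2)
    ratio = (piw / (2 * z * sigma)) ** 2   # equals 1 + 1/N
    if ratio <= 1:
        raise ValueError("PIW must exceed 2*z*sigma, the width as N -> infinity")
    return int(np.ceil(1 / (ratio - 1)))

# Illustrative design inputs: 95% interval, noise sd 0.1, desired width 0.45,
# and K = 5 partitions of equal variance, so the total budget is N * K.
K = 5
n = samples_per_partition(piw=0.45, alpha=0.05, sigma=0.1)
print(f"N per partition: {n}, total: {n * K}")   # -> N per partition: 4, total: 20
```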
<p>What about the <strong>classification</strong> problem?</p>
<p>Assume it is a binary classification problem. The approach is still identical. Even in the classification setting, estimating the mean and taking the <em>argmax</em> to predict the label of the partition is useful and applicable, except that the <span class="math inline">\(PI\)</span> formula needs to be updated. For other types of <em>Tasks</em>, a suitable estimate of the target variable has to be chosen, its PI derived, and the PI used to get an estimate of the sample size.</p>
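The lecture does not spell out the updated formula for the binary case. A common choice, sketched here as an assumption rather than the lecture's prescription, is the normal-approximation interval for the per-partition class proportion, whose half-width is z * sqrt(p(1-p)/N); inverting it for N gives a worst case at p = 0.5:

```python
# Sketch: sample size so the normal-approximation interval for a class
# proportion has a requested total width. p = 0.5 is the worst case.
import numpy as np
from scipy.stats import norm

def samples_for_proportion(width: float, alpha: float, p: float = 0.5) -> int:
    z = norm.ppf(1 - alpha / 2)
    return int(np.ceil(p * (1 - p) * (2 * z / width) ** 2))

# Illustrative worst-case design: 95% interval no wider than 0.1 per partition.
print(samples_for_proportion(width=0.1, alpha=0.05))  # -> 385
```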
</section>