diff --git a/_freeze/21-inference-paired-means/execute-results/html.json b/_freeze/21-inference-paired-means/execute-results/html.json index c1c00e26..8376c1bc 100644 --- a/_freeze/21-inference-paired-means/execute-results/html.json +++ b/_freeze/21-inference-paired-means/execute-results/html.json @@ -2,7 +2,7 @@ "hash": "015d419d2712b0ed5e2f4fb1b321bb29", "result": { "engine": "knitr", - "markdown": "# Inference for comparing paired means {#sec-inference-paired-means}\n\n\n\n\n\n::: {.chapterintro data-latex=\"\"}\nIn [Chapter -@sec-inference-two-means] analysis was done to compare the average population value across two different groups.\nRecall that one of the important conditions in doing a two-sample analysis is that the two groups are independent.\nHere, independence across groups means that knowledge of the observations in one group does not change what we would expect to happen in the other group.\nBut what happens if the groups are **dependent**?\nSometimes dependency is not something that can be addressed through a statistical method.\nHowever, a particular dependency, **pairing**, can be modeled quite effectively using many of the same tools we have already covered in this text.\n:::\n\nPaired data represent a particular type of experimental structure where the analysis is somewhat akin to a one-sample analysis (see [Chapter -@sec-inference-one-mean]) but has other features that resemble a two-sample analysis (see [Chapter -@sec-inference-two-means]).\nAs with a two-sample analysis, quantitative measurements are made on each of two different levels of the explanatory variable.\nHowever, because the observational unit is **paired** across the two groups, the two measurements are subtracted such that only the difference is retained.\n@tbl-pairedexamples presents some examples of studies where paired designs were implemented.\n\n\n::: {#tbl-pairedexamples .cell tbl-cap='Examples of studies where a paired design is used to measure the\ndifference in the 
measurement over two conditions.'}\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Observational unit Comparison groups Measurement Value of interest
Car Smooth Turn vs Quick Spin amount of tire tread after 1,000 miles difference in tread
Textbook UCLA vs Amazon price of new text difference in price
Individual person Pre-course vs Post-course exam score difference in score
\n\n`````\n:::\n:::\n\n\n::: {.important data-latex=\"\"}\n**Paired data.**\n\nTwo sets of observations are *paired* if each observation in one set has a special correspondence or connection with exactly one observation in the other dataset.\n:::\n\nIt is worth noting that if mathematical modeling is chosen as the analysis tool, paired data inference on the difference in measurements will be identical to the one-sample mathematical techniques described in [Chapter -@sec-inference-one-mean].\nHowever, recall from [Chapter -@sec-inference-one-mean] that with pure one-sample data, the computational tools for hypothesis testing are not easy to implement and were not presented (although the bootstrap was presented as a computational approach for constructing a one sample confidence interval).\nWith paired data, the randomization test fits nicely with the structure of the experiment and is presented here.\n\n\n\n\n\n## Randomization test for the mean paired difference\n\nConsider an experiment done to measure whether tire brand Smooth Turn or tire brand Quick Spin has longer tread wear (in cm).\nThat is, after 1,000 miles on a car, which brand of tires has more tread, on average?\n\n### Observed data\n\nThe observed data represent 25 tread measurements (in cm) taken on 25 tires of Smooth Turn and 25 tires of Quick Spin.\nThe study used a total of 25 cars, so on each car, one tire was of Smooth Turn and one was of Quick Spin.\nThe mean tread for the Quick Spin tires was 0.308 cm and the mean tread for the Smooth Turn tires was 0.310 cm.\n@fig-tiredata presents the observed data, calculations on tread remaining (in cm).\n\nThe Smooth Turn manufacturer looks at the box plots and says:\n\n> *Clearly the tread on Smooth Turn tires is higher, on average, than the tread on Quick Spin tires after 1,000 miles of driving.*\n\nThe Quick Spin manufacturer is skeptical and retorts:\n\n> *But with only 25 cars, it seems that the variability in road conditions (sometimes one tire hits 
a pothole, etc.) could be what leads to the small difference in average tread amount.*\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Boxplots of the tire tread data (in cm) and the brand of tire from which\nthe original measurements came.\n](21-inference-paired-means_files/figure-html/fig-tiredata-1.png){#fig-tiredata fig-alt='Boxplots of the amount of tire tread for each of the two\nbrands of tires with data values superimposed over the boxplots.\nEach superimposed dot represents a car that drove with both types\nof tires. A grey line connects each car across the two boxplots\nindicating that Smooth Turn has more tire wear than Quick Spin.' width=90%}\n:::\n:::\n\n\nWe'd like to be able to systematically distinguish between what the Smooth Turn manufacturer sees in the plot and what the Quick Spin manufacturer sees in the plot.\nFortunately for us, we have an excellent way to simulate the natural variability (from road conditions, etc.) that can lead to tires being worn at different rates.\n\n### Variability of the statistic\n\nA randomization test will identify whether the differences seen in the box plot of the original data in @fig-tiredata could have happened just by chance variability.\nAs before, we will simulate the variability in the study under the assumption that the null hypothesis is true.\nIn this study, the null hypothesis is that average tire tread wear is the same across Smooth Turn and Quick Spin tires.\n\n- $H_0: \\mu_{diff} = 0,$ the average tread wear is the same for the two tire brands.\n- $H_A: \\mu_{diff} \\ne 0,$ the average tread wear is different across the two tire brands.\n\nWhen observations are paired, the randomization process randomly assigns the tire brand to each of the observed tread values.\nNote that in the randomization test for the two-sample mean setting (see @sec-rand2mean) the explanatory variable was *also* randomly assigned to the responses.\nThe change in the paired setting, however, is that the assignment happens 
*within* an observational unit (here, a car).\nRemember, if the null hypothesis is true, it will not matter which brand is put on which tire because the overall tread wear will be the same across pairs.\n\n@fig-tiredata4 and @fig-tiredata5 show that the random assignment of group (tire brand) happens within a single car.\nThat is, every single car will still have one tire of each type.\nIn the first randomization, it just so happens that the 4th car's tire brands were swapped and the 5th car's tire brands were not swapped.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![\"The 4th car: the tire brand was randomly permuted, and in the randomization\ncalculation, the measurements (in cm) ended up in different groups.\"\n](21-inference-paired-means_files/figure-html/fig-tiredata4-1.png){#fig-tiredata4 fig-alt='Line plot connecting the tread for the 4th car in the dataset. The first\nplot is the original data and the second plot is the permuted data where\nthe groups happened to get permuted randomly.' width=90%}\n:::\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n![\"The 5th car: the tire brand was randomly permuted to stay the same! In the\nrandomization calculation, the measurements (in cm) ended up in the\noriginal groups.\"\n](21-inference-paired-means_files/figure-html/fig-tiredata5-1.png){#fig-tiredata5 fig-alt='Line plot connecting the tread for the 5th car in the dataset. The first\nplot is the original data and the second plot is the permuted data where\nthe groups happened to stay connected to the original measurements.' width=90%}\n:::\n:::\n\n\nWe can put the shuffled assignments for all the cars into one plot as seen in @fig-tiredataPerm.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![\"Tire tread data (in cm) with: the brand of tire from which the original\nmeasurements came (left) and shuffled brand assignment (right). 
As\nevidenced by the colors, some of the cars kept their original tire\nassignments and some cars swapped the tire assignments.\"\n](21-inference-paired-means_files/figure-html/fig-tiredataPerm-1.png){#fig-tiredataPerm fig-alt='Line plot connecting the tread for all of the cars in the\ndataset. The first plot is the original data and the second plot is\nthe permuted data where some of the brands are connected to the\noriginal tread measurements and some of the brands have been swapped\nacross the two tread measurements, within a car.\n' width=100%}\n:::\n:::\n\n\nThe next step in the randomization test is to sort the brands so that the assigned brand value on the x-axis aligns with the assigned group from the randomization.\nSee @fig-tiredataPermSort, which has the same randomized groups (right image in @fig-tiredataPerm and left image in @fig-tiredataPermSort) as seen previously.\nHowever, the right image in @fig-tiredataPermSort sorts the randomized groups so that we can measure the variability across groups as compared to the variability within groups.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Tire tread from (left) randomized brand assignment, (right) sorted\nby randomized brand.\n](21-inference-paired-means_files/figure-html/fig-tiredataPermSort-1.png){#fig-tiredataPermSort fig-alt='Scatterplot of tread on the y axis and tire brand on the\nx axis. The left panel has the observed tread values matched to\nthe original tire brand for the x axis location. The\nobservations are colored based on the permuted tire brand\nassigned in the randomization. The right panel has the tread\nvalues matched to the permuted tire brand, so some observations\nhave swapped orientation. The observations are also colored by\nthe permuted tire brand assigned in the randomization. In the\nright panel, the two tire brands seem equivalent with respect\nto tire wear.' 
width=100%}\n:::\n:::\n\n\n@fig-tiredatarand1 presents a second randomization of the data.\nNotice how the two observations from the same car are linked by a grey line; some of the tread values have been randomly assigned to the opposite tire brand from their original one (while some are still connected to their original tire brands).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![A second randomization where the brand is randomly swapped (or not)\nacross the two tread wear measurements (in cm) from the same car.\n](21-inference-paired-means_files/figure-html/fig-tiredatarand1-1.png){#fig-tiredatarand1 fig-alt='Boxplots and scatterplot with tire brand on the x-axis and\ntread on the y-axis. The points are assigned to the x-axis brand\ngiven by the permutation, but the plot differs from previous figures\nin that it is a second permutation of the brands. Again, the two\npermuted brands seem equivalent with respect to tire wear.' width=90%}\n:::\n:::\n\n\n@fig-tiredatarand2 presents yet another randomization of the data.\nAgain, the same observations are linked by a grey line, and some of the tread values have been randomly assigned to the opposite tire brand from their original one (while some are still connected to their original tire brands).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![An additional randomization where the brand is randomly swapped (or not)\nacross the two tread wear measurements (in cm) from the same car.\n](21-inference-paired-means_files/figure-html/fig-tiredatarand2-1.png){#fig-tiredatarand2 fig-alt='Boxplots and scatterplot with tire brand on the x-axis and\ntread on the y-axis. The points are assigned to the x-axis brand\ngiven by the permutation, but the plot differs from previous figures\nin that it is an additional permutation of the brands. The additional\npermutation demonstrates that the boxplots\ncontinue to change for each permutation yet the tire tread is\nequivalent across the permuted groups.' 
width=90%}\n:::\n:::\n\n\n### Observed statistic vs. null statistics\n\nBy repeating the randomization process, we can create a distribution of the average of the differences in tire treads, as seen in @fig-pair-randomize.\nAs expected (because the differences were generated under the null hypothesis), the center of the histogram is zero.\nA line has been drawn at the observed difference, which is well outside the majority of the null differences simulated from natural variability by mixing up which tire received Smooth Turn and which received Quick Spin.\nBecause the observed statistic is so far away from the natural variability of the randomized differences, we are convinced that there is a difference between Smooth Turn and Quick Spin.\nOur conclusion is that the extra amount of average tire tread in Smooth Turn is due to more than just natural variability: we reject $H_0$ and conclude that $\\mu_{ST} \\ne \\mu_{QS}.$\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n![Histogram of 1,000 mean differences with tire brand randomly assigned across\nthe two tread measurements (in cm) per pair.\n](21-inference-paired-means_files/figure-html/fig-pair-randomize-1.png){#fig-pair-randomize fig-alt='Histogram of the average difference in tire wear, Quick Spin\nminus Smooth Turn over 1000 different permutations. The histogram is\ncentered at 0 and spreads to approximately -0.0025 and 0.0025. A\nred line at approximately 0.002 indicates the observed difference in\naverage tread from the original data.' 
width=90%}\n:::\n:::\n\n\n## Bootstrap confidence interval for the mean paired difference\n\nFor both the bootstrap and the mathematical models applied to paired data, the analysis is virtually identical to the one-sample approach given in [Chapter -@sec-inference-one-mean].\nThe key to working with paired data (for bootstrapping and mathematical approaches) is to consider the measurement of interest to be the difference in measured values across the pair of observations.\n\n\n\n\n\n### Observed data\n\nIn an earlier edition of this textbook, we found that Amazon prices were, on average, lower than those of the UCLA Bookstore for UCLA courses in 2010.\nIt's been several years, and many stores have adapted to the online market, so we wondered, how is the UCLA Bookstore doing today?\n\nWe sampled 201 UCLA courses.\nOf those, 68 required books could be found on Amazon.\nA portion of the dataset from these courses is shown in @tbl-textbooksDF, where prices are in US dollars.\n\n::: {.data data-latex=\"\"}\nThe [`ucla_textbooks_f18`](http://openintrostat.github.io/openintro/reference/ucla_textbooks_f18.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n:::\n\n\n::: {#tbl-textbooksDF .cell tbl-cap='Four cases from the `ucla_textbooks_f18` dataset.'}\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
subject course_num bookstore_new amazon_new price_diff
American Indian Studies M10 48.0 47.5 0.52
Anthropology 2 14.3 13.6 0.71
Arts and Architecture 10 13.5 12.5 0.97
Asian M60W 49.3 55.0 -5.69
\n\n`````\n:::\n:::\n\n\n\\index{paired data}\n\nEach textbook has two corresponding prices in the dataset: one for the UCLA Bookstore and one for Amazon.\nWhen two sets of observations have this special correspondence, they are said to be **paired**.\n\n### Variability of the statistic\n\nFollowing the example of bootstrapping the one-sample statistic, the observed *differences* can be bootstrapped in order to understand the variability of the average difference from sample to sample.\nRemember, the differences act as a single value to bootstrap.\nThat is, the original dataset would include the list of 68 price differences, and each resample will also include 68 price differences (some repeated through the bootstrap resampling process).\nThe bootstrap procedure for paired differences is quite similar to the procedure applied to the one-sample statistic case in @sec-boot1mean.\n\nIn @fig-pairboot, two 99% confidence intervals for the difference in the cost of a new book at the UCLA bookstore compared with Amazon have been calculated.\nThe bootstrap percentile confidence interval is computed using the 0.5 percentile and 99.5 percentile bootstrapped differences and is found to be (\\$0.25, \\$7.87).\n\n::: {.guidedpractice data-latex=\"\"}\nUsing the histogram of bootstrapped mean differences, estimate the standard error of the mean of the sample differences, $\\bar{x}_{diff}.$[^21-inference-paired-means-1]\n:::\n\n[^21-inference-paired-means-1]: The bootstrapped differences in sample means vary roughly from 0.7 to 7.5, a range of \\$6.80.\n    Although the bootstrap distribution is not symmetric, we can use the empirical rule (that with bell-shaped distributions, most observations are within two standard errors of the center) to treat this range as roughly four standard errors, so the standard error of the mean differences is approximately \\$6.80 / 4 = \\$1.70.\n    You might note that the standard error calculation given in @sec-mathpaired is $SE(\\bar{x}_{diff}) = \\sqrt{s^2_{diff}/n_{diff}}\\\\ = \\sqrt{13.4^2/68} = \\$1.62$ (values 
from @sec-mathpaired), very close to the bootstrap approximation.\n\nThe bootstrap SE interval is found using the SE of the bootstrapped differences $(SE_{\\overline{x}_{diff}} = \\$1.64)$ and the normal multiplier of $z^{\\star} = 2.58.$ The average difference is $\\bar{x}_{diff} = \\$3.58.$ The 99% confidence interval is: $\\$3.58 \\pm 2.58 \\times \\$ 1.64 = (-\\$0.65, \\$7.81).$\n\nThe confidence intervals seem to indicate that the UCLA bookstore price is, on average, higher than the Amazon price, as the majority of each confidence interval is positive.\nHowever, if the analysis required a strong degree of certainty (e.g., 99% confidence), and the bootstrap SE interval was most appropriate (given a second course in statistics the nuances of the methods can be investigated), the question of which bookseller charges more is not well determined (because the bootstrap SE interval overlaps zero).\nThat is, the 99% bootstrap SE interval gives potential for UCLA to be lower, on average, than Amazon (because of the possible negative values for the true mean difference in price).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Bootstrap distribution for the average difference in new book\nprice at the UCLA bookstore versus Amazon. 99% confidence intervals\nare superimposed using blue dashed (bootstrap percentile interval) and\nred dotted (bootstrap SE interval) lines.\n](21-inference-paired-means_files/figure-html/fig-pairboot-1.png){#fig-pairboot fig-alt='Histogram showing the distribution of the average\nbootstrapped difference of price, UCLA minus Amazon. The center of\nthe distribution is given at approximately $3.75. Two bootstrap\nintervals are given. The percentile interval is approximately $0.25\nto $8.50. The SE interval is approximately -$0.50 to $7.75.' 
width=90%}\n:::\n:::\n\n\n\\clearpage\n\n## Mathematical model for the mean paired difference {#sec-mathpaired}\n\nThinking about the differences as a single observation on an observational unit changes the paired setting into the one-sample setting.\nThe mathematical model for the one-sample case is covered in @sec-one-mean-math.\n\n### Observed data\n\nTo analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations.\nIn the textbook data, we look at the differences in prices, which are recorded in the `price_diff` variable shown in @tbl-textbooksDF.\nHere the differences are taken as\n\n$$\\text{UCLA Bookstore price} - \\text{Amazon price}$$\n\nIt is important that we always subtract using a consistent order; here Amazon prices are always subtracted from UCLA prices.\nThe first difference shown in @tbl-textbooksDF is computed as $47.97 - 47.45 = 0.52.$ Similarly, the second difference is computed as $14.26 - 13.55 = 0.71,$ and the third is $13.50 - 12.53 = 0.97.$ A histogram of the differences is shown in @fig-diffInTextbookPricesF18.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Histogram of the difference in price for each book sampled.](21-inference-paired-means_files/figure-html/fig-diffInTextbookPricesF18-1.png){#fig-diffInTextbookPricesF18 fig-alt='Histogram of the difference in price for each book sampled,\nUCLA minus Amazon. The differences range from -$10 to $80 with a strong\nright skew.' 
width=90%}\n:::\n:::\n\n\n### Variability of the statistic\n\nTo analyze a paired dataset, we simply analyze the differences.\n@tbl-textbooksSummaryStats provides the data summaries from the textbook data.\nNote that instead of reporting the prices separately for UCLA and Amazon, the summary statistics are given by the mean of the differences, the standard deviation of the differences, and the total number of pairs (i.e., differences).\nThe parameter of interest is also a single value, $\\mu_{diff},$ so we can use the same $t$-distribution techniques we applied in @sec-one-mean-math directly onto the observed differences.\n\n\n::: {#tbl-textbooksSummaryStats .cell tbl-cap='Summary statistics for the 68 price differences.'}\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n\n \n \n \n \n \n\n
n Mean SD
68 3.58 13.4
\n\n`````\n:::\n:::\n\n\n::: {.workedexample data-latex=\"\"}\nSet up a hypothesis test to determine whether, on average, there is a difference between Amazon's price for a book and the UCLA bookstore's price.\nAlso, check the conditions for whether we can move forward with the test using the $t$-distribution.\n\n------------------------------------------------------------------------\n\nWe are considering two scenarios: there is no difference or there is some difference in average prices.\n\n- $H_0:$ $\\mu_{diff} = 0.$ There is no difference in the average textbook price.\n\n- $H_A:$ $\\mu_{diff} \\neq 0.$ There is a difference in average prices.\n\nNext, we check the independence and normality conditions.\nThe observations are based on a simple random sample, so assuming the textbooks are independent seems reasonable.\nWhile there are some outliers, $n = 68$ and none of the outliers are particularly extreme, so the normality of $\\bar{x}$ is satisfied.\nWith these conditions satisfied, we can move forward with the $t$-distribution.\n:::\n\n### Observed statistic vs. 
null statistics\n\nAs mentioned previously, the methods applied to a difference will be identical to the one-sample techniques.\nTherefore, the full hypothesis test framework is presented as guided practices.\n\n::: {.important data-latex=\"\"}\n**The test statistic for assessing a paired mean is a T.**\n\nThe T score is the ratio of how far the sample mean difference is from zero to the standard error of the mean difference.\n\n$$T = \\frac{\\bar{x}_{diff} - 0 }{s_{diff}/\\sqrt{n_{diff}}}$$\n\nWhen the null hypothesis is true and the conditions are met, T has a t-distribution with $df = n_{diff} - 1.$\n\nConditions:\n\n- Independently sampled pairs.\n- Large samples and no extreme outliers.\n:::\n\n\n\n\n\n::: {.workedexample data-latex=\"\"}\nComplete the hypothesis test started in the previous Example.\n\n------------------------------------------------------------------------\n\nTo compute the test statistic, first compute the standard error associated with $\\bar{x}_{diff}$ using the standard deviation of the differences $(s_{diff} = 13.42)$ and the number of differences $(n_{diff} = 68):$\n\n$$SE_{\\bar{x}_{diff}} = \\frac{s_{diff}}{\\sqrt{n_{diff}}} = \\frac{13.42}{\\sqrt{68}} = 1.63$$\n\nThe test statistic is the T score of $\\bar{x}_{diff}$ under the null condition that the actual mean difference is 0:\n\n$$T = \\frac{\\bar{x}_{diff} - 0}{SE_{\\bar{x}_{diff}}} = \\frac{3.58 - 0}{1.63} = 2.20$$\n\nTo visualize the p-value, the sampling distribution of $\\bar{x}_{diff}$ is drawn as though $H_0$ is true, and the p-value is represented by the two shaded tails in the figure below.\nThe degrees of freedom are $df = 68 - 1 = 67.$ Using statistical software, we find a one-tail area of 0.0156.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](21-inference-paired-means_files/figure-html/fig-textbooksF18HTTails-1.png){#fig-textbooksF18HTTails width=60%}\n:::\n:::\n\n\nDoubling this area gives the p-value: 0.0312.\nBecause the p-value is less than 0.05, we reject the null 
hypothesis.\nAmazon prices are, on average, lower than the UCLA Bookstore prices for UCLA courses.\n:::\n\nRecall that the margin of error is defined by the standard error.\nThe margin of error for $\\bar{x}_{diff}$ can be directly obtained from $SE(\\bar{x}_{diff}).$\n\n::: {.important data-latex=\"\"}\n**Margin of error for** $\\bar{x}_{diff}.$\n\nThe margin of error is $t^\\star_{df} \\times s_{diff}/\\sqrt{n_{diff}}$ where $t^\\star_{df}$ is calculated from a specified percentile on the t-distribution with *df* degrees of freedom.\n:::\n\n::: {.workedexample data-latex=\"\"}\nCreate a 95% confidence interval for the average price difference between books at the UCLA bookstore and books on Amazon.\n\n------------------------------------------------------------------------\n\nThe conditions have already been verified and the standard error was computed in a previous Example.\\\nTo find the confidence interval, identify $t^{\\star}_{67}$ using statistical software or the $t$-table $(t^{\\star}_{67} = 2.00),$ and plug it, the point estimate, and the standard error into the confidence interval formula:\n\n$$\n\\begin{aligned}\n\\text{point estimate} \\ &\\pm \\ t^{\\star}_{67} \\ \\times \\ SE \\\\\n3.58 \\ &\\pm \\ 2.00 \\ \\times \\ 1.63 \\\\\n(0.32 \\ &, \\ 6.84)\n\\end{aligned}\n$$\n\nWe are 95% confident that the UCLA Bookstore is, on average, between \\$0.32 and \\$6.84 more expensive than Amazon for UCLA course books.\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nWe have convincing evidence that Amazon is, on average, less expensive.\nHow should this conclusion affect UCLA student buying habits?\nShould UCLA students always buy their books on Amazon?[^21-inference-paired-means-2]\n:::\n\n[^21-inference-paired-means-2]: The average price difference is only mildly useful for this question.\n    Examine the distribution shown in @fig-diffInTextbookPricesF18.\n    There are certainly a handful of cases where Amazon prices are far below the UCLA Bookstore's, which suggests it is 
worth checking Amazon (and probably other online sites) before purchasing.\n    However, in many cases the Amazon price is above what the UCLA Bookstore charges, and most of the time the price isn't that different.\n    Ultimately, if getting a book immediately from the bookstore is notably more convenient, e.g., to get started on reading or homework, it's likely a good idea to go with the UCLA Bookstore unless the price difference on a specific book happens to be quite large.\n    For reference, this is a very different result from what we (the authors) had seen in a similar dataset from 2010.\n    At that time, Amazon prices were almost uniformly lower than those of the UCLA Bookstore, and by a large margin, making the case to use Amazon over the UCLA Bookstore quite compelling.\n    Now we frequently check multiple websites to find the best price.\n\n\\index{paired}\n\nA small note on the power of the paired t-test (recall the discussion of power in @sec-pow): the paired t-test given here is often more powerful than the independent t-test discussed in @sec-math2samp. 
That said, depending on how the data are collected, we don't always have a mechanism for pairing the data and reducing the inherent variability across observations.\n\n\n\n\n\n\\clearpage\n\n## Chapter review {#sec-chp21-review}\n\n### Summary\n\nLike the two independent sample procedures in [Chapter -@sec-inference-two-means], the paired difference analysis can be done using a t-distribution.\nThe randomization test applied to the paired differences is slightly different, however.\nNote that when randomizing under the paired setting, each null statistic is created by randomly assigning the group to a numerical outcome **within** the individual observational unit.\nThe procedure for creating a confidence interval for the paired difference is almost identical to the procedure used in [Chapter -@sec-inference-one-mean] for a single mean.\n\n### Terms\n\nWe introduced the following terms in the chapter.\nIf you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.\nWe are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.\nHowever, you should be able to easily spot them as **bolded text**.\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n\n
bootstrap CI paired difference paired difference CI T score paired difference
paired data paired difference t-test
\n\n`````\n:::\n:::\n\n\n\\clearpage\n\n## Exercises {#sec-chp21-exercises}\n\nAnswers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-21].\n\n::: {.exercises data-latex=\"\"}\n1. **Air quality.** Air quality measurements were collected in a random sample of 25 country capitals in 2013, and then again in the same cities in 2014.\n We would like to use these data to compare average air quality between the two years.\n Should we use a paired or non-paired test?\n Explain your reasoning.\n\n2. **True / False: paired.** Determine if the following statements are true or false.\n If false, explain.\n\n a. In a paired analysis we first take the difference of each pair of observations, and then we do inference on these differences.\n\n b. Two datasets of different sizes cannot be analyzed as paired data.\n\n c. Consider two sets of data that are paired with each other.\n Each observation in one dataset has a natural correspondence with exactly one observation from the other dataset.\n\n d. Consider two sets of data that are paired with each other.\n Each observation in one dataset is subtracted from the average of the other dataset's observations.\n\n3. **Paired or not? I.** In each of the following scenarios, determine if the data are paired.\n\n a. Compare pre- (beginning of semester) and post-test (end of semester) scores of students.\n\n b. Assess gender-related salary gap by comparing salaries of randomly sampled men and women.\n\n c. Compare artery thicknesses at the beginning of a study and after 2 years of taking Vitamin E for the same group of patients.\n\n d. Assess effectiveness of a diet regimen by comparing the before and after weights of subjects.\n\n4. **Paired or not? II.** In each of the following scenarios, determine if the data are paired.\n\n a. 
We would like to know if Intel's stock and Southwest Airlines' stock have similar rates of return.\n To find out, we take a random sample of 50 days, and record Intel's and Southwest's stock on those same days.\n\n b. We randomly sample 50 items from Target stores and note the price for each.\n Then we visit Walmart and collect the price for each of those same 50 items.\n\n c. A school board would like to determine whether there is a difference in average SAT scores for students at one high school versus another high school in the district.\n To check, they take a simple random sample of 100 students from each high school.\n\n5. **Sample size and pairing.** Determine if the following statement is true or false, and if false, explain your reasoning: If comparing means of two groups with equal sample sizes, always use a paired test.\n\n6. **High School and Beyond, randomization test.** The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects.\n Here we examine a simple random sample of 200 students from this survey.\n Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.\n Also provided below is a histogram of randomized averages of paired differences of scores (read - write), with the observed difference ($\\bar{x}_{read-write} = -0.545$) marked with a red vertical line.\n The randomization distribution was produced by doing the following 1000 times: for each student, the two scores were randomly assigned to either read or write, and the average was taken across all students in the sample.[^_21-ex-inference-paired-means-1]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-22-1.png){width=100%}\n :::\n :::\n\n a. Is there a clear difference in the average reading and writing scores?\n\n b. 
Are the reading and writing scores of each student independent of each other?\n\n c. Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?\n\n d. Is the average of the observed difference in scores $(\\bar{x}_{read-write} = -0.545)$ consistent with the distribution of randomized average differences?\n Explain.\n\n e. Do these data provide convincing evidence of a difference between the average scores on the two exams?\n Estimate the p-value from the randomization test, and conclude the hypothesis test using words like \"score on reading test\" and \"score on writing test.\"\n\n7. **Forest management.** Forest rangers wanted to better understand the rate of growth for younger trees in the park.\n They took measurements of a random sample of 50 young trees in 2009 and again measured those same trees in 2019.\n The data below summarize their measurements, where the heights are in feet.\n\n ::: {.cell}\n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Year Mean SD n
2009 12.0 3.5 50
2019 24.5 9.5 50
Difference 12.5 7.2 50
\n \n `````\n :::\n :::\n\n Construct a 99% confidence interval for the average growth of (what had been) younger trees in the park over 2009-2019.\n\n8. **High School and Beyond, bootstrap interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey.\n The mean and standard deviation of the differences are $\\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points.\n The bootstrap distribution below was produced by bootstrapping from the sample of differences in reading and writing scores 1,000 times.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-24-1.png){width=90%}\n :::\n :::\n\n a. Find an approximate 95% bootstrap percentile confidence interval for the true average difference in scores (read - write).\n\n b. Find an approximate 95% bootstrap SE confidence interval for the true average difference in scores (read - write).\n\n c. Interpret both confidence intervals using words like \"population\" and \"score\".\n\n d. From the confidence intervals calculated above, does it appear that there is a discernible difference in reading and writing scores, on average?\n\n9. **Possible paired randomized differences.** Data were collected on five people.\n\n ::: {.cell}\n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Person 1 Person 2 Person 3 Person 4 Person 5
Observation 1 3 14 4 5 10
Observation 2 7 3 6 5 9
Difference -4 11 -2 0 1
\n \n `````\n :::\n :::\n\n Which of the following could be a possible randomization of the paired differences given above?\n If the set of values could not be a randomized set of differences, indicate why not.\n\n a. -2, 1, 1, 11, -2\n\n b. -4, 11, -2, 0, 1\n\n c. -2, 2, -11, 11, -2, 2, 0, 1, -1\n\n d. 0, -1, 2, -4, 11\n\n e. 4, -11, 2, 0, -1\n\n10. **Study environment.** In order to test the effects of listening to music while studying versus studying in silence, students agree to be randomized to two treatments (i.e., study with music or study in silence).\n There are two exams during the semester, so the researchers can either randomize the students to have one exam with music and one with silence (randomly selecting which exam corresponds to which study environment) or the researchers can randomize the students to one study habit for both exams.\n\n The researchers are interested in estimating the true population difference in exam scores for those who listen to music while studying as compared to those who study in silence.\n\n a. Describe the experiment which is consistent with a paired design.\n How is the treatment assigned, and how are the data collected such that the observations are paired?\n\n b. Describe the experiment which is consistent with an independent samples design.\n How is the treatment assigned, and how are the data collected such that the observations are independent?\n\n11. 
**Global warming, randomization test.** Let's consider a limited set of climate data, examining temperature differences in 1950 vs 2022.\n We sampled 26 locations in the US from the National Oceanic and Atmospheric Administration's (NOAA) historical data, where data were available for both years of interest.\n [@webpage:noaa19482018] The data are not a random sample, but they are selected to be a representative sample across the land area of the lower 48 United States.\n Using the hottest day of the year as a measure can make the results susceptible to outliers.\n Instead, to get a sense for how hot a year was, we calculate the 90$^{th}$ percentile; that is, we find the maximum temperature on the day that was hotter than 90% of the days that year.\n We want to know: is the 90$^{th}$ percentile high temperature greater in 2022 or in 1950?\n The difference in 90$^{th}$ percentile high temperature (high temperature for 2022 - high temperature for 1950) was calculated for each of the 26 locations.\n The average of the 26 differences was 2.52$^\\circ$F with a standard deviation of 2.95$^\\circ$F.\n We are interested in determining whether these data provide strong evidence that the 90$^{th}$ percentile high temperature is higher in 2022 than in 1950.[^_21-ex-inference-paired-means-2]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-26-1.png){width=90%}\n :::\n :::\n\n a. Create hypotheses appropriate for the following research question: is there an evident difference in the 90$^{th}$ percentile high temperature across the two years (1950 and 2022)?\n\n b. Is the average of the observed differences in temperature $(\\bar{x}_{2022-1950} = 2.53^\\circ\\text{F})$ consistent with the distribution of randomized average differences?\n Explain.\n\n c. 
Do these data provide convincing evidence of a difference between the 90$^{th}$ percentile high temperature?\n Estimate the p-value from the randomization test, and conclude the hypothesis test using words like \"90$^{th}$ percentile high temperature in 1950\" and \"90$^{th}$ percentile high temperature in 2022.\"\n\n12. **Global warming, bootstrap interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database.\n [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\\circ$F and 2.95$^\\circ$F.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-27-1.png){width=90%}\n :::\n :::\n\n a. Calculate a 90% bootstrap percentile confidence interval for the average difference of 90$^{th}$ percentile high temperature between 1950 and 2022.\n\n b. Calculate a 90% bootstrap SE confidence interval for the average difference of 90$^{th}$ percentile high temperature between 1950 and 2022.\n\n c. Interpret both intervals in context.\n\n d. Do the confidence intervals provide convincing evidence that there were hotter high temperatures in 2022 than in 1950 at NOAA stations?\n Explain your reasoning.\n\n13. **Global warming, mathematical test.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database.\n [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\\circ$F and 2.95$^\\circ$F.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-28-1.png){width=90%}\n :::\n :::\n\n a. Is there a relationship between the observations collected in 1950 and 2022?\n Or are the observations in the two groups independent?\n Explain.\n\n b. Write hypotheses for this research in symbols and in words.\n\n c. 
Check the conditions required to complete this test.\n A histogram of the differences is given.\n\n d. Calculate the test statistic and find the p-value.\n\n e. Use $\\alpha = 0.05$ to evaluate the test, and interpret your conclusion in context.\n\n f. What type of error might we have made?\n Explain in context what the error means.\n\n g. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference in the 90$^{th}$ percentile high temperature between 1950 and 2022 to include 0?\n Explain your reasoning.\n\n14. **High School and Beyond, mathematical test.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey.\n\n a. Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exams?\n\n b. Check the conditions required to complete this test.\n\n c. The average observed difference in scores is $\\bar{x}_{read-write} = -0.545$, and the standard deviation of the differences is $s_{read-write} = 8.887$ points.\n Do these data provide convincing evidence of a difference between the average scores on the two exams?\n\n d. What type of error might we have made?\n Explain what the error means in the context of the application.\n\n e. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0?\n Explain your reasoning.\n\n15. **Global warming, mathematical interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database.\n [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\\circ$F and 2.95$^\\circ$F.\n\n a. 
Calculate a 90% confidence interval for the average difference of 90$^{th}$ percentile high temperature between 1950 and 2022.\n We've already checked the conditions for you.\n\n b. Interpret the interval in context.\n\n c. Does the confidence interval provide convincing evidence that there were hotter high temperatures in 2022 than in 1950 at NOAA stations?\n Explain your reasoning.\n\n16. **High school and beyond, mathematical interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey.\n The mean and standard deviation of the differences are $\\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points.\n\n a. Calculate a 95% confidence interval for the average difference between the reading and writing scores of all students.\n\n b. Interpret this interval in context.\n\n c. Does the confidence interval provide convincing evidence that there is a real difference in the average scores?\n Explain.\n\n17. **Friday the 13th, traffic.** In the early 1990's, researchers in the UK collected data on traffic flow on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and and whether Friday the 13th is an unlucky day.\n The histograms below show the distributions of numbers of cars passing by a specific intersection on Friday the 6th and Friday the 13th for many such date pairs.\n Also provided are some sample statistics, where the difference is the number of cars on the 6th minus the number of cars on the 13th.[^_21-ex-inference-paired-means-3]\n [@Scanlon:1993]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-29-1.png){width=100%}\n :::\n \n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
n Mean SD
sixth 10 128,385 7,259
thirteenth 10 126,550 7,664
diff 10 1,836 1,176
\n \n `````\n :::\n :::\n\n a. Are there any underlying structures in these data that should be considered in an analysis?\n Explain.\n\n b. What are the hypotheses for evaluating whether the number of people out on Friday the 6$^{\\text{th}}$ is different than the number out on Friday the 13$^{\\text{th}}$?\n\n c. Check conditions to carry out the hypothesis test from part (b) using mathematical models.\n\n d. Calculate the test statistic and the p-value.\n\n e. What is the conclusion of the hypothesis test?\n\n f. Interpret the p-value in this context.\n\n g. What type of error might have been made in the conclusion of your test?\n Explain.\n\n18. **Friday the 13th, accidents.** In the early 1990's, researchers in the UK collected data the number of traffic accident related emergency room (ER) admissions on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and and whether Friday the 13th is an unlucky day.\n The histograms below show the distributions of numbers of ER admissions at specific emergency rooms on Friday the 6th and Friday the 13th for many such date pairs.\n Also provided are some sample statistics, where the difference is the ER admissions on the 6th minus the ER admissions on the 13th.[@Scanlon:1993]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-30-1.png){width=100%}\n :::\n \n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
n Mean SD
sixth 6 8 3
thirteenth 6 11 4
diff 6 -3 3
\n \n `````\n :::\n :::\n\n a. Conduct a hypothesis test using mathematical models to evaluate if there is a difference between the average numbers of traffic accident related emergency room admissions between Friday the 6$^{\\text{th}}$ and Friday the 13$^{\\text{th}}$.\n\n b. Calculate a 95% confidence interval using mathematical models for the difference between the average numbers of traffic accident related emergency room admissions between Friday the 6$^{\\text{th}}$ and Friday the 13$^{\\text{th}}$.\n\n c. The conclusion of the original study states, \"Friday 13th is unlucky for some. The risk of hospital admission as a result of a transport accident may be increased by as much as 52%. Staying at home is recommended.\" Do you agree with this statement?\n Explain your reasoning.\n\n[^_21-ex-inference-paired-means-1]: The [`hsb2`](http://openintrostat.github.io/openintro/reference/hsb2.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n\n[^_21-ex-inference-paired-means-2]: The [`us_temperature`](http://openintrostat.github.io/openintro/reference/us_temperature.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n\n[^_21-ex-inference-paired-means-3]: The [`friday`](http://openintrostat.github.io/openintro/reference/friday.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n\n\n:::\n", + "markdown": "# Inference for comparing paired means {#sec-inference-paired-means}\n\n\n\n\n\n::: {.chapterintro data-latex=\"\"}\nIn [Chapter -@sec-inference-two-means] analysis was done to compare the average population value across two different groups.\nRecall that one of the important conditions in doing a two-sample analysis is that the two groups are independent.\nHere, independence across groups means that knowledge of the observations in one group does not 
change what we would expect to happen in the other group.\nBut what happens if the groups are **dependent**?\nSometimes dependency is not something that can be addressed through a statistical method.\nHowever, a particular dependency, **pairing**, can be modeled quite effectively using many of the same tools we have already covered in this text.\n:::\n\nPaired data represent a particular type of experimental structure where the analysis is somewhat akin to a one-sample analysis (see [Chapter -@sec-inference-one-mean]) but has other features that resemble a two-sample analysis (see [Chapter -@sec-inference-two-means]).\nAs with a two-sample analysis, quantitative measurements are made on each of two different levels of the explanatory variable.\nHowever, because the observational unit is **paired** across the two groups, the two measurements are subtracted such that only the difference is retained.\n@tbl-pairedexamples presents some examples of studies where paired designs were implemented.\n\n\n::: {#tbl-pairedexamples .cell tbl-cap='Examples of studies where a paired design is used to measure the\ndifference in the measurement over two conditions.'}\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
Observational unit Comparison groups Measurement Value of interest
Car Smooth Turn vs Quick Spin amount of tire tread after 1,000 miles difference in tread
Textbook UCLA vs Amazon price of new text difference in price
Individual person Pre-course vs Post-course exam score difference in score
\n\n`````\n:::\n:::\n\n\n::: {.important data-latex=\"\"}\n**Paired data.**\n\nTwo sets of observations are *paired* if each observation in one set has a special correspondence or connection with exactly one observation in the other set.\n:::\n\nIt is worth noting that if mathematical modeling is chosen as the analysis tool, paired data inference on the difference in measurements will be identical to the one-sample mathematical techniques described in [Chapter -@sec-inference-one-mean].\nHowever, recall from [Chapter -@sec-inference-one-mean] that with pure one-sample data, the computational tools for hypothesis testing are not easy to implement and were not presented (although the bootstrap was presented as a computational approach for constructing a one-sample confidence interval).\nWith paired data, the randomization test fits nicely with the structure of the experiment and is presented here.\n\n\n\n\n\n## Randomization test for the mean paired difference\n\nConsider an experiment done to measure whether tire brand Smooth Turn or tire brand Quick Spin has more tread remaining (in cm) after use.\nThat is, after 1,000 miles on a car, which brand of tires has more tread, on average?\n\n### Observed data\n\nThe observed data are tread measurements (in cm) taken on 25 Smooth Turn tires and 25 Quick Spin tires.\nThe study used a total of 25 cars, so on each car, one tire was Smooth Turn and one was Quick Spin.\nThe mean tread for the Quick Spin tires was 0.308 cm and the mean tread for the Smooth Turn tires was 0.310 cm.\n@fig-tiredata presents the observed data: the tread remaining (in cm) on each tire.\n\nThe Smooth Turn manufacturer looks at the box plots and says:\n\n> *Clearly the tread on Smooth Turn tires is higher, on average, than the tread on Quick Spin tires after 1,000 miles of driving.*\n\nThe Quick Spin manufacturer is skeptical and retorts:\n\n> *But with only 25 cars, it seems that the variability in road conditions (sometimes one tire hits 
a pothole, etc.) could be what leads to the small difference in average tread amount.*\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Boxplots of the tire tread data (in cm) and the brand of tire from which\nthe original measurements came.\n](21-inference-paired-means_files/figure-html/fig-tiredata-1.png){#fig-tiredata fig-alt='Boxplots of the amount of tire tread for each of the two\nbrands of tires with data values superimposed over the boxplots.\nEach superimposed dot represents a car that drove with both types\nof tires. A grey line connects each car across the two boxplots\nindicating that Smooth Turn has more tire wear than Quick Spin.' width=90%}\n:::\n:::\n\n\nWe'd like to be able to systematically distinguish between what the Smooth Turn manufacturer sees in the plot and what the Quick Spin manufacturer sees in the plot.\nFortunately for us, we have an excellent way to simulate the natural variability (from road conditions, etc.) that can lead to tires being worn at different rates.\n\n### Variability of the statistic\n\nA randomization test will identify whether the differences seen in the box plot of the original data in @fig-tiredata could have happened just by chance variability.\nAs before, we will simulate the variability in the study under the assumption that the null hypothesis is true.\nIn this study, the null hypothesis is that average tire tread wear is the same across Smooth Turn and Quick Spin tires.\n\n- $H_0: \\mu_{diff} = 0,$ the average tread wear is the same for the two tire brands.\n- $H_A: \\mu_{diff} \\ne 0,$ the average tread wear is different across the two tire brands.\n\nWhen observations are paired, the randomization process randomly assigns the tire brand to each of the observed tread values.\nNote that in the randomization test for the two-sample mean setting (see @sec-rand2mean) the explanatory variable was *also* randomly assigned to the responses.\nThe change in the paired setting, however, is that the assignment happens 
*within* an observational unit (here, a car).\nRemember, if the null hypothesis is true, it will not matter which brand is put on which tire because the overall tread wear will be the same across pairs.\n\n@fig-tiredata4 and @fig-tiredata5 show that the random assignment of group (tire brand) happens within a single car.\nThat is, every single car will still have one tire of each type.\nIn the first randomization, it just so happens that the 4th car's tire brands were swapped and the 5th car's tire brands were not swapped.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![\"The 4th car: the tire brand was randomly permuted, and in the randomization\ncalculation, the measurements (in cm) ended up in different groups.\"\n](21-inference-paired-means_files/figure-html/fig-tiredata4-1.png){#fig-tiredata4 fig-alt='Line plot connecting the tread for the 4th car in the dataset. The first\nplot is the original data and the second plot is the permuted data where\nthe groups happened to get permuted randomly.' width=90%}\n:::\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n![\"The 5th car: the tire brand was randomly permuted to stay the same! In the\nrandomization calculation, the measurements (in cm) ended up in the\noriginal groups.\"\n](21-inference-paired-means_files/figure-html/fig-tiredata5-1.png){#fig-tiredata5 fig-alt='Line plot connecting the tread for the 5th car in the dataset. The first\nplot is the original data and the second plot is the permuted data where\nthe groups happened to stay connected to the original measurements.' width=90%}\n:::\n:::\n\n\nWe can put the shuffled assignments for all the cars into one plot as seen in @fig-tiredataPerm.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![\"Tire tread data (in cm) with: the brand of tire from which the original\nmeasurements came (left) and shuffled brand assignment (right). 
As\nevidenced by the colors, some of the cars kept their original tire\nassignments and some cars swapped the tire assignments.\"\n](21-inference-paired-means_files/figure-html/fig-tiredataPerm-1.png){#fig-tiredataPerm fig-alt='Line plot connecting the tread for the all of the cars in the\ndataset. The first plot is the original data and the second plot is\nthe permuted data where some of the brands are connect to the\noriginal tread measurements and some of the brands have been swapped\nacross the two tread measurements, within a car.\n' width=100%}\n:::\n:::\n\n\nThe next step in the randomization test is to sort the brands so that the assigned brand value on the x-axis aligns with the assigned group from the randomization.\nSee @fig-tiredataPermSort which has the same randomized groups (right image in @fig-tiredataPerm and left image in @fig-tiredataPermSort) as seen previously.\nHowever, the right image in @fig-tiredataPermSort sorts the randomized groups so that we can measure the variability across groups as compared to the variability within groups.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Tire tread from (left) randomized brand assignment, (right) sorted\nby randomized brand.\n](21-inference-paired-means_files/figure-html/fig-tiredataPermSort-1.png){#fig-tiredataPermSort fig-alt='Scatterplot of tread on the y axis and tire brand on the\nx axis. The left panel has the observed tread values matched to\nthe original tire brand for the x axis location. The\nobservations are colored based on the permuted tire brand\nassigned in the randomization. The right panel has the tread\nvalues matched to the permuted tire brand, so some observations\nhave swapped orientation. The observations are also colored by\nthe permuted tire brand assigned in the randomization. In the\nright panel, the two tire brands seem equivalent with respect\nto tire wear.' 
width=100%}\n:::\n:::\n\n\n@fig-tiredatarand1 presents a second randomization of the data.\nNotice how the two observations from the same car are linked by a grey line; some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![A second randomization where the brand is randomly swapped (or not)\nacross the two tread wear measurements (in cm) from the same car.\n](21-inference-paired-means_files/figure-html/fig-tiredatarand1-1.png){#fig-tiredatarand1 fig-alt='Boxplots and scatterplot with tire brand on the x-axis and\ntreat on the y-axis. The points are assigned to the x-axis brand\ngiven by the permutation, but the plot differs from previous figures\nin that it is a second permutation of the brands. Again, the two\npermuted brands seem equivalent with respect to tire wear.' width=90%}\n:::\n:::\n\n\n@fig-tiredatarand2 presents yet another randomization of the data.\nAgain, the same observations are linked by a grey line, and some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![An additional randomization where the brand is randomly swapped (or not)\nacross the two tread wear measurements (in cm) from the same car.\n](21-inference-paired-means_files/figure-html/fig-tiredatarand2-1.png){#fig-tiredatarand2 fig-alt='Boxplots and scatterplot with tire brand on the x-axis and\ntreat on the y-axis. The points are assigned to the x-axis brand\ngiven by the permutation, but the plot differs from previous figures\nin that it is a second permutation of the brands. The additional\npermutation demonstrates that the boxplots\ncontinue to change for each permutation yet the tire tread is\nequivalent across the permuted groups.' 
width=90%}\n:::\n:::\n\n\n### Observed statistic vs. null statistics\n\nBy repeating the randomization process, we can create a distribution of the average of the differences in tire treads, as seen in @fig-pair-randomize.\nAs expected (because the differences were generated under the null hypothesis), the center of the histogram is zero.\nA line has been drawn at the observed difference, which is well outside the majority of the null differences simulated from natural variability by mixing up which tire received Smooth Turn and which received Quick Spin.\nBecause the observed statistic is so far away from the natural variability of the randomized differences, we are convinced that there is a difference between Smooth Turn and Quick Spin.\nOur conclusion is that the extra amount of average tire tread in Smooth Turn is due to more than just natural variability: we reject $H_0$ and conclude that $\\mu_{ST} \\ne \\mu_{QS}.$\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n![Histogram of 1,000 mean differences with tire brand randomly assigned across\nthe two tread measurements (in cm) per pair.\n](21-inference-paired-means_files/figure-html/fig-pair-randomize-1.png){#fig-pair-randomize fig-alt='Histogram of the average difference in tire wear, Quick Spin\nminus Smooth Turn over 1000 different permutations. The histogram is\ncentered at 0 and spreads to approximately -0.0025 and 0.0025. A\nred line at approximately 0.002 indicates the observed difference in\naverage tread from the original data.' 
width=90%}\n:::\n:::\n\n\n## Bootstrap confidence interval for the mean paired difference\n\nFor both the bootstrap and the mathematical models applied to paired data, the analysis is virtually identical to the one-sample approach given in [Chapter -@sec-inference-one-mean].\nThe key to working with paired data (for bootstrapping and mathematical approaches) is to consider the measurement of interest to be the difference in measured values across the pair of observations.\n\n\n\n\n\n### Observed data\n\nIn an earlier edition of this textbook, we found that Amazon prices were, on average, lower than those of the UCLA Bookstore for UCLA courses in 2010.\nIt's been several years, and many stores have adapted to the online market, so we wondered, how is the UCLA Bookstore doing today?\n\nWe sampled 201 UCLA courses.\nOf those, 68 required books could be found on Amazon.\nA portion of the dataset from these courses is shown in @tbl-textbooksDF, where prices are in US dollars.\n\n::: {.data data-latex=\"\"}\nThe [`ucla_textbooks_f18`](http://openintrostat.github.io/openintro/reference/ucla_textbooks_f18.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n:::\n\n\n::: {#tbl-textbooksDF .cell tbl-cap='Four cases from the `ucla_textbooks_f18` dataset.'}\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n \n \n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n
subject course_num bookstore_new amazon_new price_diff
American Indian Studies M10 48.0 47.5 0.52
Anthropology 2 14.3 13.6 0.71
Arts and Architecture 10 13.5 12.5 0.97
Asian M60W 49.3 55.0 -5.69
\n\n`````\n:::\n:::\n\n\n\\index{paired data}\n\nEach textbook has two corresponding prices in the dataset: one for the UCLA Bookstore and one for Amazon.\nWhen two sets of observations have this special correspondence, they are said to be **paired**.\n\n### Variability of the statistic\n\nFollowing the example of bootstrapping the one-sample statistic, the observed *differences* can be bootstrapped in order to understand the variability of the average difference from sample to sample.\nRemember, the differences act as a single value to bootstrap.\nThat is, the original dataset would include the list of 68 price differences, and each resample will also include 68 price differences (some repeated through the bootstrap resampling process).\nThe bootstrap procedure for paired differences is quite similar to the procedure applied to the one-sample statistic case in @sec-boot1mean.\n\nIn @fig-pairboot, two 99% confidence intervals for the difference in the cost of a new book at the UCLA bookstore compared with Amazon have been calculated.\nThe bootstrap percentile confidence interval is computed using the 0.5 percentile and 99.5 percentile of the bootstrapped differences and is found to be (\\$0.25, \\$7.87).\n\n::: {.guidedpractice data-latex=\"\"}\nUsing the histogram of bootstrapped difference in means, estimate the standard error of the mean of the sample differences, $\\bar{x}_{diff}.$[^21-inference-paired-means-1]\n:::\n\n[^21-inference-paired-means-1]: The bootstrapped differences in sample means vary roughly from 0.7 to 7.5, a range of \\$6.80.\n Although the bootstrap distribution is not symmetric, we use the empirical rule (that with bell-shaped distributions, most observations are within two standard errors of the center) to estimate that the standard error of the mean differences is approximately \\$1.70.\n You might note that the standard error calculation given in @sec-mathpaired is $SE(\\bar{x}_{diff}) = \\sqrt{s^2_{diff}/n_{diff}}\\\\ = \\sqrt{13.4^2/68} = \\$1.62$ (values 
from @sec-mathpaired), very close to the bootstrap approximation.\n\nThe bootstrap SE interval is found by computing the SE of the bootstrapped differences $(SE_{\\overline{x}_{diff}} = \\$1.64)$ and the normal multiplier of $z^{\\star} = 2.58.$ The average difference is $\\bar{x}_{diff} = \\$3.58.$ The 99% confidence interval is: $\\$3.58 \\pm 2.58 \\times \\$ 1.64 = (-\\$0.65, \\$7.81).$\n\nThe confidence intervals seem to indicate that the UCLA bookstore price is, on average, higher than the Amazon price, as the majority of each confidence interval is positive.\nHowever, if the analysis requires a strong degree of certainty (e.g., 99% confidence) and the bootstrap SE interval is the most appropriate method (a second course in statistics investigates the nuances of choosing between the methods), then which bookseller is more expensive on average is not well determined, because the bootstrap SE interval overlaps zero.\nThat is, the 99% bootstrap SE interval leaves open the possibility that UCLA is lower, on average, than Amazon (because it includes negative values for the true mean difference in price).\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Bootstrap distribution for the average difference in new book\nprice at the UCLA bookstore versus Amazon. 99% confidence intervals\nare superimposed using blue dashed (bootstrap percentile interval) and\nred dotted (bootstrap SE interval) lines.\n](21-inference-paired-means_files/figure-html/fig-pairboot-1.png){#fig-pairboot fig-alt='Histogram showing the distribution of the average\nbootstrapped difference of price, UCLA minus Amazon. The center of\nthe distribution is given at approximately $3.75. Two bootstrap\nintervals are given. The percentile interval is approximately $0.25\nto $8.50. The SE interval is approximately -$0.50 to $7.75.' 
width=90%}\n:::\n:::\n\n\n\\clearpage\n\n## Mathematical model for the mean paired difference {#sec-mathpaired}\n\nThinking about the differences as a single observation on an observational unit changes the paired setting into the one-sample setting.\nThe mathematical model for the one-sample case is covered in @sec-one-mean-math.\n\n### Observed data\n\nTo analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations.\nIn the textbook data, we look at the differences in prices, which are represented as the `price_difference` variable in the dataset.\nHere the differences are taken as\n\n$$\\text{UCLA Bookstore price} - \\text{Amazon price}$$\n\nIt is important that we always subtract using a consistent order; here Amazon prices are always subtracted from UCLA prices.\nThe first difference shown in @tbl-textbooksDF is computed as $47.97 - 47.45 = 0.52.$ Similarly, the second difference is computed as $14.26 - 13.55 = 0.71,$ and the third is $13.50 - 12.53 = 0.97.$ A histogram of the differences is shown in @fig-diffInTextbookPricesF18.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Histogram of the difference in price for each book sampled.](21-inference-paired-means_files/figure-html/fig-diffInTextbookPricesF18-1.png){#fig-diffInTextbookPricesF18 fig-alt='Histogram of the difference in price for each book sampled,\nUCLA minus Amazon. The price differences range from -$10 to $80 with a strong\nright skew.' 
width=90%}\n:::\n:::\n\n\n### Variability of the statistic\n\nTo analyze a paired dataset, we simply analyze the differences.\n@tbl-textbooksSummaryStats provides the data summaries from the textbook data.\nNote that instead of reporting the prices separately for UCLA and Amazon, the summary statistics are given by the mean of the differences, the standard deviation of the differences, and the total number of pairs (i.e., differences).\nThe parameter of interest is also a single value, $\\mu_{diff},$ so we can use the same $t$-distribution techniques we applied in @sec-one-mean-math directly onto the observed differences.\n\n\n::: {#tbl-textbooksSummaryStats .cell tbl-cap='Summary statistics for the 68 price differences.'}\n::: {.cell-output-display}\n`````{=html}\n\n \n \n \n \n \n \n \n\n \n \n \n \n \n\n
n Mean SD
68 3.58 13.4
\n\n`````\n:::\n:::\n\n\n::: {.workedexample data-latex=\"\"}\nSet up a hypothesis test to determine whether, on average, there is a difference between Amazon's price for a book and the UCLA bookstore's price.\nAlso, check the conditions for whether we can move forward with the test using the $t$-distribution.\n\n------------------------------------------------------------------------\n\nWe are considering two scenarios: there is no difference or there is some difference in average prices.\n\n- $H_0:$ $\\mu_{diff} = 0.$ There is no difference in the average textbook price.\n\n- $H_A:$ $\\mu_{diff} \\neq 0.$ There is a difference in average prices.\n\nNext, we check the independence and normality conditions.\nThe observations are based on a simple random sample, so assuming the textbooks are independent seems reasonable.\nWhile there are some outliers, $n = 68$ and none of the outliers are particularly extreme, so the normality of $\\bar{x}$ is satisfied.\nWith these conditions satisfied, we can move forward with the $t$-distribution.\n:::\n\n### Observed statistic vs. 
null statistics\n\nAs mentioned previously, the methods applied to a difference will be identical to the one-sample techniques.\nTherefore, the full hypothesis test framework is presented as guided practices.\n\n::: {.important data-latex=\"\"}\n**The test statistic for assessing a paired mean is a T.**\n\nThe T score is a ratio of how the sample mean difference varies from zero as compared to how the observations vary.\n\n$$T = \\frac{\\bar{x}_{diff} - 0 }{s_{diff}/\\sqrt{n_{diff}}}$$\n\nWhen the null hypothesis is true and the conditions are met, T has a t-distribution with $df = n_{diff} - 1.$\n\nConditions:\n\n- Independently sampled pairs.\n- Large samples and no extreme outliers.\n:::\n\n\n\n\n\n::: {.workedexample data-latex=\"\"}\nComplete the hypothesis test started in the previous Example.\n\n------------------------------------------------------------------------\n\nTo compute the test statistic, first compute the standard error associated with $\\bar{x}_{diff}$ using the standard deviation of the differences $(s_{diff} = 13.42)$ and the number of differences $(n_{diff} = 68):$\n\n$$SE_{\\bar{x}_{diff}} = \\frac{s_{diff}}{\\sqrt{n_{diff}}} = \\frac{13.42}{\\sqrt{68}} = 1.63$$\n\nThe test statistic is the T score of $\\bar{x}_{diff}$ under the null condition that the actual mean difference is 0:\n\n$$T = \\frac{\\bar{x}_{diff} - 0}{SE_{\\bar{x}_{diff}}} = \\frac{3.58 - 0}{1.63} = 2.20$$\n\nTo visualize the p-value, the sampling distribution of $\\bar{x}_{diff}$ is drawn as though $H_0$ is true, and the p-value is represented by the two shaded tails in the figure below.\nThe degrees of freedom are $df = 68 - 1 = 67.$ Using statistical software, we find the one-tail area of 0.0156.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](21-inference-paired-means_files/figure-html/fig-textbooksF18HTTails-1.png){#fig-textbooksF18HTTails width=60%}\n:::\n:::\n\n\nDoubling this area gives the p-value: 0.0312.\nBecause the p-value is less than 0.05, we reject the null 
hypothesis.\nAmazon prices are, on average, lower than the UCLA Bookstore prices for UCLA courses.\n:::\n\nRecall that the margin of error is defined by the standard error.\nThe margin of error for $\\bar{x}_{diff}$ can be directly obtained from $SE(\\bar{x}_{diff}).$\n\n::: {.important data-latex=\"\"}\n**Margin of error for** $\\bar{x}_{diff}.$\n\nThe margin of error is $t^\\star_{df} \\times s_{diff}/\\sqrt{n_{diff}}$ where $t^\\star_{df}$ is calculated from a specified percentile on the t-distribution with *df* degrees of freedom.\n:::\n\n::: {.workedexample data-latex=\"\"}\nCreate a 95% confidence interval for the average price difference between books at the UCLA bookstore and books on Amazon.\n\n------------------------------------------------------------------------\n\nConditions have already been verified and the standard error computed in a previous Example.\\\nTo find the confidence interval, identify $t^{\\star}_{67}$ using statistical software or the $t$-table $(t^{\\star}_{67} = 2.00),$ and plug it, the point estimate, and the standard error into the confidence interval formula:\n\n$$\n\\begin{aligned}\n\\text{point estimate} \\ &\\pm \\ t^{\\star}_{67} \\ \\times \\ SE \\\\\n3.58 \\ &\\pm \\ 2.00 \\ \\times \\ 1.63 \\\\\n(0.32 \\ &, \\ 6.84)\n\\end{aligned}\n$$\n\nWe are 95% confident that the UCLA Bookstore is, on average, between \\$0.32 and \\$6.84 more expensive than Amazon for UCLA course books.\n:::\n\n::: {.guidedpractice data-latex=\"\"}\nWe have convincing evidence that Amazon is, on average, less expensive.\nHow should this conclusion affect UCLA student buying habits?\nShould UCLA students always buy their books on Amazon?[^21-inference-paired-means-2]\n:::\n\n[^21-inference-paired-means-2]: The average price difference is only mildly useful for this question.\n Examine the distribution shown in @fig-diffInTextbookPricesF18.\n There are certainly a handful of cases where Amazon prices are far below the UCLA Bookstore's, which suggests it is 
worth checking Amazon (and probably other online sites) before purchasing.\n However, in many cases the Amazon price is above what the UCLA Bookstore charges, and most of the time the price isn't that different.\n Ultimately, if getting a book immediately from the bookstore is notably more convenient, e.g., to get started on reading or homework, it's likely a good idea to go with the UCLA Bookstore unless the price difference on a specific book happens to be quite large.\n For reference, this is a very different result from what we (the authors) had seen in a similar dataset from 2010.\n At that time, Amazon prices were almost uniformly lower than those of the UCLA Bookstore's and by a large margin, making the case to use Amazon over the UCLA Bookstore quite compelling at that time.\n Now we frequently check multiple websites to find the best price.\n\n\\index{paired}\n\nA small note on the power of the paired t-test (recall the discussion of power in @sec-pow). It turns out that the paired t-test given here is often more powerful than the independent t-test discussed in @sec-math2samp. 
That said, depending on how the data are collected, we don't always have a mechanism for pairing the data and reducing the inherent variability across observations.\n\n\n\n\n\n\\clearpage\n\n## Chapter review {#sec-chp21-review}\n\n### Summary\n\nLike the two independent sample procedures in [Chapter -@sec-inference-two-means], the paired difference analysis can be done using a t-distribution.\nThe randomization test applied to the paired differences is slightly different, however.\nNote that when randomizing under the paired setting, each null statistic is created by randomly assigning the group to a numerical outcome **within** the individual observational unit.\nThe procedure for creating a confidence interval for the paired difference is almost identical to the confidence intervals created in [Chapter -@sec-inference-one-mean] for a single mean.\n\n### Terms\n\nWe introduced the following terms in the chapter.\nIf you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.\nWe are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.\nHowever, you should be able to easily spot them as **bolded text**.\n\n\n::: {.cell}\n::: {.cell-output-display}\n`````{=html}\n\n\n \n \n \n \n \n \n \n \n \n \n\n
bootstrap CI paired difference paired difference CI T score paired difference
paired data paired difference t-test
\n\n`````\n:::\n:::\n\n\n\\clearpage\n\n## Exercises {#sec-chp21-exercises}\n\nAnswers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-21].\n\n::: {.exercises data-latex=\"\"}\n1. **Air quality.** Air quality measurements were collected in a random sample of 25 country capitals in 2013, and then again in the same cities in 2014.\n We would like to use these data to compare average air quality between the two years.\n Should we use a paired or non-paired test?\n Explain your reasoning.\n\n2. **True / False: paired.** Determine if the following statements are true or false.\n If false, explain.\n\n a. In a paired analysis we first take the difference of each pair of observations, and then we do inference on these differences.\n\n b. Two datasets of different sizes cannot be analyzed as paired data.\n\n c. Consider two sets of data that are paired with each other.\n Each observation in one dataset has a natural correspondence with exactly one observation from the other dataset.\n\n d. Consider two sets of data that are paired with each other.\n Each observation in one dataset is subtracted from the average of the other dataset's observations.\n\n3. **Paired or not? I.** In each of the following scenarios, determine if the data are paired.\n\n a. Compare pre- (beginning of semester) and post-test (end of semester) scores of students.\n\n b. Assess gender-related salary gap by comparing salaries of randomly sampled men and women.\n\n c. Compare artery thicknesses at the beginning of a study and after 2 years of taking Vitamin E for the same group of patients.\n\n d. Assess effectiveness of a diet regimen by comparing the before and after weights of subjects.\n\n4. **Paired or not? II.** In each of the following scenarios, determine if the data are paired.\n\n a. 
We would like to know if Intel's stock and Southwest Airlines' stock have similar rates of return.\n To find out, we take a random sample of 50 days, and record Intel's and Southwest's stock on those same days.\n\n b. We randomly sample 50 items from Target stores and note the price for each.\n Then we visit Walmart and collect the price for each of those same 50 items.\n\n c. A school board would like to determine whether there is a difference in average SAT scores for students at one high school versus another high school in the district.\n To check, they take a simple random sample of 100 students from each high school.\n\n5. **Sample size and pairing.** Determine if the following statement is true or false, and if false, explain your reasoning: If comparing means of two groups with equal sample sizes, always use a paired test.\n\n6. **High School and Beyond, randomization test.** The National Center of Education Statistics conducted a survey of high school seniors, collecting test data on reading, writing, and several other subjects.\n Here we examine a simple random sample of 200 students from this survey.\n Side-by-side box plots of reading and writing scores as well as a histogram of the differences in scores are shown below.\n Also provided below is a histogram of randomized averages of paired differences of scores (read - write), with the observed difference ($\\bar{x}_{read-write} = -0.545$) marked with a red vertical line.\n The randomization distribution was produced by doing the following 1000 times: for each student, the two scores were randomly assigned to either read or write, and the average was taken across all students in the sample.[^_21-ex-inference-paired-means-1]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-22-1.png){width=100%}\n :::\n :::\n\n a. Is there a clear difference in the average reading and writing scores?\n\n b. 
Are the reading and writing scores of each student independent of each other?\n\n c. Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?\n\n d. Is the average of the observed difference in scores $(\\bar{x}_{read-write} = -0.545)$ consistent with the distribution of randomized average differences?\n Explain.\n\n e. Do these data provide convincing evidence of a difference between the average scores on the two exams?\n Estimate the p-value from the randomization test, and conclude the hypothesis test using words like \"score on reading test\" and \"score on writing test.\"\n\n7. **Global warming, randomization test.** Let's consider a limited set of climate data, examining temperature differences in 1950 vs 2022.\n We sampled 26 locations in the US from the National Oceanic and Atmospheric Administration's (NOAA) historical data, where the data was available for both years of interest.\n [@webpage:noaa19482018] The data are not a random sample, but they are selected to be a representative sample across the land area of the lower 48 United States.\n Using the hottest day of the year as a measure can make the results susceptible to outliers.\n Instead, to get a sense for how hot a year was, we calculate the 90$^{th}$ percentile; that is, we find the maximum temperature on the day that was hotter than 90% of the days that year.\n We want to know: is the 90$^{th}$ percentile high temperature greater in 2022 or in 1950?\n The difference in 90$^{th}$ percentile high temperature (high temperature for 2022 - high temperature for 1950) was calculated for each of the 26 locations.\n The average of the 26 differences was 2.52$^\\circ$F with a standard deviation of 2.95$^\\circ$F.\n We are interested in determining whether these data provide strong evidence that the 90$^{th}$ percentile high temperature is higher in 2022 than in 
1950.[^_21-ex-inference-paired-means-2]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-23-1.png){width=90%}\n :::\n :::\n\n a. Create hypotheses appropriate for the following research question: is there an evident difference in the 90$^{th}$ percentile high temp across the two years (1950 and 2022)?\n\n b. Is the average of the observed differences in temperature $(\\bar{x}_{2022-1950} = 2.53^\\circ\\text{F})$ consistent with the distribution of randomized average differences?\n Explain.\n\n c. Do these data provide convincing evidence of a difference between the 90$^{th}$ percentile high temperatures in the two years?\n Estimate the p-value from the randomization test, and conclude the hypothesis test using words like \"90$^{th}$ percentile high temperature in 1950\" and \"90$^{th}$ percentile high temperature in 2022.\"\n\n8. **High School and Beyond, bootstrap interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey.\n The mean and standard deviation of the differences are $\\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points.\n The bootstrap distribution below was produced by bootstrapping from the sample of differences in reading and writing scores 1,000 times.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-24-1.png){width=90%}\n :::\n :::\n\n a. Find an approximate 95% bootstrap percentile confidence interval for the true average difference in scores (read - write).\n\n b. Find an approximate 95% bootstrap SE confidence interval for the true average difference in scores (read - write).\n\n c. Interpret both confidence intervals using words like \"population\" and \"score\".\n\n d. From the confidence intervals calculated above, does it appear that there is a discernible difference in reading and writing scores, on average?\n\n9. 
**Global warming, bootstrap interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database.\n [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\\circ$F and 2.95$^\\circ$F.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-25-1.png){width=90%}\n :::\n :::\n\n a. Calculate a 90% bootstrap percentile confidence interval for the average difference of 90$^{th}$ percentile high temperature between 1950 and 2022.\n\n b. Calculate a 90% bootstrap SE confidence interval for the average difference of 90$^{th}$ percentile high temperature between 1950 and 2022.\n\n c. Interpret both intervals in context.\n\n d. Do the confidence intervals provide convincing evidence that there were hotter high temperatures in 2022 than in 1950 at NOAA stations?\n Explain your reasoning.\n\n10. **High School and Beyond, mathematical test.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey.\n\n a. Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam?\n\n b. Check the conditions required to complete this test.\n\n c. The average observed difference in scores is $\\bar{x}_{read-write} = -0.545$, and the standard deviation of the differences is $s_{read-write} = 8.887$ points.\n Do these data provide convincing evidence of a difference between the average scores on the two exams?\n\n d. What type of error might we have made?\n Explain what the error means in the context of the application.\n\n e. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0?\n Explain your reasoning.\n\n11. 
**Global warming, mathematical test.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database.\n [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\\circ$F and 2.95$^\\circ$F.\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-26-1.png){width=90%}\n :::\n :::\n\n a. Is there a relationship between the observations collected in 1950 and 2022?\n Or are the observations in the two groups independent?\n Explain.\n\n b. Write hypotheses for this research in symbols and in words.\n\n c. Check the conditions required to complete this test.\n A histogram of the differences is given.\n\n d. Calculate the test statistic and find the p-value.\n\n e. Use $\\alpha = 0.05$ to evaluate the test, and interpret your conclusion in context.\n\n f. What type of error might we have made?\n Explain in context what the error means.\n\n g. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the 90$^{th}$ percentile high temperature from 1950 to 2022 to include 0?\n Explain your reasoning.\n\n12. **High School and Beyond, mathematical interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey.\n The mean and standard deviation of the differences are $\\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points.\n\n a. Calculate a 95% confidence interval for the average difference between the reading and writing scores of all students.\n\n b. Interpret this interval in context.\n\n c. Does the confidence interval provide convincing evidence that there is a real difference in the average scores?\n Explain.\n\n13. 
**Global warming, mathematical interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database.\n [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\\circ$F and 2.95$^\\circ$F.\n\n a. Calculate a 90% confidence interval for the average difference of 90$^{th}$ percentile high temperature between 1950 and 2022.\n We've already checked the conditions for you.\n\n b. Interpret the interval in context.\n\n c. Does the confidence interval provide convincing evidence that there were hotter high temperatures in 2022 than in 1950 at NOAA stations?\n Explain your reasoning.\n\n14. **Possible paired randomized differences.** Data were collected on five people.\n\n ::: {.cell}\n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Person 1 Person 2 Person 3 Person 4 Person 5
Observation 1 3 14 4 5 10
Observation 2 7 3 6 5 9
Difference -4 11 -2 0 1
\n \n `````\n :::\n :::\n\n Which of the following could be a possible randomization of the paired differences given above?\n If the set of values could not be a randomized set of differences, indicate why not.\n\n a. -2, 1, 1, 11, -2\n\n b. -4 11 -2 0 1\n\n c. -2, 2, -11, 11, -2, 2, 0, 1, -1\n\n d. 0, -1, 2, -4, 11\n\n e. 4, -11, 2, 0, -1\n\n15. **Study environment.** In order to test the effects of listening to music while studying versus studying in silence, students agree to be randomized to two treatments (i.e., study with music or study in silence).\n There are two exams during the semester, so the researchers can either randomize the students to have one exam with music and one with silence (randomly selecting which exam corresponds to which study environment) or the researchers can randomize the students to one study habit for both exams.\n\n The researchers are interested in estimating the true population difference of exam score for those who listen to music while studying as compared to those who study in silence.\n\n a. Describe the experiment which is consistent with a paired design.\n How is the treatment assigned, and how are the data collected such that the observations are paired?\n\n b. Describe the experiment which is consistent with an independent samples design.\n How is the treatment assigned, and how are the data collected such that the observations are independent?\n\n16. 
**Friday the 13th, traffic.** In the early 1990's, researchers in the UK collected data on traffic flow on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and whether Friday the 13th is an unlucky day.\n The histograms below show the distributions of numbers of cars passing by a specific intersection on Friday the 6th and Friday the 13th for many such date pairs.\n Also provided are some sample statistics, where the difference is the number of cars on the 6th minus the number of cars on the 13th.[^_21-ex-inference-paired-means-3]\n [@Scanlon:1993]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-28-1.png){width=100%}\n :::\n \n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
n Mean SD
sixth 10 128,385 7,259
thirteenth 10 126,550 7,664
diff 10 1,836 1,176
\n \n `````\n :::\n :::\n\n a. Are there any underlying structures in these data that should be considered in an analysis?\n Explain.\n\n b. What are the hypotheses for evaluating whether the number of people out on Friday the 6$^{\\text{th}}$ is different than the number out on Friday the 13$^{\\text{th}}$?\n\n c. Check conditions to carry out the hypothesis test from part (b) using mathematical models.\n\n d. Calculate the test statistic and the p-value.\n\n e. What is the conclusion of the hypothesis test?\n\n f. Interpret the p-value in this context.\n\n g. What type of error might have been made in the conclusion of your test?\n Explain.\n\n17. **Friday the 13th, accidents.** In the early 1990's, researchers in the UK collected data on the number of traffic accident related emergency room (ER) admissions on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and whether Friday the 13th is an unlucky day.\n The histograms below show the distributions of numbers of ER admissions at specific emergency rooms on Friday the 6th and Friday the 13th for many such date pairs.\n Also provided are some sample statistics, where the difference is the ER admissions on the 6th minus the ER admissions on the 13th. [@Scanlon:1993]\n\n ::: {.cell}\n ::: {.cell-output-display}\n ![](21-inference-paired-means_files/figure-html/unnamed-chunk-29-1.png){width=100%}\n :::\n \n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
n Mean SD
sixth 6 8 3
thirteenth 6 11 4
diff 6 -3 3
\n \n `````\n :::\n :::\n\n a. Conduct a hypothesis test using mathematical models to evaluate if there is a difference between the average numbers of traffic accident related emergency room admissions between Friday the 6$^{\\text{th}}$ and Friday the 13$^{\\text{th}}$.\n\n b. Calculate a 95% confidence interval using mathematical models for the difference between the average numbers of traffic accident related emergency room admissions between Friday the 6$^{\\text{th}}$ and Friday the 13$^{\\text{th}}$.\n\n c. The conclusion of the original study states, \"Friday 13th is unlucky for some. The risk of hospital admission as a result of a transport accident may be increased by as much as 52%. Staying at home is recommended.\" Do you agree with this statement?\n Explain your reasoning.\n\n18. **Forest management.** Forest rangers wanted to better understand the rate of growth for younger trees in the park.\n They took measurements of a random sample of 50 young trees in 2009 and again measured those same trees in 2019.\n The data below summarize their measurements, where the heights are in feet.\n\n ::: {.cell}\n ::: {.cell-output-display}\n `````{=html}\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Year Mean SD n
2009 12.0 3.5 50
2019 24.5 9.5 50
Difference 12.5 7.2 50
\n \n `````\n :::\n :::\n\n Construct a 99% confidence interval for the average growth of (what had been) younger trees in the park over 2009-2019.\n\n[^_21-ex-inference-paired-means-1]: The [`hsb2`](http://openintrostat.github.io/openintro/reference/hsb2.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n\n[^_21-ex-inference-paired-means-2]: The [`us_temperature`](http://openintrostat.github.io/openintro/reference/us_temperature.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n\n[^_21-ex-inference-paired-means-3]: The [`friday`](http://openintrostat.github.io/openintro/reference/friday.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.\n\n\n:::\n", "supporting": [ "21-inference-paired-means_files" ], diff --git a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-23-1.png b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-23-1.png new file mode 100644 index 00000000..ad347a31 Binary files /dev/null and b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-23-1.png differ diff --git a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-25-1.png b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-25-1.png new file mode 100644 index 00000000..b948614c Binary files /dev/null and b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-25-1.png differ diff --git a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-26-1.png b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-26-1.png index ad347a31..0511d22e 100644 Binary files a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-26-1.png and b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-26-1.png differ diff --git a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-28-1.png 
b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-28-1.png index 0511d22e..02124c4f 100644 Binary files a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-28-1.png and b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-28-1.png differ diff --git a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-29-1.png b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-29-1.png index 02124c4f..b8bd7357 100644 Binary files a/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-29-1.png and b/_freeze/21-inference-paired-means/figure-html/unnamed-chunk-29-1.png differ diff --git a/exercises/_21-ex-inference-paired-means.qmd b/exercises/_21-ex-inference-paired-means.qmd index 7fd181b0..7e22e418 100644 --- a/exercises/_21-ex-inference-paired-means.qmd +++ b/exercises/_21-ex-inference-paired-means.qmd @@ -97,107 +97,7 @@ e. Do these data provide convincing evidence of a difference between the average scores on the two exams? Estimate the p-value from the randomization test, and conclude the hypothesis test using words like "score on reading test" and "score on writing test." -7. **Forest management.** Forest rangers wanted to better understand the rate of growth for younger trees in the park. - They took measurements of a random sample of 50 young trees in 2009 and again measured those same trees in 2019. - The data below summarize their measurements, where the heights are in feet. - - ```{r} - library(tidyverse) - library(kableExtra) - - tribble( - ~Year, ~Mean, ~SD, ~n, - "2009", 12, 3.5, 50, - "2019", 24.5, 9.5, 50, - "Difference", 12.5, 7.2, 50 - ) %>% - kbl(linesep = "", booktabs = TRUE, align = "lccc") %>% - kable_styling(bootstrap_options = c("striped", "condensed"), - latex_options = "HOLD_position", - full_width = FALSE) %>% - column_spec(1:4, width = "5em") - ``` - - Construct a 99% confidence interval for the average growth of (what had been) younger trees in the park over 2009-2019. - -8. 
**High School and Beyond, bootstrap interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey. - The mean and standard deviation of the differences are $\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points. - The bootstrap distribution below was produced by bootstrapping from the sample of differences in reading and writing scores 1,000 times. - - ```{r} - library(tidyverse) - library(openintro) - library(patchwork) - - set.seed(1234) - hsb2 %>% - mutate(diff = read - write) %>% - specify(response = diff) %>% - generate(1000, type = "bootstrap") %>% - calculate(stat = "mean") %>% - ggplot(aes(x = stat)) + - geom_histogram(binwidth = 0.2, fill = IMSCOL["green", "full"]) + - labs( - title = "1,000 means of bootstrapped differences", - x = "Mean of bootstrapped difference scores\n(read - write)", - y = "Count" - ) - ``` - - a. Find an approximate 95% bootstrap percentile confidence interval for the true average difference in scores (read - write). - - b. Find an approximate 95% bootstrap SE confidence interval for the true average difference in scores (read - write). - - c. Interpret both confidence intervals using words like "population" and "score". - - d. From the confidence intervals calculated above, does it appear that there is a discernible difference in reading and writing scores, on average? - -9. **Possible paired randomized differences.** Data were collected on five people. 
- - ```{r} - library(knitr) - library(kableExtra) - library(tidyverse) - - tribble( - ~col1, ~col2, ~col3, ~col4, ~col5, ~col6, - "Observation 1", 3, 14, 4, 5, 10, - "Observation 2", 7, 3, 6, 5, 9, - "Difference", -4, 11, -2, 0, 1, - ) %>% - kbl(linesep = "", booktabs = TRUE, col.names = c("", "Person 1", "Person 2", "Person 3", "Person 4", "Person 5")) %>% - kable_styling( - bootstrap_options = c("striped", "condensed"), - latex_options = "HOLD_position", - full_width = FALSE - ) - ``` - - Which of the following could be a possible randomization of the paired differences given above? - If the set of values could not be a randomized set of differences, indicate why not. - - a. -2, 1, 1, 11, -2 - - b. -4 11 -2 0 1 - - c. -2, 2, -11, 11, -2, 2, 0, 1, -1 - - d. 0, -1, 2, -4, 11 - - e. 4, -11, 2, 0, -1 - -10. **Study environment.** In order to test the effects of listening to music while studying versus studying in silence, students agree to be randomized to two treatments (i.e., study with music or study in silence). - There are two exams during the semester, so the researchers can either randomize the students to have one exam with music and one with silence (randomly selecting which exam corresponds to which study environment) or the researchers can randomize the students to one study habit for both exams. - - The researchers are interested in estimating the true population difference of exam score for those who listen to music while studying as compared to those who study in silence. - - a. Describe the experiment which is consistent with a paired designed experiment. - How is the treatment assigned, and how are the data collected such that the observations are paired? - - b. Describe the experiment which is consistent with an indpenedent samples experiment. - How is the treatment assigned, and how are the data collected such that the observations are independent? - -11. 
**Global warming, randomization test.** Let's consider a limited set of climate data, examining temperature differences in 1950 vs 2022. +7. **Global warming, randomization test.** Let's consider a limited set of climate data, examining temperature differences in 1950 vs 2022. We sampled 26 locations in the US from the National Oceanic and Atmospheric Administration's (NOAA) historical data, where the data was available for both years of interest. [@webpage:noaa19482018] The data are not a random sample, but they are selected to be a representative sample across the land area of the lower 48 United States. Using the hottest day of the year as a measure can make the results susceptible to outliers. @@ -238,7 +138,39 @@ c. Do these data provide convincing evidence of a difference between the 90$^{th}$ percentile high temperature? Estimate the p-value from the randomization test, and conclude the hypothesis test using words like "90$^{th}$ percentile high temperature in 1950" and "90$^{th}$ percentile high temperature in 2022." -12. **Global warming, bootstrap interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database. +8. **High School and Beyond, bootstrap interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey. + The mean and standard deviation of the differences are $\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points. + The bootstrap distribution below was produced by bootstrapping from the sample of differences in reading and writing scores 1,000 times. 
+ + ```{r} + library(tidyverse) + library(openintro) + library(patchwork) + + set.seed(1234) + hsb2 %>% + mutate(diff = read - write) %>% + specify(response = diff) %>% + generate(1000, type = "bootstrap") %>% + calculate(stat = "mean") %>% + ggplot(aes(x = stat)) + + geom_histogram(binwidth = 0.2, fill = IMSCOL["green", "full"]) + + labs( + title = "1,000 means of bootstrapped differences", + x = "Mean of bootstrapped difference scores\n(read - write)", + y = "Count" + ) + ``` + + a. Find an approximate 95% bootstrap percentile confidence interval for the true average difference in scores (read - write). + + b. Find an approximate 95% bootstrap SE confidence interval for the true average difference in scores (read - write). + + c. Interpret both confidence intervals using words like "population" and "score". + + d. From the confidence intervals calculated above, does it appear that there is a discernible difference in reading and writing scores, on average? + +9. **Global warming, bootstrap interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database. [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\circ$F and 2.95$^\circ$F. ```{r} @@ -272,7 +204,22 @@ d. Do the confidence intervals provide convincing evidence that there were hotter high temperatures in 2022 than in 1950 at NOAA stations? Explain your reasoning. -13. **Global warming, mathematical test.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database. +10. **High School and Beyond, mathematical test.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey. + + a. 
Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam? + + b. Check the conditions required to complete this test. + + c. The average observed difference in scores is $\bar{x}_{read-write} = -0.545$, and the standard deviation of the differences is $s_{read-write} = 8.887$ points. + Do these data provide convincing evidence of a difference between the average scores on the two exams? + + d. What type of error might we have made? + Explain what the error means in the context of the application. + + e. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? + Explain your reasoning. + +11. **Global warming, mathematical test.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database. [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\circ$F and 2.95$^\circ$F. ```{r} @@ -308,22 +255,17 @@ g. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the 90$^{th}$ percentile high temperture from 1950 to 2022 to include 0? Explain your reasoning. -14. **High School and Beyond, mathematical test.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey. - - a. Create hypotheses appropriate for the following research question: is there an evident difference in the average scores of students in the reading and writing exam? - - b. Check the conditions required to complete this test. +12. **High school and beyond, mathematical interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey. 
+ The mean and standard deviation of the differences are $\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points. - c. The average observed difference in scores is $\bar{x}_{read-write} = -0.545$, and the standard deviation of the differences is $s_{read-write} = 8.887$ points. - Do these data provide convincing evidence of a difference between the average scores on the two exams? + a. Calculate a 95% confidence interval for the average difference between the reading and writing scores of all students. - d. What type of error might we have made? - Explain what the error means in the context of the application. + b. Interpret this interval in context. - e. Based on the results of this hypothesis test, would you expect a confidence interval for the average difference between the reading and writing scores to include 0? - Explain your reasoning. + c. Does the confidence interval provide convincing evidence that there is a real difference in the average scores? + Explain. -15. **Global warming, mathematical interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database. +13. **Global warming, mathematical interval.** We considered the change in the 90$^{th}$ percentile high temperature in 1950 versus 2022 at 26 sampled locations from the NOAA database. [@webpage:noaa19482018] The mean and standard deviation of the reported differences are 2.53$^\circ$F and 2.95$^\circ$F. a. Calculate a 90% confidence interval for the average difference of 90$^{th}$ percentile high temperature between 1950 and 2022. @@ -334,17 +276,52 @@ c. Does the confidence interval provide convincing evidence that there were hotter high temperatures in 2022 than in 1950 at NOAA stations? Explain your reasoning. -16. 
**High school and beyond, mathematical interval.** We considered the differences between the reading and writing scores of a random sample of 200 students who took the High School and Beyond Survey. - The mean and standard deviation of the differences are $\bar{x}_{read-write} = -0.545$ and $s_{read-write}$ = 8.887 points. +14. **Possible paired randomized differences.** Data were collected on five people. - a. Calculate a 95% confidence interval for the average difference between the reading and writing scores of all students. + ```{r} + library(knitr) + library(kableExtra) + library(tidyverse) - b. Interpret this interval in context. + tribble( + ~col1, ~col2, ~col3, ~col4, ~col5, ~col6, + "Observation 1", 3, 14, 4, 5, 10, + "Observation 2", 7, 3, 6, 5, 9, + "Difference", -4, 11, -2, 0, 1, + ) %>% + kbl(linesep = "", booktabs = TRUE, col.names = c("", "Person 1", "Person 2", "Person 3", "Person 4", "Person 5")) %>% + kable_styling( + bootstrap_options = c("striped", "condensed"), + latex_options = "HOLD_position", + full_width = FALSE + ) + ``` - c. Does the confidence interval provide convincing evidence that there is a real difference in the average scores? - Explain. + Which of the following could be a possible randomization of the paired differences given above? + If the set of values could not be a randomized set of differences, indicate why not. + + a. -2, 1, 1, 11, -2 + + b. -4 11 -2 0 1 + + c. -2, 2, -11, 11, -2, 2, 0, 1, -1 + + d. 0, -1, 2, -4, 11 + + e. 4, -11, 2, 0, -1 + +15. **Study environment.** In order to test the effects of listening to music while studying versus studying in silence, students agree to be randomized to two treatments (i.e., study with music or study in silence). 
+ There are two exams during the semester, so the researchers can either randomize the students to have one exam with music and one with silence (randomly selecting which exam corresponds to which study environment) or the researchers can randomize the students to one study habit for both exams. + + The researchers are interested in estimating the true population difference of exam score for those who listen to music while studying as compared to those who study in silence. -17. **Friday the 13th, traffic.** In the early 1990's, researchers in the UK collected data on traffic flow on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and and whether Friday the 13th is an unlucky day. + a. Describe the experiment which is consistent with a paired designed experiment. + How is the treatment assigned, and how are the data collected such that the observations are paired? + + b. Describe the experiment which is consistent with an independent samples experiment. + How is the treatment assigned, and how are the data collected such that the observations are independent? + +16. **Friday the 13th, traffic.** In the early 1990's, researchers in the UK collected data on traffic flow on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and whether Friday the 13th is an unlucky day. The histograms below show the distributions of numbers of cars passing by a specific intersection on Friday the 6th and Friday the 13th for many such date pairs. Also provided are some sample statistics, where the difference is the number of cars on the 6th minus the number of cars on the 13th.[^_21-ex-inference-paired-means-3] [@Scanlon:1993] @@ -404,7 +381,7 @@ g. What type of error might have been made in the conclusion of your test? Explain. -18. 
**Friday the 13th, accidents.** In the early 1990's, researchers in the UK collected data the number of traffic accident related emergency room (ER) admissions on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and and whether Friday the 13th is an unlucky day. +17. **Friday the 13th, accidents.** In the early 1990's, researchers in the UK collected data on the number of traffic accident related emergency room (ER) admissions on Friday the 13th with the goal of addressing issues of how superstitions regarding Friday the 13th affect human behavior and whether Friday the 13th is an unlucky day. The histograms below show the distributions of numbers of ER admissions at specific emergency rooms on Friday the 6th and Friday the 13th for many such date pairs. Also provided are some sample statistics, where the difference is the ER admissions on the 6th minus the ER admissions on the 13th.[@Scanlon:1993] @@ -453,6 +430,29 @@ c. The conclusion of the original study states, "Friday 13th is unlucky for some. The risk of hospital admission as a result of a transport accident may be increased by as much as 52%. Staying at home is recommended." Do you agree with this statement? Explain your reasoning. +18. **Forest management.** Forest rangers wanted to better understand the rate of growth for younger trees in the park. + They took measurements of a random sample of 50 young trees in 2009 and again measured those same trees in 2019. + The data below summarize their measurements, where the heights are in feet. 
+ + ```{r} + library(tidyverse) + library(kableExtra) + + tribble( + ~Year, ~Mean, ~SD, ~n, + "2009", 12, 3.5, 50, + "2019", 24.5, 9.5, 50, + "Difference", 12.5, 7.2, 50 + ) %>% + kbl(linesep = "", booktabs = TRUE, align = "lccc") %>% + kable_styling(bootstrap_options = c("striped", "condensed"), + latex_options = "HOLD_position", + full_width = FALSE) %>% + column_spec(1:4, width = "5em") + ``` + + Construct a 99% confidence interval for the average growth of (what had been) younger trees in the park over 2009-2019. + [^_21-ex-inference-paired-means-1]: The [`hsb2`](http://openintrostat.github.io/openintro/reference/hsb2.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package. [^_21-ex-inference-paired-means-2]: The [`us_temperature`](http://openintrostat.github.io/openintro/reference/us_temperature.html) data used in this exercise can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.