Final Instructions:
- You don’t actually have to carry out calculations. For example, if you were asked for a 95% confidence interval for a mean whose point estimate is 3 and whose standard error is 1.5 with 5 degrees of freedom, you would receive full credit for the answer 3 ± t5(0.975) × 1.5.
- The default confidence level is 95%.
- There are a total of 120 points possible.
- You have four hours to complete this exam.
- If we were taking the exam in person, you would have 110 minutes to finish, so there should be ample time. However, pace yourself. Do not spend so much time on earlier problems that you do not get to the later ones. Don’t write more than necessary. It’s OK to abbreviate words and to spell out Greek letters (e.g. pop for population, and mu1 for µ1).
- Please be as clear and concise as possible.
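The answer format described in the first instruction can also be evaluated numerically. The following is a minimal R sketch, not something required on the exam; the variable names (estimate, se, df, t_mult, ci) are illustrative assumptions, and the numbers are the ones from the example above.

```r
# Values from the example in the instructions.
estimate <- 3     # point estimate of the mean
se       <- 1.5   # standard error of the estimate
df       <- 5     # degrees of freedom

# t5(0.975): the 0.975 quantile of the t distribution with 5 df.
t_mult <- qt(0.975, df)

# 95% confidence interval: estimate ± t5(0.975) × SE.
ci <- estimate + c(-1, 1) * t_mult * se
ci
```

Writing the interval as 3 ± t5(0.975) × 1.5 earns full credit; the code only shows what that expression stands for.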
Notes About this Practice Final:
- These problems are designed to give you an idea of the scope and flavor of the type of problems that may appear on the final, as well as a sense of what course material is most important. However, your review should be comprehensive, not limited to these problems. Please also consult the Final Review outline.
- The final covers the entire course. Please review the Practice Midterm, the Midterm Review outline, and your actual graded midterm.
- I recommend working through these practice problems on your own at first, then asking for help or clarification.
- The actual exam will be somewhat shorter than this practice exam (75 points rather than 120).
1. Recall the cuckoo egg length study from the practice midterm. The study compared lengths of cuckoo eggs among six different host species. The research goals are to determine whether cuckoo egg lengths differ among the host species and to compare egg lengths among host species (HS=hedge sparrow; MP=meadow pipit; PW=pied wagtail; TP=tree pipit). Below is R output from a one-way analysis of variance of the data.
> eggs_aov<-aov(Length~Host, data=eggs)
> anova(eggs_aov)
Analysis of Variance Table

Response: Length
          Df Sum Sq Mean Sq F value    Pr(>F)
Host       5 55.794  11.159  14.398 3.334e-10 ***
Residuals 85 65.876   0.775
> # Group sample sizes.
> with(eggs, unlist(lapply(split(Length, Host), length)))
   MP    TP    HS Robin    PW  Wren
   16    15    14    16    15    15
> # Group sample means.
> with(eggs, unlist(lapply(split(Length, Host), mean)))
      MP       TP       HS    Robin       PW     Wren
21.50000 23.09000 23.12143 22.57500 22.90333 21.13000
(a) (8 points) Write a 95% confidence interval comparing the average of the population mean cuckoo egg lengths in meadow pipits’ and tree pipits’ nests to the population mean cuckoo egg length in robins’ nests.
(b) (4 points) Using the R anova() output above, state the residual sum of squares and degrees of freedom for the equal means model.
(c) (3 points) Suppose that after collecting the data, researchers noticed that the sample means for tree pipits and meadow pipits were substantially different and that the sample mean for hedge sparrows was noticeably larger than the sample mean for wrens. Would it be appropriate to use the Bonferroni correction to write confidence intervals for these two comparisons? State yes/no and give a brief justification (one sentence or less).
(d) (4 points) Suppose that the researchers had pre-planned to compare all pairs of means. Below is output from TukeyHSD() giving 95% Tukey-Kramer confidence intervals for all pairwise differences in population means. Put a star (*) by each confidence interval that indicates that the corresponding two means differ.
> TukeyHSD(eggs_aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Length ~ Host, data = eggs)

$Host
                  diff        lwr        upr     p adj
TP-MP       1.59000000  0.6674733  2.5125267 0.0000397
HS-MP       1.62142857  0.6820506  2.5608065 0.0000386
Robin-MP    1.07500000  0.1674747  1.9825253 0.0108225
PW-MP       1.40333333  0.4808066  2.3258601 0.0003838
Wren-MP    -0.37000000 -1.2925267  0.5525267 0.8500457
HS-TP       0.03142857 -0.9224500  0.9853071 0.9999988
Robin-TP   -0.51500000 -1.4375267  0.4075267 0.5827767
PW-TP      -0.18666667 -1.1239548  0.7506214 0.9920583
Wren-TP    -1.96000000 -2.8972881 -1.0227119 0.0000005
Robin-HS   -0.54642857 -1.4858065  0.3929494 0.5381888
PW-HS      -0.21809524 -1.1719738  0.7357833 0.9850979
Wren-HS    -1.99142857 -2.9453071 -1.0375500 0.0000005
PW-Robin    0.32833333 -0.5941934  1.2508601 0.9038121
Wren-Robin -1.44500000 -2.3675267 -0.5224733 0.0002347
Wren-PW    -1.77333333 -2.7106214 -0.8360452 0.0000054
(e) (2 points) If the preplanned pairwise comparisons of interest were between wrens and each of the other host species, what multiple comparison procedure would be most appropriate?
2. [This problem covers material in Chapters 7 and 8.] In a study on mercury levels in fish, water samples and fish were collected from 53 lakes in Florida. In the data set, AvgMercury is the average mercury concentration (parts per million) in muscle tissue of the fish sampled from the lake. Alkalinity is mg/L of calcium chloride in the water sample collected from the lake. Below is a scatterplot of log(AvgMercury) vs. Alkalinity with fitted regression line and confidence band.
R output from the regression is below.
> lakes_lm<-lm(log(AvgMercury)~Alkalinity, data=lakes)
> summary(lakes_lm)

Call:
lm(formula = log(AvgMercury) ~ Alkalinity)

Residuals:
     Min       1Q   Median       3Q      Max
-2.06553 -0.27948  0.08225  0.29231  1.79197

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.321099   0.114715  -2.799  0.00722 **
Alkalinity  -0.015703   0.002152  -7.295 1.86e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.593 on 51 degrees of freedom
Multiple R-squared:  0.5107,    Adjusted R-squared:  0.5011
F-statistic: 53.22 on 1 and 51 DF,  p-value: 1.859e-09
(a) (7 points) Write a 95% confidence interval for the intercept parameter β0 in the regression model.
(b) (11 points) A 95% confidence interval for β1 is (−0.02, −0.01). Write a statistical conclusion reporting this result.
(c) (8 points) Below is R predict() output. Write a statistical conclusion for a confidence interval for the median average mercury concentration expected in fish muscle tissue from a lake with an alkalinity of 100 mg/L of calcium chloride.
> predict(lakes_lm, data.frame(Alkalinity=100), interval="confidence",
+         se.fit=TRUE)
$fit
        fit       lwr      upr
1 -1.891373 -2.206977 -1.57577

$se.fit
[1] 0.1572056

$df
[1] 51

$residual.scale
[1] 0.5929642
(d) (3 points) Use the output from summary() and predict() above to give an expression for the standard error of a prediction of the log average mercury concentration of fish in a lake with an alkalinity of 100 mg/L of calcium chloride.
(e) (4 points) State the full and reduced models tested by the F-statistic 53.224 in the output below.
> anova(lakes_lm)
Analysis of Variance Table

Response: log(AvgMercury)
           Df Sum Sq Mean Sq F value    Pr(>F)
Alkalinity  1 18.714 18.7138  53.224 1.859e-09 ***
Residuals  51 17.932  0.3516
(f) (4 points) A residual plot and normal Q-Q plot are shown below. For each of the two plots, state the assumption it is used to check and your assessment of the plausibility of the assumption based on the plot.
3. A study was conducted to compare waste between two suppliers of a Levi-Strauss clothing manufacturing plant. The firm’s quality control department collects weekly data on percentage waste relative to what can be achieved by computer layouts of patterns on cloth. It is possible to have negative values, which indicate that the plant employees beat the computer in controlling waste. Below is a side-by-side boxplot of waste for the two suppliers (plants) and R output from a Wilcoxon rank-sum test.
> wilcox.test(Waste~Plant, data=waste, exact=FALSE, correct=FALSE)

        Wilcoxon rank sum test

data:  Waste by Plant
W = 131.5, p-value = 0.009484
alternative hypothesis: true location shift is not equal to 0
(a) (4 points) State the null hypothesis tested by the statistic W = 131.5 in the above output. Define any notation you use.
(b) (7 points) Write a statistical conclusion reporting the result of the rank-sum test.
(c) (3 points) Would a two-sample t-test be an appropriate procedure for these data? Why or why not? Answer in one sentence or less.
4. A study was performed to compare two methods of assembling a type of electronic device. Fifteen employees of an electronics manufacturer were recruited for the study. Each employee used each method, and the time required for each method was recorded.
(a) (7 points) Below is R output from a t-test on the difference between method A and method B for each employee. Write a statistical conclusion reporting the results of the hypothesis test.
> t.test(diff, alternative="greater")

        One Sample t-test

data:  diff
t = 1.9725, df = 14, p-value = 0.03432
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 0.1244319       Inf
sample estimates:
mean of x
 1.162405
(b) (6 points) The sample standard deviation of the differences is 2.28. Write a two-sided confidence interval for the mean difference µ.
(c) (2 points) State the p-value of a two-sided test of µ = 0.
(d) (3 points) Would a two-sample t-test be a reasonable analysis for these data? Why or why not? Answer in one sentence or less.
5. Light bulbs burn out more quickly if they’re turned off and on more frequently. A study was done comparing two rates of light bulb use. Group A was turned off and on once every 12 hours, whereas Group B was turned off and on once every two hours. The lifetime in hours was recorded for each bulb. Output from t.test() is given below. Note that the lifetimes were log-transformed for the analysis.
> head(bulbs)
   Lifetimes Group
1  2646.2759     A
2 16334.1731     A
3  7497.3504     A
4 32666.1658     A
5   415.2038     A
6   998.5829     A
> t.test(log(Lifetimes)~Group, data=bulbs, alternative="greater")

        Welch Two Sample t-test

data:  log(Lifetimes) by Group
t = 2.0195, df = 97.956, p-value = 0.02308
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.1105225       Inf
sample estimates:
mean in group A mean in group B
       7.500358        6.878584
(a) (2 points) State the null and alternative hypotheses of the above test. Please define any notation you use.
(b) (8 points) Write a statistical conclusion reporting the results of the above hypothesis test.
(c) (9 points) A two-sided 90% confidence interval for the difference in mean log lifetime for the groups is (0.011, 1.233). Give a statistical conclusion reporting this confidence interval.
(d) (3 points) What is the p-value of the t-test whose alternative hypothesis is HA : µA < µB, where µA and µB are the population means of groups A and B respectively?
6. For this question, assume that a parametric procedure is one that requires an assumption of normality, whereas a nonparametric procedure does not. For each of the studies described, state one parametric and one nonparametric procedure that you would consider for analysing the data.
(a) (4 points) A city conducts a study comparing two types of traffic control at intersections to identify the type of intersection associated with fewer accidents. City engineers identify 12 intersections of the first type, and 10 of the second type. The number of accidents at each of the 22 intersections for the past five years is recorded.
(b) (4 points) An insurance company suspects an automobile repair garage of inflating the charge for repairing cars after they’ve been involved in an accident. Ten cars were taken to the garage for a cost estimate. The same ten cars were taken to another garage for an estimate. The research question is whether the cost estimates from the suspect garage are higher than those from the other garage.