Statistics 411/511

Practice Final

Final Instructions:

This exam is open-book, open-notes, and you may use any other materials except another
You don’t actually have to carry out calculations. For example, if you were asked for a 95% confidence interval for a mean whose point estimate is 3 and whose standard error is 1.5 with 5 degrees of freedom, you would receive full credit for the answer 3 ± t5(0.975) × 1.5.
The default confidence level is 95%.
There are a total of 120 points possible.
You have four hours to complete this exam. The clock starts when you access the exam in Gradescope, and you can’t pause the clock.
If we were taking the exam in person, you would have 110 minutes to finish, so there should be ample time. However, pace yourself. Do not spend so much time on earlier problems that you do not get to the later ones. Don’t write more than necessary. It’s OK to abbreviate words and to spell out Greek letters (e.g. pop for population, and mu1 for µ1).
Please be as clear and concise as possible.
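
For reference only, here is a minimal sketch of how the example expression 3 ± t5(0.975) × 1.5 would be evaluated in R. The variable names are illustrative, and the numbers are the ones from the example in the instructions, not from any exam problem.

```r
# Illustrative values taken from the example in the instructions.
estimate <- 3      # point estimate of the mean
se       <- 1.5    # standard error of the estimate
df       <- 5      # degrees of freedom

# t5(0.975): the 0.975 quantile of a t distribution with 5 df.
t_mult <- qt(0.975, df)

# Half-width and endpoints of the 95% confidence interval.
half_width <- t_mult * se
ci <- c(lower = estimate - half_width, upper = estimate + half_width)
print(round(ci, 3))
```

On the exam itself the unevaluated expression is enough; the numeric evaluation is not required.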

Notes About this Practice Final:

These problems are designed to give you an idea of the scope and flavor of the type of problems that may appear on the final, as well as a sense of what course material is most important. However, your review should be comprehensive, not limited to these problems. Please also consult the Final Review outline.
The final covers the entire course. Please review the Practice Midterm, the Midterm Review outline, and your actual graded midterm.
I recommend working through these practice problems on your own at first, then asking for help or clarification.
The actual exam will be somewhat shorter than this practice exam (75 points rather than 120).

1.

Recall the cuckoo egg length study from the practice midterm. The study compared lengths of cuckoo eggs among six different host species. The research question is whether cuckoo egg lengths differ among the host species, and how egg lengths compare among host species (HS=hedge sparrow; MP=meadow pipit; PW=pied wagtail; TP=tree pipit). Below is R output from a one-way analysis of variance of the data.

> eggs_aov<-aov(Length~Host, data=eggs)
> anova(eggs_aov)
Analysis of Variance Table

Response: Length
          Df Sum Sq Mean Sq F value    Pr(>F)
Host       5 55.794  11.159  14.398 3.334e-10 ***
Residuals 85 65.876   0.775

> # Group sample sizes.
> with(eggs, unlist(lapply(split(Length, Host), length)))
   MP    TP    HS Robin    PW  Wren
   16    15    14    16    15    15
> # Group sample means.
> with(eggs, unlist(lapply(split(Length, Host), mean)))
      MP       TP       HS    Robin       PW     Wren
21.50000 23.09000 23.12143 22.57500 22.90333 21.13000

(a) (8 points) Write a 95% confidence interval to compare the average population mean cuckoo egg lengths in meadow pipits’ and tree pipits’ nests to the population mean cuckoo egg length in robins’ nests.

(b) (4 points) Using the R anova() output above, state the residual sum of squares and degrees of freedom for the equal means model.

(c) (3 points) Suppose that after collecting the data, researchers noticed that the sample means for tree pipits and meadow pipits were substantially different and that the sample mean for hedge sparrows was noticeably larger than the sample mean for wrens. Would it be appropriate to use the Bonferroni correction to write confidence intervals for these two comparisons? State yes/no and give a brief justification (one sentence or less).

(d) (4 points) Suppose that the researchers had pre-planned to compare all pairs of means. Below is output from TukeyHSD() giving 95% Tukey-Kramer confidence intervals for all pairwise differences in population means. Put a star (*) by each confidence interval that indicates the corresponding two means differ.

> TukeyHSD(eggs_aov)
  Tukey multiple comparisons of means
  95% family-wise confidence level

Fit: aov(formula = Length ~ Host, data = eggs)

$`Host`
                  diff        lwr        upr     p adj
TP-MP       1.59000000  0.6674733  2.5125267 0.0000397
HS-MP       1.62142857  0.6820506  2.5608065 0.0000386
Robin-MP    1.07500000  0.1674747  1.9825253 0.0108225
PW-MP       1.40333333  0.4808066  2.3258601 0.0003838
Wren-MP    -0.37000000 -1.2925267  0.5525267 0.8500457
HS-TP       0.03142857 -0.9224500  0.9853071 0.9999988
Robin-TP   -0.51500000 -1.4375267  0.4075267 0.5827767
PW-TP      -0.18666667 -1.1239548  0.7506214 0.9920583
Wren-TP    -1.96000000 -2.8972881 -1.0227119 0.0000005
Robin-HS   -0.54642857 -1.4858065  0.3929494 0.5381888
PW-HS      -0.21809524 -1.1719738  0.7357833 0.9850979
Wren-HS    -1.99142857 -2.9453071 -1.0375500 0.0000005
PW-Robin    0.32833333 -0.5941934  1.2508601 0.9038121
Wren-Robin -1.44500000 -2.3675267 -0.5224733 0.0002347
Wren-PW    -1.77333333 -2.7106214 -0.8360452 0.0000054

(e) (2 points) If the preplanned pairwise comparisons of interest were between wrens and each of the other host species, what multiple comparison procedure would be most appropriate?

2. [This problem covers material in Chapters 7 and 8.]

In a study on mercury levels in fish, water samples and fish were collected from 53 lakes in Florida. In the data set, AvgMercury is the average mercury concentration (parts per million) in muscle tissue of the fish sampled from the lake. Alkalinity is mg/L of calcium chloride in the water sample collected from the lake. Below is a scatterplot of log(AvgMercury) vs. Alkalinity with fitted regression line and confidence band.

R output from the regression is below.

> lakes_lm<-lm(log(AvgMercury)~Alkalinity, data=lakes)
> summary(lakes_lm)

Call:
lm(formula = log(AvgMercury) ~ Alkalinity)

Residuals:
     Min       1Q   Median       3Q      Max
-2.06553 -0.27948  0.08225  0.29231  1.79197

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.321099   0.114715  -2.799  0.00722 **
Alkalinity  -0.015703   0.002152  -7.295 1.86e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.593 on 51 degrees of freedom
Multiple R-squared: 0.5107, Adjusted R-squared: 0.5011
F-statistic: 53.22 on 1 and 51 DF, p-value: 1.859e-09

(a) (7 points) Write a 95% confidence interval for the intercept parameter β0 in the regression model.

(b) (11 points) A 95% confidence interval for β1 is (−0.02, −0.01). Write a statistical conclusion reporting this result.

(c) (8 points) Below is R predict() output. Write a statistical conclusion for a confidence interval for the median average mercury concentration expected in fish muscle tissue from a lake with an alkalinity of 100 mg/L of calcium chloride.

> predict(lakes_lm, data.frame(Alkalinity=100), interval="confidence",
+ se.fit=TRUE)
$fit
        fit       lwr      upr
1 -1.891373 -2.206977 -1.57577

$se.fit
[1] 0.1572056

$df
[1] 51

$residual.scale
[1] 0.5929642

(d) (3 points) Use the output from summary() and predict() above to give an expression for the standard error of a prediction of the log average mercury concentration of fish in a lake with an alkalinity of 100 mg/L of calcium chloride.

(e) (4 points) State the full and reduced models tested by the F-statistic 53.224 in the output below.

> anova(lakes_lm)
Analysis of Variance Table

Response: log(AvgMercury)
           Df Sum Sq Mean Sq F value    Pr(>F)
Alkalinity  1 18.714 18.7138  53.224 1.859e-09 ***
Residuals  51 17.932  0.3516

(f) (4 points) A residual plot and normal Q-Q plot are shown below. For each of the two plots, state the assumption it is used to check and your assessment of the plausibility of the assumption based on the plot.

3.

A study was conducted to compare waste between two suppliers of a Levi-Strauss clothing manufacturing plant. The firm’s quality control department collects weekly data on percentage waste relative to what can be achieved by computer layouts of patterns on cloth. It is possible to have negative values, which indicate that the plant employees beat the computer in controlling waste. Below is a side-by-side boxplot of waste for the two suppliers (plants) and R output from a Wilcoxon rank-sum test.

> wilcox.test(Waste~Plant, data=waste, exact=FALSE, correct=FALSE)

Wilcoxon rank sum test

data: Waste by Plant
W = 131.5, p-value = 0.009484
alternative hypothesis: true location shift is not equal to 0

(a) (4 points) State the null hypothesis tested by the statistic W = 131.5 in the above output. Define any notation you use.

(b) (7 points) Write a statistical conclusion reporting the result of the rank-sum test.

(c) (3 points) Would a two-sample t-test be an appropriate procedure for these data? Why or why not? Answer in one sentence or less.

4.

A study was performed to compare two methods of assembling a type of electronic device. Fifteen employees of an electronics manufacturer were recruited for the study. Each employee used each method, and the time required for each method was recorded.

(a) (7 points) Below is R output from a t-test on the difference between method A and method B for each employee. Write a statistical conclusion reporting the results of the hypothesis test.

> t.test(diff, alternative="greater")

One Sample t-test

data: diff
t = 1.9725, df = 14, p-value = 0.03432
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
  0.1244319 Inf
sample estimates:
mean of x
  1.162405

(b) (6 points) The sample standard deviation of the differences is 2.28. Write a two-sided confidence interval for the mean difference µ.

(d) (3 points) Would a two-sample t-test be a reasonable analysis for these data? Why or why not? Answer in one sentence or less.

5.

Light bulbs burn out quicker if they’re turned off and on more frequently. A study was done comparing two rates of light bulb use. Group A was turned off and on once every 12 hours, whereas Group B was turned off and on once every two hours. The lifetime in hours was recorded for each bulb. Output from t.test is given below. Note that the lifetimes were log-transformed for the analysis.

> head(bulbs)
  Lifetimes Group
1 2646.2759   A
2 16334.1731  A
3 7497.3504   A
4 32666.1658  A
5 415.2038    A
6 998.5829    A
> t.test(log(Lifetimes)~Group, data=bulbs, alternative="greater")

Welch Two Sample t-test

data: log(Lifetimes) by Group
t = 2.0195, df = 97.956, p-value = 0.02308
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.1105225 Inf
sample estimates:
mean in group A mean in group B
       7.500358        6.878584

(a) (2 points) State the null and alternative hypotheses of the above test. Please define any notation you use.

(b) (8 points) Write a statistical conclusion reporting the results of the above hypothesis test.

(c) (9 points) A two-sided 90% confidence interval for the difference in mean log lifetime for the groups is (0.011, 1.233). Give a statistical conclusion reporting this confidence interval.

(d) (3 points) What is the p-value of the t-test whose alternative hypothesis is H_A : µ_A < µ_B, where µ_A and µ_B are the population means of groups A and B respectively?

6.

For this question, assume that a parametric procedure is one that requires an assumption of normality, whereas a nonparametric procedure does not. For each of the studies described, state one parametric and one nonparametric procedure that you would consider for analysing the data.

(a) (4 points) A city conducts a study comparing two types of traffic control at intersections to identify the type of intersection associated with fewer accidents. City engineers identify 12 intersections of the first type, and 10 of the second type. The number of accidents at each of the 22 intersections for the past five years is recorded.

(b) (4 points) An insurance company suspects an automobile repair garage of inflating the charge of repairing cars after they’ve been involved in an accident. Ten cars were taken to the garage for a cost estimate. The same ten cars were taken to another garage for an estimate. The research question is if the cost estimates from the suspect garage are higher than from the other garage.
