Take Home Midterm (MAT 4374/5182)
现代计算统计学代写 Instructions1) Please submit your solutions to this assignment in one PDF file in Brightspace. Only one file will be accepted.
Instructions
1) Please submit your solutions to this assignment in one PDF file in Brightspace. Only one file will be accepted.
2) You can submit a PDF file more than once. However, only the last submission will be saved. If you want to modify your submitted assignment, that is fine as long as it is before the deadline.
3) Late submissions of the assignment are not going to be marked.
4) Please use R markdown to write the solutions for this part.
5) You can submit hand written solutions for the mathematical parts of the assignment, but please combine images of your hand-written solutions with the PDF produced with R markdown as one PDF.
(See https://imagetopdf.com/ as a possible solution to combine images as one PDF). Alternatively, you can insert your image in the R markdown file.
6) Do not work in groups.
1. 现代计算统计学代写
Let X have a log-normal(µ, σ) distribution. Here is a stochastic representation of the log-normal distribution: let Z have a standard normal distribution, and let X = eσ Z+µ, then we say that X has a log-normal(µ, σ) distribution. Note that ln(X) has a normal distribution with mean µ and standard deviation σ.
Consider µ = 0, and σ = 1. Perform a simulation study to compare the efficiency of the estimation of the geometric mean and of the mean? Use n = 15, 20, 25, 30, 35, 100. We will say that the estimator with the smallest mean squared error is the most efficient estimator. Also give the estimated MSE and estimated standard error of the MSE for both estimators.
(b) Using the result from (a), write an R function that generates a random sample of size n from the distribution with the absolute sine pdf using the acceptance-rejection method.
(c) For the algorithm implemented in part (b), what is the expected number of iterations required to generate a value from the distribution with the absolute sine pdf using the acceptance-rejection method.
(d) Use your function from part (b) to generate a random sample of size n = 10000 the distribution with the above pdf. Give the corresponding density histogram and superimpose the true pdf. Does there appear to be a good fit?
3. 现代计算统计学代写
Consider a random variable X with a modified geometric distribution with paramater p. Its pmf
p(x) = (1 − p)x p, x = 0, 1, 2, . . .
We interpret X as the number of failures required to observe the first success for independent Bernoulli trial with a common probability of success p. We learned how generate values from this distribution using the inverse transformation technique in Suggested Exercises 4.
(a) Write a function that generates n values from a modified geometric distribution with parameter p using the inverse transformation technique. (Do not use “‘rgeom““.)
This means that Y has a (negative binomial) Neg-B(µ, k) distribution. We had generated values from this distribution in Assignment 2 using the inversive transformation technique with a iterative search and also using its compound distribution representation. In this problem, consider k as a positive integer, and write a function to generate n values from a Neg-B(µ, k) distribution using your function from part (a) and the representation of Y as a sum of k independent modified geometric random variables.
(c) Use your function in part (b) to generate n = 10, 000 values from a Neg-B(µ = 20, k = 6) distribution. Produce a bar graph of the observed frequencies and superimpose the expected frequencies.
4. 现代计算统计学代写
Two groups of workers have the same qualifications for a particular type of work. Their experience in salaries is summarized by the following 2 × 5 contingency table. The salaries are in thousands of dollars
c.table<-array(data = c(6,5,11,9,16,8,14,6,13,2), dim = c(2,5), dimnames = list(Group = c("1", "2"), Salary = c("27-29", "29-31","31-33","33-35","35 and over"))) c.table
## Salary ## Group 27-29 29-31 31-33 33-35 35 and over ## 1 6 11 16 14 13 ## 2 5 9 8 6 2
Let pi = (pi1, pi2, pi3, pi4, pi5) be the salary distribution for group i, for i = 1, 2.
We would like to test for the homogeneity of these distributions, that is H0 : p1 = p2.
(b) From part (a), we see that Pearson’s chi-square test statistic for independence between two categorical variables is the same as the test statistic for homogeneity of the two categorical variables. In fact, it can also
be shown that the approximate sampling distribution of the test statistic under H0 is a χ2((2−1)(5−1) = 4) distribution. This means that we can use the chisq.test to conduct our test for homogeneity.
chisq.test(c.table) ## ## Pearson’s Chi-squared test ## ## data: c.table ## X-squared = 4.7523, df = 4, p-value = 0.3137
Instead of relying on the χ2 approximation, we will implement a parametric bootstrap to compute a simulated p-value. Using the design of this study, i.e. two independent multinomial experiments (one with size 60 and the other 30), compute a simulated p-value using B = 2000 replications.