Assignment 2
Instructions
1) Please submit your solutions to this assignment in one PDF file in Brightspace. Only one file will be accepted.
2) You can submit a PDF file more than once. However, only the last submission will be saved. If you want to modify your submitted assignment, that is fine as long as it is before the deadline.
3) Late submissions will not be marked.
4) Please use R markdown for your assignment, unless you are using another language such as Mathematica, Maple, etc.
5) You can submit handwritten solutions for the mathematical parts of the assignment, but please combine the images of your handwritten solutions with the PDF produced with R markdown into one PDF (see https://imagetopdf.com/ for one possible way to combine images into a PDF). Alternatively, you can insert your images in the R markdown file.
6) You can work in groups of up to 4 members. Only one member of the group should submit the assignment, and the name and student number of each group member should be on the assignment.
7) It is not necessary to submit the questions with your assignment. We only need your answers.
1.
(a) Using the eigen-decomposition method, generate 1000 observations from a multivariate normal distribution with mean vector µ = E[X] = (0, 1, 2) and covariance matrix
Use the pairs.panels plot to graph an array of scatter plots for each pair of variables. For each pair of variables, check that the correlations approximately agree with the theoretical parameters.
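The eigen-decomposition approach can be sketched as follows. This is a minimal illustration, not a model solution: the matrix `Sigma` below is a hypothetical placeholder (substitute the covariance matrix given in the question), and `rmvn_eigen` is a helper name introduced here. It relies on the fact that if Σ = P Λ P', then A = P Λ^{1/2} P' satisfies A A' = Σ.

```r
set.seed(1)

mu <- c(0, 1, 2)
# Hypothetical covariance matrix, for illustration only;
# replace it with the matrix specified in the assignment.
Sigma <- matrix(c( 1.0, -0.5,  0.5,
                  -0.5,  1.0, -0.5,
                   0.5, -0.5,  1.0), nrow = 3, byrow = TRUE)

rmvn_eigen <- function(n, mu, Sigma) {
  d  <- length(mu)
  ev <- eigen(Sigma, symmetric = TRUE)
  # Symmetric square root of Sigma: A %*% t(A) equals Sigma
  A  <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)
  Z  <- matrix(rnorm(n * d), nrow = n, ncol = d)  # rows are iid N(0, I)
  sweep(Z %*% A, 2, mu, "+")                      # add the mean to each row
}

X <- rmvn_eigen(1000, mu, Sigma)
colnames(X) <- c("X1", "X2", "X3")

round(cor(X), 2)          # sample correlations
round(cov2cor(Sigma), 2)  # theoretical correlations implied by Sigma

# pairs.panels comes from the psych package; fall back to base pairs()
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(X)
} else {
  pairs(X)
}
```

Comparing the two printed correlation matrices gives the check the question asks for; with 1000 observations the entries should typically agree to within a few hundredths.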
(b) For each observation x ∈ R^3 in your sample in part (a), compute its Mahalanobis distance from the mean, i.e. compute
D = (x − µ)' Σ^{-1} (x − µ),
where µ is the vector of means, and Σ is the variance-covariance matrix. Give a density histogram of the 1000 Mahalanobis distances, and overlay the pdf of a χ²(3) distribution. What does this plot suggest?
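A possible sketch of the distance computation and the overlay is shown below. The covariance matrix is again a hypothetical placeholder, and the sample is regenerated here only so that the block runs on its own. Base R's `mahalanobis()` returns the quadratic form D defined above (not its square root), which is checked against a direct computation.

```r
set.seed(2)

mu <- c(0, 1, 2)
# Hypothetical covariance matrix, for illustration only
Sigma <- matrix(c( 1.0, -0.5,  0.5,
                  -0.5,  1.0, -0.5,
                   0.5, -0.5,  1.0), nrow = 3, byrow = TRUE)

# Stand-in for the sample from part (a)
ev <- eigen(Sigma, symmetric = TRUE)
A  <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)
X  <- sweep(matrix(rnorm(3000), ncol = 3) %*% A, 2, mu, "+")

# D = (x - mu)' Sigma^{-1} (x - mu) for every row of X
D <- mahalanobis(X, center = mu, cov = Sigma)

# The same quantity computed directly from the definition
Xc       <- sweep(X, 2, mu, "-")
D_manual <- rowSums((Xc %*% solve(Sigma)) * Xc)

hist(D, breaks = 40, freq = FALSE,
     main = "Mahalanobis distances", xlab = "D")
curve(dchisq(x, df = 3), from = 0, to = max(D), add = TRUE, lwd = 2)
```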
2.
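Let X have a negative binomial distribution, Neg-B(µ, k), with mean µ > 0 and dispersion parameter k > 0. Its p.m.f. is
p(x) = Γ(x + k) / (Γ(k) x!) · (k/(k + µ))^k · (µ/(k + µ))^x, x = 0, 1, 2, . . .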
(a)
The negative binomial distribution does not have a closed-form quantile function, and its support is countably infinite. So, to implement the inverse transformation technique to generate values from this distribution, we can try a recursive search. Find the factor c(x), which is a function of x, such that
p(x) = c(x) p(x − 1), x = 1, 2, . . .
Note: Γ(x) = (x − 1) Γ(x − 1) for x > 1.
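One way to approach the factor is to take the ratio of consecutive probabilities. The derivation below is a sketch that assumes the mean-dispersion form of the negative binomial p.m.f. (the parameterization used by `rnbinom` with `size = k` and `mu = µ`); if the course uses a different parameterization, the factor changes accordingly.

```latex
\begin{align*}
p(x) &= \frac{\Gamma(x+k)}{\Gamma(k)\,x!}
        \left(\frac{k}{k+\mu}\right)^{k}
        \left(\frac{\mu}{k+\mu}\right)^{x},
        \qquad x = 0, 1, 2, \ldots \\[4pt]
\frac{p(x)}{p(x-1)}
     &= \frac{\Gamma(x+k)}{\Gamma(x+k-1)}
        \cdot \frac{(x-1)!}{x!}
        \cdot \frac{\mu}{k+\mu} \\[4pt]
     &= \frac{x+k-1}{x} \cdot \frac{\mu}{k+\mu}
      \;=\; c(x),
        \qquad x = 1, 2, \ldots
\end{align*}
```

The second step uses the note above in the form Γ(x + k) = (x + k − 1) Γ(x + k − 1). The recursion starts from p(0) = (k/(k + µ))^k.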
(b)
Using the recursive equation from (a), generate n = 10, 000 values from a Neg-B(µ = 10, k = 5) distribution. Produce a bar graph of the observed frequencies and superimpose the expected frequencies.
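A minimal sketch of the recursive search follows. It assumes the mean-dispersion p.m.f. of `rnbinom(size = k, mu = µ)`, under which p(0) = (k/(k + µ))^k and c(x) = ((x + k − 1)/x) · µ/(µ + k). The names `c_ratio` and `rnegb_recursive` are hypothetical helpers introduced for this illustration.

```r
set.seed(3)

# Ratio p(x) / p(x - 1) under the assumed Neg-B(mu, k) p.m.f.
c_ratio <- function(x, mu, k) (x + k - 1) / x * mu / (mu + k)

rnegb_recursive <- function(n, mu, k) {
  p0  <- (k / (k + mu))^k            # p(0)
  u   <- runif(n)
  out <- integer(n)
  for (i in seq_len(n)) {
    x <- 0L; p <- p0; cdf <- p0
    # Walk up the support until the running c.d.f. first reaches u[i];
    # the p > 0 guard stops the search if p underflows to zero.
    while (u[i] > cdf && p > 0) {
      x   <- x + 1L
      p   <- c_ratio(x, mu, k) * p
      cdf <- cdf + p
    }
    out[i] <- x
  }
  out
}

n <- 10000
x <- rnegb_recursive(n, mu = 10, k = 5)

vals <- 0:max(x)
obs  <- tabulate(x + 1L, nbins = length(vals))    # observed frequencies
expd <- n * dnbinom(vals, size = 5, mu = 10)      # expected frequencies

bp <- barplot(obs, names.arg = vals, ylim = c(0, max(obs, expd)),
              xlab = "x", ylab = "Frequency")
points(bp, expd, pch = 19)                        # superimpose expected
```

The cost of each draw grows with the number of steps the search takes from 0, which is roughly the size of the generated value.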
(c)
The problem with the inverse transformation technique above is that it becomes more computationally expensive as the value of µ increases. A very interesting (and useful) fact about the negative binomial distribution is that it can be written as a mixture of gamma and Poisson distributions. In other words, it is a compound distribution. Consider a Poisson model with a gamma-distributed mean:
X | λ ∼ Poisson(λ),
where λ ∼ gamma(shape = k, scale = µ/k). Then, marginally, X ∼ Neg-B(µ, k). Use this result to generate n = 10,000 values from a Neg-B(µ = 10, k = 5) distribution. Produce a bar graph of the observed frequencies and superimpose the expected frequencies. (You can use the functions rgamma and rpois.)
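The compound construction can be sketched in a few lines: draw one gamma-distributed Poisson mean per observation, then draw the count given that mean. The name `rnegb_mixture` is a hypothetical helper, and the expected frequencies use `dnbinom` with `size = k` and `mu = µ` as the assumed reference distribution.

```r
set.seed(4)

rnegb_mixture <- function(n, mu, k) {
  lambda <- rgamma(n, shape = k, scale = mu / k)  # one Poisson mean per draw
  rpois(n, lambda)                                # count given its own mean
}

n <- 10000
x <- rnegb_mixture(n, mu = 10, k = 5)

vals <- 0:max(x)
obs  <- tabulate(x + 1L, nbins = length(vals))    # observed frequencies
expd <- n * dnbinom(vals, size = 5, mu = 10)      # expected frequencies

bp <- barplot(obs, names.arg = vals, ylim = c(0, max(obs, expd)),
              xlab = "x", ylab = "Frequency")
points(bp, expd, pch = 19)                        # superimpose expected
```

Both `rgamma` and `rpois` are vectorized, so no explicit loop over observations is needed.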
(d)
Use the system.time function in R to compare the computational work of the algorithms in (b) and (c). Try different values of µ = 5, 10, 20, 30. You can keep k = 2. What do you notice as µ increases?
Display the results using a matrix, where µ = (5, 10, 20, 30) is in the first column, the times for the recursive algorithm from (b) are in the second column, and the times for the algorithm using the compound distribution from (c) are in the third column.
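One possible layout for the timing comparison is sketched below. Both samplers are redefined compactly so that the block runs on its own; their names, and the choice of elapsed time as the measure, are assumptions made for this illustration. The recursive sampler assumes the `rnbinom(size = k, mu = µ)` form of the p.m.f.

```r
set.seed(5)

# Recursive inverse-transform sampler (assumed mean-dispersion p.m.f.)
rnegb_recursive <- function(n, mu, k) {
  p0 <- (k / (k + mu))^k
  vapply(runif(n), function(u) {
    x <- 0L; p <- p0; cdf <- p0
    while (u > cdf && p > 0) {
      x   <- x + 1L
      p   <- (x + k - 1) / x * mu / (mu + k) * p
      cdf <- cdf + p
    }
    x
  }, integer(1))
}

# Gamma-Poisson compound sampler
rnegb_mixture <- function(n, mu, k) {
  rpois(n, rgamma(n, shape = k, scale = mu / k))
}

mus <- c(5, 10, 20, 30)
k   <- 2
n   <- 10000

# One row per value of mu: mu, time for (b), time for (c)
timing <- t(sapply(mus, function(m) {
  c(mu        = m,
    recursive = system.time(rnegb_recursive(n, m, k))[["elapsed"]],
    mixture   = system.time(rnegb_mixture(n, m, k))[["elapsed"]])
}))
timing
```

Elapsed times vary from run to run and between machines, so repeating each timing a few times and averaging gives a steadier comparison.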
3.
Let X be a count variable with mean µ and variance σ². Its index of dispersion is D = σ²/µ. If the count variables have a common Poisson distribution, then D = 1. However, if D > 1, the distribution is over-dispersed compared to the assumption of a common Poisson distribution. R.A. Fisher proposed a test for Poisson homogeneity, i.e. that D = 1, based on
which has an approximate N(0, 1) distribution for large n. We will call the test based on this statistic the “Katz Test”. The larger the value of K, the stronger the evidence against homogeneity in favour of overdispersion (i.e. it is a one-tailed test).
(a) Use a simulation to estimate the empirical size of the index of dispersion test and of the Katz test. Consider the following cases: µ = 1, 3, 5, 10, and n = 10, 20, 50, 100. Discuss your results.
(b) We will use a Neg-B(µ, k) distribution to estimate the power of these tests to identify overdispersion. Use a simulation to estimate the power of the index of dispersion test and of the Katz test. Consider the cases where we keep µ = 2 but use different values of the dispersion parameter, k = 1, 10, 100, 1000. You can use the function rnbinom to generate values from a negative binomial distribution, where k is the argument size and µ is the argument mu. Discuss your results.
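The simulation pattern for size and power can be sketched as below for the index of dispersion test only. Several things here are assumptions rather than part of the question: the statistic is taken to be the classical form (n − 1)S²/X̄, compared with the upper tail of a χ²(n − 1) distribution; the level is 0.05; each setting uses 2000 replicates; and the power runs use n = 50. The helper names are hypothetical. The Katz test is not shown because its statistic is not reproduced in the text; once K is defined, it can be passed through the same `reject_rate` pattern, rejecting when K exceeds `qnorm(1 - alpha)`.

```r
set.seed(6)

# Index of dispersion test (assumed form): reject H0: D = 1 in favour of
# D > 1 when (n - 1) * S^2 / xbar exceeds the chi-square(n - 1) quantile.
disp_test_reject <- function(x, alpha = 0.05) {
  n    <- length(x)
  xbar <- mean(x)
  if (xbar == 0) return(FALSE)   # all-zero sample: treat as no rejection
  (n - 1) * var(x) / xbar > qchisq(1 - alpha, df = n - 1)
}

# Proportion of simulated samples of size n in which the test rejects
reject_rate <- function(rgen, n, nsim = 2000, alpha = 0.05) {
  mean(replicate(nsim, disp_test_reject(rgen(n), alpha)))
}

# (a) Empirical size: data generated under the Poisson null
mus <- c(1, 3, 5, 10)
ns  <- c(10, 20, 50, 100)
size <- sapply(ns, function(n) {
  sapply(mus, function(m) reject_rate(function(nn) rpois(nn, m), n))
})
dimnames(size) <- list(mu = mus, n = ns)
size

# (b) Empirical power: negative binomial data with mu = 2 (n = 50 assumed)
ks <- c(1, 10, 100, 1000)
power <- sapply(ks, function(k) {
  reject_rate(function(nn) rnbinom(nn, size = k, mu = 2), n = 50)
})
names(power) <- ks
power
```

As k grows, the negative binomial variance µ + µ²/k approaches µ, so the data look increasingly Poisson and the rejection rate should fall toward the nominal level.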