Total of 100 Marks

This is a research-led project that works from very recent research papers, so I encourage you to start as soon as possible, for two reasons: (1) to ensure you are not caught by a large amount of work at the end of the semester; (2) to spot any issues in the project and inform me (typos, difficulties, etc.) so that I can adjust the project accordingly. If you spot something that seems incorrect or unclear, please check the ‘Last updated’ date at the bottom of this page to ensure you have the latest version, then inform me of the typo, unclear question, etc. so I can fix it (if it has not already been fixed).

Sample correlation matrices

Question 1 [50 marks]

Let $x_1, \dots, x_n$ be a sequence of independent random vectors from a $p$-dimensional normal distribution $N_p(\mu, \Sigma)$ with mean vector $\mu$ and $p \times p$ covariance matrix $\Sigma = (\sigma_{ij})$. The corresponding (population) correlation matrix $R_n = (r_{ij})$ is defined by

$$r_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$$

for all $1 \le i \ne j \le p$. Given a random sample $x_1, \dots, x_n$, we construct the $n \times p$ data matrix $X = (x_{ij}) = (x_1, \dots, x_n)'$. The Pearson correlation coefficient between $(x_{1i}, \dots, x_{ni})'$ and $(x_{1j}, \dots, x_{nj})'$ is given by

$$\hat r_{ij} = \frac{\sum_{k=1}^{n} (x_{ki} - \bar x_i)(x_{kj} - \bar x_j)}{\sqrt{\sum_{k=1}^{n} (x_{ki} - \bar x_i)^2 \, \sum_{k=1}^{n} (x_{kj} - \bar x_j)^2}},$$

where $\bar x_i = \frac{1}{n} \sum_{k=1}^{n} x_{ki}$. The sample correlation matrix is defined as $\hat R_n := (\hat r_{ij})$. From a data analysis point of view, the advantage of working with sample correlation matrices (instead of sample covariance matrices) is that they are invariant under scaling and shifting. Over the last 15 years a number of interesting results have been obtained about sample correlation matrices $\hat R_n$ in the high-dimensional regime where $p, n \to \infty$ such that $p/n \to y < \infty$. These results are mostly in the case where the (population) correlation matrix $R_n = I$.
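The invariance under scaling and shifting can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `sample_correlation`, the seed, and the sizes `n` and `p` are illustrative choices and not part of the project specification.

```python
import numpy as np

def sample_correlation(X):
    """Sample correlation matrix of an n x p data matrix (rows are observations)."""
    Xc = X - X.mean(axis=0)        # centre each column
    S = Xc.T @ Xc                  # p x p matrix of centred cross-products
    d = np.sqrt(np.diag(S))        # norm of each centred column
    return S / np.outer(d, d)      # entry (i, j) is the Pearson coefficient

rng = np.random.default_rng(0)     # arbitrary seed, for reproducibility only
n, p = 200, 5                      # illustrative sizes
X = rng.standard_normal((n, p))
R_hat = sample_correlation(X)

# Rescale each column by a positive factor and shift it by a constant.
scales = rng.uniform(0.5, 10.0, size=p)
shifts = 100.0 * rng.standard_normal(p)
R_hat_transformed = sample_correlation(X * scales + shifts)

# Both differences should be at the level of floating-point rounding.
max_diff_numpy = np.max(np.abs(R_hat - np.corrcoef(X, rowvar=False)))
max_diff_transformed = np.max(np.abs(R_hat - R_hat_transformed))
print(max_diff_numpy, max_diff_transformed)
```

The comparison with `np.corrcoef(X, rowvar=False)` is only a sanity check that the hand-written formula matches NumPy's built-in estimator.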

(a) Show that (in the case $R_n = I$) the limiting spectral distribution of the eigenvalues of $\hat R_n$ is Marchenko-Pastur. How do $p$ and $n$ relate to the parameters of this distribution? [5]

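(b) Show that (in the case $R_n = I$) the largest entry of $\hat R_n$, given by

$$L_n = \max_{1 \le i < j \le p} |\hat r_{ij}|,$$

satisfies

$$\sqrt{\frac{n}{\log n}}\, L_n \to 2 \quad \text{almost surely},$$

and the limiting cumulative distribution function is

$$P\left(n L_n^2 - 4 \log n + \log(\log n) \le y\right) \to e^{-K e^{-y/2}}$$

as $n \to \infty$. What is the parameter $K$ equal to? See [I]. [5]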

(c) Show that (in the case $R_n = I$) the largest eigenvalue of $\hat R_n$ satisfies the Tracy-Widom law; see [G].

(d) Show that (in the case $R_n = I$) the quantity $\log |\hat R_n|$ satisfies a CLT; see [E] and [D]. Check what happens to the CLT when $R_n$ has an $\mathrm{AR}_1$ structure.

(e) Show that the quantity $\log |\hat R_n|$ satisfies the CLT from [A] in the following cases: (i) $R_n = I$; (ii) $R_n$ has a compound symmetry structure (all entries of $R_n$ are equal to $\alpha \in [0, 1/2)$ except the diagonal, which contains 1's); do this using the result of Corollary 1 of [A]; (iii) $R_n$ has an $\mathrm{AR}_1$ structure; (iv) $R_n$ is a banded correlation matrix.

(f) In the context of hypothesis testing: (i) explain why it is good to understand the power of the test; (ii) give a counterexample of what could go wrong if you don't consider the power; (iii) use your counterexample and a simulation study to show why the results of [A] are useful.

In the above questions (a)-(e): (i) assume that we are in the high-dimensional regime; (ii) ensure that you check the result for at least three different values of $y_n = p/n$; (iii) answers can be argued through appropriate simulation studies and plots.
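A simulation study of the kind described above might start from a sketch like the following, which draws data with $R_n = I$ at three values of $p/n$ and compares the extreme eigenvalues of the sample correlation matrix with the edges of the standard Marchenko-Pastur support. The edge formula $(1 \pm \sqrt{y})^2$ (unit variance, ratio $y \le 1$), the function names, the sample size `n = 1000`, and the three ratios are assumptions made for illustration; a full answer would also compare histograms with the density.

```python
import numpy as np

def correlation_eigenvalues(n, p, rng):
    """Eigenvalues of the sample correlation matrix of n i.i.d. N_p(0, I) rows."""
    X = rng.standard_normal((n, p))
    R_hat = np.corrcoef(X, rowvar=False)   # p x p sample correlation matrix
    return np.linalg.eigvalsh(R_hat)       # real eigenvalues, ascending order

def mp_edges(y):
    """Support edges of the Marchenko-Pastur law with unit variance and ratio y <= 1."""
    return (1.0 - np.sqrt(y)) ** 2, (1.0 + np.sqrt(y)) ** 2

rng = np.random.default_rng(1)             # arbitrary seed
n = 1000                                   # illustrative sample size
results = {}
for y in (0.1, 0.5, 0.9):                  # three illustrative values of p/n
    p = int(round(y * n))
    eig = correlation_eigenvalues(n, p, rng)
    lower, upper = mp_edges(p / n)
    results[y] = eig
    print(f"p/n={p / n:.1f}: eigenvalue range [{eig.min():.3f}, {eig.max():.3f}], "
          f"MP edges [{lower:.3f}, {upper:.3f}]")
```

The same loop structure can be reused for the other parts by swapping the eigenvalue summary for the statistic of interest, for example `np.linalg.slogdet(R_hat)[1]` for $\log |\hat R_n|$.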

The detection-of-correlations problem

Question 2 [50 marks]

Anomaly detection is extremely important in data science. We will now consider the detection-of-correlations problem, which is concerned with detecting unusual correlations in observations. Humans are often very good at this task. For example, given a single time series or image, we can usually spot some unusual correlations (see Figures 1 & 2 in [B]). However, getting an algorithm to achieve this can sometimes be quite hard.

(a) We start by setting up a first test case. Suppose we are observing a time series $X_1, X_2, \dots, X_n$. Under the null hypothesis, the $X_i$'s are i.i.d. standard normal random variables. The alternative is that the time series contains an anomaly in the form of temporal correlations over an (unknown) interval $S = \{i+1, \dots, i+p\}$ of, say, known length $p < n$. Here, $i \in \{0, 1, \dots, n-p\}$ is thus unknown. We want to generate realisations of this time series where the anomalous region $S$ is such that $(X_{i+1}, \dots, X_{i+p}) \sim (Y_{i+1}, \dots, Y_{i+p})$, where $(Y_i : i \in \mathbb{Z})$ is an autoregressive process of order $h$ ($\mathrm{AR}_h$) with zero mean and unit variance, that is,

$$Y_i = \psi_1 Y_{i-1} + \cdots + \psi_h Y_{i-h} + \sigma \varepsilon_i, \quad i \in \mathbb{Z},$$

where $(\varepsilon_i : i \in \mathbb{Z})$ are i.i.d. standard normal random variables, $\psi_1, \dots, \psi_h \in \mathbb{R}$ are the coefficients of the process, and $\sigma > 0$ is chosen so that $Y_i$ has unit variance. Write code that generates realisations of this time series with (and without) anomalies. See [B], Section 1.3, for further details and Figure 1. Generate and plot three examples of realisations: (i) without anomalies; (ii) a realisation where $n = 500$, $S = \{201, \dots, 250\}$, $h = 1$, and $|\psi_1| > 0$ but chosen so you can only faintly see the anomaly; (iii) a realisation where $n = 500$, $S = \{201, \dots, 250\}$, $h = 1$, and $|\psi_1| > 0$ but chosen so you can clearly see the anomaly. Clearly indicate your choices of $\psi_1$ in your plots for (ii) and (iii). [5]

(b) Consider the previous question using a change point analysis approach (e.g., see §2, p. 3631 of [B] and possibly [H]). Clearly write up how your chosen approach works (ideally 1/2 page, 1 page max.) [10 points], then implement the approach and comment on how well it works on the three cases generated in (a) [10 points].

(c) Set up your second (image) test case in the form of Section 1.4 of [B]. Generate three example figures: (i) without an anomaly; (ii) a very faint anomaly; (iii) the case seen on the right in Figure 2 of [B]. Use the example dimensions given in Figure 2.

(d) Consider the approaches in [B] or [C], choose one and describe how it works (ideally 1/2 page max.) [5 points]; implement this approach (identifying the bounding box of an anomalous region is sufficient) and check the performance on your three test cases generated in (c) [10 points]; can you comment on the limitations of detection (e.g., how large/strong does the anomaly have to be)? [5 points] [20]
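For the time-series test case in part (a) with $h = 1$, one possible starting point is sketched below. It assumes the anomalous segment is a stationary first-order autoregression whose innovation scale is $\sqrt{1 - \psi_1^2}$, which gives unit variance when $|\psi_1| < 1$. The function name `simulate_series`, its parameters, the seeds, and the coefficient value 0.9 are hypothetical choices for illustration; plotting is left out.

```python
import numpy as np

def simulate_series(n, rng, start=None, length=0, psi=0.0):
    """White noise of length n, optionally with a first-order autoregressive
    segment occupying positions start+1, ..., start+length (1-based indexing)."""
    if abs(psi) >= 1.0:
        raise ValueError("need |psi| < 1 for a stationary segment")
    x = rng.standard_normal(n)              # null model: i.i.d. N(0, 1)
    if start is not None and length > 0:
        sigma = np.sqrt(1.0 - psi ** 2)     # innovation scale giving unit variance
        y = np.empty(length)
        y[0] = rng.standard_normal()        # first value from the stationary N(0, 1) law
        for t in range(1, length):
            y[t] = psi * y[t - 1] + sigma * rng.standard_normal()
        x[start:start + length] = y         # overwrite the anomalous interval S
    return x

# n = 500 and S = {201, ..., 250} correspond to start=200, length=50.
x_null = simulate_series(500, np.random.default_rng(2))
x_anom = simulate_series(500, np.random.default_rng(2), start=200, length=50, psi=0.9)
```

Because both calls use the same seed and the white noise is drawn first, the two series agree outside $S$, which makes side-by-side plots easier to compare.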

References

[A] Jiang (2019). Determinant of sample correlation matrix with application. Annals of Probability.

[B] Arias-Castro, Bubeck, Lugosi, and Verzelen (2018). Detecting Markov random fields hidden in white noise. Bernoulli.

[C] Arias-Castro, Bubeck, and Lugosi (2015). Detecting positive correlations in a multivariate sample. Bernoulli.

[D] Jiang and Qi (2015). Likelihood ratio tests for high-dimensional normal distributions. Scandinavian Journal of Statistics.

[E] Jiang and Yang (2013). Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions. Annals of Statistics.

[F] Jiang, Jiang, and Yang (2012). Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. Journal of Statistical Planning and Inference.

[G] Bao, Pan, and Zhou (2012). Tracy-Widom law for the extreme eigenvalues of sample correlation matrices. Electronic Journal of Probability.

[H] Bodnar, Bodnar, and Okhrin (2009). Surveillance of the covariance matrix based on the properties of the singular Wishart distribution. Computational Statistics and Data Analysis.

[I] Jiang (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Annals of Applied Probability.
