
2022-08-15 12:26, Monday


STAT3017/7017 Final Project

Big Data Statistics – Final Project


Total of 100 Marks

This is a research-led project based on very recent research papers, so I encourage you to start as soon as possible, for two reasons: (1) to ensure you are not caught by a large amount of work at the end of the semester; (2) to spot any issues in the project (typos, difficulties, etc.) and inform me so that I can adjust the project accordingly. If you spot something that seems incorrect or unclear, please check the ‘Last updated’ date at the bottom of this page to ensure you have the latest version, then inform me of the typo, unclear question, etc. so I can fix it (if it has not already been fixed).

Sample correlation matrices

Question 1 [50 marks]

Let $x_1, \dots, x_n$ be a sequence of independent random vectors from a $p$-dimensional normal distribution $N_p(\mu, \Sigma)$ with mean vector $\mu$ and $p \times p$ covariance matrix $\Sigma = (\sigma_{ij})$. The corresponding (population) correlation matrix $R_n = (r_{ij})$ is defined by

$$r_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$$

for all $1 \le i \ne j \le p$. Given a random sample $x_1, \dots, x_n$, we construct the $n \times p$ data matrix $X = (x_{ij}) = (x_1, \dots, x_n)'$; then the Pearson correlation coefficient between $(x_{1i}, \dots, x_{ni})'$ and $(x_{1j}, \dots, x_{nj})'$ is given by

$$\hat r_{ij} = \frac{\sum_{k=1}^{n} (x_{ki} - \bar x_i)(x_{kj} - \bar x_j)}{\sqrt{\sum_{k=1}^{n} (x_{ki} - \bar x_i)^2 \, \sum_{k=1}^{n} (x_{kj} - \bar x_j)^2}}$$

where $\bar x_i = \frac{1}{n} \sum_{k=1}^{n} x_{ki}$ and $\bar x_j = \frac{1}{n} \sum_{k=1}^{n} x_{kj}$. The sample correlation matrix is defined as $\hat R_n = (\hat r_{ij})_{p \times p}$.

From a data analysis point of view, the advantage of working with sample correlation matrices (instead of sample covariance matrices) is that they are invariant under scaling and shifting. Over the last 15 years a number of interesting results have been obtained about sample correlation matrices $\hat R_n$ in the high-dimensional regime where $p, n \to \infty$ such that $p/n \to y < 1$. These results are mostly in the case where the (population) correlation matrix $R_n = I$.
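The invariance claim can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming the Pearson definition above; the function name `sample_correlation` and the shift and scale constants are illustrative choices, not part of the project description.

```python
import numpy as np


def sample_correlation(X):
    """Pearson sample correlation matrix of the columns of an n x p data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre each column
    norms = np.sqrt((Xc ** 2).sum(axis=0))    # Euclidean norm of each centred column
    Z = Xc / norms                            # columns now have unit length
    return Z.T @ Z                            # entry (i, j) is r_hat_ij


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 5
    X = rng.standard_normal((n, p))
    R_hat = sample_correlation(X)

    # Shift and rescale every column by arbitrary (illustrative) constants.
    scale = rng.uniform(0.5, 10.0, size=p)
    shift = 100.0 * rng.standard_normal(p)
    R_hat_transformed = sample_correlation(X * scale + shift)

    print("invariant under shift and positive scaling:",
          np.allclose(R_hat, R_hat_transformed))
```

The same matrix is returned by `np.corrcoef(X, rowvar=False)`, which is what the later sketches use for brevity.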

(a) Show that (in the case $R_n = I$) the limiting spectral distribution of the eigenvalues of $\hat R_n$ is the Marchenko-Pastur law. How do $p$ and $n$ relate to the parameters of this distribution?
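One way to approach (a) by simulation is sketched below. It assumes the standard Marchenko-Pastur density with ratio parameter $y = p/n$ and unit scale; the helper names (`mp_support`, `mp_density`, `correlation_eigenvalues`) and the particular values of $n$ and $p$ are illustrative only.

```python
import numpy as np


def mp_support(y):
    """Endpoints of the Marchenko-Pastur support for ratio y in (0, 1], unit scale."""
    return (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2


def mp_density(x, y):
    """Marchenko-Pastur density with ratio y in (0, 1] and unit scale."""
    a, b = mp_support(y)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2 * np.pi * y * xi)
    return dens


def correlation_eigenvalues(n, p, rng):
    """Eigenvalues of the sample correlation matrix of n i.i.d. N_p(0, I) vectors."""
    X = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))


if __name__ == "__main__":
    rng = np.random.default_rng(2022)
    n = 1000
    for p in (100, 300, 600):                 # three illustrative ratios y_n = p/n
        y = p / n
        eig = correlation_eigenvalues(n, p, rng)
        a, b = mp_support(y)
        hist, edges = np.histogram(eig, bins=30, range=(a, b), density=True)
        mids = 0.5 * (edges[:-1] + edges[1:])
        gap = np.max(np.abs(hist - mp_density(mids, y)))
        print(f"y_n = {y:.1f}: eigenvalues in [{eig.min():.3f}, {eig.max():.3f}], "
              f"MP support [{a:.3f}, {b:.3f}], max histogram gap {gap:.3f}")
```

Overlaying the histogram and `mp_density` in a plot is the natural next step; the printed gap is only a rough numerical summary.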


(b) Show that (in the case $R_n = I$) the largest entry of $\hat R_n$ given by

satisfies

and the limiting cumulative distribution function is


as n → ∞. What is the parameter K equal to? See [I].
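A simulation for (b) needs replicated draws of the largest entry. The sketch below assumes that "largest entry" means the largest absolute off-diagonal entry of $\hat R_n$; that reading, the function names, and the values of $n$, $p$ and the number of replications are assumptions. The centring and scaling constants are deliberately not hard-coded and should be taken from [I].

```python
import numpy as np


def largest_offdiag_abs(R):
    """Largest absolute off-diagonal entry of a symmetric matrix R."""
    R = np.asarray(R, dtype=float)
    iu = np.triu_indices(R.shape[0], k=1)     # strictly upper-triangular indices
    return float(np.max(np.abs(R[iu])))


def simulate_largest_entry(n, p, reps, rng):
    """Replicated largest off-diagonal entries of R_hat under R_n = I."""
    out = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        out[r] = largest_offdiag_abs(np.corrcoef(X, rowvar=False))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n, p, reps = 200, 100, 200                # illustrative sizes only
    L = simulate_largest_entry(n, p, reps, rng)
    stat = n * L ** 2                         # uncentred; apply the centring from [I]
    print("quartiles of n * L_n^2:", np.quantile(stat, [0.25, 0.5, 0.75]).round(3))
```

The empirical distribution function of the centred statistic can then be compared with the limiting law, for several values of $p/n$.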

(c) Show that (in the case $R_n = I$) the largest eigenvalue of $\hat R_n$ satisfies the Tracy-Widom law; see [G].
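For (c), a rough numerical check is to standardise the largest eigenvalue and compare its empirical moments with those of the Tracy-Widom law of type 1, whose mean is about -1.21 and standard deviation about 1.27. The centring and scaling used below are the familiar ones for a white Wishart matrix, applied here with $n - 1$ degrees of freedom; this choice is an assumption made for illustration, and the exact normalisation should be checked against [G].

```python
import numpy as np


def standardised_largest_eigenvalue(n, p, rng):
    """Largest eigenvalue of R_hat under R_n = I, centred and scaled.

    Assumption: Wishart-type centring/scaling with m = n - 1 degrees of
    freedom; consult [G] for the normalisation proved for correlation matrices.
    """
    X = rng.standard_normal((n, p))
    lam_max = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[-1]
    m = n - 1
    root_sum = np.sqrt(m) + np.sqrt(p)
    mu = root_sum ** 2
    sigma = root_sum * (1 / np.sqrt(m) + 1 / np.sqrt(p)) ** (1 / 3)
    return (m * lam_max - mu) / sigma


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    for n, p in ((400, 40), (400, 120), (400, 200)):   # illustrative ratios
        vals = np.array([standardised_largest_eigenvalue(n, p, rng)
                         for _ in range(200)])
        print(f"p/n = {p / n:.1f}: mean {vals.mean():.2f}, sd {vals.std(ddof=1):.2f} "
              "(Tracy-Widom type 1: about -1.21 and 1.27)")
```

Agreement at finite $n$ is only approximate, so a histogram or Q-Q plot against simulated Tracy-Widom quantiles is more informative than the two moments alone.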

(d) Show that (in the case $R_n = I$) the quantity $\log |\hat R_n|$ satisfies a CLT; see [E] and [D]. Check what happens to the CLT when $R_n$ has an AR(1) structure.

(e) Show that the quantity $\log |\hat R_n|$ satisfies the CLT from [A] in the following cases:

(i) $R_n = I$; (ii) $R_n$ has a compound symmetry structure (all entries of $R_n$ equal a common value in $[0, 1/2)$, except the diagonal, which contains 1's), and do this using the result of Corollary 1 of [A]; (iii) $R_n$ has an AR(1) structure; (iv) $R_n$ is a banded correlation matrix.
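The structured cases in (d) and (e) can be set up along the following lines. This is a sketch under stated assumptions: the AR(1) matrix is taken to have entries $\rho^{|i-j|}$, the banded matrix is the simplest tridiagonal choice, and all function names and parameter values are hypothetical. The code only produces empirical draws of $\log |\hat R_n|$; the centring and scaling constants of the CLTs in [A], [D] and [E] are not reproduced here.

```python
import numpy as np


def ar1_corr(p, rho):
    """AR(1) correlation matrix with entries rho^|i-j| (assumed form)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def compound_symmetry_corr(p, rho):
    """Ones on the diagonal and a common value rho everywhere else."""
    return (1 - rho) * np.eye(p) + rho * np.ones((p, p))


def tridiagonal_corr(p, b):
    """A simple banded choice: ones on the diagonal, b on the first off-diagonals.

    Positive definite whenever |b| <= 1/2.
    """
    return np.eye(p) + b * (np.eye(p, k=1) + np.eye(p, k=-1))


def sample_logdet(R, n, reps, rng):
    """Replicated values of log|R_hat| for samples of size n from N_p(0, R)."""
    p = R.shape[0]
    chol = np.linalg.cholesky(R)
    out = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p)) @ chol.T     # rows are N_p(0, R)
        out[r] = np.linalg.slogdet(np.corrcoef(X, rowvar=False))[1]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    n = 200
    for p in (20, 60, 100):                          # three ratios y_n = p/n
        cases = {
            "identity": np.eye(p),
            "compound symmetry (0.3)": compound_symmetry_corr(p, 0.3),
            "AR(1) (0.5)": ar1_corr(p, 0.5),
            "tridiagonal (0.4)": tridiagonal_corr(p, 0.4),
        }
        for name, R in cases.items():
            draws = sample_logdet(R, n, 200, rng)
            print(f"p/n = {p / n:.1f}, {name}: "
                  f"mean {draws.mean():.2f}, sd {draws.std(ddof=1):.3f}")
```

Standardising the draws with the constants from the relevant theorem and comparing against a standard normal (histogram or Q-Q plot) completes the check.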

(f) In the context of hypothesis testing: (i) explain why it is good to understand the power of the test; (ii) give a counterexample of what could go wrong if you don’t consider the power; (iii) use your counterexample and a simulation study to show why the results of [A] are useful.

In the above questions (a)-(e): (i) assume that we are in the high-dimensional regime; (ii) ensure that you check the result for at least three different values of $y_n = p/n$; (iii) answers can be argued through appropriate simulation studies and plots.

The detection-of-correlations problem

Question 2 [50 marks]

Anomaly detection is extremely important in data science. We will now consider the detection-of-correlations problem, which is concerned with detecting unusual correlations in observations. Humans are often very good at this task. For example, given a single time series or image, we can usually spot some unusual correlations (see Figures 1 & 2 in [B]). However, getting an algorithm to achieve this can sometimes be quite hard.

(a) We start by setting up a first test case. Suppose we are observing a time series $X_1, X_2, \dots, X_n$.

Under the null hypothesis, the $X_i$'s are i.i.d. standard normal random variables. The alternative is that the time series contains an anomaly in the form of temporal correlations over an (unknown) interval $S = \{i+1, \dots, i+p\}$ of, say, known length $p < n$. Here, $i \in \{0, 1, \dots, n-p\}$ is thus unknown. We want to generate realisations of this time series where the anomalous region $S$ is such that $(X_{i+1}, \dots, X_{i+p}) \sim (Y_{i+1}, \dots, Y_{i+p})$, where $(Y_i : i \in \mathbb{Z})$ is an autoregressive process of order $h$ (AR($h$)) with zero mean and unit variance, that is,


where $(\varepsilon_i : i \in \mathbb{Z})$ are i.i.d. standard normal random variables and $\psi_1, \dots, \psi_h \in \mathbb{R}$ are the coefficients of the process. Write code that generates realisations of this time series with (and without) anomalies. See [B] Section 1.3 for further details and Figure 1. Generate and plot three examples of realisations: (i) without anomalies; (ii) a realisation where $n = 500$, $S = \{201, \dots, 250\}$, $h = 1$, and $|\psi_1| > 0$ but chosen so you can only faintly see the anomaly; (iii) a realisation where $n = 500$, $S = \{201, \dots, 250\}$, $h = 1$, and $|\psi_1| > 0$ but chosen so you can clearly see the anomaly. Clearly indicate your choices of $\psi_1$ in your plots for (ii) and (iii).
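A possible starting point for the generator in the case $h = 1$ is sketched below. It assumes the stationary unit-variance AR(1) recursion $Y_i = \psi_1 Y_{i-1} + \sqrt{1 - \psi_1^2}\,\varepsilon_i$, with the innovation scale chosen so that $\operatorname{Var}(Y_i) = 1$; that scaling, the function names, and the two values of $\psi_1$ are assumptions made for illustration, and general $h$ is not handled. Plotting is left out so the block depends on NumPy only.

```python
import numpy as np


def ar1_unit_variance(length, psi, rng):
    """Stationary AR(1) path with zero mean and unit variance (requires |psi| < 1)."""
    if not abs(psi) < 1:
        raise ValueError("need |psi| < 1 for a stationary AR(1) process")
    y = np.empty(length)
    y[0] = rng.standard_normal()              # start in the stationary law N(0, 1)
    innov_sd = np.sqrt(1 - psi ** 2)          # keeps Var(Y_i) = 1 (assumed scaling)
    for t in range(1, length):
        y[t] = psi * y[t - 1] + innov_sd * rng.standard_normal()
    return y


def series_with_anomaly(n, start, p, psi, rng):
    """White noise of length n with an AR(1) stretch on S = {start, ..., start+p-1}.

    `start` is 1-based, as in the text; psi = 0 gives the null case.
    """
    if start < 1 or start - 1 + p > n:
        raise ValueError("anomalous interval must lie inside {1, ..., n}")
    x = rng.standard_normal(n)
    if psi != 0:
        x[start - 1:start - 1 + p] = ar1_unit_variance(p, psi, rng)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # psi = 0.4 and psi = 0.9 are hypothetical "faint" and "clear" choices.
    for label, psi in (("no anomaly", 0.0), ("faint", 0.4), ("clear", 0.9)):
        x = series_with_anomaly(500, 201, 50, psi, rng)
        seg = x[200:250]
        lag1 = np.mean(seg[:-1] * seg[1:])
        print(f"{label:>10} (psi_1 = {psi}): lag-1 product mean inside S = {lag1:.2f}")
```

Whether a given $\psi_1$ looks faint or clear depends on the realisation, so it is worth plotting several seeds before settling on values.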

(b) Consider the previous question using a change point analysis approach (e.g., see §2, p. 3631 of [B] and possibly [H]).

Clearly write up how your chosen approach works (ideally 1/2 page, 1 page max.) [10 points], then implement the approach and comment on how well it works on the three cases generated in (a) [10 points].
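As one concrete illustration of what an implementation might look like, the sketch below scans all windows of known length $p$ with a lag-1 product statistic and calibrates a threshold by Monte Carlo under the null. This is a simple stand-in chosen for brevity, not necessarily the approach described in [B] or [H]; the names `lag1_scan` and `null_threshold`, the level 0.05, and the value of $\psi_1$ in the demo are all hypothetical.

```python
import numpy as np


def lag1_scan(x, p):
    """Scan windows of length p >= 2; return (1-based start, statistic) of the best one.

    The statistic for a window is |sum of its p - 1 lag-1 products| / sqrt(p - 1).
    """
    x = np.asarray(x, dtype=float)
    prod = x[:-1] * x[1:]                              # lag-1 products
    csum = np.concatenate(([0.0], np.cumsum(prod)))    # prefix sums
    window_sums = csum[p - 1:] - csum[:len(csum) - (p - 1)]
    stats = np.abs(window_sums) / np.sqrt(p - 1)
    best = int(np.argmax(stats))
    return best + 1, float(stats[best])


def null_threshold(n, p, alpha, reps, rng):
    """Monte Carlo (1 - alpha) quantile of the scan statistic under white noise."""
    stats = [lag1_scan(rng.standard_normal(n), p)[1] for _ in range(reps)]
    return float(np.quantile(stats, 1 - alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n, p, psi = 500, 50, 0.8                           # psi is a hypothetical choice
    x = rng.standard_normal(n)
    y = np.empty(p)
    y[0] = rng.standard_normal()
    for t in range(1, p):                              # unit-variance AR(1) stretch
        y[t] = psi * y[t - 1] + np.sqrt(1 - psi ** 2) * rng.standard_normal()
    x[200:250] = y                                     # S = {201, ..., 250}

    start, stat = lag1_scan(x, p)
    thr = null_threshold(n, p, alpha=0.05, reps=500, rng=rng)
    print(f"best window starts at {start}, statistic {stat:.2f}, "
          f"null 95% threshold {thr:.2f}, reject: {stat > thr}")
```

Because the statistic only looks at lag-1 products, it is blind to dependence that shows up at higher lags, which is one limitation worth discussing in the write-up.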

(c) Set up your second (image) test case in the form of Section 1.4 of [B]. Generate three example figures: (i) without an anomaly; (ii) a very faint anomaly; (iii) the case seen on the right in Figure 2 of [B]. Use the example dimensions given in Figure 2.

(d) Consider the approaches in [B] or [C], choose one and describe how it works (ideally 1/2 page max.) [5 points]. Implement this approach (identifying the bounding box of an anomalous region is sufficient) and check the performance on your three test cases generated in (c) [10 points]. Can you comment on the limitations of detection (e.g., how large/strong does the anomaly have to be)?

References

[A] Jiang (2019). Determinant of sample correlation matrix with application. Annals of Probability.

[B] Arias-Castro, Bubeck, Lugosi, and Verzelen (2018). Detecting Markov random fields hidden in white noise. Bernoulli.

[C] Arias-Castro, Bubeck, and Lugosi (2015). Detecting positive correlations in a multivariate sample. Bernoulli.

[D] Jiang and Qi (2015). Likelihood ratio tests for high-dimensional normal distributions. Scandinavian Journal of Statistics.

[E] Jiang and Yang (2013). Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions. Annals of Statistics.

[F] Jiang, Jiang, and Yang (2012). Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. Journal of Statistical Planning and Inference.

[G] Bao, Pan, and Zhou (2012). Tracy-Widom law for the extreme eigenvalues of sample correlation matrices. Electronic Journal of Probability.

[H] Bodnar, Bodnar, and Okhrin (2009). Surveillance of the covariance matrix based on the properties of the singular Wishart distribution. Computational Statistics and Data Analysis.

[I] Jiang (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Annals of Applied Probability.
