
Algorithm competition website (Kaggle) project

2020-12-28 13:42, Monday

Project Grading Rubric

2018-11-26

Data source and description

The data come from Kaggle, an algorithm competition website: https://www.kaggle.com/loganalive/echocardiogram-uci . The dataset consists of 132 patients and 12 variables describing each patient's heart attack and condition. I selected 8 important variables for this analysis, described as follows:

  1. survival: the number of months the patient survived after the heart attack.
  2. still-alive: is the patient still alive? 0 means dead, 1 means still alive.
  3. age: age in years when the heart attack occurred.
  4. pericardial-effusion: pericardial effusion is fluid around the heart. 0 = no fluid, 1 = fluid.
  5. fractionalshortening: a measure of contractility around the heart; lower numbers are increasingly abnormal.
  6. epss: E-point septal separation, another measure of contractility. Larger numbers are increasingly abnormal.
  7. lvdd: left ventricular end-diastolic dimension. Large hearts tend to be sick hearts.
  8. wall-motion-index: a measure of how the segments of the left ventricle are moving.

Question

All the patients had experienced heart attacks in the past. Some are still alive; others died within a short time. I want to know which variables have an important effect on this difference in outcome. Pericardial effusion is a response to damage to the parietal pericardium and can cause other complications. The specific question is whether pericardial effusion significantly affects the time to death in patients with heart disease.

# Load the required packages and data
library(ggplot2)
library(survival)
library("survminer")

echocardiogram <- read.csv(file = "echocardiogram.csv")

# Data cleaning and variable selection: keep columns 1-7 and 9,
# then drop every row that has a missing value
echocardiogram <- na.omit(echocardiogram[, c(1:7, 9)])
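Because `na.omit()` silently drops every row with a missing value, it can be worth checking how many rows are lost (the source data has 132 patients, while the Cox model summary later reports n = 107). The following is a minimal sketch on a made-up toy data frame; the names `toy`, `na_per_column`, `cleaned` and `rows_dropped` are illustrative and not part of the original analysis.

```r
# Toy data frame standing in for echocardiogram.csv (values are made up).
toy <- data.frame(
  survival = c(11, 19, 16, NA, 57),
  alive    = c(0, 0, 0, 1, 1),
  age      = c(71, 72, NA, 60, 57)
)

# Count missing values per column before dropping anything.
na_per_column <- colSums(is.na(toy))

# na.omit() removes every row that contains at least one NA.
cleaned <- na.omit(toy)
rows_dropped <- nrow(toy) - nrow(cleaned)

print(na_per_column)
print(rows_dropped)
```

With the real data, the same two calls applied to `echocardiogram[, c(1:7, 9)]` would show which variables account for the dropped rows.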

Variable analysis

As the figure below shows, there is no obvious difference in survival time between patients with pericardial effusion and those without. The ages of patients with pericardial effusion are more concentrated, between 50 and 70 years.

ggplot(echocardiogram, aes(x = age, y = survival, color = factor(pericardialeffusion),
                           size = age, shape = factor(pericardialeffusion))) +
  geom_point(alpha = 0.9) +
  scale_color_manual(values = c('green', 'red'))

The following figure shows that, in the surviving group, the median survival time in months of patients with pericardial effusion is lower than that of patients without. In the death group, the medians show no clear difference.

ggplot(data=echocardiogram, aes(x=factor(alive),y=survival))+geom_boxplot(aes(fill=factor(pericard

The survival curve of patients with heart disease was drawn considering only whether pericardial effusion was present. As the plot shows, the survival rate of heart patients with pericardial effusion falls below that of heart patients without pericardial effusion over time.

fit1 <- survfit(Surv(survival,alive)~factor(pericardialeffusion),data=echocardiogram)

plot(fit1,mark=19,lwd=2,col=c(4,2), ylab="Survival Proportion", xlab="Months")

legend(10,0.4,c("no fluid","fluid"),fill=c(4,2))
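One detail of the calls above is worth checking. `Surv(time, status)` in the survival package treats a 0/1 status of 1 as "event observed", whereas the variable description codes `alive` as 0 = dead and 1 = still alive. If death is the event of interest, a recoded indicator may be needed. The sketch below shows one possible recoding on made-up toy data; the column `dead` and the object names are hypothetical, and the results reported in this article were produced with the original coding and are not recomputed here.

```r
library(survival)

# Toy data (made up): alive uses the article's coding, 0 = dead, 1 = still alive.
toy <- data.frame(
  survival = c(5, 12, 20, 26, 31, 40),
  alive    = c(0, 1, 0, 0, 1, 1)
)

# Surv() reads status 1 as "event observed", so build a death indicator.
toy$dead <- 1 - toy$alive

# Kaplan-Meier fit with death as the event; patients still alive are censored.
fit_death <- survfit(Surv(survival, dead) ~ 1, data = toy)
n_deaths <- sum(fit_death$n.event)
print(n_deaths)
```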

Cox PH regression

We now describe how these factors jointly influence survival by fitting a multivariate Cox regression. In the multivariate analysis, age and wall-motion index were significantly associated with survival (p < 0.05). Pericardial effusion was not significant (p = 0.20, which is larger than 0.05). Similarly, fractional shortening, epss and lvdd have p values greater than 0.05, indicating that they contribute less to the difference in survival time.
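The model object `fit2` is used in later code, but its definition does not appear in the text. The sketch below fits a model with the same formula that `step()` prints as its starting point, and extracts hazard ratios with 95% confidence intervals. It runs on simulated stand-in data so that it is self-contained; `sim` and `hr_table` are illustrative names, the simulated values are made up, and with the real data one would pass `data = echocardiogram` instead.

```r
library(survival)

set.seed(1)
n <- 80
# Simulated stand-in for the cleaned echocardiogram data (values are made up).
sim <- data.frame(
  survival             = rexp(n, rate = 1 / 24),
  alive                = rbinom(n, 1, 0.4),
  age                  = rnorm(n, 62, 8),
  pericardialeffusion  = rbinom(n, 1, 0.2),
  fractionalshortening = runif(n, 0.05, 0.45),
  epss                 = runif(n, 0, 30),
  lvdd                 = rnorm(n, 4.8, 0.8),
  wallmotion.index     = runif(n, 1, 2.5)
)

# Full multivariate Cox model with all six covariates.
fit2 <- coxph(
  Surv(survival, alive) ~ age + pericardialeffusion + fractionalshortening +
    epss + lvdd + wallmotion.index,
  data = sim
)

# Hazard ratios exp(coef) with 95% confidence limits on the same scale.
hr_table <- cbind(HR = exp(coef(fit2)), exp(confint(fit2)))
print(round(hr_table, 3))
```

The same hazard ratios and limits also appear in the `exp(coef)`, `lower .95` and `upper .95` columns of `summary(fit2)`.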

For age, the p value was 0.000384, the hazard ratio HR = exp(coef) = 1.08116, and the 95% confidence interval was (1.0355824, 1.129), indicating a strong relationship between patient age and the risk of death. For pericardial effusion, the p value was 0.204897, the hazard ratio HR = exp(coef) = 1.6994, and the 95% confidence interval was (0.7486333, 3.857). The point estimate suggests a higher risk of death with effusion, but the interval is wide and includes 1, so the effect is not statistically significant. The hazard ratio of a covariate can be interpreted as a multiplicative effect on the hazard. For example, with the other covariates held constant, fluid (pericardial effusion = 1) multiplies the hazard by 1.6994, that is, a 70 percent increase. The estimate therefore suggests that patients with fluid are more likely to die, although this data set alone does not establish it.
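The arithmetic behind these interpretations can be written out as follows. This is a worked sketch using only the hazard ratios quoted above; the 10-year figure for age is an added illustration that assumes the per-year effect compounds multiplicatively, as the Cox model implies.

```latex
\begin{align*}
h(t \mid x) &= h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p) \\
\mathrm{HR}_j &= \exp(\beta_j) \\
\text{effusion: } \mathrm{HR} &= 1.6994
  \;\Rightarrow\; \beta = \ln 1.6994 \approx 0.53,
  \quad (1.6994 - 1) \times 100\% \approx 70\% \\
\text{age: } \mathrm{HR} &= 1.08116 \text{ per year}
  \;\Rightarrow\; 1.08116^{10} \approx 2.18 \text{ per 10 years}
\end{align*}
```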


As the figure below shows, the survival curve of heart patients with pericardial effusion lies below that of heart patients without pericardial effusion, which suggests that patients with pericardial effusion are more likely to die. However, the confidence band of the effusion curve is very wide and covers the band of the non-effusion curve, so this result is not reliable.

df <- with(echocardiogram, data.frame(
  pericardialeffusion = c(0, 1),
  age = rep(mean(age), 2),
  fractionalshortening = rep(mean(fractionalshortening, na.rm = TRUE), 2),
  epss = rep(mean(epss, na.rm = TRUE), 2),
  lvdd = rep(mean(lvdd, na.rm = TRUE), 2),
  wallmotion.index = rep(mean(wallmotion.index, na.rm = TRUE), 2)))

curvesfit <- survfit(fit2, newdata = df)

ggsurvplot(curvesfit, data = df, conf.int = TRUE, legend.labs = c("no fluid", "fluid"),
           ggtheme = theme_minimal())

Model Fitting, using AIC to pick the right set of covariates

We use stepwise regression to select the model variables. After stepwise selection, three variables were retained: epss, wall-motion index and age. Compared with the full model, the p values of the likelihood ratio test, Wald test and score (logrank) test all fall by an order of magnitude. After selection, the p values of the three overall tests are far below 0.05, which indicates that the model as a whole is significant. The p values of all three variables are below 0.05, indicating that all three are significantly associated with survival time.

fit3<-step(fit2)

Start: AIC=261.56
Surv(survival, alive) ~ age + pericardialeffusion + fractionalshortening +
    epss + lvdd + wallmotion.index

                       Df    AIC
- lvdd                  1 259.81
- epss                  1 260.29
- pericardialeffusion   1 261.06
- fractionalshortening  1 261.37
<none>                    261.56
- wallmotion.index      1 265.36
- age                   1 271.71

Step: AIC=259.81
Surv(survival, alive) ~ age + pericardialeffusion + fractionalshortening +
    epss + wallmotion.index

                       Df    AIC
- pericardialeffusion   1 259.52
<none>                    259.81
- fractionalshortening  1 259.94
- epss                  1 259.98
- wallmotion.index      1 263.62
- age                   1 270.54

Step: AIC=259.52
Surv(survival, alive) ~ age + fractionalshortening + epss + wallmotion.index

                       Df    AIC
- fractionalshortening  1 258.96
<none>                    259.52
- epss                  1 260.27
- wallmotion.index      1 263.86
- age                   1 270.43

Step: AIC=258.96
Surv(survival, alive) ~ age + epss + wallmotion.index

                   Df    AIC
<none>                258.96
- epss              1 261.05
- wallmotion.index  1 264.97
- age               1 270.83
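The trace above can be read through the definition of AIC. The worked sketch below assumes that `step()` was run with its default penalty of 2 per parameter and that the log-likelihood is the Cox partial log-likelihood, written here as \ell; k is the number of covariates in the model.

```latex
\begin{align*}
\mathrm{AIC} &= -2\,\ell(\hat\beta) + 2k \\
\text{full model } (k = 6):\quad & -2\ell = 261.56 - 12 = 249.56 \\
\text{without lvdd } (k = 5):\quad & -2\ell = 259.81 - 10 = 249.81 \\
\text{final model } (k = 3):\quad & -2\ell = 258.96 - 6 = 252.96
\end{align*}
```

Dropping lvdd worsens the fit by only 0.25 on the -2\ell scale, which is less than the penalty of 2 saved by removing one parameter, so AIC falls by 1.75. Over all three steps, the fit worsens by 3.40 while the penalty falls by 6, giving the net AIC reduction of 2.60.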

summary(fit3)

Call:
coxph(formula = Surv(survival, alive) ~ age + epss + wallmotion.index,
    data = echocardiogram)

  n= 107, number of events= 32

                    coef exp(coef) se(coef)     z Pr(>|z|)
age              0.08139   1.08479  0.02154 3.778 0.000158 ***
epss             0.04566   1.04672  0.02165 2.109 0.034907 *
wallmotion.index 1.18291   3.26384  0.39005 3.033 0.002424 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                 exp(coef) exp(-coef) lower .95 upper .95
age                  1.085     0.9218     1.040     1.132
epss                 1.047     0.9554     1.003     1.092
wallmotion.index     3.264     0.3064     1.520     7.010

Concordance= 0.766  (se = 0.054 )
Rsquare= 0.242   (max possible= 0.929 )
Likelihood ratio test= 29.62  on 3 df,   p=1.657e-06
Wald test            = 28.7  on 3 df,   p=2.587e-06
Score (logrank) test = 32.16  on 3 df,   p=4.84e-07

Residual analysis

Cox regression assumes that the hazard ratio (HR) does not change over time, that is, it satisfies the proportional hazards (PH) assumption. Under this assumption, the Schoenfeld residuals should be independent of time; a trend in the residuals over time is evidence that the PH assumption is violated. In the results below, the p values of all three variables are greater than 0.05, so there is no evidence that any variable violates the PH assumption. The global p value is 0.678, so the model as a whole is also consistent with the PH assumption.

cox.zph(fit3)

                    rho chisq     p
age              -0.162 0.741 0.389
epss              0.148 0.560 0.454
wallmotion.index -0.154 0.731 0.392
GLOBAL               NA 1.518 0.678

The Cox model also assumes a linear relationship between each continuous covariate and the log hazard (equivalently, log(-log(S(t)))). Nonlinearity can be diagnosed by plotting Martingale residuals against each continuous covariate in the model. As the figure below shows, although the variables have some local nonlinear trends, the overall trend is linear.

ggcoxfunctional(fit3,data = echocardiogram)

The residuals are spread evenly around 0, indicating that the model assumptions are reasonably satisfied.

ggcoxdiagnostics(fit3, type = "deviance",
                 linear.predictions = FALSE, ggtheme = theme_bw())

ggcoxdiagnostics(fit3, type = "dfbeta",
                 linear.predictions = FALSE, ggtheme = theme_bw())

Extension

Heart disease patients are always concerned about how long they can survive, but it is difficult to predict a patient's survival time accurately. We can simplify the problem to judging whether a patient will survive for more than a year. From the survival and still-alive variables, I can derive a new binary variable, aliveat1: 0 means the patient died within one year or was followed up for less than one year, and 1 means the patient survived for more than one year. We can then predict whether a patient will survive for more than a year based on the available variables. Decision trees or random forests are my preferred models. However, one problem remains: the data set contains only 132 patients (107 after rows with missing values are removed), which is a serious limit on the accuracy of any such model.
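A minimal sketch of how aliveat1 might be derived is shown below, on made-up toy data. It assumes that survival is measured in months and that "more than one year" is taken as at least 12 months of recorded survival; the threshold and the object name `toy` are assumptions, not part of the original analysis.

```r
# Toy data (made up) with the article's coding: alive 0 = dead, 1 = still alive.
toy <- data.frame(
  survival = c(3, 10, 12, 26, 40, 0.5),
  alive    = c(0, 1, 0, 0, 1, 1)
)

# Assumption: a patient counts as alive at one year when at least 12 months of
# survival were recorded. Patients who died earlier, or who were followed for
# less than a year (whatever their status), get 0.
toy$aliveat1 <- as.integer(toy$survival >= 12)

print(table(toy$aliveat1))
```

Note that the 0 class mixes early deaths with patients whose follow-up was simply too short, so a classifier trained on this label would treat both as the same outcome.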
