R Programming Assignment Help: R Final Report

2021-06-18 17:28, Friday

Final Report:

(Due May 8 but possibly extended to May 10-12)

SALES OF ORTHOPEDIC EQUIPMENT

The objective of this study is to find ways to increase sales of orthopedic products from our company to all hospitals in the United States. Find the hospitals that have high consumption of such equipment but where our sales are zero. Come up with a small selected group of hospitals (5, 10, or 15) where you think our efforts will be rewarded. Estimate the potential or expected sales for those hospitals.

The following description of the dataset includes the variable names and some summaries of the variables.

At the bottom of the file is also some additional R code.

Dataset: hospitalUSA.csv
VARIABLES:

     ZIP :  US POSTAL CODE
     HID :  HOSPITAL ID
    CITY :  CITY NAME
   STATE :  STATE NAME
    BEDS :  NUMBER OF HOSPITAL BEDS
   RBEDS :  NUMBER OF REHAB BEDS
   OUT-V :  NUMBER OF OUTPATIENT VISITS
     ADM :  ADMINISTRATIVE COST (in $1000's per year)
     SIR :  REVENUE FROM INPATIENT
   SALES :  SALES OF EQUIP in $1000's per year
     HIP :  NUMBER OF HIP OPERATIONS 
    KNEE :  NUMBER OF KNEE OPERATIONS 
      TH :  TEACHING HOSPITAL?  0, 1
  TRAUMA :  DO THEY HAVE A TRAUMA UNIT?  0, 1
   REHAB :  DO THEY HAVE A REHAB UNIT?  0, 1
    HIP2 :  NUMBER HIP OPERATIONS Year 2
   KNEE2 :  NUMBER KNEE OPERATIONS Year 2
  FEMUR2 :  NUMBER FEMUR OPERATIONS Year 2

Overview of the Analysis

Part 1. Select your market segment(s).

1. Dataset:

hospitalUSA.csv. Call set.seed(????) with the last 5 digits of your Rutgers ID, for example set.seed(23456).

Every student has his or her own data (it is enough to select about 3000-3500 hospitals at random). Set the zero values of SALES to missing values.

Separate the variables into the following groups:

Response:             SALES, SALES=0 => SALES=NA

Demographics:      BEDS, RBEDS, OUTV, ADM, SIR, TH, TRAUMA, REHAB

Operation numbers:  HIP, KNEE, HIP2, KNEE2, FEMUR2
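The data preparation above can be sketched as follows. This is a minimal illustration, not the required solution: the data frame is simulated as a stand-in for hospitalUSA.csv, the sample size of 3200 is an arbitrary choice within the stated range, and the column name OUTV follows the variable grouping above (the variable list writes it as OUT-V).

```r
# Hypothetical stand-in for read.csv("hospitalUSA.csv"): all values are simulated.
set.seed(23456)  # replace with the last 5 digits of your own ID

demographics <- c("BEDS", "RBEDS", "OUTV", "ADM", "SIR", "TH", "TRAUMA", "REHAB")
operations   <- c("HIP", "KNEE", "HIP2", "KNEE2", "FEMUR2")

n_all <- 5000
hosp <- data.frame(HID = seq_len(n_all))
for (v in c(demographics, operations)) hosp[[v]] <- rpois(n_all, lambda = 50)
# Roughly 40% of the simulated hospitals have zero sales.
hosp$SALES <- ifelse(runif(n_all) < 0.4, 0, round(rexp(n_all, rate = 1 / 50)))

# Draw a random subset of hospitals (3200 is an arbitrary choice in 3000-3500).
dat <- hosp[sample(nrow(hosp), 3200), ]

# Zero sales are treated as missing: these hospitals are not yet customers.
dat$SALES[dat$SALES == 0] <- NA

# Keep the variable groups as separate objects for the later steps.
response_var <- dat$SALES
demo_df <- dat[, demographics]
oper_df <- dat[, operations]
```

With the real file, the simulated block would be replaced by a single read.csv call, and the rest should carry over unchanged as long as the column names match.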

2. Transformations:

Look at each individual variable and decide whether a transformation is appropriate and, if so, which one. Candidate transformations include log(1+c*x), where the constant c changes from variable to variable (0.1, 0.01, 0.001, …), the sqrt transformation, or any other.

Typical transformations should be of the type below but not exactly, so you need to try several possibilities for each variable until the histogram looks acceptable.

HIP = sqrt(HIP)   or   SALES = log(1+0.1*SALES)
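One way to compare candidate transformations side by side is sketched below. The variable x is simulated, and the helper names log1c and skew are hypothetical. Using sample skewness as a numeric hint is an assumption added here; the handout asks only for a visual check of the histogram.

```r
set.seed(1)
x <- rlnorm(2000, meanlog = 4, sdlog = 1)  # simulated right-skewed variable

# Hypothetical helpers: the log(1 + c*x) family and a simple skewness measure.
log1c <- function(v, c) log(1 + c * v)
skew  <- function(v) {
  v <- v[!is.na(v)]
  mean((v - mean(v))^3) / sd(v)^3
}

# Candidate transformations to compare.
cands <- list(
  raw        = x,
  sqrt       = sqrt(x),
  log_c0.1   = log1c(x, 0.1),
  log_c0.01  = log1c(x, 0.01),
  log_c0.001 = log1c(x, 0.001)
)

# Skewness near 0 suggests a more symmetric histogram.
skews <- sapply(cands, skew)
print(round(skews, 2))

# Draw the histograms when running interactively.
if (interactive()) {
  op <- par(mfrow = c(2, 3))
  for (nm in names(cands)) hist(cands[[nm]], main = nm, xlab = "")
  par(op)
}
```

The final choice should still be made by looking at the histograms, since a single number can hide features such as a spike at zero.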


3. Dimension reduction.

Use the factor method to summarize the demographic variables and the operation variables and come up with a final reduced list of factor variables (perhaps 3 or 4). Use the rotated factors to find a good interpretation of the factors and try to tell a good story.

library(psych)

## Scree plot
barplot(fa(rr1, nfactors = 15)$Vaccounted[5, ], col = 7)
abline(h = 0.75, lwd = 3, col = 2)

### Run factor analysis using correlation matrix
### Use Varimax rotation
fit <- fa(rr1, nfactors = 6, rotate = "varimax")

## After checking output assign variables to factors
fn <- apply(fit$loadings, 1, function(x) which.max(abs(x)))
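For readers who want to see the variable-to-factor assignment on a small, fully reproducible case, here is a sketch that swaps in base R's factanal for psych's fa so that it runs without extra packages. The data are simulated from two latent factors. The column names are borrowed from the dataset for readability only, and their grouping here is invented.

```r
set.seed(42)
n  <- 500
f1 <- rnorm(n)  # simulated latent "size" factor
f2 <- rnorm(n)  # simulated latent "operations" factor

# Six simulated variables: three driven by f1, three by f2.
X <- data.frame(
  BEDS   = f1 + rnorm(n, sd = 0.5),
  ADM    = f1 + rnorm(n, sd = 0.5),
  SIR    = f1 + rnorm(n, sd = 0.5),
  HIP    = f2 + rnorm(n, sd = 0.5),
  KNEE   = f2 + rnorm(n, sd = 0.5),
  FEMUR2 = f2 + rnorm(n, sd = 0.5)
)

# Maximum-likelihood factor analysis with varimax rotation (base R, not psych).
fit2 <- factanal(X, factors = 2, rotation = "varimax", scores = "regression")

# Assign each variable to the factor with the largest absolute loading.
L   <- unclass(fit2$loadings)
fn2 <- apply(L, 1, function(x) which.max(abs(x)))
print(fn2)

# Factor scores (one row per hospital) are the inputs for clustering.
fscores <- fit2$scores
```

The names fit2 and fn2 are chosen to avoid clashing with fit and fn above. The psych version returns scores in fit$scores as well when it is given raw data rather than a correlation matrix.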

4. Market segmentation.

i) Independent variables are used to divide the list of hospitals (all possible clients, that is, the market) into subsets which we call market segments or clusters.
Use cluster analysis to find the market segments or clusters. Since we are summarizing the variables with factors, use the factors as the clustering inputs. One way of choosing the number of clusters is to move the data into R, run pam for several candidate numbers of clusters, compute the silhouette statistic for each with the silhouette function, and use it to decide the number of clusters. Then move the cluster variable back to SAS if you prefer.

ii) Once the clusters are chosen, we must study the summary statistics for each cluster and try to describe their content. Interpretation is very important at this stage. Draw a boxplot of SALES or transformed SALES versus CLUSTER_NUMBER, identify the clusters with the highest SALES, and focus on the top cluster or clusters.

iii) Finally, we select the cluster or clusters that agree with our objectives. These are clusters with high sales and with good characteristics, such as a high number of operations.

In this study you are looking for segments with high overall sales that also contain hospitals where the company's sales are NA, that is, hospitals that are not yet our customers. Some segments will have mostly low sales. This means that those hospitals have few patients who would need our products, so we are not interested in them.
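The silhouette-based choice of the number of clusters, followed by a per-cluster look at SALES, might look like the sketch below. The factor scores and sales are simulated, and the candidate range 2:6 is chosen only to keep the example small.

```r
library(cluster)  # pam() and silhouette information

set.seed(7)
# Simulated factor scores: three well-separated groups of 50 hospitals each.
scores <- rbind(
  cbind(rnorm(50, 0, 0.3), rnorm(50, 0, 0.3)),
  cbind(rnorm(50, 3, 0.3), rnorm(50, 3, 0.3)),
  cbind(rnorm(50, 0, 0.3), rnorm(50, 3, 0.3))
)

# Average silhouette width for each candidate number of clusters.
ks      <- 2:6
avg_sil <- sapply(ks, function(k) pam(scores, k)$silinfo$avg.width)
best_k  <- ks[which.max(avg_sil)]

# Final clustering and a simulated SALES variable with missing values.
cl    <- pam(scores, best_k)$clustering
sales <- rexp(150, rate = 1 / (20 * cl))
sales[sample(150, 40)] <- NA

# Median sales and number of non-customers (NA sales) per cluster.
med_sales <- tapply(sales, cl, median, na.rm = TRUE)
n_missing <- tapply(is.na(sales), cl, sum)
print(rbind(med_sales, n_missing))

if (interactive()) boxplot(sales ~ cl, xlab = "Cluster", ylab = "SALES")
```

A cluster that combines a high median with many NA entries is the kind of segment the text describes as worth targeting.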

Part 2. Estimating potential gain in sales.

Potential gain in sales is the difference between current sales and the average of sales to similar hospitals. If you are analyzing a very small cluster (N < 100), we may assume that the sales are homogeneous and the “average sales to similar hospitals” is just the average sales within that cluster. But if the cluster is larger than 100, we will need to redo the clustering with more clusters. Methods: 1. PAM, 2. K-means, 3. Hierarchical clustering (HC) using Ward's method.

Select the number of clusters (15-30) using the silhouette criterion, the second derivative, or the GAP statistic.
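A compact way to compare the three methods over the 15-30 range with the silhouette criterion is sketched below, on simulated factor scores. The helper avg_sil and the settings nstart = 5 and iter.max = 50 are illustrative choices, not requirements of the assignment. The second-derivative and GAP criteria are not shown.

```r
library(cluster)  # pam() and silhouette()

set.seed(11)
fs <- matrix(rnorm(600 * 3), ncol = 3)  # simulated factor scores, 600 hospitals
d  <- dist(fs)
ks <- 15:30

# Hypothetical helper: average silhouette width for a vector of cluster labels.
avg_sil <- function(labels) mean(silhouette(labels, d)[, "sil_width"])

hc <- hclust(d, method = "ward.D2")  # Ward's method on Euclidean distances

sil <- data.frame(
  k      = ks,
  pam    = sapply(ks, function(k) avg_sil(pam(d, k, cluster.only = TRUE))),
  kmeans = sapply(ks, function(k)
    avg_sil(kmeans(fs, centers = k, nstart = 5, iter.max = 50)$cluster)),
  ward   = sapply(ks, function(k) avg_sil(cutree(hc, k)))
)

# Number of clusters with the largest average silhouette, per method.
best <- sapply(sil[-1], function(s) ks[which.max(s)])
print(best)
```

On real factor scores the three methods may disagree. In that case the cluster sizes and the interpretability of the segments are reasonable tie-breakers.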

We are interested in hospitals with no current sales, that is, hospitals with NA sales. For these hospitals, your estimate of the potential gain is the average sales for their cluster.
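As a small worked example of this rule, the sketch below uses a made-up table of eight hospitals in two clusters. The prospects are the rows with NA sales, and each receives its cluster's average as the estimated potential.

```r
# Made-up example: 8 hospitals, 2 clusters, SALES in $1000's per year.
hospitals <- data.frame(
  HID     = 1:8,
  cluster = c(1, 1, 1, 1, 2, 2, 2, 2),
  SALES   = c(100, 120, NA, 80, 10, NA, 20, 30)
)

# Average sales per cluster, ignoring hospitals with no current sales.
cluster_mean <- tapply(hospitals$SALES, hospitals$cluster, mean, na.rm = TRUE)

# Prospects: hospitals with NA sales, with their cluster average as potential.
prospects <- hospitals[is.na(hospitals$SALES), ]
prospects$potential <- as.numeric(cluster_mean[as.character(prospects$cluster)])
print(prospects)
```

Here hospital 3 gets a potential of 100 and hospital 6 a potential of 20. If SALES was transformed earlier, for example with log(1+0.1*SALES), the average would need to be mapped back to the original scale before being reported in dollars.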

All these parts can be performed using R. The R analysis would apply the methods for robust clustering (pam) and, as a 4th method, classification and regression trees (rpart).

PAM: compare the clusters given by PAM with those from SAS. Are they similar?

RPART: The idea here is to take the SALES variable defined earlier as the response. Run the tree method, select one good node that has very high sales, find the hospitals in that node that have SALES=NA, and estimate a potential sales gain.

Use the rpart package in R. The rpart function is similar to lm in the sense that it accepts “predict” for new data. Please use more than 14 clusters, not 14 or fewer.
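The tree-based approach can be sketched as follows, on simulated data. The formula, the three predictors, and the default rpart settings are assumptions made for illustration. A real analysis would use the transformed variables or factors and tune the tree size.

```r
library(rpart)

set.seed(5)
n   <- 1000
dat <- data.frame(
  HIP  = rpois(n, 40),
  KNEE = rpois(n, 30),
  BEDS = rpois(n, 150)
)
# Simulated sales driven by operation counts; 300 hospitals are non-customers.
dat$SALES <- 2 * dat$HIP + dat$KNEE + rnorm(n, sd = 5)
dat$SALES[sample(n, 300)] <- NA

# Fit a regression tree on hospitals with known sales.
train <- dat[!is.na(dat$SALES), ]
fit   <- rpart(SALES ~ HIP + KNEE + BEDS, data = train, method = "anova")

# Predicted (node-average) sales for hospitals with SALES = NA.
prospects <- dat[is.na(dat$SALES), ]
prospects$expected <- predict(fit, newdata = prospects)

# The leaf with the highest average sales, and the prospects that fall in it.
top_value     <- max(fit$frame$yval[fit$frame$var == "<leaf>"])
top_prospects <- prospects[abs(prospects$expected - top_value) < 1e-8, ]
print(head(top_prospects))
```

For a regression tree, predict returns the average response of the leaf a hospital falls into, so the expected column plays the same role as the cluster average in the clustering methods.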
