2020-11-01 11:51, Sunday. Category: case study

A Case Study: Red Wine Quality Data

Pick a dataset of your interest

The red wine quality data is used for the case study. The dataset is available from the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/datasets/Wine+Quality. The data set contains physicochemical variables (the inputs) and a sensory variable (the output).

The inputs come from objective tests. The available variables are: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol.

The output, or the response variable of interest, is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (excellent).

Pose the initial questions (3 to 5) that you would like to answer

RQ1: How are the results of the objective tests and the wine quality distributed?

RQ2: How does wine quality change with alcohol and total sulfur dioxide levels?

RQ3: Does wine quality differ across pH levels?

Assess the fitness of the data

Can we answer all of these questions using only the red wine quality data? The answer is yes. This can be a purely empirical, data-driven study. Based on the empirical distributions examined in RQ1, the continuous variables can be divided into factor or dummy variables to give a better understanding of the whole data set and of wine quality.

Answer the initial questions by visualizing the dataset using R

Load data:
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 …
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 …
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 …
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 …
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 …
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 …
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 …
## $ density : num 0.998 0.997 0.997 0.998 0.998 …
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 …
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 …
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 …
## $ quality : int 5 5 5 6 5 5 5 7 7 5 …
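
A listing like the one above is what `str()` prints for a loaded data frame. The loading step could be sketched as follows. This is only a sketch: it assumes the red wine file is semicolon-separated with quoted column names that contain spaces, and it reads three rows copied from the listing (density as rounded there) from an inline string so that it runs without a download. The file name in the final comment is hypothetical.

```r
# Three rows copied from the str() listing above; the ";" separator and
# the quoted header with spaces are assumptions about the file layout.
sample_text <- paste(
  '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"',
  "7.4;0.7;0;1.9;0.076;11;34;0.998;3.51;0.56;9.4;5",
  "7.8;0.88;0;2.6;0.098;25;67;0.997;3.2;0.68;9.8;5",
  "7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5",
  sep = "\n"
)

# read.csv() turns the spaces in the header into dots (fixed.acidity, ...).
wine <- read.csv(text = sample_text, sep = ";")
str(wine)

# With a local copy of the data, a file path replaces the text argument
# (hypothetical file name):
# wine <- read.csv("winequality-red.csv", sep = ";")
```
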
To answer RQ1:

RQ1 asks how the results of the objective tests and the wine quality are distributed. One might first plot a histogram for each variable.
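
A histogram per variable could be drawn along the following lines. To stay self-contained, the sketch uses only the first ten observations of four variables, copied from the `str()` listing above; the choice of variables and the 2 x 2 layout are illustrative assumptions, and the full data frame would be used in practice.

```r
# First ten observations of four variables, copied from the str() listing.
wine <- data.frame(
  alcohol        = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  pH             = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  residual.sugar = c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1),
  quality        = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# One histogram per column on a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
for (v in names(wine)) {
  hist(wine[[v]], main = v, xlab = v)
}
par(old_par)  # restore the previous plot layout
```
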

[Figure: histograms of the variables]

The histograms above show the range and overall distribution of each variable.

Approximately, the ranges are: alcohol, 8 to 14; chlorides, 0 to 0.6; citric acid, 0 to 1; density, 0.9901 to 1.0037; fixed acidity, 4 to 16; free sulfur dioxide, 0 to 72; pH, 2 to 4; quality, 3 to 8; residual sugar, 0.9 to 15.5; sulphates, 0.33 to 2; total sulfur dioxide, 6 to 289; volatile acidity, 0.12 to 1.58.
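
Exact ranges can also be computed rather than read off the plots. The sketch below does this with `range()`. It uses only the first ten observations from the `str()` listing, so its output covers those ten rows and will not match the full-data ranges quoted above.

```r
# First ten observations of four variables, copied from the str() listing.
wine <- data.frame(
  alcohol   = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  pH        = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  sulphates = c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0.57, 0.8),
  quality   = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# range() returns c(min, max); sapply() stacks them into a 2-row matrix
# with one column per variable.
ranges <- sapply(wine, range)
rownames(ranges) <- c("min", "max")
print(ranges)
```
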

Manipulate data:

From the histograms, residual sugar, free sulfur dioxide and total sulfur dioxide have large ranges, and their distributions are strongly positively skewed. Therefore, one might apply a logarithmic transformation to these variables.
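
The effect of the transformation can be checked numerically. The sketch below compares skewness before and after taking logs. The `skewness` helper is a hypothetical name and uses one simple moment-based definition among several conventions, and the data are again only the first ten observations from the `str()` listing. Note that `log()` needs strictly positive values; `log1p()` is a common alternative when zeros occur.

```r
# First ten observations of the three skewed variables, from the str() listing.
wine <- data.frame(
  residual.sugar       = c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1),
  free.sulfur.dioxide  = c(11, 25, 15, 17, 11, 13, 15, 15, 9, 17),
  total.sulfur.dioxide = c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102)
)

# Hypothetical helper: moment-based sample skewness.
# Positive values indicate a long right tail.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Natural log of every column.
wine_log <- log(wine)

skew_before <- sapply(wine, skewness)
skew_after  <- sapply(wine_log, skewness)
print(rbind(before = skew_before, after = skew_after))
```
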

[Figure: distributions of the log-transformed variables]

After the transformation, the overall distributions are more concentrated, as the plots above show.

Visualization:

Densities:
[Figures: density plots of the input variables; pie chart of wine quality]

The densities for all input variables are approximately bell-shaped. This is good news for the later questions, because the distributional assumptions behind common statistical analysis techniques are less likely to be violated, which supports the validity of the final results.
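
Density curves of this kind could be produced with a kernel density estimate, as sketched below. The sketch assumes base R's `density()` with its default bandwidth and again uses only the first ten observations of two variables from the `str()` listing, so the curves are much rougher than they would be on the full data.

```r
# First ten observations of two input variables, from the str() listing.
wine <- data.frame(
  pH      = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5)
)

# One kernel density estimate per variable, side by side.
old_par <- par(mfrow = c(1, 2))
for (v in names(wine)) {
  plot(density(wine[[v]]), main = v, xlab = v)
}
par(old_par)  # restore the previous plot layout
```
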

For the output variable, the pie chart above shows the wine quality scores. Most wines have a quality score of 5 or 6. Only a small number of wines are extremely poor or extremely good.
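
A pie chart of the quality scores amounts to counting each score and passing the counts to `pie()`. The sketch below shows this on the ten quality values from the `str()` listing; the label format is an illustrative choice.

```r
# First ten quality scores, copied from the str() listing.
quality <- c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)

# Frequency of each score.
counts <- table(quality)
print(counts)

# One slice per score, labelled with the score and its count.
slice_labels <- paste0("quality ", names(counts), " (", counts, ")")
pie(counts, labels = slice_labels, main = "Wine quality")
```
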

[Figure: boxplots of all variables]

The boxplots for all variables are plotted above. The variables fixed.acidity and residual.sugar have relatively large numbers of outliers.
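
Outliers can be counted as well as plotted. The sketch below uses `boxplot.stats()`, which flags points lying more than 1.5 times the hinge spread beyond the hinges (the same rule the boxplot whiskers use by default). It runs on the first ten observations of three variables from the `str()` listing, so the counts describe only those rows.

```r
# First ten observations of three variables, from the str() listing.
wine <- data.frame(
  fixed.acidity  = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  residual.sugar = c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1),
  pH             = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35)
)

# Separate panels, because the variables are on different scales.
old_par <- par(mfrow = c(1, 3))
for (v in names(wine)) {
  boxplot(wine[[v]], main = v)
}
par(old_par)  # restore the previous plot layout

# Number of points beyond the whiskers for each variable.
n_outliers <- sapply(wine, function(x) length(boxplot.stats(x)$out))
print(n_outliers)
```
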

RQ2: How does wine quality change with alcohol and total sulfur dioxide levels?

[Figures: scatter plots of quality against alcohol and against total sulfur dioxide, each with a fitted regression line]

The scatter plots, each with a fitted simple linear regression line, suggest that wine quality increases with the alcohol level and with the total sulfur dioxide level.
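
Scatter plots with a fitted line could be produced with `lm()` and `abline()`, as sketched below. The sketch fits one simple regression per predictor on the first ten observations from the `str()` listing. Ten rows are far too few to confirm or refute the trend described above; the point is only to show the mechanics.

```r
# First ten observations of the two predictors and the response,
# copied from the str() listing.
wine <- data.frame(
  alcohol              = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  total.sulfur.dioxide = c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102),
  quality              = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# One simple linear regression per predictor.
fit_alcohol <- lm(quality ~ alcohol, data = wine)
fit_tsd     <- lm(quality ~ total.sulfur.dioxide, data = wine)

# Scatter plots with the fitted lines overlaid.
old_par <- par(mfrow = c(1, 2))
plot(quality ~ alcohol, data = wine)
abline(fit_alcohol)
plot(quality ~ total.sulfur.dioxide, data = wine)
abline(fit_tsd)
par(old_par)  # restore the previous plot layout

# Intercept and slope of each fit; the sign of the slope gives the direction.
print(coef(fit_alcohol))
print(coef(fit_tsd))
```
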

RQ3: Does wine quality differ across pH levels?

[Figure: wine quality distributions at low and high pH levels]

The above plots show that the quality of the wine is not affected by whether the pH level is low or high, as the distributions remain the same.
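
A comparison of this kind requires turning pH into a two-level factor. The sketch below splits at the median, which is an assumption, since the text does not say how "low" and "high" were defined. It then tabulates quality within each group, using the first ten observations from the `str()` listing.

```r
# First ten observations of pH and quality, from the str() listing.
wine <- data.frame(
  pH      = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  quality = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)

# Median split (an assumption): at or below the median is "low".
wine$pH.level <- factor(
  ifelse(wine$pH <= median(wine$pH), "low", "high"),
  levels = c("low", "high")
)

# Share of each quality score within each pH group (rows sum to 1).
quality_by_ph <- table(wine$pH.level, wine$quality)
print(prop.table(quality_by_ph, margin = 1))

# Side-by-side boxplots of quality for the two groups.
boxplot(quality ~ pH.level, data = wine, xlab = "pH level", ylab = "quality")
```
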

According to the above tree model and its rules, 60% of the wine quality values can be predicted correctly. Reading the tree starts from alcohol. For low-alcohol wine, one branch of the sulphates split gives quality 1; otherwise, if the total sulfur dioxide is large, the quality is 4; otherwise, the quality is 2 if the volatile acidity is low and 3 if the volatile acidity is high. For high-alcohol wine, if the sulphates is low, the quality is 6 when the volatile acidity is low and 4 when it is high; if the sulphates is high, the quality is 7 when the volatile acidity is low and 8 when it is high.
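
The text does not name the tree implementation. As one possibility, the sketch below fits a classification tree with the `rpart` package and computes the share of correct predictions. The package choice, the four predictors, and the control settings are all assumptions. The settings are deliberately loosened so that a tree can grow on the ten rows taken from the `str()` listing; they would overfit on real data.

```r
library(rpart)

# First ten observations of the variables named in the rules above,
# copied from the str() listing.
wine <- data.frame(
  alcohol              = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  sulphates            = c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0.57, 0.8),
  volatile.acidity     = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  total.sulfur.dioxide = c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102),
  quality              = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)
wine$quality <- factor(wine$quality)  # treat quality as a class label

# Classification tree; the loose control settings exist only so that
# splits are possible on ten rows.
fit <- rpart(
  quality ~ alcohol + sulphates + volatile.acidity + total.sulfur.dioxide,
  data = wine, method = "class",
  control = rpart.control(minsplit = 2, minbucket = 1, cp = 0, xval = 0)
)
print(fit)  # the printed splits are the tree's rules

# Share of training rows whose predicted class matches the observed one.
pred <- predict(fit, newdata = wine, type = "class")
accuracy <- mean(as.character(pred) == as.character(wine$quality))
print(accuracy)
```

Accuracy measured on the same rows used for fitting is optimistic; a held-out test set or cross-validation gives a fairer estimate.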
