Data Mining for Business Analytics

Posted: 2018-10-09 08:00 (Tuesday)

Homework #1

Fall 2018

AD699: Data Mining for Business Analytics

 

Topics: Data Exploration and Visualization

 

A note about submissions: Unlike Olympic figure skating and ski jumping, AD699 does not award style points. There are some fancy tools within R for generating reports, such as RMarkdown, but learning them is not within the scope of this course. The most important thing here is to answer the questions that ask for written answers, and to show screenshots where screenshots are asked for.

 

This assignment is due by 11:59 p.m. on Monday, September 17th.

 

Step 1:

 

Download this file from our course Blackboard site:

 

a) athlete_events.csv

Part I: Data Exploration

1. Bring this file into your R environment. Assign the name athletes to this file. Show the code that you used to do this. (Remember to first set your working directory to the folder that contains your files.)

 

2. How many rows and how many columns does athletes contain? How do you know this?

 

3. Are there any missing values in the athletes data set? If so, how do you know this? (Note: There are MANY ways that you could answer this question, and any valid way is completely fine).

 

4. Remove all rows in the athletes data set that contain any missing values, and store the results of this operation in a new variable called athletes2. What are the dimensions of athletes2?

 

5. Based on the data in athletes2, what is the mean age of an Olympic medalist? What is the median age? Show the code that you used to find this out, along with a screenshot of your results.

 

6. How many Olympic medalists were male, and how many were female? (Hint: Use the table function to help you with this). Show the code that you used to find this out, along with a screenshot of your results.

 

 

 

7. How old was the youngest Olympic medal winner in the dataset? How old was the oldest Olympic medal winner in the dataset? Show the code that you used to find this out, along with a screenshot of your results.
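The Part I steps above can be sketched in base R. This is only a minimal sketch under stated assumptions: athlete_events.csv is not reproduced here, so the code writes a tiny made-up data frame to a temporary file and reads it back, and the column names Age, Sex, and Medal are assumptions about the real file's layout. It also assumes that every row left after removing missing values is a medalist row.

```r
# Toy stand-in for athlete_events.csv: hypothetical values, assumed column names.
toy <- data.frame(
  Name  = c("A", "B", "C", "D", "E"),
  Sex   = c("M", "F", "F", "M", "F"),
  Age   = c(24, 31, NA, 19, 27),
  Medal = c("Gold", "Silver", "Bronze", NA, "Gold")
)
csv_path <- file.path(tempdir(), "athlete_events.csv")
write.csv(toy, csv_path, row.names = FALSE)

# 1. Read the file. With the real data you would first call setwd() on the
#    folder that holds the file and then read it by its bare name.
athletes <- read.csv(csv_path)

# 2. Number of rows and columns.
dim(athletes)

# 3. Check for missing values: one overall flag, then a count per column.
anyNA(athletes)
colSums(is.na(athletes))

# 4. Keep only the rows that have no missing values.
athletes2 <- na.omit(athletes)
dim(athletes2)

# 5. Mean and median age.
mean(athletes2$Age)
median(athletes2$Age)

# 6. Counts by sex.
table(athletes2$Sex)

# 7. Youngest and oldest ages, returned together as c(min, max).
range(athletes2$Age)
```

anyNA and colSums(is.na(...)) are only two of the many valid ways to check for missing values; summary(athletes) would also report NA counts for numeric columns.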

 

 

Part II: Data Visualization

 

1. Filter the dataset so that it only contains information for your particular Olympiad. Student Olympiad assignments can be found in Blackboard, in the same folder that contains this assignment prompt. Assign a new variable name to this dataset that only contains your Olympiad. Show the code that you used to accomplish this, along with a screenshot of your results.

 

2. Using ggplot, create a histogram that depicts the distribution of medal winners for your Olympiad by age. Show the code that you used to accomplish this, along with a screenshot of your results.

 

3. Now, modify your histogram by specifying a bin width (or a number of bins) that you chose (i.e., not the default). Specify a color for the bins in your histogram, and specify another color to use for the borders of the bins. Give your histogram a descriptive title. Show the code that you used to accomplish this, along with a screenshot of your results.

 

4. Imagine that your boss is a smart person, but has no idea what a histogram is — how would you explain this plot to your boss? Write a one or two sentence description of what your histogram shows you.

 

5. Which six NOCs received the greatest numbers of medals? Show the code that you used to find this out, along with a screenshot of your results. Create a filtered dataset that only contains medalists from these six NOCs. Show the code that you used to accomplish this, along with a screenshot of your results.

 

6. Using ggplot, create a scatterplot that depicts the heights (on the x-axis) and the weights (on the y-axis) of the athletes from the six NOCs with the most medals. Give your plot a descriptive title. Show the code that you used to accomplish this, along with a screenshot of your results. Write a one or two sentence description of what this scatterplot shows you (again, explain it to your boss).

 

7. Now, add to the scatterplot that you just created by including a categorical variable (gender). Show the code that you used to accomplish this, along with a screenshot of your results. Write a one or two sentence description of what this scatterplot shows you (again, explain it to your boss).

 

8. Include yet another categorical variable on your scatterplot: NOC. Use shape to represent NOC. Show the code that you used to accomplish this, along with a screenshot of your results. Write one or two sentences about something that this plot tells you (you don’t need to summarize the entire plot for this; you can just pick a couple of data points and describe them here).

 

9. Again using ggplot, create a barplot that compares the total number of bronze, silver, and gold medals among the top six NOCs. What do you notice about these totals? If every Olympic competition generates one gold, one silver, and one bronze, why might your bars be different heights? (Hint: think about how you created this subset of the original dataset).
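The Part II steps above can be sketched with ggplot2. This is a rough sketch under stated assumptions, not a model answer: the data frame is made-up toy data built deterministically, the column names (Year, Age, Sex, Height, Weight, NOC, Medal) are assumptions about the real file, and the year 2012 stands in for whichever Olympiad a student is assigned. It requires the ggplot2 package to be installed.

```r
library(ggplot2)

# Hypothetical toy medalist data: 60 rows, eight NOCs with distinct medal counts.
n <- 60
medalists <- data.frame(
  Year   = rep(c(2008, 2012), length.out = n),
  Age    = 18 + (seq_len(n) * 3) %% 20,
  Sex    = rep(c("M", "F", "F", "M"), length.out = n),
  Height = 160 + (seq_len(n) * 7) %% 40,
  NOC    = rep(c("USA", "CHN", "GBR", "GER", "FRA", "ITA", "JPN", "AUS"),
               times = c(12, 10, 9, 8, 7, 6, 5, 3)),
  Medal  = rep(c("Gold", "Silver", "Bronze"), length.out = n)
)
medalists$Weight <- round(0.9 * medalists$Height - 85) + seq_len(n) %% 5

# Step 1: keep a single Olympiad (2012 is an arbitrary placeholder).
my_games <- subset(medalists, Year == 2012)

# Steps 2-3: age histogram with a chosen bin width, fill, border, and title.
p_hist <- ggplot(my_games, aes(x = Age)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Ages of medal winners (toy 2012 data)", x = "Age", y = "Count")

# Step 5: the six NOCs with the most medal rows, then the filtered dataset.
top6 <- names(head(sort(table(medalists$NOC), decreasing = TRUE), 6))
top6_medalists <- medalists[medalists$NOC %in% top6, ]

# Steps 6-8: height vs. weight, with color for sex and shape for NOC.
p_scatter <- ggplot(top6_medalists,
                    aes(x = Height, y = Weight, color = Sex, shape = NOC)) +
  geom_point() +
  labs(title = "Height and weight of medalists, top six NOCs (toy data)")

# Step 9: medal counts by type within the top-six subset.
p_bar <- ggplot(top6_medalists, aes(x = Medal)) +
  geom_bar() +
  labs(title = "Medals by type, top six NOCs (toy data)", y = "Count")
```

Printing p_hist, p_scatter, or p_bar at the console draws the plot. Six NOCs is also the largest number of categories that ggplot2's default shape palette handles without a warning, so mapping shape to NOC works for this subset.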
