
Posted: 2021-07-22 17:11 (Thursday)

CISC 5950

Big Data Programming

CISC 5950 — Project 1


In CISC 5950, we have learned the following topics:

1. Set up a 3-node cluster with Hadoop Distributed File System and run examples.

2. On top of HDFS, set up the cluster with MapReduce programming framework.

3. Run examples of MapReduce programs.

4. Scheduling on the Cloud.

In this project, we are going to design our own Hadoop MapReduce-based program to analyze the data. The project consists of two parts.

 

NY Parking Violations

The NYC Department of Finance collects data on every parking ticket issued in NYC (10M per year!). This data is made publicly available to aid in ticket resolution and to guide policy-makers.

You can find the data via the NYC Parking Data link.

 

[Figure: sample records from the NYC parking violations data]

 

The above figure shows several records, where each row represents a parking ticket and the columns are the details of the ticket.

To start the project, you have to:

1. Start the 3-node cluster

2. Set up the HDFS

3. Store the data in HDFS

4. Set up the MapReduce framework along with the scheduler for resource management.

By analyzing the data, we need to answer the following:

  • When are tickets most likely to be issued?
  • What are the most common years and types of cars to be ticketed?
  • Where are tickets most commonly issued?
  • Which color of the vehicle is most likely to get a ticket?
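
As a starting point for the first question, the counting pattern can be sketched in Python as a local simulation of the map, shuffle, and reduce phases. The column name `Violation Time` and the `HHMM` plus `A`/`P` time format are assumptions about the dataset and should be checked against the actual header; the function names are illustrative only.

```python
import csv
from itertools import groupby


def map_hour(lines, time_column="Violation Time"):
    """Mapper: emit (hour, 1) for every ticket row with a usable time.

    Assumes a header row and times such as '0752A' or '0143P'
    (HHMM followed by A or P); both are assumptions to verify.
    """
    for row in csv.DictReader(lines):
        raw = (row.get(time_column) or "").strip().upper()
        if len(raw) != 5 or not raw[:4].isdigit() or raw[4] not in "AP":
            continue  # skip missing or malformed times
        hour = int(raw[:2]) % 12 + (12 if raw[4] == "P" else 0)
        yield hour, 1


def reduce_counts(pairs):
    """Reducer: sum the counts per key; input must be sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce on one machine."""
    mapped = sorted(map_hour(lines))  # stand-in for Hadoop's shuffle and sort
    return dict(reduce_counts(mapped))
```

The same mapper and reducer pattern applies to the other questions by switching the column that forms the key. In a real Hadoop Streaming job the header row appears in only one input split, so the column would need to be selected by position rather than by name.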

NBA Shot Logs

https://www.kaggle.com/dansbecker/nba-shot-logs

This dataset (https://www.kaggle.com/dansbecker/nba-shot-logs) covers shots taken during the 2014-2015 season: who took the shot, where on the floor the shot was taken from, who the nearest defender was, how far away the nearest defender was, the time on the shot clock, and much more. The column titles are generally self-explanatory.

The figure shows several records, where each row represents a shot and the columns are the details of the shot, e.g., the game ID, who the defender is, and the distance between the shooter and the defender.

 

[Figure: sample records from the NBA shot logs data]

 

By analyzing the data, we need to answer the following:

  • For each pair of players (A, B), we define the fear score of A when facing B as A's hit rate when B is the closest defender while A is shooting. Based on the fear score, for each player, please find out who is his "most unwanted defender".

 

  • For each player, we define the comfortable zone of shooting as a matrix of

{SHOT_DIST, CLOSE_DEF_DIST, SHOT_CLOCK}

Please develop a MapReduce-based algorithm to classify each player’s records into 4 comfortable zones. Considering the hit rate, which zone is the best for James Harden, Chris Paul, Stephen Curry and LeBron James?
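
The fear score question can be approached as two reduce stages, sketched below as plain Python functions. The column names `player_name`, `CLOSEST_DEFENDER`, and `SHOT_RESULT` (with the value `made`) are assumptions about the CSV header, and reading "most unwanted defender" as the defender against whom the shooter's hit rate is lowest is one possible interpretation, not the required one.

```python
import csv
from collections import defaultdict


def map_shots(lines, shooter_col="player_name",
              defender_col="CLOSEST_DEFENDER", result_col="SHOT_RESULT"):
    """Mapper: emit ((shooter, defender), (made, attempts)) per shot.

    The column names are assumptions; check them against the real header.
    """
    for row in csv.DictReader(lines):
        made = 1 if row[result_col].strip().lower() == "made" else 0
        key = (row[shooter_col].strip().lower(),
               row[defender_col].strip().lower())
        yield key, (made, 1)


def reduce_hit_rates(pairs):
    """Reducer 1: hit rate of each shooter against each closest defender."""
    totals = defaultdict(lambda: [0, 0])
    for key, (made, attempts) in pairs:
        totals[key][0] += made
        totals[key][1] += attempts
    return {key: made / attempts for key, (made, attempts) in totals.items()}


def most_unwanted_defender(hit_rates):
    """Reducer 2: per shooter, the defender with the lowest hit rate."""
    worst = {}
    for (shooter, defender), rate in sorted(hit_rates.items()):
        if shooter not in worst or rate < worst[shooter][1]:
            worst[shooter] = (defender, rate)
    return worst
```

Pairs with only one or two shots produce extreme hit rates, so a minimum number of attempts per pair may be worth adding before picking the defender.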

 

Bonus Question

The biggest challenge when using K-Means is to decide on the number of clusters. Having more clusters creates some small classes with very few records, while having fewer clusters leads to classes that are too general.
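
One rough way to compare candidate numbers of clusters is to run K-Means for several values of k and look at the within-cluster sum of squared distances. The sketch below is a single-machine illustration, not a Hadoop job: each iteration is written as a map step (assign every point to its nearest centroid) followed by a reduce step (average the points in each cluster). The function names and the naive initialisation from the first k points are illustrative assumptions.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans_step(points, centroids):
    """One iteration: map points to centroids, then reduce to new means."""
    sums = {}  # cluster index -> (coordinate sums, point count)
    for p in points:  # map phase
        i = nearest(p, centroids)
        total, n = sums.get(i, ([0.0] * len(p), 0))
        sums[i] = ([t + x for t, x in zip(total, p)], n + 1)
    # reduce phase: an empty cluster keeps its previous centroid
    return [
        tuple(t / sums[i][1] for t in sums[i][0]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, k, iterations=20):
    """Run a fixed number of iterations, starting from the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids


def inertia(points, centroids):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(math.dist(p, centroids[nearest(p, centroids)]) ** 2
               for p in points)
```

Inertia always shrinks as k grows, so the value of k is usually chosen where the improvement starts to flatten rather than where inertia is smallest.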

Based on the K-Means algorithm above, try to answer the following questions:

  • Given a black vehicle parked illegally at street codes 34510, 10030, 34050, what is the probability that it will get a ticket? (A very rough prediction.)
  • At 10 am, I want to go to Lincoln Center and I want to walk no more than 0.5 mile. Where should I park? (Divide the area into zones.)

 

Grading Rubric

You should complete the lab in groups of 4 students.

(70%) P1: NY Parking Violations (17.5% * 4);

(20%) P2: NBA Shot Logs (10% * 2);

(10%) Two reports on your design and experiments. Please be as detailed as possible and include your screenshots. In addition, you also need to write two README files, one for P1 and one for P2.

(10%) Bonus Question (5% + 5%);

Submission

You are expected to email me a zip (or tar) file by the deadline (Nov. 8th, 2020). The zip file should include two (or three) folders:

  • Part1: your code, report and README
  • Part2: your code, report and README
  • Bonus: your code, report and README

Useful Links

1. Analysis of NYC Parking Tickets.

2. Preliminary Data Visualization.

3. Exploring 42.3M NYC Parking Tickets.

4. NY Parking Violations Issued.

5. Insights From Raw NBA Shot Log Data.

6. Investigating the hot hand phenomenon in the NBA (CODE).

7. Parallel K-Means Clustering Based on MapReduce.

8. NBA 16-17 regular season shot log.

9. The Fear Factor.

10. The Best And Worst Defenders.

11. NBA Classification.

12. Stephen Curry’s Decision Tree.

13. Points per Match (ATL vs WAS only).

14. MapReduce-kmeans.

 
