
# Big Data Computing Assignment Writing Service: COMP5434

2021-07-29 17:26, Thursday. Category: Database Assignment Writing. Views: 88

## COMP5434 (Fall 2019) Big Data Computing

Individual Assignment 2       Due Date: 10:00am, 2nd December, 2019

and follow our requirements in Section 2.

### 1. Problem statement

A sample input file is given below. Each line corresponds to a point-of-interest (POI) and contains a keyword followed by the coordinate values x and y (separated by white space).

park 3 5

lake 2 3

mall 1 4

park 2 4

lake 9 8

mall 2 7

We measure the distance between two points $p_1=(x_1,y_1)$ and $p_2=(x_2,y_2)$ by:

$$\mathrm{dist}(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
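The distance formula above can be written as a small helper. This is a minimal sketch in plain Java; the class name `PoiDistance` and the method name `dist` are hypothetical and are not prescribed by the assignment.

```java
// Sketch only: class and method names are hypothetical, not part of the assignment.
final class PoiDistance {
    private PoiDistance() {}

    /** Euclidean distance between the points (x1, y1) and (x2, y2). */
    static double dist(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2;
        double dy = y1 - y2;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        // The two "park" points from the sample input; the result is about 1.414.
        System.out.println(dist(3, 5, 2, 4));
    }
}
```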

Each keyword k is associated with a group G(k) of points.

[Example] The group of “park” contains two points: (3,5) and (2,4).

There are 2 questions in this programming assignment.
You should write a MapReduce program to solve each of them.

#### Question Q1: Find the centroid (i.e., the mean position of points) of each group.

[Example]

Input: the sample input above

Output:

lake  5.5  5.5

mall  1.5  5.5

park  2.5  4.5
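The logic behind Q1 can be illustrated without Hadoop. The following is a rough sketch in plain Java that only mirrors the map, group-by-key, and reduce phases using standard collections; it is not a Hadoop job and does not use the Hadoop API. All names (`CentroidSketch`, `map`, `reduce`, `centroids`) are hypothetical, and skipping malformed lines is an assumption rather than a stated requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: plain Java that imitates the map / shuffle / reduce phases.
// It does not use the Hadoop API; all names here are hypothetical.
final class CentroidSketch {
    private CentroidSketch() {}

    /** "Map" step: parse one line "keyword x y" and file the point under its keyword. */
    static void map(String line, Map<String, List<double[]>> groups) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            return; // assumption: silently skip blank or malformed lines
        }
        double x = Double.parseDouble(parts[1]);
        double y = Double.parseDouble(parts[2]);
        groups.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(new double[] {x, y});
    }

    /** "Reduce" step: average the coordinates of one non-empty group. */
    static double[] reduce(List<double[]> points) {
        double sumX = 0.0;
        double sumY = 0.0;
        for (double[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        return new double[] {sumX / points.size(), sumY / points.size()};
    }

    /** Runs both steps over all lines; TreeMap keeps keywords in sorted order. */
    static Map<String, double[]> centroids(List<String> lines) {
        Map<String, List<double[]>> groups = new TreeMap<>();
        for (String line : lines) {
            map(line, groups);
        }
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, List<double[]>> e : groups.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "park 3 5", "lake 2 3", "mall 1 4",
                "park 2 4", "lake 9 8", "mall 2 7");
        centroids(sample).forEach(
                (keyword, c) -> System.out.println(keyword + "  " + c[0] + "  " + c[1]));
    }
}
```

In a real MapReduce job the grouping by keyword would be done by the framework's shuffle phase rather than by an in-memory map; the sketch keeps it explicit only to show which work belongs to which phase.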

#### Question Q2: Find the diameter (i.e., the maximum distance between any two points inside a group) of each group.

[Example]

Input: the sample input above

Output:

lake  8.602

mall  3.162

park  1.414
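For a single group, the diameter defined in Q2 can be computed by checking every pair of points. The sketch below is plain Java, not a Hadoop reducer; the names `DiameterSketch` and `diameter` are hypothetical, and returning 0.0 for a group with fewer than two points is an assumption, since the assignment does not say what to output in that case.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: brute-force diameter of one group; names are hypothetical.
final class DiameterSketch {
    private DiameterSketch() {}

    /**
     * Maximum Euclidean distance between any two points in the group.
     * Returns 0.0 for fewer than two points (an assumption, not a stated rule).
     */
    static double diameter(List<double[]> points) {
        double maxSquared = 0.0;
        for (int i = 0; i < points.size(); i++) {
            for (int j = i + 1; j < points.size(); j++) {
                double dx = points.get(i)[0] - points.get(j)[0];
                double dy = points.get(i)[1] - points.get(j)[1];
                // Compare squared distances; take the square root once at the end.
                maxSquared = Math.max(maxSquared, dx * dx + dy * dy);
            }
        }
        return Math.sqrt(maxSquared);
    }

    public static void main(String[] args) {
        // The "lake" group from the sample input; the result is about 8.602.
        List<double[]> lake = Arrays.asList(new double[] {2, 3}, new double[] {9, 8});
        System.out.println(diameter(lake));
    }
}
```

The pairwise loop is quadratic in the group size. That is the simplest correct approach; whether it is fast enough depends on group sizes that the assignment does not specify.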

### 2. Requirements

1. Though MapReduce supports multiple languages, you should use Java (Java 8) for your implementation in this assignment.
2. Your submission should be organized as follows:

<YourStudentID> // your folder name, [Example] 19001234g

— Q1.java              // source file for question 1

— Q1.jar                // jar file for question 1, compiled and archived from Q1.java

— Q2.java              // source file for question 2

— Q2.jar                // jar file for question 2, compiled and archived from Q2.java

3. Archive the above structure as <YourStudentID>.zip and submit this .zip file on Blackboard. [Example]zip
4. Make sure that you can compile your source file and run it in pseudo-distributed mode on the latest Hadoop version (i.e., Hadoop 3.2.1).
5. Your jar file should be directly runnable on a Linux platform with the following call:

bin/hadoop jar Q1.jar Q1 <input path> <output path>

bin/hadoop jar Q2.jar Q2 <input path> <output path>

6. Your output result should preserve double precision.
7. You should only use one MapReduce round to solve each sub-question.
8. [Hint] You may use the Ubuntu image we provided for this assignment.

- The Y drive in the COMP Lab: Y:\Subject\COMP5434
Note: These files will expire on November 7!
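One possible reading of the double-precision requirement above is that results should be written with all of their significant digits rather than rounded to a fixed number of decimals. The sketch below contrasts the two in plain Java; the class and method names are hypothetical, and this interpretation is an assumption, since the sample outputs in the problem statement are shown with three decimals.

```java
import java.util.Locale;

// Sketch only: contrasts full-precision and rounded text output; names are hypothetical.
final class PrecisionSketch {
    private PrecisionSketch() {}

    /** Text form that parses back to exactly the same double. */
    static String full(double value) {
        return Double.toString(value);
    }

    /** Text form rounded to three decimals, which discards the remaining digits. */
    static String rounded(double value) {
        return String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        double d = Math.sqrt(74.0); // diameter of the "lake" group in the sample
        System.out.println(full(d));
        System.out.println(rounded(d));
    }
}
```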

20 marks will be given if your program can be compiled.

- For each .java file, 10 marks

80 marks will be given if your program is correct. We will test the correctness of your program by using 8 test cases (4 for each sub-question).

- For each test case, 10 marks

Note that this is an individual assignment. Plagiarism will result in 0 marks!