MAST10010: Data Analysis 1 Assignment 1
Due Date: Monday September 2nd, 12 noon.
- You must submit an electronic copy of your assignment via the LMS (TurnitIn).
- Clearly label your assignment with your name and student ID number, your tutor's name, and your tutorial group (day and time). Where possible, please submit your answers as a word-processed document (you may leave gaps to hand-write symbols and equations, if necessary).
- Deadlines for handing in your assignments are enforced, and late assignments will not be accepted unless there are extenuating circumstances. You must talk to the lecturer (Dr Julia Polak) personally about late assignments. Individual tutors have no discretion to accept late assignments.
This assignment consists of five (5) questions worth a total of 35 marks. It contributes 5% towards your final grade.
You must use Minitab to produce any graphs, tables and descriptive statistics.
- must include your name/student number, which can be added by right-clicking the graph and selecting Add → Footnote or Add →
- must be relevant. You may look at many graphs, but you should only include the most relevant graph (if requested) for each
- should be clear: ensure that labels and titles are correct and appropriate; you can add gridlines/change symbols/colour as appropriate to make the graph clearer. There are some marks awarded for improving upon the default from
Must be relevant: you will be penalised for including statistics which are not relevant to the questions asked.
- must be in the context of the data.
- should be supported by relevant statistics where
- should be concise and informative. Word limits, where given, must be strictly adhered to (all word limits are a maximum, you will be penalised for going over this limit!). You may use dot-points.
The Sudoku Data
For questions 1 to 4 you will need to access the data file 2019 Asst1 (file can be downloaded from the Assignments link as a Minitab worksheet and also as an Excel file).
This file contains some of the data collected in the Sudoku Activity conducted in the computer lab session in week 2 of this semester. The purpose of the study was to investigate factors that may affect the time taken to complete a Sudoku puzzle. In this assignment we will concern ourselves with a subset of this data: those who successfully completed their assigned Sudoku puzzle.
To begin this assignment you will first need to ensure that column C5 contains only numbers, and convert it to a numerical variable. For example, the second row has “7 hours”, but this needs to be changed to “7”. When using Minitab, after all the values have been changed, use Data → Change Data Type → Text to Numeric... and select C5, then store it in C5, which should change “C5-T” into “C5”. Alternatively, if you clean the data using Excel, then you should be able to import the data into Minitab after doing so, and the variable should be numerical.
You will also need to create a new variable Time (secs) to complete the puzzle.
Setting up your new worksheet:
- Clean the data (as described above) to get a numerical column containing the hours of sleep the night before. Since not all tutorial groups asked this question, there are some missing values (there should be 123 values and 27 missing values). Check that you have done this correctly by comparing your statistics with the descriptive statistics below.
- Create a new column called Time (in secs) using the time data in columns C2 and C3.
Descriptive Statistics: Hours of sleep
| Variable | N | N* | Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| How many hours did you s | 123 | 27 | 6.955 | 1.771 | 1.000 | 6.000 | 7.000 | 8.000 | 12.000 |
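For anyone preparing the data outside Minitab before importing it, the clean-up described above can be sketched in Python. This is only an illustration: the function names `parse_hours` and `time_in_secs` are hypothetical, the example entries are made up, and the sketch assumes that C2 holds whole minutes and C3 holds the remaining seconds, which you should confirm against the worksheet. All graphs, tables and statistics you submit must still be produced in Minitab.

```python
import re

def parse_hours(raw):
    """Return the first number found in a text entry such as '7 hours', or None if there is none."""
    if raw is None:
        return None
    match = re.search(r"\d+(?:\.\d+)?", str(raw))
    return float(match.group()) if match else None

def time_in_secs(minutes, seconds):
    """Combine two time columns into seconds (ASSUMES C2 = minutes and C3 = seconds)."""
    return 60 * minutes + seconds

# Made-up example entries, not values from the assignment data.
raw_sleep = ["8", "7 hours", "", None]
cleaned_sleep = [parse_hours(value) for value in raw_sleep]
print(cleaned_sleep)         # [8.0, 7.0, None, None]
print(time_in_secs(4, 35))   # 275
```

Entries with no digits are returned as `None`, which corresponds to a missing value (N*) in the Minitab output.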
Question 1: Exploring the distribution of time taken to complete a Sudoku puzzle [2 + 4 + 2 = 8 marks]
- Produce a dotplot and relevant summary statistics for Time (secs), the time to successfully complete a Sudoku puzzle.
- Describe the distribution of Time (secs).
- Comment on whether the higher times (300 seconds or more) have anything in common. Restrict your comments to two sentences at most.
[Hint: use the brush command.]
Question 2: Investigating the effect of Type of Puzzle
[3 + 5 = 8 marks]
- Produce a comparative boxplot and summary statistics, comparing the time taken for the four (4) different puzzle types. Cut and paste your plot and summary statistics into your
- Comment on the effect of Type of Puzzle on time to successfully complete a Sudoku puzzle.
Your comments must be fewer than 150 words.
Question 3: Investigating the effect of other variables
[3 + 3 + 2 = 8 marks]
- Show that those who had no previous experience with Sudoku puzzles were about 5 times more likely to have taken at least 300 seconds to complete their puzzle compared to those who were experienced with Sudoku puzzles. Include a relevant table to support your answer (the table can be constructed either by hand or using Minitab; either is acceptable).
- Produce a graph which shows the relationship between Time (secs) and Hours of sleep the night before. Comment briefly on the relationship between sleep the night before and time to complete a Sudoku puzzle.
- Last year, in one tutorial group half of the students who completed Puzzle 3 had no previous experience, whereas those completing Puzzles 1, 2 & 4 all had previous experience. In no more than two sentences, explain how this would affect the results, and
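A "times more likely" comparison of the kind asked for in the first part of this question is a ratio of two proportions, one per row of a two-way table. The sketch below shows the arithmetic only; the function name `relative_risk` is hypothetical and the counts are placeholders chosen for illustration, not values from the Sudoku data.

```python
def relative_risk(slow_a, total_a, slow_b, total_b):
    """Ratio of the proportion of slow solvers in group A to the proportion in group B."""
    return (slow_a / total_a) / (slow_b / total_b)

# Placeholder counts only; replace them with the counts from your own table.
rr = relative_risk(slow_a=6, total_a=15, slow_b=8, total_b=80)
print(round(rr, 2))  # 4.0
```

With these placeholder counts, 6 of 15 (40%) in group A and 8 of 80 (10%) in group B were slow, so group A is 4 times as likely to be slow.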
Question 4: Study Design and Sudoku [2 + 2 + 2 = 6 marks]
Suppose you wanted to repeat the study but this time, each subject would do all 4 puzzles. Imagine you have 96 students for the study.
- For such a study, describe how the randomisation would be carried out. Illustrate this allocation (e.g. with a diagram/table) for the first five (5) students.
- What would be the specific purpose(s) of randomisation in this context? (Answer in one or two sentences.)
- Give one advantage and one disadvantage of this design over the
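As an illustration of how a computer can generate a random allocation when every subject does all four puzzles, the sketch below randomises the order in which each student attempts them. It is one possible scheme under stated assumptions, not the expected answer: the function name `allocate_orders`, the seed, and the choice to use each of the 24 possible orders equally often (96 / 24 = 4 students per order) are all assumptions made here.

```python
import itertools
import random

def allocate_orders(n_students=96, puzzles=(1, 2, 3, 4), seed=2019):
    """Assign each student a puzzle order, using every possible order equally often."""
    orders = list(itertools.permutations(puzzles))  # 4! = 24 possible orders
    if n_students % len(orders) != 0:
        raise ValueError("n_students must be a multiple of the number of orders")
    pool = orders * (n_students // len(orders))     # each order appears 4 times for 96 students
    random.Random(seed).shuffle(pool)               # random assignment of orders to students
    return dict(enumerate(pool, start=1))           # student number -> puzzle order

allocation = allocate_orders()
for student in range(1, 6):                         # show the first five students
    print(student, allocation[student])
```

Fixing the seed only makes the illustration reproducible; in a real study the allocation would be generated once and then recorded.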
Question 5: Chocs and Blocks [2 + 3 = 5 marks]
A study was set up to compare estimates produced from a judgment sample with those produced using a random sample. Two hundred students were recruited for the study, each student producing both a judgment estimate (first) and then a random estimate.
A very large number of irregularly shaped pieces of wood, all cut from the same block, were produced. Each piece was numbered. Students used an interactive digital image of the pieces, available online, to select their samples. Weights for their chosen pieces were recorded, along with the mean and standard deviation of their sample.
For the judgement samples, students were asked to ‘use their judgement only to select their sample, and not use any technical devices’. The random samples were generated automatically from the online site.
Instructions were given that the sample size for each method was 10 and that pooling samples with other students was not allowed.
Students used the judgment method first and were then informed of the true population mean before using the random sampling method.
- Is this study, as designed, an example of a paired study design? Explain.
Your answer should be no more than 50 words (two or three sentences at most).
- Assume that the true mean weight of the population of wooden pieces is 33.5 (µX = 33.5) and the variability in these weights is given by sigma = 20.61 (σX = 20.61). Assume also that the number of blocks in the population was extremely large.
Give the mean and standard deviation for the total weight (T) for a random sample of 10 pieces of wood sampled from this large population. You must show all working and reasoning.
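As a reminder of the standard results this kind of calculation relies on: if the n sampled weights X_1, ..., X_n can be treated as independent draws from the population (a reasonable assumption when the population is extremely large relative to the sample), then the total T = X_1 + ... + X_n satisfies the following. The notation is an assumption chosen to match the symbols used in the question.

```latex
\begin{align*}
E(T) &= E(X_1) + \dots + E(X_n) = n\,\mu_X, \\
\operatorname{Var}(T) &= \operatorname{Var}(X_1) + \dots + \operatorname{Var}(X_n) = n\,\sigma_X^2, \\
\operatorname{sd}(T) &= \sqrt{\operatorname{Var}(T)} = \sqrt{n}\,\sigma_X .
\end{align*}
```

Note that the variances add, not the standard deviations, which is why the standard deviation of the total grows with the square root of n.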