Data analysis
Mid-term project
The mid-term project tests your understanding of both introductory programming and how to perform basic statistical analyses in R. To this end, you will analyze a real, publicly available data set, just as a real data scientist would (in this case, you are taking the role of a computational psychology researcher). The analysis will be guided by the questions and hints below.
At this point, you are expected to have a general understanding of the concepts covered so far, but details are still likely to be missing. By being forced to think about a real, applied data analysis problem, you will be able to identify and review what you don’t know. As usual, you can use the slides, textbooks, and any other resources you want to complete the assignment. You can also discuss it with classmates at a high level; however, you cannot copy code from one another.
The data we will analyze comes from the following paper, which you can download online:
Tom, S. M., Fox, C. R., Trepel, C., & Poldrack, R. A. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811), 515-518.
This is a behavioral/fMRI study about how people make decisions between certain rewards (“safe” choices) and mixed gambles (“risky” choices). We will analyze just the behavioral data. The details of the decision task that participants performed will be presented in class, and can be reviewed by referencing the original paper above. In brief: participants started the experiment with $30.
On each trial, they saw a 50/50 gamble with a positive amount of money and a negative amount. For example: +12 / -14, which meant, if they accepted the gamble (the risky choice), they would have a 50% chance of winning (an additional) $12 and a 50% chance of losing $14. If they rejected the gamble (the safe choice), they would simply keep the $30 they already had. The gamble was presented for 3 seconds.
The data for the task can be found on the OpenNeuro network
(https://openneuro.org/datasets/ds000005/versions/00001). However, because the fMRI data takes a lot of disk space and we will not be analyzing it, you can download just the behavioral data here:
http://solwaylab.org/wp-content/uploads/mid_term_project_data.zip
Submit a *.R file with all of your analysis code, and a brief report in *.pdf format explaining the experimental data and questions, what you did, what the results were, the limitations, and what further experiments you might run to follow up. Include the following sections: Introduction, Methods, Results, and Discussion, and all generated plots in your report. The Introduction should define loss aversion, give a couple of sentences of background, and describe the questions your analysis will ask and your hypotheses about the outcomes.
The Methods should describe the data analytic techniques you’re using, how they work and their assumptions (in more detail than a traditional research paper; illustrate that you know how the techniques work ‘under the hood’), and which aspects of the data you’re applying them to. The Results should describe what you found both quantitatively and qualitatively; that is, describe what your results mean. The Discussion should provide a brief summary of what you did, describe the limitations of this work, and propose a follow-up experiment. You should follow APA format, although grading will focus on data analysis and not formatting. There are no specific length requirements so long as you describe everything you did.
Step 0 – Inspect the data
There are 16 subjects in this data set, with the experimental data for each in a separate subdirectory, e.g. the data for subject 1 are in a subdirectory called ‘sub-01’. Each subject directory has two subdirectories of its own: anat and func. The behavioral data are in func, and you can ignore anat. Each func directory contains (among other things) three *.tsv files. A *.tsv file is the same as a *.csv file except that the column separator is a tab instead of a comma, so don’t be confused!
Together, these three *.tsv files make up the behavioral data for the corresponding subject. The experiment was divided into three parts because it was run in an MRI scanner: this allowed the participant to take breaks and reduced the effects of drifting scanner parameters on data analysis. Neither is something we have to worry about when looking at the behavioral data.
Open one of the *.tsv files in Excel and see what data (columns) are there. The columns of interest to us are: gain (the possible gain for the gamble), loss (the possible loss), respcat (1 for accepting the gamble, 0 for rejecting, and -1 if they did not respond), and response_time (how long it took to respond, in seconds).
There is also a file called participants.tsv in the top level directory. This file lists, for each participant, demographic data. You will make use of this file in Step 7.
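If you would rather inspect a file from within R than in Excel, the following is a minimal sketch. It does not touch the real data: it writes a tiny made-up table to a temporary *.tsv file (the variable names toy and toy_path and all values are illustrative assumptions) and reads it back, so you can see what the columns of interest look like once loaded.

```r
# Toy stand-in for one events file; all values below are made up.
toy <- data.frame(
  gain          = c(12, 20, 30),
  loss          = c(14, 10, 15),
  respcat       = c(0, 1, -1),
  response_time = c(1.42, 1.10, 0)
)
toy_path <- tempfile(fileext = ".tsv")
write.table(toy, toy_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the tab-separated file back and inspect it.
run_data <- read.csv(toy_path, sep = "\t")
str(run_data)             # column names and types
head(run_data)            # first few rows
table(run_data$respcat)   # counts of accept (1), reject (0), no response (-1)
```

Running the same three inspection calls on a real run file should show the additional columns that we do not use in this project.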
Step 1 – Make a function to load all of the experimental data into a single data.frame
Make a function called load_data() that will load all three runs for all subjects into a single data.frame. It should take no arguments, and return a master data.frame.
In the body of this function, you will have to loop through the subject IDs, load the three *.tsv files for each, and combine them into the single master data.frame.
First, define your master data.frame:
data = data.frame()
Now write a loop that will iterate over the 16 subjects.
[start loop]
For subject i, you can construct a string containing the proper path to one of the data files using the built-in sprintf function as follows.
For example, if i = 5, then calling sprintf("[full path]/mid_term_project_data/sub-%0.2d/func/sub-%0.2d_task-mixedgamblestask_run-01_events.tsv", i, i)
will return "[full path]/mid_term_project_data/sub-05/func/sub-05_task-mixedgamblestask_run-01_events.tsv", which is the filename of run 1 for subject 5. You will need to replace [full path] with the full directory path to where you downloaded and unzipped the data folder. Inside your loop, load the three runs for subject i. To read a *.tsv file, you can use the same read.csv(..) function we’ve been using, by passing the argument sep="\t", e.g. read.csv("my_file.tsv", sep="\t"). This sets the separator to be a tab instead of the default comma.
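As a small illustration of the path construction, the sketch below builds the paths for all three runs of one subject in a single sprintf call. The variable data_dir is a hypothetical placeholder for your own full path, the run-02 and run-03 names are assumed to follow the same pattern as run-01, and %02d is used as an equivalent, more common spelling of %0.2d.

```r
# Hypothetical location of the unzipped data; replace with your own full path.
data_dir <- "mid_term_project_data"

i <- 5  # example subject ID

# sprintf is vectorized, so passing 1:3 yields one path per run.
paths <- sprintf("%s/sub-%02d/func/sub-%02d_task-mixedgamblestask_run-%02d_events.tsv",
                 data_dir, i, i, 1:3)
print(paths)

# Only read files that actually exist at the assumed location.
for (path in paths) {
  if (file.exists(path)) {
    run_data <- read.csv(path, sep = "\t")
  }
}
```

Checking file.exists(..) first is optional, but it gives a clearer signal than a read error when [full path] has been set incorrectly.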
Store the output of your call to read.csv(..) in a variable called run_data. run_data will be a data.frame containing the data from the respective *.tsv file. Now, add another column to this data.frame called sub, containing the subject ID (i in your loop) of the current subject. First, use the rep(..) function to make a column of i’s, and then use the cbind(..) (for column bind) function to add the column to your data.frame:
run_data = cbind(run_data, sub=rep(i, nrow(run_data)))
We write “sub=” to call our new column “sub” for “subject”.
Now append run_data to your master data.frame using the rbind(..) (for row bind) function, which “stacks” two data.frames one on top of the other. e.g.
data = rbind(data, run_data)
will stack run_data below data.
Do this for each of the three runs.
[end loop]
Be sure to return your master data.frame at the end of your function (after the loop).
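The accumulation pattern described above (add a subject column with cbind(..), then stack with rbind(..)) can be sketched on made-up data as follows. The helper fake_run is hypothetical and only stands in for the read.csv(..) call, and the loop covers two toy subjects rather than 16; this is not a complete load_data() function.

```r
# Made-up stand-in for reading one run; a real version would call read.csv(path, sep = "\t").
fake_run <- function(n_trials) {
  data.frame(gain = rep(10, n_trials), loss = rep(5, n_trials))
}

data <- data.frame()
for (i in 1:2) {            # two toy subjects instead of 16
  for (run in 1:3) {        # three runs per subject
    run_data <- fake_run(4)
    run_data <- cbind(run_data, sub = rep(i, nrow(run_data)))
    data <- rbind(data, run_data)
  }
}
dim(data)          # 24 rows (2 subjects x 3 runs x 4 trials), 3 columns
table(data$sub)    # 12 rows per subject
```

Checking dim(..) and table(..) on your real master data.frame in the same way is a quick test that every subject and run was loaded.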
Step 2 – Make a function to filter out trials where the participant didn’t respond in time
There was a hard deadline of 3 seconds to make the decision for each trial. On trials where this deadline was missed, respcat has a value of -1. We need to remove these trials before continuing on with the current analysis.
Write a function called filter_data(..) that takes one argument, a data.frame of the type loaded in step 1, and returns another data.frame with missed trials removed.
First, define a new master data.frame inside your filter_data(..) function:
new_data = data.frame()
Write a loop to iterate over subjects.
[start loop]
Use the subset(..) function to select the data for subject i, and store the result in a variable called sub_data.
Using a conditional statement, check if this subject has more than 10 trials where respcat is -1. If they do, print the following message:
print(sprintf("Removing > 10 trials for subject %d", i))
In general, we may wish to further examine the subjects that missed a lot of trials, but for simplicity we will not do so here. Use the subset(..) function again, this time to remove the rows from sub_data where respcat is -1.
Use rbind(..) to stack the newly filtered sub_data below new_data.
[end loop]
Return new_data.
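The two uses of subset(..) in this step can be tried out on a toy data.frame first. The sketch below uses made-up values and a helper variable n_missed that is not part of the assignment; it shows the building blocks for a single subject, not the full filter_data(..) function.

```r
# Toy data: subject 1 misses one trial, subject 2 misses none (values are made up).
toy <- data.frame(
  sub     = c(1, 1, 1, 2, 2),
  respcat = c(1, -1, 0, 1, 0)
)

sub_data <- subset(toy, sub == 1)            # rows for one subject
n_missed <- sum(sub_data$respcat == -1)      # number of missed trials
kept     <- subset(sub_data, respcat != -1)  # drop the missed trials

n_missed      # 1
nrow(kept)    # 2
```

Inside your loop, the literal 1 in the first subset(..) call would be replaced by the loop variable i.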
Step 3 – Call your two functions to load the data
Load the experimental data by calling your load_data(..) function and store the result in a variable called data. Then, pass this variable to your filter_data(..) function, overwriting the variable data with the output of this function call.
Step 4 – Compute individual differences in loss aversion
A well-known finding is that participants are much more sensitive to prospective losses than to gains. This is called loss aversion. In order for a 50/50 gamble to be equal to the “safe choice” of $0 (rejecting the gamble), on average, participants need the gain to be twice as large as the loss. For example, a gamble of +20 / -10 is roughly equivalent to accepting $0 for sure. Of course, there are individual differences in how people weigh gains and losses. We will compute each individual’s weights on losses and gains using logistic regression.
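To make the +20 / -10 example concrete, here is a small sketch under an assumed linear value model in which losses are multiplied by a loss aversion factor lambda. The function gamble_value and the model itself are illustrative assumptions, not something taken from the paper or required by the assignment.

```r
# Assumed linear value model (an illustration only):
# a 50/50 gamble is worth 0.5 * gain - 0.5 * lambda * loss,
# where lambda > 1 means losses weigh more heavily than gains.
gamble_value <- function(gain, loss, lambda = 2) {
  0.5 * gain - 0.5 * lambda * loss
}

gamble_value(20, 10)   # 0: indifferent between gambling and the sure $0
gamble_value(12, 14)   # -8: worse than $0, so the gamble would be rejected
```

With lambda = 2, any gamble whose gain is exactly twice its loss comes out at zero, which is the indifference point described above.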
Define two new vectors that will hold, respectively, the weight of losses and the weight of gains for each participant:
weight_losses = rep(NA, 16)
weight_gains = rep(NA, 16)
Write a loop to iterate over subjects.
[start loop]
Use the subset(..) function to select the data for subject i, and store the result in a variable called sub_data.
For the data for this subject, perform logistic regression (use the glm(..) function) using respcat as the dependent variable and gain and loss as two independent variables.
Store the output of the call to glm(..) in a variable called fit. Remember, respcat is 1 for accepting the gamble and 0 for rejecting. The coefficient for gain will therefore determine how much the log odds of accepting the gamble changes for each potential dollar gained, and the coefficient for loss will determine how much the log odds for accepting the gamble changes for each potential dollar lost. In general, the coefficient for gain will be positive (increasing the potential gain will increase the probability of accepting the gamble) and the coefficient for loss will be negative (increasing the potential loss will decrease the probability of accepting the gamble).
Store the coefficients for this subject in your two weight vectors as follows:
weight_losses[i] = coef(fit)["loss"]
weight_gains[i] = coef(fit)["gain"]
[end loop]
You will see a total of two warning messages after running the logistic regression for all subjects. You can safely ignore these.
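The regression for a single subject can be sketched on simulated data as follows. Everything here is made up for illustration (the sample size, the gain and loss ranges, and the generating weights of +0.2 and -0.4), so the estimates say nothing about the real participants. Note that glm(..) only performs logistic regression when family = binomial is specified; its default family is gaussian.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated single subject (made-up generating weights, not real data).
n    <- 200
gain <- sample(seq(10, 40, by = 2), n, replace = TRUE)
loss <- sample(seq(5, 20, by = 1), n, replace = TRUE)
p_accept <- plogis(0.2 * gain - 0.4 * loss)      # true weights: +0.2 and -0.4
respcat  <- rbinom(n, size = 1, prob = p_accept)
sub_data <- data.frame(gain, loss, respcat)

# family = binomial makes glm() fit a logistic regression.
fit <- glm(respcat ~ gain + loss, data = sub_data, family = binomial)
coef(fit)["gain"]   # estimated weight on gains
coef(fit)["loss"]   # estimated weight on losses
```

Simulating data with known weights and checking that the fit recovers their signs is a useful sanity check before running the same call on each real subject.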
Step 5 – Is there an overall group difference in the weights on gains versus losses?
Perform a t-test to compare the magnitudes of the weights on gains vs. losses, i.e. weight_gains vs. -weight_losses. Because in general the coefficients for gains will be positive and the coefficients for losses will be negative as explained above, taking the negative of weight_losses lets us talk about the relative magnitude of the two. Is there a significant difference between them? What is the difference in means? (Hint: you will need to set paired=TRUE in your call to t.test(..) because each loss and gain weight comes from the same subject. Why does that matter?)
Plot two histograms, one for each set of weights. Do these match the output of your t-test?
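A minimal sketch of the paired test and the two histograms, using made-up weights for six hypothetical subjects rather than results from the data set. Saving the figure with pdf(..) and the name plot_path are assumptions made here so that the plot can be dropped into the report; plotting to the screen works just as well.

```r
# Made-up weights for 6 hypothetical subjects (not results from the data set).
weight_gains  <- c(0.20, 0.35, 0.15, 0.28, 0.40, 0.22)
weight_losses <- c(-0.45, -0.50, -0.30, -0.60, -0.55, -0.41)

# Paired test: each pair of weights comes from the same subject.
result <- t.test(weight_gains, -weight_losses, paired = TRUE)
result$estimate   # mean of the within-subject differences
result$p.value

# Save both histograms side by side in one PDF.
plot_path <- file.path(tempdir(), "weight_histograms.pdf")
pdf(plot_path)
par(mfrow = c(1, 2))
hist(weight_gains, main = "Gain weights", xlab = "Coefficient for gain")
hist(-weight_losses, main = "Loss weights (sign flipped)", xlab = "-Coefficient for loss")
dev.off()
```

A paired test works on the per-subject differences, so the estimate it reports is the mean of weight_gains - (-weight_losses) across subjects.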
Step 6 – Look at the relationship between loss aversion and response times (exploratory)
Compute loss aversion, the ratio of the weights for losses to the weights for gains:
loss_aversion = -weight_losses / weight_gains
Use the aggregate(..) function to compute the mean response time for each subject.
Perform a linear regression with loss_aversion as the dependent variable and the mean response time as the independent variable. Is there a relationship between the two?
Create a scatter plot with loss aversion on the y-axis and mean response time on the x-axis.
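The three parts of this step (aggregate, regress, plot) can be sketched on toy data as below. All values are made up, there are four hypothetical subjects instead of 16, and the names mean_rt and plot_path are illustrative. The sketch relies on aggregate(..) returning one row per subject in ascending order of sub, so that it lines up with loss_aversion indexed by subject ID.

```r
# Toy trial-level data for 4 hypothetical subjects (values are made up).
data <- data.frame(
  sub           = rep(1:4, each = 3),
  response_time = c(1.0, 1.2, 1.1,  1.5, 1.4, 1.6,  0.9, 1.0, 0.8,  1.8, 1.7, 1.9)
)
loss_aversion <- c(1.8, 2.3, 1.5, 2.9)   # made-up ratios, one per subject

# Mean response time per subject; the result has columns sub and response_time.
mean_rt <- aggregate(response_time ~ sub, data = data, FUN = mean)

# Simple linear regression of loss aversion on mean response time.
fit <- lm(loss_aversion ~ mean_rt$response_time)
summary(fit)$coefficients

# Scatter plot with the fitted line, saved to a PDF.
plot_path <- file.path(tempdir(), "loss_aversion_vs_rt.pdf")
pdf(plot_path)
plot(mean_rt$response_time, loss_aversion,
     xlab = "Mean response time (s)", ylab = "Loss aversion")
abline(fit)
dev.off()
```

The row of summary(fit)$coefficients for the response time term contains the slope estimate and its p-value, which is what the question about a relationship refers to.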
Step 7 (optional) – Look at the relationship between loss aversion and age (exploratory)
Read in participants.tsv in the top level directory and store the data in a variable called demographic_data. Store loss aversion, mean response time, and the newly loaded age variable in a single data.frame.
Run a linear regression with loss aversion as the dependent variable and both mean response time and age as the independent variables. Is there a relationship between age and loss aversion?
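A sketch of the two-predictor regression on made-up numbers. The toy demographic_data below only assumes an age column, and the real participants.tsv may contain other columns and names. The sketch also assumes the rows of the demographic table are in the same subject order as loss_aversion and mean_rt, which you should verify on the real file.

```r
# Made-up demographic table; the real participants.tsv may look different.
demographic_data <- data.frame(age = c(21, 25, 19, 30, 23, 27))
loss_aversion <- c(1.8, 2.3, 1.5, 2.9, 2.0, 2.1)   # made-up values
mean_rt       <- c(1.1, 1.5, 0.9, 1.8, 1.3, 1.2)   # made-up values

# Combine everything into one data.frame, one row per subject.
combined <- data.frame(loss_aversion, mean_rt, age = demographic_data$age)

# Linear regression with two independent variables.
fit <- lm(loss_aversion ~ mean_rt + age, data = combined)
summary(fit)$coefficients   # one row each for the intercept, mean_rt, and age
```

The age row gives the estimated effect of age on loss aversion while holding mean response time fixed.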