2. Write a Function!

A tutorial for applying functional programming to personality research

Raleigh Goodwin, Vinita Vader
05-28-2021

Introduction

Ipsatization

This is a tutorial on using functional programming to solve specific problems in research. This tutorial addresses the issue of ipsatization, which consists of methods of data transformation used in Personality Psychology and Social Psychology research. Ipsatization transforms each participant’s ratings relative to their average response such that the total and the average of the participant’s scores across all items in the data set are zero (or another constant for all people) (Greer and Dunlap, 1997). In simpler terms, it’s a transformation in which you compute an average response for each participant and then subtract that average from each of their individual responses.
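
As a minimal worked sketch of that definition, the base R snippet below ipsatizes the ten responses of a single participant (the first participant in the data previewed later in this tutorial). The object names responses, person_mean, and ipsatized are ours, chosen only for illustration.

```r
# Responses of one participant to the ten TIPI items (1-7 Likert scale)
responses <- c(5, 3, 6, 2, 6, 6, 7, 2, 7, 1)

# Step 1: compute the participant's average response
person_mean <- mean(responses)

# Step 2: subtract that average from each individual response
ipsatized <- responses - person_mean

person_mean      # 4.5
ipsatized        # 0.5 -1.5 1.5 -2.5 1.5 1.5 2.5 -2.5 2.5 -3.5
sum(ipsatized)   # 0: the ipsatized scores sum (and average) to zero
```

The rest of the tutorial does the same thing for every participant at once.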

Packages such as multicon provide functions like ipsatize() that standardize the rows of a dataframe. However, these functions do not address the various types of ipsative scoring available for carrying out different transformations.

An important aspect of using data transformations involves understanding the relationship between raw data and transformed data. The purpose of the function built here will be to address this specific issue.

Loading Libraries

Before we get started, we need to load the libraries necessary to complete this tutorial. Loading an entire library is not always necessary, especially if you intend to use it only once. That is the case for rio, here, and knitr in this tutorial, so you may choose not to load them if you’d like and instead call their functions with the package::function() syntax, as we do below.

library(tidyverse)
library(purrr)
library(rio) # optional
library(here) # optional
library(knitr) # optional

About the Data

For this tutorial, we will be working with a dataset containing the Ten Item Personality Inventory (TIPI; Gosling, Rentfrow, & Swann, 2003), a brief measure of the Big Five Personality Domains (Goldberg, 1993). Each item asks respondents to rate themselves on attributes (e.g., extroverted, critical, anxious, calm, etc.) using a Likert scale ranging from 1 to 7, wherein:

1 = “Disagree strongly”
2 = “Disagree moderately”
3 = “Disagree a little”
4 = “Neither agree nor disagree”
5 = “Agree a little”
6 = “Agree moderately”
7 = “Agree strongly”

This particular dataset contains observations from N=2495 individuals who completed, among many other measures, the 10 TIPI items in 2016. Other variables included the Generic Conspiracist Beliefs Scale (Brotherton et al., 2013), various response time metrics, a vocabulary validity check, and demographics.

Importing the Data

When importing data, two important things to keep in mind are your working directory and reproducibility. Where you save your files can affect the ease with which you can call them; you’ll have the best luck saving data files of interest within the corresponding R Project. rio’s import() function provides an easy method for importing data files, including the ability to set the class of the data to tibble using the setclass argument, which keeps the data in a format that is more amenable to data manipulation in the tidyverse. To enhance reproducibility across different devices and potentially changing file paths, we’ll use the here() function from the here package when specifying our file path.

# Import data
full_df <- rio::import(here::here("content/dataCT.csv"), setclass = "tibble")

For the current project, we’ll only be working with the TIPI items, so to simplify the dataframe we’re using, we can select only those columns.

# Select desired variables
data <- full_df %>% 
  select(TIPI1:TIPI10)

Now we can take a look at the data we’ll be working with. The kable() function from the knitr package helps to format the data into a neat table.

# Take a look at the data
data %>% 
  head(n = 5) %>% # Take a look at the first 5 rows of the resulting dataframe
  knitr::kable() # Format the output table neatly
TIPI1 TIPI2 TIPI3 TIPI4 TIPI5 TIPI6 TIPI7 TIPI8 TIPI9 TIPI10
5 3 6 2 6 6 7 2 7 1
6 7 6 7 6 3 7 5 1 1
6 6 6 1 7 5 6 5 7 7
6 7 7 5 7 6 5 1 5 1
1 3 7 2 6 4 5 5 5 3

The rmarkdown package can also be used to create neat tables in Distill: its paged_table() function renders a table in its own box on the page.

With the libraries loaded and data imported, we can now begin building our function.

Taking a look at the data

First, though, we can brush up on our function-building skills with a simple function. In the code above, we used head() to take a look at the data and kable() to format it. We’ll be repeating this same process many times throughout the tutorial, so it is worth wrapping in a function. This function, glance(), will have two arguments: df and nrows, which are the dataframe and the desired number of preview rows, respectively.

glance <- function(df, nrows) {
  df %>% 
    head(n = nrows) %>% 
    knitr::kable()
}

# Test it out
data %>% 
  glance(5)
TIPI1 TIPI2 TIPI3 TIPI4 TIPI5 TIPI6 TIPI7 TIPI8 TIPI9 TIPI10
5 3 6 2 6 6 7 2 7 1
6 7 6 7 6 3 7 5 1 1
6 6 6 1 7 5 6 5 7 7
6 7 7 5 7 6 5 1 5 1
1 3 7 2 6 4 5 5 5 3

This works! If you’d like to read more about functions before continuing, chapter 6 of Hadley Wickham’s Advanced R is an incredibly useful resource. In this tutorial, we will now move on to writing something more complex, walking through an applied case of functional programming within personality research.

Building Functions

There are several ways to go about building functions; the approach outlined here is only one of them.

As you think about building a function, keep in mind why you set out to build it in the first place. Ideally, your function will solve a problem specific to your analysis, and it may also be useful to others carrying out their own analyses.

Let’s state the problem first: The difference between raw and ipsatized data has been studied to some extent leading to several debates amongst researchers questioning the utility of these methods. It is therefore important to look at correlations between the raw and ipsatized data. With this function, we will perform the ipsatization transformation and correlate its results with the raw data.

Now that we understand the problem, let’s think about how our function could address it. Here is a sequence of questions that will help you think about the function you intend to build.

What is the goal of this function?
Basically, what do we need this function to do? For the current tutorial, we are writing a function that ipsatizes any dataset, meaning that it will compute the means of the rows and subtract the mean from every score in the respective rows. Ideally, it will produce output in the form of a list containing the raw and transformed (i.e., ipsatized) data, along with a correlation matrix.

How can we achieve this goal for a specific dataset?
When taking a functional programming approach to this problem, we should first attempt to solve it within a specific case. Once we’ve done so, we can then consider generalizing to a function. For this tutorial, we will be solving the problem first with the TIPI dataset, and then we can apply that solution to build the final function.

How can we break the function’s goal into smaller tasks?
Most likely, we aren’t just going to be writing one function in this tutorial. Ideally, a function should complete exactly one task; therefore, when we are attempting to build a function to complete a complicated task like ipsatization, we will need to write multiple simple functions and combine them. Thus, it can be helpful to first outline and think through each step of the process and eventually create a function for each step.

Solving for a specific case

To ipsatize the data, we need to calculate each participant’s mean response to the TIPI scale items and then subtract that mean from each of their responses. This means we need to be able to conduct these operations by row rather than by column. One of the easier ways to do this is to use pivot_longer() to transform the data into a “longer” format.

First, though, we need to create an ID for each participant that can then be used to identify their responses once the data is transformed.

data_id <- data %>% 
  mutate(id = c(1:nrow(data))) # Create ID variable

data_id %>%
  glance(5)
TIPI1 TIPI2 TIPI3 TIPI4 TIPI5 TIPI6 TIPI7 TIPI8 TIPI9 TIPI10 id
5 3 6 2 6 6 7 2 7 1 1
6 7 6 7 6 3 7 5 1 1 2
6 6 6 1 7 5 6 5 7 7 3
6 7 7 5 7 6 5 1 5 1 4
1 3 7 2 6 4 5 5 5 3 5

Now that we have an ID variable that can be used to identify each participant’s responses, we can use pivot_longer() to reshape the data so that each row’s mean can be calculated in a later step.

data_long <- data_id %>% 
  pivot_longer(cols = !id, names_to = "item", values_to = "response")

data_long %>% 
  glance(15)
id item response
1 TIPI1 5
1 TIPI2 3
1 TIPI3 6
1 TIPI4 2
1 TIPI5 6
1 TIPI6 6
1 TIPI7 7
1 TIPI8 2
1 TIPI9 7
1 TIPI10 1
2 TIPI1 6
2 TIPI2 7
2 TIPI3 6
2 TIPI4 7
2 TIPI5 6

Instead of participants’ responses being organized by row, all responses are now contained in one column and can be identified using the corresponding id and item values. We can use dplyr’s group_by() function (loaded with the tidyverse) to group this dataframe by participant ID and then compute 1) the mean for each group and 2) the difference between each response and the mean of its group.

data_dev <- data_long %>% 
  group_by(id) %>% # Group by participant ID
  mutate(mean_row = mean(response, na.rm = TRUE), # Calculate participant mean
         ipsatized = response - mean_row) # Calculate individual response deviation from mean

data_dev %>% 
  glance(15)
id item response mean_row ipsatized
1 TIPI1 5 4.5 0.5
1 TIPI2 3 4.5 -1.5
1 TIPI3 6 4.5 1.5
1 TIPI4 2 4.5 -2.5
1 TIPI5 6 4.5 1.5
1 TIPI6 6 4.5 1.5
1 TIPI7 7 4.5 2.5
1 TIPI8 2 4.5 -2.5
1 TIPI9 7 4.5 2.5
1 TIPI10 1 4.5 -3.5
2 TIPI1 6 4.9 1.1
2 TIPI2 7 4.9 2.1
2 TIPI3 6 4.9 1.1
2 TIPI4 7 4.9 2.1
2 TIPI5 6 4.9 1.1

Now, we can use pivot_wider() to transform the data back to its original format. Because we want the function output to be formatted as a list that contains the ipsatized data, raw data, and a correlation matrix of the two, it will be helpful to create two dataframes: an ipsatized dataframe and a raw dataframe.

# Create ipsatized dataframe
data_ips <- data_dev %>%
  pivot_wider(id_cols = id, names_from = item, values_from = c(response, ipsatized)) %>%
  select(id, contains("ipsatized")) %>%
  ungroup()

# Create raw dataframe
data_raw <- data_dev %>%
  pivot_wider(id_cols = id, names_from = item, values_from = c(response, ipsatized)) %>%
  select(id, contains("response")) %>%
  ungroup()

# Take a look at the results
data_ips %>% 
  glance(15)
id ipsatized_TIPI1 ipsatized_TIPI2 ipsatized_TIPI3 ipsatized_TIPI4 ipsatized_TIPI5 ipsatized_TIPI6 ipsatized_TIPI7 ipsatized_TIPI8 ipsatized_TIPI9 ipsatized_TIPI10
1 0.5 -1.5 1.5 -2.5 1.5 1.5 2.5 -2.5 2.5 -3.5
2 1.1 2.1 1.1 2.1 1.1 -1.9 2.1 0.1 -3.9 -3.9
3 0.4 0.4 0.4 -4.6 1.4 -0.6 0.4 -0.6 1.4 1.4
4 1.0 2.0 2.0 0.0 2.0 1.0 0.0 -4.0 0.0 -4.0
5 -3.1 -1.1 2.9 -2.1 1.9 -0.1 0.9 0.9 0.9 -1.1
6 -0.2 -2.2 1.8 -2.2 1.8 0.8 1.8 -1.2 1.8 -2.2
7 -1.9 1.1 0.1 -1.9 1.1 2.1 -1.9 -0.9 1.1 1.1
8 -0.2 0.8 1.8 -2.2 2.8 -0.2 0.8 0.8 -1.2 -3.2
9 -0.3 0.7 1.7 -2.3 -0.3 0.7 1.7 -2.3 2.7 -2.3
10 -3.5 1.5 -1.5 -3.5 0.5 2.5 1.5 0.5 2.5 -0.5
11 -0.3 3.7 -2.3 3.7 3.7 -0.3 -2.3 -1.3 -2.3 -2.3
12 2.2 -0.8 -0.8 -3.8 2.2 -0.8 2.2 0.2 2.2 -2.8
13 1.6 -2.4 1.6 -3.4 1.6 0.6 2.6 -0.4 0.6 -2.4
14 -2.8 -2.8 1.2 0.2 2.2 1.2 2.2 0.2 1.2 -2.8
15 -2.7 -2.7 -2.7 2.3 1.3 3.3 3.3 3.3 -2.7 -2.7
data_raw %>% 
  glance(15)
id response_TIPI1 response_TIPI2 response_TIPI3 response_TIPI4 response_TIPI5 response_TIPI6 response_TIPI7 response_TIPI8 response_TIPI9 response_TIPI10
1 5 3 6 2 6 6 7 2 7 1
2 6 7 6 7 6 3 7 5 1 1
3 6 6 6 1 7 5 6 5 7 7
4 6 7 7 5 7 6 5 1 5 1
5 1 3 7 2 6 4 5 5 5 3
6 4 2 6 2 6 5 6 3 6 2
7 2 5 4 2 5 6 2 3 5 5
8 4 5 6 2 7 4 5 5 3 1
9 4 5 6 2 4 5 6 2 7 2
10 1 6 3 1 5 7 6 5 7 4
11 3 7 1 7 7 3 1 2 1 1
12 7 4 4 1 7 4 7 5 7 2
13 6 2 6 1 6 5 7 4 5 2
14 2 2 6 5 7 6 7 5 6 2
15 1 1 1 6 5 7 7 7 1 1

Lastly, let’s create that list.

list_output <- list("ipsatized" = data_ips,
          "raw" = data_raw,
          "correlation_matrix" = cor(data_ips, data_raw))

list_output
$ipsatized
# A tibble: 2,495 x 11
      id ipsatized_TIPI1 ipsatized_TIPI2 ipsatized_TIPI3
   <int>           <dbl>           <dbl>           <dbl>
 1     1           0.5            -1.5              1.5 
 2     2           1.10            2.10             1.10
 3     3           0.4             0.4              0.4 
 4     4           1               2                2   
 5     5          -3.10           -1.10             2.9 
 6     6          -0.2            -2.2              1.8 
 7     7          -1.9             1.1              0.1 
 8     8          -0.2             0.800            1.8 
 9     9          -0.300           0.7              1.7 
10    10          -3.5             1.5             -1.5 
# … with 2,485 more rows, and 7 more variables:
#   ipsatized_TIPI4 <dbl>, ipsatized_TIPI5 <dbl>,
#   ipsatized_TIPI6 <dbl>, ipsatized_TIPI7 <dbl>,
#   ipsatized_TIPI8 <dbl>, ipsatized_TIPI9 <dbl>,
#   ipsatized_TIPI10 <dbl>

$raw
# A tibble: 2,495 x 11
      id response_TIPI1 response_TIPI2 response_TIPI3 response_TIPI4
   <int>          <int>          <int>          <int>          <int>
 1     1              5              3              6              2
 2     2              6              7              6              7
 3     3              6              6              6              1
 4     4              6              7              7              5
 5     5              1              3              7              2
 6     6              4              2              6              2
 7     7              2              5              4              2
 8     8              4              5              6              2
 9     9              4              5              6              2
10    10              1              6              3              1
# … with 2,485 more rows, and 6 more variables: response_TIPI5 <int>,
#   response_TIPI6 <int>, response_TIPI7 <int>, response_TIPI8 <int>,
#   response_TIPI9 <int>, response_TIPI10 <int>

$correlation_matrix
                           id response_TIPI1 response_TIPI2
id                1.000000000    -0.03087740    -0.03727441
ipsatized_TIPI1  -0.021481713     0.95997515    -0.11340834
ipsatized_TIPI2  -0.027501605    -0.12071784     0.95428074
ipsatized_TIPI3  -0.036290745     0.02535938    -0.16688353
ipsatized_TIPI4   0.022218869    -0.25970548     0.12817941
ipsatized_TIPI5   0.017044441     0.15867518    -0.13509164
ipsatized_TIPI6   0.033065029    -0.68318241    -0.07328539
ipsatized_TIPI7   0.021550310     0.19604318    -0.39102208
ipsatized_TIPI8   0.028246853    -0.09883568     0.05362126
ipsatized_TIPI9  -0.029634489     0.06676858    -0.27632275
ipsatized_TIPI10 -0.005017054    -0.22041641    -0.06129857
                 response_TIPI3 response_TIPI4 response_TIPI5
id                 -0.045915845    0.011284963   0.0008224971
ipsatized_TIPI1     0.031167709   -0.244068520   0.1736167712
ipsatized_TIPI2    -0.167224238    0.148187176  -0.1206510887
ipsatized_TIPI3     0.948872899   -0.298764735   0.0086683433
ipsatized_TIPI4    -0.307516250    0.962139678  -0.2550686241
ipsatized_TIPI5    -0.005073157   -0.256663167   0.9192189953
ipsatized_TIPI6    -0.024670231    0.073760081  -0.2270060851
ipsatized_TIPI7     0.007475973   -0.008942264   0.0736922080
ipsatized_TIPI8    -0.550478491    0.158821663  -0.0270662628
ipsatized_TIPI9     0.268938383   -0.690397982   0.0655434271
ipsatized_TIPI10   -0.108530999   -0.002177791  -0.3767678405
                 response_TIPI6 response_TIPI7 response_TIPI8
id                  0.021000522    0.007644219     0.01618021
ipsatized_TIPI1    -0.666541441    0.170445729    -0.08266381
ipsatized_TIPI2    -0.038939637   -0.403279907     0.06565301
ipsatized_TIPI3     0.015568018   -0.020080341    -0.54989652
ipsatized_TIPI4     0.091005873   -0.050084957     0.15131155
ipsatized_TIPI5    -0.203227013    0.024189448    -0.02356731
ipsatized_TIPI6     0.953180313   -0.215281446    -0.08954131
ipsatized_TIPI7    -0.162399055    0.949232042    -0.08558253
ipsatized_TIPI8    -0.067303304   -0.119072017     0.95695523
ipsatized_TIPI9    -0.009290563   -0.028893032    -0.33029253
ipsatized_TIPI10    0.040201343   -0.228842712    -0.10619799
                 response_TIPI9 response_TIPI10
id                  -0.03949432    -0.017594425
ipsatized_TIPI1      0.09172718    -0.187414036
ipsatized_TIPI2     -0.26483551    -0.031356832
ipsatized_TIPI3      0.29771049    -0.076414836
ipsatized_TIPI4     -0.68790076     0.007123348
ipsatized_TIPI5      0.08228079    -0.365950413
ipsatized_TIPI6     -0.02645471     0.029629981
ipsatized_TIPI7      0.01615832    -0.176200329
ipsatized_TIPI8     -0.32648284    -0.089488624
ipsatized_TIPI9      0.95761526    -0.057303325
ipsatized_TIPI10    -0.06741043     0.941250723

This list is what we set out to create! We’ve achieved our goal using this dataset.
Now that we’ve solved this problem in a specific case, we can begin to generalize it to a function. Or, rather, a set of functions!
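
The most informative entries in this matrix are those pairing each item’s ipsatized score with its own raw score (roughly .92 to .96 in the output above). Note that the matrix also contains an id row and column, because id is part of both dataframes. Below is a minimal sketch of pulling out just the item-with-itself correlations. It assumes base R and uses only the first five participants previewed earlier, so its values will differ from the full-sample output; the object names raw, ips, and item_cors are ours.

```r
# First five participants' raw TIPI responses, copied from the preview table
raw <- matrix(
  c(5, 3, 6, 2, 6, 6, 7, 2, 7, 1,
    6, 7, 6, 7, 6, 3, 7, 5, 1, 1,
    6, 6, 6, 1, 7, 5, 6, 5, 7, 7,
    6, 7, 7, 5, 7, 6, 5, 1, 5, 1,
    1, 3, 7, 2, 6, 4, 5, 5, 5, 3),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, paste0("TIPI", 1:10))
)

# Ipsatize in wide format: subtract each row's mean from that row
ips <- sweep(raw, 1, rowMeans(raw))

# Correlate ipsatized with raw columns (no id column is included here),
# then keep the diagonal: each item's ipsatized score vs. its own raw score
item_cors <- diag(cor(ips, raw))
names(item_cors) <- colnames(raw)
round(item_cors, 2)
```

With real data, the same diag() call could be applied to the correlation_matrix element after dropping the id row and column.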

Applying specific case to generalized function(s)

Since we want each function to only do one task, we can first outline the individual tasks that make up the ipsatization process.

  1. Add an ID variable to the dataframe
  2. Pivot the data to a longer format
  3. Calculate the mean of each row and transform each response by subtracting the row mean from it
  4. Pivot the data back to a wider format
  5. Create a list to organize the output

Now we can set out to make a function to complete each task. These functions don’t ever have to be used on their own; in the end, they’ll all be combined into a final, single function. This may seem like it’s making work more complicated, but this approach enhances readability of your code and aids in troubleshooting errors.
Since we’ve done the majority of the problem solving already, we can essentially copy and paste our code from above, making sure to adapt it as necessary to the function format. Luckily, for the current tutorial, these changes mostly consist of changing the name of the dataframe input to “df,” which is the name of the only argument in each of these functions.
After we build each function, we can test that it works by running it with a couple of datasets. Since we wrote this code with the TIPI dataset in mind, we can also test it with another dataset in order to catch any potential issues that may crop up when using different data. Though ipsatization is typically used in personality research, we’ll use the iris dataset as our second test case for simplicity.

  1. Add an ID variable to the dataframe
add_id <- function(df) {
  df %>%
    mutate(id = c(1:nrow(df)))
}
# Test it out
test1 <- data %>% 
  add_id()

test2 <- iris %>% 
  add_id()

test1 %>%  
  glance(5)
TIPI1 TIPI2 TIPI3 TIPI4 TIPI5 TIPI6 TIPI7 TIPI8 TIPI9 TIPI10 id
5 3 6 2 6 6 7 2 7 1 1
6 7 6 7 6 3 7 5 1 1 2
6 6 6 1 7 5 6 5 7 7 3
6 7 7 5 7 6 5 1 5 1 4
1 3 7 2 6 4 5 5 5 3 5
test2 %>%  
  glance(5)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species id
5.1 3.5 1.4 0.2 setosa 1
4.9 3.0 1.4 0.2 setosa 2
4.7 3.2 1.3 0.2 setosa 3
4.6 3.1 1.5 0.2 setosa 4
5.0 3.6 1.4 0.2 setosa 5

It works! However, when looking at the output for the iris dataframe, you may notice a difference between it and the specific case for which we originally wrote this code: this dataset contains a non-numeric column (Species, a factor) in addition to numeric data. Before we go any further, we have to write code that extracts only the numeric columns from the dataframe of interest.

1.5. Select only numeric columns from dataframe

We can accomplish this using the map_lgl() function from the purrr package, which maps the is.numeric() function to every column in the dataframe and is appropriate in this case because the output will be a logical vector. This will ensure that all the columns in the dataframe we are working with are numeric.
For more information about the purrr::map() family, see our first post.

First, we can try writing this code to solve the problem in the iris dataset specifically.

iris[ , purrr::map_lgl(iris, is.numeric)] %>% 
  glance(10)
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.1 3.5 1.4 0.2
4.9 3.0 1.4 0.2
4.7 3.2 1.3 0.2
4.6 3.1 1.5 0.2
5.0 3.6 1.4 0.2
5.4 3.9 1.7 0.4
4.6 3.4 1.4 0.3
5.0 3.4 1.5 0.2
4.4 2.9 1.4 0.2
4.9 3.1 1.5 0.1

Just like before, we can now translate that code into a function. This time, we’ll also add a condition to our function: if there are no numeric columns in the dataset (i.e., if the number of numeric columns is 0), the function will stop and throw an error message. If there is at least one numeric column, the function will run as normal.

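just_num <- function(df) {
  # Flag which columns are numeric
  is_num <- purrr::map_lgl(df, is.numeric)

  # Stop with an error if no column is numeric
  if (sum(is_num) == 0) {
    stop("No numeric columns.")
  }

  # Keep only the numeric columns; drop = FALSE returns a dataframe
  # even when a single column remains
  df[ , is_num, drop = FALSE]
}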
# Test it out
test1 <- test1 %>% 
  just_num()

test2 <- test2 %>% 
  just_num()

test1 %>% 
  glance(15)
TIPI1 TIPI2 TIPI3 TIPI4 TIPI5 TIPI6 TIPI7 TIPI8 TIPI9 TIPI10 id
5 3 6 2 6 6 7 2 7 1 1
6 7 6 7 6 3 7 5 1 1 2
6 6 6 1 7 5 6 5 7 7 3
6 7 7 5 7 6 5 1 5 1 4
1 3 7 2 6 4 5 5 5 3 5
4 2 6 2 6 5 6 3 6 2 6
2 5 4 2 5 6 2 3 5 5 7
4 5 6 2 7 4 5 5 3 1 8
4 5 6 2 4 5 6 2 7 2 9
1 6 3 1 5 7 6 5 7 4 10
3 7 1 7 7 3 1 2 1 1 11
7 4 4 1 7 4 7 5 7 2 12
6 2 6 1 6 5 7 4 5 2 13
2 2 6 5 7 6 7 5 6 2 14
1 1 1 6 5 7 7 7 1 1 15
test2 %>% 
  glance(15)
Sepal.Length Sepal.Width Petal.Length Petal.Width id
5.1 3.5 1.4 0.2 1
4.9 3.0 1.4 0.2 2
4.7 3.2 1.3 0.2 3
4.6 3.1 1.5 0.2 4
5.0 3.6 1.4 0.2 5
5.4 3.9 1.7 0.4 6
4.6 3.4 1.4 0.3 7
5.0 3.4 1.5 0.2 8
4.4 2.9 1.4 0.2 9
4.9 3.1 1.5 0.1 10
5.4 3.7 1.5 0.2 11
4.8 3.4 1.6 0.2 12
4.8 3.0 1.4 0.1 13
4.3 3.0 1.1 0.1 14
5.8 4.0 1.2 0.2 15

To test our condition, we should also test this function with a dataset that has no numeric columns.

test3 <- tibble(letters, LETTERS)

test3 %>% 
  glance(5)
letters LETTERS
a A
b B
c C
d D
e E

When doing so, we can also explore the utility of the safely() function from purrr. If we use just_num() on test3 and it throws the correct error, we will not be able to knit this document. safely() allows you to create “safe” functions that will return output that also “captures” errors, which would normally stop a function from being able to run. This can be very useful for troubleshooting and will help us test our function on test3.
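
To make the idea concrete, here is a simplified base R stand-in for what a “safe” wrapper does. It is only a sketch built on tryCatch(), not purrr’s actual implementation, and safely_sketch() and safe_log() are hypothetical names used for illustration.

```r
# Hypothetical, simplified stand-in for purrr::safely(), built on tryCatch()
safely_sketch <- function(f) {
  function(...) {
    tryCatch(
      # On success: return the value and no error
      list(result = f(...), error = NULL),
      # On failure: capture the error object instead of halting
      error = function(e) list(result = NULL, error = e)
    )
  }
}

safe_log <- safely_sketch(log)

ok  <- safe_log(10)    # $result holds log(10); $error is NULL
bad <- safe_log("a")   # $result is NULL; $error holds the captured error

is.null(ok$error)             # TRUE
conditionMessage(bad$error)   # the message of the captured error
```

Because the error is returned as data rather than raised, the surrounding code (or a knitting document) keeps running.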

# Test it out with `safely()`
safe_just_num <- purrr::safely(just_num)

test3 %>% 
  safe_just_num()
$result
NULL

$error
<simpleError in .f(...): No numeric columns.>
# vs:

test4 <- tibble(1:5, 6:10)

test4 %>% 
  safe_just_num()
$result
# A tibble: 5 x 2
  `1:5` `6:10`
  <int>  <int>
1     1      6
2     2      7
3     3      8
4     4      9
5     5     10

$error
NULL

Critically, this function will allow our final function to generalize to multiple different datasets. With that done, we can continue with the rest of our outlined tasks.
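
One edge case is worth checking when generalizing to arbitrary inputs: in a base data.frame (unlike a tibble), bracket subsetting that leaves a single column is simplified to a plain vector unless drop = FALSE is supplied. The small sketch below illustrates this with a made-up one_num dataframe; vapply() is used here only as a base R equivalent of purrr::map_lgl().

```r
# Hypothetical dataframe with exactly one numeric column
one_num <- data.frame(score = c(1.5, 2.5, 3.5), label = c("a", "b", "c"))

# Logical index of numeric columns (base R equivalent of purrr::map_lgl)
is_num <- vapply(one_num, is.numeric, logical(1))

# Default subsetting simplifies the single remaining column to a vector
dropped <- one_num[, is_num]
class(dropped)   # "numeric"

# drop = FALSE keeps the result a dataframe, so later steps still work
kept <- one_num[, is_num, drop = FALSE]
class(kept)      # "data.frame"
```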

  2. Pivot the data to a longer format
lengthen_data <- function(df) {
  df %>% 
  pivot_longer(cols = !id, names_to = "item", values_to = "response")
}
# Test it out
test1 <- test1 %>% 
  lengthen_data()

test2 <- test2 %>% 
  lengthen_data()

test1 %>% 
  glance(15)
id item response
1 TIPI1 5
1 TIPI2 3
1 TIPI3 6
1 TIPI4 2
1 TIPI5 6
1 TIPI6 6
1 TIPI7 7
1 TIPI8 2
1 TIPI9 7
1 TIPI10 1
2 TIPI1 6
2 TIPI2 7
2 TIPI3 6
2 TIPI4 7
2 TIPI5 6
test2 %>% 
  glance(15)
id item response
1 Sepal.Length 5.1
1 Sepal.Width 3.5
1 Petal.Length 1.4
1 Petal.Width 0.2
2 Sepal.Length 4.9
2 Sepal.Width 3.0
2 Petal.Length 1.4
2 Petal.Width 0.2
3 Sepal.Length 4.7
3 Sepal.Width 3.2
3 Petal.Length 1.3
3 Petal.Width 0.2
4 Sepal.Length 4.6
4 Sepal.Width 3.1
4 Petal.Length 1.5

  3. Calculate the mean of each row and transform each response by subtracting the row mean from it
transform_data <- function(df) {
  df %>% 
  group_by(id) %>% # Group by participant ID
  mutate(mean_row = mean(response, na.rm = TRUE), # Calculate participant mean
         ipsatized = response - mean_row) # Calculate individual response deviation from mean
}
# Test it out
test1 <- test1 %>% 
  transform_data()

test2 <- test2 %>% 
  transform_data()

test1 %>% 
  glance(15)
id item response mean_row ipsatized
1 TIPI1 5 4.5 0.5
1 TIPI2 3 4.5 -1.5
1 TIPI3 6 4.5 1.5
1 TIPI4 2 4.5 -2.5
1 TIPI5 6 4.5 1.5
1 TIPI6 6 4.5 1.5
1 TIPI7 7 4.5 2.5
1 TIPI8 2 4.5 -2.5
1 TIPI9 7 4.5 2.5
1 TIPI10 1 4.5 -3.5
2 TIPI1 6 4.9 1.1
2 TIPI2 7 4.9 2.1
2 TIPI3 6 4.9 1.1
2 TIPI4 7 4.9 2.1
2 TIPI5 6 4.9 1.1
test2 %>% 
  glance(15)
id item response mean_row ipsatized
1 Sepal.Length 5.1 2.550 2.550
1 Sepal.Width 3.5 2.550 0.950
1 Petal.Length 1.4 2.550 -1.150
1 Petal.Width 0.2 2.550 -2.350
2 Sepal.Length 4.9 2.375 2.525
2 Sepal.Width 3.0 2.375 0.625
2 Petal.Length 1.4 2.375 -0.975
2 Petal.Width 0.2 2.375 -2.175
3 Sepal.Length 4.7 2.350 2.350
3 Sepal.Width 3.2 2.350 0.850
3 Petal.Length 1.3 2.350 -1.050
3 Petal.Width 0.2 2.350 -2.150
4 Sepal.Length 4.6 2.350 2.250
4 Sepal.Width 3.1 2.350 0.750
4 Petal.Length 1.5 2.350 -0.850

  4. Pivot the data back to a wider format
widen_data <- function(df) {
  # Create ipsatized dataframe
  data_ips_id <- df %>%
    pivot_wider(id_cols = id, names_from = item, values_from = c(response, ipsatized)) %>%
    select(id, contains("ipsatized")) %>%
    ungroup()

  # Create raw dataframe
  data_raw_id <- df %>%
    pivot_wider(id_cols = id, names_from = item, values_from = c(response, ipsatized)) %>%
    select(id, contains("response")) %>%
    ungroup()

  # Functions don't return multiple objects, so we have to wrap them into a single list.
  # Use the objects created inside the function, not objects from the global environment.
  outputlist <- list("data_ips" = data_ips_id,
                     "data_raw" = data_raw_id)

  # Return list
  return(outputlist)
}
# Test it out
test1 <- test1 %>% 
  widen_data()

test2 <- test2 %>% 
  widen_data()

test1
$data_ips
# A tibble: 2,495 x 11
      id ipsatized_TIPI1 ipsatized_TIPI2 ipsatized_TIPI3
   <int>           <dbl>           <dbl>           <dbl>
 1     1           0.5            -1.5              1.5 
 2     2           1.10            2.10             1.10
 3     3           0.4             0.4              0.4 
 4     4           1               2                2   
 5     5          -3.10           -1.10             2.9 
 6     6          -0.2            -2.2              1.8 
 7     7          -1.9             1.1              0.1 
 8     8          -0.2             0.800            1.8 
 9     9          -0.300           0.7              1.7 
10    10          -3.5             1.5             -1.5 
# … with 2,485 more rows, and 7 more variables:
#   ipsatized_TIPI4 <dbl>, ipsatized_TIPI5 <dbl>,
#   ipsatized_TIPI6 <dbl>, ipsatized_TIPI7 <dbl>,
#   ipsatized_TIPI8 <dbl>, ipsatized_TIPI9 <dbl>,
#   ipsatized_TIPI10 <dbl>

$data_raw
# A tibble: 2,495 x 11
      id response_TIPI1 response_TIPI2 response_TIPI3 response_TIPI4
   <int>          <int>          <int>          <int>          <int>
 1     1              5              3              6              2
 2     2              6              7              6              7
 3     3              6              6              6              1
 4     4              6              7              7              5
 5     5              1              3              7              2
 6     6              4              2              6              2
 7     7              2              5              4              2
 8     8              4              5              6              2
 9     9              4              5              6              2
10    10              1              6              3              1
# … with 2,485 more rows, and 6 more variables: response_TIPI5 <int>,
#   response_TIPI6 <int>, response_TIPI7 <int>, response_TIPI8 <int>,
#   response_TIPI9 <int>, response_TIPI10 <int>

For test2, the list has the same structure: data_ips and data_raw each have 150 rows and 5 columns, namely id plus the four measurement columns prefixed with ipsatized_ (e.g., ipsatized_Sepal.Length) or response_ (e.g., response_Sepal.Length).

  5. Create a list to organize the desired, final output
ipsatize_list <- function(df) {
  list("ipsatized" = df$data_ips,
          "raw" = df$data_raw,
          "correlation_matrix" = cor(df$data_ips, df$data_raw))
}
# Test it out
test1 %>%
  ipsatize_list()
$ipsatized
# A tibble: 2,495 x 11
      id ipsatized_TIPI1 ipsatized_TIPI2 ipsatized_TIPI3
   <int>           <dbl>           <dbl>           <dbl>
 1     1           0.5            -1.5              1.5 
 2     2           1.10            2.10             1.10
 3     3           0.4             0.4              0.4 
 4     4           1               2                2   
 5     5          -3.10           -1.10             2.9 
 6     6          -0.2            -2.2              1.8 
 7     7          -1.9             1.1              0.1 
 8     8          -0.2             0.800            1.8 
 9     9          -0.300           0.7              1.7 
10    10          -3.5             1.5             -1.5 
# … with 2,485 more rows, and 7 more variables:
#   ipsatized_TIPI4 <dbl>, ipsatized_TIPI5 <dbl>,
#   ipsatized_TIPI6 <dbl>, ipsatized_TIPI7 <dbl>,
#   ipsatized_TIPI8 <dbl>, ipsatized_TIPI9 <dbl>,
#   ipsatized_TIPI10 <dbl>

$raw
# A tibble: 2,495 x 11
      id response_TIPI1 response_TIPI2 response_TIPI3 response_TIPI4
   <int>          <int>          <int>          <int>          <int>
 1     1              5              3              6              2
 2     2              6              7              6              7
 3     3              6              6              6              1
 4     4              6              7              7              5
 5     5              1              3              7              2
 6     6              4              2              6              2
 7     7              2              5              4              2
 8     8              4              5              6              2
 9     9              4              5              6              2
10    10              1              6              3              1
# … with 2,485 more rows, and 6 more variables: response_TIPI5 <int>,
#   response_TIPI6 <int>, response_TIPI7 <int>, response_TIPI8 <int>,
#   response_TIPI9 <int>, response_TIPI10 <int>

$correlation_matrix
                           id response_TIPI1 response_TIPI2
id                1.000000000    -0.03087740    -0.03727441
ipsatized_TIPI1  -0.021481713     0.95997515    -0.11340834
ipsatized_TIPI2  -0.027501605    -0.12071784     0.95428074
ipsatized_TIPI3  -0.036290745     0.02535938    -0.16688353
ipsatized_TIPI4   0.022218869    -0.25970548     0.12817941
ipsatized_TIPI5   0.017044441     0.15867518    -0.13509164
ipsatized_TIPI6   0.033065029    -0.68318241    -0.07328539
ipsatized_TIPI7   0.021550310     0.19604318    -0.39102208
ipsatized_TIPI8   0.028246853    -0.09883568     0.05362126
ipsatized_TIPI9  -0.029634489     0.06676858    -0.27632275
ipsatized_TIPI10 -0.005017054    -0.22041641    -0.06129857
                 response_TIPI3 response_TIPI4 response_TIPI5
id                 -0.045915845    0.011284963   0.0008224971
ipsatized_TIPI1     0.031167709   -0.244068520   0.1736167712
ipsatized_TIPI2    -0.167224238    0.148187176  -0.1206510887
ipsatized_TIPI3     0.948872899   -0.298764735   0.0086683433
ipsatized_TIPI4    -0.307516250    0.962139678  -0.2550686241
ipsatized_TIPI5    -0.005073157   -0.256663167   0.9192189953
ipsatized_TIPI6    -0.024670231    0.073760081  -0.2270060851
ipsatized_TIPI7     0.007475973   -0.008942264   0.0736922080
ipsatized_TIPI8    -0.550478491    0.158821663  -0.0270662628
ipsatized_TIPI9     0.268938383   -0.690397982   0.0655434271
ipsatized_TIPI10   -0.108530999   -0.002177791  -0.3767678405
                 response_TIPI6 response_TIPI7 response_TIPI8
id                  0.021000522    0.007644219     0.01618021
ipsatized_TIPI1    -0.666541441    0.170445729    -0.08266381
ipsatized_TIPI2    -0.038939637   -0.403279907     0.06565301
ipsatized_TIPI3     0.015568018   -0.020080341    -0.54989652
ipsatized_TIPI4     0.091005873   -0.050084957     0.15131155
ipsatized_TIPI5    -0.203227013    0.024189448    -0.02356731
ipsatized_TIPI6     0.953180313   -0.215281446    -0.08954131
ipsatized_TIPI7    -0.162399055    0.949232042    -0.08558253
ipsatized_TIPI8    -0.067303304   -0.119072017     0.95695523
ipsatized_TIPI9    -0.009290563   -0.028893032    -0.33029253
ipsatized_TIPI10    0.040201343   -0.228842712    -0.10619799
                 response_TIPI9 response_TIPI10
id                  -0.03949432    -0.017594425
ipsatized_TIPI1      0.09172718    -0.187414036
ipsatized_TIPI2     -0.26483551    -0.031356832
ipsatized_TIPI3      0.29771049    -0.076414836
ipsatized_TIPI4     -0.68790076     0.007123348
ipsatized_TIPI5      0.08228079    -0.365950413
ipsatized_TIPI6     -0.02645471     0.029629981
ipsatized_TIPI7      0.01615832    -0.176200329
ipsatized_TIPI8     -0.32648284    -0.089488624
ipsatized_TIPI9      0.95761526    -0.057303325
ipsatized_TIPI10    -0.06741043     0.941250723
test2 %>%
  ipsatize_list()
$ipsatized
# A tibble: 2,495 x 11
      id ipsatized_TIPI1 ipsatized_TIPI2 ipsatized_TIPI3
   <int>           <dbl>           <dbl>           <dbl>
 1     1           0.5            -1.5              1.5 
 2     2           1.10            2.10             1.10
 3     3           0.4             0.4              0.4 
 4     4           1               2                2   
 5     5          -3.10           -1.10             2.9 
 6     6          -0.2            -2.2              1.8 
 7     7          -1.9             1.1              0.1 
 8     8          -0.2             0.800            1.8 
 9     9          -0.300           0.7              1.7 
10    10          -3.5             1.5             -1.5 
# … with 2,485 more rows, and 7 more variables:
#   ipsatized_TIPI4 <dbl>, ipsatized_TIPI5 <dbl>,
#   ipsatized_TIPI6 <dbl>, ipsatized_TIPI7 <dbl>,
#   ipsatized_TIPI8 <dbl>, ipsatized_TIPI9 <dbl>,
#   ipsatized_TIPI10 <dbl>

$raw
# A tibble: 2,495 x 11
      id response_TIPI1 response_TIPI2 response_TIPI3 response_TIPI4
   <int>          <int>          <int>          <int>          <int>
 1     1              5              3              6              2
 2     2              6              7              6              7
 3     3              6              6              6              1
 4     4              6              7              7              5
 5     5              1              3              7              2
 6     6              4              2              6              2
 7     7              2              5              4              2
 8     8              4              5              6              2
 9     9              4              5              6              2
10    10              1              6              3              1
# … with 2,485 more rows, and 6 more variables: response_TIPI5 <int>,
#   response_TIPI6 <int>, response_TIPI7 <int>, response_TIPI8 <int>,
#   response_TIPI9 <int>, response_TIPI10 <int>

$correlation_matrix
                           id response_TIPI1 response_TIPI2
id                1.000000000    -0.03087740    -0.03727441
ipsatized_TIPI1  -0.021481713     0.95997515    -0.11340834
ipsatized_TIPI2  -0.027501605    -0.12071784     0.95428074
ipsatized_TIPI3  -0.036290745     0.02535938    -0.16688353
ipsatized_TIPI4   0.022218869    -0.25970548     0.12817941
ipsatized_TIPI5   0.017044441     0.15867518    -0.13509164
ipsatized_TIPI6   0.033065029    -0.68318241    -0.07328539
ipsatized_TIPI7   0.021550310     0.19604318    -0.39102208
ipsatized_TIPI8   0.028246853    -0.09883568     0.05362126
ipsatized_TIPI9  -0.029634489     0.06676858    -0.27632275
ipsatized_TIPI10 -0.005017054    -0.22041641    -0.06129857
                 response_TIPI3 response_TIPI4 response_TIPI5
id                 -0.045915845    0.011284963   0.0008224971
ipsatized_TIPI1     0.031167709   -0.244068520   0.1736167712
ipsatized_TIPI2    -0.167224238    0.148187176  -0.1206510887
ipsatized_TIPI3     0.948872899   -0.298764735   0.0086683433
ipsatized_TIPI4    -0.307516250    0.962139678  -0.2550686241
ipsatized_TIPI5    -0.005073157   -0.256663167   0.9192189953
ipsatized_TIPI6    -0.024670231    0.073760081  -0.2270060851
ipsatized_TIPI7     0.007475973   -0.008942264   0.0736922080
ipsatized_TIPI8    -0.550478491    0.158821663  -0.0270662628
ipsatized_TIPI9     0.268938383   -0.690397982   0.0655434271
ipsatized_TIPI10   -0.108530999   -0.002177791  -0.3767678405
                 response_TIPI6 response_TIPI7 response_TIPI8
id                  0.021000522    0.007644219     0.01618021
ipsatized_TIPI1    -0.666541441    0.170445729    -0.08266381
ipsatized_TIPI2    -0.038939637   -0.403279907     0.06565301
ipsatized_TIPI3     0.015568018   -0.020080341    -0.54989652
ipsatized_TIPI4     0.091005873   -0.050084957     0.15131155
ipsatized_TIPI5    -0.203227013    0.024189448    -0.02356731
ipsatized_TIPI6     0.953180313   -0.215281446    -0.08954131
ipsatized_TIPI7    -0.162399055    0.949232042    -0.08558253
ipsatized_TIPI8    -0.067303304   -0.119072017     0.95695523
ipsatized_TIPI9    -0.009290563   -0.028893032    -0.33029253
ipsatized_TIPI10    0.040201343   -0.228842712    -0.10619799
                 response_TIPI9 response_TIPI10
id                  -0.03949432    -0.017594425
ipsatized_TIPI1      0.09172718    -0.187414036
ipsatized_TIPI2     -0.26483551    -0.031356832
ipsatized_TIPI3      0.29771049    -0.076414836
ipsatized_TIPI4     -0.68790076     0.007123348
ipsatized_TIPI5      0.08228079    -0.365950413
ipsatized_TIPI6     -0.02645471     0.029629981
ipsatized_TIPI7      0.01615832    -0.176200329
ipsatized_TIPI8     -0.32648284    -0.089488624
ipsatized_TIPI9      0.95761526    -0.057303325
ipsatized_TIPI10    -0.06741043     0.941250723

Now let’s combine all of these functions together!

ipsatize <- function(df) {
  df %>% 
    just_num() %>% 
    add_id() %>% 
    lengthen_data() %>% 
    transform_data() %>% 
    widen_data() %>% 
    ipsatize_list()
}
# Test it out
ipsatize(data)
$ipsatized
# A tibble: 2,495 x 11
      id ipsatized_TIPI1 ipsatized_TIPI2 ipsatized_TIPI3
   <int>           <dbl>           <dbl>           <dbl>
 1     1           0.5            -1.5              1.5 
 2     2           1.10            2.10             1.10
 3     3           0.4             0.4              0.4 
 4     4           1               2                2   
 5     5          -3.10           -1.10             2.9 
 6     6          -0.2            -2.2              1.8 
 7     7          -1.9             1.1              0.1 
 8     8          -0.2             0.800            1.8 
 9     9          -0.300           0.7              1.7 
10    10          -3.5             1.5             -1.5 
# … with 2,485 more rows, and 7 more variables:
#   ipsatized_TIPI4 <dbl>, ipsatized_TIPI5 <dbl>,
#   ipsatized_TIPI6 <dbl>, ipsatized_TIPI7 <dbl>,
#   ipsatized_TIPI8 <dbl>, ipsatized_TIPI9 <dbl>,
#   ipsatized_TIPI10 <dbl>

$raw
# A tibble: 2,495 x 11
      id response_TIPI1 response_TIPI2 response_TIPI3 response_TIPI4
   <int>          <int>          <int>          <int>          <int>
 1     1              5              3              6              2
 2     2              6              7              6              7
 3     3              6              6              6              1
 4     4              6              7              7              5
 5     5              1              3              7              2
 6     6              4              2              6              2
 7     7              2              5              4              2
 8     8              4              5              6              2
 9     9              4              5              6              2
10    10              1              6              3              1
# … with 2,485 more rows, and 6 more variables: response_TIPI5 <int>,
#   response_TIPI6 <int>, response_TIPI7 <int>, response_TIPI8 <int>,
#   response_TIPI9 <int>, response_TIPI10 <int>

$correlation_matrix
                           id response_TIPI1 response_TIPI2
id                1.000000000    -0.03087740    -0.03727441
ipsatized_TIPI1  -0.021481713     0.95997515    -0.11340834
ipsatized_TIPI2  -0.027501605    -0.12071784     0.95428074
ipsatized_TIPI3  -0.036290745     0.02535938    -0.16688353
ipsatized_TIPI4   0.022218869    -0.25970548     0.12817941
ipsatized_TIPI5   0.017044441     0.15867518    -0.13509164
ipsatized_TIPI6   0.033065029    -0.68318241    -0.07328539
ipsatized_TIPI7   0.021550310     0.19604318    -0.39102208
ipsatized_TIPI8   0.028246853    -0.09883568     0.05362126
ipsatized_TIPI9  -0.029634489     0.06676858    -0.27632275
ipsatized_TIPI10 -0.005017054    -0.22041641    -0.06129857
                 response_TIPI3 response_TIPI4 response_TIPI5
id                 -0.045915845    0.011284963   0.0008224971
ipsatized_TIPI1     0.031167709   -0.244068520   0.1736167712
ipsatized_TIPI2    -0.167224238    0.148187176  -0.1206510887
ipsatized_TIPI3     0.948872899   -0.298764735   0.0086683433
ipsatized_TIPI4    -0.307516250    0.962139678  -0.2550686241
ipsatized_TIPI5    -0.005073157   -0.256663167   0.9192189953
ipsatized_TIPI6    -0.024670231    0.073760081  -0.2270060851
ipsatized_TIPI7     0.007475973   -0.008942264   0.0736922080
ipsatized_TIPI8    -0.550478491    0.158821663  -0.0270662628
ipsatized_TIPI9     0.268938383   -0.690397982   0.0655434271
ipsatized_TIPI10   -0.108530999   -0.002177791  -0.3767678405
                 response_TIPI6 response_TIPI7 response_TIPI8
id                  0.021000522    0.007644219     0.01618021
ipsatized_TIPI1    -0.666541441    0.170445729    -0.08266381
ipsatized_TIPI2    -0.038939637   -0.403279907     0.06565301
ipsatized_TIPI3     0.015568018   -0.020080341    -0.54989652
ipsatized_TIPI4     0.091005873   -0.050084957     0.15131155
ipsatized_TIPI5    -0.203227013    0.024189448    -0.02356731
ipsatized_TIPI6     0.953180313   -0.215281446    -0.08954131
ipsatized_TIPI7    -0.162399055    0.949232042    -0.08558253
ipsatized_TIPI8    -0.067303304   -0.119072017     0.95695523
ipsatized_TIPI9    -0.009290563   -0.028893032    -0.33029253
ipsatized_TIPI10    0.040201343   -0.228842712    -0.10619799
                 response_TIPI9 response_TIPI10
id                  -0.03949432    -0.017594425
ipsatized_TIPI1      0.09172718    -0.187414036
ipsatized_TIPI2     -0.26483551    -0.031356832
ipsatized_TIPI3      0.29771049    -0.076414836
ipsatized_TIPI4     -0.68790076     0.007123348
ipsatized_TIPI5      0.08228079    -0.365950413
ipsatized_TIPI6     -0.02645471     0.029629981
ipsatized_TIPI7      0.01615832    -0.176200329
ipsatized_TIPI8     -0.32648284    -0.089488624
ipsatized_TIPI9      0.95761526    -0.057303325
ipsatized_TIPI10    -0.06741043     0.941250723
ipsatize(iris)
$ipsatized
# A tibble: 2,495 x 11
      id ipsatized_TIPI1 ipsatized_TIPI2 ipsatized_TIPI3
   <int>           <dbl>           <dbl>           <dbl>
 1     1           0.5            -1.5              1.5 
 2     2           1.10            2.10             1.10
 3     3           0.4             0.4              0.4 
 4     4           1               2                2   
 5     5          -3.10           -1.10             2.9 
 6     6          -0.2            -2.2              1.8 
 7     7          -1.9             1.1              0.1 
 8     8          -0.2             0.800            1.8 
 9     9          -0.300           0.7              1.7 
10    10          -3.5             1.5             -1.5 
# … with 2,485 more rows, and 7 more variables:
#   ipsatized_TIPI4 <dbl>, ipsatized_TIPI5 <dbl>,
#   ipsatized_TIPI6 <dbl>, ipsatized_TIPI7 <dbl>,
#   ipsatized_TIPI8 <dbl>, ipsatized_TIPI9 <dbl>,
#   ipsatized_TIPI10 <dbl>

$raw
# A tibble: 2,495 x 11
      id response_TIPI1 response_TIPI2 response_TIPI3 response_TIPI4
   <int>          <int>          <int>          <int>          <int>
 1     1              5              3              6              2
 2     2              6              7              6              7
 3     3              6              6              6              1
 4     4              6              7              7              5
 5     5              1              3              7              2
 6     6              4              2              6              2
 7     7              2              5              4              2
 8     8              4              5              6              2
 9     9              4              5              6              2
10    10              1              6              3              1
# … with 2,485 more rows, and 6 more variables: response_TIPI5 <int>,
#   response_TIPI6 <int>, response_TIPI7 <int>, response_TIPI8 <int>,
#   response_TIPI9 <int>, response_TIPI10 <int>

$correlation_matrix
                           id response_TIPI1 response_TIPI2
id                1.000000000    -0.03087740    -0.03727441
ipsatized_TIPI1  -0.021481713     0.95997515    -0.11340834
ipsatized_TIPI2  -0.027501605    -0.12071784     0.95428074
ipsatized_TIPI3  -0.036290745     0.02535938    -0.16688353
ipsatized_TIPI4   0.022218869    -0.25970548     0.12817941
ipsatized_TIPI5   0.017044441     0.15867518    -0.13509164
ipsatized_TIPI6   0.033065029    -0.68318241    -0.07328539
ipsatized_TIPI7   0.021550310     0.19604318    -0.39102208
ipsatized_TIPI8   0.028246853    -0.09883568     0.05362126
ipsatized_TIPI9  -0.029634489     0.06676858    -0.27632275
ipsatized_TIPI10 -0.005017054    -0.22041641    -0.06129857
                 response_TIPI3 response_TIPI4 response_TIPI5
id                 -0.045915845    0.011284963   0.0008224971
ipsatized_TIPI1     0.031167709   -0.244068520   0.1736167712
ipsatized_TIPI2    -0.167224238    0.148187176  -0.1206510887
ipsatized_TIPI3     0.948872899   -0.298764735   0.0086683433
ipsatized_TIPI4    -0.307516250    0.962139678  -0.2550686241
ipsatized_TIPI5    -0.005073157   -0.256663167   0.9192189953
ipsatized_TIPI6    -0.024670231    0.073760081  -0.2270060851
ipsatized_TIPI7     0.007475973   -0.008942264   0.0736922080
ipsatized_TIPI8    -0.550478491    0.158821663  -0.0270662628
ipsatized_TIPI9     0.268938383   -0.690397982   0.0655434271
ipsatized_TIPI10   -0.108530999   -0.002177791  -0.3767678405
                 response_TIPI6 response_TIPI7 response_TIPI8
id                  0.021000522    0.007644219     0.01618021
ipsatized_TIPI1    -0.666541441    0.170445729    -0.08266381
ipsatized_TIPI2    -0.038939637   -0.403279907     0.06565301
ipsatized_TIPI3     0.015568018   -0.020080341    -0.54989652
ipsatized_TIPI4     0.091005873   -0.050084957     0.15131155
ipsatized_TIPI5    -0.203227013    0.024189448    -0.02356731
ipsatized_TIPI6     0.953180313   -0.215281446    -0.08954131
ipsatized_TIPI7    -0.162399055    0.949232042    -0.08558253
ipsatized_TIPI8    -0.067303304   -0.119072017     0.95695523
ipsatized_TIPI9    -0.009290563   -0.028893032    -0.33029253
ipsatized_TIPI10    0.040201343   -0.228842712    -0.10619799
                 response_TIPI9 response_TIPI10
id                  -0.03949432    -0.017594425
ipsatized_TIPI1      0.09172718    -0.187414036
ipsatized_TIPI2     -0.26483551    -0.031356832
ipsatized_TIPI3      0.29771049    -0.076414836
ipsatized_TIPI4     -0.68790076     0.007123348
ipsatized_TIPI5      0.08228079    -0.365950413
ipsatized_TIPI6     -0.02645471     0.029629981
ipsatized_TIPI7      0.01615832    -0.176200329
ipsatized_TIPI8     -0.32648284    -0.089488624
ipsatized_TIPI9      0.95761526    -0.057303325
ipsatized_TIPI10    -0.06741043     0.941250723

The utility of safely() is especially apparent with this function. Rather than halting when an error occurs, it captures the error and returns it alongside a NULL result. The captured error also tells you in which specific function the problem occurred: here, in just_num().

# Test it out with `safely()`
safe_ipsatize <- safely(ipsatize)

test3 %>% 
  safe_ipsatize()
$result
NULL

$error
<simpleError in just_num(.): No numeric columns.>

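As a minimal sketch of how the list returned by a safely()-wrapped function can be inspected, the example below uses a made-up toy function (toy_center() and safe_center() are hypothetical names, not part of this tutorial's pipeline). It assumes only that purrr is installed.

```r
library(purrr)

# Hypothetical toy function that fails on non-numeric input
toy_center <- function(x) {
  if (!is.numeric(x)) stop("No numeric values.")
  x - mean(x)
}

# Wrap it so that errors are captured instead of halting execution
safe_center <- safely(toy_center)

ok  <- safe_center(c(1, 2, 3))
bad <- safe_center(c("a", "b"))

ok$result                    # centered values; ok$error is NULL
is.null(bad$result)          # TRUE, because the call failed
conditionMessage(bad$error)  # the message that was passed to stop()
```

Checking is.null() on the error element is a simple way to branch on success or failure, for example when mapping a wrapped function over many data sets.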
So now we have a list containing the ipsatized data, the raw data, and the correlation matrix between them. Our function-writing journey is officially complete. Finally, we can explore a couple of ways to use this function and its output.

When using this function on projects for personality research, we may want to look at the correlations between raw and ipsatized data.

ipsdat <- ipsatize(data)

# The diagonal pairs each ipsatized column with its own raw column
tibble(
  Correlation = diag(ipsdat$correlation_matrix),
  Ipsatized = colnames(ipsdat$ipsatized),
  Raw = colnames(ipsdat$raw)
) %>%
  filter(Ipsatized != "id",
         Raw != "id") %>%
  knitr::kable()
Correlation  Ipsatized         Raw
0.9599752    ipsatized_TIPI1   response_TIPI1
0.9542807    ipsatized_TIPI2   response_TIPI2
0.9488729    ipsatized_TIPI3   response_TIPI3
0.9621397    ipsatized_TIPI4   response_TIPI4
0.9192190    ipsatized_TIPI5   response_TIPI5
0.9531803    ipsatized_TIPI6   response_TIPI6
0.9492320    ipsatized_TIPI7   response_TIPI7
0.9569552    ipsatized_TIPI8   response_TIPI8
0.9576153    ipsatized_TIPI9   response_TIPI9
0.9412507    ipsatized_TIPI10  response_TIPI10
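
To illustrate why the diagonal of the correlation matrix holds the raw-versus-ipsatized correlation for each item, here is a small base R sketch on simulated ratings. The data, the seed, and all object names (raw, ips, r, item_cors) are assumptions made for illustration. They are not the TIPI data, and the row-mean centering shown is only the simple form of ipsatization described in the introduction.

```r
set.seed(1)

# Simulated 1-7 ratings: 50 hypothetical respondents, 4 hypothetical items
raw <- matrix(sample(1:7, 200, replace = TRUE), nrow = 50,
              dimnames = list(NULL, paste0("item", 1:4)))

# Ipsatize: subtract each respondent's mean from their own ratings
ips <- raw - rowMeans(raw)

# Rows correspond to ipsatized items, columns to raw items
r <- cor(ips, raw)

# The diagonal pairs each ipsatized item with its own raw version
item_cors <- diag(r)

# Every respondent's ipsatized ratings now average to (numerically) zero
max(abs(rowMeans(ips)))
```

The off-diagonal entries of r relate an ipsatized item to a different raw item, which is why only the diagonal is kept when summarizing how closely each item tracks its raw version.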

We can also plot raw and ipsatized data. For example, let’s look at Item 1.

# Put raw and ipsatized Item 1 scores side by side, then stack them for plotting
TIPI_item1 <- tibble(Raw = ipsdat$raw$response_TIPI1,
                     Ipsatized = ipsdat$ipsatized$ipsatized_TIPI1) %>%
  pivot_longer(cols = Raw:Ipsatized, names_to = "Data", values_to = "Item1")

TIPI_item1 %>% 
  ggplot() +
  geom_density(aes(x = Item1, color = Data, fill = Data), alpha = .6) +
  labs(x = "TIPI Item 1", y = "Density", title="Comparison of Raw and Ipsatized Scores") +
  colorblindr::scale_color_OkabeIto() +
  colorblindr::scale_fill_OkabeIto() +
  theme_minimal()

The density plot here shows the multimodal nature of the raw data, which is reduced to a great extent in the ipsatized data. This helps to limit the influence of within-person variability on the structural assessment of personality.

This concludes our tutorial on writing complex functions using a functional programming approach.