Creating Progressively Blurred Images and a GIF in R

Step 1: Set Up Your R Environment

Before you start, make sure you have R installed on your computer. You can download it from The Comprehensive R Archive Network (CRAN). Additionally, you’ll need to install RStudio, a popular integrated development environment for R, which you can get from the RStudio website.

Step 2: Install Required Packages

Open RStudio and install the required packages. You’ll need magick for image processing, here for easy file path management, and gifski for creating GIFs. Install these packages using the following commands:

install.packages("magick")
install.packages("here")
install.packages("gifski")

Step 3: Load the Packages

Load the installed packages into your R session with the library() function:

library(magick)
library(here)
library(gifski)

Step 4: Define the Function to Create Blurred Images

You’ll create a function named create_blurred_and_gif. This function takes an image file path as input, creates a series of progressively blurred images, and compiles them into a GIF (Credit). Here’s the function definition:

create_blurred_and_gif <- function(image_file) {
  # Define the base path and load the original image
  base_path <- here("projects", "hindsight-activity")
  original <- image_read(image_file)

  # Parameters for image processing
  num_images <- 20
  max_blur <- 99.9999
  
  # Generate images
  png_paths <- vector("character", num_images)
  for (i in 1:num_images) {
    blur_amount <- max_blur * (1 - (i-1) / (num_images-1))
    blurred <- image_blur(original, radius = blur_amount, sigma = blur_amount / 2)
    png_path <- file.path(base_path, sprintf("blurred_%02d.png", i))
    png_paths[i] <- png_path
    image_write(blurred, path = png_path)
  }
  
  # Create and save the GIF
  gif_path <- file.path(base_path, "blurred_images.gif")
  gifski(png_files = png_paths, gif_file = gif_path, width = 800, height = 600, delay = 0.5)
}

Step 5: Prepare Your Image

Place the image you want to use in the projects/hindsight-activity folder within your RStudio project directory.

Step 6: Run the Function

Call the function with the path to your image. Ensure the path is correct:

image_path <- here("projects", "hindsight-activity", "your_image.png")
create_blurred_and_gif(image_path)

Step 7: View the GIF

The GIF will be saved in the same directory as your original image. You can view it by navigating to the folder or by using R:

utils::browseURL(here("projects", "hindsight-activity", "blurred_images.gif"))

Conclusion

This simple tutorial shows you how to leverage R and its powerful packages to create visually appealing educational tools. Such techniques are particularly useful in psychological studies, visual effects demonstrations, or any field requiring detailed visual analysis.

Experimental Design

Revised Experimental Design:

a. Between-Subjects Factor:
- Condition: “old” vs “new” (2 levels)
b. Within-Subjects Factors:
- Phase: Phase 1 vs Phase 2 (2 levels)
- Fame Level: Extremely Famous vs Moderately Famous (2 levels)
Participants:
- 60 participants total (30 per between-subjects condition)
Stimuli:
- 20 celebrity faces in total
- 10 faces per phase
- In each phase:
  - 5 extremely famous celebrities
  - 5 moderately famous celebrities
Procedure:
- Phase 1: All participants view 5 extremely famous and 5 moderately famous faces
- Phase 2:
  - “Old” condition: Same 10 faces as Phase 1 (5 extremely, 5 moderately famous)
  - “New” condition: 10 new faces (5 extremely, 5 moderately famous)
Measurements for each trial:
- F1 score (time to identify)
- Correct identification (binary: 1 or 0)

This revised description correctly captures the full factorial design of the experiment. It’s a 2 (Condition: old vs new) × 2 (Phase: 1 vs 2) × 2 (Fame Level: extremely vs moderately famous) mixed factorial design. The Condition is between-subjects, while Phase and Fame Level are within-subjects factors.

Step-by-Step Guide to Simulating Experimental Data in R

Introduction

In psychological experiments, simulating data can be crucial for planning studies, testing data analysis pipelines, and teaching statistical concepts. This guide will explain how to simulate data for an experiment in which participants are shown pictures of celebrities and must identify them. The dataset will include different conditions and phases, and track whether participants correctly identify the celebrities.

Requirements

To follow along, you will need R installed on your computer. The script uses the following packages: - dplyr for data manipulation - tidyr for reshaping data - purrr for functional programming tools - haven for exporting data to SPSS

You can install these packages using the following command if you haven’t already:

install.packages(c("dplyr", "tidyr", "purrr", "haven"))

Step 1: Set Up Your Environment

First, load the necessary libraries and set a seed for reproducibility:

library(dplyr)
library(tidyr)
library(purrr)
library(haven)

# Set seed for reproducibility
set.seed(4747)

Step 2: Define Experiment Parameters

Define the basic parameters of your experiment:

n_participants <- 60  # Total number of participants
n_faces_per_phase <- 10  # Number of faces shown per phase
n_famous <- 5  # Number of extremely famous faces
n_moderately_famous <- 5  # Number of moderately famous faces

Step 3: Generate Participant Data

Create a data frame for your participants. Each participant is assigned to one of two conditions:

participants <- data.frame(
  participant_id = 1:n_participants,
  condition = rep(c("old", "new"), each = n_participants / 2)
)

Step 4: Generate Face Data

Create a data frame for the faces that will be shown to participants:

faces <- data.frame(
  face_id = 1:n_faces_per_phase,
  fame_level = rep(c(rep("extremely_famous", n_famous), rep("moderately_famous", n_moderately_famous)), 2)
)

Step 5: Simulate Trial Data

Define a function to simulate the outcome of each trial, including the time taken to identify the celebrity and whether the identification was correct:

simulate_trial <- function(fame_level, phase, condition) {
  base_score <- ifelse(fame_level == "extremely_famous", 15, 25)
  base_score <- base_score + rnorm(1, 0, 2)
  
  if (phase == 2) {
    if (condition == "old") {
      base_score <- base_score - 1.45 - rnorm(1, 0.5, 0.5)
    } else {
      base_score <- base_score - 1.45 - rnorm(1, 0, 0.5)
    }
  }
  
  correct <- rbinom(1, 1, ifelse(fame_level == "extremely_famous", 0.9, 0.7))
  return(list(score = base_score, correct = correct))
}

Step 6: Generate Data for Both Phases

Use the simulate_trial function to generate data across two phases for all participants and faces:

all_data <- map_df(1:2, function(phase) {
  crossing(participants, faces) |>
    mutate(
      trial_results = pmap(list(fame_level, phase, condition), simulate_trial),
      score = map_dbl(trial_results, "score"),
      correct = map_int(trial_results, "correct"),
      phase = phase
    ) |>
    select(-trial_results)
})

Step 7: Reshape the Data

Pivot the data so each row contains the scores from both phases and a single correctness indicator:

all_data <- all_data |>
  pivot_wider(
    id_cols = c(participant_id, face_id, condition, fame_level),
    names_from = phase,
    values_from = c(score, correct),
    names_glue = "{.value}_{phase}"
  ) |>
  mutate(
    correct = correct_1
  ) |>
  select(-correct_1, -correct_2)

Step 8: Label and Export Data

Add labels to your variables and export the data to an SPSS file:

spss_data <- all_data |>
  mutate(
    condition = factor(condition, levels = c("old", "new")),
    fame_level = factor(fame_level, levels = c("extremely_famous", "moderately_famous")),
    condition = labelled(as.numeric(condition), labels = c(Old = 1, New = 2), label = "Experimental Condition"),
    fame_level = labelled(as.numeric(fame_level), labels = c("Extremely Famous" = 1, "Moderately Famous" = 2), label = "Celebrity Fame Level"),
    score_1 = labelled(score_1, label = "Time to Identify Celebrity (Phase 1)"),
    score_2 = labelled(score_2, label = "Time to Identify Celebrity (Phase 2)"),
    correct = labelled(correct, labels = c("Incorrect" = 0, "Correct" = 1), label = "Correct Identification Across Phases"),
    participant_id = labelled(participant_id, label = "Participant ID"),
    face_id = labelled(face_id, label = "Face ID")
  )

write_sav(spss_data, here("FALL24", "weeks", "08-anova", "visual_hindsight_data.sav"))

cat("\nData has been saved as 'visual_hindsight_data.sav' in SPSS format\n")

Creating an Answer Key for Practice Assignment

```{r}
library(tidyverse)
library(dplyr)
library(jmvReadWrite)
library(haven)
library(jmv)


## Formatting p values 

format.pnum <- function(p, precision = 0.001) {
   # Calculate the number of digits needed based on the precision
   digits <- max(0, -floor(log10(precision)))

   # Format the p-value with the required number of digits
   if (p < precision) {
     formatted_p <- format(precision, nsmall = digits, scientific = FALSE)
   } else {
     formatted_p <- format(p, nsmall = digits, scientific = FALSE)
   }

   # Remove leading zeros if present
   formatted_p <- sub("^0\\.", ".", formatted_p)

   return(formatted_p)
}


# Formatting p-values formula
format.p <- function(p, precision = 0.001) {
  digits <- -log(precision, base = 10)
  p <- formatC(p, format = 'f', digits = digits)
  if (p < .001) {
    p = paste0('< ', precision)}
  if (p >= .001) {
    p = paste0('= ', p)    }
  sub("0", "", p)
}

#BG <- jmvReadWrite::read_omv('data/hindsight_BGanova.omv')
#WG <- jmvReadWrite::read_omv('data/hindsight_WGanova.omv')

# Load the data
BG <- haven::read_sav("hindsight-bias/data/hindsight_BGanova.sav")

# Preprocess and mutate the data
BGdata <- BG |> 
  filter(correct == 1) |>  # Limit to correct trials
  mutate(hb = score_2 - score_1)  # Compute Mdiff for hindsight bias

# Run the ANOVA
BGanova <- jmv::ANOVA(data = BGdata,
                      dep = "hb",
                      factors = "condition",
                      emMeans = list("condition"),
                      emmPlots = TRUE,
                      emmTables = TRUE)

print(BGanova)
#Use the str() function to inspect the structure of the BGanova object and identify where the ANOVA table is stored.
#print(names(BGanova))
#str(BGanova)

# Assuming the ANOVA results are stored in a component, let's say `main`
if ("main" %in% names(BGanova)) {
  BGanova_table <- as.data.frame(BGanova$main)
  print(BGanova_table)
} else {
  print("ANOVA results component not found.")
}

# Once you have the ANOVA table as a data frame, you can extract the p-value, F-statistic, and Mean Square Error by referencing the appropriate columns. These columns are typically named something like p, F, and MSE or Error.



# Assuming the ANOVA table is already in BGanova_table
# and that you have checked the names of the columns:
# Extract p-value (usually from the first row, 'condition' effect)

p_value <- BGanova_table$p[1]
# Extract F-statistic (usually from the first row, 'condition' effect)
F_statistic <- BGanova_table$F[1]
# Extract Mean Square Error (from the 'ms' column, second row for residuals)
MSe <- BGanova_table$ms[2]

# Extract degrees of freedom
df1 <- BGanova_table$df[1]
df2 <- BGanova_table$df[2]


# Print extracted values
print(paste("P-Value: ", p_value))
print(paste("F-Statistic: ", F_statistic))
print(paste("Mean Square Error: ", MSe))


# Round the values
F_value_rounded <- round(F_statistic, 2)
MSe_rounded <- round(MSe, 2)
p_value_rounded <- ifelse(p_value < .001, "< .001", round(p_value, 3))

# Print extracted values
print(paste("P-Value: ", p_value_rounded))
print(paste("F-Statistic: ", F_value_rounded))
print(paste("Mean Square Error: ", MSe_rounded))

# Create the formatted string
apa_result <- sprintf("F(%d, %d) = %.2f, MSe = %.2f, p %s", df1, df2, F_value_rounded, MSe_rounded, p_value_rounded)

# Print the result
cat("Results of the between-groups ANOVA:\n")
cat(apa_result, "\n")
```

Computing Descriptives for BG Analysis

```{r}
#| label: Get Descriptives
#| echo: false

# Assuming BGdata is loaded and contains the correct columns 'condition' and 'hb'
library(dplyr)

BGdescriptives <- BGdata |>
  group_by(condition) |>
  summarise(
    n = n(),
    mean = mean(hb, na.rm = TRUE),
    sd = sd(hb, na.rm = TRUE),
    .groups = "drop"
  )

# Print BGdescriptives to verify the structure
print(BGdescriptives)

# Get the unique conditions
conditions <- unique(BGdescriptives$condition)

# Check if we have exactly two conditions
if (length(conditions) == 2) {
  # Extract values for the first condition
  condition1_values <- BGdescriptives[BGdescriptives$condition == conditions[1], ]
  condition1_n <- condition1_values$n
  condition1_mean <- condition1_values$mean
  condition1_sd <- condition1_values$sd

  # Extract values for the second condition
  condition2_values <- BGdescriptives[BGdescriptives$condition == conditions[2], ]
  condition2_n <- condition2_values$n
  condition2_mean <- condition2_values$mean
  condition2_sd <- condition2_values$sd

# Print extracted values
  cat("Condition", conditions[1], "- n:", condition1_n, "Mean:", condition1_mean, "SD:", condition1_sd, "\n")
  cat("Condition", conditions[2], "- n:", condition2_n, "Mean:", condition2_mean, "SD:", condition2_sd, "\n")
} else {
  cat("Expected 2 conditions, but found", length(conditions), "conditions.\n")
  print(BGdescriptives)
}


```

Within-Groups Analysis

```{r}
#| label: WG data prep
#| echo: false
## Within Group Results
WG <- haven::read_sav("data/hindsight_WGanova.sav")

WGdata <- WG |> 
  mutate(
    EXTREMEdiff = EXTREMEavg_2 - EXTREMEavg_1,
    MODERATEdiff = MODERATEavg_2 - MODERATEavg_1,
  ) |> 
  select("participant_id",
         "EXTREMEdiff",
         "MODERATEdiff")

WGanova <- anovaRM(
    data = WGdata,
    rm = list(
        list(
            label = 'Fame Level',
            levels = c('Moderate', 'Extremely'))),
    rmCells = list(
        list(
            measure = 'MODERATEdiff',
            cell = 'Moderate'),
        list(
            measure = 'EXTREMEdiff',
            cell = 'Extremely')),
    rmTerms = list(
        'Fame Level'))

#print(WGanova)

# Assuming the ANOVA results are stored in a component, let's say `main`
if ("rmTable" %in% names(WGanova)) {
  WGanova_table <- as.data.frame(WGanova$rmTable)
  #print(WGanova_table)
} else {
  print("ANOVA results component not found.")
}

# Assuming the object is named WGanova_table

# Extract F value
WG_F_value <- WGanova_table$`F[none]`[1]

# Extract p value
WG_p_value <- WGanova_table$`p[none]`[1]

# Extract degrees of freedom
WG_df1 <- WGanova_table$`df[none]`[1]
WG_df2 <- WGanova_table$`df[none]`[2]

# Round the values
WG_F_value_rounded <- round(WG_F_value, 2)
WG_p_value_apa <- ifelse(WG_p_value < .001, "< .001", round(WG_p_value, 3))


WG_p_value_rounded <- format.pnum(WG_p_value)
  

```

```{r}
#| label: Generate means
#| echo: false

# Calculate means and SDs for each condition
moderate_stats <- WGdata |> 
  summarise(
    n = n(),
    mean = mean(MODERATEdiff, na.rm = TRUE),
    sd = sd(MODERATEdiff, na.rm = TRUE)
  )

extreme_stats <- WGdata |> 
  summarise(
    n = n(),
    mean = mean(EXTREMEdiff, na.rm = TRUE),
    sd = sd(EXTREMEdiff, na.rm = TRUE)
  )

## Moderate descriptives
moderate_n <- round(moderate_stats$n, 2)
moderate_Mean = round(moderate_stats$mean, 2)
moderate_SD <- round(moderate_stats$sd, 2)

## Extreme descriptives

extreme_n <- round(extreme_stats$n, 2)
extreme_Mean = round(extreme_stats$mean, 2)
extreme_SD <- round(extreme_stats$sd, 2)
                                                                                         
# Print the results
#cat("Moderate condition: N =", round(moderate_stats$n, 2), ", Mean =", round(moderate_stats$mean, 2), ", SD =", round(moderate_stats$sd, 2), "\n")
#cat("Extreme condition: Mean =", round(extreme_stats$mean, 2), ", SD =", round(extreme_stats$sd, 2), "\n")

```

Conclusion

You have now created a simulated data set for a psychological experiment using R. This dataset can be used for analysis, teaching, or further study design. Adjust the parameters and functions as needed to fit the specific needs of your experiment or hypothesis.