Lifestyle as a predictor of loneliness: a longitudinal study

Data analysis code

Author
Affiliation

Bruno Braga Montezano

Published

June 7, 2024

1. Load the data

Code
df <- haven::read_sav("./data/covidpsy_weighted.sav") |>
    janitor::clean_names() |>
    haven::as_factor()

First, we used the read_sav function from the haven package to load the dataset. The original data has 8102 rows and 477 columns. Although, in the next step we will filter the data with subjects that present valid values for UCLA (UCLA Loneliness Scale 3-item version) in the first and fourth waves and evaluate just those that did not have loneliness at baseline (W1). We will also select only the variables to be used on the data routines. The clean_names function from the janitor package was used to clean the variable names in order to facilitate data manipulation later on.

2. Select variables of interest and filter the data

Code
df_loneliness <- df |> dplyr::select(
    ucla_total,
    ucla_total_w4,
    gender_birth,
    heterosexual = heterossexual,
    color,
    year_birth,
    education,
    education_grouped,
    household_income,
    number_people_house,
    unemployed,
    social_distancing,
    phq_total,
    gad_total,
    classif_audit,
    marijuana_frequency,
    physical_activity_dichotomized,
    physical_activity_grouped,
    sleep_quality,
    marital_status_grouped,
    friend_relationship,
    family_relationship,
    religion,
    sampling_weight,
    ipw_w4
) |>
    dplyr::filter(
        !is.na(ucla_total),
        !is.na(ucla_total_w4)
    )

In this step, we selected 25 variables. They are the following: UCLA total score on W1 (ucla_total), UCLA total score on W4 (ucla_total_w4), gender at birth (gender_birth), heterosexual (heterosexual), skin color (color), birth year (year_birth), seven categories of education (education), three categories of education (education_grouped), five levels of household income (household_income), number of people in your house including you (number_people_house), unemployment (unemployed), social distancing (social_distancing), PHQ-9 total score at W1 (phq_total), GAD-7 total score at W1 (gad_total), AUDIT risk [4 categories] (classif_audit), cannabis use frequency (marijuana_frequency), physical activity [binary variable] (physical_acitivty_dichotomized), physical activity [more or less than 150min/week] (physical_activity_grouped), sleep quality [four categories] (sleep_quality), marital status [with or without partner] (marital_status_grouped), friend relationship quality (friend_relationship), family relationship quality (family_relationship), has a religion (religion), sampling weight for demographics (sampling_weight), weight for attrition at W4 (ipw_w4).

After the removal of subjects with missing values on UCLA at W1 or W4, we ended up with a dataset of 944 subjects.

Code
df_loneliness <- df_loneliness |>
    dplyr::filter(ucla_total < 6)

After removing the subjects with loneliness (UCLA \(\geq\) 6) at baseline (W1), there were 473 participants remaining.

3. Exploratory data analysis

In this section, we use the skimr package to create a detailed summary for each variable in the dataset in order to get an overview of the data.

3.1. Skim the data

Code
skimr::skim(df_loneliness)
Data summary
Name df_loneliness
Number of rows 473
Number of columns 25
_______________________
Column type frequency:
factor 17
numeric 8
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
gender_birth 0 1 FALSE 2 Fem: 412, Mal: 61
heterosexual 0 1 FALSE 2 sim: 414, não: 59
color 0 1 FALSE 5 Bra: 278, Par: 116, Pre: 72, Ama: 6
education 0 1 FALSE 6 Pós: 249, Ens: 110, Ens: 79, Ens: 32
education_grouped 0 1 FALSE 3 Mas: 249, Som: 189, Up : 35
household_income 0 1 FALSE 5 B: 180, D: 114, C: 108, A: 50
unemployed 0 1 FALSE 2 No: 416, Yes: 57
social_distancing 0 1 FALSE 2 Sim: 446, Não: 27
classif_audit 0 1 FALSE 4 bai: 345, ris: 96, alt: 16, ris: 16
marijuana_frequency 0 1 FALSE 3 Não: 454, Sem: 13, Men: 6
physical_activity_dichotomized 0 1 FALSE 2 yes: 259, no: 214
physical_activity_grouped 0 1 FALSE 2 Les: 424, Mor: 49
sleep_quality 0 1 FALSE 4 Reg: 168, Bom: 162, Rui: 101, Exc: 42
marital_status_grouped 0 1 FALSE 2 Wit: 259, Wit: 214
friend_relationship 0 1 FALSE 4 Bon: 247, Reg: 142, Exc: 56, Rui: 28
family_relationship 0 1 FALSE 4 Bon: 266, Reg: 120, Exc: 76, Rui: 11
religion 0 1 FALSE 2 Não: 244, Sim: 229

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ucla_total 0 1 4.06 0.82 3.00 3.00 4.00 5.00 5.00 ▆▁▇▁▇
ucla_total_w4 0 1 4.49 1.60 3.00 3.00 4.00 6.00 9.00 ▇▂▂▁▁
year_birth 0 1 1981.33 12.10 1946.00 1974.00 1983.00 1991.00 2002.00 ▁▃▅▇▆
number_people_house 0 1 3.08 1.43 1.00 2.00 3.00 4.00 10.00 ▇▇▂▁▁
phq_total 0 1 9.96 6.26 0.00 5.00 9.00 14.00 26.00 ▇▇▆▃▂
gad_total 0 1 9.63 5.51 0.00 5.00 9.00 14.00 21.00 ▆▇▇▅▅
sampling_weight 0 1 0.71 1.93 0.12 0.15 0.21 0.44 20.00 ▇▁▁▁▁
ipw_w4 0 1 8.16 3.07 5.11 5.87 7.22 9.66 19.16 ▇▃▁▁▁

4. Feature engineering

In this section, we are going to modify variables and create some new ones in order to facilitate the data analysis. We’ll go through the following steps:

  1. Create an outcome variable based on UCLA total score at W4. Scores \(\geq\) 6 will be positive (1) and values \(<\) 6 will be negative (0).
  2. Build an age variable based on year_birth.
  3. Create quadratic and cubic terms (variables) for age (i.e. age_2 and age_3). We will also standardize all age variables with \(z\)-scores.
  4. Build a combined_weight variable that is the product of sampling_weight and ipw_w4 to weight analysis for both demographics and attrition.
  5. Standardize PHQ-9 (phq_total) and GAD-7 (gad_total) scores with \(z\)-scores to facilitate the interpretability of the coefficients.
  6. Dichotomize skin color variable (color) in white and non-white subjects.
  7. Dichotomize the AUDIT risk variable (classif_audit) into two levels: low risk and moderate/high/severe risk.
  8. Dichotomize cannabis use variable (marijuana frequency) into “uses cannabis” and “does not use cannabis”.
  9. Dichotomize the variables of quality of family relationships (family_relationship), quality of friendship relationships (friend_relationship), and sleep quality (sleep_quality) into two levels: Bad/Fair and Good/Excellent.
  10. Categorize the household income variable (household_income) into three levels of socioeconomic status: A/B (upper), C (middle) and D/E (lower).
  11. Recode all variables to easy and interpretable category labels.
  12. Remove the following variables: education, physical_activity_dichotomized, ucla_total, ucla_total_w4, year_birth, marijuana_frequency, classif_audit.
  13. Relocate the variables (not necessary per se).
Code
df_loneliness <- df_loneliness |>
    dplyr::mutate(
        # Step 1: Outcome variable based on UCLA score at W4
        outcome = dplyr::if_else(ucla_total_w4 >= 6, 1, 0),
        # Step 2: Age variable based on `year_birth`
        age = 2020 - year_birth,
        age_raw = age,
        # Step 3: Create quadratic and cubic terms for age
        age_2 = age ^ 2,
        age_3 = age ^ 3,
        dplyr::across(c(dplyr::starts_with("age"), -age_raw),
            \(x) (x - mean(x)) / sd(x)
        ),
        # Step 4: Build combined weight variable
        combined_weight = sampling_weight * ipw_w4,
        # Step 5: Standardize PHQ-9 and GAD-7 scores
        dplyr::across(
            dplyr::matches("^(phq|gad)_total$"),
            \(x) (x - mean(x)) / sd(x)
        ),
        # Step 6: Dichotomize skin color in white/non-white subjects
        color = as.factor(dplyr::case_match(color,
            "Branca" ~ "White",
            c("Parda", "Preta", "Amarela", "Indigena") ~ "Non-white"
        )),
        # Step 7: Dichotomize AUDIT risk score
        alcohol_risk = as.factor(dplyr::case_match(
            classif_audit,
            "baixo risco" ~ "Low risk",
            c("risco moderado", "alto risco", "risco severo") ~ "High risk"
        )),
        # Step 8: Create cannabis use variable
        cannabis_use = as.factor(dplyr::case_match( 
            marijuana_frequency,
            "Não" ~ "No",
            c("Menos que semanalmente", "Semanalmente ou Diariamente") ~ "Yes"
        )),
        # Step 9: Recode sleep quality, friend and family relationships
        dplyr::across(
            c(sleep_quality, friend_relationship, family_relationship),
            \(x) as.factor(dplyr::case_match(
                as.character(x),
                c("Ruim", "Ruins", "Regular", "Regulares") ~ "Worse",
                c("Bom", "Bons", "Excelente", "Excelentes") ~ "Better"
    ) |> forcats::fct_rev())
        ),
        # Step 10: Create three levels of socioeconomic status
        household_income = as.factor(dplyr::case_match(
            household_income,
            c("A", "B") ~ "Upper",
            "C" ~ "Middle",
            c("D", "E") ~ "Lower"
        )),
        # Step 11: Recode some variable labels and adjust reference levels
        heterosexual = relevel(as.factor(dplyr::case_match(
            heterosexual,
            "não" ~ "No",
            "sim" ~ "Yes"
        )), ref = "Yes"),
        religion = relevel(as.factor(dplyr::case_match(
            religion,
            "Não" ~ "No",
            "Sim" ~ "Yes"
        )), ref = "No"),
        social_distancing = as.factor(dplyr::case_match(
            social_distancing,
            "Sim" ~ "Yes",
            "Não" ~ "No"
        )),
        education_grouped = relevel(
            education_grouped,
            ref = "Up to high-school"
        ),
        alcohol_risk = relevel(
            alcohol_risk,
            ref = "Low risk"
        ),
        gender_birth = relevel(
            gender_birth,
            ref = "Male"
        ),
    ) |>
    # Step 12: Remove some variables
    dplyr::select(-c(education, physical_activity_dichotomized,
                     ucla_total, ucla_total_w4, year_birth,
                     marijuana_frequency, classif_audit)) |>
    # Step 13: Relocate the outcome variable
    dplyr::relocate(outcome, .before = dplyr::everything())

The dataset is formatted as below:

Code
knitr::kable(head(df_loneliness, 10))
outcome gender_birth heterosexual color education_grouped household_income number_people_house unemployed social_distancing phq_total gad_total physical_activity_grouped sleep_quality marital_status_grouped friend_relationship family_relationship religion sampling_weight ipw_w4 age age_raw age_2 age_3 combined_weight alcohol_risk cannabis_use
0 Female Yes Non-white Some college education Lower 2 No Yes -0.7935850 -1.3851317 Less than 150min/week Better With partner Worse Worse Yes 0.3063481 12.300424 -1.0476520 26 -0.9445699 -0.8243562 3.7682117 Low risk No
0 Female Yes White Some college education Lower 5 No Yes -0.1541204 -0.6589834 Less than 150min/week Worse Without partner Better Worse No 0.4129219 5.857325 -1.3782871 22 -1.1323569 -0.9226467 2.4186181 Low risk No
0 Female Yes White Master or Doctorate Upper 3 No Yes 0.6452103 0.0671649 Less than 150min/week Worse With partner Worse Worse No 0.2446609 5.910265 1.1014763 52 1.0389307 0.9211506 1.4460104 Low risk Yes
0 Female Yes Non-white Master or Doctorate Upper 2 No Yes -1.1133173 -0.1143722 Less than 150min/week Worse With partner Better Better Yes 0.1504424 6.925452 1.4321114 56 1.4614516 1.4178239 1.0418815 High risk No
0 Female Yes White Some college education Upper 2 No Yes 1.1248087 0.4302390 Less than 150min/week Worse Without partner Better Better Yes 0.1970159 8.664241 1.3494526 55 1.3528872 1.2867179 1.7069930 Low risk No
0 Female Yes Non-white Some college education Upper 5 No Yes -0.4738527 -0.6589834 Less than 150min/week Better With partner Better Better No 0.1352377 9.693859 -1.3782871 22 -1.1323569 -0.9226467 1.3109755 Low risk No
0 Female Yes White Some college education Lower 5 Yes No 0.9649426 1.7009985 Less than 150min/week Worse With partner Worse Worse No 0.6710622 8.782572 0.1095709 40 -0.0408448 -0.1657194 5.8936524 Low risk No
1 Female Yes Non-white Some college education Upper 2 Yes Yes -0.1541204 0.0671649 Less than 150min/week Better With partner Worse Worse No 0.1287050 5.551102 -0.7996756 29 -0.7831904 -0.7276973 0.7144546 High risk No
0 Female Yes White Master or Doctorate Upper 2 No No -0.9534511 -0.8405205 More than 150min/week Better Without partner Worse Better Yes 0.1343119 5.111151 0.0269122 39 -0.1181113 -0.2321307 0.6864886 Low risk No
0 Female No White Some college education Lower 2 No Yes 1.1248087 1.1563873 Less than 150min/week Better Without partner Better Better Yes 9.9837119 6.017146 0.1922297 41 0.0383779 -0.0959031 60.0734511 Low risk No

5. Descriptive analysis

5.1. Loneliness incidence

Code
df_loneliness |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    dplyr::count(outcome) |>
    ggplot2::ggplot(ggplot2::aes(y = n, x = outcome, label = n)) +
    ggplot2::geom_col(fill = "steelblue4") +
    ggplot2::theme_light(14, "IBM Plex Sans") +
    ggplot2::geom_label(
        size = 4,
        family = "IBM Plex Sans",
        label.padding = ggplot2::unit(0.5, "lines")
    ) +
    ggplot2::labs(x = "Loneliness at W4", y = "# of participants")

5.2. Weighted descriptive tables

In this section we will create three tables: the first, weighted from demographics (using sampling_weight), the second weighted based on the attrition (ipw_w4) associated with the fourth wave (W4), and the third based on the combined weight from demographics and also attrition (sampling_weight * ipw_w4).

Code
df_loneliness |>
    dplyr::select(-c(ipw_w4, age, age_2, age_3, combined_weight)) |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    survey::svydesign(
        ~ 1,
        data = _,
        weights = ~ sampling_weight) |>
    gtsummary::tbl_svysummary(
        by = outcome,
        include = -c(sampling_weight),
        digits = list(
            gtsummary::all_categorical() ~ c(0, 1, 1),
            gtsummary::all_continuous() ~ 2
        ),
        statistic = list(
            gtsummary::all_categorical() ~ "{n_unweighted} ({p_unweighted}%/{p}%)"
        ),
        label = list(
            gender_birth ~ "Sex",
            heterosexual ~ "Sexual orientation (heterosexual)",
            color ~ "Skin color",
            education_grouped ~ "Education",
            household_income ~ "Socioeconomic status",
            number_people_house ~ "Number of people in your house (including you)",
            unemployed ~ "Unemployed",
            social_distancing ~ "Social distancing",
            phq_total ~ "Depressive symptoms (PHQ-9)",
            gad_total ~ "Anxiety symptoms (GAD-7)",
            physical_activity_grouped ~ "Physical activity",
            sleep_quality ~ "Sleep quality",
            marital_status_grouped ~ "Marital status (grouped)",
            friend_relationship ~ "Friendship relationship quality",
            family_relationship ~ "Family relationship quality",
            religion ~ "Religion",
            age_raw ~ "Age (in years)",
            alcohol_risk ~ "Alcohol risk zone",
            cannabis_use ~ "Cannabis use"
        )
    ) |>
    gtsummary::add_overall(last = TRUE) |>
    gtsummary::add_p() |>
    gtsummary::modify_header(
        label = '**Characteristic**',
        stat_1 = '**No**, *n* = 348',
        stat_2 = '**Yes**, *n* = 125',
        stat_0 = '**Overall**, *n* = 473',
        p.value = '***p*-value**'
    ) |>
    gtsummary::modify_caption("Descriptive table stratified by loneliness
                              incidence, weighted by demographics.")
Descriptive table stratified by loneliness incidence, weighted by demographics.
Characteristic No, n = 3481 Yes, n = 1251 Overall, n = 4731 p-value2
Sex


0.10
    Male 45 (12.9%/43.7%) 16 (12.8%/23.8%) 61 (12.9%/38.6%)
    Female 303 (87.1%/56.3%) 109 (87.2%/76.2%) 412 (87.1%/61.4%)
Sexual orientation (heterosexual) 309 (88.8%/79.1%) 105 (84.0%/77.9%) 414 (87.5%/78.8%) >0.9
Skin color


0.8
    Non-white 136 (39.1%/48.6%) 59 (47.2%/45.9%) 195 (41.2%/47.9%)
    White 212 (60.9%/51.4%) 66 (52.8%/54.1%) 278 (58.8%/52.1%)
Education


0.4
    Up to high-school 20 (5.7%/12.8%) 15 (12.0%/29.4%) 35 (7.4%/17.1%)
    Some college education 135 (38.8%/42.6%) 54 (43.2%/32.6%) 189 (40.0%/40.0%)
    Master or Doctorate 193 (55.5%/44.6%) 56 (44.8%/38.0%) 249 (52.6%/42.9%)
Socioeconomic status


0.5
    Lower 93 (26.7%/66.6%) 42 (33.6%/71.0%) 135 (28.5%/67.8%)
    Middle 75 (21.6%/10.7%) 33 (26.4%/13.9%) 108 (22.8%/11.5%)
    Upper 180 (51.7%/22.6%) 50 (40.0%/15.1%) 230 (48.6%/20.7%)
Number of people in your house (including you) 3.00 (2.00, 4.00) 2.00 (1.00, 3.00) 3.00 (2.00, 4.00) 0.4
Unemployed 34 (9.8%/29.5%) 23 (18.4%/38.1%) 57 (12.1%/31.7%) 0.6
Social distancing 327 (94.0%/95.4%) 119 (95.2%/97.5%) 446 (94.3%/95.9%) 0.4
Depressive symptoms (PHQ-9) -0.63 (-1.27, 0.65) 0.56 (0.01, 1.95) -0.15 (-0.95, 1.12) 0.001
Anxiety symptoms (GAD-7) -0.48 (-1.02, 0.79) 0.73 (-0.11, 1.27) -0.11 (-1.02, 1.09) 0.005
Physical activity


<0.001
    Less than 150min/week 305 (87.6%/88.8%) 119 (95.2%/98.4%) 424 (89.6%/91.3%)
    More than 150min/week 43 (12.4%/11.2%) 6 (4.8%/1.6%) 49 (10.4%/8.7%)
Sleep quality


0.003
    Worse 185 (53.2%/48.8%) 84 (67.2%/81.1%) 269 (56.9%/57.1%)
    Better 163 (46.8%/51.2%) 41 (32.8%/18.9%) 204 (43.1%/42.9%)
Marital status (grouped)


0.2
    Without partner 146 (42.0%/56.8%) 68 (54.4%/72.2%) 214 (45.2%/60.7%)
    With partner 202 (58.0%/43.2%) 57 (45.6%/27.8%) 259 (54.8%/39.3%)
Friendship relationship quality


0.4
    Worse 117 (33.6%/33.8%) 53 (42.4%/44.9%) 170 (35.9%/36.7%)
    Better 231 (66.4%/66.2%) 72 (57.6%/55.1%) 303 (64.1%/63.3%)
Family relationship quality


0.3
    Worse 85 (24.4%/21.7%) 46 (36.8%/34.1%) 131 (27.7%/24.8%)
    Better 263 (75.6%/78.3%) 79 (63.2%/65.9%) 342 (72.3%/75.2%)
Religion 165 (47.4%/53.9%) 64 (51.2%/65.1%) 229 (48.4%/56.8%) 0.4
Age (in years) 44.00 (38.00, 52.00) 43.00 (32.00, 53.78) 44.00 (35.00, 53.33) >0.9
Alcohol risk zone


0.056
    Low risk 253 (72.7%/82.9%) 92 (73.6%/59.2%) 345 (72.9%/76.8%)
    High risk 95 (27.3%/17.1%) 33 (26.4%/40.8%) 128 (27.1%/23.2%)
Cannabis use 14 (4.0%/5.4%) 5 (4.0%/5.2%) 19 (4.0%/5.4%) >0.9
1 n (unweighted) (% (unweighted)/%); Median (IQR)
2 chi-squared test with Rao & Scott’s second-order correction; Wilcoxon rank-sum test for complex survey samples
Code
df_loneliness |>
    dplyr::select(-c(sampling_weight, age, age_2, age_3, combined_weight)) |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    survey::svydesign(
        ~ 1,
        data = _,
        weights = ~ ipw_w4) |>
    gtsummary::tbl_svysummary(
        by = outcome,
        include = -c(ipw_w4),
        digits = list(
            gtsummary::all_categorical() ~ c(0, 1, 1),
            gtsummary::all_continuous() ~ 2
        ),
        statistic = list(
            gtsummary::all_categorical() ~ "{n_unweighted} ({p_unweighted}%/{p}%)"
        ),
        label = list(
            gender_birth ~ "Sex",
            heterosexual ~ "Sexual orientation (heterosexual)",
            color ~ "Skin color",
            education_grouped ~ "Education",
            household_income ~ "Socioeconomic status",
            number_people_house ~ "Number of people in your house (including you)",
            unemployed ~ "Unemployed",
            social_distancing ~ "Social distancing",
            phq_total ~ "Depressive symptoms (PHQ-9)",
            gad_total ~ "Anxiety symptoms (GAD-7)",
            physical_activity_grouped ~ "Physical activity",
            sleep_quality ~ "Sleep quality",
            marital_status_grouped ~ "Marital status (grouped)",
            friend_relationship ~ "Friendship relationship quality",
            family_relationship ~ "Family relationship quality",
            religion ~ "Religion",
            age_raw ~ "Age (in years)",
            alcohol_risk ~ "Alcohol risk zone",
            cannabis_use ~ "Cannabis use"
        )
    ) |>
    gtsummary::add_overall(last = TRUE) |>
    gtsummary::add_p() |>
    gtsummary::modify_header(
        label = '**Characteristic**',
        stat_1 = '**No**, *n* = 348',
        stat_2 = '**Yes**, *n* = 125',
        stat_0 = '**Overall**, *n* = 473',
        p.value = '***p*-value**'
    ) |>
    gtsummary::modify_caption("Descriptive table stratified by loneliness
                              incidence, weighted by attrition.")
Descriptive table stratified by loneliness incidence, weighted by attrition.
Characteristic No, n = 3481 Yes, n = 1251 Overall, n = 4731 p-value2
Sex


>0.9
    Male 45 (12.9%/18.9%) 16 (12.8%/18.5%) 61 (12.9%/18.8%)
    Female 303 (87.1%/81.1%) 109 (87.2%/81.5%) 412 (87.1%/81.2%)
Sexual orientation (heterosexual) 309 (88.8%/87.9%) 105 (84.0%/81.2%) 414 (87.5%/86.1%) 0.10
Skin color


0.041
    Non-white 136 (39.1%/44.2%) 59 (47.2%/55.6%) 195 (41.2%/47.3%)
    White 212 (60.9%/55.8%) 66 (52.8%/44.4%) 278 (58.8%/52.7%)
Education


0.023
    Up to high-school 20 (5.7%/8.1%) 15 (12.0%/17.6%) 35 (7.4%/10.6%)
    Some college education 135 (38.8%/42.2%) 54 (43.2%/43.2%) 189 (40.0%/42.5%)
    Master or Doctorate 193 (55.5%/49.7%) 56 (44.8%/39.3%) 249 (52.6%/46.9%)
Socioeconomic status


0.2
    Lower 93 (26.7%/32.2%) 42 (33.6%/39.9%) 135 (28.5%/34.3%)
    Middle 75 (21.6%/21.8%) 33 (26.4%/24.0%) 108 (22.8%/22.4%)
    Upper 180 (51.7%/46.0%) 50 (40.0%/36.0%) 230 (48.6%/43.3%)
Number of people in your house (including you) 3.00 (2.00, 4.00) 3.00 (2.00, 4.00) 3.00 (2.00, 4.00) 0.2
Unemployed 34 (9.8%/10.4%) 23 (18.4%/22.4%) 57 (12.1%/13.7%) 0.003
Social distancing 327 (94.0%/93.5%) 119 (95.2%/95.7%) 446 (94.3%/94.1%) 0.4
Depressive symptoms (PHQ-9) -0.31 (-0.95, 0.59) 0.33 (-0.15, 1.12) -0.15 (-0.79, 0.65) <0.001
Anxiety symptoms (GAD-7) -0.30 (-0.84, 0.43) 0.29 (-0.30, 1.34) -0.11 (-0.84, 0.79) <0.001
Physical activity


0.010
    Less than 150min/week 305 (87.6%/89.9%) 119 (95.2%/96.6%) 424 (89.6%/91.7%)
    More than 150min/week 43 (12.4%/10.1%) 6 (4.8%/3.4%) 49 (10.4%/8.3%)
Sleep quality


0.003
    Worse 185 (53.2%/54.3%) 84 (67.2%/70.4%) 269 (56.9%/58.6%)
    Better 163 (46.8%/45.7%) 41 (32.8%/29.6%) 204 (43.1%/41.4%)
Marital status (grouped)


0.023
    Without partner 146 (42.0%/39.2%) 68 (54.4%/51.8%) 214 (45.2%/42.7%)
    With partner 202 (58.0%/60.8%) 57 (45.6%/48.2%) 259 (54.8%/57.3%)
Friendship relationship quality


0.11
    Worse 117 (33.6%/34.8%) 53 (42.4%/43.6%) 170 (35.9%/37.2%)
    Better 231 (66.4%/65.2%) 72 (57.6%/56.4%) 303 (64.1%/62.8%)
Family relationship quality


0.013
    Worse 85 (24.4%/26.0%) 46 (36.8%/39.0%) 131 (27.7%/29.5%)
    Better 263 (75.6%/74.0%) 79 (63.2%/61.0%) 342 (72.3%/70.5%)
Religion 165 (47.4%/47.7%) 64 (51.2%/51.4%) 229 (48.4%/48.7%) 0.5
Age (in years) 37.00 (29.00, 48.00) 35.00 (27.00, 42.00) 37.00 (28.00, 46.00) 0.025
Alcohol risk zone


0.9
    Low risk 253 (72.7%/70.6%) 92 (73.6%/71.5%) 345 (72.9%/70.9%)
    High risk 95 (27.3%/29.4%) 33 (26.4%/28.5%) 128 (27.1%/29.1%)
Cannabis use 14 (4.0%/4.0%) 5 (4.0%/5.3%) 19 (4.0%/4.4%) 0.6
1 n (unweighted) (% (unweighted)/%); Median (IQR)
2 chi-squared test with Rao & Scott’s second-order correction; Wilcoxon rank-sum test for complex survey samples
Code
df_loneliness |>
    dplyr::select(-c(ipw_w4, age, age_2, age_3, sampling_weight)) |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    survey::svydesign(
        ~ 1,
        data = _,
        weights = ~ combined_weight) |>
    gtsummary::tbl_svysummary(
        by = outcome,
        include = -c(combined_weight),
        digits = list(
            gtsummary::all_categorical() ~ c(0, 1, 1),
            gtsummary::all_continuous() ~ 2
        ),
        statistic = list(
            gtsummary::all_categorical() ~ "{n_unweighted} ({p_unweighted}%/{p}%)"
        ),
        label = list(
            gender_birth ~ "Sex",
            heterosexual ~ "Sexual orientation (heterosexual)",
            color ~ "Skin color",
            education_grouped ~ "Education",
            household_income ~ "Socioeconomic status",
            number_people_house ~ "Number of people in your house (including you)",
            unemployed ~ "Unemployed",
            social_distancing ~ "Social distancing",
            phq_total ~ "Depressive symptoms (PHQ-9)",
            gad_total ~ "Anxiety symptoms (GAD-7)",
            physical_activity_grouped ~ "Physical activity",
            sleep_quality ~ "Sleep quality",
            marital_status_grouped ~ "Marital status (grouped)",
            friend_relationship ~ "Friendship relationship quality",
            family_relationship ~ "Family relationship quality",
            religion ~ "Religion",
            age_raw ~ "Age (in years)",
            alcohol_risk ~ "Alcohol risk zone",
            cannabis_use ~ "Cannabis use"
        )
    ) |>
    gtsummary::add_overall(last = TRUE) |>
    gtsummary::add_p() |>
    gtsummary::modify_header(
        label = '**Characteristic**',
        stat_1 = '**No**, *n* = 348',
        stat_2 = '**Yes**, *n* = 125',
        stat_0 = '**Overall**, *n* = 473',
        p.value = '***p*-value**'
    ) |>
    gtsummary::modify_caption("Descriptive table stratified by loneliness
                              incidence, weighted by the combined weight.")
Descriptive table stratified by loneliness incidence, weighted by the combined weight.
Characteristic No, n = 3481 Yes, n = 1251 Overall, n = 4731 p-value2
Sex


0.079
    Male 45 (12.9%/58.7%) 16 (12.8%/33.3%) 61 (12.9%/52.7%)
    Female 303 (87.1%/41.3%) 109 (87.2%/66.7%) 412 (87.1%/47.3%)
Sexual orientation (heterosexual) 309 (88.8%/77.2%) 105 (84.0%/70.7%) 414 (87.5%/75.6%) 0.7
Skin color


0.9
    Non-white 136 (39.1%/57.0%) 59 (47.2%/54.7%) 195 (41.2%/56.5%)
    White 212 (60.9%/43.0%) 66 (52.8%/45.3%) 278 (58.8%/43.5%)
Education


0.6
    Up to high-school 20 (5.7%/18.0%) 15 (12.0%/31.1%) 35 (7.4%/21.1%)
    Some college education 135 (38.8%/40.8%) 54 (43.2%/32.9%) 189 (40.0%/38.9%)
    Master or Doctorate 193 (55.5%/41.1%) 56 (44.8%/36.0%) 249 (52.6%/39.9%)
Socioeconomic status


0.8
    Lower 93 (26.7%/72.5%) 42 (33.6%/72.3%) 135 (28.5%/72.4%)
    Middle 75 (21.6%/8.6%) 33 (26.4%/11.6%) 108 (22.8%/9.3%)
    Upper 180 (51.7%/18.9%) 50 (40.0%/16.1%) 230 (48.6%/18.3%)
Number of people in your house (including you) 3.00 (2.00, 4.00) 2.00 (1.00, 3.00) 3.00 (1.00, 4.00) 0.4
Unemployed 34 (9.8%/34.1%) 23 (18.4%/42.3%) 57 (12.1%/36.1%) 0.7
Social distancing 327 (94.0%/95.9%) 119 (95.2%/97.7%) 446 (94.3%/96.3%) 0.5
Depressive symptoms (PHQ-9) -0.63 (-1.27, 0.81) 0.49 (-0.31, 1.94) -0.31 (-1.11, 1.12) 0.015
Anxiety symptoms (GAD-7) -0.66 (-1.20, 0.79) 0.79 (-0.11, 1.25) -0.30 (-1.02, 1.16) 0.018
Physical activity


<0.001
    Less than 150min/week 305 (87.6%/92.7%) 119 (95.2%/99.0%) 424 (89.6%/94.2%)
    More than 150min/week 43 (12.4%/7.3%) 6 (4.8%/1.0%) 49 (10.4%/5.8%)
Sleep quality


0.004
    Worse 185 (53.2%/49.6%) 84 (67.2%/83.3%) 269 (56.9%/57.6%)
    Better 163 (46.8%/50.4%) 41 (32.8%/16.7%) 204 (43.1%/42.4%)
Marital status (grouped)


0.5
    Without partner 146 (42.0%/56.0%) 68 (54.4%/65.5%) 214 (45.2%/58.3%)
    With partner 202 (58.0%/44.0%) 57 (45.6%/34.5%) 259 (54.8%/41.7%)
Friendship relationship quality


0.6
    Worse 117 (33.6%/37.8%) 53 (42.4%/47.1%) 170 (35.9%/40.0%)
    Better 231 (66.4%/62.2%) 72 (57.6%/52.9%) 303 (64.1%/60.0%)
Family relationship quality


0.13
    Worse 85 (24.4%/19.3%) 46 (36.8%/36.0%) 131 (27.7%/23.3%)
    Better 263 (75.6%/80.7%) 79 (63.2%/64.0%) 342 (72.3%/76.7%)
Religion 165 (47.4%/56.4%) 64 (51.2%/63.0%) 229 (48.4%/58.0%) 0.7
Age (in years) 46.00 (38.00, 52.00) 42.00 (30.00, 53.65) 44.01 (35.00, 52.00) 0.8
Alcohol risk zone


0.040
    Low risk 253 (72.7%/83.3%) 92 (73.6%/57.5%) 345 (72.9%/77.1%)
    High risk 95 (27.3%/16.7%) 33 (26.4%/42.5%) 128 (27.1%/22.9%)
Cannabis use 14 (4.0%/4.2%) 5 (4.0%/6.3%) 19 (4.0%/4.7%) 0.7
1 n (unweighted) (% (unweighted)/%); Median (IQR)
2 chi-squared test with Rao & Scott’s second-order correction; Wilcoxon rank-sum test for complex survey samples

6. Bivariate analysis

In this step, we will run several bivariate analyses (\(loneliness ~ predictor\)) to pre-select the variables that will be used in the multi-predictor Poisson regression later on. The cut-off point we will use is based on \(p<0.2\). In the code below, we created a function to calculate the lower and upper bounds of the 95% confidence intervals (CI) within a tibble structure.

Code
confint_df <- function(x) {
    tibble::tibble(
        lower = x[2, 1],
        upper = x[2, 2]
    )
}

6.1. Sociodemographic domain

Code
df_socio <- df_loneliness |> 
    dplyr::select(
        outcome,
        age,
        age_2,
        age_3,
        gender_birth,
        heterosexual,
        education_grouped,
        household_income,
        color,
        unemployed,
        combined_weight
    )

df_socio |>
    purrr::keep(is.numeric) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of numeric predictors of the sociodemographic domain."
    )
Bivariate analysis of numeric predictors of the sociodemographic domain.
Variable Category \(\beta\) SE RR RR (lower) RR (upper) p-value
age value -0.089 0.033 0.914 0.856 0.976 0.007
age_2 value -0.053 0.031 0.948 0.892 1.007 0.087
age_3 value -0.028 0.028 0.973 0.920 1.027 0.328
Code
df_socio |>
    dplyr::select(-dplyr::contains("age")) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of categorical predictors of the sociodemographic domain."
    )
Bivariate analysis of categorical predictors of the sociodemographic domain.
Variable Category \(\beta\) SE RR RR (lower) RR (upper) p-value
color valueWhite 0.072 0.071 1.074 0.934 1.234 0.314
education_grouped valueSome college education -0.552 0.088 0.576 0.484 0.685 0.000
education_grouped valueMaster or Doctorate -0.489 0.087 0.614 0.484 0.685 0.000
gender_birth valueFemale 0.804 0.075 2.234 1.930 2.591 0.000
heterosexual valueNo 0.253 0.078 1.288 1.105 1.498 0.001
household_income valueMiddle 0.220 0.112 1.246 0.995 1.543 0.049
household_income valueUpper -0.122 0.097 0.885 0.995 1.543 0.209
unemployed valueNo -0.263 0.072 0.769 0.669 0.885 0.000

On the sociodemographic domain, the only variables that did not reach the significance cut-off were the cubic term of age (age_3), skin color (color), and “Upper” level of socioeconomic status (household_income). Since there are three levels in socioeconomic status (household_income), this variable will continue in further investigations.

6.2. Lifestyle domain

Code
df_ls <- df_loneliness |> 
    dplyr::select(
        outcome,
        physical_activity_grouped,
        sleep_quality,
        family_relationship,
        friend_relationship,
        marital_status_grouped,
        combined_weight
    )

df_ls |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of predictors of the lifestyle domain."
    )
Bivariate analysis of predictors of the lifestyle domain.
Variable Category \(\beta\) SE RR RR (lower) RR (upper) p-value
family_relationship valueBetter -0.618 0.074 0.539 0.467 0.623 0
friend_relationship valueBetter -0.287 0.071 0.750 0.653 0.862 0
marital_status_grouped valueWith partner -0.306 0.074 0.736 0.636 0.851 0
physical_activity_grouped valueMore than 150min/week -1.771 0.349 0.170 0.079 0.315 0
sleep_quality valueBetter -1.301 0.095 0.272 0.225 0.327 0

In the lifestyle domain, all variables have reached the significance cut-off.

6.3. Social domain

Code
df_social <- df_loneliness |> 
    dplyr::select(
        outcome,
        number_people_house,
        combined_weight
    )

df_social |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of predictors of the social domain."
    )
Bivariate analysis of predictors of the social domain.
Variable Category \(\beta\) SE RR RR (lower) RR (upper) p-value
number_people_house value -0.146 0.024 0.864 0.824 0.906 0

On the social domain, we only have the variable number of people that lives in your house [including you] (number_people_house) and it has passed the \(p<0.2\) cut-off for the hierarchical multi-predictor regression.

6.4. Clinical domain

Code
df_clinical <- df_loneliness |> 
    dplyr::select(
        outcome,
        phq_total,
        gad_total,
        alcohol_risk,
        cannabis_use,
        combined_weight
    )

df_clinical |>
    dplyr::select(-alcohol_risk, -cannabis_use) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of depressive and anxiety symptoms."
    )
Bivariate analysis of depressive and anxiety symptoms.
Variable Category \(\beta\) SE RR RR (lower) RR (upper) p-value
gad_total value 0.531 0.034 1.70 1.591 1.818 0
phq_total value 0.513 0.029 1.67 1.577 1.769 0
Code
df_clinical |>
    dplyr::select(-phq_total, -gad_total) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of alcohol risk and cannabis use."
    )
Bivariate analysis of alcohol risk and cannabis use.
Variable Category \(\beta\) SE RR RR (lower) RR (upper) p-value
alcohol_risk valueHigh risk 0.916 0.072 2.500 2.172 2.875 0.000
cannabis_use valueYes 0.301 0.146 1.351 1.003 1.778 0.039

Regarding the clinical domain, all variables have reached the statistical significance cut-off.

7. Hierarchical multi-predictor Poisson regression

We sequentially added the significant variables from the last analyses into the Poisson regression, starting from the most distal domain to the most proximal one (sociodemographic \(\rightarrow\) lifestyle/social \(\rightarrow\) clinical). The variables that presented \(p<0.1\) in the entry stage remained in the model until the end, even if they lost their significance in later stages, and those with \(p<0.05\) were considered significant.

7.1. Domain-based models

Code
# Sociodemographic domain - first entry
df_loneliness |>
    glm(outcome ~ age + age_2 + education_grouped + gender_birth +
        heterosexual + household_income + number_people_house +
        marital_status_grouped,
        data = _,
        family = poisson(link = "log"),
        weights = combined_weight
    ) |>
    broom::tidy(conf.int = TRUE) |>
    dplyr::select(-statistic, -std.error) |>
    dplyr::mutate(dplyr::across(c(estimate, conf.low, conf.high),
                                exp)) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::relocate(p.value, .after = dplyr::last_col()) |>
    dplyr::relocate(estimate, .before = conf.low) |>
    dplyr::mutate(p_lt_01 = dplyr::if_else(p.value < 0.1, "Yes", "No")) |>
    purrr::set_names(
        "Variable", "RR", "RR (lower)", "RR (upper)", "$p$-value", "$p<0.1$"
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Weighted poisson regression with $p$-value
        $<$ 0.2 in bivariate analysis in sociodemographic domain - first entry."
    )
Weighted poisson regression with \(p\)-value \(<\) 0.2 in bivariate analysis in sociodemographic domain - first entry.
Variable RR RR (lower) RR (upper) \(p\)-value \(p<0.1\)
age 0.386 0.276 0.547 0.000 Yes
age_2 2.501 1.790 3.464 0.000 Yes
education_groupedSome college education 0.423 0.348 0.514 0.000 Yes
education_groupedMaster or Doctorate 0.458 0.372 0.563 0.000 Yes
gender_birthFemale 2.672 2.265 3.160 0.000 Yes
heterosexualNo 2.203 1.809 2.675 0.000 Yes
household_incomeMiddle 1.380 1.088 1.732 0.007 Yes
household_incomeUpper 1.311 1.030 1.660 0.026 Yes
number_people_house 0.891 0.845 0.939 0.000 Yes
marital_status_groupedWith partner 0.924 0.782 1.089 0.349 No
Code
# Lifestyle and social domain - second entry
df_loneliness |>
    glm(outcome ~ age + age_2 + education_grouped + gender_birth +
        heterosexual + household_income + number_people_house +
        marital_status_grouped + family_relationship + friend_relationship +
        physical_activity_grouped + sleep_quality,
        data = _,
        family = poisson(link = "log"),
        weights = combined_weight
    ) |>
    broom::tidy(conf.int = TRUE) |>
    dplyr::select(-statistic, -std.error) |>
    dplyr::mutate(dplyr::across(c(estimate, conf.low, conf.high),
                                exp)) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::relocate(p.value, .after = dplyr::last_col()) |>
    dplyr::relocate(estimate, .before = conf.low) |>
    dplyr::mutate(p_lt_01 = dplyr::if_else(p.value < 0.1, "Yes", "No")) |>
    purrr::set_names(
        "Variable", "RR", "RR (lower)", "RR (upper)", "$p$-value", "$p<0.1$"
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Weighted poisson regression with $p$-value
        $<$ 0.2 in bivariate analysis in lifestyle and social domain -
        second entry."
    )
Weighted poisson regression with \(p\)-value \(<\) 0.2 in bivariate analysis in lifestyle and social domain - second entry.
Variable RR RR (lower) RR (upper) \(p\)-value \(p<0.1\)
age 0.701 0.475 1.045 0.078 Yes
age_2 1.475 1.006 2.144 0.044 Yes
education_groupedSome college education 0.590 0.483 0.722 0.000 Yes
education_groupedMaster or Doctorate 0.467 0.376 0.581 0.000 Yes
gender_birthFemale 2.247 1.866 2.712 0.000 Yes
heterosexualNo 1.926 1.566 2.360 0.000 Yes
household_incomeMiddle 1.512 1.196 1.893 0.000 Yes
household_incomeUpper 1.605 1.265 2.026 0.000 Yes
number_people_house 0.921 0.869 0.974 0.004 Yes
marital_status_groupedWith partner 0.755 0.637 0.892 0.001 Yes
family_relationshipBetter 0.668 0.555 0.805 0.000 Yes
friend_relationshipBetter 0.973 0.820 1.156 0.758 No
physical_activity_groupedMore than 150min/week 0.150 0.069 0.284 0.000 Yes
sleep_qualityBetter 0.329 0.268 0.401 0.000 Yes
Code
# Clinical domain - third entry
df_loneliness |>
    glm(outcome ~ age + age_2 + education_grouped + gender_birth +
        heterosexual + household_income + number_people_house +
        marital_status_grouped + family_relationship + friend_relationship +
        physical_activity_grouped + sleep_quality + phq_total + gad_total +
        alcohol_risk + cannabis_use,
        data = _,
        family = poisson(link = "log"),
        weights = combined_weight
    ) |>
    broom::tidy(conf.int = TRUE) |>
    dplyr::select(-statistic, -std.error) |>
    dplyr::mutate(dplyr::across(c(estimate, conf.low, conf.high),
                                exp)) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::relocate(p.value, .after = dplyr::last_col()) |>
    dplyr::relocate(estimate, .before = conf.low) |>
    dplyr::mutate(p_lt_01 = dplyr::if_else(p.value < 0.1, "Yes", "No")) |>
    purrr::set_names(
        "Variable", "RR", "RR (lower)", "RR (upper)", "$p$-value", "$p<0.1$"
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Weighted poisson regression with $p$-value
        $<$ 0.2 in bivariate analysis in clinical domain - third entry."
    )
Weighted poisson regression with \(p\)-value \(<\) 0.2 in bivariate analysis in clinical domain - third entry.
Variable RR RR (lower) RR (upper) \(p\)-value \(p<0.1\)
age 0.362 0.235 0.560 0.000 Yes
age_2 2.751 1.807 4.160 0.000 Yes
education_groupedSome college education 0.686 0.552 0.853 0.001 Yes
education_groupedMaster or Doctorate 0.670 0.525 0.853 0.001 Yes
gender_birthFemale 1.830 1.476 2.266 0.000 Yes
heterosexualNo 1.474 1.188 1.822 0.000 Yes
household_incomeMiddle 1.357 1.067 1.708 0.011 Yes
household_incomeUpper 1.400 1.080 1.803 0.010 Yes
number_people_house 0.976 0.921 1.034 0.424 No
marital_status_groupedWith partner 0.887 0.741 1.061 0.191 No
family_relationshipBetter 0.786 0.650 0.951 0.013 Yes
friend_relationshipBetter 0.760 0.630 0.917 0.004 Yes
physical_activity_groupedMore than 150min/week 0.180 0.079 0.354 0.000 Yes
sleep_qualityBetter 0.515 0.413 0.640 0.000 Yes
phq_total 1.166 1.031 1.317 0.014 Yes
gad_total 1.218 1.071 1.386 0.003 Yes
alcohol_riskHigh risk 1.731 1.435 2.084 0.000 Yes
cannabis_useYes 1.835 1.306 2.526 0.000 Yes

7.2. Final model

Code
# Block 1: sociodemographic
block_1 <- glm(outcome ~ age + age_2 + education_grouped + gender_birth +
    heterosexual + household_income + number_people_house +
    marital_status_grouped,
    data = df_loneliness,
    family = poisson(link = "log"),
    weights = combined_weight
)

# Block 2: Lifestyle and social
block_2 <- glm(outcome ~ age + age_2 + education_grouped + gender_birth +
    heterosexual + household_income +
    number_people_house + family_relationship + friend_relationship +
    physical_activity_grouped + sleep_quality,
    data = df_loneliness,
    family = poisson(link = "log"),
    weights = combined_weight
)

# Block 3: Clinical
block_3 <- glm(outcome ~ age + age_2 + education_grouped + gender_birth +
    heterosexual + household_income + family_relationship +
    physical_activity_grouped + sleep_quality + number_people_house +
    phq_total + gad_total + alcohol_risk + cannabis_use,
    data = df_loneliness,
    family = poisson(link = "log"),
    weights = combined_weight
)

huxtable::huxreg(
  list("Block 1: Sociodemographic" = block_1,
       "Block 2: Lifestyle and social" = block_2,
       "Block 3: Clinical" = block_3),
  stars = NULL,
  error_format = "[{conf.low},  {conf.high}], {dplyr::if_else(p.value < 0.001, 'p<0.001', paste0('p=', as.character(p.value)))}",
  error_pos = "same",
  bold_signif = 0.05,
  number_format = "%.3f",
  statistics = c(n = "nobs",
                 "-2log likelihood" = "logLik",
                 AIC = "AIC",
                 BIC = "BIC"),
  ci_level = 0.95,
  outer_borders = 1,
  tidy_args = list(exponentiate = TRUE, conf.int = TRUE),
  coefs = c(
    "(Intercept)" = "(Intercept)",
    "Age (z-score)" = "age",
    "Age (quadratic term)" = "age_2",
    "Education (college education)" = "education_groupedSome college education",
    "Education (master or PhD)" = "education_groupedMaster or Doctorate",
    "Sex (female)" = "gender_birthFemale",
    "Sexual orientation (non-heterosexual)" = "heterosexualNo",
    "Socioeconomic status (middle)" = "household_incomeMiddle",
    "Socioeconomic status (upper)" = "household_incomeUpper",
    "Socioeconomic status (upper)" = "household_incomeUpper",
    "Number of people living in your house" = "number_people_house",
    "Marital status (with partner)" = "marital_status_groupedWith partner",
    "Positive family relationship" = "family_relationshipBetter",
    "Positive friendship relationship" = "friend_relationshipBetter",
    "Physical activity (more than 150 min/week)" = "physical_activity_groupedMore than 150min/week",
    "Good sleep quality" = "sleep_qualityBetter",
    "Depressive symptoms (PHQ-9)" = "phq_total",
    "Anxiety symptoms (GAD-7)" = "gad_total",
    "Alcohol risk zone (moderate, high or severe)" = "alcohol_riskHigh risk",
    "Cannabis use (yes)" = "cannabis_useYes")
) |>
    huxtable::set_caption("Multi-predictor Poisson regression analysis to
        evaluate factors associated with loneliness incidence. The table
        presents the risk ratios and the corresponding 95% CIs with p-values.")
Multi-predictor Poisson regression analysis to evaluate factors associated with loneliness incidence. The table presents the risk ratios and the corresponding 95% CIs with p-values.
Block 1: SociodemographicBlock 2: Lifestyle and socialBlock 3: Clinical
(Intercept)0.258 [0.202, 0.329], p<0.0010.448 [0.325, 0.616], p<0.0010.240 [0.169, 0.341], p<0.001
Age (z-score)0.386 [0.276, 0.547], p<0.0010.693 [0.471, 1.029], p=0.0660.343 [0.224, 0.529], p<0.001
Age (quadratic term)2.501 [1.790, 3.464], p<0.0011.475 [1.011, 2.133], p=0.0412.849 [1.881, 4.287], p<0.001
Education (college education)0.423 [0.348, 0.514], p<0.0010.567 [0.465, 0.691], p<0.0010.684 [0.551, 0.851], p<0.001
Education (master or PhD)0.458 [0.372, 0.563], p<0.0010.458 [0.368, 0.568], p<0.0010.705 [0.556, 0.893], p=0.004
Sex (female)2.672 [2.265, 3.160], p<0.0012.317 [1.926, 2.795], p<0.0011.719 [1.393, 2.119], p<0.001
Sexual orientation (non-heterosexual)2.203 [1.809, 2.675], p<0.0011.976 [1.612, 2.414], p<0.0011.518 [1.227, 1.872], p<0.001
Socioeconomic status (middle)1.380 [1.088, 1.732], p=0.0071.497 [1.185, 1.871], p<0.0011.388 [1.094, 1.743], p=0.006
Socioeconomic status (upper)1.311 [1.030, 1.660], p=0.0261.432 [1.139, 1.790], p=0.0021.361 [1.062, 1.735], p=0.014
Number of people living in your house0.891 [0.845, 0.939], p<0.0010.902 [0.852, 0.953], p<0.0010.968 [0.913, 1.024], p=0.262
Marital status (with partner)0.924 [0.782, 1.089], p=0.349          
Positive family relationship     0.689 [0.574, 0.828], p<0.0010.730 [0.609, 0.875], p<0.001
Positive friendship relationship     0.970 [0.817, 1.151], p=0.725     
Physical activity (more than 150 min/week)     0.161 [0.074, 0.304], p<0.0010.177 [0.078, 0.348], p<0.001
Good sleep quality     0.340 [0.278, 0.414], p<0.0010.483 [0.390, 0.594], p<0.001
Depressive symptoms (PHQ-9)          1.214 [1.080, 1.366], p=0.001
Anxiety symptoms (GAD-7)          1.191 [1.049, 1.352], p=0.007
Alcohol risk zone (moderate, high or severe)          1.579 [1.322, 1.882], p<0.001
Cannabis use (yes)          1.750 [1.250, 2.399], p<0.001
n473    473    473    
-2log likelihood-1793.697-1678.612-1621.319
AIC3609.3953385.2233276.637
BIC3655.1453443.4503347.342

7.3. Odds ratio plot

7.3.1. With sort

Code
or_plot <- plot(parameters::parameters(
        block_3,
        exponentiate = TRUE
        ), show_labels = TRUE, size_text = 6, size_point = 1.2, sort = "descending") +
     #ggplot2::scale_y_discrete(labels = c(
     #"Physical activity [More than 150min/week]",
     #"Age",
     #"Sleep quality [Better]",
     #"Education [Some college education]",
     #"Education [Master or Doctorate]",
     #"Family relationship [Better]",
     #"Number of people living in your house",
     #"Anxiety symptoms [GAD-7]",
     #"Depressive symptoms [PHQ-9]",
     #"Household income [Upper]",
     #"Household income [Middle]",
     #"Heterosexual [No]",
     #"Alcohol risk [High risk]",
     #"Gender birth [Female]",
     #"Cannabis use [Yes]",
     #"Age (quadratic term)")
     #) +
    ggplot2::labs(x = "RR") +
    ggplot2::theme(
        axis.text = ggplot2::element_text(size = 18, color = "black"),
        axis.title = ggplot2::element_text(size = 18, color = "black")
    )

ggplot2::ggsave(filename = "~/tmp/or_plot_loneliness_sorted.png", or_plot,
dpi = 300, width = 16, height = 10)
Code
or_plot <- plot(parameters::parameters(
        block_3,
        exponentiate = TRUE
        ), show_labels = TRUE, size_text = 6, size_point = 1.2) +
     #ggplot2::scale_y_discrete(labels = c(
     #"Cannabis use [Yes]",
     #"Alcohol risk [High risk]",
     #"Anxiety symptoms [GAD-7]",
     #"Depressive symptoms [PHQ-9]",
     #"Number of people living in your house",
     #"Sleep quality [Better]",
     #"Physical activity [More than 150min/week]",
     #"Family relationship [Better]",
     #"Household income [Upper]",
     #"Household income [Middle]",
     #"Heterosexual [No]",
     #"Gender birth [Female]",
     #"Education [Master or Doctorate]",
     #"Education [Some college education]",
     #"Age (quadratic term)",
     #"Age")
     #) +
    ggplot2::labs(x = "RR") +
    ggplot2::theme(
        axis.text = ggplot2::element_text(size = 18, color = "black"),
        axis.title = ggplot2::element_text(size = 18, color = "black")
    )

ggplot2::ggsave(filename = "~/tmp/or_plot_loneliness_unsorted.png", or_plot,
dpi = 300, width = 16, height = 10)

Session information

Code
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.0 (2024-04-24)
 os       Arch Linux
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Sao_Paulo
 date     2024-06-07
 pandoc   3.1.13 @ /usr/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version  date (UTC) lib source
 assertthat      0.2.1    2019-03-21 [1] CRAN (R 4.2.2)
 backports       1.5.0    2024-05-23 [1] CRAN (R 4.4.0)
 base64enc       0.1-3    2015-07-28 [1] CRAN (R 4.2.2)
 bayestestR      0.13.2   2024-02-12 [1] CRAN (R 4.3.2)
 broom           1.0.6    2024-05-17 [1] CRAN (R 4.4.0)
 broom.helpers   1.15.0   2024-04-05 [1] CRAN (R 4.3.3)
 broom.mixed     0.2.9.5  2024-04-01 [1] CRAN (R 4.3.3)
 cli             3.6.2    2023-12-11 [1] CRAN (R 4.3.2)
 coda            0.19-4.1 2024-01-31 [1] CRAN (R 4.3.2)
 codetools       0.2-20   2024-03-31 [2] CRAN (R 4.4.0)
 colorspace      2.1-0    2023-01-23 [1] CRAN (R 4.3.1)
 commonmark      1.9.1    2024-01-30 [1] CRAN (R 4.3.2)
 crayon          1.5.2    2022-09-29 [1] CRAN (R 4.2.2)
 datawizard      0.11.0   2024-06-05 [1] CRAN (R 4.4.0)
 DBI             1.2.3    2024-06-02 [1] CRAN (R 4.4.0)
 digest          0.6.35   2024-03-11 [1] CRAN (R 4.3.3)
 dplyr           1.1.4    2023-11-17 [1] CRAN (R 4.3.2)
 emmeans         1.10.2   2024-05-20 [1] CRAN (R 4.4.0)
 estimability    1.5.1    2024-05-12 [1] CRAN (R 4.4.0)
 evaluate        0.23     2023-11-01 [1] CRAN (R 4.3.2)
 fansi           1.0.6    2023-12-08 [1] CRAN (R 4.3.2)
 farver          2.1.2    2024-05-13 [1] CRAN (R 4.4.0)
 fastmap         1.2.0    2024-05-15 [1] CRAN (R 4.4.0)
 forcats         1.0.0    2023-01-29 [1] CRAN (R 4.2.2)
 furrr           0.3.1    2022-08-15 [1] CRAN (R 4.2.2)
 future          1.33.2   2024-03-26 [1] CRAN (R 4.3.3)
 generics        0.1.3    2022-07-05 [1] CRAN (R 4.2.2)
 ggplot2         3.5.1    2024-04-23 [1] CRAN (R 4.3.3)
 globals         0.16.3   2024-03-08 [1] CRAN (R 4.3.3)
 glue            1.7.0    2024-01-09 [1] CRAN (R 4.3.2)
 gt              0.10.1   2024-01-17 [1] CRAN (R 4.3.2)
 gtable          0.3.5    2024-04-22 [1] CRAN (R 4.3.3)
 gtsummary       1.7.2    2023-07-15 [1] CRAN (R 4.3.1)
 haven           2.5.4    2023-11-30 [1] CRAN (R 4.3.2)
 hms             1.1.3    2023-03-21 [1] CRAN (R 4.2.3)
 htmltools       0.5.8.1  2024-04-04 [1] CRAN (R 4.3.3)
 htmlwidgets     1.6.4    2023-12-06 [1] CRAN (R 4.3.2)
 huxtable        5.5.6    2024-02-15 [1] CRAN (R 4.3.2)
 insight         0.20.0   2024-06-04 [1] CRAN (R 4.4.0)
 janitor         2.2.0    2023-02-02 [1] CRAN (R 4.2.2)
 jsonlite        1.8.8    2023-12-04 [1] CRAN (R 4.3.2)
 knitr           1.47     2024-05-29 [1] CRAN (R 4.4.0)
 labeling        0.4.3    2023-08-29 [1] CRAN (R 4.3.1)
 later           1.3.2    2023-12-06 [1] CRAN (R 4.3.2)
 lattice         0.22-6   2024-03-20 [2] CRAN (R 4.4.0)
 lifecycle       1.0.4    2023-11-07 [1] CRAN (R 4.3.2)
 listenv         0.9.1    2024-01-29 [1] CRAN (R 4.3.2)
 lubridate       1.9.3    2023-09-27 [1] CRAN (R 4.3.1)
 magrittr        2.0.3    2022-03-30 [1] CRAN (R 4.2.2)
 markdown        1.13     2024-06-04 [1] CRAN (R 4.4.0)
 MASS            7.3-60.2 2024-04-26 [1] CRAN (R 4.4.0)
 Matrix          1.7-0    2024-04-26 [1] CRAN (R 4.4.0)
 mitools         2.4      2019-04-26 [1] CRAN (R 4.2.2)
 multcomp        1.4-25   2023-06-20 [1] CRAN (R 4.3.1)
 munsell         0.5.1    2024-04-01 [1] CRAN (R 4.3.3)
 mvtnorm         1.2-5    2024-05-21 [1] CRAN (R 4.4.0)
 nlme            3.1-164  2023-11-27 [1] CRAN (R 4.3.2)
 parallelly      1.37.1   2024-02-29 [1] CRAN (R 4.3.2)
 parameters      0.21.7   2024-05-14 [1] CRAN (R 4.4.0)
 pillar          1.9.0    2023-03-22 [1] CRAN (R 4.2.3)
 pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 4.2.2)
 processx        3.8.4    2024-03-16 [1] CRAN (R 4.3.3)
 ps              1.7.6    2024-01-18 [1] CRAN (R 4.3.2)
 purrr           1.0.2    2023-08-10 [1] CRAN (R 4.3.1)
 quarto          1.4      2024-03-06 [1] CRAN (R 4.3.3)
 R6              2.5.1    2021-08-19 [1] CRAN (R 4.2.2)
 ragg            1.3.2    2024-05-15 [1] CRAN (R 4.4.0)
 Rcpp            1.0.12   2024-01-09 [1] CRAN (R 4.3.2)
 readr           2.1.5    2024-01-10 [1] CRAN (R 4.3.2)
 repr            1.1.7    2024-03-22 [1] CRAN (R 4.3.3)
 rlang           1.1.4    2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown       2.27     2024-05-17 [1] CRAN (R 4.4.0)
 rstudioapi      0.16.0   2024-03-24 [1] CRAN (R 4.3.3)
 sandwich        3.1-0    2023-12-11 [1] CRAN (R 4.3.2)
 sass            0.4.9    2024-03-15 [1] CRAN (R 4.3.3)
 scales          1.3.0    2023-11-28 [1] CRAN (R 4.3.2)
 see             0.8.4    2024-04-29 [1] CRAN (R 4.4.0)
 sessioninfo     1.2.2    2021-12-06 [1] CRAN (R 4.2.2)
 skimr           2.1.5    2022-12-23 [1] CRAN (R 4.2.3)
 snakecase       0.11.1   2023-08-27 [1] CRAN (R 4.3.1)
 stringi         1.8.4    2024-05-06 [1] CRAN (R 4.4.0)
 stringr         1.5.1    2023-11-14 [1] CRAN (R 4.3.2)
 survey          4.4-2    2024-03-20 [1] CRAN (R 4.3.3)
 survival        3.7-0    2024-06-05 [1] CRAN (R 4.4.0)
 systemfonts     1.1.0    2024-05-15 [1] CRAN (R 4.4.0)
 textshaping     0.4.0    2024-05-24 [1] CRAN (R 4.4.0)
 TH.data         1.1-2    2023-04-17 [1] CRAN (R 4.3.0)
 tibble          3.2.1    2023-03-20 [1] CRAN (R 4.2.3)
 tidyr           1.3.1    2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect      1.2.1    2024-03-11 [1] CRAN (R 4.3.3)
 timechange      0.3.0    2024-01-18 [1] CRAN (R 4.3.2)
 tzdb            0.4.0    2023-05-12 [1] CRAN (R 4.3.0)
 utf8            1.2.4    2023-10-22 [1] CRAN (R 4.3.1)
 vctrs           0.6.5    2023-12-01 [1] CRAN (R 4.3.2)
 withr           3.0.0    2024-01-16 [1] CRAN (R 4.3.2)
 xfun            0.44     2024-05-15 [1] CRAN (R 4.4.0)
 xml2            1.3.6    2023-12-04 [1] CRAN (R 4.3.2)
 xtable          1.8-4    2019-04-21 [1] CRAN (R 4.2.2)
 yaml            2.3.8    2023-12-11 [1] CRAN (R 4.3.2)
 zoo             1.8-12   2023-04-13 [1] CRAN (R 4.3.0)

 [1] /home/bruno/.local/share/r
 [2] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────