Lifestyle as a predictor of loneliness: a longitudinal study

Data analysis code

Author

Affiliation

Bruno Braga Montezano

Federal University of Rio Grande do Sul

Published

June 7, 2024

1. Load the data

Code

df <- haven::read_sav("./data/covidpsy_weighted.sav") |>
    janitor::clean_names() |>
    haven::as_factor()

First, we used the read_sav function from the haven package to load the dataset. The original data has 8102 rows and 477 columns. Although, in the next step we will filter the data with subjects that present valid values for UCLA (UCLA Loneliness Scale 3-item version) in the first and fourth waves and evaluate just those that did not have loneliness at baseline (W1). We will also select only the variables to be used on the data routines. The clean_names function from the janitor package was used to clean the variable names in order to facilitate data manipulation later on.

2. Select variables of interest and filter the data

Code

df_loneliness <- df |> dplyr::select(
    ucla_total,
    ucla_total_w4,
    gender_birth,
    heterosexual = heterossexual,
    color,
    year_birth,
    education,
    education_grouped,
    household_income,
    number_people_house,
    unemployed,
    social_distancing,
    phq_total,
    gad_total,
    classif_audit,
    marijuana_frequency,
    physical_activity_dichotomized,
    physical_activity_grouped,
    sleep_quality,
    marital_status_grouped,
    friend_relationship,
    family_relationship,
    religion,
    sampling_weight,
    ipw_w4
) |>
    dplyr::filter(
        !is.na(ucla_total),
        !is.na(ucla_total_w4)
    )

In this step, we selected 25 variables. They are the following: UCLA total score on W1 (ucla_total), UCLA total score on W4 (ucla_total_w4), gender at birth (gender_birth), heterosexual (heterosexual), skin color (color), birth year (year_birth), seven categories of education (education), three categories of education (education_grouped), five levels of household income (household_income), number of people in your house including you (number_people_house), unemployment (unemployed), social distancing (social_distancing), PHQ-9 total score at W1 (phq_total), GAD-7 total score at W1 (gad_total), AUDIT risk [4 categories] (classif_audit), cannabis use frequency (marijuana_frequency), physical activity [binary variable] (physical_acitivty_dichotomized), physical activity [more or less than 150min/week] (physical_activity_grouped), sleep quality [four categories] (sleep_quality), marital status [with or without partner] (marital_status_grouped), friend relationship quality (friend_relationship), family relationship quality (family_relationship), has a religion (religion), sampling weight for demographics (sampling_weight), weight for attrition at W4 (ipw_w4).

After the removal of subjects with missing values on UCLA at W1 or W4, we ended up with a dataset of 944 subjects.

Code

df_loneliness <- df_loneliness |>
    dplyr::filter(ucla_total < 6)

After removing the subjects with loneliness (UCLA \(\geq\) 6) at baseline (W1), there were 473 participants remaining.

3. Exploratory data analysis

In this section, we use the skimr package to create a detailed summary for each variable in the dataset in order to get an overview of the data.

3.1. Skim the data

Code

skimr::skim(df_loneliness)

Data summary
Name	df_loneliness
Number of rows	473
Number of columns	25
_______________________
Column type frequency:
factor	17
numeric	8
________________________
Group variables	None

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
gender_birth	1	FALSE	2	Fem: 412, Mal: 61
heterosexual	1	FALSE	2	sim: 414, não: 59
color	1	FALSE	5	Bra: 278, Par: 116, Pre: 72, Ama: 6
education	1	FALSE	6	Pós: 249, Ens: 110, Ens: 79, Ens: 32
education_grouped	1	FALSE	3	Mas: 249, Som: 189, Up : 35
household_income	1	FALSE	5	B: 180, D: 114, C: 108, A: 50
unemployed	1	FALSE	2	No: 416, Yes: 57
social_distancing	1	FALSE	2	Sim: 446, Não: 27
classif_audit	1	FALSE	4	bai: 345, ris: 96, alt: 16, ris: 16
marijuana_frequency	1	FALSE	3	Não: 454, Sem: 13, Men: 6
physical_activity_dichotomized	1	FALSE	2	yes: 259, no: 214
physical_activity_grouped	1	FALSE	2	Les: 424, Mor: 49
sleep_quality	1	FALSE	4	Reg: 168, Bom: 162, Rui: 101, Exc: 42
marital_status_grouped	1	FALSE	2	Wit: 259, Wit: 214
friend_relationship	1	FALSE	4	Bon: 247, Reg: 142, Exc: 56, Rui: 28
family_relationship	1	FALSE	4	Bon: 266, Reg: 120, Exc: 76, Rui: 11
religion	1	FALSE	2	Não: 244, Sim: 229

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
ucla_total	1	4.06	0.82	3.00	3.00	4.00	5.00	5.00	▆▁▇▁▇
ucla_total_w4	1	4.49	1.60	3.00	3.00	4.00	6.00	9.00	▇▂▂▁▁
year_birth	1	1981.33	12.10	1946.00	1974.00	1983.00	1991.00	2002.00	▁▃▅▇▆
number_people_house	1	3.08	1.43	1.00	2.00	3.00	4.00	10.00	▇▇▂▁▁
phq_total	1	9.96	6.26	0.00	5.00	9.00	14.00	26.00	▇▇▆▃▂
gad_total	1	9.63	5.51	0.00	5.00	9.00	14.00	21.00	▆▇▇▅▅
sampling_weight	1	0.71	1.93	0.12	0.15	0.21	0.44	20.00	▇▁▁▁▁
ipw_w4	1	8.16	3.07	5.11	5.87	7.22	9.66	19.16	▇▃▁▁▁

4. Feature engineering

In this section, we are going to modify variables and create some new ones in order to facilitate the data analysis. We’ll go through the following steps:

Create an outcome variable based on UCLA total score at W4. Scores \(\geq\) 6 will be positive (1) and values \(<\) 6 will be negative (0).
Build an age variable based on year_birth.
Create quadratic and cubic terms (variables) for age (i.e. age_2 and age_3). We will also standardize all age variables with \(z\)-scores.
Build a combined_weight variable that is the product of sampling_weight and ipw_w4 to weight analysis for both demographics and attrition.
Standardize PHQ-9 (phq_total) and GAD-7 (gad_total) scores with \(z\)-scores to facilitate the interpretability of the coefficients.
Dichotomize skin color variable (color) in white and non-white subjects.
Dichotomize the AUDIT risk variable (classif_audit) into two levels: low risk and moderate/high/severe risk.
Dichotomize cannabis use variable (marijuana frequency) into “uses cannabis” and “does not use cannabis”.
Dichotomize the variables of quality of family relationships (family_relationship), quality of friendship relationships (friend_relationship), and sleep quality (sleep_quality) into two levels: Bad/Fair and Good/Excellent.
Categorize the household income variable (household_income) into three levels of socioeconomic status: A/B (upper), C (middle) and D/E (lower).
Recode all variables to easy and interpretable category labels.
Remove the following variables: education, physical_activity_dichotomized, ucla_total, ucla_total_w4, year_birth, marijuana_frequency, classif_audit.
Relocate the variables (not necessary per se).

Code

df_loneliness <- df_loneliness |>
    dplyr::mutate(
        # Step 1: Outcome variable based on UCLA score at W4
        outcome = dplyr::if_else(ucla_total_w4 >= 6, 1, 0),
        # Step 2: Age variable based on `year_birth`
        age = 2020 - year_birth,
        age_raw = age,
        # Step 3: Create quadratic and cubic terms for age
        age_2 = age ^ 2,
        age_3 = age ^ 3,
        dplyr::across(c(dplyr::starts_with("age"), -age_raw),
            \(x) (x - mean(x)) / sd(x)
        ),
        # Step 4: Build combined weight variable
        combined_weight = sampling_weight * ipw_w4,
        # Step 5: Standardize PHQ-9 and GAD-7 scores
        dplyr::across(
            dplyr::matches("^(phq|gad)_total$"),
            \(x) (x - mean(x)) / sd(x)
        ),
        # Step 6: Dichotomize skin color in white/non-white subjects
        color = as.factor(dplyr::case_match(color,
            "Branca" ~ "White",
            c("Parda", "Preta", "Amarela", "Indigena") ~ "Non-white"
        )),
        # Step 7: Dichotomize AUDIT risk score
        alcohol_risk = as.factor(dplyr::case_match(
            classif_audit,
            "baixo risco" ~ "Low risk",
            c("risco moderado", "alto risco", "risco severo") ~ "High risk"
        )),
        # Step 8: Create cannabis use variable
        cannabis_use = as.factor(dplyr::case_match( 
            marijuana_frequency,
            "Não" ~ "No",
            c("Menos que semanalmente", "Semanalmente ou Diariamente") ~ "Yes"
        )),
        # Step 9: Recode sleep quality, friend and family relationships
        dplyr::across(
            c(sleep_quality, friend_relationship, family_relationship),
            \(x) as.factor(dplyr::case_match(
                as.character(x),
                c("Ruim", "Ruins", "Regular", "Regulares") ~ "Worse",
                c("Bom", "Bons", "Excelente", "Excelentes") ~ "Better"
    ) |> forcats::fct_rev())
        ),
        # Step 10: Create three levels of socioeconomic status
        household_income = as.factor(dplyr::case_match(
            household_income,
            c("A", "B") ~ "Upper",
            "C" ~ "Middle",
            c("D", "E") ~ "Lower"
        )),
        # Step 11: Recode some variable labels and adjust reference levels
        heterosexual = relevel(as.factor(dplyr::case_match(
            heterosexual,
            "não" ~ "No",
            "sim" ~ "Yes"
        )), ref = "Yes"),
        religion = relevel(as.factor(dplyr::case_match(
            religion,
            "Não" ~ "No",
            "Sim" ~ "Yes"
        )), ref = "No"),
        social_distancing = as.factor(dplyr::case_match(
            social_distancing,
            "Sim" ~ "Yes",
            "Não" ~ "No"
        )),
        education_grouped = relevel(
            education_grouped,
            ref = "Up to high-school"
        ),
        alcohol_risk = relevel(
            alcohol_risk,
            ref = "Low risk"
        ),
        gender_birth = relevel(
            gender_birth,
            ref = "Male"
        ),
    ) |>
    # Step 12: Remove some variables
    dplyr::select(-c(education, physical_activity_dichotomized,
                     ucla_total, ucla_total_w4, year_birth,
                     marijuana_frequency, classif_audit)) |>
    # Step 13: Relocate the outcome variable
    dplyr::relocate(outcome, .before = dplyr::everything())

The dataset is formatted as below:

Code

knitr::kable(head(df_loneliness, 10))

outcome	gender_birth	heterosexual	color	education_grouped	household_income	number_people_house	unemployed	social_distancing	phq_total	gad_total	physical_activity_grouped	sleep_quality	marital_status_grouped	friend_relationship	family_relationship	religion	sampling_weight	ipw_w4	age	age_raw	age_2	age_3	combined_weight	alcohol_risk	cannabis_use
0	Female	Yes	Non-white	Some college education	Lower	2	No	Yes	-0.7935850	-1.3851317	Less than 150min/week	Better	With partner	Worse	Worse	Yes	0.3063481	12.300424	-1.0476520	26	-0.9445699	-0.8243562	3.7682117	Low risk	No
0	Female	Yes	White	Some college education	Lower	5	No	Yes	-0.1541204	-0.6589834	Less than 150min/week	Worse	Without partner	Better	Worse	No	0.4129219	5.857325	-1.3782871	22	-1.1323569	-0.9226467	2.4186181	Low risk	No
0	Female	Yes	White	Master or Doctorate	Upper	3	No	Yes	0.6452103	0.0671649	Less than 150min/week	Worse	With partner	Worse	Worse	No	0.2446609	5.910265	1.1014763	52	1.0389307	0.9211506	1.4460104	Low risk	Yes
0	Female	Yes	Non-white	Master or Doctorate	Upper	2	No	Yes	-1.1133173	-0.1143722	Less than 150min/week	Worse	With partner	Better	Better	Yes	0.1504424	6.925452	1.4321114	56	1.4614516	1.4178239	1.0418815	High risk	No
0	Female	Yes	White	Some college education	Upper	2	No	Yes	1.1248087	0.4302390	Less than 150min/week	Worse	Without partner	Better	Better	Yes	0.1970159	8.664241	1.3494526	55	1.3528872	1.2867179	1.7069930	Low risk	No
0	Female	Yes	Non-white	Some college education	Upper	5	No	Yes	-0.4738527	-0.6589834	Less than 150min/week	Better	With partner	Better	Better	No	0.1352377	9.693859	-1.3782871	22	-1.1323569	-0.9226467	1.3109755	Low risk	No
0	Female	Yes	White	Some college education	Lower	5	Yes	No	0.9649426	1.7009985	Less than 150min/week	Worse	With partner	Worse	Worse	No	0.6710622	8.782572	0.1095709	40	-0.0408448	-0.1657194	5.8936524	Low risk	No
1	Female	Yes	Non-white	Some college education	Upper	2	Yes	Yes	-0.1541204	0.0671649	Less than 150min/week	Better	With partner	Worse	Worse	No	0.1287050	5.551102	-0.7996756	29	-0.7831904	-0.7276973	0.7144546	High risk	No
0	Female	Yes	White	Master or Doctorate	Upper	2	No	No	-0.9534511	-0.8405205	More than 150min/week	Better	Without partner	Worse	Better	Yes	0.1343119	5.111151	0.0269122	39	-0.1181113	-0.2321307	0.6864886	Low risk	No
0	Female	No	White	Some college education	Lower	2	No	Yes	1.1248087	1.1563873	Less than 150min/week	Better	Without partner	Better	Better	Yes	9.9837119	6.017146	0.1922297	41	0.0383779	-0.0959031	60.0734511	Low risk	No

5. Descriptive analysis

5.1. Loneliness incidence

Code

df_loneliness |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    dplyr::count(outcome) |>
    ggplot2::ggplot(ggplot2::aes(y = n, x = outcome, label = n)) +
    ggplot2::geom_col(fill = "steelblue4") +
    ggplot2::theme_light(14, "IBM Plex Sans") +
    ggplot2::geom_label(
        size = 4,
        family = "IBM Plex Sans",
        label.padding = ggplot2::unit(0.5, "lines")
    ) +
    ggplot2::labs(x = "Loneliness at W4", y = "# of participants")

5.2. Weighted descriptive tables

In this section we will create three tables: the first, weighted from demographics (using sampling_weight), the second weighted based on the attrition (ipw_w4) associated with the fourth wave (W4), and the third based on the combined weight from demographics and also attrition (sampling_weight * ipw_w4).

Code

df_loneliness |>
    dplyr::select(-c(ipw_w4, age, age_2, age_3, combined_weight)) |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    survey::svydesign(
        ~ 1,
        data = _,
        weights = ~ sampling_weight) |>
    gtsummary::tbl_svysummary(
        by = outcome,
        include = -c(sampling_weight),
        digits = list(
            gtsummary::all_categorical() ~ c(0, 1, 1),
            gtsummary::all_continuous() ~ 2
        ),
        statistic = list(
            gtsummary::all_categorical() ~ "{n_unweighted} ({p_unweighted}%/{p}%)"
        ),
        label = list(
            gender_birth ~ "Sex",
            heterosexual ~ "Sexual orientation (heterosexual)",
            color ~ "Skin color",
            education_grouped ~ "Education",
            household_income ~ "Socioeconomic status",
            number_people_house ~ "Number of people in your house (including you)",
            unemployed ~ "Unemployed",
            social_distancing ~ "Social distancing",
            phq_total ~ "Depressive symptoms (PHQ-9)",
            gad_total ~ "Anxiety symptoms (GAD-7)",
            physical_activity_grouped ~ "Physical activity",
            sleep_quality ~ "Sleep quality",
            marital_status_grouped ~ "Marital status (grouped)",
            friend_relationship ~ "Friendship relationship quality",
            family_relationship ~ "Family relationship quality",
            religion ~ "Religion",
            age_raw ~ "Age (in years)",
            alcohol_risk ~ "Alcohol risk zone",
            cannabis_use ~ "Cannabis use"
        )
    ) |>
    gtsummary::add_overall(last = TRUE) |>
    gtsummary::add_p() |>
    gtsummary::modify_header(
        label = '**Characteristic**',
        stat_1 = '**No**, *n* = 348',
        stat_2 = '**Yes**, *n* = 125',
        stat_0 = '**Overall**, *n* = 473',
        p.value = '***p*-value**'
    ) |>
    gtsummary::modify_caption("Descriptive table stratified by loneliness
                              incidence, weighted by demographics.")

Descriptive table stratified by loneliness incidence, weighted by demographics.
Characteristic	No, n = 348¹	Yes, n = 125¹	Overall, n = 473¹	p-value²
Sex				0.10
Male	45 (12.9%/43.7%)	16 (12.8%/23.8%)	61 (12.9%/38.6%)
Female	303 (87.1%/56.3%)	109 (87.2%/76.2%)	412 (87.1%/61.4%)
Sexual orientation (heterosexual)	309 (88.8%/79.1%)	105 (84.0%/77.9%)	414 (87.5%/78.8%)	>0.9
Skin color				0.8
Non-white	136 (39.1%/48.6%)	59 (47.2%/45.9%)	195 (41.2%/47.9%)
White	212 (60.9%/51.4%)	66 (52.8%/54.1%)	278 (58.8%/52.1%)
Education				0.4
Up to high-school	20 (5.7%/12.8%)	15 (12.0%/29.4%)	35 (7.4%/17.1%)
Some college education	135 (38.8%/42.6%)	54 (43.2%/32.6%)	189 (40.0%/40.0%)
Master or Doctorate	193 (55.5%/44.6%)	56 (44.8%/38.0%)	249 (52.6%/42.9%)
Socioeconomic status				0.5
Lower	93 (26.7%/66.6%)	42 (33.6%/71.0%)	135 (28.5%/67.8%)
Middle	75 (21.6%/10.7%)	33 (26.4%/13.9%)	108 (22.8%/11.5%)
Upper	180 (51.7%/22.6%)	50 (40.0%/15.1%)	230 (48.6%/20.7%)
Number of people in your house (including you)	3.00 (2.00, 4.00)	2.00 (1.00, 3.00)	3.00 (2.00, 4.00)	0.4
Unemployed	34 (9.8%/29.5%)	23 (18.4%/38.1%)	57 (12.1%/31.7%)	0.6
Social distancing	327 (94.0%/95.4%)	119 (95.2%/97.5%)	446 (94.3%/95.9%)	0.4
Depressive symptoms (PHQ-9)	-0.63 (-1.27, 0.65)	0.56 (0.01, 1.95)	-0.15 (-0.95, 1.12)	0.001
Anxiety symptoms (GAD-7)	-0.48 (-1.02, 0.79)	0.73 (-0.11, 1.27)	-0.11 (-1.02, 1.09)	0.005
Physical activity				<0.001
Less than 150min/week	305 (87.6%/88.8%)	119 (95.2%/98.4%)	424 (89.6%/91.3%)
More than 150min/week	43 (12.4%/11.2%)	6 (4.8%/1.6%)	49 (10.4%/8.7%)
Sleep quality				0.003
Worse	185 (53.2%/48.8%)	84 (67.2%/81.1%)	269 (56.9%/57.1%)
Better	163 (46.8%/51.2%)	41 (32.8%/18.9%)	204 (43.1%/42.9%)
Marital status (grouped)				0.2
Without partner	146 (42.0%/56.8%)	68 (54.4%/72.2%)	214 (45.2%/60.7%)
With partner	202 (58.0%/43.2%)	57 (45.6%/27.8%)	259 (54.8%/39.3%)
Friendship relationship quality				0.4
Worse	117 (33.6%/33.8%)	53 (42.4%/44.9%)	170 (35.9%/36.7%)
Better	231 (66.4%/66.2%)	72 (57.6%/55.1%)	303 (64.1%/63.3%)
Family relationship quality				0.3
Worse	85 (24.4%/21.7%)	46 (36.8%/34.1%)	131 (27.7%/24.8%)
Better	263 (75.6%/78.3%)	79 (63.2%/65.9%)	342 (72.3%/75.2%)
Religion	165 (47.4%/53.9%)	64 (51.2%/65.1%)	229 (48.4%/56.8%)	0.4
Age (in years)	44.00 (38.00, 52.00)	43.00 (32.00, 53.78)	44.00 (35.00, 53.33)	>0.9
Alcohol risk zone				0.056
Low risk	253 (72.7%/82.9%)	92 (73.6%/59.2%)	345 (72.9%/76.8%)
High risk	95 (27.3%/17.1%)	33 (26.4%/40.8%)	128 (27.1%/23.2%)
Cannabis use	14 (4.0%/5.4%)	5 (4.0%/5.2%)	19 (4.0%/5.4%)	>0.9
¹ n (unweighted) (% (unweighted)/%); Median (IQR)
² chi-squared test with Rao & Scott’s second-order correction; Wilcoxon rank-sum test for complex survey samples

Code

df_loneliness |>
    dplyr::select(-c(sampling_weight, age, age_2, age_3, combined_weight)) |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    survey::svydesign(
        ~ 1,
        data = _,
        weights = ~ ipw_w4) |>
    gtsummary::tbl_svysummary(
        by = outcome,
        include = -c(ipw_w4),
        digits = list(
            gtsummary::all_categorical() ~ c(0, 1, 1),
            gtsummary::all_continuous() ~ 2
        ),
        statistic = list(
            gtsummary::all_categorical() ~ "{n_unweighted} ({p_unweighted}%/{p}%)"
        ),
        label = list(
            gender_birth ~ "Sex",
            heterosexual ~ "Sexual orientation (heterosexual)",
            color ~ "Skin color",
            education_grouped ~ "Education",
            household_income ~ "Socioeconomic status",
            number_people_house ~ "Number of people in your house (including you)",
            unemployed ~ "Unemployed",
            social_distancing ~ "Social distancing",
            phq_total ~ "Depressive symptoms (PHQ-9)",
            gad_total ~ "Anxiety symptoms (GAD-7)",
            physical_activity_grouped ~ "Physical activity",
            sleep_quality ~ "Sleep quality",
            marital_status_grouped ~ "Marital status (grouped)",
            friend_relationship ~ "Friendship relationship quality",
            family_relationship ~ "Family relationship quality",
            religion ~ "Religion",
            age_raw ~ "Age (in years)",
            alcohol_risk ~ "Alcohol risk zone",
            cannabis_use ~ "Cannabis use"
        )
    ) |>
    gtsummary::add_overall(last = TRUE) |>
    gtsummary::add_p() |>
    gtsummary::modify_header(
        label = '**Characteristic**',
        stat_1 = '**No**, *n* = 348',
        stat_2 = '**Yes**, *n* = 125',
        stat_0 = '**Overall**, *n* = 473',
        p.value = '***p*-value**'
    ) |>
    gtsummary::modify_caption("Descriptive table stratified by loneliness
                              incidence, weighted by attrition.")

Descriptive table stratified by loneliness incidence, weighted by attrition.
Characteristic	No, n = 348¹	Yes, n = 125¹	Overall, n = 473¹	p-value²
Sex				>0.9
Male	45 (12.9%/18.9%)	16 (12.8%/18.5%)	61 (12.9%/18.8%)
Female	303 (87.1%/81.1%)	109 (87.2%/81.5%)	412 (87.1%/81.2%)
Sexual orientation (heterosexual)	309 (88.8%/87.9%)	105 (84.0%/81.2%)	414 (87.5%/86.1%)	0.10
Skin color				0.041
Non-white	136 (39.1%/44.2%)	59 (47.2%/55.6%)	195 (41.2%/47.3%)
White	212 (60.9%/55.8%)	66 (52.8%/44.4%)	278 (58.8%/52.7%)
Education				0.023
Up to high-school	20 (5.7%/8.1%)	15 (12.0%/17.6%)	35 (7.4%/10.6%)
Some college education	135 (38.8%/42.2%)	54 (43.2%/43.2%)	189 (40.0%/42.5%)
Master or Doctorate	193 (55.5%/49.7%)	56 (44.8%/39.3%)	249 (52.6%/46.9%)
Socioeconomic status				0.2
Lower	93 (26.7%/32.2%)	42 (33.6%/39.9%)	135 (28.5%/34.3%)
Middle	75 (21.6%/21.8%)	33 (26.4%/24.0%)	108 (22.8%/22.4%)
Upper	180 (51.7%/46.0%)	50 (40.0%/36.0%)	230 (48.6%/43.3%)
Number of people in your house (including you)	3.00 (2.00, 4.00)	3.00 (2.00, 4.00)	3.00 (2.00, 4.00)	0.2
Unemployed	34 (9.8%/10.4%)	23 (18.4%/22.4%)	57 (12.1%/13.7%)	0.003
Social distancing	327 (94.0%/93.5%)	119 (95.2%/95.7%)	446 (94.3%/94.1%)	0.4
Depressive symptoms (PHQ-9)	-0.31 (-0.95, 0.59)	0.33 (-0.15, 1.12)	-0.15 (-0.79, 0.65)	<0.001
Anxiety symptoms (GAD-7)	-0.30 (-0.84, 0.43)	0.29 (-0.30, 1.34)	-0.11 (-0.84, 0.79)	<0.001
Physical activity				0.010
Less than 150min/week	305 (87.6%/89.9%)	119 (95.2%/96.6%)	424 (89.6%/91.7%)
More than 150min/week	43 (12.4%/10.1%)	6 (4.8%/3.4%)	49 (10.4%/8.3%)
Sleep quality				0.003
Worse	185 (53.2%/54.3%)	84 (67.2%/70.4%)	269 (56.9%/58.6%)
Better	163 (46.8%/45.7%)	41 (32.8%/29.6%)	204 (43.1%/41.4%)
Marital status (grouped)				0.023
Without partner	146 (42.0%/39.2%)	68 (54.4%/51.8%)	214 (45.2%/42.7%)
With partner	202 (58.0%/60.8%)	57 (45.6%/48.2%)	259 (54.8%/57.3%)
Friendship relationship quality				0.11
Worse	117 (33.6%/34.8%)	53 (42.4%/43.6%)	170 (35.9%/37.2%)
Better	231 (66.4%/65.2%)	72 (57.6%/56.4%)	303 (64.1%/62.8%)
Family relationship quality				0.013
Worse	85 (24.4%/26.0%)	46 (36.8%/39.0%)	131 (27.7%/29.5%)
Better	263 (75.6%/74.0%)	79 (63.2%/61.0%)	342 (72.3%/70.5%)
Religion	165 (47.4%/47.7%)	64 (51.2%/51.4%)	229 (48.4%/48.7%)	0.5
Age (in years)	37.00 (29.00, 48.00)	35.00 (27.00, 42.00)	37.00 (28.00, 46.00)	0.025
Alcohol risk zone				0.9
Low risk	253 (72.7%/70.6%)	92 (73.6%/71.5%)	345 (72.9%/70.9%)
High risk	95 (27.3%/29.4%)	33 (26.4%/28.5%)	128 (27.1%/29.1%)
Cannabis use	14 (4.0%/4.0%)	5 (4.0%/5.3%)	19 (4.0%/4.4%)	0.6
¹ n (unweighted) (% (unweighted)/%); Median (IQR)
² chi-squared test with Rao & Scott’s second-order correction; Wilcoxon rank-sum test for complex survey samples

Code

df_loneliness |>
    dplyr::select(-c(ipw_w4, age, age_2, age_3, sampling_weight)) |>
    dplyr::mutate(outcome = dplyr::if_else(outcome == 1, "Yes", "No")) |>
    survey::svydesign(
        ~ 1,
        data = _,
        weights = ~ combined_weight) |>
    gtsummary::tbl_svysummary(
        by = outcome,
        include = -c(combined_weight),
        digits = list(
            gtsummary::all_categorical() ~ c(0, 1, 1),
            gtsummary::all_continuous() ~ 2
        ),
        statistic = list(
            gtsummary::all_categorical() ~ "{n_unweighted} ({p_unweighted}%/{p}%)"
        ),
        label = list(
            gender_birth ~ "Sex",
            heterosexual ~ "Sexual orientation (heterosexual)",
            color ~ "Skin color",
            education_grouped ~ "Education",
            household_income ~ "Socioeconomic status",
            number_people_house ~ "Number of people in your house (including you)",
            unemployed ~ "Unemployed",
            social_distancing ~ "Social distancing",
            phq_total ~ "Depressive symptoms (PHQ-9)",
            gad_total ~ "Anxiety symptoms (GAD-7)",
            physical_activity_grouped ~ "Physical activity",
            sleep_quality ~ "Sleep quality",
            marital_status_grouped ~ "Marital status (grouped)",
            friend_relationship ~ "Friendship relationship quality",
            family_relationship ~ "Family relationship quality",
            religion ~ "Religion",
            age_raw ~ "Age (in years)",
            alcohol_risk ~ "Alcohol risk zone",
            cannabis_use ~ "Cannabis use"
        )
    ) |>
    gtsummary::add_overall(last = TRUE) |>
    gtsummary::add_p() |>
    gtsummary::modify_header(
        label = '**Characteristic**',
        stat_1 = '**No**, *n* = 348',
        stat_2 = '**Yes**, *n* = 125',
        stat_0 = '**Overall**, *n* = 473',
        p.value = '***p*-value**'
    ) |>
    gtsummary::modify_caption("Descriptive table stratified by loneliness
                              incidence, weighted by the combined weight.")

Descriptive table stratified by loneliness incidence, weighted by the combined weight.
Characteristic	No, n = 348¹	Yes, n = 125¹	Overall, n = 473¹	p-value²
Sex				0.079
Male	45 (12.9%/58.7%)	16 (12.8%/33.3%)	61 (12.9%/52.7%)
Female	303 (87.1%/41.3%)	109 (87.2%/66.7%)	412 (87.1%/47.3%)
Sexual orientation (heterosexual)	309 (88.8%/77.2%)	105 (84.0%/70.7%)	414 (87.5%/75.6%)	0.7
Skin color				0.9
Non-white	136 (39.1%/57.0%)	59 (47.2%/54.7%)	195 (41.2%/56.5%)
White	212 (60.9%/43.0%)	66 (52.8%/45.3%)	278 (58.8%/43.5%)
Education				0.6
Up to high-school	20 (5.7%/18.0%)	15 (12.0%/31.1%)	35 (7.4%/21.1%)
Some college education	135 (38.8%/40.8%)	54 (43.2%/32.9%)	189 (40.0%/38.9%)
Master or Doctorate	193 (55.5%/41.1%)	56 (44.8%/36.0%)	249 (52.6%/39.9%)
Socioeconomic status				0.8
Lower	93 (26.7%/72.5%)	42 (33.6%/72.3%)	135 (28.5%/72.4%)
Middle	75 (21.6%/8.6%)	33 (26.4%/11.6%)	108 (22.8%/9.3%)
Upper	180 (51.7%/18.9%)	50 (40.0%/16.1%)	230 (48.6%/18.3%)
Number of people in your house (including you)	3.00 (2.00, 4.00)	2.00 (1.00, 3.00)	3.00 (1.00, 4.00)	0.4
Unemployed	34 (9.8%/34.1%)	23 (18.4%/42.3%)	57 (12.1%/36.1%)	0.7
Social distancing	327 (94.0%/95.9%)	119 (95.2%/97.7%)	446 (94.3%/96.3%)	0.5
Depressive symptoms (PHQ-9)	-0.63 (-1.27, 0.81)	0.49 (-0.31, 1.94)	-0.31 (-1.11, 1.12)	0.015
Anxiety symptoms (GAD-7)	-0.66 (-1.20, 0.79)	0.79 (-0.11, 1.25)	-0.30 (-1.02, 1.16)	0.018
Physical activity				<0.001
Less than 150min/week	305 (87.6%/92.7%)	119 (95.2%/99.0%)	424 (89.6%/94.2%)
More than 150min/week	43 (12.4%/7.3%)	6 (4.8%/1.0%)	49 (10.4%/5.8%)
Sleep quality				0.004
Worse	185 (53.2%/49.6%)	84 (67.2%/83.3%)	269 (56.9%/57.6%)
Better	163 (46.8%/50.4%)	41 (32.8%/16.7%)	204 (43.1%/42.4%)
Marital status (grouped)				0.5
Without partner	146 (42.0%/56.0%)	68 (54.4%/65.5%)	214 (45.2%/58.3%)
With partner	202 (58.0%/44.0%)	57 (45.6%/34.5%)	259 (54.8%/41.7%)
Friendship relationship quality				0.6
Worse	117 (33.6%/37.8%)	53 (42.4%/47.1%)	170 (35.9%/40.0%)
Better	231 (66.4%/62.2%)	72 (57.6%/52.9%)	303 (64.1%/60.0%)
Family relationship quality				0.13
Worse	85 (24.4%/19.3%)	46 (36.8%/36.0%)	131 (27.7%/23.3%)
Better	263 (75.6%/80.7%)	79 (63.2%/64.0%)	342 (72.3%/76.7%)
Religion	165 (47.4%/56.4%)	64 (51.2%/63.0%)	229 (48.4%/58.0%)	0.7
Age (in years)	46.00 (38.00, 52.00)	42.00 (30.00, 53.65)	44.01 (35.00, 52.00)	0.8
Alcohol risk zone				0.040
Low risk	253 (72.7%/83.3%)	92 (73.6%/57.5%)	345 (72.9%/77.1%)
High risk	95 (27.3%/16.7%)	33 (26.4%/42.5%)	128 (27.1%/22.9%)
Cannabis use	14 (4.0%/4.2%)	5 (4.0%/6.3%)	19 (4.0%/4.7%)	0.7
¹ n (unweighted) (% (unweighted)/%); Median (IQR)
² chi-squared test with Rao & Scott’s second-order correction; Wilcoxon rank-sum test for complex survey samples

6. Bivariate analysis

In this step, we will run several bivariate analyses (\(loneliness ~ predictor\)) to pre-select the variables that will be used in the multi-predictor Poisson regression later on. The cut-off point we will use is based on \(p<0.2\). In the code below, we created a function to calculate the lower and upper bounds of the 95% confidence intervals (CI) within a tibble structure.

Code

confint_df <- function(x) {
    tibble::tibble(
        lower = x[2, 1],
        upper = x[2, 2]
    )
}

6.1. Sociodemographic domain

Code

df_socio <- df_loneliness |> 
    dplyr::select(
        outcome,
        age,
        age_2,
        age_3,
        gender_birth,
        heterosexual,
        education_grouped,
        household_income,
        color,
        unemployed,
        combined_weight
    )

df_socio |>
    purrr::keep(is.numeric) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of numeric predictors of the sociodemographic domain."
    )

Bivariate analysis of numeric predictors of the sociodemographic domain.
Variable	Category	\(\beta\)	SE	RR	RR (lower)	RR (upper)	p-value
age	value	-0.089	0.033	0.914	0.856	0.976	0.007
age_2	value	-0.053	0.031	0.948	0.892	1.007	0.087
age_3	value	-0.028	0.028	0.973	0.920	1.027	0.328

Code

df_socio |>
    dplyr::select(-dplyr::contains("age")) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of categorical predictors of the sociodemographic domain."
    )

Bivariate analysis of categorical predictors of the sociodemographic domain.
Variable	Category	\(\beta\)	SE	RR	RR (lower)	RR (upper)	p-value
color	valueWhite	0.072	0.071	1.074	0.934	1.234	0.314
education_grouped	valueSome college education	-0.552	0.088	0.576	0.484	0.685	0.000
education_grouped	valueMaster or Doctorate	-0.489	0.087	0.614	0.484	0.685	0.000
gender_birth	valueFemale	0.804	0.075	2.234	1.930	2.591	0.000
heterosexual	valueNo	0.253	0.078	1.288	1.105	1.498	0.001
household_income	valueMiddle	0.220	0.112	1.246	0.995	1.543	0.049
household_income	valueUpper	-0.122	0.097	0.885	0.995	1.543	0.209
unemployed	valueNo	-0.263	0.072	0.769	0.669	0.885	0.000

On the sociodemographic domain, the only variables that did not reach the significance cut-off were the cubic term of age (age_3), skin color (color), and “Upper” level of socioeconomic status (household_income). Since there are three levels in socioeconomic status (household_income), this variable will continue in further investigations.

6.2. Lifestyle domain

Code

df_ls <- df_loneliness |> 
    dplyr::select(
        outcome,
        physical_activity_grouped,
        sleep_quality,
        family_relationship,
        friend_relationship,
        marital_status_grouped,
        combined_weight
    )

df_ls |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of predictors of the lifestyle domain."
    )

Bivariate analysis of predictors of the lifestyle domain.
Variable	Category	\(\beta\)	SE	RR	RR (lower)	RR (upper)
family_relationship	valueBetter	-0.618	0.074	0.539	0.467	0.623
friend_relationship	valueBetter	-0.287	0.071	0.750	0.653	0.862
marital_status_grouped	valueWith partner	-0.306	0.074	0.736	0.636	0.851
physical_activity_grouped	valueMore than 150min/week	-1.771	0.349	0.170	0.079	0.315
sleep_quality	valueBetter	-1.301	0.095	0.272	0.225	0.327

In the lifestyle domain, all variables have reached the significance cut-off.

6.3. Social domain

Code

df_social <- df_loneliness |> 
    dplyr::select(
        outcome,
        number_people_house,
        combined_weight
    )

df_social |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of predictors of the social domain."
    )

Bivariate analysis of predictors of the social domain.
Variable	Category	\(\beta\)	SE	RR	RR (lower)	RR (upper)	p-value
number_people_house	value	-0.146	0.024	0.864	0.824	0.906	0

On the social domain, we only have the variable number of people that lives in your house [including you] (number_people_house) and it has passed the \(p<0.2\) cut-off for the hierarchical multi-predictor regression.

6.4. Clinical domain

Code

df_clinical <- df_loneliness |> 
    dplyr::select(
        outcome,
        phq_total,
        gad_total,
        alcohol_risk,
        cannabis_use,
        combined_weight
    )

df_clinical |>
    dplyr::select(-alcohol_risk, -cannabis_use) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of depressive and anxiety symptoms."
    )

Bivariate analysis of depressive and anxiety symptoms.
Variable	Category	\(\beta\)	SE	RR	RR (lower)	RR (upper)	p-value
gad_total	value	0.531	0.034	1.70	1.591	1.818	0
phq_total	value	0.513	0.029	1.67	1.577	1.769	0

Code

df_clinical |>
    dplyr::select(-phq_total, -gad_total) |>
    tidyr::pivot_longer(
        c(-outcome, -combined_weight),
        names_to = "predictor",
        values_to = "value"
    ) |>
    dplyr::nest_by(predictor) |>
    dplyr::mutate(mod = list(glm(outcome ~ value,
                                 data = data,
                                 family = poisson(link = "log"),
                                 weights = combined_weight)),
                  confint = list(suppressMessages(confint(mod)))) |>
    dplyr::reframe(
        broom::tidy(mod),
        confint_df(confint)
    ) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::mutate(rr = exp(estimate),
                  rr_lower = exp(lower),
                  rr_upper = exp(upper)) |>
    dplyr::select(
        Variable = predictor,
        Category = term,
        `$\\beta$` = estimate,
        SE = std.error,
        RR = rr,
        `RR (lower)` = rr_lower,
        `RR (upper)` = rr_upper,
        `p-value` = p.value
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Bivariate analysis of alcohol risk and cannabis use."
    )

Bivariate analysis of alcohol risk and cannabis use.
Variable	Category	\(\beta\)	SE	RR	RR (lower)	RR (upper)	p-value
alcohol_risk	valueHigh risk	0.916	0.072	2.500	2.172	2.875	0.000
cannabis_use	valueYes	0.301	0.146	1.351	1.003	1.778	0.039

Regarding the clinical domain, all variables have reached the statistical significance cut-off.

7. Hierarchical multi-predictor Poisson regression

We sequentially added the significant variables from the last analyses into the Poisson regression, starting from the most distal domain to the most proximal one (sociodemographic \(\rightarrow\) lifestyle/social \(\rightarrow\) clinical). The variables that presented \(p<0.1\) in the entry stage remained in the model until the end, even if they lost their significance in later stages, and those with \(p<0.05\) were considered significant.

7.1. Domain-based models

Code

# Sociodemographic domain - first entry
df_loneliness |>
    glm(outcome ~ age + age_2 + education_grouped + gender_birth +
        heterosexual + household_income + number_people_house +
        marital_status_grouped,
        data = _,
        family = poisson(link = "log"),
        weights = combined_weight
    ) |>
    broom::tidy(conf.int = TRUE) |>
    dplyr::select(-statistic, -std.error) |>
    dplyr::mutate(dplyr::across(c(estimate, conf.low, conf.high),
                                exp)) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::relocate(p.value, .after = dplyr::last_col()) |>
    dplyr::relocate(estimate, .before = conf.low) |>
    dplyr::mutate(p_lt_01 = dplyr::if_else(p.value < 0.1, "Yes", "No")) |>
    purrr::set_names(
        "Variable", "RR", "RR (lower)", "RR (upper)", "$p$-value", "$p<0.1$"
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Weighted poisson regression with $p$-value
        $<$ 0.2 in bivariate analysis in sociodemographic domain - first entry."
    )

Weighted poisson regression with \(p\)-value \(<\) 0.2 in bivariate analysis in sociodemographic domain - first entry.
Variable	RR	RR (lower)	RR (upper)	\(p\)-value	\(p<0.1\)
age	0.386	0.276	0.547	0.000	Yes
age_2	2.501	1.790	3.464	0.000	Yes
education_groupedSome college education	0.423	0.348	0.514	0.000	Yes
education_groupedMaster or Doctorate	0.458	0.372	0.563	0.000	Yes
gender_birthFemale	2.672	2.265	3.160	0.000	Yes
heterosexualNo	2.203	1.809	2.675	0.000	Yes
household_incomeMiddle	1.380	1.088	1.732	0.007	Yes
household_incomeUpper	1.311	1.030	1.660	0.026	Yes
number_people_house	0.891	0.845	0.939	0.000	Yes
marital_status_groupedWith partner	0.924	0.782	1.089	0.349	No

Code

# Lifestyle and social domain - second entry
df_loneliness |>
    glm(outcome ~ age + age_2 + education_grouped + gender_birth +
        heterosexual + household_income + number_people_house +
        marital_status_grouped + family_relationship + friend_relationship +
        physical_activity_grouped + sleep_quality,
        data = _,
        family = poisson(link = "log"),
        weights = combined_weight
    ) |>
    broom::tidy(conf.int = TRUE) |>
    dplyr::select(-statistic, -std.error) |>
    dplyr::mutate(dplyr::across(c(estimate, conf.low, conf.high),
                                exp)) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::relocate(p.value, .after = dplyr::last_col()) |>
    dplyr::relocate(estimate, .before = conf.low) |>
    dplyr::mutate(p_lt_01 = dplyr::if_else(p.value < 0.1, "Yes", "No")) |>
    purrr::set_names(
        "Variable", "RR", "RR (lower)", "RR (upper)", "$p$-value", "$p<0.1$"
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Weighted poisson regression with $p$-value
        $<$ 0.2 in bivariate analysis in lifestyle and social domain -
        second entry."
    )

Weighted poisson regression with \(p\)-value \(<\) 0.2 in bivariate analysis in lifestyle and social domain - second entry.
Variable	RR	RR (lower)	RR (upper)	\(p\)-value	\(p<0.1\)
age	0.701	0.475	1.045	0.078	Yes
age_2	1.475	1.006	2.144	0.044	Yes
education_groupedSome college education	0.590	0.483	0.722	0.000	Yes
education_groupedMaster or Doctorate	0.467	0.376	0.581	0.000	Yes
gender_birthFemale	2.247	1.866	2.712	0.000	Yes
heterosexualNo	1.926	1.566	2.360	0.000	Yes
household_incomeMiddle	1.512	1.196	1.893	0.000	Yes
household_incomeUpper	1.605	1.265	2.026	0.000	Yes
number_people_house	0.921	0.869	0.974	0.004	Yes
marital_status_groupedWith partner	0.755	0.637	0.892	0.001	Yes
family_relationshipBetter	0.668	0.555	0.805	0.000	Yes
friend_relationshipBetter	0.973	0.820	1.156	0.758	No
physical_activity_groupedMore than 150min/week	0.150	0.069	0.284	0.000	Yes
sleep_qualityBetter	0.329	0.268	0.401	0.000	Yes

Code

# Clinical domain - third entry
df_loneliness |>
    glm(outcome ~ age + age_2 + education_grouped + gender_birth +
        heterosexual + household_income + number_people_house +
        marital_status_grouped + family_relationship + friend_relationship +
        physical_activity_grouped + sleep_quality + phq_total + gad_total +
        alcohol_risk + cannabis_use,
        data = _,
        family = poisson(link = "log"),
        weights = combined_weight
    ) |>
    broom::tidy(conf.int = TRUE) |>
    dplyr::select(-statistic, -std.error) |>
    dplyr::mutate(dplyr::across(c(estimate, conf.low, conf.high),
                                exp)) |>
    dplyr::filter(term != "(Intercept)") |>
    dplyr::relocate(p.value, .after = dplyr::last_col()) |>
    dplyr::relocate(estimate, .before = conf.low) |>
    dplyr::mutate(p_lt_01 = dplyr::if_else(p.value < 0.1, "Yes", "No")) |>
    purrr::set_names(
        "Variable", "RR", "RR (lower)", "RR (upper)", "$p$-value", "$p<0.1$"
    ) |>
    knitr::kable(
        digits = 3,
        caption = "Weighted poisson regression with $p$-value
        $<$ 0.2 in bivariate analysis in clinical domain - third entry."
    )

Weighted poisson regression with \(p\)-value \(<\) 0.2 in bivariate analysis in clinical domain - third entry.
Variable	RR	RR (lower)	RR (upper)	\(p\)-value	\(p<0.1\)
age	0.362	0.235	0.560	0.000	Yes
age_2	2.751	1.807	4.160	0.000	Yes
education_groupedSome college education	0.686	0.552	0.853	0.001	Yes
education_groupedMaster or Doctorate	0.670	0.525	0.853	0.001	Yes
gender_birthFemale	1.830	1.476	2.266	0.000	Yes
heterosexualNo	1.474	1.188	1.822	0.000	Yes
household_incomeMiddle	1.357	1.067	1.708	0.011	Yes
household_incomeUpper	1.400	1.080	1.803	0.010	Yes
number_people_house	0.976	0.921	1.034	0.424	No
marital_status_groupedWith partner	0.887	0.741	1.061	0.191	No
family_relationshipBetter	0.786	0.650	0.951	0.013	Yes
friend_relationshipBetter	0.760	0.630	0.917	0.004	Yes
physical_activity_groupedMore than 150min/week	0.180	0.079	0.354	0.000	Yes
sleep_qualityBetter	0.515	0.413	0.640	0.000	Yes
phq_total	1.166	1.031	1.317	0.014	Yes
gad_total	1.218	1.071	1.386	0.003	Yes
alcohol_riskHigh risk	1.731	1.435	2.084	0.000	Yes
cannabis_useYes	1.835	1.306	2.526	0.000	Yes

7.2. Final model

Code

# Block 1: sociodemographic
block_1 <- glm(outcome ~ age + age_2 + education_grouped + gender_birth +
    heterosexual + household_income + number_people_house +
    marital_status_grouped,
    data = df_loneliness,
    family = poisson(link = "log"),
    weights = combined_weight
)

# Block 2: Lifestyle and social
block_2 <- glm(outcome ~ age + age_2 + education_grouped + gender_birth +
    heterosexual + household_income +
    number_people_house + family_relationship + friend_relationship +
    physical_activity_grouped + sleep_quality,
    data = df_loneliness,
    family = poisson(link = "log"),
    weights = combined_weight
)

# Block 3: Clinical
block_3 <- glm(outcome ~ age + age_2 + education_grouped + gender_birth +
    heterosexual + household_income + family_relationship +
    physical_activity_grouped + sleep_quality + number_people_house +
    phq_total + gad_total + alcohol_risk + cannabis_use,
    data = df_loneliness,
    family = poisson(link = "log"),
    weights = combined_weight
)

huxtable::huxreg(
  list("Block 1: Sociodemographic" = block_1,
       "Block 2: Lifestyle and social" = block_2,
       "Block 3: Clinical" = block_3),
  stars = NULL,
  error_format = "[{conf.low},  {conf.high}], {dplyr::if_else(p.value < 0.001, 'p<0.001', paste0('p=', as.character(p.value)))}",
  error_pos = "same",
  bold_signif = 0.05,
  number_format = "%.3f",
  statistics = c(n = "nobs",
                 "-2log likelihood" = "logLik",
                 AIC = "AIC",
                 BIC = "BIC"),
  ci_level = 0.95,
  outer_borders = 1,
  tidy_args = list(exponentiate = TRUE, conf.int = TRUE),
  coefs = c(
    "(Intercept)" = "(Intercept)",
    "Age (z-score)" = "age",
    "Age (quadratic term)" = "age_2",
    "Education (college education)" = "education_groupedSome college education",
    "Education (master or PhD)" = "education_groupedMaster or Doctorate",
    "Sex (female)" = "gender_birthFemale",
    "Sexual orientation (non-heterosexual)" = "heterosexualNo",
    "Socioeconomic status (middle)" = "household_incomeMiddle",
    "Socioeconomic status (upper)" = "household_incomeUpper",
    "Socioeconomic status (upper)" = "household_incomeUpper",
    "Number of people living in your house" = "number_people_house",
    "Marital status (with partner)" = "marital_status_groupedWith partner",
    "Positive family relationship" = "family_relationshipBetter",
    "Positive friendship relationship" = "friend_relationshipBetter",
    "Physical activity (more than 150 min/week)" = "physical_activity_groupedMore than 150min/week",
    "Good sleep quality" = "sleep_qualityBetter",
    "Depressive symptoms (PHQ-9)" = "phq_total",
    "Anxiety symptoms (GAD-7)" = "gad_total",
    "Alcohol risk zone (moderate, high or severe)" = "alcohol_riskHigh risk",
    "Cannabis use (yes)" = "cannabis_useYes")
) |>
    huxtable::set_caption("Multi-predictor Poisson regression analysis to
        evaluate factors associated with loneliness incidence. The table
        presents the risk ratios and the corresponding 95% CIs with p-values.")

Multi-predictor Poisson regression analysis to evaluate factors associated with loneliness incidence. The table presents the risk ratios and the corresponding 95% CIs with p-values.
	Block 1: Sociodemographic	Block 2: Lifestyle and social	Block 3: Clinical
(Intercept)	0.258 [0.202, 0.329], p<0.001	0.448 [0.325, 0.616], p<0.001	0.240 [0.169, 0.341], p<0.001
Age (z-score)	0.386 [0.276, 0.547], p<0.001	0.693 [0.471, 1.029], p=0.066	0.343 [0.224, 0.529], p<0.001
Age (quadratic term)	2.501 [1.790, 3.464], p<0.001	1.475 [1.011, 2.133], p=0.041	2.849 [1.881, 4.287], p<0.001
Education (college education)	0.423 [0.348, 0.514], p<0.001	0.567 [0.465, 0.691], p<0.001	0.684 [0.551, 0.851], p<0.001
Education (master or PhD)	0.458 [0.372, 0.563], p<0.001	0.458 [0.368, 0.568], p<0.001	0.705 [0.556, 0.893], p=0.004
Sex (female)	2.672 [2.265, 3.160], p<0.001	2.317 [1.926, 2.795], p<0.001	1.719 [1.393, 2.119], p<0.001
Sexual orientation (non-heterosexual)	2.203 [1.809, 2.675], p<0.001	1.976 [1.612, 2.414], p<0.001	1.518 [1.227, 1.872], p<0.001
Socioeconomic status (middle)	1.380 [1.088, 1.732], p=0.007	1.497 [1.185, 1.871], p<0.001	1.388 [1.094, 1.743], p=0.006
Socioeconomic status (upper)	1.311 [1.030, 1.660], p=0.026	1.432 [1.139, 1.790], p=0.002	1.361 [1.062, 1.735], p=0.014
Number of people living in your house	0.891 [0.845, 0.939], p<0.001	0.902 [0.852, 0.953], p<0.001	0.968 [0.913, 1.024], p=0.262
Marital status (with partner)	0.924 [0.782, 1.089], p=0.349
Positive family relationship		0.689 [0.574, 0.828], p<0.001	0.730 [0.609, 0.875], p<0.001
Positive friendship relationship		0.970 [0.817, 1.151], p=0.725
Physical activity (more than 150 min/week)		0.161 [0.074, 0.304], p<0.001	0.177 [0.078, 0.348], p<0.001
Good sleep quality		0.340 [0.278, 0.414], p<0.001	0.483 [0.390, 0.594], p<0.001
Depressive symptoms (PHQ-9)			1.214 [1.080, 1.366], p=0.001
Anxiety symptoms (GAD-7)			1.191 [1.049, 1.352], p=0.007
Alcohol risk zone (moderate, high or severe)			1.579 [1.322, 1.882], p<0.001
Cannabis use (yes)			1.750 [1.250, 2.399], p<0.001
n	473	473	473
-2log likelihood	-1793.697	-1678.612	-1621.319
AIC	3609.395	3385.223	3276.637
BIC	3655.145	3443.450	3347.342

7.3. Odds ratio plot

7.3.1. With sort

Code

or_plot <- plot(parameters::parameters(
        block_3,
        exponentiate = TRUE
        ), show_labels = TRUE, size_text = 6, size_point = 1.2, sort = "descending") +
     #ggplot2::scale_y_discrete(labels = c(
     #"Physical activity [More than 150min/week]",
     #"Age",
     #"Sleep quality [Better]",
     #"Education [Some college education]",
     #"Education [Master or Doctorate]",
     #"Family relationship [Better]",
     #"Number of people living in your house",
     #"Anxiety symptoms [GAD-7]",
     #"Depressive symptoms [PHQ-9]",
     #"Household income [Upper]",
     #"Household income [Middle]",
     #"Heterosexual [No]",
     #"Alcohol risk [High risk]",
     #"Gender birth [Female]",
     #"Cannabis use [Yes]",
     #"Age (quadratic term)")
     #) +
    ggplot2::labs(x = "RR") +
    ggplot2::theme(
        axis.text = ggplot2::element_text(size = 18, color = "black"),
        axis.title = ggplot2::element_text(size = 18, color = "black")
    )

ggplot2::ggsave(filename = "~/tmp/or_plot_loneliness_sorted.png", or_plot,
dpi = 300, width = 16, height = 10)

Code

or_plot <- plot(parameters::parameters(
        block_3,
        exponentiate = TRUE
        ), show_labels = TRUE, size_text = 6, size_point = 1.2) +
     #ggplot2::scale_y_discrete(labels = c(
     #"Cannabis use [Yes]",
     #"Alcohol risk [High risk]",
     #"Anxiety symptoms [GAD-7]",
     #"Depressive symptoms [PHQ-9]",
     #"Number of people living in your house",
     #"Sleep quality [Better]",
     #"Physical activity [More than 150min/week]",
     #"Family relationship [Better]",
     #"Household income [Upper]",
     #"Household income [Middle]",
     #"Heterosexual [No]",
     #"Gender birth [Female]",
     #"Education [Master or Doctorate]",
     #"Education [Some college education]",
     #"Age (quadratic term)",
     #"Age")
     #) +
    ggplot2::labs(x = "RR") +
    ggplot2::theme(
        axis.text = ggplot2::element_text(size = 18, color = "black"),
        axis.title = ggplot2::element_text(size = 18, color = "black")
    )

ggplot2::ggsave(filename = "~/tmp/or_plot_loneliness_unsorted.png", or_plot,
dpi = 300, width = 16, height = 10)

Session information

Code

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.0 (2024-04-24)
 os       Arch Linux
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Sao_Paulo
 date     2024-06-07
 pandoc   3.1.13 @ /usr/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version  date (UTC) lib source
 assertthat      0.2.1    2019-03-21 [1] CRAN (R 4.2.2)
 backports       1.5.0    2024-05-23 [1] CRAN (R 4.4.0)
 base64enc       0.1-3    2015-07-28 [1] CRAN (R 4.2.2)
 bayestestR      0.13.2   2024-02-12 [1] CRAN (R 4.3.2)
 broom           1.0.6    2024-05-17 [1] CRAN (R 4.4.0)
 broom.helpers   1.15.0   2024-04-05 [1] CRAN (R 4.3.3)
 broom.mixed     0.2.9.5  2024-04-01 [1] CRAN (R 4.3.3)
 cli             3.6.2    2023-12-11 [1] CRAN (R 4.3.2)
 coda            0.19-4.1 2024-01-31 [1] CRAN (R 4.3.2)
 codetools       0.2-20   2024-03-31 [2] CRAN (R 4.4.0)
 colorspace      2.1-0    2023-01-23 [1] CRAN (R 4.3.1)
 commonmark      1.9.1    2024-01-30 [1] CRAN (R 4.3.2)
 crayon          1.5.2    2022-09-29 [1] CRAN (R 4.2.2)
 datawizard      0.11.0   2024-06-05 [1] CRAN (R 4.4.0)
 DBI             1.2.3    2024-06-02 [1] CRAN (R 4.4.0)
 digest          0.6.35   2024-03-11 [1] CRAN (R 4.3.3)
 dplyr           1.1.4    2023-11-17 [1] CRAN (R 4.3.2)
 emmeans         1.10.2   2024-05-20 [1] CRAN (R 4.4.0)
 estimability    1.5.1    2024-05-12 [1] CRAN (R 4.4.0)
 evaluate        0.23     2023-11-01 [1] CRAN (R 4.3.2)
 fansi           1.0.6    2023-12-08 [1] CRAN (R 4.3.2)
 farver          2.1.2    2024-05-13 [1] CRAN (R 4.4.0)
 fastmap         1.2.0    2024-05-15 [1] CRAN (R 4.4.0)
 forcats         1.0.0    2023-01-29 [1] CRAN (R 4.2.2)
 furrr           0.3.1    2022-08-15 [1] CRAN (R 4.2.2)
 future          1.33.2   2024-03-26 [1] CRAN (R 4.3.3)
 generics        0.1.3    2022-07-05 [1] CRAN (R 4.2.2)
 ggplot2         3.5.1    2024-04-23 [1] CRAN (R 4.3.3)
 globals         0.16.3   2024-03-08 [1] CRAN (R 4.3.3)
 glue            1.7.0    2024-01-09 [1] CRAN (R 4.3.2)
 gt              0.10.1   2024-01-17 [1] CRAN (R 4.3.2)
 gtable          0.3.5    2024-04-22 [1] CRAN (R 4.3.3)
 gtsummary       1.7.2    2023-07-15 [1] CRAN (R 4.3.1)
 haven           2.5.4    2023-11-30 [1] CRAN (R 4.3.2)
 hms             1.1.3    2023-03-21 [1] CRAN (R 4.2.3)
 htmltools       0.5.8.1  2024-04-04 [1] CRAN (R 4.3.3)
 htmlwidgets     1.6.4    2023-12-06 [1] CRAN (R 4.3.2)
 huxtable        5.5.6    2024-02-15 [1] CRAN (R 4.3.2)
 insight         0.20.0   2024-06-04 [1] CRAN (R 4.4.0)
 janitor         2.2.0    2023-02-02 [1] CRAN (R 4.2.2)
 jsonlite        1.8.8    2023-12-04 [1] CRAN (R 4.3.2)
 knitr           1.47     2024-05-29 [1] CRAN (R 4.4.0)
 labeling        0.4.3    2023-08-29 [1] CRAN (R 4.3.1)
 later           1.3.2    2023-12-06 [1] CRAN (R 4.3.2)
 lattice         0.22-6   2024-03-20 [2] CRAN (R 4.4.0)
 lifecycle       1.0.4    2023-11-07 [1] CRAN (R 4.3.2)
 listenv         0.9.1    2024-01-29 [1] CRAN (R 4.3.2)
 lubridate       1.9.3    2023-09-27 [1] CRAN (R 4.3.1)
 magrittr        2.0.3    2022-03-30 [1] CRAN (R 4.2.2)
 markdown        1.13     2024-06-04 [1] CRAN (R 4.4.0)
 MASS            7.3-60.2 2024-04-26 [1] CRAN (R 4.4.0)
 Matrix          1.7-0    2024-04-26 [1] CRAN (R 4.4.0)
 mitools         2.4      2019-04-26 [1] CRAN (R 4.2.2)
 multcomp        1.4-25   2023-06-20 [1] CRAN (R 4.3.1)
 munsell         0.5.1    2024-04-01 [1] CRAN (R 4.3.3)
 mvtnorm         1.2-5    2024-05-21 [1] CRAN (R 4.4.0)
 nlme            3.1-164  2023-11-27 [1] CRAN (R 4.3.2)
 parallelly      1.37.1   2024-02-29 [1] CRAN (R 4.3.2)
 parameters      0.21.7   2024-05-14 [1] CRAN (R 4.4.0)
 pillar          1.9.0    2023-03-22 [1] CRAN (R 4.2.3)
 pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 4.2.2)
 processx        3.8.4    2024-03-16 [1] CRAN (R 4.3.3)
 ps              1.7.6    2024-01-18 [1] CRAN (R 4.3.2)
 purrr           1.0.2    2023-08-10 [1] CRAN (R 4.3.1)
 quarto          1.4      2024-03-06 [1] CRAN (R 4.3.3)
 R6              2.5.1    2021-08-19 [1] CRAN (R 4.2.2)
 ragg            1.3.2    2024-05-15 [1] CRAN (R 4.4.0)
 Rcpp            1.0.12   2024-01-09 [1] CRAN (R 4.3.2)
 readr           2.1.5    2024-01-10 [1] CRAN (R 4.3.2)
 repr            1.1.7    2024-03-22 [1] CRAN (R 4.3.3)
 rlang           1.1.4    2024-06-04 [1] CRAN (R 4.4.0)
 rmarkdown       2.27     2024-05-17 [1] CRAN (R 4.4.0)
 rstudioapi      0.16.0   2024-03-24 [1] CRAN (R 4.3.3)
 sandwich        3.1-0    2023-12-11 [1] CRAN (R 4.3.2)
 sass            0.4.9    2024-03-15 [1] CRAN (R 4.3.3)
 scales          1.3.0    2023-11-28 [1] CRAN (R 4.3.2)
 see             0.8.4    2024-04-29 [1] CRAN (R 4.4.0)
 sessioninfo     1.2.2    2021-12-06 [1] CRAN (R 4.2.2)
 skimr           2.1.5    2022-12-23 [1] CRAN (R 4.2.3)
 snakecase       0.11.1   2023-08-27 [1] CRAN (R 4.3.1)
 stringi         1.8.4    2024-05-06 [1] CRAN (R 4.4.0)
 stringr         1.5.1    2023-11-14 [1] CRAN (R 4.3.2)
 survey          4.4-2    2024-03-20 [1] CRAN (R 4.3.3)
 survival        3.7-0    2024-06-05 [1] CRAN (R 4.4.0)
 systemfonts     1.1.0    2024-05-15 [1] CRAN (R 4.4.0)
 textshaping     0.4.0    2024-05-24 [1] CRAN (R 4.4.0)
 TH.data         1.1-2    2023-04-17 [1] CRAN (R 4.3.0)
 tibble          3.2.1    2023-03-20 [1] CRAN (R 4.2.3)
 tidyr           1.3.1    2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect      1.2.1    2024-03-11 [1] CRAN (R 4.3.3)
 timechange      0.3.0    2024-01-18 [1] CRAN (R 4.3.2)
 tzdb            0.4.0    2023-05-12 [1] CRAN (R 4.3.0)
 utf8            1.2.4    2023-10-22 [1] CRAN (R 4.3.1)
 vctrs           0.6.5    2023-12-01 [1] CRAN (R 4.3.2)
 withr           3.0.0    2024-01-16 [1] CRAN (R 4.3.2)
 xfun            0.44     2024-05-15 [1] CRAN (R 4.4.0)
 xml2            1.3.6    2023-12-04 [1] CRAN (R 4.3.2)
 xtable          1.8-4    2019-04-21 [1] CRAN (R 4.2.2)
 yaml            2.3.8    2023-12-11 [1] CRAN (R 4.3.2)
 zoo             1.8-12   2023-04-13 [1] CRAN (R 4.3.0)

 [1] /home/bruno/.local/share/r
 [2] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────

Reuse

CC BY 4.0