Published on in Vol 2 (2023)

Association of Health Care Work With Anxiety and Depression During the COVID-19 Pandemic: Structural Topic Modeling Study

Original Paper

1Department of Psychiatry, Grossman School of Medicine, New York University, New York, NY, United States

2Ann S Bowers College of Computing and Information Science, Cornell University, Ithaca, NY, United States

3Research and Development, Talkspace, New York, NY, United States

Corresponding Author:

Matteo Malgaroli, PhD

Department of Psychiatry

Grossman School of Medicine

New York University

1 Park Avenue

8th Floor

New York, NY, 10016

United States

Phone: 1 6467544030

Abstract

Background: Stressors for health care workers (HCWs) during the COVID-19 pandemic have been manifold, with high levels of depression and anxiety alongside gaps in care. Identifying the factors most tied to HCWs’ psychological challenges is crucial to addressing HCWs’ mental health needs effectively, now and for future large-scale events.

Objective: In this study, we used natural language processing methods to examine deidentified psychotherapy transcripts from telemedicine treatment during the initial wave of COVID-19 in the United States. Psychotherapy was delivered by licensed therapists while HCWs were managing increased clinical demands and elevated hospitalization rates, in addition to population-level social distancing measures and infection risks. Our goal was to identify specific concerns emerging in treatment for HCWs and to compare differences with matched non-HCW patients from the general population.

Methods: We conducted a case-control study with a sample of 820 HCWs and 820 non-HCW matched controls who received digitally delivered psychotherapy in 49 US states in the spring of 2020 during the first US wave of the COVID-19 pandemic. Depression was measured during the initial assessment using the Patient Health Questionnaire-9, and anxiety was measured using the Generalized Anxiety Disorder-7 questionnaire. Structural topic models (STMs) were used to determine treatment topics from deidentified transcripts from the first 3 weeks of treatment. STM effect estimators were also used to examine topic prevalence in patients with moderate to severe anxiety or depression.

Results: The median treatment enrollment date was April 15, 2020 (IQR March 31 to April 27, 2020) for HCWs and April 19, 2020 (IQR April 5 to April 27, 2020) for matched controls. STM analysis of deidentified transcripts identified 4 treatment topics centered on health care and 5 on mental health for HCWs. For controls, 3 STM topics on pandemic-related disruptions and 5 on mental health were identified. Several STM treatment topics were significantly associated with moderate to severe anxiety or depression, including working on the hospital unit (topic prevalence 0.035, 95% CI 0.022-0.048; P<.001), mood disturbances (prevalence 0.014, 95% CI 0.002-0.026; P=.03), and sleep disturbances (prevalence 0.016, 95% CI 0.002-0.030; P=.02). No significant associations emerged between pandemic-related topics and moderate to severe anxiety or depression for non-HCW controls.

Conclusions: The study provides large-scale quantitative evidence that during the initial wave of the COVID-19 pandemic, HCWs faced unique work-related challenges and stressors associated with anxiety and depression, which required dedicated treatment efforts. The study further demonstrates how natural language processing methods have the potential to surface clinically relevant markers of distress while preserving patient privacy.

JMIR AI 2023;2:e47223

Introduction

During the COVID-19 pandemic, health care workers (HCWs) faced mounting stress as they cared for patients with a disease that, to date, has infected 538 million people globally and 85 million in the United States alone [1]. Surges in US infection rates forced hospitals to operate at greater than 100% capacity [2], with COVID-19 hospitalizations in 18 states exceeding 10% of all available beds and 7 states operating at more than 15% overcapacity [3]. As a result, HCWs faced overwhelming workloads, longer hours, increased personal infection risk, equipment shortages, sleep disruption, and at times the need to make ethically challenging decisions, such as rationing care for patients [4-8]. This increased burden on HCWs was further aggravated by the loss of social support due to quarantine policies and the fear of infecting family and friends [6,7,9].

The well-being of HCWs is the foundation of a well-functioning health care system [10-12]. Prior to the pandemic, HCWs already faced higher rates of anxiety, depression [13], and suicidal ideation [14] compared to the general population [13,15]. The sudden increase in professional and personal stress due to COVID-19 put HCWs, an already vulnerable population with barriers to treatment access [16], at further risk for developing symptoms of anxiety and depression [5,6,9]. Prior studies have linked depression and anxiety in HCWs to decreased patient safety and increased medical errors [17-20]. Given the adverse consequences of psychological stress for HCWs and their patients, it is crucial to better understand the core concerns associated with mental health symptoms such as anxiety and depression in HCWs, especially during periods of acute stress like COVID-19 surges. It is particularly important to study these concerns in ways that preserve the privacy and anonymity of HCWs, given the professional stigma reported by some health care providers who seek mental health treatment [21-23].

The pandemic hastened recent advances in developing and disseminating digital mental health interventions that address acute and long-term treatment barriers, including mobile apps and telehealth platforms connecting patients to mental health providers [24]. Such interventions offer a unique opportunity for understanding and addressing the mental health concerns of vulnerable populations like HCWs. Despite their potential, little research has examined the adoption of digital health treatment by HCWs during COVID-19.

In addition to providing flexible options for clinical engagement, digital treatment delivery enables the automatic collection of large amounts of treatment data, which in turn can be analyzed in an aggregated and deidentified fashion using machine learning (ML) methods. Researchers in digital psychiatry and ubiquitous computing have used ML to develop passive measures for mental health concerns, which can be refined into clinically relevant markers for symptom severity and embedded into treatment pathways [25-27]. ML-based natural language processing (NLP) holds particular promise for the study of mental health concerns, as it allows verbal expressions of distress to be studied at scale, capturing clinically relevant linguistic features from unstructured text as the patient-therapist interaction unfolds [28]. Of particular interest in the study of psychotherapy transcripts is topic modeling, an unsupervised NLP method that parses semantic structures (or topics) from large corpora of text without the need for line-by-line annotation [29]. Topic modeling has been used to generate knowledge in multiple areas of science [30], and previous applications in mental health include the detection of depression [31] and anxiety [32], including in the context of the COVID-19 pandemic [33]. In brief, topic modeling assumes that each document in a corpus is a mixture of topics, where each topic is a corpus-wide probability distribution over a fixed vocabulary of words. Topic modeling algorithms seek the set of topics that best characterizes a given corpus across documents, as a means to understand the core content of potentially difficult texts (such as therapy transcripts) at scale [28]. Structural topic models (STMs) extend this framework by allowing document-level covariates to influence topic prevalence (the proportion of a document associated with a topic) and topical content (the distribution of words used within a topic).
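The generative assumption behind topic prevalence can be made concrete with a small simulation. This is a minimal sketch, not the estimation procedure used in the study (which fitted STMs with the R package stm); it assumes STM's logistic-normal prevalence model, in which document-level covariates shift the mean of the topic log-ratios, and it holds the topic-word distributions fixed, so STM's content covariates are ignored. The function name, matrix shapes, and toy values are illustrative assumptions.

```python
import numpy as np


def simulate_stm_corpus(x, gamma, sigma, beta, doc_len, seed=0):
    """Draw documents from a simplified STM-style generative process.

    x:     (n_docs, p) document-level covariates, eg, an intercept and a symptom flag.
    gamma: (p, K - 1) prevalence coefficients; the last topic is the reference.
    sigma: (K - 1, K - 1) covariance of the logistic-normal prior.
    beta:  (K, V) topic-word distributions; each row sums to 1.
    Returns a list of word-index arrays and the (n_docs, K) topic proportions.
    """
    rng = np.random.default_rng(seed)
    n_topics, vocab_size = beta.shape
    docs, thetas = [], []
    for x_d in x:
        # Covariates shift the mean of the topic log-ratios (prevalence model).
        eta = rng.multivariate_normal(x_d @ gamma, sigma)
        eta = np.append(eta, 0.0)  # reference topic fixed at 0
        theta = np.exp(eta - eta.max())
        theta /= theta.sum()  # topic prevalence for this document
        # Each token first picks a topic, then a word from that topic.
        z = rng.choice(n_topics, size=doc_len, p=theta)
        words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)


# Toy example: 3 topics over a 5-word vocabulary.
K, V = 3, 5
beta = np.array([[0.6, 0.1, 0.1, 0.1, 0.1],
                 [0.1, 0.6, 0.1, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.1, 0.6]])
# Column 0: intercept; column 1: hypothetical binary flag (1 = symptomatic).
x = np.column_stack([np.ones(200), np.repeat([0, 1], 100)])
gamma = np.array([[0.0, 0.0],   # intercept
                  [2.0, 0.0]])  # flag raises topic 0 relative to the reference topic
docs, theta = simulate_stm_corpus(x, gamma, 0.1 * np.eye(K - 1), beta, doc_len=50)
print(theta[x[:, 1] == 1, 0].mean() > theta[x[:, 1] == 0, 0].mean())  # True
```

Fitting an STM inverts this process: given only the documents and the covariates, it estimates beta, gamma, and each document's theta.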
Compared to lexicon-based methods, topic modeling allows the assessment of context-specific language (such as medical terminology) within the corpus of transcripts. Compared to embeddings, which capture semantic similarity at the word or sentence level, topic modeling can also uncover broader thematic associations across transcripts and then group individuals based on topical themes emerging from the transcripts. For collections of texts like transcripts of psychotherapy sessions, topic modeling also offers the potential to be more privacy preserving: topic models process text into distributions of keywords and enable researchers to study the semantic content of sensitive therapy transcripts while preserving treatment-seekers’ privacy by minimizing exposure of personally identifiable or sensitive data. Topic modeling can provide empirical insights into the stressors experienced by medical providers during this highly stressful period of the pandemic. Moreover, by linking specific concerns identified via topic modeling with depression and anxiety symptoms measured with validated scales, linguistic features can be developed to serve as passive computational markers [34] of distress, with the potential to highlight areas of risk or need for clinical attention. Identifying the most disruptive risk factors for HCWs would support improvements in treatment planning and inform the selection of mental health resources for HCWs now and during future COVID-19 waves or other widespread epidemics.

In this study, we examined deidentified treatment transcripts from 820 HCWs and 820 matched controls who received digitally delivered psychotherapy from licensed therapists during the first US surge of the COVID-19 pandemic, between March and July 2020. Our aim was to identify the unique treatment needs of HCWs, characterized as treatment topics, compared with those of non-HCW matched controls. We used STM to identify topics in the deidentified transcripts and to assess their associations with symptom levels. Specifically, we used topic modeling to analyze therapeutic conversations during the first 3 weeks of treatment in HCWs and controls and identified emerging topics in a privacy-preserving fashion. We also assessed the association of these topics with moderate to severe levels of anxiety and depression (Figure 1).

Figure 1. Schematic overview of topic modeling with a fictitious example of a transcript. Topics are generated across the full transcript corpora. Individual topic distributions are then associated with their respective symptom ratings. GAD-7: Generalized Anxiety Disorder-7; ICU: intensive care unit; PHQ-9: Patient Health Questionnaire-9.

Methods

Participants and Setting

Our sample consisted of self-referred HCWs from the United States seeking digitally delivered psychotherapy in spring 2020, amidst the first US surge of COVID-19 hospitalizations. HCWs were defined as health care and medical providers (eg, physicians, nurses, residents, emergency medical service providers, and social workers) with an active National Provider Identifier (NPI) profile at the time of treatment. Services were donated by a telehealth platform [35] as part of an initiative to provide 1 month of free treatment to essential HCWs. Eligibility was verified by the platform through employment and NPI verification. In order to distinguish health care–specific and general population stressors related to COVID-19, we included a matched control sample of non-HCWs from the general population seeking the same treatment service as the HCW sample in spring 2020. Non-HCW patients accessed the platform through employee assistance programs, self-referral, and as benefits through individual insurance. From this outpatient pool, a control sample was matched to HCWs based on demographics, symptom scores, US state of residency, and treatment start date. Control matching was performed algorithmically, and matching procedures are described in Multimedia Appendix 1 [5,29,36-49].
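The study's matching procedure is specified in Multimedia Appendix 1. Purely as an illustration of what algorithmic 1:1 matching on these variables can look like, the sketch below uses greedy nearest-neighbor matching without replacement, with exact matching on state and gender and a scaled Euclidean distance on age, PHQ-9, GAD-7, and treatment start date. The algorithm, the `Patient` fields, and the scaling constants are all assumptions for this example, not the authors' method.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    pid: str       # hypothetical patient identifier
    state: str
    gender: str
    age: float
    phq9: int
    gad7: int
    start: date    # treatment start date


def features(p):
    """Numeric matching variables; the start date becomes a day count."""
    return {"age": p.age, "phq9": p.phq9, "gad7": p.gad7, "start": p.start.toordinal()}


def greedy_match(cases, pool, scales):
    """1:1 greedy nearest-neighbor matching without replacement.

    Exact match on state and gender; among eligible controls, pick the one
    minimizing a Euclidean distance on features divided by `scales`
    (eg, pooled SDs). Returns {case pid: control pid}; unmatched cases are
    omitted. Results depend on the order in which cases are processed.
    """
    available = {c.pid: c for c in pool}
    matches = {}
    for case in cases:
        fc = features(case)
        best, best_d = None, math.inf
        for cand in available.values():
            if (cand.state, cand.gender) != (case.state, case.gender):
                continue  # exact match on state and gender
            fx = features(cand)
            d = math.sqrt(sum(((fc[k] - fx[k]) / scales[k]) ** 2 for k in fc))
            if d < best_d:
                best, best_d = cand, d
        if best is not None:
            matches[case.pid] = best.pid
            del available[best.pid]  # without replacement
    return matches


hcws = [Patient("h1", "NY", "F", 30, 12, 14, date(2020, 4, 15))]
pool = [Patient("c1", "NY", "F", 45, 3, 2, date(2020, 6, 1)),
        Patient("c2", "NY", "F", 31, 11, 13, date(2020, 4, 18)),
        Patient("c3", "CA", "F", 30, 12, 14, date(2020, 4, 15))]
scales = {"age": 6.0, "phq9": 5.7, "gad7": 5.0, "start": 30.0}  # illustrative values
print(greedy_match(hcws, pool, scales))  # {'h1': 'c2'}
```

Greedy matching is simple but order dependent; optimal matching or propensity-score approaches are common alternatives.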

Before starting treatment, HCWs and controls received a primary ICD-10 diagnosis based on a standardized intake evaluation by a licensed clinician to identify presenting complaints and treatment history. Following the intake, HCWs and controls were matched to a licensed therapist and received psychotherapy through messages exchanged using a HIPAA (Health Insurance Portability and Accountability Act)-compliant interface for smartphones and computers. The inclusion criteria were (1) living in the United States, (2) being an English speaker, and (3) having regular internet or cellphone access (to access the digitally delivered treatment). Exclusion criteria for both samples were (1) any condition deemed by the intake clinician to require hospitalization; (2) suicidal thoughts or behavior sufficient to be marked yes on any of questions 3 through 6 (at least thoughts about a potential suicide method) of the Columbia Suicide Severity Rating Scale Lifetime-Recent [36]; (3) current or past diagnoses of bipolar disorder, substance use disorders, schizophrenia spectrum disorders, or psychotic disorders; (4) incomplete baseline symptom measures; and (5) unavailable treatment transcripts. Finally, as a sixth exclusion criterion applied during matching, we excluded any health care professionals from the control group. An overview and schematic of the sampling procedure in this study are reported in Multimedia Appendix 1 [5,29,36-49].
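The 6 exclusion criteria can be expressed as a single screening function. In this minimal sketch, the `Screening` record, its field names, and the diagnosis labels are hypothetical, and the 3 inclusion criteria (US residence, English, internet access) are assumed to have been checked upstream.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical labels for exclusion criterion 3.
EXCLUDED_DIAGNOSES = {"bipolar", "substance_use", "schizophrenia_spectrum", "psychotic"}


@dataclass
class Screening:
    requires_hospitalization: bool = False
    cssrs_q3_to_q6_yes: bool = False  # any "yes" on C-SSRS questions 3-6
    diagnoses: Set[str] = field(default_factory=set)  # current or past diagnosis groups
    phq9: Optional[int] = None
    gad7: Optional[int] = None
    has_transcript: bool = False
    is_health_professional: bool = False


def is_eligible(s, as_control):
    """Apply exclusion criteria 1-6; criterion 6 applies to controls only."""
    if s.requires_hospitalization:               # 1: needs hospitalization
        return False
    if s.cssrs_q3_to_q6_yes:                     # 2: suicide risk screen
        return False
    if s.diagnoses & EXCLUDED_DIAGNOSES:         # 3: excluded diagnoses
        return False
    if s.phq9 is None or s.gad7 is None:         # 4: incomplete baseline measures
        return False
    if not s.has_transcript:                     # 5: no transcript available
        return False
    if as_control and s.is_health_professional:  # 6: HCWs cannot be controls
        return False
    return True


hcw = Screening(phq9=8, gad7=12, has_transcript=True, is_health_professional=True)
print(is_eligible(hcw, as_control=False), is_eligible(hcw, as_control=True))  # True False
```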

The final sample consisted of 820 HCWs and 820 matched controls. The median treatment start date for HCWs was April 15, 2020 (IQR March 31 to April 27, 2020). For the matched control group, the median treatment start date was April 19, 2020 (IQR April 5 to April 27, 2020).

Data Sources and Measures


Psychotherapy treatment transcripts consisted of deidentified messages between patients and their therapists, each with its timestamp (ie, date and time of delivery) and a masked label for the author’s role. All transcript data were deidentified using an algorithm that scrubbed personal identifiers, proper nouns, locations, dates, and other potential identifiers. Transcripts were truncated to include only messages sent by patients from the initial intake to their first outcome survey, typically 3 weeks after treatment initiation. HCW and control transcripts were both preprocessed for analysis: numbers, punctuation, stopwords, and anonymization terms (eg, “{NAME}”) were removed, and the remaining words were stemmed to their root form (eg, “computing” became “comput”). The vocabulary of unique words across the preprocessed transcripts was then made more tractable by removing words that occurred in fewer than 50 documents and then removing documents that contained no words. The final HCW corpus contained 820 therapy transcripts and a vocabulary of 1208 unique terms across 225,219 tokens. The final control corpus contained 820 transcripts and a vocabulary of 1259 unique terms across 217,321 tokens.
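The preprocessing steps can be sketched in Python as follows. This is an illustrative approximation, not the study's pipeline: the stopword list is a tiny hypothetical sample, the uppercase placeholder pattern is assumed from the “{NAME}” example, and `toy_stem` is a crude suffix stripper standing in for a real stemmer such as Porter or Snowball (it reproduces “computing” → “comput” but not, eg, “anxiety” → “anxieti”). The document-frequency threshold of 50 follows the text.

```python
import re
from collections import Counter

PLACEHOLDER = re.compile(r"\{[A-Z_]+\}")  # assumed format of anonymization terms
STOPWORDS = {"i", "a", "an", "the", "and", "or", "to", "of", "in", "my", "is",
             "it", "at", "on", "was", "me", "for", "with", "this", "that", "am", "be"}
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common suffix if at least 3 characters remain (stand-in stemmer)."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


def tokenize(text):
    """Remove placeholders, numbers, punctuation, and stopwords; then stem."""
    text = PLACEHOLDER.sub(" ", text)
    tokens = re.findall(r"[a-z]+", text.lower())  # letters only: drops digits and punctuation
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def build_corpus(texts, min_docs=50):
    """Keep words found in at least `min_docs` documents, then drop empty documents.

    Returns (documents, sorted vocabulary, indices of the documents kept).
    """
    docs = [tokenize(t) for t in texts]
    doc_freq = Counter(w for d in docs for w in set(d))
    vocab = sorted(w for w, n in doc_freq.items() if n >= min_docs)
    keep = set(vocab)
    filtered = [[w for w in d if w in keep] for d in docs]
    kept_idx = [i for i, d in enumerate(filtered) if d]
    return [filtered[i] for i in kept_idx], vocab, kept_idx


print(tokenize("I was computing 24 shifts, {NAME}!"))  # ['comput', 'shift']
```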

HCW Occupations

NPI information for HCWs in our study was not available for privacy reasons. To assess the distribution of specific health care professions in the HCW sample anonymously, we developed a heuristic classification algorithm. The algorithm detected instances in the transcripts where patients self-identified as HCWs or spoke about their professional roles. Code, heuristics, and accuracy metrics of the heuristic classification algorithm are further reported in Multimedia Appendices 1 [5,29,36-49] and 3.
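The study's heuristics and accuracy metrics are reported in its appendices; the following is only a hypothetical illustration of the general approach. It matches first-person self-identification phrases with regular expressions and assigns each patient the most frequently matched occupation. The patterns and labels are invented for this example and would miss many phrasings.

```python
import re
from collections import Counter

# Hypothetical self-identification patterns (non-capturing groups so that
# findall returns whole matches).
OCCUPATION_PATTERNS = {
    "nurse": re.compile(r"\b(?:i am|i'm|i work as) an? (?:registered |icu |er )?nurse\b"),
    "physician": re.compile(r"\b(?:i am|i'm|i work as) an? (?:physician|doctor|attending|resident)\b"),
    "ems": re.compile(r"\b(?:i am|i'm|i work as) an? (?:emt|paramedic)\b"),
    "social worker": re.compile(r"\b(?:i am|i'm|i work as) an? social worker\b"),
}


def classify_occupation(messages):
    """Return the occupation matched most often across a patient's messages."""
    counts = Counter()
    for msg in messages:
        text = msg.lower().replace("\u2019", "'")  # normalize curly apostrophes
        for label, pattern in OCCUPATION_PATTERNS.items():
            counts[label] += len(pattern.findall(text))
    if max(counts.values(), default=0) == 0:
        return "unidentified"
    return counts.most_common(1)[0][0]


print(classify_occupation(["I\u2019m an ICU nurse and work nights."]))  # nurse
```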

Psychiatric Symptom Measures

Depression symptoms were measured at the beginning of treatment using the Patient Health Questionnaire-9 (PHQ-9) [50], and anxiety symptoms were measured using the Generalized Anxiety Disorder-7 (GAD-7) [51]. The PHQ-9 assesses depressive symptoms over the past 2 weeks on a 4-point Likert scale (0=“not at all” to 3=“nearly every day”), with a total maximum score of 27. The GAD-7 assesses symptoms of anxiety over the past 2 weeks on a 4-point Likert scale (0=“not at all” to 3=“nearly every day”), with a total maximum score of 21. A score of 10 or more on the PHQ-9 identifies the presence of clinically significant moderate-to-severe depression [50]; a score of 10 or more on the GAD-7 identifies the presence of clinically significant moderate-to-severe anxiety [51].
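Scoring both instruments and deriving the binary outcome used later in the analysis (PHQ-9 ≥10 or GAD-7 ≥10) is simple arithmetic. A minimal Python sketch, with function names that are illustrative only:

```python
PHQ9_ITEMS, GAD7_ITEMS = 9, 7
CUTOFF = 10  # moderate-to-severe threshold used for both scales


def total_score(items, n_items):
    """Sum Likert responses coded 0 (not at all) to 3 (nearly every day)."""
    if len(items) != n_items or any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError(f"expected {n_items} responses coded 0-3")
    return sum(items)


def moderate_to_severe(phq9_items, gad7_items):
    """Binary outcome: PHQ-9 >= 10 or GAD-7 >= 10."""
    phq9 = total_score(phq9_items, PHQ9_ITEMS)  # range 0-27
    gad7 = total_score(gad7_items, GAD7_ITEMS)  # range 0-21
    return phq9 >= CUTOFF or gad7 >= CUTOFF


print(moderate_to_severe([1] * 9, [2, 2, 2, 2, 2, 0, 0]))  # True (GAD-7 = 10)
```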

Data Analysis

Treatment Topic Identification

All analyses were conducted in Python (version 3.9.9) and in R (version 4.1.2) [37], using the package stm [38] for topic modeling. Additional model specifications, diagnostic analyses, model selection procedures, and code for all analyses are reported in Multimedia Appendices 1 [5,29,36-49] and 2.

STMs were used to identify topics in the HCW and matched control corpora. We used a mixed statistical and human validation process to select the number of topics (K=30) for analysis in both HCW and control data sets (see Multimedia Appendix 1 [5,29,36-49] for full details). After identifying these topics, results from the STM were manually coded to characterize their relevance to one of three areas of interest: (1) mental health, (2) COVID-19 pandemic-related disruptions, and (3) health care. Classification of relevant topics was determined through the consensus of a panel of experts consisting of 2 doctoral-level clinical psychologists (MM and TDH), 1 psychiatrist (NMS), and 1 NLP researcher (ET). Topics were examined to understand their content based on their most characteristic words, determined using the harmonic mean of word frequency and exclusivity across topics [39].
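The stm package reports the harmonic mean of frequency and exclusivity as FREX (eg, in `labelTopics()`): for each topic, every word's exclusivity and frequency are converted to within-topic empirical CDF values and combined with a weighted harmonic mean. The sketch below is a simplified NumPy version, assuming a topic-word probability matrix `beta` and weight `w=0.5`; it omits the word-count-based smoothing of exclusivity that stm may apply, so rankings can differ slightly from stm's output.

```python
import numpy as np


def ecdf_values(x):
    """Empirical CDF of each element within its own vector: P(X <= x_i)."""
    sorted_x = np.sort(x)
    return np.searchsorted(sorted_x, x, side="right") / len(x)


def frex(beta, w=0.5):
    """FREX score for every (topic, word) pair of a K x V topic-word matrix."""
    beta = np.asarray(beta, dtype=float)
    # Exclusivity: share of a word's probability mass that falls in this topic.
    excl = beta / beta.sum(axis=0, keepdims=True)
    scores = np.empty_like(beta)
    for k in range(beta.shape[0]):
        # Weighted harmonic mean of exclusivity and frequency ECDF values.
        scores[k] = 1.0 / (w / ecdf_values(excl[k]) + (1 - w) / ecdf_values(beta[k]))
    return scores


def top_frex_words(beta, vocab, n=10, w=0.5):
    """Top-n words per topic ranked by FREX."""
    return [[vocab[i] for i in np.argsort(-row)[:n]] for row in frex(beta, w)]


beta = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.2, 0.7]])
print(top_frex_words(beta, ["sleep", "night", "unit"], n=2))
# [['sleep', 'night'], ['unit', 'night']]
```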

Topics and Clinical Levels of Depression and Anxiety

We used STM effect estimators to study the association between the topics discussed by each patient and moderate to severe depression or anxiety. Specifically, we ran logistic-normal generalized linear models examining the association between the prevalence of relevant topics in a patient’s transcripts and their binarized psychopathology score, with GAD-7 or PHQ-9 symptom scores ≥10 classed as moderate-to-severe anxiety or depression (Figure 1). The study combined the PHQ-9 ≥10 and GAD-7 ≥10 cutoffs (either one sufficing) to account for the high prevalence of anxiety and depression comorbidity [52]. To estimate the parameters of the generalized linear models, we used a global approximation to the average covariance matrix governing the variational posterior (vs a per-document approximation that was less computationally tractable). For each topic, prevalence was contrasted between the 2 levels of the binary covariate (none-to-mild vs moderate-to-severe symptoms). For each data set, all topics (K=30) were modeled and reported in Multimedia Appendix 1 [5,29,36-49]; here, we report only those topics manually characterized as relevant.
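In stm, such contrasts come from `estimateEffect()`, whose default `uncertainty = "Global"` setting corresponds to the global approximation described above; it regresses topic proportions on covariates and propagates posterior uncertainty by the method of composition. As a much simpler stand-in, assuming point estimates of the document-topic proportions are already available, the sketch below computes the same kind of quantity, the difference in mean topic prevalence between moderate-to-severe and none-to-mild patients, with a normal-approximation 95% CI and a two-sided P value. With a single binary covariate, an ordinary least squares coefficient equals this difference in means; what the sketch drops is the topic-proportion uncertainty.

```python
import math

import numpy as np


def prevalence_contrast(theta, group):
    """Difference in mean topic prevalence (group 1 minus group 0) per topic.

    theta: (n_docs, K) document-topic proportions (point estimates).
    group: (n_docs,) binary array, 1 = moderate-to-severe, 0 = none-to-mild.
    Returns (estimate, ci_low, ci_high, p_value), each of length K, using
    unequal-variance standard errors and a normal approximation.
    Assumes each group has at least 2 documents and nonzero variance.
    """
    theta = np.asarray(theta, dtype=float)
    group = np.asarray(group).astype(bool)
    a, b = theta[group], theta[~group]
    est = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    z = est / se
    # Two-sided P value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return est, est - 1.96 * se, est + 1.96 * se, p


theta = np.array([[0.5, 0.5], [0.3, 0.7], [0.1, 0.9], [0.2, 0.8]])
group = np.array([1, 1, 0, 0])
est, lo, hi, p = prevalence_contrast(theta, group)
print(np.round(est, 2))
```

Because each row of theta sums to 1, the contrasts across all K topics sum to 0: a topic that gains prevalence in one group must be offset by losses elsewhere.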

Ethics Approval

All patients and clinicians gave informed consent to use their data in a deidentified, aggregated format for research purposes as part of the user agreement before they began using the platform. Study procedures were approved by the Cornell University Institutional Review Board (2004009578).

Results

Sample Characteristics

HCW patients (n=820; Table 1) predominantly identified as female (746/820, 91%). The mean age in the sample was 31.3 (SD 5.7) years. They were distributed across the United States, with the largest concentrations in New York State (114/820, 13.9%) and California (107/820, 13%). Figure 2 reports the distribution of professions identified from the HCW transcripts, with nurses (414/820, 50.5%) and physicians (148/820, 18.1%) being the most frequent health care occupations. For 289 of 820 HCWs (35.2%), this was reportedly their first psychotherapy treatment experience. Anxiety disorders were the most common primary diagnosis given by intake clinicians for HCWs (463/820, 56.5%), with 100 diagnoses of generalized anxiety disorder (100/820, 12.2%). Trauma- and stressor-related disorders (275/820, 33.5%) were the next most common, with adjustment disorders as the modal diagnosis in this category (219/820, 26.7%). Depressive disorders (67/820, 8.2%) were the least common of the 3 main diagnostic categories and included 45 diagnoses of major depressive disorder (45/820, 5.5%). Based on PHQ-9 and GAD-7 cutoffs, the prevalence of moderate to severe depression in the HCW sample at the beginning of treatment was 43.9% (360/820), and that of moderate to severe anxiety was 68.5% (562/820). A total of 601 (73.3%) HCWs had either moderate to severe anxiety or depression at baseline. In the matched control sample, 560 (68.3%) had moderate to severe anxiety and 408 (49.8%) had moderate to severe depression, with 601 (73.3%) having either moderate to severe anxiety or depression at baseline. Characteristics of the matched control sample (n=820) are reported in Table 1.

Table 1. Demographic and clinical characteristics of health care worker sample (n=820) and matched control sample (n=820).
| Variable | Health care workers | Matched controls |
|---|---|---|
| Age (years), mean (SD) | 31.3 (5.7) | 32 (6.6) |
| Diagnosis, n (%) | | |
| Anxiety disorders | 463 (56.5) | 429 (52.3) |
| Trauma- and stressor-related disorders | 275 (33.5) | 137 (16.7) |
| Depressive disorders | 67 (8.2) | 225 (27.4) |
| Other disorders | 15 (1.8) | 29 (3.5) |
| Gender, n (%) | | |
| Female | 746 (91) | 682 (83.2) |
| Male | 69 (8.4) | 125 (15.2) |
| Other | 5 (0.6) | 13 (1.6) |
| State, n (%) | | |
| California | 114 (13.9) | 132 (16.1) |
| New York | 107 (13) | 108 (13.2) |
| Florida | 55 (6.7) | 56 (6.8) |
| Texas | 48 (5.9) | 38 (4.6) |
| Illinois | 45 (5.5) | 33 (4) |
| Massachusetts | 41 (5) | 34 (4.2) |
| Pennsylvania | 35 (4.3) | 34 (4.2) |
| North Carolina | 30 (3.7) | 28 (3.4) |
| New Jersey | 30 (3.7) | 30 (3.7) |
| Washington | 28 (3.4) | 27 (3.3) |
| Other US states | 287 (35) | 300 (36.6) |
| GAD-7^a score, mean (SD) | 12.4 (4.9) | 12.3 (5.1) |
| Moderate to severe (GAD-7 ≥10), n (%) | 562 (68.5) | 560 (68.3) |
| PHQ-9^b score, mean (SD) | 9.4 (5.7) | 10 (5.8) |
| Moderate to severe (PHQ-9 ≥10), n (%) | 360 (43.9) | 408 (49.8) |
| First psychotherapy experience (yes), n (%) | 289 (35.2) | 272 (33.2) |
| Start date (month/day/year), median (IQR) | 04/15/2020 (03/31/2020-04/27/2020) | 04/19/2020 (04/05/2020-04/27/2020) |

^a GAD-7: Generalized Anxiety Disorder-7.

^b PHQ-9: Patient Health Questionnaire-9.

Figure 2. Algorithmically identified distribution of medical professions in the HCW sample (n=820). HCW: health care worker.

Treatment Topics

STM of psychotherapy transcripts identified 30 conversational themes for HCWs and 30 topics for non-HCW controls. Inspection of the topics showed a cluster of themes relevant to mental health and a cluster relevant to health care and the pandemic (Table 2). All topics emerging from the transcripts are reported in Figure 3.

HCWs discussed 4 topics related to practicing medicine. Examination of the most frequent and exclusive words indicated that HCW treatment topics focused on (1) virus-related fears (topic H3: covid, worker, healthcar), (2) working on the hospital floor and intensive care units (H4: unit, hospit, icu), (3) patients and masks (H16: patient, mask, test), and (4) health care roles including resident and attending (H29: resid, remain, attend). In contrast, therapy transcripts from controls contained only 1 topic about the COVID-19 pandemic (C25: pandem, concern, anxieti) and 1 occupation-related topic (C27: team, manag, boss).

HCWs and controls each discussed 5 topics with their therapists related to their mental health, endorsing panic attacks (HCW H2: panic, breath, attack; control C21: breath, sleep, panic), affective disturbances (HCW H15: depress, feel, mood; control C16: felt, feel, self), and grief (HCW H30: death, card, die; control C19: die, experienc, current). HCWs also endorsed sleep disturbances (H13: sleep, night, bed) and stress (H21: stress, challeng, increas). Among health care and mental health topics, HCWs most frequently discussed sleep disturbances (H13: sleep, night, bed) and the hospital floor (H4: unit, hospit, icu). Multimedia Appendix 1 [5,29,36-49] reports the proportions of all topics in the HCW and control transcripts.

Table 2. Psychotherapy topics referencing mental health, health care, and COVID-19 in health care workers and matched controls.
| Sample, category, and topic | Top 10 terms (frequency and exclusivity)^a |
|---|---|
| Health care workers (n=820) | |
| Health care | |
| H3 | covid, worker, healthcar, hospit, patient, physician, current, week, promot, doctor |
| H4 | unit, hospit, icu, nurs, virus, news, sick, covid, safe, fear |
| H16 | patient, mask, test, shift, unit, wear, staff, icu, ppe, coronavirus |
| H29 | resid, remain, attend, program, becom, answer, clinic, mayb, mean, studi |
| Mental health | |
| H2 | panic, breath, attack, symptom, anxious, anxieti, exercis, chest, tool, calm |
| H13 | sleep, night, bed, shift, asleep, wake, usual, fall, morn, relax |
| H15 | depress, feel, mood, anyth, suicid, quarantin, sad, episod, sometim, hard |
| H21 | stress, challeng, increas, relief, level, team, stressor, overal, focus, line |
| H30 | death, card, die, grief, code, credit, pass, deal, charg, enter |
| Matched controls (n=820) | |
| Pandemic disruptions | |
| C25 | pandem, concern, anxieti, situat, cope, corona, group, relat, social, extrem |
| C11 | nice, quarantin, late, gym, enjoy, crazi, glad, weather, excit, heavi |
| C27 | team, manag, boss, project, task, routin, offic, work, cowork, hour |
| Mental health | |
| C21 | breath, sleep, panic, sick, attack, night, anxious, anxieti, worri, calm |
| C16 | felt, feel, self, negat, anxious, thought, sad, bad, scare, boyfriend |
| C9 | therapi, depress, therapist, issu, anxieti, disord, eat, month, cost, coupl |
| C2 | anger, forgiv, discuss, hurt, angri, behavior, intak, lie, said, sexual |
| C19 | die, experienc, current, attack, medic, alcohol, rate, daili, health, panic |

^a Most frequent and exclusive words that distinguish each topic in patients’ transcripts.

Figure 3. Structural topic model estimates of association between psychiatric symptoms and mean topic prevalence. Topics on the right side of the dotted line have higher prevalence in controls and health care workers with moderate to severe anxiety or depression.

Topics and Clinical Levels of Depression and Anxiety

After determining the distribution of topics emerging from treatment transcripts, we examined the association of topics discussed during the first 3 weeks of psychotherapy with patients’ moderate-to-severe anxiety or depression. Effect estimates and their 95% CIs are reported in Figure 3. Discussion of the hospital and its locations (H4: unit, hospit, icu) was significantly more prevalent among HCWs with moderate to severe anxiety or depression (topic prevalence=0.035, 95% CI 0.022-0.048; P<.001). This effect was not observed in the matched controls for the pandemic-relevant topic (C25: pandem, concern, anxieti; prevalence=0.003, 95% CI –0.012 to 0.018; P=.67) or for the work-relevant topic (C27: team, manag, boss; prevalence=–0.005, 95% CI –0.021 to 0.011; P=.55). Other topics were also significantly more prevalent among symptomatic HCWs, including affective disturbances (H15: depress, feel, mood; prevalence=0.014, 95% CI 0.002-0.026; P=.03) and sleep disturbances (H13: sleep, night, bed; prevalence=0.016, 95% CI 0.002-0.030; P=.02). No other mental health or health care topics occurred at a significantly higher frequency in HCWs with moderate to severe anxiety or depression, although weak, nonsignificant trends were observed for discussions related to panic attacks, grief, and mask-related concerns. Controls with moderate to severe symptoms were more likely to discuss affective disturbances (C16: prevalence=0.021, 95% CI 0.005-0.037; P=.01). Numeric estimates for all HCW and control topics are reported in Multimedia Appendix 1 [5,29,36-49].

Discussion

In this study, we examined the topics discussed by 820 HCWs and 820 matched general-population outpatients undergoing psychotherapy through a telehealth platform in spring 2020 during the first US wave of COVID-19, as well as their associations with moderate to severe depression and anxiety. Transcripts from the first 3 weeks of treatment were analyzed using NLP methods, enabling the content of therapy discussions to be characterized automatically, at scale, and in a privacy-preserving way. Results indicated significant differences between HCW and control cohorts in the proportion of health care–related topics, as well as in their association with moderate to severe anxiety and depression.

Analysis of the distribution of NLP-derived treatment topics indicated that HCWs extensively discussed health care–related topics in psychotherapy. Specifically, HCWs had 4 conversational themes around health care, while controls had only 1. This finding is consistent with the increased work-related stressors experienced by HCWs during the COVID-19 pandemic, when the greater professional and personal responsibilities they faced made them particularly vulnerable to adverse work-related impacts compared to the general population. These pandemic-specific pressures placed HCWs at particular risk for mental health problems compared to the general population [53]. Beyond the stressors specific to the pandemic, this finding is also consistent with prior literature indicating that work-related stress is almost twice as prevalent for HCWs as for workers in other fields after controlling for work hours, with physicians at the front line of care at greatest risk [54]. Contributing factors include longer hours and greater difficulty with work-life integration compared to other US workers.

Analysis of the prevalence of topics and their association with symptomatology indicated that among HCWs, discussion of hospital settings was significantly associated with moderate to severe anxiety and depression. This association was unique to HCWs and not present in the general-population outpatients, despite shared anxiety, work, and health-related concerns during the pandemic [55]. Discussion of sleep disturbances and mood difficulties was also significantly associated with moderate to severe anxiety and depression in HCWs. These findings support the connection between anxiety, depression, and concerns related to being a practicing medical professional during the COVID-19 pandemic [56]. Although not assessed here, contributing factors may include longer exposure to stressful working environments, a higher level of personal responsibility in critical situations, and increased sleep disruption [19]. Sleep deprivation among HCWs has been consistently linked to increases in anxiety, depression, and suicidal ideation [57]. These associations are especially pronounced for HCWs who work longer hours, who work night shifts, and who have less time off between their shifts [58]. Existing literature similarly suggests that work-related stress and anxiety and depressive symptoms mutually reinforce each other [59].

Strengths and Limitations

Findings of this study are unique due to the large corpus of treatment transcripts from HCWs during the initial phase of the COVID-19 pandemic and the data analytic methods exploring the use of computational linguistics to identify stated risk factors. To the growing body of literature documenting the challenges posed to mental health and well-being by the COVID-19 pandemic, we contribute a proof-of-concept demonstrating that web-based therapy platforms can serve as unique observatories for the mental health needs of hard-to-reach populations like HCWs. This study has several limitations. First, our sample consisted of self-referred patients, and differences in access to telehealth services could reduce the generalizability of results. Second, our sample showed a skew toward female individuals and nursing occupations, although this distribution is consistent with US population occupational statistics for HCWs [60]. Third, we focused our analysis on a concatenation of all of a patient’s talk turns during the first 3 weeks of treatment. Future work should model how topics evolve over time, for example, using sequential models to examine topics turn by turn, as well as models incorporating therapists’ talk turns. Fourth, our findings emerged from the corpus the STM was trained on and might not generalize when applied to different corpora, such as transcripts in languages other than English. Future studies should consider using large language models pretrained on wider corpora of clinical data for more generalizable topic representations across multiple domains and languages [61]. Fifth, topic associations with symptoms were limited to data from validated self-report measures, and other methods to capture psychiatric symptoms may return different results.

Privacy and Ethics

Accessing sensitive health information such as psychotherapy transcripts raises important ethical considerations regarding patient privacy. This study included several privacy-preserving measures to reduce these risks. First, all patients and clinicians gave informed consent to the use of their data in a deidentified and aggregated format for research purposes as part of the user agreement they signed before using the platform. All procedures were approved by the university institutional review board. Second, the platform deidentified all transcripts before the research team accessed the data. Deidentification removed personal identifiers such as proper nouns, locations, and dates, among other potential identifiers. Third, we limited our analyses to STM outputs, which are distributions over common words and are less likely than raw text to reveal private information. The first 2 authors (MM and ET) conducted the primary analyses and were the only authors to view any portion of the raw deidentified text, which they accessed exclusively for model development. Fourth, HCWs’ NPIs and associated information were not accessed as part of the study. Instead, specific health care occupations were identified using named entity recognition on the deidentified transcripts. This approach allowed us to extract occupational information while minimizing access to the raw deidentified transcripts, further preserving patient privacy.
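
The study identified occupations with named entity recognition (see Multimedia Appendix 3). As a much simpler stand-in, and not the authors’ algorithm, a dictionary (gazetteer) lookup restricted to first-person self-descriptions illustrates how occupational information can be pulled from deidentified text without consulting NPI records. The term list, category labels, and self-report phrasing below are hypothetical.

```python
import re

# Hypothetical gazetteer mapping occupation terms to coarse categories.
# A stand-in for the named entity recognition model used in the study.
OCCUPATION_TERMS = {
    "nurse": "nursing",
    "rn": "nursing",
    "physician": "physician",
    "doctor": "physician",
    "pharmacist": "pharmacy",
    "paramedic": "emergency medical services",
    "emt": "emergency medical services",
    "respiratory therapist": "respiratory therapy",
}

# Longest terms first so multiword titles win over shorter alternatives.
_TERMS = "|".join(re.escape(t) for t in sorted(OCCUPATION_TERMS, key=len, reverse=True))

# Only first-person self-descriptions ("I'm an ER nurse"), allowing one modifier
# word, so that mentions of other people ("my mother is a nurse") are ignored.
_SELF_REPORT = re.compile(
    r"\b(?:i am|i'm|i work as)\s+(?:a|an)\s+(?:\w+\s+)?(" + _TERMS + r")\b",
    re.IGNORECASE,
)


def find_occupations(text: str) -> set[str]:
    """Return occupation categories a speaker self-reports in deidentified text."""
    return {OCCUPATION_TERMS[m.group(1).lower()] for m in _SELF_REPORT.finditer(text)}


print(find_occupations("Honestly, I'm an ER nurse and the shifts are brutal."))  # {'nursing'}
print(find_occupations("My mother is a nurse."))  # set()
```

A trained entity recognizer generalizes far better than a fixed list (misspellings, unlisted titles, varied phrasing); the gazetteer is shown only because its behavior is easy to inspect.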

Conclusions

Among US HCWs seeking psychotherapy in spring 2020, during the first wave of the COVID-19 pandemic, discussion of workplace-related concerns was uniquely associated with moderate to severe anxiety and depression. This association between health care work and psychiatric symptoms was specific to HCWs and extended beyond other quality-of-life factors potentially related to work, such as poor sleep hygiene. We contribute to the literature on the psychological burden of health care work by demonstrating that HCW-specific content related to anxiety and depression emerges naturally in the context of web-based psychotherapy. These findings highlight the distinct mental health concerns HCWs faced during the COVID-19 pandemic, a period of significantly increased work demands, lack of social support, and fear of infection from work activity for HCWs and their families. These stressors compounded the work-related stressors that HCWs regularly face [54]. The results of this research could help pinpoint the key factors contributing to high levels of depression and anxiety among HCWs and help close gaps in care. The increased stress placed on HCWs during COVID-19, together with the established link between HCWs’ mental health and societal well-being, underscores the critical need to prioritize mental health treatment for HCWs.

Because mental health risk factors were captured automatically from transcripts using NLP methods, the study also serves as a proof of concept for the automated detection of psychological distress in HCWs. A main advantage of NLP markers is that they can identify specific language patterns associated with anxiety and depression. Unlike traditional assessment methods such as self-report surveys and interviews, NLP markers from psychotherapy platforms offer a passive, less burdensome way to assess therapy seekers’ mental health, akin to the digital biomarkers of mental health that researchers have developed from wearable and smartphone data. Defining and validating NLP markers of anxiety and depression could lead to more accurate and reliable assessments, benefiting both patients and health care providers. Moreover, NLP markers could help clarify the underlying mechanisms of anxiety and depression by stratifying patients into subgroups based on their specific needs and characteristics. Identifying these patterns could allow treatment and intervention strategies to be tailored to the needs of each patient in clinical settings [62,63]. Eventually, NLP methods could support personalized medicine approaches in which mental health needs are estimated routinely using automated methods in ecological, real-world settings. This could be achieved by designing digital apps [64] that offer periodic check-ins to elicit narratives about potential risk factors and stressors. Transcripts of these narratives could then be analyzed with NLP techniques such as STM to extract conversational topics associated with the probability of experiencing distress. For example, this approach could identify language focused on work-related stressors (as in our HCW sample) or behavioral disturbances (eg, poor sleep hygiene) and then offer personalized triage and resource recommendations.

Similar work has been conducted on crisis counseling platforms to analyze patients’ messages for suicidal ideation [65,66]. Offering mental health resources at scale through automated recommendations could help HCWs overcome barriers to treatment access, including stigma and unpredictable work hours. Given the high-stress nature of the health care profession, there is considerable potential for automated systems that proactively evaluate individual needs and provide personalized resources for preventive care.
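
The screening workflow described above (periodic narrative check-ins, topic extraction, then triage) could be prototyped along the following lines. This is a minimal sketch, assuming topic proportions for a new narrative have already been inferred from a fitted topic model such as an STM; the topic labels, thresholds, and resource strings are hypothetical placeholders, not values estimated in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    topic: str        # hypothetical topic label from a fitted topic model
    threshold: float  # minimum topic proportion that triggers the rule
    resource: str     # hypothetical resource to recommend


# Illustrative placeholders only; not estimates from this study.
RULES = [
    Rule("hospital_work", 0.20, "occupational stress resources for HCWs"),
    Rule("sleep", 0.15, "sleep hygiene guidance"),
    Rule("mood", 0.20, "prompt to complete PHQ-9 and GAD-7 and contact a clinician"),
]


def triage(theta: dict[str, float], rules: list[Rule] = RULES) -> list[str]:
    """Return triggered resources, ordered by how far each topic exceeds its threshold."""
    hits = []
    for rule in rules:
        share = theta.get(rule.topic, 0.0)  # topics absent from the narrative count as 0
        if share >= rule.threshold:
            hits.append((share - rule.threshold, rule.resource))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [resource for _, resource in hits]


# Topic proportions inferred for one new check-in narrative (hypothetical values).
check_in = {"hospital_work": 0.35, "sleep": 0.18, "family": 0.30}
print(triage(check_in))
# ['occupational stress resources for HCWs', 'sleep hygiene guidance']
```

Fixed thresholds keep the logic easy to audit; in practice they would presumably need to be calibrated against validated outcomes such as PHQ-9 and GAD-7 scores before any recommendations were offered.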

Acknowledgments

MM’s research was supported by the National Center for Advancing Translational Sciences and the National Institutes of Health (grants 2KL2TR001446-06A1 and 1K23MH134068-01), by Talkspace, and by the American Foundation for Suicide Prevention (grant PRG-0-104-19). ET is supported by a Microsoft Research PhD Fellowship and a Digital Life Initiative Doctoral Fellowship. TDH’s research was supported by the National Institutes of Health (awards R44MH124334 and R01MH125179-01). TKC is a cofounder and equity holder of HealthRhythms, Inc, is coemployed by UnitedHealth Group, and has received grants from Click Therapeutics related to digital therapeutics outside the submitted work. NMS reports research support from the Department of Defense, the Patient-Centered Outcomes Research Institute, and the National Institutes of Health; in addition, she reports consulting for Axovant Sciences, Springworks, Praxis Therapeutics, Aptinyx, Genomind, Wolters Kluwer (royalty), and spousal equity in G1 Therapeutics. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Data Availability

Deidentified patient data may be made available upon completion of a data use agreement and data security review with Talkspace. Analytic code describing natural language processing methods and algorithms is available in Multimedia Appendices 2 and 3.

Authors' Contributions

All authors contributed to the study concept and design. MM, TKC, and NMS supervised the study. MM and TDH acquired the data. ET and MM analyzed and interpreted the data, and take responsibility for the integrity of the data and the accuracy of the data analyses. TDH provided administrative, technical, and material support. All authors contributed to the drafting of the paper and its critical revision for important intellectual content.

Conflicts of Interest

TDH is an employee of the platform that provided the data. Talkspace had no role in the analysis, interpretation of the data, or decision to submit the paper for publication. TKC is a cofounder and equity holder of HealthRhythms, Inc.

Multimedia Appendix 1

Study sample flow chart and details on structural topic modeling.

PDF File (Adobe PDF File), 1405 KB

Multimedia Appendix 2

Analytic code: topic modeling.

PDF File (Adobe PDF File), 1756 KB

Multimedia Appendix 3

Analytic code: job identification algorithm.

PDF File (Adobe PDF File), 1217 KB

References

  1. WHO coronavirus COVID-19 dashboard. World Health Organization. 2021. URL: [accessed 2022-06-21]
  2. Grimm CA. Hospitals reported that the COVID-19 pandemic has significantly strained health care delivery. Office of Inspector General. 2021. URL: [accessed 2023-09-19]
  3. Hospital utilization. HHS Protect Public Data Hub. 2020. URL: [accessed 2021-10-14]
  4. Lai X, Wang M, Qin C, Tan L, Ran L, Chen D, et al. Coronavirus disease 2019 (COVID-2019) infection among health care workers and implications for prevention measures in a tertiary hospital in Wuhan, China. JAMA Netw Open. 2020;3(5):e209666.
  5. Luo M, Guo L, Yu M, Jiang W, Wang H. The psychological and mental impact of coronavirus disease 2019 (COVID-19) on medical staff and general public - a systematic review and meta-analysis. Psychiatry Res. 2020;291:113190.
  6. Pappa S, Ntella V, Giannakas T, Giannakoulis VG, Papoutsi E, Katsaounou P. Prevalence of depression, anxiety, and insomnia among healthcare workers during the COVID-19 pandemic: a systematic review and meta-analysis. Brain Behav Immun. 2020;88:901-907.
  7. Raudenská J, Steinerová V, Javůrková A, Urits I, Kaye AD, Viswanath O, et al. Occupational burnout syndrome and post-traumatic stress among healthcare professionals during the novel coronavirus disease 2019 (COVID-19) pandemic. Best Pract Res Clin Anaesthesiol. 2020;34(3):553-560.
  8. Sterling MR, Tseng E, Poon A, Cho J, Avgar AC, Kern LM, et al. Experiences of home health care workers in New York City during the coronavirus disease 2019 pandemic: a qualitative analysis. JAMA Intern Med. 2020;180(11):1453-1459.
  9. de Pablo GS, Vaquerizo-Serrano J, Catalan A, Arango C, Moreno C, Ferre F, et al. Impact of coronavirus syndromes on physical and mental health of health care workers: systematic review and meta-analysis. J Affect Disord. 2020;275:48-57.
  10. Moazzami B, Razavi-Khorasani N, Moghadam AD, Farokhi E, Rezaei N. COVID-19 and telemedicine: immediate action required for maintaining healthcare providers well-being. J Clin Virol. 2020;126:104345.
  11. Patel RS, Bachu R, Adikey A, Malik M, Shah M. Factors related to physician burnout and its consequences: a review. Behav Sci (Basel). 2018;8(11):98.
  12. Wallace JE, Lemaire JB, Ghali WA. Physician wellness: a missing quality indicator. Lancet. 2009;374(9702):1714-1721.
  13. Mata DA, Ramos MA, Bansal N, Khan R, Guille C, Di Angelantonio E, et al. Prevalence of depression and depressive symptoms among resident physicians: a systematic review and meta-analysis. JAMA. 2015;314(22):2373-2383.
  14. Dutheil F, Aubert C, Pereira B, Dambrun M, Moustafa F, Mermillod M, et al. Suicide among physicians and health-care workers: a systematic review and meta-analysis. PLoS One. 2019;14(12):e0226361.
  15. Brand SL, Coon JT, Fleming LE, Carroll L, Bethel A, Wyatt K. Whole-system approaches to improving the health and wellbeing of healthcare workers: a systematic review. PLoS One. 2017;12(12):e0188418.
  16. Zaçe D, Hoxhaj I, Orfino A, Viteritti AM, Janiri L, Di Pietro ML. Interventions to address mental health issues in healthcare workers during infectious disease outbreaks: a systematic review. J Psychiatr Res. 2021;136:319-333.
  17. Fahrenkopf AM, Sectish TC, Barger LK, Sharek PJ, Lewin D, Chiang VW, et al. Rates of medication errors among depressed and burnt out residents: prospective cohort study. BMJ. 2008;336(7642):488-491.
  18. Hall LH, Johnson J, Watt I, Tsipa A, O'Connor DB. Healthcare staff wellbeing, burnout, and patient safety: a systematic review. PLoS One. 2016;11(7):e0159015.
  19. Weaver MD, Vetter C, Rajaratnam SMW, O'Brien CS, Qadri S, Benca RM, et al. Sleep disorders, depression and anxiety are associated with adverse safety outcomes in healthcare workers: a prospective cohort study. J Sleep Res. 2018;27(6):e12722.
  20. West CP, Tan AD, Habermann TM, Sloan JA, Shanafelt TD. Association of resident fatigue and distress with perceived medical errors. JAMA. 2009;302(12):1294-1300.
  21. Center C, Davis M, Detre T, Ford DE, Hansbrough W, Hendin H, et al. Confronting depression and suicide in physicians: a consensus statement. JAMA. 2003;289(23):3161-3166.
  22. Miles SH. A piece of my mind. a challenge to licensing boards: the stigma of mental illness. JAMA. 1998;280(10):865.
  23. Wimsatt LA, Schwenk TL, Sen A. Predictors of depression stigma in medical students: potential targets for prevention and education. Am J Prev Med. 2015;49(5):703-714.
  24. Torous J, Bucci S, Bell IH, Kessing LV, Faurholt-Jepsen M, Whelan P, et al. The growing field of digital psychiatry: current evidence and the future of apps, social media, chatbots, and virtual reality. World Psychiatry. 2021;20(3):318-335.
  25. Cho CH, Lee T, Kim MG, In HP, Kim L, Lee HJ. Mood prediction of patients with mood disorders by machine learning using passive digital phenotypes based on the circadian rhythm: prospective observational cohort study. J Med Internet Res. 2019;21(4):e11029.
  26. Ren B, Xia CH, Gehrman P, Barnett I, Satterthwaite T. Measuring daily activity rhythms in young adults at risk of affective instability using passively collected smartphone data: observational study. JMIR Form Res. 2022;6(9):e33890.
  27. Adler DA, Tseng VWS, Qi G, Scarpa J, Sen S, Choudhury T. Identifying mobile sensing indicators of stress-resilience. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2021;5(2):1-32.
  28. Malgaroli M, Hull TD, Zech JM, Althoff T. Natural language processing for mental health interventions: a systematic review and research framework. Transl Psychiatry. Oct 06, 2023;13(1):309.
  29. Blei DM. Probabilistic topic models. Commun ACM. 2012;55(4):77-84.
  30. Jelodar H, Wang Y, Yuan C, Feng X, Jiang X, Li Y, et al. Latent Dirichlet Allocation (LDA) and topic modeling: models, applications, a survey. Multimed Tools Appl. 2019;78(11):15169-15211.
  31. Gong Y, Poellabauer C. Topic modeling based multi-modal depression detection. Presented at: MM '17: ACM Multimedia Conference; 23 October 2017, 2017;69-76; Mountain View California USA.
  32. Shen JH, Rudzicz F. Detecting anxiety through reddit. Presented at: Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology — From Linguistic Signal to Clinical Reality; August 2017, 2017;58-65; Vancouver, BC.
  33. Low DM, Rumker L, Talkar T, Torous J, Cecchi G, Ghosh SS. Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on reddit during COVID-19: observational study. J Med Internet Res. 2020;22(10):e22635.
  34. Insel TR. Digital phenotyping: a global tool for psychiatry. World Psychiatry. 2018;17(3):276-277.
  35. Talkspace. URL: [accessed 2023-09-22]
  36. Posner K, Brown GK, Stanley B, Brent DA, Yershova KV, Oquendo MA, et al. The Columbia-suicide severity rating scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry. 2011;168(12):1266-1277.
  37. R Core Team. R: A Language and Environment for Statistical Computing. URL: [accessed 2023-09-19]
  38. Roberts ME, Stewart BM, Tingley D. Stm: an R package for structural topic models. J Stat Softw. 2019;91:1-40.
  39. Airoldi EM, Bischof JM. Improving and evaluating topic models and other models of text. J Am Stat Assoc. 2016;111(516):1381-1403.
  40. Chang J, Boyd-Graber J, Gerrish S, Wang C, Blei DM. Reading tea leaves: how humans interpret topic models. Presented at: 22nd International Conference on Neural Information Processing Systems; December 7, 2009, 2009; Vancouver, BC.
  41. Grimmer J, Stewart BM. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Polit Anal. 2013;21(3):267-297.
  42. NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. Sociedad Española para el Procesamiento del Lenguaje Natural. URL: [accessed 2023-10-04]
  43. Mimno D, Wallach H, Talley E, Leenders M, McCallum A. Optimizing semantic coherence in topic models. Presented at: 2011 Conference on Empirical Methods in Natural Language Processing; July 2011, 2011; Edinburgh, Scotland, UK. URL:
  44. Murgado AM, Portillo AP, Úbeda PL, Martin M, Ureña-López A. Identifying professions and occupations in health-related social media using natural language processing. Presented at: Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task; June 2021, 2021; Mexico City, Mexico. URL:
  45. Roberts M, Stewart B, Tingley D, Airoldi E. The structural topic model and applied social science. Neural Information Processing Society. URL: [accessed 2023-10-04]
  46. Schoene AM, Basinas I, van Tongeren M, Ananiadou S. A narrative literature review of natural language processing applied to the occupational exposome. Int J Environ Res Public Health. Jul 13, 2022;19(14):8544.
  47. Ho D, Imai K, King G, Stuart EA. Matchit: Nonparametric preprocessing for parametric causal inference. J Stat Softw. 2011;42(8):1-28.
  48. Talkspace is donating free therapy to medical workers fighting COVID-19. Talkspace. URL: [accessed 2023-10-04]
  49. Thoemmes FJ, Kim ES. A systematic review of propensity score methods in the social sciences. Multivariate Behav Res. Feb 07, 2011;46(1):90-118.
  50. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-613.
  51. Spitzer RL, Kroenke K, Williams JBW, Löwe B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med. 2006;166(10):1092-1097.
  52. Caspi A, Moffitt TE. All for one and one for all: mental disorders in one dimension. Am J Psychiatry. 2018;175(9):831-844.
  53. Krishnamoorthy Y, Nagarajan R, Saya GK, Menon V. Prevalence of psychological morbidities among general population, healthcare workers and COVID-19 patients amidst the COVID-19 pandemic: a systematic review and meta-analysis. Psychiatry Res. 2020;293:113382.
  54. Shanafelt TD, Boone S, Tan L, Dyrbye LN, Sotile W, Satele D, et al. Burnout and satisfaction with work-life balance among US physicians relative to the general US population. Arch Intern Med. 2012;172(18):1377-1385.
  55. Hull TD, Levine J, Bantilan N, Desai AN, Majumder MS. Analyzing digital evidence from a telemental health platform to assess complex psychological responses to the COVID-19 pandemic: content analysis of text messages. JMIR Form Res. 2021;5(2):e26190.
  56. Marvaldi M, Mallet J, Dubertret C, Moro MR, Guessoum SB. Anxiety, depression, trauma-related, and sleep disorders among healthcare workers during the COVID-19 pandemic: a systematic review and meta-analysis. Neurosci Biobehav Rev. 2021;126:252-264.
  57. Booker LA, Magee M, Rajaratnam SMW, Sletten TL, Howard ME. Individual vulnerability to insomnia, excessive sleepiness and shift work disorder amongst healthcare shift workers. a systematic review. Sleep Med Rev. 2018;41:220-233.
  58. Eldevik MF, Flo E, Moen BE, Pallesen S, Bjorvatn B. Insomnia, excessive sleepiness, excessive fatigue, anxiety, depression and shift work disorder in nurses having less than 11 hours in-between shifts. PLoS One. 2013;8(8):e70882.
  59. Bianchi R, Schonfeld IS, Laurent E. Burnout-depression overlap: a review. Clin Psychol Rev. 2015;36:28-41.
  60. Laughlin L, Anderson A, Martinez A, Gayfield A. 22 Million employed in health care fight against COVID-19. 2021. URL: [accessed 2023-06-09]
  61. Peinelt N, Nguyen D, Liakata M. tBERT: Topic models and BERT joining forces for semantic similarity detection. Presented at: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; July 2020, 2020;7047-7055; Online.
  62. Sharma A, Lin IW, Miner AS, Atkins DC, Althoff T. Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nat Mach Intell. 2023;5(1):46-57.
  63. Zhang J, Mullainathan S, Danescu-Niculescu-Mizil C. Quantifying the causal effects of conversational tendencies. Proc ACM Hum-Comput Interact. 2020;4(CSCW2):1-24.
  64. Aung MH, Matthews M, Choudhury T. Sensing behavioral symptoms of mental health and delivering personalized interventions using mobile technologies. Depress Anxiety. 2017;34(7):603-609.
  65. Althoff T, Clark K, Leskovec J. Large-scale analysis of counseling conversations: an application of natural language processing to mental health. Trans Assoc Comput Linguist. 2016;4:463-476.
  66. Bantilan N, Malgaroli M, Ray B, Hull TD. Just in time crisis response: suicide alert system for telemedicine psychotherapy settings. Psychother Res. Jun 19, 2020;31(3):289-299.

Abbreviations

GAD-7: Generalized Anxiety Disorder-7
HCW: health care worker
HIPAA: Health Insurance Portability and Accountability Act
ML: machine learning
NLP: natural language processing
NPI: National Provider Identifier
PHQ-9: Patient Health Questionnaire-9
STM: structural topic model

Edited by H Liu; submitted 12.03.23; peer-reviewed by K Schultebraucks, E Korshakova; comments to author 03.06.23; revised version received 28.06.23; accepted 07.09.23; published 24.10.23.


©Matteo Malgaroli, Emily Tseng, Thomas D Hull, Emma Jennings, Tanzeem K Choudhury, Naomi M Simon. Originally published in JMIR AI (, 24.10.2023.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR AI, is properly cited. The complete bibliographic information, a link to the original publication on, as well as this copyright and license information must be included.