A Mental Health Chatbot with Cognitive Skills for Personalised Behavioural Activation and Remote Health Monitoring

Affiliation.

  • 1 Centre for Data Analytics and Cognition, La Trobe University, Bundoora, VIC 3086, Australia.
  • PMID: 35632061
  • PMCID: PMC9148050
  • DOI: 10.3390/s22103653

Mental health issues are at the forefront of healthcare challenges facing contemporary human society. These issues are most prevalent among working-age people, negatively impacting the individual, their family, workplace, community, and the economy. Conventional mental healthcare services, although highly effective, cannot be scaled up to address the increasing demand from affected individuals, as evidenced in the first two years of the COVID-19 pandemic. Conversational agents, or chatbots, are a recent technological innovation that has been successfully adapted for mental healthcare as a scalable platform of cross-platform smartphone applications that provides first-level support for such individuals. Despite this promise, mental health chatbots in the extant literature and practice are limited in terms of the therapy provided and the level of personalisation. For instance, most chatbots encode Cognitive Behavioural Therapy (CBT) into predefined conversational pathways that are generic and ineffective in recurrent use. In this paper, we postulate that Behavioural Activation (BA) therapy and Artificial Intelligence (AI) are more effectively materialised in a chatbot setting to provide recurrent emotional support, personalised assistance, and remote mental health monitoring. We present the design and development of our BA-based AI chatbot, followed by its participatory evaluation in a pilot study setting that confirmed its effectiveness in providing support for individuals with mental health issues.

Keywords: artificial intelligence; behavioural activation; chatbot; conversational agents; emotional support; mental health monitoring; mental health support; personalised assistance.

MeSH terms

  • Artificial Intelligence
  • Mental Health
  • Mobile Applications*
  • Pilot Projects


Computer Science > Computation and Language

Title: Mental Health Assessment for the Chatbots

Abstract: Previous research on dialogue system assessment has usually focused on the quality of responses generated by chatbots (e.g., fluency, relevance), which are local and technical metrics. For a chatbot that responds to millions of online users, including minors, we argue that it should exhibit a healthy mental tendency in order to avoid negative psychological impact on them. In this paper, we establish several mental health assessment dimensions for chatbots (depression, anxiety, alcohol addiction, and empathy) and introduce questionnaire-based mental health assessment methods. We conduct assessments on several well-known open-domain chatbots and find that all of them exhibit severe mental health issues. We attribute this to the neglect of mental health risks during the dataset building and model training procedures. We hope to draw researchers' attention to the serious mental health problems of chatbots and to improve chatbots' ability to engage in positive emotional interaction.


Chatbots to Support Mental Wellbeing of People Living in Rural Areas: Can User Groups Contribute to Co-design?

  • Open access
  • Published: 20 September 2021
  • Volume 6, pages 652–665 (2021)


  • C. Potts   ORCID: orcid.org/0000-0002-5621-1611 1 ,
  • E. Ennis 2 ,
  • R. B. Bond 1 ,
  • M. D. Mulvenna 1 ,
  • M. F. McTear 1 ,
  • K. Boyd 3 ,
  • T. Broderick 4 ,
  • M. Malcolm 5 ,
  • L. Kuosmanen 6 ,
  • H. Nieminen 6 ,
  • A. K. Vartiainen 6 ,
  • C. Kostenius 7 ,
  • B. Cahill 8 ,
  • A. Vakaloudis 8 ,
  • G. McConvey 9 &
  • S. O’Neill 2  


Digital technologies such as chatbots can be used in the field of mental health. In particular, chatbots can be used to support citizens living in sparsely populated areas who face problems such as poor access to mental health services, lack of 24/7 support, barriers to engagement, lack of age appropriate support and reductions in health budgets. The aim of this study was to establish if user groups can design content for a chatbot to support the mental wellbeing of individuals in rural areas. University students and staff, mental health professionals and mental health service users ( N  = 78 total) were recruited to workshops across Northern Ireland, Ireland, Scotland, Finland and Sweden. The findings revealed that participants wanted a positive chatbot that was able to listen, support, inform and build a rapport with users. Gamification could be used within the chatbot to increase user engagement and retention. Content within the chatbot could include validated mental health scales and appropriate response triggers, such as signposting to external resources should the user disclose potentially harmful information or suicidal intent. Overall, the workshop participants identified user needs which can be transformed into chatbot requirements. Responsible design of mental healthcare chatbots should consider what users want or need, but also what chatbot features artificial intelligence can competently facilitate and which features mental health professionals would endorse.


Introduction

An emerging area of importance is the investigation of how digital technology can support rural mental health care (Benavides-Vaello et al., 2013 ). Chatbots, also known as conversational user interfaces, are a type of technology that can take diverse roles in supporting mental health. They are becoming increasingly popular as digital mental health and wellbeing interventions, with initial evaluations of efficacy showing promise (Hoermann et al., 2017 ; Provoost et al., 2017 ; Vaidyam et al., 2019 ). Chatbots may be geared towards a variety of outcomes such as medication adherence, treatment compliance, aftercare support, delivery of appointment reminders, user empowerment and improvement in the self-management of mental health and wellbeing through monitoring mood or symptom change (Hoermann et al., 2017 ). They can also be used to promote help-seeking (Hoermann et al., 2017 ). However, chatbots bring other potential benefits to supporting mental wellbeing which are widely recognised by practitioners and clients (Benavides-Vaello et al., 2013 ; Palanica et al., 2019 ; Provoost et al., 2017 ; Vaidyam et al., 2019 ). In addition to supporting those with mental ill health, digital technologies are also considered to have potential for preventing mental health problems and for improving the overall mental health of the population (Calvo et al., 2018 ). This is particularly relevant for those rural citizens living in social isolation who face compounded problems such as poor access to mental health services, no 24/7 support, barriers to engagement especially with older men, no age appropriate support, and reductions in health budgets (Benavides-Vaello et al., 2013 ). All of these factors further emphasize the need for resilience building services to avoid crisis interventions (Benavides-Vaello et al., 2013 ).

The evidence base is at an early stage, and product development also requires improvement (Hoermann et al., 2017; Provoost et al., 2017; Vaidyam et al., 2019). Further research is necessary to determine whether and how a digital technology intervention can best be used in the mental health sector and what developments or limitations need to be incorporated to make the intervention acceptable, effective and financially viable (Hoermann et al., 2017). Calvo et al. point out that the strength of digital technology may lie in the ability to provide an individual or personalised intervention and that traditional scales may not be the best way of measuring outcomes for digital interventions (Calvo et al., 2018). Open questions include whether chatbots can move beyond interactions that are merely factually informative and incorporate emotional connotations, which are currently either overlooked or not understood (Morris et al., 2018). Conversational agents are limited in terms of their language comprehension abilities and emotional understanding, which is a major source of user dissatisfaction (Morris et al., 2018). Nevertheless, digital technologies are being used to support mental health, with chatbots such as WoeBot and Wysa providing psychological assessment or psychoeducational materials (Fitzpatrick et al., 2017; Inkster et al., 2018). ‘Shim’ is another mental health chatbot previously designed for a non-clinical population to deliver cognitive behavioural therapy and strategies from positive psychology (Ly et al., 2017). There is an opportunity to increase access to a more meaningful style of symptom monitoring via a virtual “therapist” or “concerned friend” in the form of a chatbot. Such a technology could be natural, usable, and intuitive, since it simulates everyday human-to-human conversation, allowing it to be adopted by non-digital natives. Further research is necessary to equip chatbots with an understanding of emotion-based conversation and appropriate empathic responses, and to adjust their personality and mimic emotions (Morris et al., 2018). The question is whether or not machines will always be perceived as inferior to humans when it comes to emotions (Morris et al., 2018).

While many popular mental health chatbots exist, few studies have reported on how user groups can contribute to co-design, even though it is important to consider user needs when designing content and features for this kind of application. A few recent studies have involved young people in the design process to co-develop mental health and wellbeing chatbots targeted at under-18s (Audrey et al., 2021; Grové, 2021). Another study by Easton et al. reported on co-designing content for a health chatbot by involving patients with lived experience (Easton et al., 2019). However, to the best of our knowledge, no study has reported on the involvement of stakeholders, including the general population, mental health professionals and service users, in co-designing content for a mental health chatbot.

This study is part of a larger project called ‘ChatPal’, in which the objectives include the development and testing of a chatbot to support and promote mental wellbeing in rural areas across Europe. The overall aim of this study is to carry out workshops to establish if user groups can help to design a chatbot to promote good mental wellbeing in the general population, particularly for those living in sparsely populated areas. The objectives of the study are to:

Gather general mental health wellbeing coping strategies recommended by workshop attendees

Gather and contrast views regarding the use of different scales for monitoring mental health, wellbeing and mood

Explore the range of personalities that chatbots can embody and co-create chatbot personas preferred by the workshop attendees

Elicit the kind of questions asked by workers to clients in a mental health service (e.g. during a formal interaction) and identify which questions would be suitable for a chatbot

Co-create conversational scripts and user stories to inform dialogue and content design for a chatbot.

Needs analysis workshops were carried out to gather the views of the general population, mental health professionals and those with mental ill health. Workshops were based on the living labs methodology, with the idea that the design is not only user-centred but is also carried out by users (Dell’Era & Landoni, 2014). The living labs methodology offers advantages over other methods as it enables co-creation and engagement with service users and service providers, primarily in the ideation and conceptualisation phases (Bond et al., 2015; Mulvenna & Martin, 2013); both of these phases of co-creation here focused on the design of the chatbot.

Recruitment

Recruitment of participants varied based on region. In Northern Ireland, a recruitment email and participant information sheet were sent to students at Ulster University, inviting eligible individuals to attend. A similar approach was used at Action Mental Health (AMH) in Northern Ireland, with a recruitment email and participant information sheet sent to clients and additional recruitment posters put up on AMH premises. In Finland, university students, staff and mental health professionals were emailed invitations to attend the workshops. A snowballing technique, where study subjects recruit other acquaintances to participate, was used in Finland to recruit additional participants. In Scotland, mental healthcare professionals and service users were contacted via email and invited to attend. In Ireland, Cork University of Technology staff and students were contacted via email and invited to attend. In Sweden, welfare professionals working with young people were recruited by phone and e-mail.

For university staff and the general student population in Northern Ireland, Ireland and Scotland, the inclusion criteria were being over the age of 18, living in a rural area, having no history of a mental health diagnosis, and having no suicidal thoughts or behaviours in the past year. In Sweden, the inclusion criteria for welfare professionals included those working to support, aid and/or treat young people’s mental wellbeing in the region of Norrbotten. In Finland, the inclusion criteria for university staff and students included anyone over the age of 18 living in a rural area, and for healthcare professionals included those over the age of 18 working in a rural region in the area of mental health and wellbeing. The requirements for mental health service users in Northern Ireland and Scotland included being a user of the mental health/mental wellbeing service at the time of the workshop, having a history of mild-to-moderate anxiety and/or depression, and having no suicidal thoughts or behaviours in the past year.

Due to the coronavirus pandemic, the workshops in Finland and Sweden took place virtually. All other workshops were face-to-face and took place prior to the pandemic.

Workshop Details and Analysis

The schedule for the workshop involved a review of current mental health services, coping strategies, mental wellbeing scales, user story requirements, a chatbot demo and persona development. The template for the workshops was designed by Ulster University and was structured as follows. At the beginning of the workshop, participants were given a single questionnaire to collect demographics and levels of digital health literacy. Participants were then split into small groups, with one rapporteur at each table to take notes and record qualitative data. Each table was assigned a series of tasks or topics to discuss for approximately 15 minutes. A total of 10 topics/tasks were discussed at each table:

1. Mental wellbeing needs of people living in rural and sparsely populated areas, e.g. what affects quality of life for people with mental health difficulties? What are the things that make life good/bad for you?

2. Pros and cons of current mental health services they may have used or know about. How have mental health services or practitioners helped or hindered recovery? This was asked on a hypothetical basis for students and the general population with no mental health problems.

3. Everyday coping strategies that participants believe support emotional resilience, higher moods and better overall mental wellbeing. Discussion covered medications, side effects, therapeutic benefits, leisure activities and other coping strategies.

4. Analysis of short mental health survey scales regarding their fitness for purpose in regularly monitoring wellbeing. Participants were presented with scales and discussed their utility for regularly monitoring wellbeing. The scales, which included the Clinical Outcomes in Routine Evaluation 10 (CORE-10) (Barkham et al., 2013), Patient Health Questionnaire-9 (PHQ-9) (Kroenke et al., 2001), and Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS) (Tennant et al., 2007), were chosen as they are commonly administered and could potentially be used by the chatbot. CORE-10 was validated in primary care patients for screening and review. It is easy to administer and is recommended for repeated use across therapy sessions, having broad coverage that includes depression and anxiety but also risk to self and general, social, and close relationship problems (Barkham et al., 2013). The PHQ-9 is a reliable measure of depression severity and response to treatment and has been validated with a large sample of patients from primary care and obstetrics-gynecology clinics (Kroenke et al., 2001). WEMWBS was developed to monitor wellbeing, with a focus on positive aspects of mental health (Tennant et al., 2007). It has been validated for use in different locations, languages and cultures, and across many different settings, for example in health services, workplaces and schools (Tennant et al., 2007). Discussions were around what is important in relation to the experience of mental illness, and what should be included in the scales.

5. Demonstration of chatbot technologies and a mental health chatbot. Videos were shown to participants, including demonstrations of Amazon Alexa and Google Assistant as well as an overview video of WoeBot from the creators’ YouTube channel (‘Meet WoeBot’). Participants then discussed the positive and negative aspects of chatbot technologies.

6. Participants were provided with hypothetical personalities that a chatbot can embody and were tasked with discussing these whilst providing their preferred chatbot persona. Two example personas (Appendix I) were shared with participants. This allowed for discussions around what characteristics they would like within a chatbot and what role they feel the chatbot should take in terms of gender, personality traits, etc. Participants were provided with a blank persona template (Appendix I) to help with designing the chatbot personality.

7. Consideration of the kinds of questions asked by workers to clients in a mental health service (e.g. during a formal interaction) and which questions would be suitable for a chatbot. Discussions focused on what would be important in conversations that a client and therapist might have.

8. Co-designing chatbot dialogue. Participants discussed how they might converse with a chatbot in general and whether or not they thought that it might be useful in monitoring their wellbeing. This was also discussed in relation to someone who was feeling mentally unwell.

9. Mood monitoring. Participants were asked how they would like a chatbot to monitor their moods, for example using questions or emojis, or allowing the chatbot to determine mood by analysing user text responses (sentiment analysis); a minimal sketch of this kind of mood logging is shown after this list.

10. Defining chatbot requirements or features. This was done by collecting ‘user stories’ to inform the design of a chatbot. User stories are simply expressed descriptions of a chatbot feature as told from the perspective of a user or related stakeholder of the chatbot service. In the workshops, they were written as short sentences in the form “As a <type of user>, I want <some goal> because <some reason>.” These were written on post-it cards, which were collected and shared on whiteboards for discussion. This was to enable the user-centred co-creation process to thrive.
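As a concrete illustration of the mood-monitoring option in topic 9, the sketch below shows one minimal way a chatbot back end could log emoji check-ins as numeric scores and flag a declining trend over repeated use. It is a hypothetical Python example rather than anything built in the ChatPal project; the emoji-to-score mapping, class name and window/threshold values are assumptions made purely for illustration.

```python
from datetime import date
from statistics import mean

# Hypothetical mapping from emoji check-in responses to a 1-5 mood score.
EMOJI_SCORES = {"😄": 5, "🙂": 4, "😐": 3, "🙁": 2, "😢": 1}


class MoodLog:
    """Minimal mood-tracking store: one scored check-in per day."""

    def __init__(self):
        self.entries = []  # list of (date, score) tuples, in check-in order

    def record(self, day: date, response: str) -> None:
        """Store a check-in, translating an emoji reply into a numeric score."""
        score = EMOJI_SCORES.get(response)
        if score is None:
            raise ValueError(f"Unrecognised check-in response: {response!r}")
        self.entries.append((day, score))

    def recent_average(self, n: int = 7) -> float:
        """Average score over the n most recent check-ins."""
        recent = [score for _, score in self.entries[-n:]]
        return mean(recent) if recent else float("nan")

    def is_declining(self, n: int = 7, threshold: float = 0.5) -> bool:
        """Flag a drop in average mood between the previous and the current window."""
        scores = [score for _, score in self.entries]
        if len(scores) < 2 * n:
            return False
        return mean(scores[-n:]) < mean(scores[-2 * n:-n]) - threshold


# Illustrative usage:
# log = MoodLog()
# log.record(date(2021, 3, 1), "🙂")
# log.record(date(2021, 3, 2), "😐")
# if log.is_declining():
#     ...  # e.g. prompt the user to reflect, or signpost support
```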

This template was shared with partners in Ireland, Scotland, Finland, and Sweden so all workshops followed a similar structure, albeit some workshops took place virtually because of the COVID-19 pandemic restrictions on public meetings. Information gathered at each workshop was collated for the overall needs analysis results. Thematic analysis of user stories was conducted using an inductive approach to identify themes for chatbot design.

Participants

A total of 78 participants were recruited to workshops across several European regions, including Northern Ireland ( N  = 21), Scotland ( N  = 14), Ireland ( N  = 24), Sweden ( N  = 5) and Finland ( N  = 14). Participants of the workshops included mental health service users ( N  = 11), university staff and students ( N  = 40) and mental health care professionals ( N  = 27). Participant demographic information was collected at workshops in Northern Ireland, Finland and Sweden (Table 1 ). This information was not available for workshop attendees in Scotland and Ireland.

Coping Strategies

Coping strategies were identified to support emotional resilience, positive mood and better overall mental wellbeing. Everyday coping strategies discussed in the workshops fell under the categories of spirituality, leisure, and others (Table 2 ).

Mental Wellbeing Scales

Common mental health and wellbeing scales, including CORE-10 (Barkham et al., 2013), PHQ-9 (Kroenke et al., 2001) and WEMWBS (Tennant et al., 2007), were shown to participants to identify positive and negative aspects and missing items, which could help when it comes to choosing which scales to use in the chatbot. Overall, positive aspects that were discussed included that the scales were short and to the point; useful for showing changes over time if administered regularly; important for getting a general overview; a useful starting point; able to help identify problems; and easy to understand. Negative aspects included that perhaps there were not enough questions to assess wellbeing; scales may be inaccurate or lead to a ‘false diagnosis’; certain questions could be triggers for a person; regular use could affect answers; and scales may be not personalised or too impersonal. Participants also felt that there were aspects missing from the scales presented, such as the lack of positive questions and questions specific to individual needs; options for multiple choice questions and tick-box answers; a lack of questions on emotions; and missing questions around suicidal intentions.
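For context on how one of these scales could be administered conversationally, the PHQ-9 is scored by summing nine items each rated 0-3, giving a 0-27 total that maps onto the severity bands reported by Kroenke et al. (2001). The minimal sketch below shows that scoring step only; the function name and conversational wrapping are hypothetical and not part of this study, and any such output would serve as monitoring feedback rather than a diagnosis.

```python
def score_phq9(item_responses):
    """Score the PHQ-9: nine items, each answered 0-3, summed to a 0-27 total.

    Severity bands follow the cut points reported by Kroenke et al. (2001).
    """
    if len(item_responses) != 9 or any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("PHQ-9 requires nine responses, each scored 0-3.")
    total = sum(item_responses)
    if total <= 4:
        severity = "minimal"
    elif total <= 9:
        severity = "mild"
    elif total <= 14:
        severity = "moderate"
    elif total <= 19:
        severity = "moderately severe"
    else:
        severity = "severe"
    return total, severity


# Example with hypothetical responses collected one item at a time in chat:
# score_phq9([1, 2, 0, 1, 3, 0, 1, 2, 0])  # -> (10, "moderate")
```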

Chatbot Personas and Interactions

Participants were presented with video demonstrations on chatbot technology and shown examples of current popular mental health chatbots. This facilitated a discussion on the strengths and weaknesses of chatbot technologies (Table 3 ). Accessibility and functionality were identified as both positive and negative aspects. Availability, universality, functionality, and anonymity were discussed as benefits of a chatbot service (Table 3 ). Additional quotes from participants on the strengths of chatbots include:

Some people might open up to it more because it’s not human and they don’t feel judged. You can be more honest with it. This might be good for people who could do with face to face human support but aren’t quite ready for it—this might be the first step to speak to the chatbot. It could help people who are working as well—because you can access quickly and easily—even for mental health workers! It’s interesting to think about workers because they can’t access services that are only open 9 to 5. This could be a way of complementing those services. I suppose it would be easiest to access on the phone, its discrete, you can do it anywhere you can take it with you. I can see a way of using it with our older service users… I can imagine a way of just… using it to talk—a way of having a conversation; just to talk to someone… I would have to have a lot more understanding of the mechanics of it and the type of conversation it might then be having with my older service users before I would recommend it or signpost them to it. You are gauging whether it’s right for someone… If it’s around social isolation—the man I saw last week is [over 90], lives alone, and doesn’t want to leave the house so just in terms of giving him some companionship or giving him something to talk about…

Negative attributes identified by participants included robotic intelligence and inflexibility; some also felt that chatbots are impersonal (Table 3). Additional quotes from participants on the weaknesses of chatbots include:

I wouldn’t talk to the chatbot about things if I was having a very bad mental health day, I need a person. I would talk to it if I was having an ok day—it would depend how wobbly you are, how ok your day is. It concerned me, what if someone is thinking about suicide or self-harm? What can this chatbot do to help? This is a very different situation to someone just saying ‘I fancy a chat about movies because I’m a bit lonely’. How does [the chatbot] pick up on suicidal ideation? At what point does it pick up on certain things? Can it tune in to if things aren’t right with a person? That worries me a bit.

Each table was given hypothetical personalities that a chatbot could embody and tasked with discussing the personas. Participants were asked to provide their preferred chatbot traits and qualities. The collated responses of participants were used to develop an overall chatbot persona with desired age, gender, personality, and character traits (Fig. 1). Overall, participants preferred the chatbot to be female or gender neutral, aged around 30 years old (Fig. 1). The desired personality was a conversational agent that had a positive outlook, was widely accessible for different groups of people, and provided support to the user. Participants were keen to have a chatbot that was reliable and provided suitable answers and useful information, but one that also knows when to listen and prompt users. Participants also felt it was important to build a rapport with the chatbot so the interactions felt personal, and that the chatbot could understand and be aware of the context of the conversation.

Figure 1. Desirable chatbot persona based on collated participant feedback

The types and examples of initial and follow-up interactions that individuals would like to have with a chatbot were discussed (Table 4 ).

User Stories

User stories, which are simply descriptions of a chatbot feature or requirement, were collected from participants. These were collected as short sentences of the form “As a <user type>, I want <some goal> because <some reason>.” Based on the user stories, key themes were identified (Table 5) which can inform chatbot design by defining requirements or writing dialogues to fit these themes.
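Because each story follows the same sentence template, it can be split into structured fields (user type, goal, reason) as a pre-processing step before thematic coding. The following is a small, hypothetical Python helper for that step; the regular expression and type names are assumptions for illustration and were not part of the study's analysis, and stories that deviate from the template would still need manual handling.

```python
import re
from typing import NamedTuple, Optional


class UserStory(NamedTuple):
    user_type: str
    goal: str
    reason: str


# Matches the workshop template "As a <user type>, I want <goal> because <reason>."
STORY_PATTERN = re.compile(
    r"As an?\s+(?P<user_type>.+?),\s*I want\s+(?P<goal>.+?)\s+because\s+(?P<reason>.+?)\.?$",
    re.IGNORECASE,
)


def parse_user_story(text: str) -> Optional[UserStory]:
    """Split a free-text user story into its three template slots, if it matches."""
    match = STORY_PATTERN.match(text.strip())
    if match is None:
        return None
    return UserStory(
        match["user_type"].strip(), match["goal"].strip(), match["reason"].strip()
    )


# Illustrative wording (not a verbatim workshop story):
# parse_user_story(
#     "As a student, I want daily check-ins because they help me notice mood changes."
# )
# -> UserStory(user_type='student', goal='daily check-ins',
#              reason='they help me notice mood changes')
```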

Principal Findings

The aim of this work was to assess whether a chatbot for mental wellbeing could be co-designed with user groups through workshops across several European countries. This study benefited from the inclusion of participants who were engaged in services for their mental illness as well as those who self-declared that they were not experiencing a mental illness. Both groups are important to consider, as the former have experience of face-to-face services, whereas the latter may be potential users of the future.

User needs were identified at the workshops, including different coping strategies for promoting overall good mental wellbeing, which could be provided as suggestions to the user. Alternatively, the suggested coping strategies could be used as a basis for developing content. There was agreement around the inclusion of validated mental health scales within the chatbot. Participants noted things that they felt are missing from the scales, such as a lack of positive questions, but these missing aspects or questions could be presented to the user as part of the conversation.

Collectively, a chatbot that personified a female or gender-neutral character in their thirties was preferred. Participants felt it was important that the chatbot has generally positive personality traits as well as the ability to understand and connect with the user. The initial conversations with the chatbot could seek to build a rapport with the user to establish trust. Participants liked the idea of the chatbot regularly checking in with the user, asking questions about emotional state or mood and tracking this over time. For repeated use of the chatbot, participants felt that reflecting on previous conversations would be beneficial. Many thought that the chatbot should provide a space to share thoughts and feelings but also provide information. This could be mental health education or simply sharing helpful tips or tools that could be used in everyday life. User retention and engagement with digital technologies can be challenging; however, participants suggested that including gamification within the app could combat this problem. Finally, given the risk that conversational agents may not respond appropriately to potential crisis situations around mental health or suicidal intent, it was suggested that the chatbot should have keyword triggers that signpost to external resources.
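In its simplest form, such keyword triggering amounts to scanning each user message against a curated list of crisis terms and, on a match, replacing the normal reply with a signposting message. The sketch below is a hypothetical illustration of that idea only: the keyword list, wording and function name are assumptions, substring matching of this kind misses many phrasings, and any real trigger set would need clinical review.

```python
from typing import Optional

# Illustrative keyword list and message only; a deployed trigger would need a
# clinically reviewed vocabulary and locale-appropriate crisis services.
CRISIS_KEYWORDS = ("suicide", "suicidal", "kill myself", "self-harm", "end my life")

SIGNPOSTING_MESSAGE = (
    "It sounds like you might be going through something very difficult. "
    "I'm not able to help with this, but a crisis helpline or your local "
    "emergency services can. Would you like me to show some contact details?"
)


def crisis_response(message: str) -> Optional[str]:
    """Return a signposting reply if the message contains a crisis keyword."""
    text = message.lower()
    if any(keyword in text for keyword in CRISIS_KEYWORDS):
        return SIGNPOSTING_MESSAGE
    return None  # no trigger; normal dialogue handling continues
```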

Link with Previous Work

Chatbots were discussed as a place to simply share feelings. This would align with the concept of expressive writing around negative emotional experiences, which has been shown to be potentially important in maintaining mental health (Sabo Mordechay et al., 2019 ). Practicing gratitude can improve overall positive behaviour and emotions (Armenta et al., 2017 ) and gratitude diaries have suggested benefits in several contexts including the management of suicidal crises (Ducasse et al., 2019 ), post discharge psychiatric inpatients (Suhr et al., 2017 ) and occupational stress management in health care professionals (Cheng et al., 2015 ). Chatbots may provide a useful platform for such interventions, and the view would be to build in means of allowing the individual to self-monitor their wellbeing.

In individuals who are mentally unwell, there is often what is referred to as ‘low perceived need’ (Mojtabai et al., 2011), which means the individual typically does not recognise the intensity of their own illness. If chatbots were able to monitor wellbeing using tools such as visual analogue scales, or do something as simple as telling the individual that their scores are worsening, this may assist in promoting self-awareness and early intervention. Xu et al. (2018) provided a review of current interventions to promote help-seeking for mental health problems and concluded that some interventions show efficacy in promoting formal help-seeking, but the evidence for changes in informal help-seeking is limited. Given the difficulties associated with mental health care services, for example waiting lists and the distance that people may have to travel in rural areas, digital technologies could play a role in both providing help and promoting help-seeking, particularly in an informal context. Availability, anonymity, and accessibility were noted as potential advantages of chatbots. However, potential issues such as lack of empathy, being impersonal or rigid, and internet access were noted for consideration. These results further strengthen the need for government investment in the provision of broadband, particularly now in view of COVID-19, as it could facilitate equal access to mental health care support. Chatbots can provide an anonymous platform to discuss mental health, which could be helpful for those who struggle to open up. For example, a recent study reported that soldiers returning from combat deployment were two to four times more likely to report mental ill health symptoms on an anonymous survey compared to a non-anonymous survey (Warner et al., 2011). In regard to empathy, a recent study looked at the effectiveness of an empathic chatbot on mood following experiences of social exclusion (Gennaro et al., 2020). The authors found that participants using a chatbot that responded empathetically had a more positive mood than those using a chatbot that simply acknowledged their responses (Gennaro et al., 2020). Further research is needed in this area, as the challenge of expressing empathy within chatbots is well recognised.

Chatbot personality is an important design consideration, and the desired persona for a chatbot may depend on the domain. In a recent scoping review on mental health chatbots, three of the studies examined by Abd-Alrazaq et al. found that users would like to personalise their own chatbot by choosing its gender and appearance (Abd-Alrazaq et al., 2021). Another recent paper reported that young people wanted a chatbot with a gender-neutral name that was inspiring, charismatic, fun and friendly, and had an empathic and humorous personality (Grové, 2021). In our study, desirable features included a human persona who was female or gender neutral, aged approximately mid-thirties, with an extroverted and supportive personality. Individuals wanted a platform to share thoughts in which the chatbot just listened or understood, which is unsurprising, as individuals in distress often do not share their deepest thoughts with close family members or close friends. Individuals in suicidal crises often report feelings such as perceived burdensomeness and thwarted belongingness (O’Connor & Nock, 2014). In these states, they typically do not feel a connection to their usual support networks and perceive themselves as a source of burden, which hinders them from disclosing their mental distress. Indeed, this issue around disclosure of mental illness and mental distress is particularly prevalent among mental health professionals themselves (Tay et al., 2018).

The scales used in current clinical settings were described as capturing many critical elements of the experience of mental ill health, but many other elements were noted as missing. Potentially useful additions included the ability to individualise the interaction, to have a diary and to specifically ask about suicidal intent. Initially many feared that the discussion of suicidal ideation might encourage such behaviours, but the research consistently shows that it is important to ask this question in an open way with ‘Question, Persuade and Refer’ being a well acknowledged approach (Aldrich et al., 2018 ).

Participants identified several coping strategies which they felt could play a role in supporting emotional resilience. Chatbots may play a role in promoting the actual use of these coping strategies, many of which have an evidence base and are supported by leading bodies such as the World Health Organisation (WHO) (World Health Organisation, 2019) and the National Institute of Clinical Excellence (NICE) (National Institute of Clinical Excellence, 2019). In times of crisis, males in particular typically show maladaptive coping strategies (e.g. consumption of alcohol or drugs, or social withdrawal) (Department of Health Northern Ireland, 2019; O’Neill et al., 2014) and seek psychological help less than women (Addis & Mahalik, 2003). Gender differences in coping behaviours are evident in the literature, and women have been found to utilise more coping strategies than males (Tamres et al., 2002). A mental health chatbot could potentially help with this, as males may be more likely to open up to a chatbot if they are reluctant to attend face-to-face services.

Implications

The results of the present study highlight what potential users of a mental wellbeing chatbot want or need. This is just one aspect to reflect on in relation to the design and development of mental health chatbots. It is crucial to look at approaches for responsible mental health chatbot design, which could consider three things: (1) what users say they need, (2) which chatbot features mental health professionals would endorse, and (3) what AI chatbots can do well (Fig. 2). For example, chatbots can easily handle scripted dialogues with pre-defined replies or limited free-text responses, and if users wanted a chatbot to self-diagnose or screen, it could be used to collect symptoms and use a decision flow to suggest a diagnosis. However, professionals may not support this, which could limit its credibility and widespread adoption. Alternatively, chatbots could be used for answering questions and signposting to paid mental health services; however, users may not want this type of application to direct them to paid services and may thus avoid the technology altogether. Another example is a chatbot that supports free text and attempts to detect when a user is feeling depressed, then tries to respond in a way that improves the person’s mood. This may be endorsed by professionals, but given the limitations of AI, the responses may be inappropriate if the chatbot fails to understand what the user said, or the chatbot may give unsuitable advice. Therefore, a successful digital intervention could be thought of as the intersection between what users want and say they need, what professionals advocate and what AI does well, as shown in Fig. 2.
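To make the first of these design options concrete, a scripted dialogue with pre-defined replies can be represented as a small state machine in which each state stores a prompt and the next state for each reply button. The sketch below is a hypothetical illustration of that structure, not content drawn from the workshops or endorsed by professionals; the states, wording and transitions are assumptions made for the example.

```python
# A minimal scripted-dialogue sketch: each state has a prompt and a mapping from
# pre-defined reply buttons to the next state. All wording is illustrative only.
SCRIPT = {
    "start": {
        "prompt": "How are you feeling today?",
        "options": {"Good": "good_day", "Not great": "low_mood"},
    },
    "good_day": {
        "prompt": "Glad to hear it. Would you like a tip for keeping your mood up?",
        "options": {"Yes": "share_tip", "No thanks": "end"},
    },
    "low_mood": {
        "prompt": "Sorry to hear that. Would you like to talk, or see some coping strategies?",
        "options": {"Talk": "listen", "Coping strategies": "share_tip"},
    },
    "share_tip": {"prompt": "Many people find a short walk or a gratitude note helps.", "options": {}},
    "listen": {"prompt": "I'm here to listen. Tell me a bit more about your day.", "options": {}},
    "end": {"prompt": "Okay. I'll check in with you again tomorrow.", "options": {}},
}


def next_state(current: str, reply: str) -> str:
    """Advance the scripted conversation based on the pre-defined reply chosen."""
    return SCRIPT[current]["options"].get(reply, current)  # stay put on unknown input


# Example: next_state("start", "Not great") -> "low_mood"
```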

Figure 2. Stakeholder-centered approach for responsible mental health chatbot design

Limitations and Future Directions

In this study, people with previous suicidal thoughts and behaviours in the past year were not eligible to take part in the workshops. This is because we did not want any of the topics around mental health discussed in the workshops to cause distress to any participants. Nonetheless, we did include individuals with reported mental ill health as these are potential end users of this type of application.

The challenge now falls to disciplines such as computing and psychology to come together and advance the current provisions to match the features noted in the needs analysis. This is no easy feat, as many practical and ethical issues need consideration. One of the main challenges with chatbot technologies in general lies with natural language processing (NLP), particularly with regard to free text (Kocaballi et al., 2020). Previous studies that have trialled mental health chatbots have reported issues with NLP, including repetitiveness, shallowness and limitations in understanding and responding appropriately (Inkster et al., 2018; Ly et al., 2017). Another challenge is building technologies that are capable of competently responding to disclosures of intentions to harm the self or another. Previous work has looked at using machine learning approaches to detect suicidal ideation and self-harm from textual analysis of social media posts (Burnap et al., 2015; Roy et al., 2020). Future work could utilise similar methodologies in chatbots that are capable of competently responding to such disclosures. Other questions need to be addressed in the future. For example, how do we equip chatbots to respond to emotional statements, considering the wide array of human emotions and how these emotions are expressed? How do we provide follow-up care in a manner that matches the needs of the individual? To what extent is empathy necessary in the interaction, or might the utility of chatbots lie primarily in providing the individual with a means to monitor their own wellbeing and any changes in it, and then signposting them to appropriate support services? This may be a very useful starting point given the well-documented issues surrounding help-seeking and service engagement.
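As a rough indication of what such methodologies could look like inside a chatbot, the sketch below builds a generic supervised text classifier with scikit-learn. It is not the model used by Burnap et al. (2015) or Roy et al. (2020); the feature choice, classifier and thresholding strategy are assumptions, and any real system would need ethically sourced labelled data, rigorous evaluation of false negatives and clinical oversight.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_risk_classifier() -> Pipeline:
    """Generic text classifier: TF-IDF features feeding a logistic regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
        ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ])


# Training would require an ethically sourced, expert-labelled corpus, e.g.:
# model = build_risk_classifier()
# model.fit(texts, labels)                      # labels: 1 = concerning, 0 = not
# risk = model.predict_proba(new_messages)[:, 1]
# Messages above a carefully tuned threshold would route to the signposting
# response described earlier rather than to any automated diagnosis.
```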

Overall, potential users recognise that chatbots may play a role in supporting mental health and they have clearly outlined their needs. In summary, user needs that can be used to inform chatbot design include: different coping strategies to promote good mental wellbeing; use of validated mental health scales; ask positive questions; provide educational content; reflect on previous conversations; elements of gamification; and keyword triggers to signpost to external resources. The desired persona was a female or gender neutral character, aged around 30, that could build a rapport and regularly check in with the user, allow them to track their mood and share thoughts. It is now important to transform these user needs into chatbot requirements whilst also considering which chatbot features AI can competently facilitate and which features mental health professionals would endorse. Future work must also consider the practical and ethical issues with chatbot technologies.

Change history

02 October 2021

A Correction to this paper has been published: https://doi.org/10.1007/s41347-021-00226-2

Abd-Alrazaq, A. A., Alajlani, M., Ali, N., et al. (2021). Perceptions and opinions of patients about mental health chatbots: Scoping review. Journal of Medical Internet Research, 23 , e17828.


Addis, M. E., & Mahalik, J. R. (2003). Men, masculinity, and the contexts of help seeking. American Psychologist, 58 , 5–14. https://doi.org/10.1037/0003-066X.58.1.5

Aldrich, R. S., Wilde, J., & Miller, E. (2018). The effectiveness of QPR suicide prevention training. Health Education Journal, 77 , 964–977. https://doi.org/10.1177/0017896918786009

Armenta, C. N., Fritz, M. M., & Lyubomirsky, S. (2017). Functions of positive emotions: Gratitude as a motivator of self-improvement and positive change. Emotion Review, 9 , 183–190. https://doi.org/10.1177/1754073916669596

Audrey, M., Temcheff, C. E., Léger, P.-M., et al. (2021). Emotional reactions and likelihood of response to questions designed for a mental health chatbot among adolescents: Experimental study. JMIR Human Factors, 8 (1), e24343. https://doi.org/10.2196/24343

Barkham, M., Bewick, B., Mullin, T., et al. (2013). The CORE-10: A short measure of psychological distress for routine use in the psychological therapies. Counselling and Psychotherapy Research, 13 , 3–13. https://doi.org/10.1080/14733145.2012.729069

Benavides-Vaello, S., Strode, A., & Sheeran, B. C. (2013). Using technology in the delivery of mental health and substance abuse treatment in rural communities: A review. Journal of Behavioral Health Services and Research, 40 , 111–120.

Bond, R. R., Mulvenna, M. D., Finlay, D. D., & Martin, S. (2015). Multi-faceted informatics system for digitising and streamlining the reablement care model. Journal of Biomedical Informatics, 56 , 30–41. https://doi.org/10.1016/j.jbi.2015.05.008


Burnap, P., Colombo, G., & Scourfield, J. (2015). Machine classification and analysis of suicide-related communication on Twitter. In HT 2015—Proceedings of the 26th ACM Conference on Hypertext and Social Media . https://doi.org/10.1145/2700171.2791023

Calvo, R. A., Dinakar, K., Picard, R., et al. (2018). Toward impactful collaborations on computing and mental health. Journal of Medical Internet Research . https://doi.org/10.2196/jmir.9021


Cheng, S. T., Tsui, P. K., & Lam, J. H. M. (2015). Improving mental health in health care practitioners: Randomized controlled trial of a gratitude intervention. Journal of Consulting and Clinical Psychology, 83 , 177–186. https://doi.org/10.1037/a0037895

de Gennaro, M., Krumhuber, E. G., & Lucas, G. (2020). Effectiveness of an empathic chatbot in combating adverse effects of social exclusion on mood. Frontiers in Psychology . https://doi.org/10.3389/FPSYG.2019.03061

Dell’Era, C., & Landoni, P. (2014). Living lab: A methodology between user-centred design and participatory design. Creativity and Innovation Management, 23 , 137–154. https://doi.org/10.1111/caim.12061

Department of Health Northern Ireland. (2019). Protect life 2: A strategy for preventing suicide and self harm in Northern Ireland 2019–2024. Retrieved September 30, 2020, from https://www.health-ni.gov.uk/sites/default/files/publications/health/pl-strategy.PDF

Ducasse, D., Dassa, D., Courtet, P., et al. (2019). Gratitude diary for the management of suicidal inpatients: A randomized controlled trial. Depression and Anxiety, 36 , 400–411. https://doi.org/10.1002/da.22877

Easton, K., Potter, S., Bec, R., et al. (2019). A virtual agent to support individuals living with physical and mental comorbidities: Co-design and acceptability testing. Journal of Medical Internet Research, 21 , e12996. https://doi.org/10.2196/12996

Fitzpatrick, K. K., Darcy, A., & Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): A randomized controlled trial. JMIR Mental Health, 4 , e19. https://doi.org/10.2196/mental.7785

Grové, C. (2021). Co-developing a mental health and wellbeing chatbot with and for young people. Front Psychiatry, 11 , 606041. https://doi.org/10.3389/fpsyt.2020.606041

Hoermann, S., McCabe, K. L., Milne, D. N., & Calvo, R. A. (2017). Application of synchronous text-based dialogue systems in mental health interventions: Systematic review. Journal of Medical Internet Research, 19 , e267.

Inkster, B., Sarda, S., & Subramanian, V. (2018). An empathy-driven, conversational artificial intelligence agent (Wysa) for digital mental well-being: Real-world data evaluation mixed-methods study. JMIR Mental Health . https://doi.org/10.2196/12106

Kocaballi, A. B., Quiroz, J. C., Rezazadegan, D., et al. (2020). Responses of conversational agents to health and lifestyle prompts: Investigation of appropriateness and presentation structures. Journal of Medical Internet Research, 22 , e15823. https://doi.org/10.2196/15823

Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16 , 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x

Ly, K. H., Ly, A. M., & Andersson, G. (2017). A fully automated conversational agent for promoting mental well-being: A pilot RCT using mixed methods. Internet Interventions, 10 , 39–46. https://doi.org/10.1016/j.invent.2017.10.002

Mojtabai, R., Olfson, M., Sampson, N. A., et al. (2011). Barriers to mental health treatment: Results from the National Comorbidity Survey Replication. Psychological Medicine, 41 , 1751–1761. https://doi.org/10.1017/S0033291710002291

Morris, R. R., Kouddous, K., Kshirsagar, R., & Schueller, S. M. (2018). Towards an artificially empathic conversational agent for mental health applications: System design and user perceptions. Journal of Medical Internet Research, 20 , e10148. https://doi.org/10.2196/10148

Mulvenna, M., & Martin, S. (2013). Living labs: Frameworks and engagement. Smart Innovation, Systems and Technologies, 18 , 135–143. https://doi.org/10.1007/978-3-642-34219-6_15

National Institute of Clinical Excellence. (2019). Overview | Depression in adults: recognition and management | Guidance | NICE. Retrieved Sept 30, 2020, from https://www.nice.org.uk/Guidance/CG90

O’Connor, R. C., & Nock, M. K. (2014). The psychology of suicidal behaviour. The Lancet Psychiatry, 1 , 73–85.

O’Neill, S., Corry, C. V., Murphy, S., et al. (2014). Characteristics of deaths by suicide in Northern Ireland from 2005 to 2011 and use of health services prior to death. Journal of Affective Disorders, 168 , 466–471. https://doi.org/10.1016/j.jad.2014.07.028

Palanica, A., Flaschner, P., Thommandram, A., et al. (2019). Physicians’ perceptions of chatbots in health care: Cross-sectional web-based survey. Journal of Medical Internet Research, 21 , 1–10. https://doi.org/10.2196/12887

Provoost, S., Lau, H. M., Ruwaard, J., & Riper, H. (2017). Embodied conversational agents in clinical psychology: A scoping review. Journal of Medical Internet Research, 19 , e151.

Roy, A., Nikolitch, K., McGinn, R., et al. (2020). A machine learning approach predicts future risk to suicidal ideation from social media data. NPJ Digital Medicine, 3 , 1–12. https://doi.org/10.1038/s41746-020-0287-6

Sabo Mordechay, D., Nir, B., & Eviatar, Z. (2019). Expressive writing—Who is it good for? Individual differences in the improvement of mental health resulting from expressive writing. Complementary Therapies in Clinical Practice, 37 , 115–121. https://doi.org/10.1016/j.ctcp.2019.101064

Suhr, M., Risch, A. K., & Wilz, G. (2017). Maintaining mental health through positive writing: Effects of a resource diary on depression and emotion regulation. Journal of Clinical Psychology, 73 , 1586–1598. https://doi.org/10.1002/jclp.22463

Tamres, L. K., Janicki, D., & Helgeson, V. S. (2002). Sex differences in coping behavior: A meta-analytic review and an examination of relative coping. Personality and Social Psychology Review, 6 , 2–30. https://doi.org/10.1207/S15327957PSPR0601_1

Tay, S., Alcock, K., & Scior, K. (2018). Mental health problems among clinical psychologists: Stigma and its impact on disclosure and help-seeking. Journal of Clinical Psychology, 74 , 1545–1555. https://doi.org/10.1002/jclp.22614

Tennant, R., Hiller, L., Fishwick, R., et al. (2007). The Warwick-Edinburgh Mental Well-being Scale (WEMWBS): Development and UK validation. Health and Quality of Life Outcomes. https://doi.org/10.1186/1477-7525-5-63

Vaidyam, A. N., Wisniewski, H., Halamka, J. D., et al. (2019). Chatbots and conversational agents in mental health : A review of the psychiatric landscape. The Canadian Journal of Psychiatry . https://doi.org/10.1177/0706743719828977

Warner, C. H., Appenzeller, G. N., Grieger, T., et al. (2011). Importance of anonymity to encourage honest reporting in mental health screening after combat deployment. Archives of General Psychiatry, 68 , 1065–1071. https://doi.org/10.1001/ARCHGENPSYCHIATRY.2011.112

World Health Organisation. (2019). mhGAP Intervention Guide—Version 2.0. Retrieved Sept 30, 2020, from https://www.who.int/publications/i/item/mhgap-intervention-guide---version-2.0

Xu, Z., Huang, F., Kösters, M., et al. (2018). Effectiveness of interventions to promote help-seeking for mental health problems: Systematic review and meta-analysis. Psychological Medicine, 48 , 2658–2667.


Acknowledgements

The authors would like to thank all the clients, participants, project members, supporters, and researchers at Ulster University, University of Eastern Finland, Norrbotten Association of Local Authorities, Region Norrbotten, Luleå University of Technology, NHS Western Isles, Action Mental Health, Munster Technological University, and Health Innovation Hub Ireland, for participating in this research. The ChatPal consortium acknowledges the support provided by the Interreg VB Northern Periphery & Arctic Programme project number 345.

Author information

Authors and Affiliations

School of Computing, Ulster University, Newtownabbey, UK

C. Potts, R. B. Bond, M. D. Mulvenna & M. F. McTear

School of Psychology, Ulster University, Derry-Londonderry, UK

E. Ennis & S. O’Neill

School of Art, Ulster University, Belfast, UK

K. Boyd

Department of Sport, Leisure and Childhood Studies, Munster Technological University, Cork, Ireland

T. Broderick

NHS Western Isles, Stornoway, UK

M. Malcolm

Department of Nursing Science, University of Eastern Finland, Kuopio, Finland

L. Kuosmanen, H. Nieminen & A. K. Vartiainen

Department of Health Sciences, Luleå University of Technology, Luleå, Sweden

C. Kostenius

Nimbus Research Centre, Munster Technological University, Cork, Ireland

B. Cahill & A. Vakaloudis

Action Mental Health, Newtownards, UK

G. McConvey


Corresponding author

Correspondence to C. Potts .

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article unfortunately contained a mistake. The name of the author M.D. Mulvenna is now corrected in the author group.

Appendix I: Example chatbot personas created for Woebot and Wysa, and blank template for completion by participants.



About this article

Potts, C., Ennis, E., Bond, R.B. et al. Chatbots to Support Mental Wellbeing of People Living in Rural Areas: Can User Groups Contribute to Co-design?. J. technol. behav. sci. 6 , 652–665 (2021). https://doi.org/10.1007/s41347-021-00222-6


Received : 11 March 2021

Revised : 22 July 2021

Accepted : 30 August 2021

Published : 20 September 2021

Issue Date : December 2021

DOI : https://doi.org/10.1007/s41347-021-00222-6


Keywords

  • Mental health
  • Co-creation
  • Conversational agents
  • Conversation design
  • Living labs

JMIR Mhealth Uhealth. 2022 Oct; 10(10).

Validity of Chatbot Use for Mental Health Assessment: Experimental Study

Anita Schick

1 Department of Public Mental Health, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany

Jasper Feine

2 Institute of Information Systems and Marketing, Karlsruhe Institute of Technology, Karlsruhe, Germany

Stefan Morana

3 Junior Professorship for Digital Transformation and Information Systems, Saarland University, Saarbruecken, Germany

Alexander Maedche

Ulrich Reininghaus

Associated Data

Sensitivity analyses.

Background

Mental disorders in adolescence and young adulthood are major public health concerns. Digital tools such as text-based conversational agents (ie, chatbots) are a promising technology for facilitating mental health assessment. However, the human-like interaction style of chatbots may induce potential biases, such as socially desirable responding (SDR), and may require further effort to complete assessments.

Objective

This study aimed to investigate the convergent and discriminant validity of chatbots for mental health assessments, the effect of assessment mode on SDR, and the effort required by participants for assessments using chatbots compared with established modes.

Methods

In a counterbalanced within-subject design, we assessed 2 different constructs—psychological distress (Kessler Psychological Distress Scale and Brief Symptom Inventory-18) and problematic alcohol use (Alcohol Use Disorders Identification Test-3)—in 3 modes (chatbot, paper-and-pencil, and web-based), and examined convergent and discriminant validity. In addition, we investigated the effect of mode on SDR, controlling for perceived sensitivity of items and individuals’ tendency to respond in a socially desirable way, and we also assessed the perceived social presence of modes. Including a between-subject condition, we further investigated whether SDR is increased in chatbot assessments when applied in a self-report setting versus when human interaction may be expected. Finally, the effort (ie, complexity, difficulty, burden, and time) required to complete the assessments was investigated.

Results

A total of 146 young adults (mean age 24, SD 6.42 years; n=67, 45.9% female) were recruited from a research panel for laboratory experiments. The results revealed high positive correlations (all P<.001) of measures of the same construct across different modes, indicating the convergent validity of chatbot assessments. Furthermore, there were no correlations between the distinct constructs, indicating discriminant validity. Moreover, there were no differences in SDR between modes, regardless of whether human interaction was expected, although the perceived social presence of the chatbot mode was higher than that of the established modes (P<.001). Finally, greater effort (all P<.05) and more time were needed to complete chatbot assessments than to complete the established modes (P<.001).

Conclusions

Our findings suggest that chatbots may yield valid results. Furthermore, an understanding of chatbot design trade-offs in terms of potential strengths (ie, increased social presence) and limitations (ie, increased effort) when assessing mental health was established.

Introduction

Mental disorders are a leading cause of disease burden in high-income countries and first emerge in adolescence and young adulthood [ 1 ]. Thus, mental health in young people is a major public health concern [ 2 ]. However, psychological help remains difficult to access [ 3 ]. To address this problem, digital technologies provide a scalable alternative for accessing low-threshold psychological assessments, digital diagnostics, and interventions [ 4 ]. In particular, digital technologies can support the early detection of symptoms, diagnostics, and treatment as they may improve access to mental health services for difficult-to-reach populations without requiring on-site visits using desktop PCs, tablets, or mobile devices [ 5 ].

Text-based conversational agents (ie, chatbots) are a promising digital technology in this context [ 6 - 12 ]. Chatbots interact with users via natural language [ 13 ], keeping individuals engaged in the task at hand, thereby increasing adherence [ 10 , 14 ]. Chatbots as software-based systems enabling asynchronous interactions have received increasing attention during the COVID-19 pandemic to provide information about infection numbers, rules, and restrictions [ 15 ], thereby improving health literacy and reducing the burden on the health care system. In addition, chatbots have been investigated in several studies and applied to assess or monitor mental health [ 16 ], deliver information for improving mental health literacy [ 9 , 14 , 15 , 17 ], and assist and compound therapy sessions as guided or blended care [ 18 - 22 ]. Irrespective of the popularity of chatbots, reviews of their application in the context of (mental) health emphasize the quasi-experimental nature of studies and the need to empirically evaluate their impact [ 7 , 16 , 23 - 26 ]. Specifically, for wider application, the extent to which a new mode for assessing a construct (eg, chatbots assessing psychological distress) converges with established assessment modes of the same construct (ie, the convergent validity) needs to be demonstrated. In addition, discriminant validity (ie, the extent to which a construct can be distinguished from another, unrelated construct) needs to be examined. However, to date, no study has specifically examined the validity of chatbot use in assessing mental health.

This is particularly relevant, as there is evidence that individuals preconsciously attribute human characteristics to chatbots because of increased perceived social presence [ 27 - 30 ]. Social presence can be defined as “the degree of salience of the other person in a mediated communication and the consequent salience of their interpersonal interactions” [ 31 ]. Thus, individuals may feel a sense of personal, sociable, and sensitive human contact during a computer-mediated interaction. Although an increase in perceived social presence in face-to-face interviews has been found to increase response biases [ 32 - 35 ], self-reported assessments associated with reduced social presence have demonstrated reliability and validity compared with, for example, face-to-face assessments [ 36 - 40 ]. However, the natural language interaction style of chatbots may yield response biases such as socially desirable responding (SDR) [ 32 , 41 , 42 ], where participants disclose less socially sensitive information, which is of special concern when chatbots are applied for mental health assessment.

Previous evidence indicates that SDR may increase when individuals expect their responses to be immediately reviewed and evaluated by a researcher [ 33 , 43 , 44 ]. If chatbots are perceived as human actors [ 42 , 45 ], this may lead individuals to believe that their responses are immediately reviewed and evaluated. This may bias the results compared with web-based assessments that are not presented with a natural language interface and would limit the application of chatbots in remote settings, in which information is not immediately shared with a clinician. Consequently, it is necessary to investigate whether SDR is increased in settings where individuals do or do not expect their responses to be immediately reviewed when assessed by chatbots.

Finally, there is evidence that chatbots may not necessarily reduce participants’ efforts to complete the assessments [ 46 , 47 ]. Although the completion of assessments delivered via established assessment modes is simple (eg, by ticking a box or clicking a button), chatbots require more complex natural language interactions. This may increase the cognitive resources and duration required for assessments using chatbots [ 46 , 47 ]. Thus, it is necessary to investigate whether individuals using a chatbot perceive assessments as more effortful (ie, as being more complex, difficult, and associated with more burden), as well as whether they require more time to complete assessments than when using established modes.

This study aimed to investigate (1) the convergent and discriminant validity of assessments using chatbots, (2) the effect of assessments using chatbots on SDR, and (3) the effort of assessments using chatbots compared with established paper-and-pencil and web-based assessment modes. Specifically, we proposed the following hypotheses: chatbots applied to assess mental health (ie, psychological distress and problematic alcohol use) in healthy young adults will show high convergent validity with established assessment modes and high discriminant validity (hypothesis 1); increase SDR compared with established assessment modes (hypothesis 2a); increase SDR compared with established modes, especially in settings where individuals do not expect their responses to be immediately reviewed by the research team (hypothesis 2b); and be perceived as more effortful (ie, complex, difficult, and associated with more burden) and will require more time to complete than established assessment modes (hypothesis 3).

Experimental Design

A laboratory experiment applying a randomized mixed design with 3 within-subject conditions and 2 between-subject conditions was conducted. The within-subject manipulation comprised three assessment modes: (1) paper-and-pencil mode, (2) desktop computer using a typical web-based screening mode (web-based), and (3) assessment on a desktop computer screen using a chatbot (chatbot). For the between-subject manipulation, we randomly assigned participants to two conditions: participants in condition A (low-stake condition) were informed that their responses were not immediately reviewed by the research team, and participants in condition B (high-stake condition) were informed that their responses were immediately reviewed and may require a follow-up interaction with the research team.
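To make the randomized mixed design concrete, the sketch below illustrates one way to counterbalance the 3 within-subject modes and randomly assign the 2 between-subject conditions. It is not the authors' code: participant-level assignment and cycling through the 6 mode orders are assumptions for illustration, whereas the study actually assigned conditions per experimental session.

```python
# Hypothetical sketch of the design described above; not the study's software.
import itertools
import random

MODES = ["paper-and-pencil", "web-based", "chatbot"]
CONDITIONS = ["A (low-stake)", "B (high-stake)"]

# All 6 possible orders of the 3 modes; cycling through them across
# participants counterbalances order effects.
MODE_ORDERS = list(itertools.permutations(MODES))


def assign(participant_index: int) -> dict:
    """Return the counterbalanced mode order and a randomly drawn condition."""
    return {
        "mode_order": MODE_ORDERS[participant_index % len(MODE_ORDERS)],
        "condition": random.choice(CONDITIONS),
    }


if __name__ == "__main__":
    for i in range(6):
        print(i, assign(i))
```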

Procedure and Manipulation

The experimental procedure is illustrated in Figure 1 . First, participants were assigned to 1 of the 2 conditions. We conducted 6 experimental sessions on 2 consecutive days, with 3 sessions assigned to condition A (low-stake condition) and 3 sessions assigned to condition B (high-stake condition). After signing the informed consent form, participants were seated in front of a desktop computer screen in single air-conditioned and soundproof test chambers. Second, participants listened to a prerecorded voice message explaining the experimental procedure and the instructions. Participants in condition B were informed of their individual participation numbers. The number was displayed on the computer screen throughout the experiment: in the web-based mode, LimeSurvey [ 48 ] displayed the participant number at the top of the screen; in the paper-and-pencil mode, participants had to write their participant number on the questionnaire; and in the chatbot mode, participants were addressed with their participant number (ie, “Hello participant 324352”) displayed in the chat window below their responses.

Figure 1. Experimental procedure.

Next, the computer screen was automatically turned on, and the experiment began with a pre-experiment questionnaire using LimeSurvey [ 48 ]. Subsequently, mental health was assessed using the 3 different modes in a counterbalanced order ( Figure 2 ). The web-based mode used the default LimeSurvey question format. The paper-and-pencil mode comprised a printout of the digital version, which was placed in an envelope in each chamber. After completing the paper-and-pencil mode, the participants were asked to place the questionnaire in the envelope and seal the envelope with adhesive tape. The chatbot mode was developed using the Microsoft Bot Framework [ 49 ] and was integrated into LimeSurvey. The chatbot presented the items one after another and offered 2 ways of responding, either by natural language or by selecting a value (implemented as a button). The chatbot incorporated the following social cues to further increase perceived social presence [ 28 , 30 ]: an anthropomorphic icon [ 50 ], the capability to engage in small talk [ 51 ], a dynamically calculated response delay based on the length of the response [ 30 ], and a typing indicator (3 moving dots indicating that a message is being prepared) [ 52 ]. Microsoft’s personality chat small talk package was used to enable a small talk interaction. This knowledge base was implemented in Microsoft’s QnA Maker and was connected to the chatbot. When the QnA model identified a high match with an incoming user message, the chatbot answered with an appropriate small talk phrase. However, the chatbot’s capabilities were restricted, and no sophisticated conversations were possible. For example, the small talk included greetings such as “Hi/Hello/Good Morning!” and “How are you?”; however, the small talk did not account for the context. After answering with a small talk phrase, the chatbot always repeated the prior question. In addition, we did not record the log files of the chats. On the continuum of machine-like to human-like appearance, we chose an intermediate design to avoid the induction of negative affect toward the chatbot, which has been postulated for the increased human-likeness of robots according to the uncanny valley theory by Mori [ 53 ]. In addition, we chose the name indicator Chatbot , as robotic names have been reported to be positively perceived [ 6 ].
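The dynamically calculated response delay and typing indicator can be illustrated with a small, framework-independent sketch. This is an approximation under stated assumptions, not the study's Bot Framework code: the typing speed, the delay cap, and the send_typing/send_text callables are hypothetical stand-ins for the framework's own send functions (eg, a typing activity in the Microsoft Bot Framework).

```python
# Illustrative sketch only; numbers and function names are assumptions.
import asyncio

TYPING_SPEED_CPS = 20     # assumed "typing" speed in characters per second
MAX_DELAY_SECONDS = 3.0   # cap so long items do not stall the conversation


def response_delay(message: str) -> float:
    """Delay proportional to the length of the outgoing message, capped."""
    return min(len(message) / TYPING_SPEED_CPS, MAX_DELAY_SECONDS)


async def send_with_social_cues(send_typing, send_text, message: str) -> None:
    """Show a typing indicator, wait a human-like delay, then send the message."""
    await send_typing()                           # eg, a typing activity
    await asyncio.sleep(response_delay(message))  # dynamically calculated delay
    await send_text(message)                      # the actual questionnaire item


async def _demo() -> None:
    async def send_typing() -> None:
        print("[typing indicator]")

    async def send_text(msg: str) -> None:
        print(msg)

    await send_with_social_cues(
        send_typing, send_text,
        "In the past month, how often did you feel nervous? (1=never ... 5=always)")


if __name__ == "__main__":
    asyncio.run(_demo())
```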

Figure 2. Investigated assessment modes (displayed in German).

Finally, the participants answered a postexperiment questionnaire using LimeSurvey. They were then debriefed and received their compensation.

In the pre-experiment questionnaire, we assessed demographic variables (eg, sex, age, and education), followed by questions on participants’ prior experience with using specific technologies (ie, internet and chatbots) with regard to health questions. Next, their experience with paper-and-pencil and web-based surveys, as well as with chatbots, was assessed on a scale ranging from 1 (no experience) to 5 (very much experience).

Balanced Inventory of Desirable Responding

On the one hand, to capture SDR, we applied the short form of the Balanced Inventory of Desirable Responding (BIDR) scale, which comprises two subscales: self-deceptive enhancement and impression management [ 54 , 55 ]. The 18 items were rated on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). We calculated the total score for each subscale and the BIDR total score, which could range from 18 to 126.

On the other hand, we operationalized SDR as a response shift; that is, a change in participants’ mental health scores between repeated assessments in different modes.

Mental Health Measures

Mental health was assessed using the following measures in all 3 modes.

Kessler Psychological Distress Scale

Psychological distress in the past month was measured using the Kessler Psychological Distress Scale (K10) [ 56 ]. This 10-item self-report questionnaire is rated on a Likert scale ranging from 1 (never) to 5 (always). The K10 total score was calculated. Strong psychometric properties of the K10 have been reported [ 56 ].

Brief Symptom Inventory

We used the short form of the Brief Symptom Inventory (BSI-18) [ 57 , 58 ] to assess psychological distress in the past 7 days. Participants indicated whether they had experienced 18 symptoms, comprising 3 dimensions: somatization, depression, and anxiety. The items were rated on a scale from 1 (not at all) to 5 (very much). We calculated the total score indicating general distress (BSI–General Severity Index) [ 58 ].

Alcohol Use Disorders Identification Test-3

We assessed alcohol use by applying the Alcohol Use Disorders Identification Test (AUDIT)–3 questionnaire [ 59 , 60 ], which has been shown to perform similarly well as the AUDIT-10 in detecting risky drinking behavior [ 60 ]. The items were presented on a 5-point scale with different labels asking about the amount of alcohol consumption. The total AUDIT-3 score was calculated.
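Because each instrument is scored as a simple sum of its Likert-scale items, a minimal scoring sketch may help. The data layout, example values, and range checks below are assumptions for illustration (scoring conventions such as 0-4 vs 1-5 anchors differ between instrument versions); this is not the study's code.

```python
# Minimal scoring sketch under assumed response ranges; not the study's code.
def scale_total(responses: list[int], n_items: int, minimum: int, maximum: int) -> int:
    """Sum item responses after checking the item count and response range."""
    assert len(responses) == n_items, "unexpected number of items"
    assert all(minimum <= r <= maximum for r in responses), "response out of range"
    return sum(responses)


# K10: 10 items rated 1 (never) to 5 (always), so totals range from 10 to 50.
k10_total = scale_total([2, 1, 3, 2, 1, 1, 2, 1, 1, 2], n_items=10, minimum=1, maximum=5)

# AUDIT-3: 3 items on a 5-point scale (anchors assumed to run from 1 to 5 here).
audit3_total = scale_total([3, 2, 1], n_items=3, minimum=1, maximum=5)

print(k10_total, audit3_total)
```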

The time at the beginning and end of data collection in each mode was recorded. In the postexperiment questionnaire, participants had to rank the 3 modes regarding complexity, difficulty, and burden. Subsequently, we asked participants to rate others’ discomfort when answering each item of the mental health measures, thereby deriving a measure of subjective sensitivity in line with Bradburn et al [ 61 ].

Attention and Manipulation Checks

In the attention check, participants had to select a specific item on a Likert scale to verify that they carefully followed the instructions (“Please select the answer very often”). To test the within-subject manipulation, we investigated differences in the perceived social presence of each mode using the 4 items by Gefen and Straub [ 62 ], which were rated on a 7-point Likert scale. The internal consistency of the perceived social presence of the 3 modes was high (Cronbach α>.89).

Furthermore, participants had to indicate in the postexperiment questionnaire whether their answers were immediately reviewed, in line with Fisher [ 44 ] (between-subject manipulation check).

Power Analysis and Recruitment

An a priori power analysis in G*Power (Heinrich-Heine-Universität Düsseldorf) [ 63 ] estimated a required total sample size of 116 (α=.05; f=0.15; 1−β=.95). For recruitment, we invited individuals registered with the university’s research panel, comprising mainly students from the Karlsruhe Institute of Technology. The experiment lasted 45 minutes on average, and participants were compensated for their participation with €8 (US $8.06) after the experiment.

Statistical Analysis

SPSS Statistics (version 25; IBM Corp) and Stata (version 16.0; StataCorp) were used to analyze the data. Participant characteristics were summarized using means and SDs for continuous variables and frequencies and percentages for dichotomous variables. To investigate differences between groups, we calculated ANOVAs for individuals’ tendency to respond in a socially desirable way (BIDR) and for the perceived sensitivity of each measure (K10, BSI-18, and AUDIT-3). Furthermore, differences in prior experience with, as well as in the perceived social presence of, the modes were investigated by calculating repeated-measures ANOVAs (rmANOVAs). As the data on prior experience (χ2(2)=46.4; P<.001) and perceived social presence (χ2(2)=49.5; P<.001) violated the assumption of sphericity, Huynh-Feldt corrections were applied.

The internal consistency of the mental health measures for each mode was evaluated using Cronbach α. Next, the test-retest reliabilities of the chatbot-based, paper-and-pencil–based, and desktop-based assessment modes were evaluated by calculating intraclass correlation coefficients (ICCs) ranging from 0 (no agreement) to 1 (perfect agreement).
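As a pointer to how these reliability figures are obtained, the following sketch computes Cronbach α from an item-level data frame. The pandas layout, column names, and random demonstration data are assumptions; the actual analyses were run in SPSS and Stata.

```python
# Hypothetical reliability sketch; the study used SPSS/Stata, not this code.
import numpy as np
import pandas as pd


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach alpha: one row per participant, one column per questionnaire item."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


# Random data standing in for K10 responses (146 participants, 10 items, rated 1-5).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(1, 6, size=(146, 10)),
                    columns=[f"k10_item_{i + 1}" for i in range(10)])
print(round(cronbach_alpha(demo), 2))
```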

To test hypothesis 1 on the discriminant and convergent validity of assessment modes, we calculated Pearson correlations and applied Bonferroni correction to account for multiple testing. In line with the multitrait-multimethod approach by Campbell and Fiske [ 64 ], we tested 3 independent assessment modes with 2 different constructs—psychological distress (K10 and BSI-18) and problematic alcohol use (AUDIT-3)—to derive discriminant and convergent validity. Validity is indicated by a correlation coefficient of ≥0.50 [ 63 ].
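A compact sketch of this multitrait-multimethod check is given below: pairwise Pearson correlations between the (measure, mode) totals, a Bonferroni-corrected significance threshold, and the r≥0.50 criterion. The column names and toy data are assumptions; this is not the study's analysis script.

```python
# Hypothetical validity check; column names and demo data are assumptions.
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def validity_correlations(scores: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Pearson r for every pair of (measure, mode) totals with a Bonferroni threshold."""
    pairs = list(combinations(scores.columns, 2))
    threshold = alpha / len(pairs)  # Bonferroni correction over all pairwise tests
    rows = []
    for a, b in pairs:
        r, p = stats.pearsonr(scores[a], scores[b])
        rows.append({
            "pair": f"{a} vs {b}",
            "r": round(r, 2),
            "p": p,
            "significant": p < threshold,
            "convergent_r_ge_0.50": r >= 0.50,  # criterion used for validity
        })
    return pd.DataFrame(rows)


# Toy demonstration with random totals for 146 participants.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "K10_paper": rng.integers(10, 51, 146),
    "K10_chatbot": rng.integers(10, 51, 146),
    "AUDIT3_chatbot": rng.integers(3, 16, 146),
})
print(validity_correlations(demo))
```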

To test hypothesis 2a, we calculated repeated-measures analyses of covariance (rmANCOVAs) with the within-subject factor mode (paper-and-pencil, web-based, and chatbot) and the following covariates: (1) the perceived sensitivity of the items and (2) individuals’ tendency to respond in a socially desirable way (BIDR). Sex was also included as a control variable in all analyses. The Levene test confirmed the homogeneity of variances for all 3 measures. As the AUDIT-3 data violated the assumption of sphericity (χ2(2)=13.2; P=.001), the Huynh-Feldt correction was applied in the rmANCOVA.

To test hypothesis 2b, rmANCOVAs with the within-subject factor mode (paper-and-pencil, web-based, and chatbot) and condition (A and B) as an additional covariate were calculated. The Levene test confirmed the homogeneity of variances for all modes. Again, the AUDIT-3 data violated the assumption of sphericity (χ2(2)=13.4; P=.001), and the Huynh-Feldt correction was applied.

To test hypothesis 3 on the effort of assessment, we analyzed the rank-ordered data on complexity, difficulty, and burden by calculating Friedman tests and Dunn-Bonferroni post hoc signed-rank tests for pairwise comparisons. Differences in the duration to complete the assessments were investigated by calculating rmANOVAs with the within-subject factor mode (paper-and-pencil, web-based, and chatbot). As the data violated the assumption of sphericity (χ2(2)=9.1; P=.01), the Huynh-Feldt correction was applied.
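For the rank-ordered effort data, the Friedman test can be reproduced with SciPy as in the sketch below. The toy rankings are invented for illustration, and the Dunn-Bonferroni post hoc comparisons and the rmANOVA on duration described above are not reimplemented here.

```python
# Hypothetical sketch of the Friedman test on effort rankings; data are invented.
from scipy import stats


def friedman_effort(paper: list[int], web: list[int], chatbot: list[int]):
    """Each list holds one rank (1-3) per participant for that mode on, eg, difficulty."""
    chi2, p = stats.friedmanchisquare(paper, web, chatbot)
    return chi2, p


# Toy example in which most participants rank the chatbot as most difficult.
chi2, p = friedman_effort(paper=[1, 2, 1, 1, 2],
                          web=[2, 1, 2, 2, 1],
                          chatbot=[3, 3, 3, 3, 3])
print(round(chi2, 2), round(p, 3))
```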

Ethics Approval

The experiment took place at the Karlsruhe Decision and Design Lab, adhering to its procedural and ethical guidelines. No ethics approval was applied for as participants were recruited from the registered participant panel of healthy students. Individuals voluntarily participated after being fully informed about the study procedures and signing the informed consent form. No identifying data were collected.

Sample Characteristics

We invited all individuals registered in the university’s research panel to participate in the experiment. A total of 155 individuals participated in the study, of whom 9 (5.8%) were excluded because they failed the attention check, indicating that they may not have followed the instructions of the experiment or had not read the individual items carefully. Consequently, 146 participants were included in the analysis, of whom 72 (49.3%) were in condition A and 74 (50.7%) in condition B.

The sample characteristics and control variables are presented in Table 1. Overall, we investigated a sample of young students, most of whom had a high school or bachelor’s degree. In addition, two-thirds of the participants (100/146, 68.5%) indicated that they had used the internet to access information on mental health before. However, only 4.1% (6/146) of participants reported having interacted with a chatbot in a health-related context before. Prior experience with assessment modes differed across the 3 modes, as revealed by the rmANOVA (F(1.58, 229.39)=225.23; P<.001). Post hoc analyses with a Bonferroni adjustment further showed that experience with chatbots (mean 1.73, SD 1.02) was lower than experience with paper-and-pencil surveys (mean 3.45, SD 0.85) and with web-based surveys (mean 3.52, SD 0.82; all P<.001). Experience with paper-and-pencil surveys did not significantly differ from that with web-based surveys (P=.78). Individuals’ tendency to respond in a socially desirable way, as measured using the BIDR, did not differ between conditions (F(1,144)=0.131; P=.72) and was centered on the mean (W(146)=0.98; P=.09). The perceived sensitivity of the items of the 3 mental health measures did not differ between the 2 conditions (all P>.47) but differed between the 3 measures (F(1.41, 88.22)=105.64; P<.001). Post hoc analyses with Bonferroni adjustment indicated that AUDIT-3 items (mean 3.39, SD 1.07) were rated as more sensitive than K10 items (mean 2.59, SD 0.66; P<.001) and BSI-18 items (mean 2.33, SD 0.58; P<.001). Furthermore, the K10 items (mean 2.59, SD 0.66) were perceived to be more sensitive than the BSI-18 items (mean 2.33, SD 0.58; P<.001).

Sample characteristics (N=146).

a Number of participants who previously used technology in a health-related context.

b BIDR: Balanced Inventory of Desirable Responding.

c BIDR-SDE: Balanced Inventory of Desirable Responding–Self-deceptive enhancement.

d BIDR-IM: Balanced Inventory of Desirable Responding–Impression management.

e K10: Kessler Psychological Distress Scale.

f BSI-18: Brief Symptom Inventory-18.

g AUDIT-3: Alcohol Use Disorders Identification Test-3.

Manipulation Checks

With regard to the within-subject manipulation, the results of the rmANOVA revealed a significant effect of mode on perceived social presence (F(1.56, 226.67)=61.96; P<.001), with social presence rated highest in the chatbot mode (mean 2.74, SD 1.51) compared with the web-based mode (mean 1.48, SD 0.88; P<.001) and the paper-and-pencil mode (mean 1.79, SD 1.21; P<.001).

Responses to the between-subject manipulation check showed that 93.2% (136/146) of participants provided a correct answer—2.7% (4/146) of individuals with wrong answers were in condition A and 4.1% (6/146) were in condition B—and were aware of their condition. Consequently, we concluded that both within-subject and between-subject manipulations were successful.

Reliability of Chatbots for Mental Health Assessments

Table 2 displays the mean, SD, Cronbach α, and ICC for the mental health measures in each mode by condition. The ICCs of the paper-based, desktop-based, and chatbot modes were high and ranged between 0.96 and 1.00, indicating excellent agreement across modes and a high test-retest reliability. Cronbach α did not strongly vary between modes and ranged between 0.74 and 0.92, indicating an acceptable to excellent internal consistency of the measures.

Internal consistency and test-retest reliability of mental health assessments.

a ICC: intraclass correlation coefficient.

b K10: Kessler Psychological Distress Scale.

c BSI-18: Brief Symptom Inventory-18.

d AUDIT-3: Alcohol Use Disorders Identification Test-3.

Validity of Assessments Using Chatbots (Hypothesis 1)

As depicted in Table 3 , there were strong positive correlations between the measures of psychological distress (K10 and BSI-18) assessed by the different modes, with correlation coefficients ranging from 0.83 to 0.96, indicating convergent validity. Furthermore, there were strong positive correlations between the AUDIT-3 scores assessed using the different modes. There were no significant correlations among AUDIT-3, K10, and BSI-18 after Bonferroni correction, indicating discriminant validity between the different constructs.

Pearson correlation of questionnaires and modes. Higher numbers reflect a stronger association between variables.

a K10: Kessler Psychological Distress Scale.

b BSI-18: Brief Symptom Inventory-18.

c AUDIT-3: Alcohol Use Disorders Identification Test-3.

d Unadjusted P value; the Bonferroni corrected significance level was computed by dividing the unadjusted P value by the total number of tests; that is, P =.05/45=.0011.

SDR to Chatbots in Mental Health Assessments (Hypotheses 2a and 2b)

Addressing hypothesis 2a, the rmANCOVA on the effect of mode on mental health assessment revealed no main effect of mode on K10 (F(2,284)=0.35; P=.71). Moreover, there was no interaction between mode and social desirability (F(2,284)=0.80; P=.45) or perceived sensitivity of the items (F(2,284)=0.43; P=.65); however, there was a significant interaction with sex (F(2,284)=3.21; P=.04). The second mental distress measure, the BSI-18, showed similar results. The rmANCOVA revealed no significant main effect of mode on general distress (F(2,248)=0.90; P=.41). Again, there was no interaction between mode and social desirability (F(2,284)=1.7; P=.19), sensitivity (F(2,284)=0.23; P=.80), or sex (F(2,284)=2.66; P=.07). Similarly, the rmANCOVA on AUDIT-3 scores revealed no significant main effect of mode (F(1.90, 269.57)=0.00; P=1.00) and no interaction of mode with social desirability (F(1.90, 269.57)=0.01; P=.99), perceived sensitivity of items (F(1.90, 269.57)=0.24; P=.77), or sex (F(1.90, 269.57)=0.33; P=.71).

The effect of the condition on mental health assessment (hypothesis 2b) was investigated using a second set of rmANCOVAs. The results revealed no significant interaction effect between mode and condition on psychological distress assessed by the K10 (F(2,282)=0.91; P=.41), general distress assessed using the BSI-18 (F(2,282)=0.29; P=.75), or alcohol use assessed by the AUDIT-3 (F(1.91, 269.14)=0.55; P=.57).

Difficulty of Assessments Using Chatbots (Hypothesis 3)

Table 4 shows the mean ratings of complexity, difficulty, and burden. A Friedman test revealed a significant difference in the difficulty associated with the modes (χ2(2)=13.5; P=.001). Dunn-Bonferroni post hoc tests showed that the assessment by a chatbot was rated as significantly more difficult than the paper-and-pencil mode (z=3.63; P=.001). Furthermore, there was a statistically significant difference in perceived complexity depending on the mode (χ2(2)=10.15; P=.006). Again, Dunn-Bonferroni post hoc tests showed that the chatbot assessment was ranked as more complex than the paper-and-pencil assessment (z=3.16; P=.005). In terms of burden, a Friedman test indicated a statistically significant difference (χ2(2)=12.4; P=.002), and Dunn-Bonferroni post hoc tests further revealed that the web-based assessment required significantly less effort than the chatbot (z=2.64; P=.03) and the paper-and-pencil assessment (z=−3.34; P=.003). The analysis of duration revealed a significant effect of mode (F(1.91, 276.68)=186.60; P<.001). Post hoc analyses with Bonferroni adjustment revealed that the pairwise differences between all modes were significant (P<.001). The chatbot assessment took the longest to complete, and the web-based assessment took the shortest.

Effort of assessment modes.

Principal Findings

This study examined the validity, effect on SDR, and effort required for the completion of chatbot-based assessments of mental health. The results revealed that all assessments of mental health (K10, BSI-18, and AUDIT-3) in each mode showed acceptable to excellent internal consistency and high test-retest reliability. High positive correlations between the measures of the same construct across different assessment modes indicated the convergent validity of the chatbot mode, and the absence of correlations between distinct constructs indicated discriminant validity (hypothesis 1). Although none of the assessment modes was affected by social desirability (hypothesis 2a), the chatbot mode was rated higher in perceived social presence than the established modes. There was no evidence of an interaction between condition and mode, indicating that social desirability did not increase because of expectations around immediate follow-up contact with a researcher in the chatbot assessment mode (hypothesis 2b). Finally, in terms of participants’ effort (hypothesis 3), the assessment using a chatbot was found to be more complex, more difficult, and associated with more burden than the established modes, resulting in a longer duration to complete.

Limitations

The present findings must be considered in light of several limitations. First, the selection of a student sample may limit the external validity of the laboratory experiment. Compared with previous mental health assessments in the general population, our sample showed only moderate distress [ 65 ]. There is evidence that individuals disclose more information on sensitive topics such as health risk behavior in clinical settings [ 66 ]. Future research should further investigate the application of chatbots in clinical samples, as the present findings on the social desirability or perceived social presence of chatbots do not readily generalize to clinical populations.

Second, we reduced the effect of between-person differences by selecting a within-person design, which had several limitations. Each participant completed the questionnaires in all 3 modes, with an average break between modes of approximately 1 minute. During the break, participants rated the perceived social presence of the mode they had just completed and read the instructions for the next experimental section. The break may have been too short to minimize memory effects. In addition, all measures used Likert scales, which may have increased memory effects because of their simplicity. To address this limitation, we completely counterbalanced the order of the 3 modes in the experimental procedure. Furthermore, in a sensitivity analysis using data from only the first mode presented to the participants, we did not find any differences, which further supports the reported results (Multimedia Appendix 1, Table S1). However, other factors, such as the need for consistent responses, may have outweighed socially desirable responding. Again, a longer break between assessments or a between-subject design could be applied in future experiments.

Third, the lack of an effect of mode on change in mental health scores may have been a result of the experimental design or the chatbot design. As mentioned previously, we did not assess social pressure, although individuals have been shown to respond in a more socially desirable way in high-stakes assessment situations. Thus, the assessment of social pressure is recommended for future studies. Furthermore, in this experiment, the chatbot followed a procedural dialog flow using Likert scales and, beyond basic small talk capabilities using several social cues [ 30 ], was unable to answer questions about topics other than the assessments. Although we demonstrated a higher perceived social presence of the chatbot, this may not have been sufficient to resemble the communication flow of a human interviewer. In addition, the perceived social presence of the chatbot may have led to increased expectations of participants in terms of the chatbot’s interactivity and natural language capabilities [ 28 ]. Thus, the chatbot may have raised expectations that were not met [ 67 ]. Consequently, future research should investigate different chatbot designs that support less restricted, non–goal-oriented natural language interactions. In this regard, further experiments should evaluate the influence of social and empathic responses on mental health assessments.

Fourth, this study investigated the convergent and discriminant validity of measures and modes to assess the constructs of psychological distress and alcohol use. We aimed to reduce the participant burden by selecting only 3 measures of mental health. However, other even less related constructs could have been investigated to facilitate the evaluation of discriminant validity. This issue should be addressed in future research.

Finally, the longer duration of completing the assessment using a chatbot may have resulted from participants entering their responses either by typing or by using the menu option. In this study, we did not record which input method was used. In future research, either one response option should be favored or the 2 response options may be compared by applying a microrandomized design.

Comparison With Prior Work

The use of chatbots for mental health assessment is an emerging field, and robust investigations of their positive and potential negative effects are required [ 16 ]. Given that recent studies have shown the feasibility of the application of chatbots in general, particularly in relation to monitoring [ 15 ], offering information on, as well as delivering interventions for, improving mental health [ 62 , 63 ], there is a need for methodological research on the use of chatbots in this context [ 7 , 16 , 23 - 26 ]. This appears to be particularly important in cases where chatbots may be seen as social actors (ie, human interviewers) evoking social desirability. Therefore, it needs to be shown that using chatbots for assessing mental health does not result in biased outcomes.

The application of chatbots has been previously shown to affect the collected data and either reduce [ 68 - 70 ] or increase [ 42 ] the SDR compared with assessments by human interviewers. Other studies have found that chatbot assessments may result in comparable results with established modes [ 8 , 46 , 71 ]. However, some studies have found this effect only in adult samples [ 72 ] or depending on the chatbot’s visual and linguistic design [ 42 , 73 ]. In this context, chatbots with high conversational abilities or a more human-like embodiment have been shown to elicit more SDR to socially sensitive questions than established modes [ 42 , 73 ]. However, this was not the case when a chatbot with fewer human-like conversational abilities was presented [ 42 , 73 ], which is consistent with findings of this study. Thus, an assessment using a chatbot with the presented design and procedural dialog flow does not seem to induce additional SDR. Despite this finding, it may be of interest to develop chatbots with high conversational abilities as these may enhance adherence and increase compliance, for example, in digital interventions [ 8 , 11 , 21 , 24 ]. This is particularly important for delivering interventions and building stable human-chatbot interactions [ 51 ]. Therefore, further research on chatbots is required, for example, in which different conversational interaction strategies may be applied. A promising approach may be to enable reciprocal self-disclosure, in which the chatbot reveals sensitive information, as this has been shown to result in a reciprocal effect on promoting individuals’ self-disclosure [ 70 ], as well as perceived intimacy and enjoyment [ 74 ]. Another promising approach may be the application of contingent interaction strategies, as individuals disclose more information on a website if contingent questions depending on previous interactions are displayed [ 75 ]. Moreover, voice-based conversational agents may improve response quality to sensitive questions [ 76 ]. However, more research on the design of voice-based conversational agents for mental health assessment is required [ 77 ]. In addition, unconstrained natural language input to conversational agents poses safety risks that must be evaluated thoroughly. As recently shown by Bickmore et al [ 78 ], voice-based assistants failed more than half of the time when presented with medical inquiries. Therefore, further evaluation of human-computer interactions and education about the capabilities of conversational agents is required.

In contrast to previous findings that assessments using chatbots yield higher data quality or more engagement [ 8 , 9 , 11 , 47 , 69 ], we showed that chatbot assessments were more difficult, more complex, and associated with more burden to complete than assessments using established modes. In addition, more time was required to complete the assessments. The latter has been shown previously [ 47 ] and may result from the increased cognitive demand of a communication flow, where an individual must decode and aggregate the impression-bearing and relational functions conveyed in computer-mediated communication [ 79 ]. In addition, increased effort may result from individual preferences or prior experiences with chatbots in other contexts. It has been shown that populations with high health literacy rates prefer established modes because of their efficiency and the ability to proceed at their own pace [ 46 ]. This may be particularly relevant in a sample of young students. Furthermore, this finding is in line with the communication literature arguing that simple tasks may be conducted more efficiently through leaner media [ 80 ]. Thus, simple tasks such as selecting Likert scale items in mental health questionnaires may be completed more efficiently through established modes such as paper-and-pencil or web-based assessments [ 81 ]. This may imply that the best application area of chatbots in mental health may not be symptom monitoring or screening but rather providing information or delivering an intervention in unstructured natural language interactions. Recent evidence supports the use of chatbot-based interventions as they have been found to perform equally well as standard treatment methods (eg, face-to-face and telephone counseling) [ 7 ].

This work provides further evidence on the use of chatbots to assess mental health on site in clinics as well as in asynchronous remote medical interactions (eg, at home) [ 17 , 70 , 82 ]. As the assessments did not differ between conditions, the results show that the application of a chatbot yields valid responses regardless of whether the data are immediately reviewed and evaluated by a human actor [ 70 , 83 ]. Therefore, chatbots have the potential to reduce the workload in clinical settings by providing valid remote assessments, which is especially necessary in situations in which the medical system is at its limits. As stated by Miner et al [ 15 ], chatbots may be a digital solution that can help provide information, monitor symptoms, and even reduce psychosocial consequences during the COVID-19 pandemic. Recently, several chatbots for monitoring COVID-19 symptoms have been published, as reviewed by Golinelli et al [ 84 ]. In contrast to other mental health apps, chatbots have the advantage of providing communication that may additionally help to reduce loneliness during periods of physical distancing [ 85 , 86 ]. For example, it has been shown that users may develop a strong social relationship with a chatbot when it expresses empathetic support [ 21 , 51 , 85 , 87 - 90 ]. Moreover, promising real-world examples of empathetic mental health chatbots have shown their effectiveness in practice, such as the mobile app chatbots Wysa [ 85 ], Woebot [ 6 ], and Replika [ 91 ]; however, they have also raised ethical concerns [ 10 ]. Thus, the application of chatbots in mental health research and practice may depend on the specific application (symptom monitoring vs guided intervention) and its potential advantages (ie, increased social presence) and disadvantages (ie, increased effort) while respecting users’ privacy and safety.

These findings provide evidence of the validity of chatbots as a digital technology for mental health assessment. In particular, when paper-and-pencil assessments are not applicable (eg, remote assessments in eHealth settings) or when it may be beneficial to increase perceived social presence (eg, to establish a long-term user-chatbot relationship), chatbots are a promising alternative for the valid assessment of mental health without eliciting socially desirable responses. However, given the increased effort observed for participants, future research on appropriate chatbot designs and interaction flows is necessary to fully leverage their advantages in compounding digital care.

Acknowledgments

The authors would like to thank all the participants. This work was funded by a ForDigital grant from the Ministry of Science, Research, and Arts of the State of Baden-Württemberg, Germany. UR was supported by a Heisenberg professorship (number 389624707) funded by the German Research Foundation. The authors would like to thank the reviewers for their valuable comments on this manuscript.

Abbreviations

Multimedia Appendix 1. Sensitivity analyses.

Conflicts of Interest: None declared.



COMMENTS

  1. (PDF) Artificial Intelligence for Chatbots in Mental Health

    Abstract. With the help of artificial intelligence, the way humans are able to understand each other and give a response accordingly, is fed into the chatbot systems, i.e. into systems that are ...

  2. Effectiveness and Safety of Using Chatbots to Improve Mental Health

There is a shortage of mental health human resources, poor funding, and mental health illiteracy globally [5,6]. This lack of resources is especially evident in low-income and middle-income countries where there are 0.1 psychiatrists per 1,000,000 people, compared to 90 psychiatrists per 1,000,000 people in high-income countries.

  3. Artificially intelligent chatbots in digital mental health

Areas covered. We summarize the current landscape of DMHIs, with a focus on AI-based chatbots. Happify Health's AI chatbot, Anna, serves as a case study for discussion of potential challenges and how these might be addressed, and demonstrates the promise of chatbots as effective, usable, and adoptable within DMHIs. Finally, we discuss ways in which future research can advance the field ...

  4. Chatbots and Conversational Agents in Mental Health: A Review of the

    Studies were included that involved a chatbot in a mental health setting focusing on populations with or at high risk of developing depression, anxiety, schizophrenia, bipolar, and substance abuse disorders. Results: From the selected databases, 1466 records were retrieved and 8 studies met the inclusion criteria.

  5. Chatbots and Conversational Agents in Mental Health: A Review of the

    Chatbots are an emerging field of research in psychiatry, but most research today appears to be happening outside of mental health. While preliminary evidence speaks favourably for outcomes and acceptance of chatbots by patients, there is a lack of consensus in standards of reporting and evaluation of chatbots, as well as a need for increased ...

  6. An Overview of Chatbot-Based Mobile Mental Health Apps: Insights From

Mental Health Chatbots as an Emerging Technology. A chatbot is a system that can converse and interact with human users using spoken, written, and visual languages []. In recent years, chatbots have been used more frequently in various industries, including retail [], customer service [], education [], and so on because of the advances in artificial intelligence (AI) and machine learning (ML ...

  7. (PDF) Artificial Intelligence-Enabled Chatbots in Mental Health: A

    Clinical applications of Artificial Intelligence (AI) for mental health care have experienced a meteoric rise in the past few years. AI-enabled chatbot software and applications have been ...

  8. An overview of the features of chatbots in mental health: A scoping

    Research regarding chatbots in mental health is nascent. There are numerous chatbots that are used for various mental disorders and purposes. Healthcare providers should compare chatbots found in this review to help guide potential users to the most appropriate chatbot to support their mental health needs. ... A chatbot is a system that is able ...

  9. PDF Artificial Intelligence-Enabled Chatbots in Mental Health: A Systematic

Article: not a research article, e.g., an editorial note; Non-related: chatbot and AI used to solve problems that are not related to mental health; Wrongly related: papers that are not related to ...

  10. Chatbot features for anxiety and depression: A scoping review

The number of mobile health (mHealth) apps focused on mental health, which often incorporate multiple techniques and features such as chatbots, has rapidly increased; a 2015 World Health Organization (WHO) survey of 15,000 mHealth apps revealed that 29% focus on mental health diagnosis, treatment, or support. 10 Public health organisations ...

  11. Artificial Intelligence for Chatbots in Mental Health: Opportunities

    Abstract. With the help of artificial intelligence, the way humans are able to understand each other and give a response accordingly, is fed into the chatbot systems, i.e. into systems that are supposed to communicate with a user. The bot understands the user's query and triggers an accurate response. In the healthcare domain, such chatbot ...

  12. A Mental Health Chatbot with Cognitive Skills for Personalised

    Mental health issues are at the forefront of healthcare challenges facing contemporary human society. These issues are most prevalent among working-age people, impacting negatively on the individual, his/her family, workplace, community, and the economy. Conventional mental healthcare services, alth …

  13. To chat or bot to chat: Ethical issues with using chatbots in mental health

    When existing systems are repurposed or retired, an evaluative process of weighing risk and benefit should be repeated. It might also be worth considering patient and public involvement in mental health chatbot development and research 66,67 to anticipate and respond to risks, and to maximise the benefits, of chatbots for end-users.

  14. A Mental Health Chatbot with Cognitive Skills for Personalised

1. Introduction. Close to a billion people worldwide have experienced a mental illness, ranging from the most common conditions of anxiety and depression to psychotic and personality disorders []. Mental illnesses cause a significant degradation of the affected individual's quality of life, as well as his/her contribution to society and the economy.

  15. Sensors

    Mental health issues are at the forefront of healthcare challenges facing contemporary human society. These issues are most prevalent among working-age people, impacting negatively on the individual, his/her family, workplace, community, and the economy. Conventional mental healthcare services, although highly effective, cannot be scaled up to address the increasing demand from affected ...

  16. The Rise of the Mental Health Chatbot

    The Value of Mental Health Chatbots. Mental illness impacts many facets of life. Within the workforce, 6.8% of people are directly impacted by depression and as a result $11,936 is lost annually on average per affected employee due to absenteeism, disability, and lack of productivity.

  17. [2201.05382] Mental Health Assessment for the Chatbots

    For a chatbot which responds to millions of online users including minors, we argue that it should have a healthy mental tendency in order to avoid the negative psychological impact on them. In this paper, we establish several mental health assessment dimensions for chatbots (depression, anxiety, alcohol addiction, empathy) and introduce the ...

  18. Chatbots to Support Mental Wellbeing of People Living in ...

Mental Wellbeing Scales. Common mental health and wellbeing scales including CORE-10 (Barkham et al., 2013), PHQ-9 (Kroenke et al., 2001) and WEMWBS (Tennant et al., 2007) were shown to participants to identify positive and negative aspects and missing items which could help when it comes to choosing which scales to use in the chatbot. Overall, positive aspects that were discussed included that ...

  19. Artificial Intelligence Chatbot for Depression: Descriptive Study of

    Tess. Tess is a mental health chatbot designed by X2AI that is trained to react to the user's emotional needs by analyzing the content of conversations and learning about the user that she is chatting with. Users can chat with Tess in multiple ways, such as through text message conversations or Facebook Messenger.

  20. A Survey of Mental Health Chatbots using NLP

    We also propose MentalEase, a mobile application which uses NLP techniques to provide not only conversational aid but also a toolbox of helpful features to keep mental health in place. By integrating mental health assessment tools into the chatbot interface, along with regular therapy it can help patients deal with mild anxiety and depression ...

  21. Validity of Chatbot Use for Mental Health Assessment: Experimental

    Reliability of Chatbots for Mental Health Assessments. Table 2 displays the mean, SD, Cronbach α, and ICC for the mental health measures in each mode by condition. The ICCs of the paper-based, desktop-based, and chatbot modes were high and ranged between 0.96 and 1.00, indicating excellent agreement across modes and a high test-retest reliability.

  22. Chatbot for Mental health support using NLP

    Mental health issues are a growing concern worldwide, and seeking support for these issues can be difficult due to various reasons. Chatbots have emerged as a promising solution to provide accessible and confidential support to individuals facing mental health issues. With recent advances in technology, digital interventions designed to supplement or replace in-person mental health services ...