Significance of Phonological Features in Speech Emotion Recognition

  • Published: 15 July 2020
  • Volume 23 , pages 633–642, ( 2020 )

  • Wei Wang 1 ,
  • Paul A. Watters 2 ,
  • Xinyi Cao 1 ,
  • Lingjie Shen 1 &
  • Bo Li   ORCID: orcid.org/0000-0002-3330-8103 3  

A novel Speech Emotion Recognition (SER) method based on phonological features is proposed in this paper. Intuitively, as expert knowledge derived from linguistics, phonological features are correlated with emotions. However, they are seldom used as features to improve SER. Motivated by this, we aim to use phonological features to further improve SER accuracy, since they can provide complementary information for the task. We also explore the relationship between phonological features and emotions. Firstly, instead of relying only on acoustic features, we devise a new SER approach that fuses phonological representations with acoustic features. A significant improvement in SER performance is demonstrated on a publicly available SER database, Interactive Emotional Dyadic Motion Capture (IEMOCAP). Secondly, the experimental results show that the top-performing method for categorical emotion recognition is a deep learning-based classifier, which achieves an unweighted average recall (UAR) of 60.02%. Finally, we investigate the most discriminative features and find some patterns of emotional rhyme based on the phonological representations.
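For readers unfamiliar with the metric reported above, the following is a minimal sketch of how unweighted average recall (UAR) is usually computed: recall is calculated separately for each emotion class and then averaged, so that small classes count as much as large ones. The function name and the toy labels are illustrative assumptions and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally, whatever its size."""
    total = defaultdict(int)    # number of true instances per class
    correct = defaultdict(int)  # number of correctly predicted instances per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical toy labels (not real data): an imbalanced three-class problem.
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "happy", "happy", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the plain accuracy is higher than the UAR because the majority class is recognised well while the single "sad" item is missed, which is the situation UAR is meant to expose.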

The project was funded by the Innovative Research Group Project of the National Natural Science Foundation of China (CN) (Grant No. BCA150054) and the Faculty Startup Funds Award of the University of Southern Mississippi (US).

Author information

Authors and affiliations.

School of Education Science, Nanjing Normal University, Nanjing, JS, 210097, China

Wei Wang, Xinyi Cao & Lingjie Shen

Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC, 3350, Australia

Paul A. Watters

School of Computer Sciences and Computer Engineering, University of Southern Mississippi, 730 East Beach Blvd, Long Beach, MS, 39560, USA

Bo Li

Corresponding author

Correspondence to Bo Li .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Wang, W., Watters, P.A., Cao, X. et al. Significance of Phonological Features in Speech Emotion Recognition. Int J Speech Technol 23 , 633–642 (2020). https://doi.org/10.1007/s10772-020-09734-7

Received : 27 December 2019

Accepted : 08 July 2020

Published : 15 July 2020

Issue Date : September 2020

DOI : https://doi.org/10.1007/s10772-020-09734-7

  • Speech emotion recognition
  • Phonological features
  • Feature analysis
  • Acoustic features

ORIGINAL RESEARCH article

Constraints on novel word learning in heritage speakers.

Yuxin Ge,

  • 1 Linguistics and English Language, Lancaster University, Lancaster, United Kingdom
  • 2 Linguistics Research Centre, NOVA University Lisbon, Lisbon, Portugal
  • 3 Department of Spanish and Portuguese, University of Toronto, Toronto, ON, Canada
  • 4 LEAD Graduate School and Research Network, University of Tübingen, Tübingen, Germany
  • 5 Department of Psychology, Lancaster University, Lancaster, United Kingdom

Introduction: Recent research on word learning has found that adults can rapidly learn novel words by tracking cross-situational statistics, but learning is greatly influenced by the phonological properties of the words and by the native language of the speakers. Mandarin-native speakers could easily pick up novel words with Mandarin tones after a short exposure, but English-native speakers had specific difficulty with the tonal components. It is, however, unclear how much experience with Mandarin is needed to successfully use the tonal cue in word learning. In this study, we explored this question by focusing on the heritage language population, who are typically exposed to the target language at an early age but then develop and switch to another majority language. Specifically, we investigated whether heritage Mandarin speakers residing in an English-speaking region and speaking English as a dominant language would be able to learn novel Mandarin tonal words from statistical tracking. This helps us understand whether early exposure to the target feature is sufficient to promote the use of that feature in word learning later in life.

Methods: We trained 30 heritage Mandarin speakers with Mandarin pseudowords via a cross-situational statistical word learning task (CSWL).

Results and discussion: Heritage Mandarin speakers were able to learn the pseudowords across multiple situations, but similar-sounding words (i.e., minimal pairs) were more difficult to identify, and words that contrast only in lexical tones (i.e., Mandarin lexical tone) were distinguished at chance level throughout learning. We also collected information about the participants’ heritage language (HL) experience and usage. We did not observe a relationship between HL experience/usage and performance in tonal word learning, suggesting that HL exposure does not necessarily lead to an advantage in learning the target language.

Introduction

Language learners can rapidly pick up new words from the surrounding environment, most of the time without explicit instruction. This is impressive given the highly variable environment in which language learning happens. Quine (1960) illustrated this word learning challenge by referring to the well-known “Gavagai” conundrum. The first time a learner encounters a new word, the meaning is usually unclear because the word could refer to anything in the environment. Without any explicit information, the word-referent mapping is ambiguous. How do learners deal with this referential ambiguity problem in real life?

Research on statistical learning has found a potential solution to the Gavagai problem: child and adult learners can keep track of the linguistic information across multiple situations to aid word learning, an ability commonly referred to as cross-situational word learning (CSWL; e.g., Suanda and Namy, 2012; Monaghan et al., 2019; Rebuschat et al., 2021; Escudero et al., 2022). That is, when the same word occurs again, learners can track the always-co-occurring referent and, over time, form an association between the word and the referent. However, recent studies have shown that CSWL is greatly influenced by the phonological properties of the words (Escudero et al., 2016; Tuninetti et al., 2020; Ge et al., 2024). Words that sound similar (e.g., phonological minimal pairs like bag vs. beg in English; pāo vs. gāo in Mandarin) generated difficulty in CSWL (e.g., Escudero et al., 2016), as did the presence of non-native phonological features when adults learned an additional language (L2) via CSWL (e.g., Escudero et al., 2022; Ge et al., 2024; Ge et al., under review; see footnote 1). For example, L1 Mandarin speakers could learn Mandarin pseudowords from CSWL exposure regardless of the existence of tonal minimal pairs, but L1 English speakers had great difficulty with these non-native minimal pairs (Ge et al., 2024). This is because Mandarin-native speakers had extensive experience with the Mandarin tonal feature since childhood and could make use of the tonal categories in identifying words, but English-native speakers had no experience with tones and did not have the tonal representations. One question that arises is how much experience with the target feature is needed to develop the phonological representations and consequently use the feature in word learning.

To address this question, we targeted the heritage speaker population who are typically exposed to a minority (heritage) language at home in childhood, but start to rapidly acquire a different societal/majority language at the onset of school and become dominant in the societal/majority language. Specifically, we tested heritage speakers of Mandarin who were born to at least one Mandarin-speaking parent and resided in English-speaking countries from birth. These participants had early experience with the (Mandarin) tonal feature but then, later in life, had relatively limited use of lexical tones given that their majority language (English) is non-tonal. The performance of heritage speakers is particularly interesting because human sensitivity to sounds is largely shaped and tuned to their native languages at an early age, and hence experience with the target feature in early years might make a great difference even when exposure to the feature reduces later in life ( Kuhl, 2004 ; Hartshorne et al., 2018 ). To summarize, in this study, we examined whether and how heritage speakers learn novel words from their heritage language (HL) via statistical tracking, and how they are affected by sounds that only exist in their HL but not in the majority language (i.e., lexical tones). Additionally, we tested whether the degree of HL experience and usage has an impact on word learning outcomes.

Statistical word learning

Language learners can extract statistical regularities of different aspects of the language from the linguistic input (e.g., Maye and Gerken, 2000; Maye et al., 2002, for sound discrimination; Saffran et al., 1996, for word segmentation; see Siegelman, 2020; Isbilen and Christiansen, 2022; Williams and Rebuschat, 2022, for reviews). As for word learning, this involves tracking word-referent co-occurrences across encounters. A cross-situational statistical learning paradigm has often been used to examine word learning under implicit learning conditions where there is ambiguity in words’ referents (e.g., Yu and Smith, 2007; Smith and Yu, 2008; Suanda et al., 2014; Rebuschat et al., 2021; Escudero et al., 2022). For example, in Yu and Smith’s (2007) seminal study, adult learners were first presented with multiple words and pictures in each learning trial, and were then tested on whether they could use the word-picture co-occurrence information across learning events to acquire the appropriate mappings. After only 6 min of exposure, learners could match pictures to words at above-chance level even in highly ambiguous conditions where four words and four pictures were presented in each learning trial.
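The co-occurrence tracking described above can be illustrated with a toy simulation. This is a minimal sketch of a simple counting learner, not the model or procedure of any cited study: the six-word lexicon, the trial construction (every possible four-word subset shown once) and all names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical lexicon: six novel words, each with one true referent.
lexicon = {f"word{i}": f"object{i}" for i in range(6)}

def make_trials(lexicon, per_trial, rng):
    """One ambiguous trial per subset of words; referents appear in random order."""
    trials = []
    for shown in combinations(sorted(lexicon), per_trial):
        referents = [lexicon[w] for w in shown]
        rng.shuffle(referents)  # position gives no cue to the mapping
        trials.append((list(shown), referents))
    rng.shuffle(trials)
    return trials

def learn(trials):
    """Count how often each word co-occurs with each referent across trials."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in trials:
        for w in words:
            for r in referents:
                counts[w][r] += 1
    return counts

def guess(counts, word):
    """Pick the referent that co-occurred with the word most often."""
    return max(counts[word], key=counts[word].get)

trials = make_trials(lexicon, per_trial=4, rng=random.Random(0))
counts = learn(trials)
learned = {w: guess(counts, w) for w in lexicon}
print(learned == lexicon)
```

Within any single trial each word is compatible with four referents, yet across trials only the true referent is present every time the word is heard, so its count pulls ahead of every competitor.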

However, this rapid learning effect is reduced when words overlap phonologically, as they do in most vocabulary inventories (e.g., Escudero et al., 2016, 2022; Tuninetti et al., 2020). For example, Escudero et al. (2016) reported that when learners were presented with two pictures and two minimal pair words in each learning trial, their performance was inhibited, especially when the words were vowel minimal pairs (e.g., /dit/−/dɪt/), compared to non-minimal pair presentations (e.g., /bɔn/−/dit/). This phonological similarity effect was even more pronounced in L2 word learning. When the same CSWL task with English pseudo-minimal pairs (e.g., /dit/−/dɪt/, /bɔn/−/tɔn/) was presented to English-native and Mandarin-native speakers, English-native speakers’ overall word learning performance was better than that of the Mandarin-native speakers across minimal pair types (Escudero et al., 2022). Thus, the existence of non-native English contrasts influenced Mandarin-native speakers’ word learning outcomes. Similar evidence came from Australian English speakers learning Dutch and Brazilian Portuguese pseudo-minimal pairs (Tuninetti et al., 2020). Vowel minimal pairs were created based on Dutch and Brazilian Portuguese vowel inventories (e.g., /piχ/−/pyχ/, /fεfe/−/fefe/, respectively). Following the predictions of the Second Language Linguistic Perception model (L2LP; Escudero, 2005) and the Perceptual Assimilation-L2 model (PAM-L2; Best and Tyler, 2007), some of the vowel pairs were defined as perceptually easier, as they could be mapped to two separate Australian English vowel categories (e.g., the Dutch /i/−/ɑ/ contrast might be mapped to AusEnglish /i/−/ɔ/), and others were classified as perceptually difficult, as they had no clear corresponding Australian English contrasts (e.g., the Dutch /i/−/y/ contrast). Learners performed better with perceptually easy pairs than with difficult pairs, indicating that the degree of perceptual cross-linguistic similarity associated with non-native segments influenced non-native statistical word learning.

Ge et al. (2024) found that the non-native phonology effect in CSWL was associated not only with segmental but also with suprasegmental features. In addition to the segmental minimal pairs used in previous research (e.g., Escudero et al., 2022), Ge et al. (2024) included tonal minimal pairs (i.e., two words that differ only in lexical tone: /pa1mi1/ vs. /pa4mi1/, with numbers referring to Mandarin Tone 1 and Tone 4); lexical tone is a suprasegmental feature absent in non-tonal languages like English. A slightly different CSWL design was used to more closely resemble the way learners encounter minimal pairs in the real world. Only one word was presented in each trial together with multiple referents; hence, minimal pairs were not presented side by side to participants in a single trial. This mirrors natural language learning situations in that minimal pairs tend not to occur in immediate proximity but need to be acquired by tracking the contrastive phonological features across situations. After a short cross-situational exposure of 10 min, English-native participants successfully identified word-referent mappings in consonantal, vocalic and non-minimal pairs, as the segmental features in the stimuli were designed to be familiar to English speakers, but not in the tonal pairs. Mandarin-native participants, on the other hand, were able to identify words in the tonal pairs after the same amount of exposure. These previous findings all suggest that phonology plays a significant role in statistical word learning and that L2 learners might have difficulty picking up words from the environment because of non-native sounds.

Such difficulty has been found even when specific phonetic (perceptual) training on the target non-native contrasts is included (Ge et al., under review; see footnote 1). In that study, native speakers of English were provided with perceptual training on Portuguese consonant and vowel contrasts (e.g., /l/−/ʎ/, /n/−/ɲ/, /e/−/ɛ/, /o/−/ɔ/), and then trained on Portuguese pseudowords containing these contrasts via CSWL. The perceptual training did improve learners’ perceptual discrimination of the non-native contrasts, but this improvement did not transfer to word learning: the English-native speakers still had difficulty with non-native minimal pairs. This finding indicates that L2 learners’ difficulty stems not simply from perceptual issues, but also from the lack of phonological representations of the novel sounds. As widely reported in the infant speech development literature, as early as the first year of life, humans start to tune in to their native sound system(s), and their sensitivity to non-native sounds and categories greatly reduces (e.g., Werker and Tees, 1984; Kuhl, 2004; Watson et al., 2014). This perceptual tuning persists into adulthood and might contribute to the difficulties in L2 word learning. Previous studies observed a phonetic-phonological-lexical continuity, indicating that categorical perception of non-native sounds was associated with performance in non-native word learning and processing (e.g., Wong and Perrachione, 2007; Ling and Grüter, 2022; Laméris et al., 2023). Hence, if the narrowing process in early years does play a significant role, one question that follows is whether exposure to the target language in early years would facilitate word learning (in the same language) later in life, as early exposure might allow learners to develop the necessary perceptual sensitivities and phonological categories.

A particular population that is perfect to study this research question is heritage speakers because of their special language profile. Like all native speakers of a language, heritage speakers have early exposure to the language, which would allow them to develop sensitivities to the language-specific phonological contrasts, but they switch to another dominant language after the early years and usually have limited HL use afterwards. It thus allows us to specifically test whether early exposure to the target language plays a role in later word learning. In other words, we explored whether heritage speakers’ phonological representations that are developed early in life remain accessible and help them learn new words from their HL in adulthood.

Phonological advantages in heritage speakers

HL research has observed phonological advantages among heritage speakers in both speech perception and production compared to late L2 learners, and closer performance to native speakers in some dimensions (e.g., Lukyanchenko and Gor, 2011; Chang, 2016, for speech perception; Au et al., 2002; Chang et al., 2011, for speech production; Flores et al., 2017, for accentedness). For example, heritage Korean speakers who grew up in an English-speaking environment showed greater sensitivity to unreleased stops, which are obligatory in Korean (Chang, 2016). Although unreleased final stops are present in American English, they are not considered the canonical form, and English speakers rely more on released stops in word recognition. Heritage Korean speakers’ identification of the unreleased stops (in Korean and English) was comparable to that of L1 Korean speakers and better than that of L1 English speakers. This suggests that early exposure to the phonological contrasts did persist into adulthood and facilitate sound recognition later in life. As for speech production, Chang et al. (2010) reported that, compared to L2 Mandarin learners, heritage Mandarin speakers’ back vowel production (e.g., Mandarin /u/) was closer to that of native Mandarin speakers (though not the same). In addition to the segmental features, some research also found an advantage in heritage speakers’ suprasegmental realizations (e.g., Yang, 2015; Chang and Yao, 2016, 2019, for lexical tone; Kim, 2020, for lexical stress). Regarding lexical tone, for example, Yang (2015) examined the perception and production of Mandarin tones by native Mandarin speakers, heritage Mandarin speakers, and L2 learners. Heritage speakers’ perception of tones lay in between the native and the L2 groups: heritage speakers exhibited a more stable categorical perception of the four tones than L2 learners, although they did not completely resemble native Mandarin speakers’ perceptual patterns. Work on Mandarin speech production showed that heritage Mandarin speakers’ production of tones also generally fell between that of native and L2 speakers (Chang and Yao, 2016). In some dimensions, heritage speakers’ tonal production more closely resembled that of native speakers (e.g., T3 low falling-rising tone turning point), whereas in other dimensions it was in between the native and L2 groups (e.g., tone shortening in multisyllabic contexts). Overall, although heritage speakers do not pattern exactly the same as native speakers, much research evidence has shown that they are at least closer to native speakers in terms of speech perception and production than L2 learners are.

However, it is not clear if heritage speakers can make use of such phonological advantages at the lexical level to assist novel word learning in the HL. As discussed in the previous section, phonologically similar words pose difficulties for L2 learners when they lack the appropriate phonological representations. Here, we hypothesize that heritage speakers’ advantages in speech perception and recognition would further facilitate their acquisition of phonologically overlapping words in the target language. In this study, we focus on a suprasegmental feature that has been found to be difficult for late L2 learners in word learning—lexical tones ( Ge et al., 2024 ). L2 learners of Mandarin were found to fail in learning tonal minimal pair words from implicit exposure, whereas L1 Mandarin speakers could pick up novel tonal minimal pairs rapidly in the same situation. Our prediction is that heritage Mandarin speakers would be able to learn tonal minimal pairs to some extent because of their better categorical tonal perception, but whether they could match native speakers’ performance largely depends on their individual HL experience.

Research questions and predictions

In the current study, we investigate the cross-situational learning of Mandarin pseudowords by adult heritage speakers of Mandarin who were born and reside in English-speaking countries. The following research questions are addressed:

RQ1: Do minimal pairs and phonological contrasts that do not exist in heritage speakers’ majority language (i.e., the tonal contrasts) pose difficulty during cross-situational learning?
RQ2: Does the degree of heritage language experience and usage influence learning outcomes?

For RQ1, based on previous literature, we predicted that minimal pairs would be more difficult to learn compared to non-minimal pairs, and minimal pairs with phonological contrasts that do not exist in heritage speakers’ dominant language would generate the greatest difficulty in learning ( Escudero et al., 2022 ; Ge et al., 2024 ). Specifically, we predicted that minimal pairs that contrast in lexical tones would be the most difficult (i.e., with the lowest accuracy), followed by minimal pairs that differ in consonants and vowels. The non-minimal pairs would be relatively easy to learn. However, we expected the heritage Mandarin speakers to show some degree of learning of the tonal minimal pairs.

For RQ2, we predicted that greater experience and usage of HL would be associated with better learning of the tonal minimal pairs, as participants with greater Mandarin experience and usage would have more exposure to the tonal contrasts and might be more sensitive to the tonal minimal pairs.

Participants

Thirty bilingual speakers of Mandarin Chinese and English participated in this study. The sample size was inferred from Ge et al. (2024) , 2 where the same stimuli and CSWL task were used and a significant learning effect was observed. Participants were recruited through email advertisements within university communities in Toronto, Canada, and through Prolific. 3 Participants had to be at least 18 years old, bilingual speakers of English and Mandarin Chinese, and born in an English-speaking country (Canada or United States). An additional prerequisite was that participants needed to have at least one parent who was a native speaker of Mandarin Chinese. One participant was excluded because they were born in Hong Kong and only moved to an English-speaking country at the age of four. Thus, 29 participants were included in the data analysis (11 F, 17 M, 1 preferred not to say). The mean age was 29.97 (SD = 8.60, ranging from 18 to 62 years). Regarding language background, 14 participants reported knowing additional languages/varieties other than Mandarin or English (e.g., Cantonese, 4 French, Italian, Shanghainese, and Spanish). Nine participants reported having one Mandarin-native parent, and 20 reported having two Mandarin-native parents. Further details on participants’ HL experience and use can be found in the results section.

Heritage language experience questionnaire

We collected information about participants’ HL (i.e., Mandarin) experience using Tomić et al.’s (2023) Heritage Language Experience Questionnaire (HeLEx). The questionnaire was designed to capture the quantity and quality of HL exposure and use in different social contexts (e.g., family, external family (i.e., family outside the household), work, community, leisure). It also asked for participants’ background information (e.g., gender, age, history of language learning, parents’ language) and educational information (e.g., language used at different levels of schooling). Additionally, there were questions regarding participants’ language attitudes and code-switching attitudes and behaviors, though we did not include these attitude-related questions in the analyses because language attitude is not the focus of the current study.

For the HeLEx data, we followed Tomić et al.’s (2023) instructions and derived a set of HL experience and usage measures, including HL experience (i.e., frequency of use) and proficiency 5 in four different modalities (reading, writing, speaking, listening), proportion of HL use in different social contexts (family, external family, work, community, leisure), language dominance, language entropy, 6 proportion of HL use when accounting for actual time spent in each context (i.e., weighted HL use), and diversity of HL interlocutors (i.e., proportion of HL proficient and/or dominant interlocutors).
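As an illustration of one of the derived measures, language entropy is commonly computed as the Shannon entropy of the proportions of use of each language in a given context. The sketch below assumes that standard definition, H = −Σ p_i log2(p_i); the exact HeLEx computation is not spelled out in this section, and the function name and example proportions are hypothetical.

```python
import math

def language_entropy(proportions):
    """Shannon entropy (in bits) of language-use proportions in one context.

    0 means only one language is used; 1 means two languages are used equally.
    Inputs are normalised to sum to 1, so raw hours or percentages also work.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must contain at least one positive value")
    entropy = 0.0
    for p in proportions:
        p = p / total
        if p > 0:  # by convention, 0 * log2(0) contributes nothing
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical contexts for one participant: (Mandarin share, English share).
family = language_entropy([0.5, 0.5])  # balanced use
work = language_entropy([0.0, 1.0])    # English only
leisure = language_entropy([0.1, 0.9]) # mostly English
print(family, work, round(leisure, 3))
```

Higher values indicate more balanced use of the languages within a context, which is why entropy is reported alongside, rather than instead of, the simple proportion of HL use.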

Cross-situational word learning task

The CSWL task involved 12 pseudowords and 12 referent pictures. All pseudowords were disyllabic, with CVCV structures, which satisfy the phonotactic constraints of both Mandarin Chinese and English. The pseudowords contained phonemes that were similar between the two languages. The choice of the phonotactics and phonemes ensured that the target feature, lexical tone, was the only feature that existed in participants’ heritage language but not in the majority language. Each syllable in the pseudowords carried a lexical tone which was either Tone 1 (high-level) or Tone 4 (high-falling) in Mandarin Chinese, thus creating a simplified lexical tone system.

Six consonants /p, t, k, l, m, f/ and four vowels /a, i, u, ei/ were combined to form eight distinct base syllables (/pa, ta, ka, li, lu, lei, mi, fa/), which were further paired to form six minimally distinct base words (/pami, tami, kami, lifa, lufa, leifa/). Three of the base pseudowords differed in the consonant of the first syllable (/pami, tami, kami/) and the other three differed in the vowel of the first syllable (/lifa, lufa, leifa/). These base words were then superimposed with lexical tones. The first syllable of each of the six base words was paired with either T1 or T4, and the second syllable always carried T1. This created additional tonal minimal pair contrasts (e.g., /pa1mi1 / vs. /pa4mi1 / ). Therefore, a total of 12 pseudowords were created (full list shown in Table 1 ). The pseudowords (with their corresponding referent objects) were later paired to create consonantal, vocalic, tonal, and non-minimal pair trials, and each pseudoword-referent mapping could occur in different trial types based on the paired foil. All pseudowords had no corresponding meanings in English or Mandarin Chinese. The audio stimuli were produced by a female native speaker of Mandarin Chinese. The mean length of the audio stimuli was 800 ms.
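The construction of the 12 pseudowords can be reproduced in a few lines. The following sketch builds them from the six base words and two first-syllable tones given above, then classifies every possible pair by the single feature in which its members differ. Treating pairs that differ in more than one feature as non-minimal is an assumption made here for illustration; the pairings actually used in the trials are not listed in this section.

```python
from collections import Counter
from itertools import combinations

# Base words from the text, split into first and second syllable.
BASE_WORDS = [("pa", "mi"), ("ta", "mi"), ("ka", "mi"),
              ("li", "fa"), ("lu", "fa"), ("lei", "fa")]

# First syllable carries Tone 1 or Tone 4; second syllable always carries Tone 1.
pseudowords = [(s1, tone, s2, 1) for s1, s2 in BASE_WORDS for tone in (1, 4)]

def label(word):
    """Render a pseudoword in the notation used in the text, e.g. pa4mi1."""
    s1, t1, s2, t2 = word
    return f"{s1}{t1}{s2}{t2}"

def pair_type(a, b):
    """Classify a pair by the one feature (if any) that distinguishes it."""
    (s1a, ta, s2a, _), (s1b, tb, s2b, _) = a, b
    if s2a != s2b:
        return "non-minimal"
    # Every first syllable here is one consonant followed by a vowel nucleus.
    diffs = [s1a[0] != s1b[0], s1a[1:] != s1b[1:], ta != tb]
    if sum(diffs) != 1:
        return "non-minimal"  # assumption: two or more differences is not minimal
    return ["consonantal", "vocalic", "tonal"][diffs.index(True)]

pair_counts = Counter(pair_type(a, b) for a, b in combinations(pseudowords, 2))
print([label(w) for w in pseudowords])
print(dict(pair_counts))
```

Under these assumptions the 66 possible pairings contain six consonantal, six vocalic and six tonal minimal pairs, with the remainder non-minimal.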

Table 1 . Pseudowords in the consonant set and the vocalic set.

Twelve pictures of novel objects were selected from Horst and Hout’s (2016) NOUN database and used as referents. The pseudowords were randomly mapped to the objects, and we created four lists of word-referent mappings to minimize the influence of a particular mapping being easily memorisable. Each participant was randomly assigned to one of the mappings.

The visual and auditory stimuli are available at: https://osf.io/q6354/ .

All participants were directed to the experiment platform Gorilla (see footnote 7) to complete the task and the questionnaire. After providing informed consent, participants completed the CSWL task, which took approximately 10 min. In the CSWL task, participants were told that they would hear one word and see two pictures of referent objects on the screen. Their task was to decide, as quickly and accurately as possible, which object the pseudoword referred to. They were instructed to press ‘Q’ on the keyboard if they thought the object on the left was the correct referent of the word and ‘P’ for the object on the right.

In each trial, participants first saw a fixation cross at the centre of the screen for 500 ms. They were then presented with two objects on the screen (one on the left side and one on the right) and were played a single pseudoword. After the pseudoword was played, participants were prompted to enter their response on the keyboard (Q or P). The objects remained on the screen during the entire trial, but the pseudoword was only played once. The next trial only started after participants made a choice for the current one. No feedback was provided after each response. We recorded the keyboard responses in each trial to calculate accuracy and response times. This allowed us to keep track of participants’ performance throughout the CSWL task, and hence there were no separate training and testing phases. Figure 1 provides an example of a CSWL trial.

Figure 1 . Example of cross-situational word learning (CSWL) trial. Participants were presented with two objects and played a single pseudoword. They had to decide if the pseudoword referred to the object on the left or the object on the right.

There were four types of CSWL trials. In non-minimal pair (non-MP) trials, the two objects presented on the screen referred to pseudowords that were phonologically distinct (e.g., /pa1mi1/ and /li4fa1/). In consonantal minimal pair (cMP) trials, the two objects on the screen referred to pseudowords that differed in only one consonant contrast (e.g., /pa1mi1/ and /ta1mi1/). In vocalic minimal pair (vMP) trials, the two objects referred to pseudowords that differed in only one vowel contrast (e.g., /li1fa1/ and /lu1fa1/). And in tonal minimal pair (tMP) trials, the two objects referred to pseudowords that differed only in lexical tone (e.g., /pa1mi1/ and /pa4mi1/). This manipulation allowed us to determine if and how phonological overlap between the pseudowords affected word learning. Each object was paired with different foils according to the trial type. For instance, the object for pa1mi1 was paired with the (foil) object for ta1mi1 in a consonantal minimal pair trial; and the same object for pa1mi1 was paired with the (foil) object for pa4mi1 in a tonal minimal pair trial.
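The four trial types can be illustrated by classifying a target-foil pair according to where the two pseudowords differ. The sketch below is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, the parser only handles the 12 pseudowords of this study (single-consonant onsets, tones 1 and 4), and any pair differing in more than one position is labeled non-MP, which may be broader than the non-minimal pairs actually used.

```python
import re

# One syllable = onset consonant + vowel + tone digit; a pseudoword has two.
WORD_PATTERN = re.compile(r"([ptklmf])([a-z]+)([14])([ptklmf])([a-z]+)([14])")
POSITION_KIND = ["consonant", "vowel", "tone", "consonant", "vowel", "tone"]
LABELS = {"consonant": "cMP", "vowel": "vMP", "tone": "tMP"}


def parse(word):
    """Split a pseudoword such as 'lei4fa1' into its six components."""
    match = WORD_PATTERN.fullmatch(word)
    if match is None:
        raise ValueError(f"not a pseudoword of the expected shape: {word!r}")
    return match.groups()


def trial_type(target, foil):
    """Label a target-foil pair as cMP, vMP, tMP, or non-MP."""
    differing = [
        kind
        for kind, a, b in zip(POSITION_KIND, parse(target), parse(foil))
        if a != b
    ]
    if not differing:
        raise ValueError("target and foil must be different pseudowords")
    if len(differing) == 1:
        return LABELS[differing[0]]
    return "non-MP"  # assumption: more than one differing position


for pair in [("pa1mi1", "ta1mi1"), ("li1fa1", "lu1fa1"),
             ("pa1mi1", "pa4mi1"), ("pa1mi1", "li4fa1")]:
    print(pair, trial_type(*pair))
```

The same target (e.g., pa1mi1) therefore falls into a different trial type depending on which foil it is paired with.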

Each participant completed six CSWL blocks, with each pseudoword-object mapping occurring twice per block. There were thus 24 trials per block, and 144 trials in total. Each of the four trial types (non-MP, cMP, vMP, tMP) occurred six times per block. The order of trials within each block was randomized for each participant, as was the sequence in which the six blocks occurred. The correct referent picture was presented on the left side in half of the trials and on the right side in the other half.
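The trial counts and side balancing described above can be checked with a short sketch. It is a simplified illustration, not the procedure implemented on the experiment platform: the names are hypothetical, pseudowords are represented by indices 0 to 11, the assignment of foils and trial types is omitted, and balancing the correct side within each block (rather than only across the whole task) is an assumption.

```python
import random
from collections import Counter

N_PSEUDOWORDS = 12
REPETITIONS_PER_BLOCK = 2
N_BLOCKS = 6
TRIALS_PER_BLOCK = N_PSEUDOWORDS * REPETITIONS_PER_BLOCK  # 24
TOTAL_TRIALS = TRIALS_PER_BLOCK * N_BLOCKS                # 144


def make_block(rng):
    """Return one block: each target twice, correct side balanced, order shuffled."""
    targets = list(range(N_PSEUDOWORDS)) * REPETITIONS_PER_BLOCK
    sides = ["left", "right"] * (TRIALS_PER_BLOCK // 2)
    rng.shuffle(targets)
    rng.shuffle(sides)
    return [{"target": t, "correct_side": s} for t, s in zip(targets, sides)]


rng = random.Random(2024)  # fixed seed so the illustration is reproducible
block = make_block(rng)
side_counts = Counter(trial["correct_side"] for trial in block)
target_counts = Counter(trial["target"] for trial in block)
print(TRIALS_PER_BLOCK, TOTAL_TRIALS, dict(side_counts))
```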

After the CSWL task, participants completed the HeLEx questionnaire. When all tasks were completed, participants recruited from Prolific were directed back to the Prolific website and were granted compensation. Participants recruited through emailing received the vouchers via email.

Data analysis

We excluded participants who failed the initial sound check (one participant failed, and 30 participants passed). We also excluded individual responses that took longer than 30 s, because such responses did not follow the instruction to respond as quickly and accurately as possible (11 out of 4,176 individual responses were removed, leaving a total of 4,165 data points for analysis). After excluding these data points, we visualized the data using R (R Core Team, 2022) to inspect general descriptive patterns. We then used generalized linear mixed effects modeling for statistical data analysis. Mixed effects models were built up incrementally from the null model (containing only random effects of item and participant) to models containing fixed effects, and the dependent variable was accuracy in the CSWL task. We tested whether each of the fixed effects of trial type, block, and their interaction improved model fit using log-likelihood comparisons between models. A quadratic effect of block was also tested for its contribution to model fit, as learning may have been non-linear over training. Additionally, we tested whether adding the derived measures from the HeLEx questionnaire as fixed effects to the mixed-effect models improved model fit.

The anonymized data and R scripts are available at: https://osf.io/q6354/ .

Performance on the cross-situational word learning task

Figure 2A presents the overall proportion of correct responses in the CSWL task. Participants performed significantly above chance from Block 1 (mean accuracy = 0.59, t = 4.61, p < 0.001). For the different minimal pair trials (Figure 2B), accuracy was highest in non-minimal pair trials, followed by consonantal and vocalic minimal pair trials. Performance in the tonal trials was the lowest and remained close to chance level (0.53) until the end of the CSWL task.

Figure 2 . Mean proportion of correct pictures selected in each learning block—overall (A) and in different trial types (B) . The dotted line represents chance level. Error bars represent 95% confidence intervals.

We ran generalized linear mixed effects models to examine performance accuracy across learning blocks. Compared to the model with only random effects, adding the fixed effect of learning block did not improve model fit significantly (χ²(1) = 0.944, p = 0.331). Adding trial type (consonant, vowel, tone, non-minimal pair) improved model fit (χ²(3) = 28.298, p < 0.001), but the block*trial type interaction (χ²(3) = 4.365, p = 0.225) did not improve fit further. This indicates that overall performance differed significantly across trial types, but the learning trajectories (i.e., improvement across blocks) did not differ significantly between trial types. Adding the quadratic effect of block did not improve fit significantly either (χ²(4) = 2.109, p = 0.716). The best-fitting model is reported in Table 2. Note that, whereas block did not contribute significantly to explaining variance when considered as a single fixed effect, it was significant in the model when trial type was also included (as shown in Table 2).
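The p-values of these log-likelihood comparisons follow from the upper tail of the chi-square distribution, with degrees of freedom equal to the number of parameters added. The analyses themselves were run in R; the sketch below is only a standard-library check of the reported values, not the authors' script. The function name is hypothetical, and the closed-form recurrence used here applies to integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x non-negative")
    half_x = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half_x)), 1  # closed form for df = 1
    else:
        q, k = math.exp(-half_x), 2             # closed form for df = 2
    # Step up two degrees of freedom at a time using the incomplete-gamma recurrence.
    while k < df:
        k += 2
        q += half_x ** (k / 2 - 1) * math.exp(-half_x) / math.gamma(k / 2)
    return q


# (chi-square statistic, degrees of freedom) as reported in the text.
comparisons = {
    "block": (0.944, 1),
    "trial type": (28.298, 3),
    "block*trial type": (4.365, 3),
    "quadratic block": (2.109, 4),
}
for name, (stat, df) in comparisons.items():
    print(f"{name}: chi2({df}) = {stat}, p = {chi2_sf(stat, df):.3f}")
```

A comparison is treated as a significant improvement in fit when this tail probability falls below the conventional 0.05 threshold.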

Table 2 . Best fitting model for accuracy in CSWL, showing fixed effects.

We computed a set of measures of HL use derived from the four modalities (reading, writing, speaking, hearing) and five contexts (family, external family, work, community, leisure) of language use. Tables 3 and 4 summarize the results.

Table 3 . Heritage language experience across four modalities.

Table 4 . Heritage language (Mandarin) use in five contexts.

Participants reported higher Mandarin proficiency and use in speaking and hearing compared to reading and writing. As for language dominance, only one participant reported being Mandarin-dominant in speaking, and another reported being Mandarin-dominant in hearing/understanding. Overall, more participants were dominant in English in all modalities. In terms of the context of language use, participants reported more Mandarin use with family and external family, and relatively little Mandarin use at work.

The relationship between heritage language background and CSWL

To investigate whether the proficiency and use of Mandarin influence the outcomes in learning novel tonal words (i.e., performance at the final block), we ran several sets of mixed-effect models with the derived measures from HeLEx as fixed effects.

For the measures of Mandarin use across modalities, we carried out three sets of analyses to explore the fixed effects of (1) Mandarin proficiency, (2) frequency of Mandarin usage, and (3) usage-based and proficiency-based Mandarin dominance in the four modalities. ANOVA comparisons between models containing fixed effects and the random effect model showed no significant differences, indicating that none of these fixed effects significantly explained variance in word learning outcomes.

As for the measures of Mandarin use in the five contexts, we ran four sets of analyses and tested whether (1) the proportion of Mandarin use, (2) the proportion of Mandarin interaction, (3) language entropy, and (4) the weighted proportion of Mandarin use (accounting for the actual time spent in each context) in the different contexts explained performance in the tonal trials. However, none of the derived measures was a significant predictor of performance.

Exploratory analyses

Since we did not observe any significant influence of the individual HeLEx measures on participants’ learning outcomes in tonal trials, we carried out additional exploratory analyses based on other responses in the questionnaire. Firstly, we explored whether having one or two Mandarin-native parents influenced learners’ performance, as having two Mandarin-native parents may provide a more Mandarin-dominant environment at home. Mixed-effects models containing parent language as a fixed effect showed no significant improvement compared to the random effect model (χ²(1) = 0.0801, p = 0.78). This means that the number of Mandarin-speaking parents did not explain variance in word learning outcomes. Secondly, to test the effect of Mandarin schooling, we coded whether or not participants used Mandarin at preschool, primary school, secondary school, post-secondary and post-graduate levels, and in extracurricular Mandarin classes. Model comparisons revealed no significant effect of any of these variables.

Exploratory factor analysis

Given the large number of observed variables derived from the questionnaire, we decided to carry out an exploratory factor analysis and examine whether some of the variables could be grouped into a smaller number of factors for further analyses. We planned to run two rounds of factor analysis, one for the modality-related variables (see Table 3 ) and another for context-related variables (see Table 4 ). This is because mixing the variables across modalities and the variables across contexts might make the resulting factors less interpretable.

For the modality-related variables, we first checked the correlations between HL experience and experience-based dominance measures, as well as between HL proficiency and proficiency-based dominance measures. The results suggested that the measures are very strongly correlated (r > 0.90), which was expected because they were derived from the same set of original questions. Thus, we took out the dominance measures and only entered HL experience and HL proficiency across modalities into the factor analysis. The exploratory factor analysis suggested three factors: Factor 1 relates to measures of written language experience and proficiency (i.e., reading/writing experience, reading/writing proficiency), Factor 2 relates to measures of oral language experience (i.e., speaking/hearing experience), and Factor 3 relates to measures of oral language proficiency (i.e., speaking/hearing proficiency). Table 5 summarizes the output factor loadings of each measure.

Table 5 . Factor loadings for modality-related variables.

We then entered the three factors as fixed effects into the generalized mixed effect models mentioned above to explore if the grouped factors predicted participants’ learning outcomes. Similar to our previous findings, ANOVA comparisons between models containing fixed effects of the three factors and the random effect model showed no significant differences, meaning that the three modality-related factors did not significantly explain variance in word learning outcomes.

In addition, we ran a decision tree analysis to explore and visualize the hierarchical contribution of the three factors to word learning outcomes. Figure 3 presents the results of the decision tree model. Higher Factor 2 scores (oral experience) and Factor 1 scores (written experience and proficiency) seemed to lead to higher accuracy in tonal trials at the final block (when Factor 2 ≥ 0.49 and Factor 1 ≥ 0.31, accuracy = 0.75), though only a small proportion of the data fell under this rule. Overall, however, the decision tree model did not reveal clear relations between the factors and the tonal word learning outcomes.

Figure 3 . Decision tree model based on the three modality-related factors.

We then tried to run the same factor analysis and follow-up tests on the context-related measures. However, there was no good factor solution for the context-related measures (the Kaiser-Meyer-Olkin test suggested that the data were not suitable for factor analysis), indicating that the individual measures of context of use should be kept separate. Thus, no further analyses based on derived factors were conducted.

Comparison with English-native and Mandarin-native participants

To further understand Mandarin heritage speakers’ word learning performance, we ran exploratory analyses combining data from the current study and data from Ge et al. (2024), since the two studies employed the same method and stimuli. This allowed us to compare Mandarin heritage speakers’ learning trajectory with those of English-native participants (who had no tonal experience) and Mandarin-native participants (who had continuous, extensive tonal experience). Generalized linear mixed effects models revealed that, compared to the model with only random effects, adding the fixed effects of block (χ²(1) = 21.012, p < 0.001) and trial type (χ²(3) = 28.532, p < 0.001), and the 3-way block*trial type*language group interaction (χ²(11) = 42.459, p < 0.001), significantly improved model fit. The effect of language group (English-native, Mandarin-native, Mandarin heritage) did not improve fit (χ²(2) = 0.824, p = 0.662).

We then explored the 3-way interaction in detail and ran separate mixed effects models for each trial type to test whether group performance differed in any particular trial type. In the tonal trials, we observed a significant effect of language group (χ²(2) = 6.851, p = 0.033). The effect of block (χ²(1) = 3.386, p = 0.066) and the block*language group interaction (χ²(2) = 0.020, p = 0.990) were not significant. The best-fitting model, summarized in Table 6, shows that the Mandarin-native group performed significantly better than the English-native group (the reference group) in tonal trials, whereas the Mandarin heritage group did not diverge significantly from the English-native group. This language group effect, however, was not significant in the other trial types (consonantal χ²(2) = 3.370, p = 0.185; vocalic χ²(2) = 2.254, p = 0.324; non-minimal pair χ²(2) = 3.149, p = 0.207).

Table 6. Best fitting model for accuracy in tonal trials, combining data from the present study and data from Ge et al. (2024).

In this study, we explored how heritage speakers learn novel words from their HL via a cross-situational, statistical learning process and whether the degree of HL experience predicts learning outcomes. Heritage speakers could rapidly learn words that contain special phonological features which exist only in their HL but not in their dominant language (i.e., lexical tone for heritage Mandarin speakers residing in English-speaking environments). However, when this specific feature is the only informative cue to distinguish words (i.e., in the case of tonal minimal pairs), heritage speakers seem to encounter greater difficulties.

RQ1: Do minimal pairs and phonological contrasts that do not exist in heritage speakers’ majority language pose difficulty during cross-situational learning?

Results suggested that learners’ performance was greatly influenced by the presence of minimal pair words. As predicted, learners performed better in non-minimal pair trials than in minimal pair trials, which is consistent with previous findings on CSWL of minimal pairs in other languages (e.g., Escudero et al., 2022). Moreover, we observed a difference in performance between segmental minimal pairs and tonal minimal pairs. Heritage Mandarin speakers’ performance in tonal minimal pair trials was the lowest and remained at chance level throughout the experiment, whereas performance in consonantal and vocalic minimal pair trials improved over time. The lack of a learning effect in tonal trials was contrary to our prediction that early exposure to Mandarin would allow the heritage speakers to develop tonal representations and use tonal cues in word learning. Our combined data analysis with Ge et al. (2024) demonstrated that the Mandarin heritage speakers’ learning pattern was similar to that of English-native speakers with no tonal experience, for whom tonal minimal pairs were particularly difficult, and that performance in tonal trials was significantly lower than that of Mandarin-native speakers.

These findings could be explained from two perspectives: the nature of the stimuli and the participants’ language profile. Firstly, the stimuli in the experiment were designed to have segments that are similar between English (the dominant language) and Mandarin (the heritage language), and also to include a tonal feature that is specific to Mandarin. Since our participants were English-dominant, they might weight segmental cues more heavily and attend more to the segmental features in the task. Previous research also suggested that even Mandarin-native speakers tend to rely more on segmental than tonal information in word processing (e.g., Cutler and Chen, 1997; Yip, 2001; Sereno and Lee, 2015). This might contribute to the divergence in the learning trajectories of segmental and tonal minimal pairs. Secondly, although the group of heritage speakers we recruited reported relatively high proficiency in Mandarin listening (rating 3.34 out of 4) and speaking (rating 2.86 out of 4), they were still significantly more dominant in English in all language modalities (see Table 3, HL dominance), and had very little Mandarin use outside of the family (including external family) context (see Table 4). This might explain why their performance in the learning task at the group level resembles that of the English-native speakers in previous research (Ge et al., 2024).

Furthermore, considering previous findings on heritage Mandarin speakers’ perception and production of Mandarin tones (e.g., Chang and Yao, 2016, 2019), there is another possibility that derives from heritage speakers’ distinct tonal representations. Although heritage speakers of Mandarin tend to possess categorical representations of tones that are close to those of native Mandarin speakers, these representations are usually not entirely the same (e.g., Yang, 2015). Therefore, even though the heritage Mandarin speakers in the experiment are sensitive to tonal variations, their categorization of the specific contrast (i.e., T1–T4) might differ from that of native speakers in certain acoustic dimensions, resulting in difficulty in tonal minimal pair learning. Additionally, the selection of the tones used in the stimuli was based on a previous experiment testing English-native speakers’ identification of Mandarin tones. Hao (2018) reported that English-native learners of Mandarin could identify T1 and T4 at word-initial positions better than T2 and T3, and hence these tones are likely to be easier in the disyllabic environment of this experiment. However, it is possible that the identification difficulty of the tones is different for heritage Mandarin speakers. Further research is needed to examine how tonal contexts (the preceding and following tones) affect heritage speakers’ perception in particular.

According to the HeLEx questionnaire results, we did not find a clear relationship between participants’ Mandarin experience or usage and their performance in the tonal word learning task. Specifically, the derived measures from the questionnaire did not predict how well participants responded to tonal minimal pairs. The questionnaire measures focused on how much and how well participants use Mandarin in their daily communications, that is, the use of Mandarin in various contexts. When using Mandarin for communicative purposes, lexical tones are not the only focus because information from the context can be delivered even when lexical tones are not always correctly realized. However, in the word learning task, there was no contextual information and participants had to learn isolated words. For the tonal minimal pair trials in particular, a misperception of lexical tone would lead to failure in word identification. It is possible that heritage Mandarin speakers rely more on contextual information in tonal perception than native speakers do. Thus, a direct link between the questionnaire measures and the word learning outcomes was missing because they measured tonal abilities in different communicative situations.

Another noteworthy finding is that our factor analysis suggested a grouping of the derived measures of HL modality use, highlighting a distinction between written and oral language proficiency and use. Questionnaires like HeLEx usually contain a large number of measures to thoroughly record participants’ language profiles. Our results suggested that some individual measures (even across the original categories) could be highly correlated and hence reasonably grouped into one single factor to facilitate further statistical analyses and predictions of the influence of HL on learning and behavior.

Limitations and further directions

In the CSWL task, learning performance reflects the combined abilities at both the perceptual and lexical levels. Since we do not have a separate measure of tonal perception, it is unclear whether the difficulty comes from heritage Mandarin speakers’ different tonal representations and categorizations. Thus, further studies could add tone identification tasks to examine whether more accurate identification would be associated with better word learning. It would also be interesting to test tone identification at both the pre-lexical level (e.g., identification of isolated tonal syllables without meaning) and the lexical level (e.g., identification of tones in real words), since it indicates how well participants process tonal information when meanings are attached. Moreover, it would be worth testing whether greater HL experience and usage is directly linked to better tone identification ability.

Furthermore, it would be interesting to recruit participants from more diverse HL backgrounds. In our current sample, most participants were highly English-dominant. Future studies could compare whether heritage speakers who are more balanced in their English and Mandarin proficiency would perform differently and be more able to learn the tonal minimal pairs.

We found that heritage speakers of Mandarin learned Mandarin novel words in a similar pattern to English-native learners of Mandarin. They could pick up new words from a short exposure by tracking the statistics of the input, but learning was reduced when minimal pairs were present. The greatest difficulty was associated with tonal minimal pairs. The degree of HL experience and usage did not seem to predict tonal word learning outcomes. Our results contribute to the understanding of heritage speakers’ behaviors when learning and processing the target language. They suggest that heritage exposure does not necessarily lead to an advantage in learning the target language, and the amount of exposure may not be the key factor influencing learning outcomes, though further research into the role of diverse HL exposure is needed.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: Open Science Framework (OSF): https://osf.io/q6354/ .

Ethics statement

The studies involving humans were approved by the Faculty of Arts and Social Sciences and Lancaster Management School’s Research Ethics Committee, Lancaster University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

YG: Conceptualization, Formal analysis, Methodology, Writing – original draft. AR: Conceptualization, Methodology, Writing – review & editing. PR: Conceptualization, Methodology, Supervision, Funding acquisition, Writing – review & editing. PM: Formal analysis, Writing – review & editing, Supervision.

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. We gratefully acknowledge the financial support provided by Lancaster University’s Camões Institute Cátedra for Multilingualism and Diversity, the Foundation for Science and Technology (FCT, grant reference [2022.04013.PTDC]), and the Linguistics Research Centre of NOVA University Lisbon (CLUNL, UIDB/LIN/03213/2020 and UIDP/LIN/03213/2020 funding program).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

1. ^ Ge, Y., Correia, S., Fernandes, J., Hanson, K., Rato, A., and Rebuschat, P. (under review). Does phonetic training benefit word learning? Available at: https://osf.io/preprints/psyarxiv/5zspu .

2. ^ The power analysis of Ge et al.’s (2024) study with the same CSWL task is available at: https://osf.io/2j6pe/ .

3. ^ www.prolific.com

4. ^ Among these additional languages, Cantonese and Shanghainese are tonal. Thus, we carried out an analysis to test whether the eight participants who spoke additional tonal languages performed differently from those who did not know other tonal languages. However, adding additional tonal experience as a fixed effect in our model of CSWL accuracy did not significantly improve model fit (χ²(1) = 0, p = 1), nor did the 3-way interaction between block, additional tonal experience and trial type (χ²(7) = 11.177, p = 0.131). Thus, we did not include additional tonal experience as a factor in the main analyses.

5. ^ HL experience was calculated from questions on frequency of HL use, for example, how often do you speak it. HL proficiency was based on questions such as how well do you speak it.

6. ^ Language entropy measures the level of language diversity in a particular context (e.g., family, external family, work, community, leisure) ( Gullifer and Titone, 2020 ; Tomić et al., 2023 ). Higher language entropy in a given context means higher diversity in language use.

7. ^ www.gorilla.sc

Au, T. K., Knightly, L. M., Jun, S.-A., and Oh, J. S. (2002). Overhearing a language during childhood. Psychol. Sci. 13, 238–243. doi: 10.1111/1467-9280.00444

Best, C. T., and Tyler, M. D. (2007). “Nonnative and second-language speech perception: commonalities and complementarities” in Second language speech learning: The role of language experience in speech perception and production . eds. M. J. Munro and O.-S. Bohn (Amsterdam: John Benjamins), 13–34.

Chang, C. B. (2016). Bilingual perceptual benefits of experience with a heritage language. Biling. Lang. Congn. 19, 791–809. doi: 10.1017/S1366728914000261

Chang, C. B., Haynes, E. F., Yao, Y., and Rhodes, R. (2010). “The phonetic space of phonological categories in heritage speakers of Mandarin” in Proceedings from the 44th annual meeting of the Chicago linguistic society: The Main session . eds. M. Bane, J. Bueno, T. Grano, A. Grotberg, and Y. McNabb (Chicago, IL: Chicago Linguistic Society), 31–45.

Chang, C. B., and Yao, Y. (2016). Toward an understanding of heritage prosody: acoustic and perceptual properties of tone produced by heritage, native, and second language speakers of mandarin. Herit. Lang. J. 13, 134–160. doi: 10.46538/hlj.13.2.4

Chang, C. B., and Yao, Y. (2019). “Production of neutral tone in mandarin by heritage, native, and second language speakers” in Proceedings of the 19th international congress of phonetic sciences . eds. S. Calhoun, P. Escudero, M. Tabain, and P. Warren (Canberra, Australia: Australasian Speech Science and Technology Association Inc.), 2291–2295.

Chang, C. B., Yao, Y., Haynes, E. F., and Rhodes, R. (2011). Production of phonetic and phonological contrast by heritage speakers of mandarin. J. Acoust. Soc. Am. 129, 3964–3980. doi: 10.1121/1.3569736

Cutler, A., and Chen, H.-C. (1997). Lexical tone in Cantonese spoken-word processing. Percept. Psychophys. 59, 165–179. doi: 10.3758/BF03211886

Escudero, P. (2005). Linguistic perception and second language acquisition: Explaining the attainment of optimal phonological categorization . [Doctoral dissertation, Utrecht University]. LOT Dissertation Series 113.

Escudero, P., Mulak, K. E., and Vlach, H. A. (2016). Cross-situational learning of minimal word pairs. Cogn. Sci. 40, 455–465. doi: 10.1111/cogs.12243

Escudero, P., Smit, E. A., and Mulak, K. E. (2022). Explaining L2 lexical learning in multiple scenarios: cross-situational word learning in L1 Mandarin L2 English speakers. Brain Sci. 12:1618. doi: 10.3390/brainsci12121618

Flores, C., Rinke, E., and Rato, A. (2017). Comparing the outcomes of early and late acquisition of European Portuguese: an analysis of morpho-syntactic and phonetic performance. Herit. Lang. J. 14, 124–149. doi: 10.46538/hlj.14.2.2

Ge, Y., Monaghan, P., and Rebuschat, P. (2024). The role of phonology in non-native word learning: evidence from cross-situational statistical learning. Biling. Lang. Congn. 1–16. doi: 10.1017/S1366728923000986

Gullifer, J. W., and Titone, D. (2020). Characterizing the social diversity of bilingualism using language entropy. Biling. Lang. Congn. 23, 283–294. doi: 10.1017/S1366728919000026

Hao, Y. C. (2018). Contextual effect in second language perception and production of mandarin tones. Speech Comm. 97, 32–42. doi: 10.1016/j.specom.2017.12.015

Hartshorne, J. K., Tenenbaum, J. B., and Pinker, S. (2018). A critical period for second language acquisition: evidence from 2/3 million English speakers. Cognition 177, 263–277. doi: 10.1016/j.cognition.2018.04.007

Horst, J. S., and Hout, M. C. (2016). The novel object and unusual name (NOUN) database: a collection of novel images for use in experimental research. Behav. Res. Methods 48, 1393–1409. doi: 10.3758/s13428-015-0647-3

Isbilen, E. S., and Christiansen, M. H. (2022). Statistical learning of language: a Meta-analysis into 25 years of research. Cogn. Sci. 46:e13198. doi: 10.1111/cogs.13198

Kim, J. (2020). Discrepancy between heritage speakers' use of suprasegmental cues in the perception and production of Spanish lexical stress. Biling. Lang. Congn. 23, 233–250. doi: 10.1017/S1366728918001220

Kuhl, P. K. (2004). Early language acquisition: cracking the speech code. Nat. Rev. Neurosci. 5, 831–843. doi: 10.1038/nrn1533

Laméris, T. J., Llompart, M., and Post, B. (2023). Non-native tone categorization and word learning across a spectrum of L1 tonal statuses. Biling. Lang. Congn. , 1–15. doi: 10.1017/S1366728923000871

Ling, W., and Grüter, T. (2022). From sounds to words: the relation between phonological and lexical processing of tone in L2 mandarin. Second. Lang. Res. 38, 289–313. doi: 10.1177/0267658320941546

Lukyanchenko, A., and Gor, K. (2011). “Perceptual correlates of phonological representations in heritage speakers and L2 learners” in Proceedings of BUCLD 35, vol. 2 . eds. N. Danis, K. Mesh, and H. Sung (Somerville: Cascadilla Press), 414–426.

Maye, J., and Gerken, L. (2000). Learning phonemes without minimal pairs. Proceedings of the 24th annual Boston university conference on language development (Vol. 2, pp. 522–533).

Maye, J., Werker, J. F., and Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition 82, B101–B111. doi: 10.1016/S0010-0277(01)00157-3

Monaghan, P., Schoetensack, C., and Rebuschat, P. (2019). A single paradigm for implicit and statistical learning. Top. Cogn. Sci. 11, 536–554. doi: 10.1111/tops.12439

Quine, W. V. O. (1960). Word and object . Cambridge, MA: MIT Press.

R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: https://www.R-project.org/ .

Rebuschat, P., Monaghan, P., and Schoetensack, C. (2021). Learning vocabulary and grammar from cross-situational statistics. Cognition 206:104475. doi: 10.1016/j.cognition.2020.104475

Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science 274, 1926–1928. doi: 10.1126/science.274.5294.1926

Sereno, J. A., and Lee, H. (2015). The contribution of segmental and tonal information in mandarin spoken word processing. Lang. Speech 58, 131–151. doi: 10.1177/0023830914522956

Siegelman, N. (2020). Statistical learning abilities and their relation to language. Lang. Linguist. Compass 14:e12365. doi: 10.1111/lnc3.12365

Smith, L., and Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition 106, 1558–1568. doi: 10.1016/j.cognition.2007.06.010

Suanda, S. H., Mugwanya, N., and Namy, L. L. (2014). Cross-situational statistical word learning in young children. J. Exp. Child Psychol. 126, 395–411. doi: 10.1016/j.jecp.2014.06.003

Suanda, S. H., and Namy, L. L. (2012). Detailed behavioral analysis as a window into cross-situational word learning. Cogn. Sci. 36, 545–559. doi: 10.1111/j.1551-6709.2011.01218.x

Tomić, A., Rodina, Y., Bayram, F., and De Cat, C. (2023). Documenting heritage language experience using questionnaires. Front. Psychol. 14:1131374. doi: 10.3389/fpsyg.2023.1131374

Tuninetti, A., Mulak, K. E., and Escudero, P. (2020). Cross-situational word learning in two foreign languages: effects of native language and perceptual difficulty. Front. Commun. 5:602471. doi: 10.3389/fcomm.2020.602471

Watson, T. L., Robbins, R. A., and Best, C. T. (2014). Infant perceptual development for faces and spoken words: an integrated approach. Dev. Psychobiol. 56, 1454–1481. doi: 10.1002/dev.21243

Werker, J. F., and Tees, R. C. (1984). Cross-language speech perception: evidence for perceptual reorganization during the first year of life. Infant Behav. Dev. 7, 49–63. doi: 10.1016/S0163-6383(84)80022-3

Williams, J. N., and Rebuschat, P. (2022). “Implicit learning and SLA: a cognitive psychology perspective” in The Routledge handbook of second language acquisition and psycholinguistics . eds. A. Godfroid and H. Hopp (New York, NY: Routledge)

Wong, P. C., and Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Appl. Psycholinguist. 28, 565–585. doi: 10.1017/S0142716407070312

Yang, B. (2015). Perception and production of mandarin tones by native speakers and L2 learners . Berlin, Germany: Springer Verlag.

Yip, M. (2001). Phonological priming in Cantonese spoken-word processing. Psychologia 44, 223–229. doi: 10.2117/psysoc.2001.223

Yu, C., and Smith, L. (2007). Rapid word learning under uncertainty via cross-situational statistics. Psychol. Sci. 18, 414–420. doi: 10.1111/j.1467-9280.2007.01915.x

Keywords: statistical learning, cross-situational word learning, heritage speaker, heritage language phonology, lexical tone

Citation: Ge Y, Rato A, Rebuschat P and Monaghan P (2024) Constraints on novel word learning in heritage speakers. Front. Psychol. 15:1379736. doi: 10.3389/fpsyg.2024.1379736

Received: 01 February 2024; Accepted: 04 April 2024; Published: 17 April 2024.

Copyright © 2024 Ge, Rato, Rebuschat and Monaghan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yuxin Ge

† These authors have contributed equally to this work and share senior authorship

Int J Environ Res Public Health

Phonological Awareness as the Foundation of Reading Acquisition in Students Reading in Transparent Orthography

Vesela Milankov

1 Department of Special Education and Rehabilitation, Faculty of Medicine, University of Novi Sad, 21000 Novi Sad, Serbia

Slavica Golubović

2 Faculty of Special Education and Rehabilitation, University of Belgrade, 11000 Belgrade, Serbia

Tatjana Krstić

3 Department of Psychology, Faculty of Medicine, University of Novi Sad, 21000 Novi Sad, Serbia

Špela Golubović

Associated Data

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Phonological skills have been found to be strongly related to early reading and writing development. Accordingly, the aim of this study was to examine the extent to which the development of phonological awareness facilitates reading acquisition in students learning to read a transparent orthography. Our research included 689 primary school students in first through third grade (mean age 101.59 months, SD = 12.69). The assessment tools used were the Phonological Awareness Test and the Gray Oral Reading Test. According to the results of the present study, 13.7% of students have reading difficulties. Students with reading difficulties obtained lower scores on each phonological awareness subscale than students without reading difficulties ( p < 0.01). Components of phonological awareness that were not singled out as strongly related to early reading success include Phoneme Segmentation, Initial Phoneme Identification, and Syllable Merging. Understanding the nature of the relationship between phonological awareness and reading should therefore help in designing effective programs aimed at eliminating delays in children’s phonological awareness while they are still in preschool.

1. Introduction

During their development, a large number of children do not experience difficulties in learning early literacy skills. However, a number of children struggle with reading over a long period of time, and some of them never reach the expected level of these skills [ 1 ]. Orthographic transparency is an important factor influencing reading acquisition; it refers to the relationship between graphemes (the written symbols representing speech sounds) and phonemes. Research on alphabetic writing systems indicates that orthographic transparency is increasingly recognized as important in determining how difficult it is to learn to read. A transparent orthography has a simple one-to-one relationship between graphemes and phonemes, while less transparent orthographies are those in which the relationship is more complex.

Orthographic decoding is not equal in all languages. Arabic and Hebrew writing systems, for example, require more developed visuospatial abilities and better visual attention skills, necessary for decoding, compared to those in English [ 2 , 3 , 4 , 5 , 6 ].

According to the orthographic depth hypothesis, shallow orthographies more readily support a word recognition process that involves phonology, and children learn to read relatively quickly. Phonological awareness and the ability to decode words into their constituent sounds are cited as predictors of literacy. In contrast, deep orthographies encourage a reader to process printed words by referring to their morphology through the visual-spelling structure of the printed word, which makes decoding considerably more difficult. As a result, reading is slower to learn [ 7 ].

The tradition of literacy in the Republic of Serbia is neither as long nor as rich as, for example, in England, Germany, France or Russia, but the orthography of the Serbian language is simple and transparent and should not create problems in mastering it. Nevertheless, a large number of completely or functionally illiterate individuals or social groups are registered in Serbia. What is worrying is the large percentage of children who remain functionally illiterate [ 8 ].

Over the last thirty years, particular attention has been paid to phonological and phonemic awareness that lays the foundation for success in acquiring and developing reading skills [ 9 , 10 , 11 , 12 , 13 , 14 , 15 ].

Phonological awareness involves the auditory and oral manipulation of sounds, whereas phonemic awareness refers to the ability to understand the relationship between written symbols (the letters that represent the sounds in spoken words) and those sounds; phonological awareness is therefore a broader notion than phonemic awareness [ 16 , 17 , 18 ].

Phonological awareness is the ability to identify, process, and manipulate phonological units that compose spoken words of different complexity and size [ 19 ]. It includes the understanding of different ways that the words in our spoken language can be broken down into their various components which can be then manipulated [ 1 ].

Phonological awareness in children, especially in the early stages of reading, improves and accelerates learning to read, and at the age of six it is a strong predictor of their future reading ability. Hence, a child’s level of phonological awareness acquisition accounts for the child’s readiness to read. Equally, phonemic awareness has been singled out as a crucial factor when it comes to reading nonsense words and text comprehension [ 20 , 21 ].

In addition, syntactic awareness has been found to be an important support for phonological awareness and a significant predictor of reading accuracy. This is especially important for children who read more complex texts and have already mastered reading and decoding individual words. Thus, phonological skills are important predictors of initial reading and writing development [ 11 , 15 , 16 ]. A deficit in the ability to understand the phonological structure of language has been found to be the primary cause of later reading difficulties [ 10 , 12 , 13 , 14 , 22 ]. Accordingly, a deficit in phonological awareness has so far been regarded as the major and most widely accepted cause of dyslexia [ 23 ].

Phonological deficit explains reading difficulties as a consequence of impaired phonological processing skills which manifests itself as a deficit in the ability to understand the relationship between graphemes and phonemes [ 24 , 25 ].

Consequently, children with reading difficulties have been shown to have impaired phonological awareness, but may also have impaired verbal short-term memory, rapid automatized naming, speech perception, and perception of visual forms [ 26 , 27 , 28 , 29 ]. Deficits in phonological awareness and automatic rapid naming skills are particularly pronounced among individuals with dyslexia [ 30 ].

In other words, they perform less well on tasks that depend fundamentally on phonological representations, which affect the accessibility and retrieval of phonological information; their phonological processing difficulties result from ill-formed phonological perceptions [ 15 , 31 , 32 ].

Although the relation between phonological skills and reading is assumed to be bidirectional, the nature of these skills and their precise effects on the development of reading have been a matter of constant controversy [ 22 ]. Thus, the aim of this study was to investigate the extent to which the development of phonological awareness contributes to reading acquisition in students reading in a transparent orthography. Additionally, our aim was to adapt the reading test in relation to linguistic and cultural differences, while retaining metric characteristics such as reliability and validity. We expected children who have a problem with phonological awareness to have more difficulty reading.

2. Materials and Methods

2.1. Participants

The research comprised primary school students from first through third grade from three towns. The towns, which have approximately the same number of inhabitants, are located in the northern part of the Republic of Serbia. All schools are state-owned institutions with the same educational system. Prior to conducting this research, ethical approval was obtained from the Ethics Committee of the Medical Faculty, University of Novi Sad (Decision No. 52/14), followed by written permission from the principals of the schools in which the research was conducted and written consent from the parents of the students involved in the research.

First, invitations to participate in this study were sent to all parents (a total of 1136), of which 946 (83.27%) gave their written consent to participate. Subsequently, students engaged in inclusive education were excluded from the sample, as well as students with uncorrected sensory impairments (hearing and vision), students with intellectual impairment, neurological and psychiatric disorders or severe motor impairments, students with speech and language disorders and students whose first language is not Serbian. The research began with a total of 810 (85.62%) students, but complete responses were obtained from 689 (85.06%) students, which is the final number of students included in this study.
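The percentages in the participant flow above can be checked with a few lines of arithmetic. The following is a minimal sketch; the variable names and the helper `pct` are illustrative and not part of the study, and each percentage is taken relative to the preceding stage, which is how the reported figures appear to have been computed.

```python
# Participant flow as reported in the text.
invited = 1136      # parents invited
consented = 946     # written consents received
eligible = 810      # students remaining after exclusions
completed = 689     # students with complete responses

def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

consent_rate = pct(consented, invited)      # consent relative to invitations
eligible_rate = pct(eligible, consented)    # eligible relative to consents
completion_rate = pct(completed, eligible)  # complete relative to eligible
print(consent_rate, eligible_rate, completion_rate)
```

Each stage is expressed against the previous one rather than against the 1136 initial invitations.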

The research comprised 116 girls and 128 boys from the first grade, 94 girls and 116 boys from the second grade, and 109 girls and 126 boys from the third grade ( Table 1 ). There is a gender balance among the groups of first, second and third-grade students (χ2 = 0.35; p = 0.83).

Table 1. Sample of participants.

Data were initially analyzed using the χ2-test, and the results indicate that there is no significant difference between the numbers of girls and boys across the three grade-level groups (χ2 = 0.35; p = 0.83). In order to exclude students whose intellectual abilities are below average from the sample, we used data from school protocols obtained with Raven’s Progressive Matrices [ 33 ] for participants enrolled in Grades 1, 2 and 3. All students were tested before starting school. The use of these data was approved by the school principal, as well as by the students’ parents, who confirmed this by giving their written consent. The available data indicated whether a student’s achievement on the test was below average, average or above average. The study inclusion criterion was an IQ of 90 or above. Children who had below-average achievement on this test when enrolling in school were not included in the study.
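The gender-by-grade χ2 statistic can be reproduced from the counts given in the text. The sketch below computes a Pearson chi-square by hand using only the standard library; the function name is illustrative. The p-value uses the closed form exp(-x/2), which holds only for exactly 2 degrees of freedom.

```python
import math

# Girls and boys per grade, as reported in the text.
observed = {
    "grade 1": (116, 128),
    "grade 2": (94, 116),
    "grade 3": (109, 126),
}

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof

chi2, dof = chi_square_independence(observed)
# For exactly 2 degrees of freedom the chi-square survival function is exp(-x/2).
p_value = math.exp(-chi2 / 2) if dof == 2 else None
print(round(chi2, 2), dof, p_value)
```

Under these assumptions the statistic rounds to the reported 0.35, and the p-value comes out close to the reported one.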

All students come from urban areas and both students and their parents are native Serbian speakers. Regarding the parental educational levels, the highest percentage of parents had secondary level of education (63%), while 29% had higher level of education and 8% had primary school education.

2.2. Instruments

The Phonological Awareness Test (FONT) was used to measure phonological awareness [ 19 ]. The test contains eight types of tasks with six items each, and permission to use it was obtained from the author. The tasks included: syllable merging, syllable segmentation, initial phoneme identification, rhyme recognition, phoneme segmentation, final phoneme identification, phoneme deletion and phoneme substitution. The test was constructed and validated for the Bosnian and Serbian languages, and it was therefore applicable for assessing students who participated in our research. According to the authors of the test, the specified time required to assign tasks is limited, but in their pilot study the time scheduled to assign all eight tasks, with six items each, was about 30–45 min.

In our study, the time scheduled to assign the tasks was about 25–35 min. The responses are assessed at each age level. With respect to the success achieved on the test, the categories are the following: Below Average, Low Average, Average, High Average and Above Average. Students were classified into these categories according to their test achievements.

In testing, all tasks were formatted solely to assess oral production. In view of this, they were chosen to represent the most typical and appropriate forms of words in the Serbian language. The test does not include tasks that produce non-words in Serbian, such as items that require phoneme manipulation in the middle of the word, or that could lead to a change in the type of word, noun gender, or case. Each task contains six items that are scored as true or false. In the test, items (except for control items) were ranked according to item difficulty. According to the author of the test, very high test reliability was determined after standardization (Cronbach’s α = 0.96), with an approximately normal distribution of corrected item-total correlations within an acceptable range and average value, which indicates good internal consistency and item discrimination within the included age interval. For the purpose of the present research, we first conducted a pilot study to validate the test on a sample of 46 first grade primary school students (23 boys and 23 girls; average age 7.08 years, standard deviation (SD) 0.50). The results of the pilot study showed high reliability of the test (Cronbach’s α = 0.95) and no gender differences in phonological awareness, and the instrument was subsequently applied to the entire sample. The validation was performed because the Ekavian subdialect is used in the Republic of Serbia, while the authors of the test use the Ijekavian subdialect represented in Bosnia and Herzegovina. The difference between the two subdialects could affect the understanding of the words presented in the test.

The results obtained on the entire sample show high reliability for both the instrument as a whole and for the subscales ( Table 2 ).

Table 2. Correlation matrices.

Verification of the assumption for the dimensionality of the questionnaire was performed by exploratory factor analysis using principal axis factoring in SPSS 22 (IBM Corp., Armonk, NY, USA). A significant Bartlett’s test of sphericity [χ2 (1128) = 14,141.492; p < 0.001] indicates that the inter-correlation matrix is factorable. The Unweighted Least Squares (ULS) procedure was used to extract the factors. In order to determine the number of significant factors, the data were subjected to bootstrap parallel analysis (1000), integrated into the FACTOR program (Lorenzo-Seva & Ferrando, 2006, Tarragona, Spain, available: https://psico.fcep.urv.cat/utilitats/factor/index.html ). It was found that four characteristic roots of the actual data explain a higher percentage of variance than the 95th percentile of their random counterparts, and the authors opted for that number of factors, since it also met the criterion of interpretability. These four factors explain 52.321% of the variance. Promax rotation was used, and the correlation matrix is shown in Table 2 . Only loadings exceeding 0.30 are shown in the table. Based on the correlation matrix, no item has cross-loadings on more than one factor, while 3 items of Initial Phoneme Identification have loadings lower than recommended for retention in the scale.

The first factor is called Syllable Awareness and encompasses all items of the Syllable Merging and Syllable Segmentation subscales. It contains no loadings of items from other subscales and explains 28.99% of the variance. The second factor is termed Advanced Phoneme Awareness; it encompasses all items of the Phoneme Elimination and Phoneme Substitution subscales and explains 10.128% of the variance. The third factor, called Initial Phoneme Awareness, is mostly loaded by items from the Rhyme Recognition subscale, although it also has significant loadings on 3 of the 6 items of Initial Phoneme Identification. The remaining three items of this subscale do not load significantly on any factor. The third factor explains 7.429% of the variance. The fourth factor is loaded only by items of the Phoneme Segmentation subscale; it is named after that subscale and explains 5.765% of the variance. This factor includes 5 of the 6 items of this subscale, while the remaining item loads weakly on the first factor. The reliability of the individual factors is good: 0.83 for the Initial Phoneme Awareness factor, 0.84 for the Phoneme Segmentation factor, 0.87 for the Advanced Phoneme Awareness factor and 0.92 for the Syllable Awareness factor. For the scale as a whole, the reliability is very high, amounting to 0.95. The correlations between the factors are low and range from 0.133 to 0.420, and therefore there was no need to check the second-order factor structure. Factor scores were saved as variables and thus formed the scores of four subscales, which were used in further analyses.
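The 1128 degrees of freedom reported for Bartlett’s test are consistent with the size of the instrument. A minimal sketch of the check follows, assuming that all 8 × 6 = 48 FONT items entered the factor analysis (the text does not state this explicitly); the variable names are illustrative.

```python
tasks = 8            # task types in the FONT test
items_per_task = 6   # items per task
n_items = tasks * items_per_task

# Bartlett's test of sphericity has p(p - 1)/2 degrees of freedom for p variables,
# i.e. one per distinct off-diagonal correlation.
bartlett_df = n_items * (n_items - 1) // 2
print(n_items, bartlett_df)
```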

The Gray Oral Reading Test (GORT-5) is one of the most commonly used measures of oral reading fluency and comprehension [ 34 ]. It is designed for identification of dyslexia in students ranging in age from 6 years to 23 years and 11 months, as well as for students who may need more intensive and explicit instruction in reading acquisition. According to the authors of the test, its reliability (Cronbach’s alpha) is 0.90. The test includes tasks that assess oral reading speed, accuracy, fluency and comprehension on the basis of chronological age and grade. The oral reading index is a composite score formed by combining students’ fluency and reading comprehension scores [ 35 ].

In the Republic of Serbia, there is no standardized reading assessment tool that could be used with the general population for measuring essential components of reading, such as speed, accuracy, fluency and reading comprehension. Since the original version of the test is in English, a cross-cultural adaptation of the test was conducted to best reflect the aims of this research, following the guidelines for adaptation with regard to linguistic and cultural differences [ 36 ]. According to the translation and adaptation guidelines, the forward translation from English to Serbian was done first, independently by two translators. As recommended, one of the translators worked in the same field, and the other had no experience related to it. The two translators then harmonized their translations into one synthesized version. Next, this version was translated from Serbian back into English in a so-called double blind translation, followed by harmonization between translators and examiners with regard to semantic, idiomatic, experiential and conceptual equivalence. There was no need for changes in the content. Following this phase, we conducted a pilot study of this test on a sample of forty third-graders and the same number of second-graders. The results showed high reliability (Cronbach’s α = 0.91), and the instrument was applied to the entire sample.

The results for the entire sample show high instrument reliability for Comprehension and the Oral Reading Index ( Table 3 ). Since the other GORT assessment measures showed low reliability, and since the Oral Reading Index is a composite score formed by combining fluency and comprehension scaled scores, in further analysis we use only the Oral Reading Index, which represents the most reliable test score. Indices in the average to high range (over 90) are achieved by students who have reached the level of oral reading expected for their age. Low indices (below 90) are obtained by students who have not reached the reading level expected for their age.

Table 3. The reliability of the applied Gray Oral Reading Test GORT-5 scales.

2.3. Study Design

Before starting the test, parents were informed about the purpose and procedures associated with the assessment. First, they were given oral explanation, and then printed information. For parents who were not present, written information and an envelope were provided to send/return their completed consent to the examiner. Students would participate in the survey provided that their parents had given consent for their child’s participation in the research.

The assessment took place in the classrooms that students regularly attended. Classrooms were isolated from noise and distracting sounds. The assessment was performed both during students’ regular and extended stay in school. As a rule, students entered the classroom one at a time and first got to know the examiner via informal conversation. First grade students were included in the research in the second term, since the education system in Serbia is organized so that children in the first term of the first grade master the basics of reading and only in the second term are they expected to read independently.

Students were first assessed using the FONT test. It was explained to them that they should answer the examiner’s questions with YES or NO, and that the questions would be asked orally. At all times, they had the opportunity to seek further clarification from the examiner.

Subsequently, reading performance was tested. Students were first given an explanation about reading the texts and answering a few questions about the texts read. They were told that the examiner would record some data while they were reading, and that it was a non-evaluative assessment.

2.4. Statistical Analysis

Data analysis was performed using the statistical package SPSS 20 (IBM Corp., Armonk, NY, USA).

Descriptive statistics were used to summarize the major numerical characteristics of the observations: a measure of central tendency (arithmetic mean) and a measure of variability (standard deviation). Additionally, we applied factor analysis, one-way ANOVA, MANOVA, the t-test and regression analysis. In the tests used, differences were considered statistically significant at p < 0.05. To measure reliability across the whole scale, Cronbach’s alpha coefficient was used as a measure of internal consistency. Coefficients of at least 0.80 were considered acceptable.
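To make the internal-consistency criterion concrete, the following minimal sketch computes Cronbach’s alpha from item scores and compares it with the 0.80 threshold. The study itself used SPSS; the function name and the toy responses below are hypothetical and serve only to illustrate the standard formula, α = k/(k − 1) · (1 − Σ item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical true/false (1/0) responses of six students to four items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
is_acceptable = alpha >= 0.80  # acceptability threshold stated in the text
print(round(alpha, 3), is_acceptable)
```

Alpha rises when items covary strongly relative to their individual variances, which is why a scale whose items measure the same construct yields values close to 1.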

Reading success was first analyzed on the entire sample. The results show that students classified in the category Very Poor made up 2.3%; Poor, 11.2%; and Below Average, 32.6%; the largest number of students from first through third grade were classified in category 4, which corresponds to Average (47.8%); Above Average made up 3.2%; Superior, 2.0%; and Very Superior, 0.9%. The distribution of students across the reading success categories does not differ significantly by the class they attend (χ2 = 9.97, p = 0.61).

Additionally, one-way analysis of variance was used to test whether there are differences in students’ scores in different classes in relation to the Oral Reading Index scale. The results show that there is no significant difference in the overall index of oral reading (F = 0.45, p = 0.64) in relation to the class that students attend ( Table 4 ).

Table 4. Reading performance in relation to the class that students attend.

SD-standard deviation; F-Multivariate analysis of variance; Cohen’s d- the effect size measure.
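The one-way analysis of variance used to compare grades can be sketched as follows. This is an illustration under stated assumptions, not the authors’ SPSS procedure: the scores are hypothetical, the function name is invented for the example, and eta squared is shown as one common ANOVA effect size.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, eta squared) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, eta_sq

# Hypothetical Oral Reading Index scores for three grades.
grade_scores = [
    [88, 95, 102, 91],
    [90, 97, 100, 93],
    [92, 96, 104, 94],
]
f_stat, eta_sq = one_way_anova(grade_scores)
print(round(f_stat, 4), round(eta_sq, 4))
```

A small F with a small eta squared, as in this toy data, corresponds to the situation described in the text, where grade explains little of the variation in the index.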

In relation to the level of development of Phonological Awareness, five categories were formed. Differences in the category of development of Phonological Awareness were found in relation to the grade that the child attends (χ2 = 378.71, p < 0.001). Table 5 shows that in the first grade there are significantly more students in the Low Average category compared to the number of students in this category in the second and third grade. Additionally, in the second grade there are significantly more students in the Below Average category compared to third grade students. In the second and third grade, there are significantly more students in the Above the Average category as compared to the first grade. According to the instructions given in the FONT Assessment Manual, there is no Above Average category for the second and third grade and for that reason a 0 value is given for these columns.

Table 5. Phonological awareness based on the grade level.

In our further analysis, we examined whether there is a significant difference in the level of development of certain elements of phonological awareness among students of different grades ( Table 6 ). This question was tested by MANOVA in order to prevent inflation of the alpha error. Box’s Test of Equality of Covariance Matrices shows that there are no differences in the covariance of the dependent variables between the groups. The Pillai’s Trace MANOVA test shows that the four subscales of the FONT test differ in relation to the grade that students attend (F = 1,790, p < 0.001). The magnitude of the effect shows that 8.5% of the differences between classes are explained by the four subscales. In the next step, individual ANOVAs were performed. On all four subscales/factors of the FONT scale, there is a significant difference in relation to the grade the student attends.

The results on the phonological awareness test subscales by grades.

df-degrees of freedom; Eta2-the effect size measure.

On all the Phonological Awareness subscales, there is a significant difference in scores among students who attend different grades of primary school. Syllable Awareness, Advanced Phoneme Awareness and Phoneme Segmentation are significantly more developed in the third grade students compared to all younger students, while the second-graders have a significantly better result than the first-graders. On the Initial Phoneme Awareness subscale, the second-graders are significantly better than the first-graders, but no differences were observed between second and third grade students. However, the magnitudes of the effects of these differences are small, less than 0.2.

In order to examine the differences in the phonological awareness of children with and without reading difficulties, we formed groups based on the Oral Reading Index, so that the children with an Oral Reading Index below 90 entered the group of children with reading difficulties. The MANOVA results show that the students with reading difficulties have poorer overall phonological awareness than students who do not have reading difficulties. The overall model is significant (F = 12.908, p < 0.001, d = 0.070). When the differences in relation to each subscale of phonological awareness were analyzed, it was noticed that differences are observed in the same direction on the first two subscales, while there are no significant differences on the other two subscales ( Table 7 ). In terms of measures of magnitude, these differences were expressed using the Cohen’s d coefficient, which ranged from d = 0.014 to d = 0.053. The magnitude of the effect shows that these significant differences are also very small ( Table 7 ).

Differences in the phonological awareness of students with and without reading difficulties.

Cohen’s d, the effect size measure.

The correlations between the overall Reading Index and the FONT total (r = 0.19, p < 0.001) and subscales are significant (ranging from 0.07 to 0.20). To examine how much of the variance in the Oral Reading Index can be explained by the four phonological awareness variables, multiple regression analysis was applied ( Table 8 ). The set of predictors consisted of Syllable Awareness, Advanced Phoneme Awareness, Initial Phoneme Awareness and Phoneme Segmentation. Preliminary analyses were conducted to confirm that there were no significant violations of the assumptions of normality, linearity, multicollinearity, and homoscedasticity.

Characteristics of multiple-regression analysis.

R—correlation coefficient; R²—coefficient of determination.

The obtained model is statistically significant, F (4, 688) = 13.356, p < 0.001. The multiple correlation coefficient is R = 0.269. The predictor variables explain 7% of the variance in the overall Reading Index.
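As a rough consistency check on the reported model, the F statistic can be recovered from the multiple correlation coefficient, assuming the standard relation F = (R²/k) / ((1 − R²)/df_residual) with k = 4 predictors and 688 residual degrees of freedom. The sketch below is illustrative only: the function name is hypothetical and not part of the original analysis, and a small gap from the reported F is expected because R is rounded to three decimals.

```python
def f_from_multiple_r(r, n_predictors, df_residual):
    """Return (R squared, F) implied by a multiple correlation coefficient.

    Hypothetical helper for illustration; uses the textbook relation
    F = (R^2 / k) / ((1 - R^2) / df_residual).
    """
    r_squared = r ** 2
    f_stat = (r_squared / n_predictors) / ((1 - r_squared) / df_residual)
    return r_squared, f_stat


# Values reported in the text: R = 0.269, 4 predictors, 688 residual df.
r_squared, f_stat = f_from_multiple_r(0.269, 4, 688)
print(f"R^2 = {r_squared:.3f}, F = {f_stat:.2f}")
```

The implied R² of about 0.07 agrees with the 7% of explained variance, and the implied F of about 13.4 is close to the reported value.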

A significant positive contribution of the two separate predictor variables was observed, whereby the initial phonological awareness scale has a higher beta coefficient (Beta = 0.212, p < 0.001) than advanced phoneme awareness scale (Beta = 0.096, p = 0.046) ( Table 9 ).

Partial contributions of predictors.

Criterion—ORI (Oral Reading Index).

4. Discussion

Contemporary scientific considerations support the phonological awareness deficit theory as the leading cause of dyslexia and reading difficulties. Considering that the Serbian language belongs to the group of transparent languages, in which one grapheme corresponds to one phoneme, this letter-to-sound conversion method should facilitate literacy acquisition and learning to read. Thus, students who are learning to read in a consistent or transparent orthography, with letter-sound correspondence, learn to read faster [ 37 ].

Additionally, the results obtained show that a large number of children included in this study have reading problems, which is a key criterion for classification to the category of reading difficulties. In our research, below low and low reading skills were found in 13.5% of students, while the largest number of students from first through third grade were average readers. The previous research in our country, which referred to identifying students with reading difficulties, was conducted in 1999 and then 8.4% of children with reading disabilities were reported [ 38 ]. In recent years, we have seen an increase in the number of students with reading difficulties, which can be justified by the greater sensitivity of the academic community to this problem, in addition to better recognition and greater involvement of experts in this field. Another reason for the increase in the number of students with reading difficulties is that previously there was no instrument to measure characteristics that would clearly define the difference between average reading and reading difficulties, and the grading was based on the examiner’s subjective evaluation. By using GORT 5 and FONT, we have tried to point out the importance of applying valid instruments for reading assessment and readiness to read.

Children with reading difficulties, compared to typical readers of the same age, have substantially lower achievements in reading accuracy and speed [ 39 ]. Further, the results of the present study show that reading accuracy was a minor problem compared with reading speed, while the most common error types were sound and syllable omissions or additions in words. In the research that comprised native German speaking children with reading difficulties [ 40 ], as well as in our research, reading speed deficits were found for all types of reading tasks, including text. Additionally, the error types correspond to those of native Serbian speaking children. Accordingly, the results of this research have shown that students who are slower in reading, in terms of accuracy, have more pronounced problems related to automatization of components of phonological awareness because it has not evolved over time. These results are in accordance with other studies confirming that students with reading difficulties have reading speed as a core deficit because phonological awareness skills have not been automated [ 39 ]. Furthermore, the naming speed is related to the reading speed and the number of incorrectly read words [ 41 ]. These findings offer support for the hypothesis of reading speed as the core reading deficit in a transparent orthography.

Reading difficulty is associated with phonological awareness and the ability to identify words. The school system in Serbia requires children to enroll in the first grade at the age of seven, so the students included in our sample were older than the students in similar research in which phonological awareness abilities were assessed. This may explain the successful performance of students in our sample, more than half of whom achieved above average phonological awareness scores. The results of this research show that students who have been classified as those with low phonological awareness have less developed vocabulary compared to students with average and above average phonological awareness measures. In the second grade students, low average phonemic awareness scores also correlate with lower vocabulary subtest scores. This result was expected for second grade students, because poor vocabulary and limited lexical resources can impoverish their knowledge of phoneme-grapheme relationships in words, i.e., phonological awareness. Studies conducted with native English speaking children have also reported very good phonological awareness at the age of four and five. For them, tasks involving rhyme recognition were easier than recognizing the initial phoneme. Students who had been learning how to read for about a year were able to perform syllable and phoneme segmentation [ 42 , 43 ].

Goswami’s research [ 44 ] suggests that the sequence of phonological awareness development is universal across languages. Evidence of this lies in phonological skills such as syllable segmentation, initial phoneme identification, and rhyme recognition, which begin before literacy skills develop [ 44 , 45 ]. Cassady et al. [ 46 ], as well as Rathvon [ 47 ], suggest that all types of phonological awareness tasks can be grouped into nonphonemic tasks (which measure global aspects of phonological awareness, such as rhyming and syllable segmentation) and phonemic awareness tasks (which measure the ability to attend to or to manipulate individual phonemes).

By analyzing separate components of phonological awareness compared to chronological age, it was noticed that the first grade students in our research were worse at performing tasks of syllable merging, syllable segmentation, identifying the initial phoneme and recognizing rhyme compared to second and third grade students. Students who have better developed phonological skills are better at reading fluency, as well as reading comprehension. Components of phonological awareness that are formed before entering school and acquisition of literacy skills are not closely connected to speed, accuracy, fluency and reading comprehension. This can be explained by the fact that syllable and phoneme segmentation, initial phoneme identification and rhyming are components of phonological awareness that are important for the pre-reading period. By the end of the first, and especially the second grade, the tasks involving initial phoneme deletion, final phoneme identification and phoneme substitution, i.e., advanced phonological awareness, proved to be more difficult. Students in the second and third grade were more successful in performing these tasks compared to the first grade students, which suggests that the reading process contributes to the improvement of phonological awareness.

At the same time, research shows that preschool-aged children can already successfully segment words into syllables, while phoneme segmentation is successfully performed only after starting school and learning to read [ 45 , 48 , 49 , 50 , 51 , 52 ]. Therefore, the data obtained from this study confirm the findings of previous studies. Likewise, first grade students have shown more developed syllable segmentation compared to phoneme segmentation, whereas the other components of phonological awareness were developed in the second and third grade students. We assume that this happened as a result of reading and literacy acquisition.

The primary deficit that underpins reading difficulties in all languages lies in problems with phonological awareness [ 44 , 53 , 54 ]. In this context, it is recommended to focus on five cognitive strategy and skill areas that contribute to reading development—phonological awareness, learning of grapheme-phoneme correspondences, reading fluency, vocabulary and reading comprehension [ 55 , 56 , 57 ]. There was the same tendency in the present research. The group of students whose reading scores did not involve deviation below the average had shown a more developed phonological awareness and were more successful in all reading tasks. Students who had lower reading scores also obtained lower scores in the phonological awareness test.

5. Conclusions

Our research showed that the largest number of students who participated in the research had an average reading development. There was no significant difference in reading among students of different grades. Students who have difficulty reading have poorer phonological awareness compared to students who have no difficulty reading. Phonological awareness has proven to be a significant predictor in mastering reading on two subscales, but even these significant differences are very small.

A limitation of this study is that no additional instrument was used for the assessment of reading in order to establish concurrent validity of the GORT 5 test, which was used for the first time in Serbian. A further shortcoming of our research is that it did not take into account the time devoted to reading instruction and reading preparation at pre-school institutions. Subsequent studies should also consider the time that children spend reading at home, as well as different reading instruction styles.

Acknowledgments

The research was conducted with support from primary school principals in Novi Sad, Kula and Crvenka in addition to the support and collaboration of psychologists and teachers of these schools.

Author Contributions

Conceptualization, V.M., Š.G.; methodology, V.M., S.G.; validation, T.K.; formal analysis, S.G.; investigation, V.M., T.K.; data curation, S.G.; writing—original draft preparation, V.M.; writing—review and editing, V.M.; visualization, T.K.; supervision, Š.G.; project administration, Š.G. All authors have read and agreed to the published version of the manuscript.

This research received no external funding.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the Medical Faculty in Novi Sad (Decision No. 52/14).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Conflicts of interest.

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Published on 17.4.2024 in Vol 26 (2024)

Service Quality and Residents’ Preferences for Facilitated Self-Service Fundus Disease Screening: Cross-Sectional Study

Original Paper

  • Senlin Lin 1, 2, 3 * , MSc   ; 
  • Yingyan Ma 1, 2, 3, 4 * , PhD   ; 
  • Yanwei Jiang 5 * , MPH   ; 
  • Wenwen Li 6 , PhD   ; 
  • Yajun Peng 1, 2, 3 , BA   ; 
  • Tao Yu 1, 2, 3 , BA   ; 
  • Yi Xu 1, 2, 3 , MD   ; 
  • Jianfeng Zhu 1, 2, 3 , MD   ; 
  • Lina Lu 1, 2, 3 , MPH   ; 
  • Haidong Zou 1, 2, 3, 4 , MD  

1 Shanghai Eye Diseases Prevention & Treatment Center/Shanghai Eye Hospital, School of Medicine, Tongji University, Shanghai, China

2 National Clinical Research Center for Eye Diseases, Shanghai, China

3 Shanghai Engineering Research Center of Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China

4 Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China

5 Shanghai Hongkou Center for Disease Control and Prevention, Shanghai, China

6 School of Management, Fudan University, Shanghai, China

*these authors contributed equally

Corresponding Author:

Haidong Zou, MD

Shanghai Eye Diseases Prevention & Treatment Center/Shanghai Eye Hospital

School of Medicine

Tongji University

No 1440, Hongqiao Road

Shanghai, 200336

Phone: 86 02162539696

Email: [email protected]

Background: Fundus photography is the most important examination in eye disease screening. A facilitated self-service eye screening pattern based on the fully automatic fundus camera was developed in 2022 in Shanghai, China; it may help solve the problem of insufficient human resources in primary health care institutions. However, the service quality and residents’ preference for this new pattern are unclear.

Objective: This study aimed to compare the service quality and residents’ preferences between facilitated self-service eye screening and traditional manual screening and to explore the relationships between the screening service’s quality and residents’ preferences.

Methods: We conducted a cross-sectional study in Shanghai, China. Residents who underwent facilitated self-service fundus disease screening at one of the screening sites were assigned to the exposure group; those who were screened with a traditional fundus camera operated by an optometrist at an adjacent site comprised the control group. The primary outcome was the screening service quality, including effectiveness (image quality and screening efficiency), physiological discomfort, safety, convenience, and trustworthiness. The secondary outcome was the participants’ preferences. Differences in service quality and the participants’ preferences between the 2 groups were compared using chi-square tests separately. Subgroup analyses for exploring the relationships between the screening service’s quality and residents’ preference were conducted using generalized logit models.

Results: A total of 358 residents enrolled; among them, 176 (49.16%) were included in the exposure group and the remaining 182 (50.84%) in the control group. Residents’ basic characteristics were balanced between the 2 groups. There was no significant difference in service quality between the 2 groups (image quality pass rate: P =.79; average screening time: P =.57; no physiological discomfort rate: P =.92; safety rate: P =.78; convenience rate: P =.95; trustworthiness rate: P =.20). However, the proportion of participants who were willing to use the same technology for their next screening was significantly lower in the exposure group than in the control group ( P <.001). Subgroup analyses suggest that distrust in the facilitated self-service eye screening might increase the probability of refusal to undergo screening ( P =.02).

Conclusions: This study confirms that the facilitated self-service fundus disease screening pattern could achieve good service quality. However, it was difficult to reverse residents’ preferences for manual screening in a short period, especially when the original manual service was already excellent. Therefore, the digital transformation of health care must be cautious. We suggest that attention be paid to the residents’ individual needs. More efficient man-machine collaboration and personalized health management solutions based on large language models are both needed.

Introduction

Vision impairment and blindness are caused by a variety of eye diseases, including cataracts, glaucoma, uncorrected refractive error, age-related macular degeneration, diabetic retinopathy, and other eye diseases [ 1 ]. They not only reduce economic productivity but also harm the quality of life and increase mortality [ 2 - 6 ]. In 2020, an estimated 43.3 million individuals were blind, and 1.06 billion individuals aged 50 years and older had distance or near vision impairment [ 7 ]. With an increase in the aging population, the number of individuals affected by vision loss has increased substantially [ 1 ].

High-quality public health care for eye disease prevention, such as effective screening, can assist in eliminating approximately 57% of all blindness cases [ 8 ]. Digital technologies, such as telemedicine, 5G telecommunications, the Internet of Things, and artificial intelligence (AI), have provided the potential to improve the accessibility, availability, and productivity of existing resources and the overall efficiency of eye care services [ 9 , 10 ]. The use of digital technology not only reduces the cost of eye disease screening and improves its efficiency, but also assists residents living in remote areas to gain access to eye disease screening [ 11 - 13 ]. Therefore, an increasing number of countries (or regions) are attempting to establish eye screening systems based on digital technology [ 9 ].

Fundus photography is the most important examination in eye disease screening because the vast majority of diagnoses of blinding retinal diseases are based on fundus photographs. Diagnoses can be made by human experts or AI software. However, traditional fundus cameras must be operated by optometrists, who are usually in short supply in primary health care institutions when faced with the large demand for screening services.

Fortunately, the fully automatic fundus camera has been developed on the basis of digital technologies including AI, industrial automation, sensors, and voice navigation. It can automatically identify the person’s left and right eyes, search for pupils, adjust the lens position and shooting focus, and provide real-time voice feedback during the process, helping the residents to understand the current inspection steps clearly and complete the inspection cooperatively. On this basis, a facilitated self-service eye screening pattern was newly established in 2022 in Shanghai, China.

However, evidence is inadequate about whether this new screening pattern performs well and whether the residents prefer it. Therefore, this cross-sectional study aims to compare the service quality and residents’ preferences of this new screening pattern with that of the traditional screening pattern. We aimed to (1) investigate whether the facilitated self-service eye screening can achieve service quality similar to that of traditional manual screening, (2) compare residents’ preferences between the facilitated self-service eye screening and traditional manual screening, and (3) explore the relationship between the screening service quality and residents’ preferences.

Study Setting

This study was conducted in Shanghai, China, in 2022. Since 2010, Shanghai has conducted an active community-based fundus disease telemedicine screening program. After 2018, an AI model was adopted ( Figure 1 ). At the end of 2021, the fully automatic fundus camera was adopted, and the facilitated self-service fundus disease screening pattern was established ( Figure 1 ). Within this new pattern, residents could perform fundus photography by themselves without professionals’ assistance ( Multimedia Appendix 1 ). The fundus images were sent to the cloud server center of the AI model, and the screening results were fed back immediately.

Study Design

We conducted a cross-sectional study at 2 adjacent screening sites. These 2 sites were expected to be very similar in terms of their socioeconomic and educational aspects since they were located next to each other. One site provided facilitated self-service fundus disease screening, and the residents who participated therein comprised the exposure group; the other site provided screening with a traditional fundus camera operated by an optometrist, and the residents who participated therein comprised the control group. All adult residents could participate in our screening program, but their data were used for analysis only if they signed the informed consent form. Residents could opt out of the study at any time during the screening.

In the exposure group, the residents were assessed using an updated version of the nonmydriatic fundus camera Kestrel 3100m (Shanghai Top View Industrial Co Ltd) with a self-service module. In the process of fundus photography, the residents pressed the “Start” button by themselves. All checking steps (including focusing, shooting, and image quality review) were undertaken automatically by the fundus camera ( Figure 2 ). Screening data were transmitted to the AI algorithm on a cloud-based server center through the telemedicine platform, and the screening results were fed back immediately. Residents were fully informed that the assessment was fully automated and not performed by the optometrist.

In the control group, the residents were assessed using the basic version of the same nonmydriatic fundus camera. The optical components were identical to those in the exposure group but without the self-service module. In the process of fundus photography, all steps were carried out by the optometrist (including focusing, shooting, and image quality review). Screening data were transmitted to the AI algorithm on a cloud-based server center through the telemedicine platform, and the screening results were fed back immediately. Residents were also fully informed.

Measures and Outcomes

The primary outcome was the screening service’s quality. Based on the World Health Organization’s recommendations for the evaluation of AI-based medical devices [ 14 ] and the European Union’s Assessment List for Trustworthy Artificial Intelligence [ 15 ], 5 dimensions were selected to reflect the service quality of eye disease screening: effectiveness, physiological discomfort, safety, convenience, and trustworthiness.

Furthermore, effectiveness was based on 2 indicators: image quality and screening efficiency. A staff member recorded the time required for each resident to take fundus photographs (excluding the time taken for diagnosis) at the screening site. Then, a professional ophthalmologist evaluated the quality of each fundus photograph after the on-site experiment. The ophthalmologist was blinded to the grouping of participants. Image quality was assessed on the basis of the image quality pass rate, expressed as the number of eyes with high-quality fundus images per 100 eyes. Screening efficiency was assessed on the basis of the average screening time, expressed as the mean of the time required for each resident to take fundus photographs.

To assess physiological discomfort, safety, convenience, and trustworthiness of screening services, residents were asked to finish a questionnaire just after they received the screening results. A 5-point Likert scale was adopted for each dimension, from the best to the worst, except for the physiological discomfort ( Multimedia Appendix 2 ). A no physiological discomfort rate was expressed as the number of residents who chose the “There is no physiological discomfort during the screening” per 100 individuals in each group. Safety rate is expressed as the number of residents who chose “The screening is very safe” or “The screening is safe” per 100 individuals in each group. Convenience rate is expressed as the number of residents who chose “The screening is very convenient” or “The screening is convenient” per 100 individuals in each group. The trustworthiness rate is expressed as the number of residents who chose “The screening result is very trustworthy” or “The screening result is trustworthy” per 100 individuals in each group.

The secondary outcome was the preference rate, expressed as the number of residents who were willing to use the same technology for their next screening per 100 individuals. In detail, in the exposure group, the preference rate was expressed as the number of the residents who preferred facilitated self-service eye screening per 100 individuals, while in the control group, it was expressed as the number of residents who preferred traditional manual screening per 100 individuals.

To understand the residents’ preference, a video displaying the processes of both facilitated self-service eye screening and traditional manual screening was shown to the residents. Then, the following question was asked: “At your next eye disease screening, you can choose either facilitated self-service eye screening or traditional manual screening. Which one do you prefer?” A total of 4 alternatives were set: “Prefer traditional manual screening,” “Prefer facilitated self-service eye screening,” “Both are acceptable,” and “Neither is acceptable (Refusal of screening).” Each resident could choose only 1 option, which best reflected their preference.

Sample Size

The rule of events per variable was used for sample size estimation. In this study, 2 logit models were established for the 2 groups separately, each containing 8 independent variables. We set 10 events per variable in general. According to a previous study [ 16 ], when the decision-making process had high uncertainty, the proportion of individuals who preferred the algorithms was about 50%. This led us to arrive at a sample size of 160 for each group: 8 variables multiplied by 10 events each gives 80 required events, and with 50% of individuals potentially preferring facilitated screening, 80 events require 80/0.50 = 160 participants.
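The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration of the events-per-variable rule as described in the paragraph above, not the authors' own code; the function and parameter names are hypothetical.

```python
import math


def epv_sample_size(n_variables, events_per_variable, event_proportion):
    """Smallest sample size that is expected to yield enough events.

    Hypothetical helper: required events = n_variables * events_per_variable,
    and each participant is assumed to contribute an event with probability
    event_proportion.
    """
    required_events = n_variables * events_per_variable
    return math.ceil(required_events / event_proportion)


# 8 independent variables, 10 events per variable, about 50% expected to
# prefer the algorithm-based option (values taken from the text).
print(epv_sample_size(8, 10, 0.50))
```

With the values stated in the text, the function returns 160 participants per group.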

Every dimension of the screening service quality and the preference rate were calculated separately. Chi-square and t tests were used to test whether the service quality or the residents’ preferences differed between the 2 groups. A total of 7 hypotheses were tested, as shown in Textbox 1 .

  • H1: image quality pass rate (exposure group) ≠ image quality pass rate (control group); H0: image quality pass rate (exposure group) = image quality pass rate (control group)
  • H1: screening time (exposure group) ≠ screening time (control group); H0: screening time (exposure group) = screening time (control group)
  • H1: no discomfort rate (exposure group) ≠ no discomfort rate (control group); H0: no discomfort rate (exposure group) = no discomfort rate (control group)
  • H1: safety rate (exposure group) ≠ safety rate (control group); H0: safety rate (exposure group) = safety rate (control group)
  • H1: convenience rate (exposure group) ≠ convenience rate (control group); H0: convenience rate (exposure group) = convenience rate (control group)
  • H1: trustworthiness rate (exposure group) ≠ trustworthiness rate (control group); H0: trustworthiness rate (exposure group) = trustworthiness rate (control group)
  • H1: preference rate (exposure group) ≠ preference rate (control group); H0: preference rate (exposure group) = preference rate (control group)

If any of the hypotheses among hypotheses 1-6 ( Textbox 1 ) were significant, it indicated that the service quality was different between facilitated self-service eye screening and traditional manual screening. If hypothesis 7 was significant, it meant that the residents’ preference for facilitated self-service eye screening was different from that for traditional manual screening.

Additionally, subgroup analyses in the exposure and control groups were conducted to explore the relationships between the screening service quality and the residents’ preferences, using generalized logit models. The option “Prefer facilitated self-service eye screening” was used as the reference level for the dependent variable in the models. The independent variables included age, sex, image quality, screening efficiency, physiological discomfort, safety, convenience, and trustworthiness. All statistics were performed using SAS (version 9.4; SAS Institute).

Ethical Considerations

The study adhered to the ethical principles of the Declaration of Helsinki and was approved by the Shanghai General Hospital Ethics Committee (2022SQ272). All participants provided written informed consent before participating in this study. The study data were anonymous, and no identification of individual participants in any images of the manuscript or supplementary material is possible.

Participants’ Characteristics

A total of 358 residents enrolled; among them, 176 (49.16%) were in the exposure group and the remaining 182 (50.84%) were in the control group. Residents’ basic characteristics were balanced between the 2 groups. The mean age was 65.05 (SD 12.28) years for the exposure group and 63.96 (SD 13.06) years for the control group; however, this difference was nonsignificant ( P =.42). The proportion of women was 67.05% (n=118) for the exposure group and 62.09% (n=113) for the control group; this difference was also nonsignificant between the 2 groups ( P =.33).

Screening Service Quality

In the exposure group, high-quality fundus images were obtained for 268 out of 352 eyes (image quality pass rate=76.14%; Figure 3 ). The average screening time was 81.03 (SD 36.98) seconds ( Figure 3 ). In the control group, high-quality fundus images were obtained for 274 out of 364 eyes (image quality pass rate=75.27%; Figure 3 ). The average screening time was 78.22 (SD 54.01) seconds ( Figure 3 ). There was no significant difference in the image quality pass rate (χ²(1)=0.07, P =.79) or average screening time (t(321.01)=-0.58 [Welch-Satterthwaite-adjusted df ], P =.56) between the 2 groups ( Figure 3 ).
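The screening-time comparison can be retraced from the summary statistics alone. The sketch below assumes a Welch (unequal-variance) t test with Welch-Satterthwaite degrees of freedom, as indicated above; it is an illustration rather than the authors' SAS procedure, and the function name is hypothetical. The sign of t depends only on which group is subtracted from which.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite df from group summaries."""
    v1 = sd1 ** 2 / n1  # squared standard error of the mean, group 1
    v2 = sd2 ** 2 / n2  # squared standard error of the mean, group 2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Exposure group: 81.03 (SD 36.98) s, n=176; control: 78.22 (SD 54.01) s, n=182.
t_stat, df = welch_t_from_summary(81.03, 36.98, 176, 78.22, 54.01, 182)
print(f"|t| = {abs(t_stat):.2f}, df = {df:.1f}")
```

The magnitude of t and the adjusted degrees of freedom obtained this way agree with the reported values to rounding.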


For the other dimensions, detailed information is shown in Figure 3. None of these rates differed significantly between the 2 groups (no physiological discomfort rate: χ²(1)=0.01, P=.92; safety rate: χ²(1)=0.08, P=.78; convenience rate: χ²(1)=0.004, P=.95; trustworthiness rate: χ²(1)=1.63, P=.20).

Residents’ Preferences

In the exposure group, 120 (68.18%) residents preferred traditional manual screening, 19 (10.80%) preferred facilitated self-service eye screening, 19 (10.80%) preferred both, and the remaining 18 (10.23%) preferred neither. In the control group, 123 (67.58%) residents preferred traditional manual screening, 14 (7.69%) preferred facilitated self-service eye screening, 20 (10.99%) preferred both, and the remaining 25 (13.74%) preferred neither.

The proportion of residents who chose the category “Prefer facilitated self-service eye screening” in the exposure group was significantly lower than that of residents who chose the category “Prefer traditional manual screening” in the control group (χ²(1)=120.57, P<.001; Figure 3).
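The reported chi-square statistics can be checked against the counts given in the text. The sketch below, in Python rather than SAS, uses a Pearson chi-square test without continuity correction on 2×2 tables. The text does not say whether a correction was applied, and the layout of the preference table (19 of 176 versus 123 of 182) is inferred from the sentence above, so both are assumptions; the function name is made up for this example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its P value on 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Image quality: eyes with and without a high-quality image in each group.
quality_stat, quality_p = chi_square_2x2(268, 352 - 268, 274, 364 - 274)

# Preferences: self-service choosers in the exposure group (19 of 176)
# set against manual-screening choosers in the control group (123 of 182).
pref_stat, pref_p = chi_square_2x2(19, 176 - 19, 123, 182 - 123)

print(f"image quality: chi2 = {quality_stat:.2f}, P = {quality_p:.2f}")
print(f"preferences:   chi2 = {pref_stat:.2f}, P = {pref_p:.3g}")
```

With these tables the statistics round to 0.07 (P=.79) for the image quality pass rate and 120.57 (P<.001) for the preference comparison, matching the reported values, which suggests (but does not prove) that this is how they were obtained.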

Subgroup Analyses

In the exposure group, 4 generalized logit models were generated (Table 1). Regarding the effectiveness of facilitated self-service eye screening, neither the image quality nor the screening time had an impact on the residents’ preferences. Regarding the other dimensions for facilitated self-service eye screening service quality, models 3 and 4 demonstrated that distrust in the results of facilitated self-service eye screening might decrease the probability of preferring this screening service and increase the probability of preferring neither of the 2 screening services.

a Age and gender were adjusted in model 1. Age, gender, image quality, and screening efficiency were adjusted in model 2. Age, gender, physiological discomfort, safety, convenience, and trustworthiness were adjusted in model 3. Age, gender, image quality, screening efficiency, physiological discomfort, safety, convenience, and trustworthiness were adjusted in model 4.

b In the exposure group, distrust in the results of facilitated self-service eye screening might decrease the probability of preferring this screening service and increase the probability of preferring neither the traditional nor the facilitated self-service screening services.

c Not available.

In the control group, another 4 generalized logit models were generated (Table 2). Men were more likely to prefer both screening services. The probability of preferring manual screening might increase with age, while the probability of preferring facilitated self-service eye screening decreased. Regarding the effectiveness of traditional manual screening, neither the image quality pass rate nor the screening time had an impact on the residents’ preferences. For the other dimensions of the quality of traditional manual screening, models 7 and 8 showed that if the residents feel unsafe about traditional manual screening, their preference for traditional manual screening might decrease, and they might turn to facilitated self-service eye screening.

a Age and gender were adjusted in model 5. Age, gender, image quality, and screening efficiency were adjusted in model 6. Age, gender, physiological discomfort, safety, convenience, and trustworthiness were adjusted in model 7. Age, gender, image quality, screening efficiency, physiological discomfort, safety, convenience, and trustworthiness were adjusted in model 8.

b In the control group, if the residents feel unsafe about traditional manual screening, their preference for traditional manual screening might decrease, and they might turn to facilitated self-service eye screening.

A new fundus disease screening pattern was established using the fully automatic fundus camera without any manual intervention. Our findings suggest that facilitated self-service eye screening can achieve a service quality similar to that of traditional manual screening. The study further evaluated the residents’ preferences and associated factors for the newly established self-service fundus disease screening. Our study found that the residents’ preference for facilitated self-service eye screening is significantly less than that for traditional manual screening. This implies that the association between the service quality of the screening technology and residents’ preferences was weak, suggesting that aversion to the algorithm might exist. In addition, the subgroup analyses suggest that even the high quality of facilitated self-service eye screening cannot increase the residents’ preference for this new screening pattern. Worse still, distrust in the results of this new pattern may lead to lower usage of eye disease screening services as a whole. To the best of our knowledge, this study is one of the first to evaluate service quality and residents’ preferences for facilitated self-service fundus disease screening.

Previous studies have suggested that people significantly prefer manual services to algorithms in the field of medicine [16-18]. Individuals have an aversion to algorithms underlying digital technology, especially when they see errors in the algorithm’s functioning [18]. The preference for algorithms does not increase even if the residents are told that the algorithm outperforms human doctors [19,20]. Our results confirm that fundus image quality in the exposure group is similar to that in the control group, and both are similar to or even better than those reported in previous studies [21,22]. However, the preference for facilitated self-service fundus disease screening is significantly lower than that for traditional manual screening. One possible explanation is that uniqueness neglect (a concern that algorithm providers are less able than human providers to account for residents’ or patients’ unique characteristics and circumstances) drives consumer resistance to digital medical technology [23]. Therefore, personalized health management solutions based on large language models should be developed urgently [24] to meet the residents’ individual demands. In addition, a survey of population preferences for medical AI indicated that the most important factor for the public is that physicians are ultimately responsible for diagnosis and treatment planning [25]. As a result, man-machine collaboration, such as human supervision, is still necessary [26], especially in the early stages of digital transformation to help residents understand and accept the digital technologies.

Furthermore, our study suggests that distrust in the results of facilitated self-service fundus disease screening may cause residents to abandon eye disease screening, irrespective of whether it is provided using this new screening pattern or via the traditional manual screening pattern. This is critical to digital transformation in medicine. It implies that if the digital technology does not perform well, residents will not only be averse to the digital technology itself but also be more likely to abandon health care services as a whole. Digital transformation is a fundamental change to the health care delivery system. It can therefore be self-disruptive in its ability to question the practices and production models of existing health care services. As a result, it may become incompatible with the existing models, processes, activities, and even cultures [27]. Therefore, it is important to assess whether the adoption of digital technologies contributes to health system objectives in an optimal manner, and this assessment should be carried out at the level of health services rather than at the level of digital transformation [28].

The most prominent limitation of our study is that it was conducted only in Shanghai, China. Because of the sound health care system in Shanghai, residents have already received high-quality eye disease screening services before the adoption of the facilitated self-service eye screening pattern. Consequently, residents are bound to demand more from this new pattern. This situation is quite different from that in lower-income regions. Digital technology was adapted in poverty-stricken areas to build an eye care system, but it did not replace the original system that is based on manually delivered services [13]. Therefore, the framing effect may be weak [29], and there is little practical value in comparing digital technology and manual services in these regions. Second, our study is an observational study and blind grouping was not practical due to the special characteristics of fundus examination. However, we have attempted to use blind processing whenever possible. For instance, ophthalmologists’ evaluation of image quality was conducted in a blinded manner. Third, the manner in which we inquired about residents’ preferences might affect the results. For example, participants in the exposure group generally have experience with manual screening, but those in the control group may not have had enough experience with facilitated screening despite having been shown a video. This might make the participants in the control group more likely to choose manual screening because the new technology was unfamiliar. Finally, individual-level socioeconomic factors or educational level were not recorded, so we cannot rule out the influence of these factors on residents’ preferences.

In summary, this study confirms that the facilitated self-service fundus disease screening pattern could achieve high service quality. The preference of the residents for this new mode, however, was not ideal. It was difficult to reverse residents’ preference for manual screening in a short period, especially when the original manual service was already excellent. Therefore, the digital transformation of health care must proceed with caution. We suggest that attention be paid to the residents’ individual needs. More efficient man-machine collaboration is necessary to help the public understand and accept new technologies, and personalized health management solutions based on large language models are also required.

Acknowledgments

This study was funded by the Shanghai Public Health Three-Year Action Plan (GWVI-11.1-30, GWVI-11.1-22), Science and Technology Commission of Shanghai Municipality (20DZ1100200 and 23ZR1481000), Shanghai Municipal Health Commission (2022HP61, 2022YQ051, and 20234Y0062), Shanghai First People's Hospital featured research projects (CCTR-2022C08) and Medical Research Program of Hongkou District Health Commission (Hongwei2202-07).

Data Availability

Data are available from the corresponding author upon reasonable request.

Authors' Contributions

SL, YM, and YJ contributed to the conceptualization and design of the study. SL, YM, YJ, YP, TY, and YX collected the data. SL and YM analyzed the data. SL, YM, and YJ drafted the manuscript. WL, YX, JZ, LL, and HZ extensively revised the manuscript. All authors read and approved the final manuscript submitted.

Conflicts of Interest

None declared.

Video of the non-mydriatic fundus camera Kestrel-3100m with the self-service module.

Questions for screening service quality.

  1. GBD 2019 Blindness and Vision Impairment Collaborators, Vision Loss Expert Group of the Global Burden of Disease Study. Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. Lancet Glob Health. Feb 2021;9(2):e144-e160.
  2. Marques AP, Ramke J, Cairns J, Butt T, Zhang JH, Muirhead D, et al. Global economic productivity losses from vision impairment and blindness. EClinicalMedicine. May 2021;35:100852.
  3. Jan C, Li S, Kang M, Liu L, Li H, Jin L, et al. Association of visual acuity with educational outcomes: a prospective cohort study. Br J Ophthalmol. Nov 18, 2019;103(11):1666-1671.
  4. Chai YX, Gan ATL, Fenwick EK, Sui AY, Tan BKJ, Quek DQY, et al. Relationship between vision impairment and employment. Br J Ophthalmol. Mar 16, 2023;107(3):361-366.
  5. Nayeni M, Dang A, Mao AJ, Malvankar-Mehta MS. Quality of life of low vision patients: a systematic review and meta-analysis. Can J Ophthalmol. Jun 2021;56(3):151-157.
  6. Wang L, Zhu Z, Scheetz J, He M. Visual impairment and ten-year mortality: the Liwan Eye Study. Eye (Lond). Aug 19, 2021;35(8):2173-2179.
  7. GBD 2019 Blindness and Vision Impairment Collaborators, Vision Loss Expert Group of the Global Burden of Disease Study. Trends in prevalence of blindness and distance and near vision impairment over 30 years: an analysis for the Global Burden of Disease Study. Lancet Glob Health. Feb 2021;9(2):e130-e143.
  8. Cheng C, Wang N, Wong TY, Congdon N, He M, Wang YX, et al. Vision Loss Expert Group of the Global Burden of Disease Study. Prevalence and causes of vision loss in East Asia in 2015: magnitude, temporal trends and projections. Br J Ophthalmol. May 28, 2020;104(5):616-622.
  9. Li JO, Liu H, Ting DS, Jeon S, Chan RP, Kim JE, et al. Digital technology, tele-medicine and artificial intelligence in ophthalmology: a global perspective. Prog Retin Eye Res. May 2021;82:100900.
  10. Ting DSW, Pasquale LR, Peng L, Campbell JP, Lee AY, Raman R, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol. Feb 25, 2019;103(2):167-175.
  11. Xie Y, Nguyen QD, Hamzah H, Lim G, Bellemo V, Gunasekeran DV, et al. Artificial intelligence for teleophthalmology-based diabetic retinopathy screening in a national programme: an economic analysis modelling study. Lancet Digit Health. May 2020;2(5):e240-e249.
  12. Tang J, Liang Y, O'Neill C, Kee F, Jiang J, Congdon N. Cost-effectiveness and cost-utility of population-based glaucoma screening in China: a decision-analytic Markov model. Lancet Glob Health. Jul 2019;7(7):e968-e978.
  13. Xiao X, Xue L, Ye L, Li H, He Y. Health care cost and benefits of artificial intelligence-assisted population-based glaucoma screening for the elderly in remote areas of China: a cost-offset analysis. BMC Public Health. Jun 04, 2021;21(1):1065.
  14. Generating Evidence for Artificial Intelligence Based Medical Devices: A Framework for Training Validation and Evaluation. World Health Organization. URL: https://www.who.int/publications/i/item/9789240038462 [accessed 2024-03-27]
  15. The Assessment List for Trustworthy Artificial Intelligence. URL: https://altai.insight-centre.org/ [accessed 2024-03-27]
  16. Dietvorst BJ, Bharti S. People reject algorithms in uncertain decision domains because they have diminishing sensitivity to forecasting error. Psychol Sci. Oct 11, 2020;31(10):1302-1314.
  17. DeCamp M, Tilburt JC. Why we cannot trust artificial intelligence in medicine. Lancet Digit Health. Dec 2019;1(8):e390.
  18. Frank D, Elbæk CT, Børsting CK, Mitkidis P, Otterbring T, Borau S. Drivers and social implications of artificial intelligence adoption in healthcare during the COVID-19 pandemic. PLoS One. Nov 22, 2021;16(11):e0259928.
  19. Juravle G, Boudouraki A, Terziyska M, Rezlescu C. Trust in artificial intelligence for medical diagnoses. Prog Brain Res. 2020;253:263-282.
  20. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. Oct 2019;1(6):e271-e297.
  21. Scanlon PH, Foy C, Malhotra R, Aldington SJ. The influence of age, duration of diabetes, cataract, and pupil size on image quality in digital photographic retinal screening. Diabetes Care. Oct 2005;28(10):2448-2453.
  22. Cen L, Ji J, Lin J, Ju S, Lin H, Li T, et al. Automatic detection of 39 fundus diseases and conditions in retinal photographs using deep neural networks. Nat Commun. Aug 10, 2021;12(1):4828.
  23. Longoni C, Bonezzi A, Morewedge C. Resistance to medical artificial intelligence. J Consum Res. 2019;46:650.
  24. Huang AS, Hirabayashi K, Barna L, Parikh D, Pasquale LR. Assessment of a Large Language Model's Responses to Questions and Cases About Glaucoma and Retina Management. JAMA Ophthalmol. Feb 22, 2024.
  25. Ploug T, Sundby A, Moeslund TB, Holm S. Population preferences for performance and explainability of artificial intelligence in health care: choice-based conjoint survey. J Med Internet Res. Dec 13, 2021;23(12):e26611.
  26. Young AT, Amara D, Bhattacharya A, Wei ML. Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digit Health. Sep 2021;3(9):e599-e611.
  27. Alami H, Gagnon M, Fortin J. Digital health and the challenge of health systems transformation. Mhealth. Aug 08, 2017;3:31-31.
  28. Ricciardi W, Pita Barros P, Bourek A, Brouwer W, Kelsey T, Lehtonen L, et al. Expert Panel on Effective Ways of Investing in Health (EXPH). How to govern the digital transformation of health services. Eur J Public Health. Oct 01, 2019;29(Supplement_3):7-12.
  29. Khan WU, Shachak A, Seto E. Understanding decision-making in the adoption of digital health technology: the role of behavioral economics' prospect theory. J Med Internet Res. Feb 07, 2022;24(2):e32714.

Edited by A Mavragani; submitted 06.01.23; peer-reviewed by B Li, A Bate, CW Pan; comments to author 13.09.23; revised version received 15.10.23; accepted 12.03.24; published 17.04.24.

©Senlin Lin, Yingyan Ma, Yanwei Jiang, Wenwen Li, Yajun Peng, Tao Yu, Yi Xu, Jianfeng Zhu, Lina Lu, Haidong Zou. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 17.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
