Recent papers with perspectives on CORE translations (1) Di Biase et al., 2021 : Clinical Outcomes in Routine Evaluation (and CST)

Created 15/9/21.

For several years now I have been wanting to write something about translating CORE self-report measures (and any psychological questionnaire). I wanted to write something less formal than an academic paper, something more readable for clinicians and perhaps people just interested in questionnaires and in languages. These posts might give birth to a formal paper later, but at last I can start. I can start because the latest in a set of academic papers has now been published, so I know these posts will sit on peer-reviewed, empirical work. The crucial paper that came out (today I think) is:
Di Biase, R., Evans, C., Rebecchi, D., Baccari, F., Saltini, A., Bravi, E., Palmieri, G., & Starace, F. (2021). Exploration of psychometric properties of the Italian version of the Core Young Person’s Clinical Outcomes in Routine Evaluation (YP-CORE). Research in Psychotherapy: Psychopathology, Process and Outcome, 24(2). https://doi.org/10.4081/ripppo.2021.554
I think that DOI should take you to the paper, which is open access so anyone can read it for free.

As the title says, this paper is about the Italian translation of the 10 item YP-CORE designed for young people, teenagers. The original measure was created in English and this is one of eight translations of it so far. I’m starting this little series of blog posts with this paper because in some ways it’s a classical, quantitative psychometric paper. Bear with me as the next section is heavy on quotes from the paper and may feel as if I’m utterly failing to get out of the formal academic paper mode. I hope the interpolated comments explain why I think this was a very good, careful, honest paper. If you only skim the quotes I think you’ll get the gist of it.

The context was that the enormous work of Dr. Rosalba Di Biase, and that of many of her colleagues, supported the collection of a non-help-seeking sample of 206 11 year school students and a separate sample of 175 young people seeking help. Post-intervention scores were available for 74 of the 175 help-seeking young people. That data allowed us to look at internal reliability, the difference between the mean of the help-seeking and the non-help-seeking samples, sensitivity to change and to do a factor analysis of the baseline item data.

However, one reason I like this paper is that we don’t ignore that language and translations are qualitative issues as well as quantitative. I think we give a good quick summary of the intensive work that created the translation:

The Italian YP-CORE translation was conducted in line with current recommendations (Prakash, Shah, & Hariohm, 2019) and to the specific protocol required the CORE System Trust (CST, 2015; https://www.coresystemtrust.org.uk/translations/) using a mixed translation procedure of forward and back translation, group review and field testing (see Yassin & Evans, 2021 for a complete account of this process). For the Italian YP-CORE independent forward translations were produced by nine mental health professionals (five female and four male). An independent back translation was than produced by a professional bilingual translator. The translations obtained were compared and the final version was reviewed by a member of CORE System Trust (CE). The quality of the translation was also verified through a group of ten Italian-speaking, teenagers aged 14-17 (four female and six male all of medium socioeconomic background). They were asked to rate each item’s comprehensibility a three level scale. Eight of the adolescents rated all the items as easy to read and comprehensible (score 3); one adolescent rated item 5 as unclear (score 2) and one more rated it as not at all clear (score 1). This led to small changes to item 5 to align the language more closely to that of adolescents.
Di Biase et al., 2021, p. 232.

And later in the paper:

Internal consistency, based on Cronbach’s alpha, showed adequate to good inter-item reliability. However, internal consistency in the non-clinical sample is slightly lower than reported by Twigg et al. (2016). This is largely related to item 5 (‘There’s been someone I felt able to ask for help’ in English). Translation of this item into Italian was not easy and several versions of it were explored but the chosen translation, ‘Mi sono sentito di chiedere aiuto a qualcuno quando ne ho avuto bisogno’, was considered complicated by some younger adolescents and only eventually chosen as a ‘least problematical’ translation when it was accepted that there seemed to be no perfect translations that would work across the entire age range. Omitting this item would produce a fractionally higher Cronbach alpha but it was retained to maintain comparability of domain coverage with the YP-CORE in other languages.
Di Biase et al., 2021, pp. 235-236.

My Italian runs to a few polite words and just about enough to order drinks and simple food, but I loved sitting in on some of the translation process and being asked, for a few items, why the item is worded the way it is in English. An interesting question and answer, or more usually questions from the translating team and more questions back from me, led to serious work on alternative ways to say things in Italian. The first quote above describes the subsequent qualitative work, in a different region, which honed that item. We address this item in the discussion:

This suggests that YP-CORE is an acceptable tool for young people with age appropriate wording and able to be completed quickly and easily. Internal consistency, based on Cronbach’s alpha, showed adequate to good inter-item reliability. However, internal consistency in the non-clinical sample is slightly lower than reported by Twigg et al. (2016). This is largely related to item 5 (‘There’s been someone I felt able to ask for help’ in English). Translation of this item into Italian was not easy and several versions of it were explored but the chosen translation, ‘Mi sono sentito di chiedere aiuto a qualcuno quando ne ho avuto bisogno’, was considered complicated by some younger adolescents and only eventually chosen as a ‘least problematical’ translation when it was accepted that there seemed to be no perfect translations that would work across the entire age range. Omitting this item would produce a fractionally higher Cronbach alpha but it was retained to maintain comparability of domain coverage with the YP-CORE in other languages.
Di Biase et al., 2021, p. 236.
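
For readers who like to see the arithmetic, here is a minimal Python sketch of the "alpha if item omitted" comparison described in that quote. It is not the analysis code from the paper. The data are invented, the 0 to 4 item scoring is an assumption, and "item 5" is simulated as pure noise, which is far more extreme than anything we saw in the Italian data. The names `cronbach_alpha`, `alpha_if_item_dropped` and `clip` are mine, chosen for illustration.

```python
import random
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def alpha_if_item_dropped(rows, index):
    """Alpha recomputed after removing the item at position `index`."""
    return cronbach_alpha([row[:index] + row[index + 1:] for row in rows])

def clip(score):
    """Round to a whole number and keep it on the assumed 0 to 4 item scale."""
    return max(0, min(4, round(score)))

# Invented data, NOT the study data: 300 respondents, ten items.
random.seed(2021)
rows = []
for _ in range(300):
    distress = random.gauss(2.0, 1.0)        # hypothetical latent level
    row = [clip(distress + random.gauss(0.0, 0.7)) for _ in range(10)]
    row[4] = random.randint(0, 4)            # "item 5" simulated as pure noise
    rows.append(row)

alpha_all = cronbach_alpha(rows)
alpha_without_5 = alpha_if_item_dropped(rows, 4)
print(f"alpha, all ten items:  {alpha_all:.3f}")
print(f"alpha, item 5 omitted: {alpha_without_5:.3f}")
```

In this deliberately exaggerated example, omitting the noisy item raises alpha. In the real data the gain was only fractional, which is why keeping the item for the sake of domain coverage was an easy call.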

Then I am proud that we were honest about things and never tried to pretend things were cleaner than they were:

This was an unfunded study so the sampling frames were convenience frames based on the locations of the authors but chosen to cover the full age range of 11 to 17.
Di Biase et al., 2021, p. 232.

… and …

The unfunded nature for the research meant that the non-clinical samples are opportunistic not rigorous random samples of the population.
Di Biase et al., 2021, p. 233.

However, we also get pretty technical:

Inter-item correlations across the n=376 full baseline sample with complete data ranged from 0.04 to 0.63, and 41 out of 45 correlations were statistically significant. The one-factor solution showed a good fit for the YP-CORE, raw χ2(35)=33.9, P=0.52; CFI=1.00; TLI=1.00, RMSEA <0.001, 90% CI for RMSEA = 0.00-0.036. Standardized factor loadings are reported in Table 5. Despite the excellent fit, adding a second factor with the negatively cued items loading on the first factor and the negatively cued items loading on the second factor showed an even better fit: raw χ2(34)=22.8, P=0.93; CFI=1.00; TLI=1.01, RMSEA<0.001, 90% CI for RMSEA = 0.00-0.012 and the difference was statistically significant robust raw χ2(1)=14.2, P=0.0002. Both one and two factor solutions are shown in Table 6 and Figure 6.
Di Biase et al., 2021, p. 235.
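
If you want to check how two of those numbers hang together, the short Python sketch below does it using only the standard library. It takes the reported robust difference statistic (14.2 on 1 degree of freedom) as given and converts it to a P value, then applies one common textbook formula for the RMSEA point estimate. Both functions are my own illustrative helpers, not the software used for the paper, and the RMSEA formula is an assumption: the paper used DWLS estimation with robust corrections, so the software's exact computation may differ. Note that the robust difference test is not the simple subtraction 33.9 - 22.8 = 11.1.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def rmsea(chi2, df, n):
    """A common RMSEA point estimate; it is zero whenever chi2 <= df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Values taken from the quote above.
p_difference = chi2_sf_1df(14.2)
rmsea_one_factor = rmsea(33.9, 35, 376)
rmsea_two_factor = rmsea(22.8, 34, 376)

# A hypothetical poorly fitting model, for contrast only (not from the paper).
rmsea_hypothetical = rmsea(70.0, 35, 376)

print(f"P for the difference test: {p_difference:.5f}")
print(f"RMSEA, one factor: {rmsea_one_factor:.3f}")
print(f"RMSEA, two factors: {rmsea_two_factor:.3f}")
print(f"RMSEA, hypothetical chi2 = 70: {rmsea_hypothetical:.3f}")
```

Because both chi-square values are smaller than their degrees of freedom, this formula gives an RMSEA of zero for both models, which is consistent with the "<0.001" reported.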

… and pick that up in the discussion …

The Confirmatory Factor Analysis showed perhaps surprisingly good fit to both the one-factor model proposed by O’Reilly et al. (2016) and the two-factor solution found by Twigg et al. (2009). The strong fit reflects the use of the relatively recently developed diagonally weighted least squares (DWLS) estimation instead of the traditional maximum likelihood (ML) estimation. ML estimation is sensitive to deviations from Gaussian distributions which are inevitable with short ordinal response scales. That the two factor model fits statistically significantly better than the one factor model is interesting. This can be interpreted as a method factor (or response set: some people answering more positively to positively cued items than negatively ones despite sharing the same general well-being as those less affected by the cueing). Alternatively, it can be interpreted as reflecting what is often thought to be genuine, if small, multidimensionality of psychological states and traits in which positive and negative aspects are strongly correlated (here R=0.71) but not identical. The debate between the two interpretations has waged inconclusively for many years, see e.g. (Carmines & Zeller, 1976) for an early example arguing the effect is probably a response set versus most of the literature about the positive and negative affect scales (PANAS) for the opposite position (Watson, Clark, & Tellegen, 1988). We are agnostic about these positions but believe that the 0.71 correlation, and the pragmatic and comparative utility of staying with a single score, justify retaining a single score across all items for the Italian YP-CORE.
Di Biase et al., 2021, pp. 236-237.

That seems to me to avoid the idealisation and reification, deification perhaps, of factor analytic findings in many psychometric papers.

Let’s be less academic, scholastic …

I think this paper addresses what matters about a translation of a measure:

There are no perfect translations. Translation is a very complex, entirely qualitative process.
You can maximise its likely capacity to keep the meaning of the original with a thorough translation process or protocol (more on that in the next blog post) …
… but there are no perfect translation protocols that can guarantee the best possible translation (but the old forward translation and independent back-translation process is terrible!)
That process should include what I call “qualitative field testing”: it is lay people most typical of the intended users you must involve, even if they don’t speak a word of the original language of the measure, as theirs may not be the language of bilingual people, experts and highly educated people.
You may find despite all this that you face a choice: drop an item, which might improve a quantitative psychometric value from the data (here Cronbach’s alpha for baseline internal reliability), or keep it and so preserve domain coverage and comparability with the original and other translations. Fortunately for us, the change in alpha was sufficiently small that this was not a difficult choice.
You may find that factor analysis suggests, as it clearly does here, that you don’t have a unidimensional measure. However, I feel we again had a fairly easy decision: to recommend that anyone using the Italian YP-CORE stay with a single score. The dimensionally purer alternative, one score from the positive items and another from the negative items, leaves you with reduced internal reliability on both (markedly for the positive items), and the two dimensions correlate .71 here.
You really do want sufficient baseline internal reliability to suggest that, though not unidimensional, there is a lot of shared variance: the ten items are not inappropriately put together. Again, this is partly a qualitative issue: do therapists and clients think they are sensible items for the purpose? (Not covered in the paper but they do!)
You also want the measure, as it’s to be used as a measure of change as well as for comparisons between individuals and between groups of individuals, to show sensitivity to change: it did.
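
To make "sensitivity to change" a little more concrete, here is a small Python sketch of one way to summarise it from paired baseline and post-intervention scores. The scores are invented, the 0 to 40 total score range is an assumption, and the summary statistics shown (mean change, change divided by the baseline standard deviation, and a paired t statistic) are one reasonable choice among several. I am not claiming they are exactly what the paper reports.

```python
from math import sqrt
from statistics import mean, stdev

# Invented paired totals (assumed 0 to 40 range), NOT the study data.
pre = [22, 18, 25, 30, 15, 20, 27, 19]
post = [15, 16, 18, 21, 14, 12, 20, 17]

# Positive change means the score fell, i.e. less reported distress.
changes = [before - after for before, after in zip(pre, post)]

mean_change = mean(changes)
# Standardised change: mean change relative to the spread of baseline scores.
effect_size = mean_change / stdev(pre)
# Paired t statistic: mean change relative to its standard error.
t_statistic = mean_change / (stdev(changes) / sqrt(len(changes)))

print(f"mean change: {mean_change:.3f}")
print(f"effect size (baseline SD units): {effect_size:.2f}")
print(f"paired t statistic: {t_statistic:.2f}")
```

A measure that could not detect a shift like this would be of little use for tracking therapy, however tidy its baseline psychometrics looked.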

I love this paper and wish more psychometric explorations of therapy change measures took a similarly careful, honest and fundamentally pragmatic, not scholastic, almost numerological, approach to our tools and particularly to translating them. Huge thanks to Rosalba, Dr. Di Biase to go back to formal, to all her colleagues and to the many young people who made all this possible.