What Is the Agreement Method
Schlüter PJ. A multivariate hierarchical Bayesian approach to measuring agreement in repeated measurement method comparison studies. BMC Med Res Methodol. 2009;9:6.
Atkinson G, Nevill A. Comment on the use of concordance correlation to assess the agreement between two variables. Biometrics. 1997;53:775–7.
Barnhart HX, Haber M, Kosinski AS. Assessing individual agreement. J Biopharm Stat. 2007;17:697–719.
Agreement studies examine how close the readings are when different devices or observers measure the same quantity.
If the values produced by each device are close to each other most of the time, we conclude that the devices agree. The literature describes several agreement methods within the linear mixed-effects modelling framework for measurements repeated over time within subjects. If the assumptions described above do not hold, nonparametric methods should be considered. For example, Perez-Jaume and Carrasco proposed a nonparametric alternative for estimating the total deviation index (TDI) [30]. Their method is more stable and reliable than the parametric method when the data are skewed. It is also relatively easy to compute and less affected by outliers or extreme values than the parametric approach. The method simply takes the p-th quantile of the ordered absolute paired differences as the TDI, where p is the chosen coverage proportion. A bootstrap can then be used to obtain the upper confidence bound. Patients are resampled so that all of a patient's measurements stay together, and the TDI is recalculated for each bootstrap sample. This resembles the nonparametric percentile approach first described by Bland and Altman [5], except that with repeated measurements bootstrap resampling is used to obtain the upper bound. Although the method does not assume normality, we must still assume that the paired differences are independent and identically distributed. Other nonparametric methods are available [31, 32]. Stevens [33] also developed a method-of-moments generalization of the probability of agreement that requires no distributional assumption for the true values. Fully Bayesian versions of the limits of agreement method have also been proposed, for example Schlüter's Bayesian agreement method [34].
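A minimal Python sketch of the nonparametric TDI with a patient-level bootstrap may help. It assumes NumPy, one paired difference per row, and a subject identifier for each row. The following choices are illustrative assumptions, not necessarily those made by Perez-Jaume and Carrasco [30]:

- the function names;
- the default coverage proportion p = 0.9;
- the linear interpolation used by `np.quantile`;
- the percentile-type upper bound.

```python
import numpy as np


def tdi_nonparametric(diffs, p=0.9):
    """Empirical TDI: the p-th quantile of the absolute paired differences."""
    abs_d = np.abs(np.asarray(diffs, dtype=float))
    # np.quantile interpolates linearly by default (an assumed choice).
    return float(np.quantile(abs_d, p))


def tdi_upper_bound(subject_ids, diffs, p=0.9, n_boot=2000, alpha=0.05, seed=0):
    """One-sided (1 - alpha) bootstrap upper bound for the TDI.

    Whole subjects are resampled with replacement, so all paired
    differences from one subject stay together in each bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    ids = np.asarray(subject_ids)
    d = np.asarray(diffs, dtype=float)
    subjects = np.unique(ids)
    by_subject = [d[ids == s] for s in subjects]
    boot = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.integers(0, len(subjects), size=len(subjects))
        sample = np.concatenate([by_subject[i] for i in picked])
        boot[b] = tdi_nonparametric(sample, p)
    # Percentile-type upper bound (assumed; other bootstrap intervals exist).
    return float(np.quantile(boot, 1 - alpha))


# Usage with simulated, hypothetical skewed differences: 20 subjects x 5 pairs.
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(20), 5)
d = rng.gamma(shape=2.0, scale=1.0, size=ids.size) - 2.0
print(tdi_nonparametric(d), tdi_upper_bound(ids, d))
```

Resampling subjects rather than individual rows keeps the within-subject correlation intact in every bootstrap sample. This is the point of doing the bootstrap at the patient level.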
In addition, Barnhart [12] and Barnhart et al. [11] proposed an interesting method that uses generalized estimating equations to obtain a nonparametric estimate of the coverage probability (CP). More recently, Jang et al. [35] proposed a new set of agreement indices for settings with multiple raters and heterogeneous variances. The literature contains a plethora of methods for assessing agreement of continuous measurements, and they differ in complexity and underlying assumptions. In this article, we applied five different methods to the same agreement problem with clustered (repeated-measures), unbalanced data. Some are well known and frequently used; others reflect recent advances in agreement research. Note that the total variance includes two parts. The first is the variance due to the random part, \( \sigma_{\alpha}^2+\sigma_{\gamma}^2+\sigma_{\alpha\gamma}^2+\sigma_{\alpha\beta}^2+\sigma_{\beta\gamma}^2+\sigma_{\varepsilon}^2 \). The second is the variance due to the fixed factor (device), \( \phi_{\beta}^2=\sum_{j=1}^{2}\beta_j^2 \), which accounts for the systematic differences between the two devices. If the latter term is omitted, what is measured is the consistency between the devices rather than their agreement.
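The following Python sketch makes the distinction between agreement and consistency concrete by adding up the variance components named above. Several parts are assumptions:

- the class and function names are hypothetical;
- in practice the component values would come from a fitted linear mixed model;
- reading α as subject and γ as time is our assumption, since the text only identifies β as the fixed device factor.

```python
import math
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Estimated random-effect variances (hypothetical container).

    Reading alpha as subject and gamma as time is an assumption; the text
    only identifies beta as the fixed device factor.
    """
    s2_alpha: float
    s2_gamma: float
    s2_alpha_gamma: float
    s2_alpha_beta: float
    s2_beta_gamma: float
    s2_eps: float

    def random_part(self) -> float:
        return (self.s2_alpha + self.s2_gamma + self.s2_alpha_gamma
                + self.s2_alpha_beta + self.s2_beta_gamma + self.s2_eps)


def phi2_beta(device_effects):
    """phi_beta^2 = sum over the devices of beta_j^2 (fixed device effects)."""
    return sum(b * b for b in device_effects)


def total_variance(vc, device_effects, agreement=True):
    """Agreement version includes phi_beta^2; omitting it gives consistency."""
    fixed = phi2_beta(device_effects) if agreement else 0.0
    return vc.random_part() + fixed


# Hypothetical component values, e.g. taken from a fitted mixed model.
vc = VarianceComponents(4.0, 0.5, 0.3, 0.2, 0.1, 1.0)
beta = [0.6, -0.6]  # two hypothetical device effects
v_agree = total_variance(vc, beta)
v_consist = total_variance(vc, beta, agreement=False)
print(v_agree, v_consist, math.sqrt(v_agree))
```

The two totals differ only by \( \phi_{\beta}^2 \). Any systematic difference between the devices therefore inflates the agreement variance but leaves the consistency variance unchanged.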
The total variance is then \( \sigma_{\alpha}^2+\phi_{\beta}^2+\sigma_{\gamma}^2+\sigma_{\alpha\gamma}^2+\sigma_{\alpha\beta}^2+\sigma_{\beta\gamma}^2+\sigma_{\varepsilon}^2 \). Its square root gives an estimate of the standard deviation to be used in the conventional Bland-Altman formula for the limits of agreement. The concordance correlation coefficient (CCC) was estimated to be 0.68 (95% CI 0.60 to 0.72). All confidence intervals were obtained by a bootstrap procedure with resampling at the subject level. The CCC is positive and its confidence interval excludes zero and negative values, indicating some agreement between the recorder and the reference device. A CCC of 0.68 may represent acceptable agreement, but researchers should first agree on the CCC value required to conclude that the devices can be used interchangeably. This CCC is not much different from one that ignores the repeated-measures structure of the data, but, as expected, the 95% confidence intervals differ considerably. Although the CCC is not a graphical method, plots can complement the numerical results. For example, the observations from one device can be plotted against those from the other, with the line of perfect agreement (intercept 0, slope 1) superimposed (see Fig. 1). Alternatively, a Bland-Altman plot of the differences between the devices against their mean can be used (Fig. 2).
RAP is supported in this work by NHS Lothian through the Edinburgh Clinical Trials Unit. We are grateful to Professor Michael Haber (Emory University) for providing example R and SAS code from one of his publications [27]. In an earlier version of this article, we modified and adapted that code to apply the coefficient of individual agreement method to our COPD example. The five statistical approaches described above are summarized in Table 2. Further statistical details on these methods, additional diagnostic plots, and the R code used to produce the results are included in the supplementary material.
Haber M, Gao J, Barnhart HX. Evaluation of agreement between measurement methods from data with matched repeated measurements via the coefficient of individual agreement. J Data Sci. 2010;8(3):457.
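The subject-level bootstrap for the CCC can be sketched in Python as follows. For simplicity, the point estimate uses Lin's moment estimator on all paired observations, which ignores the repeated-measures structure; the clustering enters only through resampling whole subjects. This is therefore not the variance-components CCC that produced the 0.68 reported above. The function names, the percentile interval and the simulated data are assumptions made for illustration. A helper for the coordinates of a Bland-Altman plot is included.

```python
import numpy as np


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (moment estimator, 1/n variances)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return float(2 * s_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2))


def ccc_subject_bootstrap_ci(subject_ids, x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the CCC, resampling whole subjects."""
    rng = np.random.default_rng(seed)
    ids = np.asarray(subject_ids)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    subjects = np.unique(ids)
    rows = [np.flatnonzero(ids == s) for s in subjects]  # row indices per subject
    stats = np.empty(n_boot)
    for b in range(n_boot):
        picked = rng.integers(0, len(subjects), size=len(subjects))
        idx = np.concatenate([rows[i] for i in picked])
        stats[b] = lin_ccc(x[idx], y[idx])
    tail = (1 - level) / 2
    lo, hi = np.quantile(stats, [tail, 1 - tail])
    return float(lo), float(hi)


def bland_altman_coords(x, y):
    """Bland-Altman plot points: pairwise means (x-axis) and differences (y-axis)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (x + y) / 2, x - y


# Usage with simulated, hypothetical data: 15 subjects x 4 paired readings.
rng = np.random.default_rng(2)
ids = np.repeat(np.arange(15), 4)
truth = rng.normal(20, 4, size=15)[ids] + rng.normal(0, 1, size=ids.size)
x = truth + rng.normal(0, 1, size=ids.size)
y = truth + 0.5 + rng.normal(0, 1.5, size=ids.size)
print(lin_ccc(x, y), ccc_subject_bootstrap_ci(ids, x, y))
```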
Lin L, Hedayat AS, Sinha B, Yang M. Statistical methods in assessing agreement: models, issues, and tools. J Am Stat Assoc. 2002;97(457):257–70.
When the LoA, TDI or CP methods are applied, a clinically acceptable difference (CAD) must be specified. This is a context-dependent decision. It should be made by an expert who knows what it means for the devices to be practically equivalent. Whether the differences between devices tend to fall within the CAD depends on both the relative bias between the devices and their precision. If the bias and imprecision are sufficiently small relative to the CAD, the devices can be used interchangeably for practical purposes. The decision matters, because a poorly specified CAD leads to incorrect conclusions about the degree of agreement. The five methods led to similar conclusions about agreement between the devices in the COPD example. However, some methods focused on different aspects of the comparison, and the interpretation was clearer for some methods than for others.
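Once a CAD has been chosen, checking the results against it is straightforward. The Python sketch below does two things:

- it computes the empirical coverage probability, P(|D| < CAD);
- it checks whether given limits of agreement lie inside ±CAD.

The function names, the example differences and the target proportion `p0` are hypothetical. The decision rule is an assumption, not a prescription from this article.

```python
import numpy as np


def coverage_probability(diffs, cad):
    """Empirical CP: proportion of paired differences with |d| < CAD."""
    d = np.abs(np.asarray(diffs, dtype=float))
    return float(np.mean(d < cad))


def loa_within_cad(lower, upper, cad):
    """True if both limits of agreement lie inside the interval [-CAD, CAD]."""
    return -cad <= lower and upper <= cad


def acceptable_by_cp(diffs, cad, p0=0.9):
    """Hypothetical rule: interchangeable if CP >= p0 (p0 chosen by the analyst)."""
    return coverage_probability(diffs, cad) >= p0


# Usage with hypothetical differences and a hypothetical CAD of 2 units.
diffs = [0.5, -1.5, 2.5, -0.2, 1.0, -0.7]
print(coverage_probability(diffs, cad=2.0), acceptable_by_cp(diffs, cad=2.0))
```

The CAD enters every check directly. Changing it can flip the conclusion, which is why its specification has to come from subject-matter expertise rather than from the data.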
Carstensen B, Simpson J, Gurrin LC. Statistical models for assessing agreement in method comparison studies with replicate measurements. Int J Biostat. 2008;4(1):16.
Roy A. An application of linear mixed effects model to assess the agreement between two methods with replicated observations. J Biopharm Stat. 2009;19(1):150–73.
With repeated measurements, applying the standard limits of agreement gives limits that are too narrow. This is because they do not account for the reduction in variability that occurs when working with averages of the repeated measurements.
In this case, a specially adapted version of the limits of agreement must be used, and several such methods exist.
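One such adapted method is Bland and Altman's one-way analysis of variance approach for repeated paired measurements, where the true value may vary within a subject. The Python sketch below follows our understanding of that approach. It estimates the between-subject and within-subject variance components of the differences and adds them before forming the limits. The sketch makes several assumptions:

- there is one paired difference per measurement occasion;
- there are at least two subjects and at least one subject with replicates;
- a negative between-subject estimate is truncated at zero, a common convention rather than something stated here.

The function name is hypothetical, and the formulas should be checked against the original publication before relying on them.

```python
import numpy as np


def loa_repeated(subject_ids, diffs, z=1.96):
    """Limits of agreement for repeated paired measurements (one-way ANOVA sketch).

    diffs holds one difference (device A - device B) per measurement occasion;
    subject_ids says which subject each difference belongs to.
    """
    ids = np.asarray(subject_ids)
    d = np.asarray(diffs, dtype=float)
    groups = [d[ids == s] for s in np.unique(ids)]
    n = len(groups)                                  # number of subjects
    m = np.array([g.size for g in groups], dtype=float)
    total = m.sum()                                  # total number of differences
    if n < 2 or total <= n:
        raise ValueError("need at least two subjects and some replicated pairs")
    mean_d = d.mean()                                # overall bias estimate
    subj_means = np.array([g.mean() for g in groups])
    msb = np.sum(m * (subj_means - mean_d) ** 2) / (n - 1)
    msw = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (total - n)
    # Divisor for unequal numbers of replicates; it equals m when all m_i = m.
    divisor = (total ** 2 - np.sum(m ** 2)) / ((n - 1) * total)
    s2_between = max(0.0, (msb - msw) / divisor)     # truncated at zero
    sd = np.sqrt(s2_between + msw)                   # SD of a single difference
    return float(mean_d - z * sd), float(mean_d + z * sd)


# Usage with hypothetical, unbalanced data: 3 subjects with 3, 2 and 4 pairs.
ids = [1, 1, 1, 2, 2, 3, 3, 3, 3]
diffs = [0.4, 0.9, 0.1, -0.3, 0.2, 1.1, 0.8, 1.4, 0.6]
print(loa_repeated(ids, diffs))
```

Adding the within-subject mean square back to the between-subject component restores the variability of a single difference. A limit built only from subject means would omit that variability and come out too narrow.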