Statistical Techniques: Measurement System Analysis

  • ramosstarnesprojec
  • May 18
  • 6 min read

Hello everyone, and welcome back to MedTech Compliance Chronicles! This week, we are continuing to build on some of the statistical techniques we have been discussing and exploring a fundamental pillar of statistical quality: Measurement System Analysis (MSA). Whether you're tracking process capability, verifying product specifications, or making go/no-go decisions on the production floor, your conclusions are only as good as the measurements behind them. Nearly every decision we make as quality and regulatory professionals hinges on the reliability of the measurements we take. But how can we be truly confident that our measurements accurately reflect reality, and are not simply adding noise or masking critical issues? 


MSA provides the formal evaluation of a measurement system’s ability to produce consistent, accurate, and trustworthy data. In the medical device industry, where precision matters not just for operational efficiency but for patient safety and regulatory compliance, MSA is not just a best practice; it’s an expectation. In this post, we’ll explore how MSA is performed, its critical role in the medical device lifecycle, and why it remains one of the most defensible ways to ensure data integrity.



Performing MSA


Measurement System Analysis is the structured process of assessing whether the variation in your data stems from the parts or processes themselves—or from your measurement system. A complete MSA evaluates the full measurement setup: the instrument, operator, method, environment, and the parts or products being evaluated. The goal is to confirm that the measurement system introduces minimal error and is capable of supporting reliable decision-making, particularly for compliance-critical activities like inspection, validation, and release.


For variable data, where measurements are continuous, the most rigorous and widely used technique is the Gage Repeatability and Reproducibility (Gage R&R) study, typically analyzed using the ANOVA (Analysis of Variance) method. (There is an older method based on averages and ranges, but ANOVA is more robust and, with modern software, just as easy to carry out.) To conduct an ANOVA Gage R&R study, start by selecting a representative set of parts (typically around 10) that cover the expected range of process variation. At least three appraisers (operators) should each measure every part multiple times (typically two to five) in a randomized order to minimize bias. It is important to take all measurements in random order and to completely ‘reset’ (zero) the measurement system between measurements. For example, the operator selects part #4, takes one measurement, and then selects another part for measurement. They DO NOT take all measurements of part #4 back-to-back and then select another part.
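To make the randomization concrete, here is a minimal Python sketch of how a run sheet could be generated. The part count, appraiser names, and trial count are hypothetical placeholders, and the fixed seed exists only so the sheet can be reproduced.

```python
import random

random.seed(7)  # fixed seed only so the run sheet is reproducible

PARTS = list(range(1, 11))    # 10 representative parts
APPRAISERS = ["A", "B", "C"]  # 3 operators
TRIALS = 3                    # measurements per part per operator

# For each appraiser and trial, present the parts in a fresh random order
# so the operator cannot anticipate which part comes next and parts are
# not measured back-to-back by design.
run_sheet = []
for appraiser in APPRAISERS:
    for trial in range(1, TRIALS + 1):
        order = PARTS[:]
        random.shuffle(order)
        run_sheet.extend((appraiser, trial, part) for part in order)

for i, (appraiser, trial, part) in enumerate(run_sheet, start=1):
    print(f"Run {i:3d}: appraiser {appraiser}, trial {trial}, part {part}")
```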


Once the data is collected, ANOVA is used to determine the contribution of three main sources of variation: the parts themselves (part-to-part variation), the repeatability of the measurement device (equipment variation), and the reproducibility among the operators (appraiser variation), including any interaction between operator and part. The output of the ANOVA is a set of variance components that quantify the impact of each source, often expressed as a percentage of the total study variation or a percentage of the tolerance. The goal of measurement is to detect part-to-part variation; variation contributed by the measuring instrument itself and by the operator performing the measurement is noise that can mask the true conformity of the part being measured. In general, the combined equipment and operator variation (the Gage R&R) should be 10% or less of the total study variation; in some cases, up to 30% may be acceptable. If it exceeds 30%, the measurement system is inadequate, and you need to reevaluate the capability of your equipment or the training and competence of your operators.
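For readers who want to see the arithmetic behind the variance components, here is a minimal NumPy sketch for a balanced crossed study, assuming the readings are stored as a (parts × operators × replicates) array. In a regulated setting you would normally run this in a validated statistical package rather than hand-rolled code.

```python
import numpy as np

def gage_rr_anova(data):
    """Crossed Gage R&R via two-way ANOVA with interaction.

    data: array of shape (parts, operators, replicates), balanced study.
    """
    p, o, r = data.shape
    grand = data.mean()
    part_means = data.mean(axis=(1, 2))  # mean per part
    oper_means = data.mean(axis=(0, 2))  # mean per operator
    cell_means = data.mean(axis=2)       # mean per part/operator cell

    # Sums of squares for each source of variation.
    ss_part = o * r * ((part_means - grand) ** 2).sum()
    ss_oper = p * r * ((oper_means - grand) ** 2).sum()
    ss_int = r * ((cell_means - part_means[:, None]
                   - oper_means[None, :] + grand) ** 2).sum()
    ss_rep = ((data - cell_means[:, :, None]) ** 2).sum()

    # Mean squares.
    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_rep = ss_rep / (p * o * (r - 1))

    # Expected mean squares -> variance components (clamped at zero).
    var_rep = ms_rep                                   # repeatability (EV)
    var_int = max(0.0, (ms_int - ms_rep) / r)          # operator-part interaction
    var_oper = max(0.0, (ms_oper - ms_int) / (p * r))  # reproducibility (AV)
    var_part = max(0.0, (ms_part - ms_int) / (o * r))  # part-to-part (PV)

    var_grr = var_rep + var_int + var_oper
    var_total = var_grr + var_part
    return {
        "repeatability": var_rep,
        "reproducibility": var_oper + var_int,
        "part_to_part": var_part,
        # %GRR as a percentage of total study variation (sigma ratio)
        "pct_grr": 100 * np.sqrt(var_grr / var_total),
    }

# Example: rr = gage_rr_anova(np.array(readings).reshape(10, 3, 3))
```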


When measurements are not numerical but instead rely on categorical outcomes like “pass” or “fail,” Attribute Agreement Analysis is used. The goal is to evaluate three things: how well each appraiser agrees with themselves (repeatability), how well they agree with each other (reproducibility), and how well they agree with a known standard or “gold standard” decision. Similar to Gage R&R, a set of parts is selected, ideally including a mix of clearly conforming parts, clearly non-conforming parts, and parts that are near the specification limits, as these are often where disagreements occur. Multiple appraisers evaluate each part multiple times. The analysis then calculates the percentage of agreement: within each appraiser across trials, between all pairs of appraisers, and between each appraiser's judgments and the standard values. A good attribute system should show high agreement across all three dimensions. Since attribute data does not lend itself to variance calculations, confidence in the measurement system is derived from consistency and correct classification rates. While less quantitative than Gage R&R, attribute studies are just as critical, especially in areas like visual inspections and manual test interpretation.
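As an illustration of the three agreement rates, here is a small Python sketch on made-up pass/fail calls. The ratings and standard values are hypothetical, and a real study would use far more parts and trials (and often adds kappa statistics on top of raw agreement).

```python
# Hypothetical attribute study: each appraiser rates each part twice.
ratings = {
    "A": {1: ["pass", "pass"], 2: ["fail", "fail"], 3: ["pass", "fail"]},
    "B": {1: ["pass", "pass"], 2: ["fail", "fail"], 3: ["fail", "fail"]},
}
standard = {1: "pass", 2: "fail", 3: "fail"}  # known "gold standard" calls

def within_appraiser(calls):
    # Share of parts where the appraiser agreed with themself on every trial.
    return sum(len(set(c)) == 1 for c in calls.values()) / len(calls)

def vs_standard(calls, std):
    # Share of parts where every trial matched the gold standard.
    return sum(set(c) == {std[p]} for p, c in calls.items()) / len(calls)

def between_appraisers(all_calls):
    # Share of parts where all appraisers made identical calls on all trials.
    parts = next(iter(all_calls.values()))
    agree = sum(
        len({call for calls in all_calls.values() for call in calls[p]}) == 1
        for p in parts
    )
    return agree / len(parts)

for name, calls in ratings.items():
    print(f"Appraiser {name}: within={within_appraiser(calls):.0%}, "
          f"vs standard={vs_standard(calls, standard):.0%}")
print(f"All appraisers agree: {between_appraisers(ratings):.0%}")
```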


Beyond repeatability and reproducibility, a complete understanding of a measurement system's capability also requires assessing its accuracy over its operating range and its performance over time. Linearity studies are conducted to evaluate if the bias of the measurement system remains consistent across the expected range of measurements. This is typically done by measuring several parts with known reference values that span the operating range and plotting the bias against the reference value to identify any trend. Stability, on the other hand, examines whether the measurement system's bias changes over time. This is often assessed by repeatedly measuring a single master part or a stable artifact over an extended period and monitoring the measurements for any significant drift. 
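A linearity check ultimately reduces to regressing bias against reference value. The sketch below shows the idea with invented numbers; a slope near zero suggests the bias is roughly constant across the operating range.

```python
import numpy as np

# Hypothetical linearity study: master parts with known reference values
# spanning the operating range, and the mean of repeat readings on each.
reference = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
measured = np.array([2.03, 4.02, 6.05, 8.08, 10.11])
bias = measured - reference

# Fit bias = slope * reference + intercept. A slope near zero means the
# bias is stable across the range; a clearly nonzero slope means the
# system reads progressively high or low as the measured value grows.
slope, intercept = np.polyfit(reference, bias, 1)
print(f"bias ≈ {slope:.4f} * reference + {intercept:.4f}")
```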


A concept related to stability, sometimes considered part of long-term stability, is end-of-period reliability, which involves re-evaluating the measurement system's performance after a significant period of use or exposure to varying conditions. These studies require a substantial amount of historical data, but they can be used to justify adjusting calibration intervals as the behavior of the instrument toward the end of its calibration interval becomes clearer.


MSA in Medical Devices


Measurement System Analysis plays a particularly critical role in the medical device industry, where the consequences of poor data can go beyond process inefficiencies and lead to patient harm, regulatory noncompliance, and costly product recalls. Because of this, both ISO 13485:2016 and global regulatory requirements emphasize the need for validated inspection, measuring, and test equipment. MSA provides the evidence that a measurement system is suitable for its intended use and capable of producing results accurate and precise enough to support the quality requirements of a device.


Measurement systems come into play at every stage of the product lifecycle. During design verification, for example, test methods used to verify design inputs must not introduce more variation than the design tolerances can accommodate. If you are using a peel tester to verify sterile barrier strength, a Gage R&R study ensures that measured differences reflect true differences in material strength, not operator handling or equipment noise. Incomplete or inadequate MSA during this phase can compromise the credibility of design verification data submitted in support of regulatory filings like 510(k)s or CE marking applications.


In manufacturing, MSA becomes foundational to ongoing process monitoring and control. Whether you're measuring fill volume in syringes, performing torque tests on orthopedic devices, or conducting visual inspections of surface defects, your ability to claim process stability and capability depends on a reliable measurement system. A process capability index like Cp or Cpk is meaningless if the underlying measurements are not trustworthy. The FDA’s process validation guidance and ISO 13485:2016’s requirements for control of monitoring and measuring equipment both require that organizations verify the adequacy of measurement systems, and MSA is one of the most direct and defensible ways to do so.
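To see why, note that the variation you observe is the sum of true process variation and measurement variation. The short sketch below, which assumes %GRR is expressed against total study variation, shows how measurement error deflates an observed Cp.

```python
import math

def observed_cp(true_cp, pct_grr):
    """Deflation of observed Cp by measurement error.

    Assumes pct_grr is Gage R&R as a percentage of total study variation
    (a sigma ratio). Since observed variance = process variance +
    measurement variance, sigma_process = sigma_total * sqrt(1 - g**2),
    and the observed Cp shrinks by that same factor.
    """
    g = pct_grr / 100
    return true_cp * math.sqrt(1 - g ** 2)

for grr in (10, 30, 50):
    print(f"%GRR = {grr:2d}%: a true Cp of 1.67 is observed as "
          f"{observed_cp(1.67, grr):.2f}")
```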


In nonconformance investigations, historical data often serves as the basis for determining the scope or timing of a problem. If your measurement system hasn't been validated, conclusions drawn from this data can be called into question. MSA strengthens internal investigations by ensuring that the data used to identify trends, determine root cause, or justify containment is statistically valid. In this sense, MSA supports not just detection but accountability and closure within a quality system.


Finally, as more medical device manufacturers move toward automation and in-line inspection systems, MSA practices are evolving as well. Automated vision systems, digital sensors, and software-driven test equipment still require evaluation for repeatability, reproducibility, and bias, often using digital log files and advanced statistical tools. The principles of MSA remain unchanged, but their implementation is adapted to modern technology and documentation practices.


Conclusion


Neglecting MSA is like putting a blindfold on the business. Unreliable measurements can hide real problems, send teams chasing non-existent ones, drive inefficient processes, and compromise quality. By investing the time and effort to conduct thorough MSA studies and act on their findings, organizations can gain true confidence in their data. That confidence translates directly into better decision-making, more effective problem-solving, optimized processes, and ultimately, the consistent delivery of quality products and services.
