In testing as in life, context is everything

by Katy Spink, Dark Horse COO

Recent events in the battle to contain the global COVID-19 pandemic have me recalling one of the most memorable lessons of my graduate school education, and thinking about parallels in Cell and Gene Therapy product development. In an epidemiology course taught through the medical school, we used an illustrative case to examine why widespread diagnostic testing is not always the right answer. My memory of the "ah-ha" moment I experienced is still crisp, decades later, because the stark differences found in our comparison were far beyond what my young grad-student instinct would have expected.

The case was HIV testing. Using the real test sensitivity and specificity, and the HIV prevalence numbers of the time for each location, we calculated “positive predictive value” (the percent of positive test results that represent “true positives” as opposed to “false positives”). We then compared what would happen if you tested every person who walked through the door at San Francisco General Hospital versus doing the same at a hospital in Salt Lake City. There were, at the time, good diagnostic tests available with impressive sensitivity and specificity percentages, so the difference between those two locations came down to prevalence: how widespread the infections were.

HIV prevalence in San Francisco was high enough at that time that if you tested everyone, the majority of your positive tests would be “true positives” (people correctly identified as being HIV+). In contrast, due to much lower rates of HIV prevalence in Salt Lake City, even with a very accurate test, most positive test results would be “false positives” (individuals who were truly HIV negative, but were mistakenly identified by the test as HIV+). In other words, while blanket HIV testing of all patients would likely be useful to protect hospital staff in San Francisco, it would deplete resources and cause undue stress in Salt Lake City, based only on a pronounced difference in HIV prevalence between those two locations.
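To make the arithmetic behind that comparison concrete, here is a minimal sketch that counts expected true and false positives when everyone in a population is tested. The inputs are assumptions chosen purely for illustration (a test with 99% sensitivity and 99% specificity, and prevalences of 5% and 0.1%); they are not the figures we used in class, and the function name `screening_counts` is hypothetical.

```python
def screening_counts(population, prevalence, sensitivity, specificity):
    """Expected true positives, false positives, and PPV when everyone is tested."""
    infected = population * prevalence
    uninfected = population - infected
    true_pos = infected * sensitivity            # infected people the test catches
    false_pos = uninfected * (1.0 - specificity)  # uninfected people flagged anyway
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv


# Hypothetical inputs for illustration only, not historical data.
scenarios = [("higher-prevalence city", 0.05), ("lower-prevalence city", 0.001)]
for label, prevalence in scenarios:
    tp, fp, ppv = screening_counts(100_000, prevalence, 0.99, 0.99)
    print(f"{label}: {tp:.0f} true positives, {fp:.0f} false positives, PPV = {ppv:.1%}")
```

Under these assumed numbers, the same test yields mostly true positives in the higher-prevalence setting and mostly false positives in the lower-prevalence one, which is the pattern described above.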

Developments in COVID-19 antibody testing over the past couple of weeks have brought the memories of this critical lesson rushing back, and have also got me thinking about analogous situations in my chosen field, cell & gene therapy.

Six weeks ago, Anthony posted in this forum about the importance of supplementing PCR-based tests for active infection with serological tests for SARS-CoV-2 antibodies. Although we cannot be 100% certain that recovering from COVID-19 will confer immunity, at least a limited period of immunity post-recovery seems likely, based on prior experience with similar viruses. So why are so many experts cautioning against implementing widespread antibody testing at this time? And why is FDA increasing regulation on antibody testing at a time when we need to rush more tests to market as fast as possible? I recently dusted off the memory banks of that old graduate school lesson in thinking through these questions for myself.

Last week, FDA released the results of validation testing of the first twelve antibody tests to have received Emergency Use Authorization (EUA) under the new regulations. Although we do not have accurate measures of seropositivity prevalence in the US (and we certainly know that it will vary widely by region), FDA used a hypothetical rate of 5% to calculate theoretical positive predictive value (PPV) and negative predictive value (NPV) for each test, and also provided a link to a calculator that would allow individuals to calculate PPV and NPV themselves, based on input assumptions for test performance and population prevalence. The results were, in my opinion, quite striking.

Assuming that 5% of the population was truly antibody positive (likely a generous estimate for many areas of the country), even among the tests that met the bar for EUA, two had PPVs of approximately 50%. In other words, a positive antibody test using one of those methods had about the same accuracy as a coin toss in determining whether you truly had antibodies to SARS-CoV-2. Admittedly, the NPVs of those two tests are much better (99.6% and 99.7% at 5% prevalence), but given that many are discussing antibody testing as a potential gateway for reentry to society, I would posit that PPV is far more important than NPV in this context. A false positive result, in this case, could potentially result in people reentering society prematurely, putting themselves and others at risk. [Interestingly, for the PCR test for active infection, the opposite would be true, since a false negative would be the result likely to confer greater societal risk.]

So what happens to the PPVs for those tests if the population prevalence for antibody positivity is 20%, as was recently suggested to be the case for New York City? They climb significantly, to 80-85% (which, incidentally, is likely high enough to confer herd immunity in the population that would be released back into society based on a single test with these methods¹).
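For readers who want to check this kind of calculation themselves, here is a minimal sketch of PPV and NPV computed from sensitivity, specificity, and prevalence using Bayes' rule. The 95% sensitivity and 95% specificity below are assumptions, picked only because they give a PPV near 50% at 5% prevalence; they are not the validated figures for any particular authorized test.

```python
def ppv(sensitivity, specificity, prevalence):
    """Fraction of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Fraction of negative results that are true negatives."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Assumed test performance (hypothetical, for illustration only).
SENSITIVITY = 0.95
SPECIFICITY = 0.95

for prevalence in (0.05, 0.20):
    p = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    n = npv(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {p:.1%}, NPV = {n:.1%}")
```

With these assumed inputs, PPV works out to about 50% at 5% prevalence and about 83% at 20% prevalence, while NPV stays near 99% in both cases. Nothing about the test changes between the two runs; only the prevalence does.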

These questions of how good a test method is good enough (and, for that matter, on what metric we should even define ‘good’) come up often in our day-to-day work at DHC too.

As an example, let’s consider a purity assay designed to detect a rare cellular impurity in a mixed therapeutic cell population (e.g. residual undifferentiated cells that could confer teratoma risk in a pluripotent stem cell derived cell therapy; residual αβ T cells that could confer risk of GVHD in an allogeneic immunotherapy product). Now, let’s assume you’ve developed a very sensitive flow cytometry assay that can detect the concerning cell type at a rate as low as 0.1% of your overall cell population, and you’ve tested your product and you don’t see any of the concerning cells.
Clearly you’re fine, right? I mean, you have a highly sensitive assay indicating that the concerning impurity isn’t present in your cell population.

Or are you? Once again, context matters.

Context for your assay performance matters here because, in order to know whether the existing assay is ‘good enough,’ you need to understand how many of the concerning cells would be too many. Taking the example of the residual undifferentiated cells, this would be the cell number that confers risk of teratoma—a number which is well known to be highly dependent on manufacturing process and delivery site (once again, context!).

For the purposes of this discussion, let’s assume you have data (perhaps from a nonclinical spiking study) suggesting that delivering more than 10,000 cells of the concerning impurity would confer an undue risk. Just as the sensitivity and specificity required for a reasonable PPV in the SARS-CoV-2 antibody test depend on the underlying population prevalence of SARS-CoV-2 antibodies, the required sensitivity (measured in this case as the lower limit of detection, or LLoD) of your assay to detect undifferentiated cell impurities will depend on the intended dose of your product. For a product requiring a low dose of 100,000 cells (e.g. for an ocular indication), 10,000 cells would be 10% of your delivered cell population, so your flow assay with the LLoD of 0.1% is more than adequate. In contrast, for a product with a much higher intended dose of 10⁹ cells (e.g. one for heart disease), 0.1% of the dose is a million cells, so that same assay can only tell you that you have fewer than a million undifferentiated cells within each dose. That is 100 times the 10,000-cell threshold, and not even remotely good enough.
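The dose arithmetic above can be sketched in a few lines. This is a simplified illustration that treats a clean assay result as guaranteeing only that the impurity is below the LLoD fraction of the dose; the function names are hypothetical, and the 10,000-cell threshold is the assumed value from this example rather than a general rule.

```python
def max_undetected_cells(dose_cells, llod_fraction):
    """Largest number of impurity cells per dose that a clean result cannot rule out."""
    return dose_cells * llod_fraction


def assay_is_adequate(dose_cells, llod_fraction, risk_threshold_cells):
    """True if a clean result guarantees the dose stays at or below the risk threshold."""
    return max_undetected_cells(dose_cells, llod_fraction) <= risk_threshold_cells


def required_llod_fraction(dose_cells, risk_threshold_cells):
    """LLoD (as a fraction of the dose) needed to resolve the risk threshold."""
    return risk_threshold_cells / dose_cells


RISK_THRESHOLD_CELLS = 10_000  # assumed threshold from the example above
LLOD_FRACTION = 0.001          # flow assay LLoD of 0.1%

for label, dose in [("ocular product", 100_000), ("cardiac product", 10**9)]:
    worst_case = max_undetected_cells(dose, LLOD_FRACTION)
    ok = assay_is_adequate(dose, LLOD_FRACTION, RISK_THRESHOLD_CELLS)
    needed = required_llod_fraction(dose, RISK_THRESHOLD_CELLS)
    print(f"{label}: up to {worst_case:,.0f} undetected cells, "
          f"adequate = {ok}, required LLoD = {needed:.3%}")
```

Under these assumptions, the high-dose product would need an assay with an LLoD of roughly 0.001%, a hundredfold improvement over the 0.1% flow assay.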

That's where we come in. You need to know what you want to measure, what your goal is, and what the variables at play for your particular therapy will be. Understanding context, and therefore being able to match assay type to diagnostic need, is one of our specialties, thanks to DHC's analytical development experts. Not all tests are created equal, and we make sure that we understand this detail so that you don't have to.

Oh, and I haven’t even circled back yet to the question of “on what metric should we even define ‘good’?” Well, I’ve got homeschooling of children to get to, so let’s file that one under #potencyassay and save it for another day, shall we?

¹ Assuming for the moment that antibody positivity equates to immunity.