Predictive Modeling 101
How CMS’s Newest Fraud Prevention Tool Works and What It Means for Providers
By Susan E. White, PhD, CHDA
In July the Centers for Medicare and Medicaid Services (CMS) began applying predictive modeling techniques to Medicare claims data to detect fraud. CMS contracted with Northrop Grumman, an information solutions company, to develop the technology. Northrop Grumman partnered with Verizon and National Government Services to develop the platform.1 The application of predictive modeling technology to detect Medicare fraud was mandated by the Small Business Jobs Act of 2010, which required CMS to implement a program in 10 states by July 1, 2011.2 However, CMS opted to implement the program nationwide at the outset.
Organizations should understand how CMS’s newest fraud prevention tool works and the changes it could bring to claims processing.
What Is Predictive Modeling?
Predictive modeling applies statistical techniques to determine
the likelihood of certain events occurring together. Statistical
methods are applied to historical data to “learn” the patterns
in the data. These patterns are used to create models of what is
most likely to occur.
Predictive modeling is used by credit card issuers to determine if transactions are likely fraudulent. Customers who receive a phone call from their credit card company verifying that
they authorized a transaction have been the subjects of a predictive
model.
For example, a customer’s typical credit card transaction is
$100. The credit card issuer notices that the customer submitted
three $5,000 transactions in one day. Given the customer’s history and the credit card issuer’s historical data regarding fraudulent transactions, those transactions look suspicious.
The credit card company may then put a hold on the card
and call to verify that the customer really did authorize the suspect transactions. The triggers that tell the credit card company
when to suspect a fraud issue are created via predictive modeling techniques.
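The credit card example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name flag_unusual, the sample history, and the rule itself (flag any amount more than three standard deviations above the customer's average) are hypothetical and far simpler than the models an issuer would actually use.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amounts, z_cutoff=3.0):
    """Return the new amounts that sit far above the customer's usual spending.

    Assumed rule (not an issuer's real model): an amount is suspicious when it
    is more than z_cutoff standard deviations above the historical mean.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past transaction was identical, so anything larger stands out.
        return [a for a in new_amounts if a > mu]
    return [a for a in new_amounts if (a - mu) / sigma > z_cutoff]

# Hypothetical customer whose typical transaction is about $100.
history = [90, 110, 95, 105, 100, 120, 80]

# Three $5,000 transactions in one day, plus one ordinary purchase.
flagged = flag_unusual(history, [5000, 5000, 5000, 105])
print(flagged)  # the three $5,000 amounts are flagged; the $105 purchase is not
```

A production trigger would also draw on the issuer's history of confirmed fraud, not just one customer's spending pattern.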
Predictive modeling techniques use multiple data sources.
Data such as the provider’s claim history, the patient’s demographics and health status, the services included on the claim,
and the attributes associated with previously identified fraudulent claims may all be used to develop a statistical model.
Statistical techniques used to create the model may include
logistic regression, cluster analysis, or decision trees. All of these
statistical techniques allow the user to combine multivariate
historical data into a model that may be used to assess the probability or likelihood that current claims are fraudulent.
In logistic regression, the likelihood that a claim is fraudulent is estimated based on a series of historical data. In cluster
analysis, historical data are used to build a model that will measure the “distance” of a claim from the typical claims submitted
by that provider or for that type of service. Decision trees use a
series of screens or yes/no questions to determine the probability that a claim is valid.
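The three techniques described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the feature names, the coefficients, the "typical claim" values, and the yes/no questions are invented, and none of them come from CMS or its contractors. In practice the coefficients, cluster centers, and tree splits would be estimated from historical claims data. Note that the tree here reports the probability that a claim is a problem, which is simply one minus the probability that it is valid.

```python
import math

# Hypothetical claim; the feature names and values are made up.
claim = {"billed_amount": 4200.0, "services_per_visit": 9, "prior_flags": 2}

def logistic_score(c):
    """Logistic regression: combine features into a probability between 0 and 1.

    The intercept and coefficients are invented stand-ins for values that
    would be fitted to historical claims.
    """
    z = (-4.0
         + 0.0005 * c["billed_amount"]
         + 0.15 * c["services_per_visit"]
         + 0.8 * c["prior_flags"])
    return 1.0 / (1.0 + math.exp(-z))

def distance_from_typical(c, typical, spread):
    """Cluster-style measure: scaled distance of a claim from the typical claim."""
    return math.sqrt(sum(((c[k] - typical[k]) / spread[k]) ** 2 for k in typical))

def tree_score(c):
    """Decision tree: a series of yes/no questions ending in an assumed probability."""
    if c["prior_flags"] >= 1:
        if c["billed_amount"] > 3000:
            return 0.70
        return 0.30
    if c["services_per_visit"] > 12:
        return 0.40
    return 0.05

# Assumed profile of this provider's typical claim and its usual variation.
typical = {"billed_amount": 1200.0, "services_per_visit": 3}
spread = {"billed_amount": 600.0, "services_per_visit": 2}

p_logistic = logistic_score(claim)
dist = distance_from_typical(claim, typical, spread)
p_tree = tree_score(claim)
print(round(p_logistic, 2), round(dist, 2), p_tree)
```

Each function turns the same claim into a single number, which is what allows the results to be expressed as a score.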
The output of each of these methods is the probability of a
claim’s validity, expressed as a score.
The claim score is typically structured so that it is directly related to the probability that a claim is in error. A high score may
indicate a high probability that a claim is not legitimate. If the
score meets a criterion (falling either above or below a cutoff value),
then the claim is identified as a potential error.
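Applying a cutoff to claim scores can be sketched as follows. The claim identifiers, the scores, and the cutoff of 0.8 are hypothetical, and the sketch assumes the convention described above, in which a higher score means a claim is more likely to be in error.

```python
def flag_claims(scores, cutoff=0.8):
    """Return the IDs of claims whose score is at or above the cutoff."""
    return [claim_id for claim_id, score in scores.items() if score >= cutoff]

# Hypothetical scores produced by a model for four claims.
scores = {"A": 0.12, "B": 0.91, "C": 0.80, "D": 0.45}

flagged = flag_claims(scores)
print(flagged)  # claims B and C meet the 0.8 cutoff

# Lowering the cutoff identifies more claims as potential errors.
flagged_lower = flag_claims(scores, cutoff=0.4)
print(flagged_lower)
```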
The criterion or cutoff value may be used to tune the model,
controlling its sensitivity and specificity. If the cutoff