Behavioral-Based Rules Engine for Pricing w/ Explainable AI
I had an idea. Actually, it was just a thought. Then it became a question. The answer to that question was nurtured into an idea. The idea was given life thanks to an amazing law firm specializing in patent law and a very intelligent executive at one of the largest privately owned insurance agencies in the US. And while we don’t communicate regularly, I consider him a good friend (I define that by the quality, not the quantity, of our engagements). The following are some of my original thoughts on using behavioral data to price personal lines risks. I updated certain ideas at the beginning of 2023 to address XAI, and again in 2024 to account for exponentially improved training techniques.
If you want AI to be explainable, it has to understand how to explain. That is not simply returning the words with the highest probability based on a cosine similarity score. We allow AI to self-train based on the concept of providing the token that would cause the least amount of discomfort. It is my belief that a constitutional AI model using synthetic data to self-train will produce the best results and the fewest biases. Regardless of the method, the model will still need to be reviewed for any inherent biases that may show up in the formation of rates.
- Leverage Anthropic’s Constitutional AI approach to ensure Claude’s training data and objectives are aligned with safety and ethics principles. This provides a strong foundation.
- Use techniques like LIME (Local Interpretable Model-Agnostic Explanations) to generate explanations for individual predictions made by the model. This helps determine which factors contributed most (see the sketch after this list).
- Visualize and analyze the attention layers in Claude to see what input data is being focused on during predictions. The attention weights indicate feature importance.
- Build in tracing capabilities to follow the step-by-step logic behind each prediction. Maintaining an audit trail improves transparency.
- Test the model’s responses to counterfactual examples that systematically alter inputs. Observe how outputs change to better understand causal relationships.
- Evaluate model performance across stratified data slices to check for variability or bias related to key variables like age, gender, etc.
- Document the training process thoroughly, highlighting ethical considerations and how potential risks were mitigated.
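To make the LIME bullet concrete, here is a minimal sketch using the `lime` package against a scikit-learn-style risk model. The feature names, synthetic data, and model are hypothetical placeholders, not a filed rating plan.

```python
# A minimal LIME sketch: explain one premium prediction from a tabular model.
# Features (age, prior_claims, annual_mileage, violations) are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from lime.lime_tabular import LimeTabularExplainer

feature_names = ["age", "prior_claims", "annual_mileage", "violations"]

# Synthetic stand-in for policy data and losses
X_train = np.random.rand(500, 4) * [60, 5, 20000, 4] + [18, 0, 0, 0]
y_train = X_train[:, 1] * 120 + X_train[:, 3] * 80 + np.random.rand(500) * 50

model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    mode="regression",
)

# Which factors pushed this one prediction up or down, and by how much?
explanation = explainer.explain_instance(X_train[0], model.predict, num_features=4)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.2f}")
```

The output is a local, per-customer explanation: the same technique applies whether the underlying model predicts loss cost, risk score, or premium.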
Stratifying the Data:
Stratifying data involves organizing and analyzing data in distinct groups or layers based on specific variables or characteristics. It allows for a deeper understanding of how different factors may impact outcomes and helps identify any variability or bias that may exist within the data. I’m not an actuary, I have no desire to become one, and I’m still very surprised that people do. However, I believe that by stratifying the data we can better assess the relationships between variables and ensure that any patterns or trends are accurately represented in the analysis. Stratifying the data can also help evaluate model performance across different demographic groups, such as age and gender, and ensure that the AI system is making fair and unbiased predictions. This process involves thorough documentation, testing, and monitoring to maintain transparency and accountability in the decision-making process.
- Slice the existing policy/customer data by factors like sex, marital status, vehicle make, age group, claims history, violations, etc.
- Individuals will likely belong to multiple overlapping slices (e.g. female, married, Honda Civic, age 35–44, 1 claim)
- This creates many distinct sub-populations to analyze
Incorporating Behavioral Data:
- Layer in additional behavioral data sources like telematics, streaming patterns, membership rewards, etc.
- Look for patterns that emerge when combining the traditional rating factors with these behavioral variables within each stratified slice.
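As a rough illustration of slicing and layering, here is a minimal pandas sketch. The file names and columns (policy_id, incurred_losses, earned_premium, hard_brakes_per_100mi, and the slice columns) are hypothetical stand-ins for a carrier’s actual data.

```python
# Stratify policy data by traditional factors, then layer in a behavioral
# telematics signal, and summarize each slice.
import pandas as pd

policies = pd.read_csv("policies.csv")        # traditional rating factors
telematics = pd.read_csv("telematics.csv")    # behavioral data, keyed by policy_id

# Layer behavioral variables onto each policy record
df = policies.merge(telematics, on="policy_id", how="left")

# Each policyholder falls into several overlapping slices at once
slice_cols = ["sex", "marital_status", "vehicle_make", "age_group"]
for col in slice_cols:
    g = df.groupby(col).agg(
        losses=("incurred_losses", "sum"),
        premium=("earned_premium", "sum"),
        avg_hard_brakes=("hard_brakes_per_100mi", "mean"),
        policies=("policy_id", "count"),
    )
    g["loss_ratio"] = g["losses"] / g["premium"]
    print(g[["policies", "loss_ratio", "avg_hard_brakes"]])
```

Patterns that only appear when a behavioral column is crossed with a traditional slice (say, harsh braking within one age group) are exactly the granular signals described above.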
Explainable AI Modeling:
- Build interpretable machine learning models (e.g. decision trees) on these stratified slices
- The models can identify important variables and relationships driving predictions
- Techniques like SHAP values can explain why certain predictions were made
SHAP values, or Shapley additive explanations, are a technique used in explainable AI modeling to provide insight into how a model makes predictions. SHAP values assign a value to each feature in a prediction, indicating how much that feature contributes to the final output. By analyzing these values, we can understand the importance of each feature in the model’s decision-making process and gain insight into the factors driving its predictions. This enhances the transparency and interpretability of the AI model, making it easier to understand and trust the reasoning behind its predictions. In this process, I believe every insurance provider will arrive at its own unique values, since each provider’s current model is incorporated into the final rate formulations.
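Here is a minimal SHAP sketch against a tree-based risk model, using the same hypothetical features as the LIME example. `shap.TreeExplainer` is the real API; the data and model are synthetic placeholders.

```python
# SHAP values for one prediction: how much each feature pushed the premium
# above or below the model's baseline expectation.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

feature_names = ["age", "prior_claims", "annual_mileage", "violations"]
X = np.random.rand(500, 4) * [60, 5, 20000, 4] + [18, 0, 0, 0]
y = X[:, 1] * 120 + X[:, 3] * 80 + np.random.rand(500) * 50

model = GradientBoostingRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])  # contributions for one prediction

for name, value in zip(feature_names, shap_values[0]):
    print(f"{name}: {value:+.2f}")
```

Because the contributions are additive, they can be summed back to the prediction, which is what makes them suitable for a regulator-facing or customer-facing explanation.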
Rating Impact:
- By understanding factor importance and interactions within slices, you can adjust rating plans accordingly
- Underwriting rules and pricing can be easily tuned for different customer segments
- Discounts or loadings may apply based on revealed behavioral patterns
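As a hedged sketch of turning a revealed behavioral pattern into a discount or loading: the thresholds and factor values below are purely illustrative; in practice they would come from the filed rating plan for each segment.

```python
# Translate behavioral signals into a multiplicative rating factor.
def behavioral_adjustment(hard_brakes_per_100mi: float,
                          night_miles_pct: float) -> float:
    """Return a multiplicative rating factor (1.0 = no adjustment)."""
    factor = 1.0
    if hard_brakes_per_100mi < 1.0:
        factor *= 0.92   # smooth-driving discount
    elif hard_brakes_per_100mi > 5.0:
        factor *= 1.10   # harsh-braking loading
    if night_miles_pct > 0.40:
        factor *= 1.05   # late-night exposure loading
    return factor

base_premium = 1200.00
premium = base_premium * behavioral_adjustment(0.6, 0.15)
print(f"Adjusted premium: ${premium:.2f}")  # smooth-driving discount applies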
The key benefit is the ability to find granular patterns that a one-size-fits-all model may miss. Certain variables may only exhibit predictive power for specific sub-populations.
The key is balancing prediction accuracy with interpretability, while responsibly handling sensitive data. This requires diligence, testing, and, of course, transparency. I believe I’m aligned with what the NAIC has documented as the necessary steps to ensure consumer safety and transparency.
- Establish a robust data governance process for vetting and documenting any potential new third-party data providers or data types that may get incorporated into the rating engine in the future.
- For each proposed new data source, conduct thorough analysis on its predictive value, stability, and potential fair lending implications before integrating it into the models.
- Maintain a data dictionary and traceability that clearly defines each data element, its source, and which rating factor(s) it maps to from the state filings.
- Leverage interpretable machine learning techniques, like decision trees or logic rules, that provide clear explanation capabilities.
- Implement model monitoring to detect whether a new data source introduces unintended bias or adverse impact across different demographic groups (a sketch follows this list).
- Only after this due diligence, submit a new rate/rule filing to the DOI detailing the additional data source and its relevance to the rating plan.
- Ensure the ability to run the models in backtesting mode, separating old vs. new data, to isolate impacts before production deployment.
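Here is a minimal sketch of the monitoring bullet above: compare model output across demographic slices and flag disparities. The 10% tolerance is an arbitrary placeholder, not a regulatory standard, and a flag is a prompt for actuarial and fair-lending review, not an automatic verdict, since some differences across slices are actuarially justified.

```python
# First-pass disparity screen across demographic groups.
import pandas as pd

def check_group_disparity(df: pd.DataFrame, group_col: str,
                          pred_col: str = "predicted_premium",
                          tolerance: float = 0.10) -> bool:
    """Flag any group whose mean prediction drifts >tolerance from overall."""
    overall = df[pred_col].mean()
    flagged = False
    for group, mean_pred in df.groupby(group_col)[pred_col].mean().items():
        ratio = mean_pred / overall
        if abs(ratio - 1.0) > tolerance:
            print(f"FLAG: {group_col}={group} mean is {ratio:.0%} of overall")
            flagged = True
    return flagged
```

Running this check for each protected or sensitive attribute after every model or data-source update gives the audit trail the governance bullets call for.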
Introducing new data sources would trigger a full transparency cycle — but only updating the existing filed rating factors based on that data would not require a re-filing.
New Policy Premiums:
- When a new customer requests a quote, the system collects the information submitted by the consumer
- It cross-references and enhances this data by querying third-party data sources to build a comprehensive profile
- The AI rating engine analyzes all this data and detects behavioral patterns/risk signals
- Based on the current “state” of the machine learning models, a customized premium rate is generated reflecting the customer’s risk assessment
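A minimal sketch of that new-business flow follows. The `source.lookup()` and `model.predict_risk()` interfaces are hypothetical stand-ins for the third-party queries and rating engine described above.

```python
# New-business flow: collect, enrich, score, price.
def quote_new_policy(application: dict, model, third_party_sources: list) -> float:
    # 1. Start from the consumer-submitted information
    profile = dict(application)

    # 2. Enrich with third-party/behavioral data (hypothetical interface)
    for source in third_party_sources:
        profile.update(source.lookup(profile["applicant_id"]))

    # 3. Score the enriched profile against the current model state
    risk_score = model.predict_risk(profile)

    # 4. Translate the risk assessment into a premium
    base_rate = 900.00  # illustrative base rate from the filed plan
    return base_rate * risk_score
```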
Renewal Adjustments:
- For current customers up for renewal, their premium can be adjusted up or down
- Major events like accidents or violations act as triggers to re-evaluate their risk profile
- Their premium gets re-rated based on updated data sources since their last renewal
- The models can identify behavioral changes that increase/decrease risk and price accordingly
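The renewal flow can be sketched the same way: major events act as triggers for a re-rate against the data gathered since the last renewal. The event names and policy interface here are illustrative.

```python
# Renewal flow: re-rate only when a trigger event has occurred.
RERATE_TRIGGERS = {"accident", "violation", "lapse", "vehicle_change"}

def renewal_premium(policy, model, events_since_renewal: set) -> float:
    if not RERATE_TRIGGERS & events_since_renewal:
        return policy.current_premium  # no trigger, carry the premium forward

    # Re-evaluate the risk profile with the updated data sources
    updated_profile = policy.refresh_profile()
    risk_score = model.predict_risk(updated_profile)
    return policy.base_rate * risk_score  # may move up or down
```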
Key Advantages:
- Premiums are constantly adapting to individual risk-linked behaviors over time
- No explicit rate filings are needed, as the book is portfolio-rated using sophisticated models
- Pricing is highly dynamic, granular and reflective of current predictive signals
- Adjustments can be implemented in real-time at renewal instead of waiting for approval cycles
Potential Challenges:
- Ensuring models remain unbiased and explainable as data sources increase
- Compliance with data privacy and governance when aggregating all these data types (could blockchain technology create a secure digital contract between the data providers and the insurer so that PII is never revealed or associated with an individual on the insurer’s side? A related sketch follows this list.)
- Clearly communicating rating factors/logic to customers for adjustments (I think full transparency, even if the consumer does not understand what those factors are, is still the best course.)
- Having effective model monitoring and overrides for any unintended distortions.
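The blockchain question above is open. As a simpler illustration of the underlying goal, here is a keyed-hash tokenization sketch: both parties derive the same opaque token from PII they each already hold, and join records on the token instead of exchanging raw identifiers. This is not blockchain, and key management and contract terms are out of scope.

```python
# Privacy-preserving record matching via a shared keyed hash (HMAC).
import hashlib
import hmac

SHARED_KEY = b"negotiated-out-of-band"  # placeholder; never hard-code in practice

def pii_token(ssn: str, dob: str) -> str:
    """Deterministic token both parties can compute from the same PII."""
    message = f"{ssn}|{dob}".encode()
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

# Each side computes the token independently and joins on it,
# so raw identifiers never cross the wire.
print(pii_token("123-45-6789", "1980-01-01"))
```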
Why spend money on a patent if you are not going to protect your intellectual property? Because the industry is moving too slowly. Because it was never just about me. Because my brain is consumed with other AI concepts. Because of the early-’90s Dallas Cowboys. Yes, my beloved Cowboys. The Triplets-era Cowboys are a lesson in life. That Cowboy dynasty was interesting. We didn’t have the most talented running back (Sanders), the most efficient quarterback (Young), or the greatest receiver (Rice). We also had a playbook that was widely known. What everyone didn’t count on was our ability to consistently execute. I’m counting on that as well.
So, do you think we will transition to true behavioral-based models?