The core idea
Propensity scoring for inbound calls is a pre-auction quality prediction. Before a call enters a real-time bidding auction, a model evaluates available metadata about the caller and the call context, then outputs a score -- typically 0 to 100 -- predicting how likely this call is to convert into a sale.
A caller with a propensity score of 92 is almost certainly worth bidding on. A caller scoring 34 is probably not. The score lets you make informed bid decisions in milliseconds, before you commit any budget.
Think of it as underwriting for calls. Insurance companies don't write every policy at the same premium. They assess risk factors and price accordingly. Propensity scoring does the same thing for call buyers -- it assesses conversion likelihood and lets you bid accordingly.
What signals go into the score
Propensity models consume every piece of metadata available at auction time. The caller hasn't been connected yet, so the model can't use anything from the conversation itself. Instead, it works with contextual signals:
Geographic signals
Caller location is one of the strongest predictors of conversion. The model evaluates:
- State and ZIP code: Medicare calls from Florida convert at dramatically different rates than calls from Montana. The model knows this from historical data.
- Metro area: Urban callers in large metros often have different intent patterns than rural callers.
- Geo-source combinations: A caller from Tampa who arrived via a Google search for "Medicare Advantage plans near me" has a very different profile than a Tampa caller from a display ad retargeting campaign.
In practice, geographic signals alone can explain 30-40% of the variance in conversion rates across a large call dataset.
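As a toy illustration of how geographic baselines can be derived, the sketch below aggregates conversion rates by state from a tiny invented call log. Field names and the data are hypothetical, not from any real platform:

```python
from collections import defaultdict

# Hypothetical historical call log; in practice this would be
# thousands of rows of platform outcome data.
calls = [
    {"state": "FL", "converted": True},
    {"state": "FL", "converted": True},
    {"state": "FL", "converted": False},
    {"state": "FL", "converted": False},
    {"state": "MT", "converted": True},
    {"state": "MT", "converted": False},
    {"state": "MT", "converted": False},
    {"state": "MT", "converted": False},
]

totals = defaultdict(lambda: [0, 0])  # state -> [conversions, calls]
for c in calls:
    totals[c["state"]][0] += c["converted"]
    totals[c["state"]][1] += 1

geo_baseline = {s: round(conv / n, 2) for s, (conv, n) in totals.items()}
print(geo_baseline)  # {'FL': 0.5, 'MT': 0.25}
```

A production model would compute these baselines per vertical and per ZIP, not just per state, and feed them in as features rather than using them directly as scores.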
Traffic source signals
Where the caller came from before dialing tells the model a lot about intent:
- Paid search (Google, Bing): Callers from paid search typically have the highest intent. They were actively searching for a product or service.
- Paid social (Meta, TikTok): Lower intent on average, but specific campaign types (lead forms, click-to-call) can produce high-quality callers.
- Organic search: High intent, but harder to attribute and optimize.
- Display/programmatic: Generally lower intent. Callers from display retargeting are better than cold display prospecting.
- Direct/referral: Varies widely. Some referral sources produce excellent callers; others are near-zero quality.
The model doesn't just categorize by channel. It tracks performance at the publisher and sub-source level. Publisher A's Google traffic might convert at 28%, while Publisher B's Google traffic converts at 9%. The model learns these differences.
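The difference between channel-level and publisher-level tracking is easy to see in a few lines. The publisher names are invented and the 28% / 9% rates are the illustrative figures from above:

```python
from collections import Counter

conversions, volume = Counter(), Counter()
# Synthetic history: two publishers, same channel, very different quality.
history = (
    [("pub_a", "google", True)] * 28 + [("pub_a", "google", False)] * 72
    + [("pub_b", "google", True)] * 9 + [("pub_b", "google", False)] * 91
)
for pub, channel, converted in history:
    volume[(pub, channel)] += 1
    conversions[(pub, channel)] += converted

# The channel-level average hides the gap...
channel_rate = sum(conversions.values()) / sum(volume.values())
print(round(channel_rate, 3))  # 0.185

# ...while the publisher level exposes it.
for key in sorted(volume):
    print(key, conversions[key] / volume[key])  # 0.28 vs 0.09
```

Bidding at the blended 18.5% rate overpays for Publisher B and underbids on Publisher A; tracking at the sub-source level is what lets the score separate them.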
Temporal signals
When a call happens is itself a predictor of quality:
- Time of day: For most B2C verticals, calls between 9am and 3pm local time convert at higher rates than evening calls. The model quantifies this by vertical and geo.
- Day of week: Tuesday through Thursday typically outperform weekends for insurance and financial services. Home services calls peak on Monday mornings (things break over the weekend).
- Seasonality: Medicare AEP calls in October have different quality profiles than off-season calls. The model adjusts scoring based on enrollment period timing.
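These temporal buckets are straightforward to encode as model features. A minimal sketch; the 9am-3pm and Tue-Thu cutoffs are the example heuristics from the text, not universal constants:

```python
from datetime import datetime

def temporal_features(ts: datetime) -> dict:
    # Bucket the timestamp using the example heuristics above.
    return {
        "business_hours": 9 <= ts.hour < 15,   # 9am-3pm local
        "midweek": ts.weekday() in (1, 2, 3),  # Tue, Wed, Thu
        "hour": ts.hour,
        "weekday": ts.weekday(),
    }

print(temporal_features(datetime(2024, 10, 15, 11)))  # a Tuesday at 11am
```

A real model would learn the daypart boundaries per vertical and geo from data rather than hard-coding them.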
Historical patterns
The most powerful signals come from historical data:
- Caller area code history: If the model has seen 500 previous calls from area code 813 (Tampa) for Medicare, and 38% converted, that baseline informs the score for the next 813 call.
- Source-geo interaction history: The model tracks conversion rates for every combination of source, geo, and daypart. A Google call from Tampa at 11am on a Tuesday might have a historical conversion rate of 52%, while a Meta call from the same geo at 8pm on a Saturday might be 6%.
- Repeat caller detection: If a caller has been seen before (same ANI, i.e. the same originating phone number), the model can factor in prior call outcomes. A repeat caller who previously had a 6-minute conversation but didn't convert might be more likely to convert on a follow-up call.
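A common way to use thin historical slices without overreacting to them is to shrink the observed rate toward a broader baseline. A minimal sketch; the prior strength of 50 is an arbitrary illustration, not a recommended setting:

```python
def smoothed_rate(conversions: int, calls: int, baseline: float,
                  prior_strength: float = 50) -> float:
    """Blend an observed conversion rate with a baseline rate.

    With few calls the estimate stays near the baseline; as volume
    grows, the observed rate dominates.
    """
    return (conversions + prior_strength * baseline) / (calls + prior_strength)

# Area code 813 example from the text: 500 calls, 38% converted,
# against a hypothetical 14% vertical baseline.
print(round(smoothed_rate(190, 500, 0.14), 3))  # 0.358
# A thin slice barely moves off the baseline.
print(round(smoothed_rate(2, 3, 0.14), 3))      # 0.17
```

This is why a source-geo-daypart cell with 3 calls and 2 conversions doesn't get scored as a 67% converter: the model hedges until the evidence accumulates.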
Fraud and compliance signals
The model also flags risk:
- Known robocall patterns: Calls from ANIs associated with robocall databases get scored down.
- Spoofed caller ID: Certain metadata patterns indicate spoofed numbers.
- TCPA risk indicators: Calls that show patterns consistent with lead generators using aggressive or non-compliant practices get lower scores.
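Risk flags typically act as score penalties or hard skips rather than ordinary features. A hypothetical sketch; the flag names and penalty weights are invented for illustration:

```python
def apply_risk_penalties(score: float, flags: set[str]) -> float:
    # Invented penalty weights; a real system would tune these
    # against labeled fraud outcomes.
    penalties = {"robocall_ani": 60, "spoofed_cid": 40, "tcpa_risk": 25}
    for flag in flags:
        score -= penalties.get(flag, 0)
    return max(score, 0.0)

print(apply_risk_penalties(72, {"robocall_ani"}))  # 12
```

A heavy penalty like this pushes an otherwise-decent call into the 0-29 "skip" tier, which is usually the intended behavior for fraud signals.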
How scoring affects bidding
The propensity score translates directly into bid strategy. Here's how a typical implementation works:
Bid multipliers
You set a base bid -- say $24 for Medicare calls in Florida. The propensity score applies a multiplier:
| Propensity Score | Multiplier | Effective Bid | Logic |
|---|---|---|---|
| 90-100 | 1.4x | $33.60 | High-intent caller. Worth paying premium to win. |
| 75-89 | 1.15x | $27.60 | Above-average quality. Bid slightly above base. |
| 50-74 | 1.0x | $24.00 | Average quality. Bid at base rate. |
| 30-49 | 0.6x | $14.40 | Below-average quality. Bid conservatively. |
| 0-29 | 0x (skip) | $0 | Low quality or fraud risk. Don't bid. |
With this structure, your budget flows disproportionately toward high-converting callers. You're not just buying calls -- you're buying the right calls.
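The tier table maps directly to a small function. The thresholds and multipliers below are the example values from the table, not a standard:

```python
def effective_bid(base_bid: float, score: int) -> float:
    # Tier boundaries and multipliers from the example table above.
    if score >= 90:
        return round(base_bid * 1.4, 2)
    if score >= 75:
        return round(base_bid * 1.15, 2)
    if score >= 50:
        return base_bid
    if score >= 30:
        return round(base_bid * 0.6, 2)
    return 0.0  # below 30: skip the auction entirely

print(effective_bid(24.00, 92))  # 33.6
print(effective_bid(24.00, 34))  # 14.4
```

Some platforms let you define these tiers yourself; others apply a continuous multiplier curve instead of discrete steps, which avoids bid cliffs at the tier boundaries.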
Real numbers
Consider a scenario where you buy 1,000 calls per week without propensity scoring, all at a flat $24 bid:
- Total spend: $24,000
- Conversion rate (average): 14%
- Conversions: 140
- Effective CPA: $171.43
Now apply propensity scoring. Same 1,000 call opportunities, but you skip the bottom 20% (scores below 30) and shift budget toward the top tier:
- Calls purchased: 800 (skipped 200 low-quality)
- Avg bid (weighted by score): $26.40
- Total spend: $21,120
- Conversion rate (scored pool): 19.2%
- Conversions: 154
- Effective CPA: $137.14
That's a 20% CPA reduction with 10% more conversions and $2,880 less in total spend. The math works because you're cutting waste from the bottom and concentrating budget on callers who actually convert.
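The arithmetic behind those numbers is easy to verify:

```python
# Flat-bid baseline: 1,000 calls at $24, 14% conversion.
flat_spend = 1000 * 24.00
flat_conversions = 1000 * 0.14
flat_cpa = flat_spend / flat_conversions

# Scored pool: skip the bottom 200, weighted avg bid $26.40, 19.2% conversion.
scored_spend = 800 * 26.40
scored_conversions = round(800 * 0.192)  # 153.6 rounds to 154
scored_cpa = scored_spend / scored_conversions

print(round(flat_cpa, 2), round(scored_cpa, 2))  # 171.43 137.14
print(round(1 - scored_cpa / flat_cpa, 2))       # 0.2 -> the 20% CPA cut
```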
The feedback loop
Static scoring models degrade over time. Markets shift, publisher quality changes, seasonal patterns evolve. The real power of propensity scoring comes from the feedback loop.
How it works
- Call happens. You win an auction and receive a call.
- Outcome recorded. The call either converts (sale) or doesn't. You report this back to the platform.
- Model updates. The scoring model incorporates this new data point. It adjusts the weights for the specific combination of geo, source, time, and other features that characterized this call.
- Future scores change. The next time a caller with similar characteristics enters an auction, the propensity score reflects the updated model.
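A toy stand-in for that update step: an exponentially weighted moving average of outcomes per feature segment, so recent results gradually override old ones. The alpha and starting prior are arbitrary illustrations:

```python
class SegmentTracker:
    """EWMA conversion estimate per (source, geo, daypart) segment."""

    def __init__(self, alpha: float = 0.05, prior: float = 0.14):
        self.alpha = alpha   # learning rate: higher reacts faster
        self.prior = prior   # starting estimate for unseen segments
        self.rates: dict = {}

    def record(self, segment: tuple, converted: bool) -> None:
        prev = self.rates.get(segment, self.prior)
        self.rates[segment] = prev + self.alpha * (converted - prev)

    def estimate(self, segment: tuple) -> float:
        return self.rates.get(segment, self.prior)

tracker = SegmentTracker()
seg = ("google", "FL", "morning")
for _ in range(30):              # a publisher's quality collapses...
    tracker.record(seg, False)
print(round(tracker.estimate(seg), 3))  # ...and the estimate drifts down from 0.14
```

Real scoring models are richer than a per-segment average (they generalize across related segments), but the behavior is the same: a streak of non-conversions pulls future scores down within days, not weeks.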
This cycle runs continuously. With enough volume (typically 500+ calls per week per vertical), the model can detect quality shifts within days. If a publisher's traffic source changes and their calls start converting at half the previous rate, the propensity scores for calls from that publisher will drop within 48-72 hours -- long before a human reviewing weekly reports would catch it.
Cold start
New campaigns don't have historical data. The model handles this with a layered approach:
- Layer 1 (global baseline): Average conversion rates by vertical, geo, and source across all buyers on the platform.
- Layer 2 (vertical cohort): Conversion rates from buyers in the same vertical with similar campaign profiles.
- Layer 3 (your data): As your conversion data accumulates, the model increasingly weights your specific outcomes.
Most models reach useful accuracy (meaningful separation between high and low scores) after roughly 200-300 calls' worth of conversion data.
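The layering can be sketched as a volume-weighted blend: with no data of your own, the estimate is the cohort (or global) baseline, and your outcomes take over as they accumulate. The crossover constant k is an invented illustration:

```python
from typing import Optional

def layered_estimate(your_conv: int, your_calls: int,
                     cohort_rate: Optional[float],
                     global_rate: float, k: int = 100) -> float:
    # Fall through the layers: your data -> vertical cohort -> global.
    baseline = cohort_rate if cohort_rate is not None else global_rate
    return (your_conv + k * baseline) / (your_calls + k)

print(layered_estimate(0, 0, 0.12, 0.10))    # 0.12 (pure cohort baseline)
print(layered_estimate(60, 300, 0.12, 0.10)) # 0.18 (your data dominates)
```

By a few hundred calls, your own conversion rate carries most of the weight, which matches the 200-300 call figure above.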
Comparison to display ad quality scoring
If you come from programmatic display advertising, propensity scoring for calls will feel familiar. It's the equivalent of pre-bid brand safety, viewability prediction, and audience quality scoring combined.
In display, DSPs evaluate ad impressions before bidding: Is this a real user or a bot? Is the ad viewable? Does the user match the target audience? Display quality scoring uses signals like page content, user behavior history, device type, and time on site.
Call propensity scoring uses the same conceptual framework but with different signals. Instead of page content, it uses caller geo. Instead of user cookies, it uses ANI history. Instead of viewability, it predicts conversion.
The key difference: call propensity scoring has a tighter feedback loop. Display attribution is often probabilistic and delayed. Call outcomes are binary (converted or not) and typically known within hours. This means call scoring models can learn faster and achieve higher accuracy than display quality scores.
What to look for in a scoring system
Not all propensity scoring is equal. When evaluating a platform's scoring capabilities, ask:
- What's the feature set? Models using only geo and source are far less accurate than models incorporating temporal patterns, ANI history, and publisher sub-source data.
- How fast does the model update? Daily model refreshes are adequate. Real-time streaming updates are better. Weekly or monthly refreshes are too slow.
- Can you see the score? Some platforms score calls internally but don't expose the score to buyers. You should be able to see the score for every call, and understand which features drove it.
- Does it integrate with your bid strategy? Scoring is only useful if it connects to automated bid adjustments. If you have to manually review scores and adjust bids, you've lost most of the value.
- Is there a feedback mechanism? The platform should accept your conversion data and feed it back into the model. If it doesn't, the scores are static and will degrade.
Propensity scoring is the single highest-leverage feature for reducing CPA in call buying. It shifts your spend from volume-based purchasing to quality-based purchasing, and the feedback loop ensures it gets smarter every week.