Predictive Lead Scoring Explained

Written by
Tony Yang
Revenue Operations

Ahh, the “Demand Waterfall”. Otherwise known as “The Funnel,” this framework introduced by the research and advisory firm SiriusDecisions (now part of Forrester Research) – with various iterations of it over the past decade – defines a shared view between marketing and sales of the lead management process. 

SiriusDecisions Demand Waterfall
*Image Source: Forrester Research

Forrester B2B Revenue Waterfall
*Image Source: Forrester 2021 B2B Revenue Waterfall.
Putting Forrester’s B2B Revenue Waterfall Into Action: Six Tips For Laser-Focused Targeting

The context for commonly-used terms in B2B like “marketing qualified leads” (MQLs) and “sales qualified leads” (SQLs) is found within this framework. The performance of marketing and sales teams is typically measured by KPIs aligned along the funnel, with target/quota metrics that are calculated (or oftentimes assumed!) by backing in from the bottom of the funnel (i.e., closed won customers) all the way to the top (i.e., inquiries or prospects). Many demand gen marketers will then plan campaigns and programs that are meant to drive conversions throughout the funnel with the aim of meeting or exceeding these performance goals.
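To make the "backing in from the bottom of the funnel" math concrete, here is a minimal sketch in Python. The stage names follow the funnel described above, but the conversion rates and the target of 10 wins are made-up numbers for illustration only, not benchmarks.

```python
import math
from fractions import Fraction

# Hypothetical stage-to-stage conversion rates (illustrative assumptions).
# Fractions keep the arithmetic exact so rounding up behaves predictably.
FUNNEL = [
    ("inquiry", Fraction(1, 10)),  # 10% of inquiries become MQLs
    ("MQL", Fraction(1, 4)),       # 25% of MQLs become SQLs
    ("SQL", Fraction(1, 5)),       # 20% of SQLs become closed won
]

def required_volume_per_stage(target_wins, funnel):
    """Work backward from closed-won deals to the volume needed at each earlier stage."""
    needed = target_wins
    plan = [("closed won", needed)]
    for stage, rate_to_next in reversed(funnel):
        # Round up: you cannot work a fraction of a lead.
        needed = math.ceil(needed / rate_to_next)
        plan.append((stage, needed))
    return list(reversed(plan))

plan = required_volume_per_stage(target_wins=10, funnel=FUNNEL)
for stage, volume in plan:
    print(f"{stage}: {volume}")
```

With these assumed rates, 10 wins require 50 SQLs, 200 MQLs and 2,000 inquiries, which is exactly the kind of top-of-funnel target that pushes marketers toward volume.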

Why Lead Scoring Exists

However, about 98% of MQLs never result in closed business. If this is the case, and if we have a certain number of wins we need to acquire at the bottom of the funnel as our target, then shouldn’t marketing just focus on getting more leads at the top of the funnel? With the growth of marketing technology, marketers have gotten better and better at creating larger and larger volumes of leads. The problem is, of course, that not all leads convert, and there are numerous reasons why. 

What ultimately happens is that in order for marketing to meet their MQL targets, they generate more and more leads. They throw them over the wall to sales in the hopes that, if conversion rates stay the same, then more will flow through the middle and bottom of the funnel. Oftentimes this does seem to be the focus or modus operandi for demand gen marketers, likely due to the expectation of marketing set by CEOs and CROs who don’t truly understand the value of marketing and think of growth levers in binary and simplistic terms. 

In reality, this torrent of leads contains a large number of crappy leads with some great-fit leads sprinkled in here and there, without any reliable way to identify them. Not to mention, prospects can be at different stages of their buyer journey or decision-making process. So, it’s not uncommon to hear sales tell marketing that the leads they’re passing over are crap, or that they are just “tire kickers”. 

As a result, sales begins to lose trust in marketing, and marketing begins blaming sales by saying that they don’t follow up on the leads. It’s a vicious cycle that continues to feed into the stereotypical misalignment between marketing and sales. It’s no wonder then that teams of junior sales or business development reps (SDRs and BDRs) are now hired to crank out outbound and cold-calling campaigns using sales engagement tools such as Outreach or SalesLoft – which, by the way, provide pretty much the same functionality that marketing automation platforms (MAPs) offered marketers 10 years ago…i.e., automated emails. Executive leadership then views go-to-market as inbound (marketing) versus outbound (SDR/BDR), which inadvertently (or sometimes purposefully) creates more competition, silos, and misalignment between the two functions.

The fact of the matter is that there is still a lot more wasted effort on poor leads than on the good ones. So as a way to prioritize leads for follow-up, the concept of lead scoring was born to help rank leads in order to determine how qualified they are…hence “marketing qualified leads” or MQLs.

The Basics of Lead Scoring

These days, most marketing operations professionals tend to set up lead scoring along two major factors to determine a lead’s score:

  • Fit: Scoring on dimensions such as job title, industry, company size, annual revenue, number of employees, etc. This determines if the prospect fits the company’s definition of an ideal customer profile (ICP). Some marketers (myself included) like to split this up into two “fit” scores – one for company fit and another for persona fit.
  • Behavioral: Engagement with content such as clicks on emails, content downloads, website visits, webinar attendance, etc. These activities are meant to indicate a prospect’s level of interest.
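The two factors above can be sketched as a simple rules-based model. This is only an illustration: the point values, field names, activity names, and thresholds below are all hypothetical, and a real MAP would express these rules in its own configuration rather than in code.

```python
from dataclasses import dataclass, field

# Hypothetical, hand-picked point values (this is the "rules" part).
FIT_RULES = {
    "industry": {"software": 20, "financial services": 15},
    "job_title": {"vp marketing": 25, "marketing manager": 15},
}
BEHAVIOR_POINTS = {
    "email_click": 5,
    "content_download": 10,
    "webinar_attended": 15,
    "pricing_page_visit": 20,
}

@dataclass
class Lead:
    industry: str
    job_title: str
    activities: list = field(default_factory=list)

def fit_score(lead):
    """Sum the points for each profile attribute that matches a rule."""
    score = 0
    for attribute, points_by_value in FIT_RULES.items():
        value = getattr(lead, attribute).strip().lower()
        score += points_by_value.get(value, 0)
    return score

def behavior_score(lead):
    """Sum the points for each tracked activity; unknown activities score 0."""
    return sum(BEHAVIOR_POINTS.get(activity, 0) for activity in lead.activities)

def is_mql(lead, fit_threshold=30, behavior_threshold=20):
    """A lead qualifies only if it clears both thresholds (assumed values)."""
    return fit_score(lead) >= fit_threshold and behavior_score(lead) >= behavior_threshold

lead = Lead("Software", "VP Marketing", ["email_click", "pricing_page_visit"])
print(fit_score(lead), behavior_score(lead), is_mql(lead))
```

Keeping fit and behavior as separate scores, rather than one combined number, makes it easier to see whether a lead is a good fit that isn't engaged yet or an engaged lead that isn't a good fit.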

With greater adoption of marketing automation, the built-in lead scoring functionality in these tools is used in an attempt to predict which leads are more qualified than others and have a higher propensity to convert to the next funnel stage. I say “attempt” because while the lead scoring that occurs in MAPs today has provided tremendous benefits, most implementations don’t do a great job of providing a signal for purchase intent.

The idea of lead scoring isn’t broken, and implementing the mechanics of a very basic scoring system in your marketing automation tool is easy enough. In fact, many marketing automation systems do a great job at tracking a prospect’s demographic, firmographic and behavioral data that can be used to build a scoring model upon.

Why Rules-Based Lead Scoring Is Difficult

Setting up a lead scoring system that actually works takes A LOT of time and effort to get it right. If you’ve ever taken part in implementing a lead scoring model, you’ll know that to be true. 

Lead scoring models are based on rules defined by the person setting them up, using data that’s available within the marketing automation platform. The challenge is that it’s difficult to determine what combination of profile and engagement data indicates a true fit and propensity to buy.

This is particularly daunting for marketers and marketing ops people who are just starting off because it’s hard to know exactly what data points to base the scoring upon, how much weight to assign to these data points, what those scores actually mean, and how to utilize those scores effectively. This becomes even more complex as prospect behavior isn't static and changes over time, and also because organizations lack the data to make these models work and the predictive power in order to validate them.  Let’s dive deeper into some of these reasons.

Not Enough Data Is Available For Accurate Scoring

First of all, you may not have enough data in order to build a precise scoring model off of. Unless you have a data enrichment tool as part of your tech stack, much of the data needs to be captured upon form fills on a website or landing page, and the types of data are generally basic contact and demographic info about the lead or company such as company size, industry, and job titles. 

Also, the challenge is that there may be certain specific characteristics that make a potential prospect a better fit than others. I remember at one of my former companies, part of our ideal customer profile (ICP) definition was whether or not an enterprise brand was working with a marketing agency and spent a certain amount per year in media buys (e.g., ad spend). 

Now if I didn’t have access to a data vendor that provided this specific type of data (I actually did), then I would’ve had to resort to asking for this information on our webforms. Best practices tell us that the shorter the form, the better the on-page conversion. However, the less data you collect, the less you know about your prospect. In addition, this is assuming that the prospect is not entering any incorrect info, either purposefully (e.g., “Annual Revenue = Under $1 million” when it’s actually a Fortune 100 company) or unwittingly (“fat-fingering” it while typing on a mobile device).

Data Within CRM Is Often Not Accurate, Standardized Or Up-To-Date

Data found in CRM and marketing automation systems often gets stale and is usually filled with errors or contains variations of the same information. Typically, after a lead or a contact is created in the CRM, sales reps rarely update it with new information. However, people change jobs or get promoted, companies grow or launch new products, needs change over time, companies merge or get acquired, etc. Customers and prospects keep changing, but the data in internal systems that is used for targeting doesn’t get updated unless you have a data enrichment vendor that updates records in your database on a regular basis. 

In addition, most CRM data is not clean or normalized on a regular basis. When multiple people enter data over a long period of time, it becomes less and less accurate...especially without any data validation or management rules in place. For example, a data field for Country can include “US”, “U.S.A.”, “United States”...which means the same thing to a human but from a data perspective these are all different values.
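As a rough sketch of what a normalization rule for the Country example might look like, the snippet below maps common variations to one canonical value. The alias table is a tiny hypothetical sample; a real one would be much longer and would probably live in your data management tool rather than in code.

```python
import re

# Hypothetical alias table: cleaned-up key -> canonical value.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}

def normalize_country(raw):
    """Return the canonical country name, or None if the value is unmapped."""
    # Lowercase, drop punctuation such as the dots in "U.S.A.", collapse spaces.
    key = re.sub(r"[^a-z ]", "", raw.lower())
    key = " ".join(key.split())
    return COUNTRY_ALIASES.get(key)

for value in ["US", "U.S.A.", "United States", "Atlantis"]:
    print(repr(value), "->", normalize_country(value))
```

Returning None for unmapped values, instead of guessing, lets you route those records to a review queue so the alias table improves over time.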

If you do have a data enrichment provider, it doesn’t necessarily mean that all your data challenges are solved. Despite the claims of data vendors, no single vendor will have 100% data accuracy AND data coverage. I remember another instance where I had what was considered to be the “best” data vendor on the market as part of my tech stack. When I dug into records that this platform was enriching, I noticed that some values for certain data fields were wrong or missing. For whatever reason, the vendor enriched the Microsoft record with “1-10 employees” in the Number of Employees field, and for other records it returned a null value for this same field. Part of my scoring model was based on values in this Number of Employees field…so you can see how inaccurate and incomplete data affected the lead scores.

On a related note, it’s not uncommon these days for a marketing ops team to have access to multiple data vendors. This presents another challenge: which data provider do you consider to be the source of truth if two (or more) vendors provide data for the same field? Not only will different vendors potentially have different values, but they may also use different field types. In my previous example, one vendor treated Number of Employees as a number field and returned actual number values (e.g., “23,000”), while another vendor treated it as a text-based drop-down/picklist field (e.g., “10K to 50K”). If you don’t normalize these data sets, then these fields are not only incompatible but may also break your lead scoring rules.
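One way to reconcile the two field types in the example above is to convert both into a shared set of size buckets before scoring. The sketch below assumes hypothetical bucket boundaries and uses the lower bound of a picklist range as its representative value; both are arbitrary choices you would tune to your own ICP.

```python
import re

# Hypothetical shared buckets: (inclusive low, inclusive high or None, label).
BUCKETS = [
    (1, 10, "1-10"),
    (11, 200, "11-200"),
    (201, 9_999, "201-9,999"),
    (10_000, 50_000, "10K-50K"),
    (50_001, None, "50K+"),
]

def parse_number(token):
    """Turn '23,000' or '10K' into an integer."""
    token = token.strip().upper().replace(",", "")
    multiplier = 1
    if token.endswith("K"):
        multiplier, token = 1_000, token[:-1]
    return int(float(token) * multiplier)

def employee_lower_bound(raw):
    """Lowest employee count consistent with the raw vendor value, or None."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return int(raw)
    numbers = re.findall(r"\d[\d,]*(?:\.\d+)?\s*[kK]?", raw)
    return parse_number(numbers[0]) if numbers else None

def bucket(count):
    """Map an employee count onto the shared buckets; nulls become 'unknown'."""
    if count is None:
        return "unknown"
    for low, high, label in BUCKETS:
        if count >= low and (high is None or count <= high):
            return label
    return "unknown"

for vendor_value in ["23,000", "10K to 50K", None]:
    print(repr(vendor_value), "->", bucket(employee_lower_bound(vendor_value)))
```

Here the number-type value and the picklist value land in the same bucket, so a scoring rule written against the bucket works regardless of which vendor supplied the data. Null values are labeled "unknown" rather than silently scored as zero.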

All of the above can potentially result in duplicate records and/or inaccurate data values on which to base the scoring model.

Traditional Scoring Models Are Often Based On Guesswork

Another reason why an accurate rules-based lead scoring model is difficult to implement is that we don’t know if the data points that we're using to score are actually the right ones to use. We follow general rules-of-thumb, such as scoring visits to our pricing page or looking at long on-page dwell times as a qualifier. Thus, we say to ourselves, “let’s assign 50 points to any lead that engaged in these activities”. The problem is that we don't know if these actions are truly applicable or how much importance they should carry.

Irrelevant data points used in scoring will give inaccurate results. Our scoring models may be based on false correlations, which is the fallacy of placing emphasis on specific data points because they are common across our customers, even though they may not be great indicators. For example, let’s say we noticed in our data that four out of the last five customers who purchased our CRM software had red hair. So, we decide to score any new prospects with red hair really high. We may even decide to get this critical info by asking what hair color they have on our landing page forms. This is obviously a very silly example, but many lead scoring models are based on this type of shoot-from-the-hip analysis.
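To see why "four out of the last five customers" is such thin evidence, here is a small worked example that puts a confidence interval around that proportion. It uses the Wilson score interval, which is my choice for illustration rather than anything specific to lead scoring tools.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Four of the last five customers had the trait (red hair, in the silly example).
low, high = wilson_interval(successes=4, n=5)
print(f"Plausible share of customers with the trait: {low:.0%} to {high:.0%}")
```

With only five customers, the plausible range runs from roughly 38% to 96%, which is far too wide to justify a big score bump. And even a tight estimate would only matter if the trait were less common among the leads that didn't buy.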

Predictive Lead Scoring – From Rules-Based to Machine Learning

While lead scoring as a concept is actually meant to be predictive in nature, the reality is that most traditional, rules-based lead scoring models aren’t very accurate predictors of sales readiness. As mentioned before, this is because of the shortcomings of the surface-level analysis of the small data set that we build our lead scoring upon.

What’s missing are all the data points describing your ideal customer profile that aren’t found in your CRM or MAP. And what’s needed is a systematic and scientific way to identify which of these data points truly matter in determining your ideal customer profile.

What is Predictive Lead Scoring?

A better approach for lead scoring is to utilize scientific and statistical methods to predict who is most likely to buy. The idea is to use data science (i.e., statistical analysis, look-alike modeling, machine learning, etc.) to identify the signals that have a high correlation to converting prospects into customers. This process is called predictive modeling, and the output of this process is a well-defined ideal prospect profile which provides a more comprehensive picture of leads that have the highest propensity to buy. This profile should be the target that you score your leads against.

If I were to define “predictive lead scoring”, it would be this – a methodology for ranking leads to determine sales readiness by using predictive modeling and other data science techniques to discover the most accurate and relevant data points on which to score.
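To illustrate the difference from hand-assigned points, here is a toy sketch of predictive modeling: a logistic regression trained with plain gradient descent, where the weights are learned from past outcomes. The two features and the eight historical leads are fabricated for illustration, and logistic regression is just one of many techniques a vendor or data science team might use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(x, weights, bias):
    """Estimated probability that a lead with features x converts."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Fabricated history: [works_with_agency, visited_pricing_page] -> became a customer?
history = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
converted = [1, 1, 1, 0, 0, 0, 0, 1]

weights, bias = train_logistic(history, converted)
print("learned weights:", [round(w, 2) for w in weights])
print("agency, no pricing visit:", round(predict([1, 0], weights, bias), 2))
```

In this made-up data set, the model learns a large weight for working with an agency and a weight near zero for pricing page visits, because the visits don't separate buyers from non-buyers here. That is the kind of answer a rule of thumb can't give you.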

How Does Predictive Lead Scoring Work?

In order for us to use lead scoring to target the right audience, we need to understand which data fields to base the scoring model on to truly make it effective. These should include data points that are captured in your MAP, CRM, or other customer database. You probably know some things about your leads – which campaigns they’ve seen, where they clicked and what they filled in on your form. This valuable data is the starting point for building a predictive model.

But before you can build a predictive model (either by building it yourself if you have data science expertise in your organization, or by buying from a predictive scoring vendor), you need to make sure your own internal data is ready by addressing the following:

  • Data cleansing: update or delete dated and incorrect CRM and MAP records. Identify and address bad data such as duplicates (including across object types and across your various platforms), bad emails, bounced emails, bad names, bad titles, bad company names, etc.
  • Data normalization/standardization: classify and consolidate prospect data into standard formats. Some examples of data fields that need to be standardized are Company (e.g., “IBM”, “IBM Corp.”, “International Business Machines” are all the same company), Industry, Annual Revenue, Number of Employees, etc. If a particular data field is a picklist type but somehow records got uploaded into your systems with values that aren’t consistent with what’s in your picklist, then these need to be updated and mapped accordingly.
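The two steps above can be sketched in a few lines. This is a simplified illustration with hypothetical field names, a tiny company alias table, and a deliberately loose email check. Real cleansing also needs fuzzy matching and cross-object deduplication, which this sketch doesn't attempt.

```python
import re

# Hypothetical alias table for company standardization.
COMPANY_ALIASES = {
    "ibm": "IBM",
    "ibm corp": "IBM",
    "international business machines": "IBM",
}
# Loose sanity check only; it does not prove that a mailbox exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_company(raw):
    """Map known variations to one name; leave unknown companies as entered."""
    key = " ".join(re.sub(r"[^a-z0-9 ]", "", raw.lower()).split())
    return COMPANY_ALIASES.get(key, raw.strip())

def clean_records(records):
    """Drop records with bad or duplicate emails and standardize the rest."""
    seen, cleaned, rejected = set(), [], []
    for record in records:
        email = record.get("email", "").strip().lower()
        if not EMAIL_PATTERN.match(email) or email in seen:
            rejected.append(record)
            continue
        seen.add(email)
        company = normalize_company(record.get("company", ""))
        cleaned.append({**record, "email": email, "company": company})
    return cleaned, rejected

records = [
    {"email": "Ana@Example.com", "company": "IBM Corp."},
    {"email": "ana@example.com ", "company": "International Business Machines"},
    {"email": "not-an-email", "company": "Acme"},
]
cleaned, rejected = clean_records(records)
print(cleaned)
print(len(rejected), "records need review")
```

Rejected records are returned rather than deleted so that someone can review them before anything is permanently removed from the CRM.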

The result of this process is a highly accurate and standardized database of your records, which then becomes a solid foundation for predictive lead scoring modeling. Some of the predictive technology vendors include data cleansing, normalization and standardization as part of their offering.

Exogenous Data From The Web Provides Useful Information

Now while the data that resides in a CRM or marketing automation tool is important to use in predictive models, it only gives you a limited view of the profile of your true buyers, as previously described. What’s implicit in the definition of predictive lead scoring above is the requirement of having access to data that isn’t normally captured within typical CRM or marketing automation systems. 

The amount of data that’s generated across the web continues to grow every second of every day. For B2B, there exist clues in this data that can reliably point to which leads are a better fit than others and are more likely to close. Most of this data comes in the form of unstructured data, which is scattered around in millions of websites, blogs, social networks, job boards, news sites and online databases. 

Predictive scoring vendors on the market today typically employ data crawling and data mining techniques to gather these data points from the web and look for patterns and clusters in these large data sets, thereby bringing structure to unstructured data and helping to turn them into meaningful insights. These exogenous potential buying signals that can be found about your customers may include:

  • Technology stack in use
  • Website source code
  • Job boards and press releases
  • Ad spend and ad channels used
  • Intent data
  • And many more data points from all over the web

So why is all of this exogenous data useful? Predictive lead scoring technologies that monitor the online behavior of millions of B2B companies and decision-makers can identify a digital footprint that can predict their fit and tendency to buy products and services. For example, companies’ websites and social networks have a large list of profiles, each containing a person’s job title. This information may include public information on financials, staff, hiring, technologies used, marketing and sales tactics, website source code as well as semantic analysis of on-page text, and even purchase intent signals.

AI & Machine Learning Uncover Insights From Data

It would be hard to gain meaningful insights by looking at one profile after another. Data mining helps extract this data, and machine learning algorithms structure and standardize it so that it can be used to build meaningful prediction models. Storing and processing this data on a regular basis requires a set of technologies and processes that are different from those used with traditional data processing applications. That is one of the benefits of buying a predictive scoring technology rather than relying solely on your own internal data science and analytics.

Once you have the data from your marketing automation platform or CRM ready to go, you combine it with all the potential buying signals from the exogenous data and use predictive analytic machines to do all the number crunching and statistical analyses for you.  This predictive modeling process is where the magic happens.
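As a rough idea of what "combining" looks like in practice, the sketch below joins cleaned CRM records with externally gathered signals using the email domain as the key. The field names and signal values are hypothetical, and matching on domain alone is a simplifying assumption (it breaks down for personal email addresses, for example).

```python
def merge_signals(crm_records, web_signals):
    """Attach exogenous signals to each CRM record, keyed by email domain."""
    merged = []
    for record in crm_records:
        domain = record["email"].split("@")[-1].lower()
        # Records without a match keep only their internal fields.
        merged.append({**record, "domain": domain, **web_signals.get(domain, {})})
    return merged

crm_records = [
    {"email": "Ana@Example.com", "title": "VP Marketing"},
    {"email": "bo@unknown.org", "title": "Analyst"},
]
# Hypothetical signals a vendor might gather from the web.
web_signals = {
    "example.com": {"tech_stack": ["marketing automation"], "open_sdr_roles": 3},
}

merged = merge_signals(crm_records, web_signals)
for row in merged:
    print(row)
```

The merged rows are what get handed to the predictive model, so each lead carries both what you already know and what the web says about the company.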

All of this data is fed into predictive models utilizing machine learning to uncover the data points that actually matter for your business and have the highest correlation across your customers. An absolutely fantastic explanation of how machine learning works to identify trends can be found at the R2D3 website. These commonly-shared data points or traits will then give you a good, full picture of your ideal customer profile. In other words, the model will identify the set of data indicators that make your customers unique compared to all of the other leads or prospects in your database. 
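One simple way to picture "the indicators that make your customers unique" is lift: how much more common an indicator is among customers than among everyone else. The sketch below uses fabricated indicator names and records, and real predictive models rely on far more sophisticated techniques than this single ratio.

```python
def indicator_lift(customers, non_customers):
    """Rank indicators by (share among customers) / (share among non-customers)."""
    indicators = set().union(*customers, *non_customers)
    lifts = {}
    for indicator in indicators:
        customer_rate = sum(indicator in c for c in customers) / len(customers)
        other_rate = sum(indicator in c for c in non_customers) / len(non_customers)
        # An indicator never seen among non-customers gets infinite lift.
        lifts[indicator] = customer_rate / other_rate if other_rate else float("inf")
    return dict(sorted(lifts.items(), key=lambda item: item[1], reverse=True))

# Fabricated example: each set holds the indicators observed for one account.
customers = [
    {"uses_marketing_automation", "hiring_sdrs"},
    {"uses_marketing_automation", "runs_paid_search"},
    {"uses_marketing_automation", "hiring_sdrs", "runs_paid_search"},
    {"hiring_sdrs"},
]
non_customers = [
    {"runs_paid_search", "hiring_sdrs"},
    {"uses_marketing_automation", "runs_paid_search"},
    {"runs_paid_search"},
    {"hiring_sdrs"},
]

lifts = indicator_lift(customers, non_customers)
for indicator, lift in lifts.items():
    print(f"{indicator}: {lift:.2f}")
```

A lift well above 1 suggests the indicator distinguishes customers, while a lift below 1 (paid search, in this made-up data) suggests it is actually more common among leads that didn't buy.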

By evaluating every prospect in your database as well as those coming inbound, any data indicators shared by the prospect with the ideal customer are identified by the predictive model. This results in a predictive score that shows how similar the prospect is to your customers – the more data indicators shared, the higher the predictive score. You can therefore match your leads up against this profile – the closer the match, the higher the predictive lead score, which means the lead looks very similar to customers who purchased from you before. 
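The matching idea can be sketched as a weighted overlap between a lead's indicators and the ideal customer profile. The indicator names and weights below are hypothetical stand-ins for whatever a predictive model would actually surface, and the 0-100 scale is just a convenient convention.

```python
def predictive_score(lead_indicators, ideal_profile):
    """Score 0-100: the weighted share of ideal-profile indicators the lead has."""
    total_weight = sum(ideal_profile.values())
    matched_weight = sum(
        weight for indicator, weight in ideal_profile.items()
        if indicator in lead_indicators
    )
    return round(100 * matched_weight / total_weight)

# Hypothetical profile: indicator -> importance weight from the model.
ideal_profile = {
    "uses_marketing_automation": 3,
    "works_with_agency": 2,
    "hiring_sdrs": 1,
}

leads = {
    "lead_a": {"uses_marketing_automation", "works_with_agency", "hiring_sdrs"},
    "lead_b": {"uses_marketing_automation", "runs_paid_search"},
    "lead_c": {"runs_paid_search"},
}

scores = {name: predictive_score(found, ideal_profile) for name, found in leads.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Sorting by this score gives sales a prioritized list, with the closest look-alikes to existing customers at the top.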

Benefits Of Predictive Scoring For B2B Demand Generation

This is a somewhat simplified explanation of how predictive lead scoring works, but the end benefit should be clear – lead scoring that truly identifies the best-fit prospects most likely to purchase or convert.

You can accomplish several goals with predictive scoring, including:

  • Uncovering Hidden Gems In Your House List: Once you identify your ideal customer profile, you can predictively score your entire prospect database to find any “low-hanging fruit” – these are the untouched or recycled leads that look similar to your customers but didn't make it all the way through the sales funnel.
  • Better Funnel Efficiency & Lead Velocity: By helping your sales reps more accurately identify the higher quality inbound leads, they’ll be able to prioritize lead follow up more efficiently and start with working on the leads most likely to respond. With predictive scoring, marketing will be able to provide sales with a better idea of who are the best bets. Where should they be spending their time? Where should they be going the extra mile, because they have a higher chance of converting the lead?
  • Data-Driven Persona Development & Fine-Tuned Segmentation: With knowledge of your ideal customer profile and the data elements that are part of its DNA, product and content marketers will be able to develop richer personas for better messaging, and marketing operations will be able to more effectively segment lists for targeted campaigns, relevant nurture tracks, and a more personalized customer experience.
  • Sales Enablement: By providing the context behind the predictive score and making the exogenous data for each lead transparent and available directly within the CRM, sales will be armed with greater intelligence for each lead and understand why that lead will most likely buy.
  • Find Cross-sell or Upsell Opportunities: If you’re able to create multiple predictive models (one for each of your product lines, for example), then you’re able to analyze a lead against each of these models to figure out which products the lead will most likely buy.

Predictive lead scoring resolves the challenges of traditional lead scoring by both expanding the data and improving results. It uses the power of machine learning in order to prioritize leads and focus the time, money and effort of marketing and sales on driving more revenue. 

So rather than building your scoring model on potentially arbitrary data points, predictive lead scoring will tell you what your ideal prospects should look like according to characteristics of your customers. It will show you how to build your lead scoring on the data points that actually matter, so you no longer have to guess. After all, the goal of marketing isn’t simply to throw leads over the wall to sales – it’s to partner with sales to generate and identify sales opportunities and ultimately grow revenue.

Did you like this post and find it informative? If so, I would really appreciate it if you would share it with your network (i.e., LinkedIn, Twitter, Facebook, telling your mother…) using the share links on this page. Feel free to mention me @tones810 to share your thoughts on this topic with me – I’m always open to hearing other points of view and new ways of doing things!

Also, consider subscribing below if you’d like to receive updates on new posts and articles as they get published. Thanks for taking this journey with me!

Tony Yang

Tony is a long-time marketer with over 17 years experience in B2B SaaS companies. While he started his marketing career at IBM, for the past 14 years he's been leading marketing and revenue operations at various startups, including Mintigo (acquired by Anaplan), Qordoba (rebranded to, and Conversion Logic (acquired by VideoAmp). He's been recognized as a thought leader and speaker on various topics such as ABM, PLG, marketing ops and revops, growth, and B2B marketing at past events including GTM Summit, FlipMyFunnel and SiriusDecisions. In addition, he is the Head of Growth at Mucker Capital and also serves as a coach and mentor at several startup accelerator programs.

