Scoring Analysis
Last updated
Scoring Analysis allows you to automatically derive the optimal set of scoring rules to prioritise your dataset so that companies similar to a certain "Target" set have the highest scores and those that are not like them have the lowest.
To do this, we perform a 'correlation analysis' in which we compare every GoodFit datapoint of the target companies with those of the other companies in your GoodFit account. Based on how frequently certain field values occur in each set of companies, we can work out which factors correlate more strongly with the 'target' set than with the other companies, and use that to derive scoring rules automatically.
Video run-through
To get started, navigate to scoring -> scoring analysis.
We allow you to create multiple scoring analysis reports, for example if you want to try different 'target sets', have different scoring sets per team or region, or simply want to repeat the analysis periodically and track how the results evolve over time.
You can create a new report by clicking "New analysis report". This presents you with a screen where you can name the report and paste a set of domains corresponding to the "Target" set of accounts. These target accounts are the ones you want your scoring to rank the highest, e.g. won accounts or accounts that progressed far in the sales pipeline. The exact definition is flexible and may vary from business to business, but we recommend starting with your won accounts over a recent timeframe. Ideally the set should contain in the order of 50 account domains, and we don't recommend using fewer than 20.
Note that we normalise the domains, so it's OK to paste web addresses directly. Once you click "Generate report", the system starts calculating the report. This can take a few minutes.
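To illustrate what normalising pasted web addresses can look like, here is a minimal sketch. The exact rules GoodFit applies are not described here, so the function name `normalise_domain` and the choices it makes (lowercasing, dropping the scheme, path and a leading "www.") are assumptions for illustration only.

```python
from urllib.parse import urlsplit

def normalise_domain(raw: str) -> str:
    """Reduce a pasted web address to a bare lowercase domain (illustrative rules only)."""
    text = raw.strip().lower()
    # urlsplit only recognises the host when a scheme is present, so add one if missing.
    if "://" not in text:
        text = "http://" + text
    host = urlsplit(text).hostname or ""
    # Assumption: a leading "www." is dropped so both spellings map to the same account.
    if host.startswith("www."):
        host = host[4:]
    return host

pasted = [
    "https://www.Example.com/about?ref=1",
    "example.com",
    "  http://shop.example.org/  ",
]
# De-duplicate after normalising, since several pasted addresses may be the same account.
domains = sorted({normalise_domain(p) for p in pasted})
print(domains)
```

In this sketch the first two pasted entries collapse to the same domain, which is why pasting full web addresses is harmless.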
Once the report is generated, it displays a list of the most highly impactful data fields (i.e. the data points that most influence the differentiation between the target set and the others). Within each field, it shows a set of values for that field, how each individual value correlates with the target set, and a suggested score for that value.
You can click into each field and look at how the values correlate with the target set versus the other companies, and therefore how impactful each value is to scoring. Hovering over the suggested score gives a breakdown of the frequency analysis of that value between the target set and the other companies.
For example, in this case we can see that this value ("factor") was present in 71% of the target "good" companies, but in only 8.6% of all others. Because this factor is much more common in the 'target' companies, we can reason that it is an important scoring rule, and we have therefore suggested a score of 10 for this value. We also provide a confidence score, which is an estimate of how confident we are that this is an impactful rule; it is based on how different the frequencies are and the number of times the value occurs. See below for a more detailed explanation.
We automatically select the most impactful rules. However, you can deselect some if you disagree, and select others if you think they should be included.
You can preview the scoring by clicking "Preview scoring" which shows the best and worst companies in your dataset based on those scoring rules. You should expect to see your target companies as the best fit along with similar companies.
Once you are happy with the selected scoring rules, you can create a new scoring field from it, which will add a new column to your GoodFit dataset with scores calculated using the suggested rules. Note that you can modify the rules and values as you see fit.
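As a rough illustration of how a set of selected rules can turn into a per-company score and a best-to-worst ranking, consider the sketch below. The rule format, the sample companies and the field names are hypothetical, and summing the points of all matching rules is an assumption made for this example rather than a description of how GoodFit combines rules.

```python
# Hypothetical rule format: (field, value) -> points. Not GoodFit's internal representation.
rules = {
    ("country", "UK"): 10,
    ("uses_crm", True): 5,
    ("industry", "Retail"): -5,
}

# Made-up companies for illustration only.
companies = [
    {"domain": "a.example", "country": "UK", "uses_crm": True, "industry": "Software"},
    {"domain": "b.example", "country": "US", "uses_crm": False, "industry": "Retail"},
    {"domain": "c.example", "country": "UK", "uses_crm": False, "industry": "Retail"},
]

def score(company: dict) -> int:
    # Assumption: a company's score is the sum of the points of every rule it matches.
    return sum(
        points
        for (field, value), points in rules.items()
        if company.get(field) == value
    )

# Rank from best fit to worst fit, as a scoring preview would.
ranked = sorted(companies, key=score, reverse=True)
for company in ranked:
    print(company["domain"], score(company))
```

Deselecting a rule corresponds to removing its entry from `rules`, and editing a suggested score corresponds to changing the points value.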
We offer two modes of Scoring Analysis:
"vs All". In this case we perform the frequency analysis comparing the target "good" accounts to all other companies in your GoodFit dataset.
"vs Other". In this case we perform the frequency analysis comparing the target "good" accounts to a supplied set of other "bad" accounts.
In most cases we recommend "vs All", as the purpose of scoring is to create a score that works for any company in your dataset. Sometimes, however, you may wish to perform a more detailed analysis for a specific part of the pipeline. In that case, care must be taken to interpret the findings in the context of those settings.
For example, it may be useful to analyse the set of "won" accounts versus the "worked but lost" accounts for a specific period. However, this is only really useful for telling what's different about these two sets of companies, and it is unlikely to make a great scoring field for your dataset as a whole.
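The difference between the two modes comes down to which companies end up in the comparison set. The sketch below shows one way to express that; the function name `split_sets` and the decision to ignore pasted domains that are not in the dataset are assumptions for illustration.

```python
def split_sets(all_domains, target_domains, other_domains=None):
    """Return (good, other) sets of domains for a scoring analysis.

    other_domains=None models "vs All"; a supplied list models "vs Other".
    """
    dataset = set(all_domains)
    # Assumption: targets that are not in the dataset are simply ignored.
    good = set(target_domains) & dataset
    if other_domains is None:
        # "vs All": compare against every other company in the dataset.
        other = dataset - good
    else:
        # "vs Other": compare only against the supplied "bad" accounts.
        other = (set(other_domains) & dataset) - good
    return good, other

dataset = ["a.example", "b.example", "c.example", "d.example", "e.example"]
won = ["a.example", "b.example"]
lost = ["c.example"]

good_all, other_all = split_sets(dataset, won)
good_vs, other_vs = split_sets(dataset, won, lost)
print(sorted(other_all))  # every non-target company
print(sorted(other_vs))   # only the supplied comparison accounts
```

With "vs Other", companies that are in neither list play no part in the analysis, which is why the resulting rules describe the difference between those two groups rather than the dataset as a whole.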
We begin by dividing your GoodFit dataset into two 'sets': the 'good' set and the 'other' set. We then iterate over each data field of type "picklist", "multi picklist", "boolean", "number", "percentage" or "currency". For each value within each field (or each range of values for numeric fields, see below) we perform a frequency analysis of that field and value in the good and other sets.
For example, country=UK may occur in 40 out of 50 "good" companies, giving a frequency of 80%, whereas in the "other" set it may only occur in 300 out of 1000, giving 30%.
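The frequency calculation in this example can be sketched in a few lines. The helper name `value_frequencies` is hypothetical, and the non-UK companies are filled in with a placeholder country purely so that the set sizes match the 50 and 1000 quoted above.

```python
from collections import Counter

def value_frequencies(companies, field):
    """Fraction of companies in the set in which each value of `field` occurs."""
    counts = Counter(c[field] for c in companies if c.get(field) is not None)
    total = len(companies)
    return {value: n / total for value, n in counts.items()}

# 40 of 50 "good" companies and 300 of 1000 "other" companies are in the UK.
# "US" is placeholder data to pad each set to the right size.
good = [{"country": "UK"}] * 40 + [{"country": "US"}] * 10
other = [{"country": "UK"}] * 300 + [{"country": "US"}] * 700

good_freq = value_frequencies(good, "country")
other_freq = value_frequencies(other, "country")
print(f"UK: good {good_freq['UK']:.0%}, other {other_freq['UK']:.0%}")
# prints "UK: good 80%, other 30%"
```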
We use a two-proportion Z-test to assess the difference between these two frequencies, given the number of examples in each set, and derive the confidence that the two frequencies are genuinely different. This gives us the confidence score, and we use the confidence together with the difference between the frequencies to derive the suggested score value.
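For readers who want to see the statistics, here is a minimal sketch of a standard pooled two-proportion Z-test applied to the country=UK example (40 of 50 versus 300 of 1000). It uses only the Python standard library. Treating one minus the two-sided p-value as a "confidence" figure is an assumption for illustration; how GoodFit maps the test result onto its confidence score and suggested score is not specified here.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test.

    x1 of n1 companies in the first set have the value, x2 of n2 in the second.
    Returns (z, two_sided_p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis that both sets share one frequency.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # The value occurs in every company or in none: no evidence of a difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z_test(40, 50, 300, 1000)
# Assumption: express confidence as 1 - p, purely for illustration.
confidence = 1 - p_value
print(f"z = {z:.2f}, p = {p_value:.2e}, confidence = {confidence:.4f}")
```

For these counts z comes out at roughly 7.4, so the difference between 80% and 30% is very unlikely to be due to chance. The same 50-point gap on a much smaller set would give a smaller z, which is why the number of occurrences matters as well as the size of the difference.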
Note that we treat data fields differently depending on type:
Picklist and Boolean. We consider each company as one example of this value.
Multi-picklist. We iterate the list of values per company and consider each occurrence as a count of this value.
Numeric fields. We apply a statistical bucketing algorithm to spread the values evenly over a set of ranges, so we can work out the range that each company's value falls into and count these ranges as discrete values.
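The three counting strategies above can be sketched as follows. The bucketing algorithm itself is not specified here, so this example assumes equal-frequency (quantile) buckets, and all function names, field names and sample data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def bucket_edges(values, buckets=4):
    """Cut points splitting `values` into roughly equal-sized ranges (assumed: quantiles)."""
    return quantiles(values, n=buckets)

def bucket_label(value, edges):
    """Index of the range that `value` falls into (0 is the lowest range)."""
    return bisect_right(edges, value)

def count_values(companies, field, field_type, edges=None):
    """Count occurrences of each value (or numeric range) of `field`."""
    counts = Counter()
    for company in companies:
        raw = company.get(field)
        if raw is None:
            continue
        if field_type == "multi_picklist":
            counts.update(raw)  # every listed value counts as one occurrence
        elif field_type == "numeric":
            counts[bucket_label(raw, edges)] += 1  # count the range, not the raw number
        else:  # picklist or boolean: one example per company
            counts[raw] += 1
    return counts

# Made-up sample data for illustration only.
companies = [
    {"employees": n, "tech": tech}
    for n, tech in zip(
        [5, 12, 40, 75, 120, 300, 900, 2500],
        [["Salesforce"], ["HubSpot"], ["Salesforce", "Slack"], [], ["Slack"], [], [], []],
    )
]

edges = bucket_edges([c["employees"] for c in companies])
numeric_counts = count_values(companies, "employees", "numeric", edges)
tech_counts = count_values(companies, "tech", "multi_picklist")
print(edges, dict(numeric_counts), dict(tech_counts))
```

Once numeric values have been mapped to ranges, each range can be fed through the same frequency analysis as a picklist value.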
Note that we don't consider free text values as they tend to have unique values and have little impact on correlation.