How useful is CVSS Score in CVE triage - The CVSS who cried wolf

TLDR: You cannot effectively prioritize issues based on CVSS score alone without taking considerable risk. Apart from the practically non-existent Low category, every CVSS severity category contains numerous exploited vulnerabilities.

Vulnerability management has been around long enough to have its own mythology. There are a number of “truths” that are accepted without much questioning. These include such things as: Patching issues within 5-8 days is an effective timeline. Security researchers publishing exploits is good for Internet security. Triaging based on CVSS Scores is an effective strategy. So, let’s start with the easiest one to measure: the CVSS Scores.

What is CVSS?

The Common Vulnerability Scoring System (CVSS) is a way to categorize security issues and score them based on severity. It is created and maintained by FIRST, and it is used and made popular by NVD, which is the de facto CVE information source for most of us (including inthewild).

Estimating (aka guessing) the severity of a vulnerability is hard. CVSS does a good job of breaking it down into the most important drivers (accessibility of the attack vector, required privileges, necessity of user interaction…), and the formula that turns them into a metric is already at version 3. However, it is probably clear to everybody who has had to calculate a score at least once that it is often inaccurate. That said, it is widely accepted as the least bad option for rating vulnerabilities.

You should expect to see that the earlier CVSS v2 is still often used, but CVSS v3 is understood to be more precise, so we used CVSS v3 scores in our research.

It is also important to note that it is the provider who does the categorization needed for the score calculation, and providers often have different interpretations or even different levels of experience in doing this.

In practice, if your vulnerability management tool provides a Risk Rating (Critical/High/Medium/Low) it is either some version of CVSS or it takes CVSS into account with a very high weight.
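To make the link between score and rating concrete, here is a minimal sketch of how a CVSS v3 base score maps to the qualitative severity labels, using the bands from the CVSS v3 specification. The function name `cvss_v3_severity` is our own choice for illustration; your tool may apply its own adjustments on top of this.

```python
def cvss_v3_severity(base_score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating.

    Bands follow the CVSS v3 qualitative severity rating scale:
    0.0 None, 0.1-3.9 Low, 4.0-6.9 Medium, 7.0-8.9 High, 9.0-10.0 Critical.
    """
    if not 0.0 <= base_score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {base_score}")
    if base_score == 0.0:
        return "None"
    if base_score < 4.0:
        return "Low"
    if base_score < 7.0:
        return "Medium"
    if base_score < 9.0:
        return "High"
    return "Critical"


if __name__ == "__main__":
    for score in (9.8, 7.5, 5.3, 2.1):
        print(score, cvss_v3_severity(score))
```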

What would a good measure look like?

To simplify things, let’s narrow the goal of vulnerability management to “avoid being hacked by exploitation of known vulnerabilities” and the goal of a triage measure to prioritizing the issues that are more urgent to resolve based on their likelihood of being exploited.

If CVSS is such a measure, we can backtest on past data how well it performs:

Prediction: Issues that have higher CVSS scores should be more likely to be exploited.
Separation: The most important thing we are trying to find out during prioritization is which issues are less important. It is easy to have 100% accuracy if somebody says everything is critical. In fact, one would intuitively expect higher severity issues to be less frequent.
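The two criteria above can be sketched as a small backtest. This is only an illustration under our own assumptions: the function name `backtest`, the `(severity, exploited)` record shape and the toy records are made up for the example and are not our research data. For each severity it reports the share of all vulnerabilities in that category (separation) and the fraction of that category that was exploited (prediction).

```python
from collections import Counter

SEVERITIES = ["Critical", "High", "Medium", "Low"]


def backtest(records):
    """Return {severity: (share_of_all, exploitation_rate)}.

    `records` is an iterable of (severity, was_exploited) pairs.
    share_of_all      -> separation: how much of the backlog the label covers
    exploitation_rate -> prediction: how often the label is actually exploited
    """
    totals = Counter()
    exploited = Counter()
    for severity, was_exploited in records:
        totals[severity] += 1
        if was_exploited:
            exploited[severity] += 1

    n = sum(totals.values())
    result = {}
    for severity in SEVERITIES:
        share = totals[severity] / n if n else 0.0
        rate = exploited[severity] / totals[severity] if totals[severity] else 0.0
        result[severity] = (share, rate)
    return result


if __name__ == "__main__":
    # Made-up toy records, NOT real exploitation data.
    sample = [
        ("Critical", True), ("Critical", False),
        ("High", True), ("High", False), ("High", False), ("High", False),
        ("Medium", True), ("Medium", False), ("Medium", False),
        ("Low", False),
    ]
    for severity, (share, rate) in backtest(sample).items():
        print(f"{severity:8s} share={share:.0%} exploited={rate:.0%}")
```

A measure that works for triage would show a small share and a high exploitation rate for Critical, and the opposite for Medium and Low.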

Setting expectations: call me an optimist, but in a utopia most exploited vulnerabilities would get Critical ratings (and a high percentage of Criticals would actually get exploited), and there would be maybe a few with a High rating that would still get exploited. One could safely deprioritize Medium and Low severity vulnerabilities, which would in fact be the great majority of vulnerabilities.

We are using the large amount of CVE exploitation data in our database to verify assumptions and methods used in vulnerability management. As always we are limiting the research to the last 5 years of data to make sure that we get recent but not overly noisy trends.

Disclaimers and Caveats

For context, however, it must be said that the scope of CVSS goes beyond CVEs. It is used generally as a risk rating system for vulnerabilities that are found, for example, in custom built software. Also, it is the industry that decided to use CVSS as a way to prioritize and predict. CVSS by itself is a categorization system, and it is not necessarily an issue with the system if it does not work for the purpose we decided to use it for.

Some observations about the data

A large number of the non-critical issues exploited in the wild are used for pivoting or privilege escalation. It can be argued that if one avoids critical issues (most of the initial access), pivoting/escalation will not be possible.
Similarly, most of the lower scored issues are part of exploit chains, e.g. a kernel memory leak that is used during memory corruption.

Some marginal observations on CVSS

The NVD record (and the CVSS score) of a vulnerability often comes days after the vulnerability has been published. If your security model relies on patching within single digit days, it is questionable to what extent you can rely on NVD or CVSS data.
Sometimes even CVSS Base Scores are changed over time, and we observed that in some cases they are seemingly changed to reflect exploitation in the wild. Based on the review of a sample, we expect this not to have a significant effect on our research, so we used current CVSS scores.

How Good is CVSS?

Over 50% of vulnerabilities that have a CVSS score are actually either High or Critical severity, so deprioritizing Mediums does not help all that much, and the category Low in reality does not exist.
Even though Critical vulnerabilities are exploited disproportionately more often than others, most of the exploited vulnerabilities are in fact Highs. Assigning lower priority (and a less stringent SLA) to everything below Critical must be understood as an ROI-based decision: one is taking a considerable amount of risk by potentially not patching (on time) 63.5% of exploited vulnerabilities.
Even when disregarding or deprioritizing only Mediums, one should be aware that over 10% of actually exploited vulnerabilities should then be expected to remain unpatched.

We are not suggesting these are unreasonable tradeoffs, but the reality is clearly very different from our optimistic expectation. Based on CVSS rating, it is not possible to disregard any severity category without a very real risk of disregarding actually exploited vulnerabilities.
We have also checked, and the percentages are relatively stable over time: we are neither “getting better” at picking out important vulnerabilities, nor is there a tendency to inflate the labels.
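If you want to run the same check against your own data, the risk taken by a given cutoff can be sketched as below. The helper name `missed_share` and the counts are hypothetical and invented purely for illustration; they are not the figures from our research. The function returns the share of exploited vulnerabilities rated below the chosen cutoff, i.e. the ones a "patch only this severity and above" policy would leave for later.

```python
# Severities from least to most severe.
ORDER = ["Low", "Medium", "High", "Critical"]


def missed_share(exploited_counts, cutoff):
    """Share of exploited vulnerabilities rated below `cutoff`.

    `exploited_counts` maps severity -> number of exploited vulnerabilities.
    `cutoff` is the lowest severity that still gets prioritized.
    """
    if cutoff not in ORDER:
        raise ValueError(f"unknown severity: {cutoff}")
    total = sum(exploited_counts.values())
    if total == 0:
        return 0.0
    below_cutoff = ORDER[:ORDER.index(cutoff)]
    missed = sum(exploited_counts.get(severity, 0) for severity in below_cutoff)
    return missed / total


if __name__ == "__main__":
    # Invented counts for illustration only, NOT our research data.
    counts = {"Critical": 40, "High": 45, "Medium": 14, "Low": 1}
    for cutoff in ("Critical", "High", "Medium"):
        share = missed_share(counts, cutoff)
        print(f"Prioritizing {cutoff} and above defers {share:.1%} of exploited issues")
```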

Is there a better option?

No, not really. However, if the industry just assumes that CVSS Scores are useful in CVE triage and calls methods based on that “best practice”, it will never improve and will even make people falsely think this problem of triage is “solved”.

In fact, if anything, we are proponents of automated patching instead of triage for most use-cases. One can contrast the last decade of effort in vulnerability scanning and CVE triage with the success of automated patching in WordPress, Windows or PaaS solutions. Our research aims to show that vulnerability triage is not nearly as clear-cut as one might think. In fact, based on the work that has already gone into it, maybe it is not even easier to solve than the issues of automated patching. There is an argument that the industry could focus more of its effort on getting rid of the work instead of finding better ways to prioritize vulnerabilities.

That said, for the sake of completeness, there are interesting new options for conducting vulnerability triage in both academia and the industry. These include looking at information like Twitter trending and sentiment analysis, or unifying and crowdsourcing triage.

Also we are planning to publish our own ultimate guide on patching, because the reality is that you still need to do it yourself.