Discover how to automate vulnerability detection of your assets, so you can shift security left and protect your infrastructure before a vulnerability becomes a security incident.
In this article, we’ll cover how to query the NIST vulnerability database, create a CPE Name for a resource, and automate the fetching of vulnerabilities with a script.
Vulnerabilities as an Attack Vector
Software developers aren’t perfect; sometimes their code misses an edge case that makes your infrastructure vulnerable.
Take the Apple goto fail vulnerability from 2014: a single duplicated line enabled man-in-the-middle attacks on SSL/TLS communications involving Apple devices:
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail; // This executes outside the if
These errors are common and silly, but still dangerous. Moreover, the more complex your code gets and the more resources you have, the bigger your attack surface. That’s why it’s so important to install security updates.
Vulnerabilities are an attacker’s backdoor and can provide access in many ways. They allow attackers to disrupt your service, obtain information about your infrastructure (so they can exploit other vulnerabilities), and, in the worst cases, execute code that gives them remote control. Once they are in, attackers can move laterally to other resources, accessing or encrypting your business data, using your resources for crypto mining, or altering your production line ever so slightly, so that you don’t notice, but enough to make production fail.
Nowadays attacks tend to start as automated processes executed by botnets that look for vulnerable services and automatically hack them.
We often aren’t aware of how much information our services share, like a web server that shares its version number. It didn’t take long for me to find a public web server like that:
$ curl -I https://▋▋▋.▋▋▋▋▋▋▋▋▋.net
HTTP/2 301
server: nginx/1.18.0 (Ubuntu) ← ⚠️
date: Fri, 04 Oct 2024 08:57:20 GMT
content-type: text/html
content-length: 178
location: https://▋▋▋.▋▋▋▋▋▋▋▋▋.net/home/
Now that we know this server is running nginx/1.18.0 (Ubuntu), we can look for vulnerabilities affecting this software and launch an attack.
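To make the point concrete, here is a minimal sketch of how little effort it takes to turn such a banner into a product and version pair, which is exactly what you need to start searching for vulnerabilities. The function name parse_server_header and the regular expression are illustrative assumptions, not part of any tool mentioned in this article, and real banners vary a lot more than this pattern covers.

```python
import re

# Matches "product/version" at the start of a Server header value.
# The version part is optional because many servers omit it.
SERVER_RE = re.compile(r"^(?P<product>[^/\s]+)(?:/(?P<version>\S+))?")


def parse_server_header(value):
    """Return (product, version) from a Server header value, or None if it is empty."""
    match = SERVER_RE.match(value.strip())
    if match is None:
        return None
    return match.group("product"), match.group("version")


print(parse_server_header("nginx/1.18.0 (Ubuntu)"))  # ('nginx', '1.18.0')
```

The same trick is useful defensively: running it against your own services shows what an outsider can learn without any credentials.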
Querying for Vulnerabilities in Your Device
Where can you search for the vulnerabilities in a given software? In a vulnerability database (ba dum tss).
When a vulnerability is found, it is classified under the Common Vulnerabilities and Exposures (CVE) system. This system standardizes how vulnerabilities are named (CVE-YEAR-ID) and how relevant information about them (description, metrics, etc.) is presented.
Some organizations, such as NIST with its National Vulnerability Database (NVD), maintain CVE databases where people can look up vulnerabilities and learn how to mitigate them.
This database has a search tool that allows you to search by software and version. You only need to express them as a CPE (Common Platform Enumeration) name, a structured naming scheme for information technology systems, software, and packages.
When in doubt about what is the correct CPE for your resource, you can always ask Google:
For the nginx service from the example earlier, the CPE would be cpe:2.3:a:f5:nginx:1.18.0:*:*:*:*:*:*:*.
Entering this string in the search tool returns three high-severity vulnerabilities that could be used to impact that web server:
Getting closer to an industrial setup, we’ll take a Siemens ‘SIMATIC S7-1500’ PLC as an example:
This particular PLC runs firmware version 0.5.1, so its CPE would be cpe:2.3:o:siemens:simatic_s7-1500_cpu_firmware:0.5.1:*:*:*:*:*:*:*.
Querying the NIST database, we see 13 vulnerabilities:
Nice! Now we can go through this list and apply mitigation actions where needed.
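Both CPE names above follow the same layout: the cpe:2.3 prefix, then eleven colon-separated attributes, of which we only fill in the first four (part, vendor, product, version) and leave the rest as the * wildcard. The part is a for applications, h for hardware, and o for operating systems. A minimal sketch of that construction follows; the helper name build_cpe is an assumption for illustration, and it does not escape special characters, so it only suits simple names like the ones in this article.

```python
# Attribute order of a CPE 2.3 formatted string, after the "cpe:2.3" prefix.
CPE_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def build_cpe(part, vendor, product, version):
    """Build a full CPE 2.3 name, padding the unknown attributes with wildcards."""
    if part not in ("a", "h", "o"):
        raise ValueError("part must be 'a', 'h' or 'o'")
    known = [part, vendor, product, version]
    wildcards = ["*"] * (len(CPE_FIELDS) - len(known))
    return ":".join(["cpe:2.3"] + known + wildcards)


print(build_cpe("a", "f5", "nginx", "1.18.0"))
# cpe:2.3:a:f5:nginx:1.18.0:*:*:*:*:*:*:*
```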
Querying the NIST NVD API
However, repeating this process one resource at a time is time-consuming, and simply not feasible when we have hundreds of devices.
Luckily, NIST offers an API to query this database and automate the vulnerability discovery process.
Using the API is incredibly simple. We can perform the same queries we did before from the command line with a program like curl:
$ curl "https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=cpe:2.3:a:f5:nginx:1.18.0" > nginx-1-18-0.json
$ curl "https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=cpe:2.3:o:siemens:simatic_s7-1500_cpu_firmware:0.5.1" > simatic_s7-1500.json
The response is a rather intuitive and comprehensive JSON payload:
$ cat "nginx-1-18-0.json" | jq .
{
"resultsPerPage": 3,
"startIndex": 0,
"totalResults": 3,
"format": "NVD_CVE",
"version": "2.0",
"timestamp": "2024-10-07T10:47:05.540",
"vulnerabilities": [
{
"cve": {
"id": "CVE-2021-23017",
"sourceIdentifier": "f5sirt@f5.com",
"published": "2021-06-01T13:15:07.853",
"lastModified": "2023-11-07T03:30:29.880",
"vulnStatus": "Modified",
"cveTags": [],
"descriptions": [
{
"lang": "en",
"value": "A security issue in nginx resolver was identified, which might allow an attacker who is able to forge UDP packets from the DNS server to cause 1-byte memory overwrite, resulting in worker process crash or potential other impact."
},
…
You can code these HTTP calls and the response processing in the language of your choice, and integrate them with whatever tools fit best into your workflow. We’ll see a couple of examples in a bit.
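As a small taste of the response processing, the sketch below walks a payload shaped like the one above and pulls out each CVE id with its English description. It runs on an embedded sample rather than a live call; the sample is heavily trimmed and its description is shortened, and the function name summarize is an assumption for illustration.

```python
import json

# Trimmed sample shaped like the payload shown above (most fields omitted,
# description shortened).
SAMPLE = """
{
  "totalResults": 1,
  "vulnerabilities": [
    {"cve": {"id": "CVE-2021-23017",
             "descriptions": [{"lang": "en",
                               "value": "A security issue in nginx resolver was identified."}]}}
  ]
}
"""


def summarize(payload, lang="en"):
    """Return a list of (cve_id, description) tuples from an NVD-style payload."""
    summary = []
    for item in payload.get("vulnerabilities", []):
        cve = item["cve"]
        # Pick the first description in the requested language, if any.
        text = next(
            (d["value"] for d in cve.get("descriptions", []) if d.get("lang") == lang),
            "",
        )
        summary.append((cve["id"], text))
    return summary


for cve_id, text in summarize(json.loads(SAMPLE)):
    print(cve_id, "-", text)
```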
Only one final piece of the puzzle is missing: obtaining an API Key.
NIST limits the number of requests that unregistered users can perform. If you are going to use this API in any serious way, you should request an API Key to relax these restrictions. The process is straightforward: you only need to fill out and submit a simple form, then click a link in the email you’ll receive.
Once you obtain your API Key, you can provide it via an apiKey HTTP header. In curl, this is done with the -H option:
$ curl -H "apiKey: YOUR-API-KEY-HERE" "https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=cpe:2.3:a:f5:nginx:1.18.0"
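The same request can be prepared in Python. This is a minimal sketch that only builds the URL and headers, so it runs without network access; the helper name build_request is an assumption for illustration, and the result is meant to be handed to whatever HTTP client you prefer.

```python
from urllib.parse import urlencode

NVD_CVE_ENDPOINT = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def build_request(cpe_name, api_key=None):
    """Return (url, headers) for a CVE lookup by CPE name."""
    # Keep ":" and "*" readable; anything else unsafe is percent-encoded.
    query = urlencode({"cpeName": cpe_name}, safe=":*")
    # Without a key the request is still valid, just subject to stricter limits.
    headers = {"apiKey": api_key} if api_key else {}
    return f"{NVD_CVE_ENDPOINT}?{query}", headers


url, headers = build_request("cpe:2.3:a:f5:nginx:1.18.0", "YOUR-API-KEY-HERE")
print(url)
print(headers)
```

With the requests module, for example, the call would then be requests.get(url, headers=headers).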
Automating Vulnerability Detection on a Spreadsheet
Many people keep their asset inventory on a spreadsheet. So let’s see how to integrate the NIST NVD API inside Google Sheets.
[!NOTE]
This approach is not recommended; we are showing it only for educational purposes and because it is fun. Its main disadvantages are that you cannot include an apiKey, it is taxing on the API backend, and it doesn’t provide a good user experience. We provide a proper option in the next section.
We start with a simple spreadsheet that contains all the basic information from our assets:
The plan will be to:
- Generate the CPE for each device.
- Call the API for each resource.
- Extract the CVE numbers from the payload.
Let’s get started!
1. Generating CPEs
The first step is to generate the CPE for each device, which can be easily done by concatenating the already available information:
="cpe:2.3:" & F2 & ":" & B2 & ":" & C2 & ":" & D2
We added the text cpe:2.3 and the CPE part at the beginning. The CPE part describes the type of resource: a for applications, h for hardware, and o for operating systems.
2. Calling the API
Google Sheets has functions to import and process several types of data such as XML or CSV. However, it lacks support for the JSON that the NIST API returns. There are third-party solutions for that, but keeping things simple, we’ll get creative and use only the default functions.
The IMPORTDATA(url, delimiter, locale) function can load data from a url, separating data rows by new lines, and columns by the delimiter character. Would this work for us?
We could use this function to load the whole payload in a single cell, and then extract the data. Keep in mind that the API returns minified JSON, that is, without extra spaces or line breaks; we used jq earlier to prettify the output.
=IMPORTDATA("https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=" & Inventory!E2,"Ç")
In this example, Inventory!E2 is the CPE we calculated earlier, and we are using an unusual character as the delimiter (“Ç”) so the data is not split into columns and stays in the same cell.
The output? Error: Result too large.
It looks like the payload is too big for a single cell, so we’ll have to find a way to split it into smaller chunks.
Looking closer at the JSON payload from the API we can identify some useful patterns:
…
"vulnerabilities": [
{
"cve": {
"id": "CVE-2021-23017",
"sourceIdentifier": "f5sirt@f5.com",
…
Every CVE number is preceded by the string "cve": { "id": "CVE. We could:
- Split the payload into columns by the { character.
- Keep only the columns that start with "id": "CVE-.
This would be the formula:
=TRANSPOSE(
QUERY(
TRANSPOSE(
IMPORTDATA("https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=" & Inventory!E2,"{")
), 'HELPER - Query'!$A$1))
Where:
- QUERY allows us to run SQL-like queries on a dataset.
- TRANSPOSE turns rows into columns so we can evaluate the query for each value. After the query, it is used again to turn the result back into a single row.
- 'HELPER - Query'!$A$1 is a reference to the cell containing the query: SELECT * WHERE Col1 contains ':"CVE-'. Google Sheets doesn’t like the " inside the query, but it seems fine if we move the query to a different cell ¯\(ツ)/¯.
Will this work?
Let’s drop this formula on a separate sheet of our inventory spreadsheet and see:
Indeed! It works!
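If you want to convince yourself of why the trick works, the same split-and-filter idea can be reproduced outside the spreadsheet. The sketch below is only an illustration under stated assumptions: it builds a tiny minified payload (most fields omitted) instead of calling the API, and the function name split_and_filter is made up for this example.

```python
import json

# Minified sample shaped like the API response (most fields omitted).
payload = json.dumps(
    {"vulnerabilities": [
        {"cve": {"id": "CVE-2021-23017", "sourceIdentifier": "f5sirt@f5.com"}},
        {"cve": {"id": "CVE-2021-3618"}},
    ]},
    separators=(",", ":"),  # no spaces, like the real API output
)


def split_and_filter(minified, needle=':"CVE-'):
    """Split on "{" (what IMPORTDATA does) and keep matching chunks (what QUERY does)."""
    chunks = minified.split("{")
    return [chunk for chunk in chunks if needle in chunk]


for chunk in split_and_filter(payload):
    print(chunk)
```

Each surviving chunk starts with "id":"CVE-…, which is exactly what ends up in the cells of the helper sheet.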
3. Extracting the CVE number
We are close: we have each CVE number in a separate cell, and we only need to extract it.
My first thought was to tweak the SELECT * in the SQL query, trying to return only a substring instead of the whole field. That way all the processing would be contained in one place. However, the SQL-like query language in Google Sheets has a very limited function set, so this approach is a no-go.
We’ll keep the current results as they are on a separate sheet, HELPER - RAW_CVEs, which acts as a helper to display the clean CVE numbers in another place.
We can leverage the REGEXEXTRACT function to extract only the CVE number using regular expressions:
=REGEXEXTRACT('HELPER - RAW_CVEs'!A1,"CVE-\d*-\d*")
We could clean things up a bit, using HYPERLINK to take the user to the NIST website for that CVE, and other functions to filter out errors:
=IF(
NOT(ISBLANK('HELPER - RAW_CVEs'!A1)),
HYPERLINK(
"https://nvd.nist.gov/vuln/detail/" & REGEXEXTRACT('HELPER - RAW_CVEs'!A1,"CVE-\d*-\d*"),
REGEXEXTRACT('HELPER - RAW_CVEs'!A1,"CVE-\d*-\d*")
),
""
)
The result is gorgeous!
Our example spreadsheet is available if you want to see it in action.
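For comparison, the extraction and linking step looks like this in Python. It is a sketch, not a translation of the exact formula: the pattern here is slightly stricter than the CVE-\d*-\d* used in the sheet (it requires a four-digit year and at least four digits in the sequence number), and the function name cve_link is an assumption for illustration.

```python
import re

# A four-digit year followed by a sequence number of four or more digits.
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")


def cve_link(raw_chunk):
    """Return (cve_id, url) for the first CVE number in the chunk, or None."""
    match = CVE_RE.search(raw_chunk or "")  # blank cells behave like empty strings
    if match is None:
        return None
    cve_id = match.group(0)
    return cve_id, "https://nvd.nist.gov/vuln/detail/" + cve_id


print(cve_link('"id":"CVE-2021-23017","sourceIdentifier":"f5sirt@f5.com"}},'))
# ('CVE-2021-23017', 'https://nvd.nist.gov/vuln/detail/CVE-2021-23017')
```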
[!NOTE]
This approach is not recommended; we are showing it only for educational purposes and because it is fun. Its main disadvantages are that you cannot include an apiKey, it is taxing on the API backend, and it doesn’t provide a good user experience. Check out a proper option in the next section.
Automating Vulnerability Detection With Python
Let’s see how to do something similar with a proper programming language like Python.
Although our example is going to be radically simple, this solution has huge potential for improvement. For example, you could expand it to integrate with your tools, schedule it to run periodically, or have it send notifications via email.
Our example will:
- Receive a .csv list of resources (e.g., exported from a spreadsheet).
- Call the API to fetch vulnerabilities.
- Output a list of vulnerabilities in JSON format.
Let’s take a look at the code in our vuln_finder.py script. You can also find the whole code and example data files in our GitHub repository.
1. Reading the resource list
We’ll use a few modules in our script:
- argparse to read command line options.
- requests to perform the API calls.
- csv to read the resource list.
- json to write our output.
The first step will be to read the user options:
import argparse
import requests
import csv
import json
#
# Parsing arguments
parser = argparse.ArgumentParser()
parser.add_argument("-d", "--devices", default = "inventory.csv", help = "Devices file in csv format. Default: inventory.csv.")
parser.add_argument("-o", "--output", default = "output.json", help = "Output file where to store the json response. Default: output.json.")
parser.add_argument("-k", "--apikey", required=True, help = "NIST NVD API Key.")
args = parser.parse_args()
#
devices_csv = args.devices
output_file = args.output
apikey = args.apikey
And then, we’ll read the device list csv, and generate the CPE name from there:
#
# Reading devices file
inventory = []
with open(devices_csv, newline='') as csvfile:
    inventoryreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in inventoryreader:
        device = {}
        device['name'] = row[0]
        device['vendor'] = row[1]
        device['software'] = row[2]
        device['version'] = row[3]
        device['part'] = row[4]
        device['cpe'] = "cpe:2.3:" + device['part'] + ":" + device['vendor'] + ":" + device['software'] + ":" + device['version']
        device["cves"] = []
        inventory.append(device)
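If you prefer named columns over numeric indexes, csv.DictReader is a possible alternative. The sketch below is a variation under stated assumptions, not the code of vuln_finder.py: the function name read_inventory and the FIELDS list are made up here, following the column order used above, and it reads from an in-memory sample so it can run on its own.

```python
import csv
import io

# Column order assumed from the script above: name, vendor, software, version, part.
FIELDS = ["name", "vendor", "software", "version", "part"]


def read_inventory(csvfile):
    """Read devices from an open CSV file and attach a CPE name to each one."""
    inventory = []
    for row in csv.DictReader(csvfile, fieldnames=FIELDS):
        device = dict(row)
        device["cpe"] = ":".join(
            ["cpe:2.3", device["part"], device["vendor"], device["software"], device["version"]]
        )
        device["cves"] = []
        inventory.append(device)
    return inventory


sample = io.StringIO("Nginx Web Server,f5,nginx,1.18.0,a\n")
print(read_inventory(sample)[0]["cpe"])  # cpe:2.3:a:f5:nginx:1.18.0
```

Wrapping the logic in a function that takes a file object also makes it easy to test without touching the disk.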
2. Calling the API
Now, for every device in our inventory, we call the API. The requests module makes it really simple to send this request with the apiKey header.
- We use requests.get(url, headers=...) to make the API call.
- We use response.json() to get a dictionary with the JSON payload.
- We fetch the data we want from that dictionary.
#
# Calling JSON API for each device
data = []
for device in inventory:
    # The API endpoint
    url = "https://services.nvd.nist.gov/rest/json/cves/2.0?cpeName=" + device['cpe']
    # Send a GET request to the API
    response = requests.get(url, headers={"apiKey": apikey})
    # Process the response
    vulnerabilities = response.json()["vulnerabilities"]
    for vulnerability in vulnerabilities:
        cve = {}
        cve["id"] = vulnerability["cve"]["id"]
        cve["url"] = "url: https://nvd.nist.gov/vuln/detail/" + cve["id"]
        # Try to grab CVSS Score v3.1
        if "metrics" in vulnerability["cve"] and "cvssMetricV31" in vulnerability["cve"]["metrics"]:
            cve["baseScore"] = vulnerability["cve"]["metrics"]["cvssMetricV31"][0]["cvssData"]["baseScore"]
            cve["baseSeverity"] = vulnerability["cve"]["metrics"]["cvssMetricV31"][0]["cvssData"]["baseSeverity"]
        device["cves"].append(cve)
    #
    data.append(device)
We have to dig into several dictionaries to reach the data we want:
vulnerability["cve"]["id"]
vulnerability["cve"]["metrics"]["cvssMetricV31"][0]["cvssData"]["baseScore"]
vulnerability["cve"]["metrics"]["cvssMetricV31"][0]["cvssData"]["baseSeverity"]
We kept things simple for this example, but keep in mind that the NIST NVD payload is quite comprehensive, so there’s lots of insightful information you could fetch here.
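Since not every CVE carries CVSS v3.1 metrics, a small helper that digs through those nested dictionaries defensively can save some KeyError headaches. This is a sketch under assumptions: the function name extract_score is hypothetical, and the fallback to a cvssMetricV30 key assumes older entries use that name with the same inner structure, which you should verify against real payloads.

```python
def extract_score(vulnerability, metric_keys=("cvssMetricV31", "cvssMetricV30")):
    """Return (baseScore, baseSeverity) from the first metric set found, else (None, None)."""
    metrics = vulnerability.get("cve", {}).get("metrics", {})
    for key in metric_keys:
        entries = metrics.get(key) or []
        if entries:
            # Same path as in the script: [0]["cvssData"]["baseScore" / "baseSeverity"].
            cvss = entries[0].get("cvssData", {})
            return cvss.get("baseScore"), cvss.get("baseSeverity")
    return None, None


sample = {
    "cve": {
        "id": "CVE-2021-23017",
        "metrics": {
            "cvssMetricV31": [{"cvssData": {"baseScore": 7.7, "baseSeverity": "HIGH"}}]
        },
    }
}
print(extract_score(sample))  # (7.7, 'HIGH')
```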
3. Outputting the data
Finally, we write the data into a JSON file:
#
# Output the result
with open(output_file, 'w') as f:
    json.dump(data, f)
This is where we could cache the data, send notifications, and, in general, act on the data we fetched.
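As one possible first step toward notifications, the sketch below picks out the findings at or above a score threshold from a structure shaped like our data list. The function name findings_above and the default threshold of 7.0 are assumptions for illustration; the sample holds only a few illustrative entries.

```python
def findings_above(data, min_score=7.0):
    """Return (device name, CVE id, score) tuples at or above min_score, highest first."""
    findings = []
    for device in data:
        for cve in device.get("cves", []):
            score = cve.get("baseScore")  # missing when no CVSS v3.1 data was found
            if score is not None and score >= min_score:
                findings.append((device["name"], cve["id"], score))
    return sorted(findings, key=lambda finding: finding[2], reverse=True)


sample_data = [
    {
        "name": "Nginx Web Server",
        "cves": [
            {"id": "CVE-2021-23017", "baseScore": 7.7},
            {"id": "CVE-2021-3618", "baseScore": 7.4},
            {"id": "CVE-2023-44487", "baseScore": 7.5},
        ],
    }
]
for name, cve_id, score in findings_above(sample_data, min_score=7.5):
    print(f"{name}: {cve_id} ({score})")
```

The resulting list could then feed an email, a chat message, or a ticket, depending on your workflow.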
Testing the script
Let’s take our script for a ride.
Here is the inventory.csv we’ll provide:
Nginx Web Server,f5,nginx,1.18.0,a
SIMATIC S7-1500,siemens,simatic_s7-1500_cpu_firmware,0.5.1,o
We’ll run the script like this:
$ python vuln_finder.py -d inventory.csv -o output.json -k THE-API-KEY
And we obtain an output file like:
$ cat output.json | jq .
[
{
"name": "Nginx Web Server",
"vendor": "f5",
"software": "nginx",
"version": "1.18.0",
"part": "a",
"cpe": "cpe:2.3:a:f5:nginx:1.18.0",
"cves": [
{
"id": "CVE-2021-23017",
"url": "url: https://nvd.nist.gov/vuln/detail/CVE-2021-23017",
"baseScore": 7.7,
"baseSeverity": "HIGH"
},
{
"id": "CVE-2021-3618",
"url": "url: https://nvd.nist.gov/vuln/detail/CVE-2021-3618",
"baseScore": 7.4,
"baseSeverity": "HIGH"
},
{
"id": "CVE-2023-44487",
"url": "url: https://nvd.nist.gov/vuln/detail/CVE-2023-44487",
"baseScore": 7.5,
"baseSeverity": "HIGH"
}
]
},
{
"name": "SIMATIC S7-1500",
"vendor": "siemens",
"software": "simatic_s7-1500_cpu_firmware",
"version": "0.5.1",
"part": "o",
"cpe": "cpe:2.3:o:siemens:simatic_s7-1500_cpu_firmware:0.5.1",
"cves": [
{
…
Great! It Works!
[!NOTE]
If you want to learn more, or try this yourself, check the whole code in our GitHub repository.
Next Steps: Automation and Correlation, the base for OTSPM
We’ve seen how easy it is to automate vulnerability discovery by querying vulnerability databases’ APIs. Where to go next from here?
The trends in cybersecurity are:
- Automation of all manual tasks to avoid human errors, and provide always up-to-date information.
- Correlation of data gathered from several sources to detect more threats.
Let’s take automating inventory as an example.
Keeping an inventory in a spreadsheet seems easy. However, what happens if a malicious device or service is added? How long will it take you to detect it?
If you continuously scan your infrastructure to keep an up-to-date picture of it, you can perform some nice correlations:
- New resources over real-time security alerts: “Someone accessed a suspicious URL at the same time this new device was added.”
- State of the resources over vulnerability discovery: “This device has a critical vulnerability, but it’s a test machine that is off. We can prioritize other alerts over this one.”
- State of the resources over compliance: “A recent configuration change in this server has made it fall out of compliance.”
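To give the second correlation a concrete shape, here is a toy sketch that orders vulnerability alerts using the state of each device. Everything in it is hypothetical: the alert and state dictionaries, the powered_on field, the device names, and the placeholder identifiers are invented for illustration and do not come from any real product or feed.

```python
def prioritize(alerts, device_state):
    """Order alerts so that running devices come first, then by descending score."""
    def rank(alert):
        state = device_state.get(alert["device"], {})
        # Devices with unknown state are treated as running, to stay on the safe side.
        running = state.get("powered_on", True)
        return (not running, -alert["score"])

    return sorted(alerts, key=rank)


# Hypothetical data: a switched-off test machine and a live web server.
alerts = [
    {"device": "test-box", "finding": "EXAMPLE-CRITICAL", "score": 9.8},
    {"device": "web-01", "finding": "EXAMPLE-HIGH", "score": 7.7},
]
device_state = {
    "test-box": {"powered_on": False},
    "web-01": {"powered_on": True},
}

for alert in prioritize(alerts, device_state):
    print(alert["device"], alert["finding"], alert["score"])
```

Here the live web server is listed before the powered-off test machine, even though the latter has the higher score.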
Keep this in mind whether you build your own security solutions or use a vendor product: security has proved to be more effective when treated as a single solution, where any information may be useful at any given stage.
Conclusion
Vulnerability databases help you discover where you are vulnerable and take mitigation actions.
You can automate this discovery process by integrating your inventory with their APIs, which is easily done in any popular programming language. You can even do this querying from a spreadsheet (although it is not recommended).
Finally, advancing on this automation journey will open you up to the possibility of correlating data from multiple sources, making your security solution more effective.