Not many folks are aware that Trusty serves all of its OSS security signal intelligence via a publicly accessible API.
In this post we will look at the API's provenance payload, but first let's recap what provenance is and why you should care.
What is source of origin provenance, and why is this mapping important?
Mapping open source packages to their source code involves establishing a clear and verifiable link between the packaged binaries or libraries and the original source code from which they were built. This process ensures that the package used in a project corresponds exactly to a known state of the source code in a repository, providing a transparent and traceable path from source to distribution.
By mapping packages to their source code, developers can establish the grounds to verify that packages have not been tampered with or altered from their original source. This process helps in identifying and mitigating potential security threats such as malware or backdoors that could have been introduced during the packaging process.
Malicious packages often make false source of origin claims as part of typosquatting attacks, supplying the package manager with the same repository URL as the package they are imitating.
A source of origin guarantee can also help prevent packages from artificially inflating their popularity by claiming a more popular project as their origin (also known as StarJacking).
Many OSS security scoring products (which I won't name) attempt to compute their quality scores without this mapping and are therefore prone to being spoofed or gamed.
Trusty's Provenance Payload
Within Trusty we leverage two systems to establish provenance. The first is Sigstore.
Sigstore is an open-source project that enhances software supply chain security by providing tools for cryptographic signing, verification, and provenance tracking of software artifacts. It integrates with the Supply Chain Levels for Software Artifacts (SLSA) framework, which is a set of security guidelines and standards designed to protect the integrity of software supply chains.
By offering a secure, automated, and easy-to-use method for developers to sign their code, Sigstore ensures the authenticity and integrity of software packages. It maintains a transparent log of signatures and verifiable records, enabling users to trace the source and provenance of software components. This traceability, aligned with SLSA standards, gives a good starting foundation for establishing a reproducible build.
When a Sigstore / SLSA provenance statement is available, we provide it within the sigstore section of Trusty's JSON payload.
For example, the popular NPM Next.js project uses Sigstore to sign its releases:
curl -sS 'https://api.trustypkg.dev/v2/provenance?package_name=next&package_type=npm' | jq '.sigstore'
{
"source_repo": "https://github.com/vercel/next.js",
"workflow": ".github/workflows/build_and_deploy.yml",
"issuer": "CN=sigstore-intermediate,O=sigstore.dev",
"token_issuer": "https://token.actions.githubusercontent.com",
"transparency": "https://search.sigstore.dev/?logIndex=88381843"
}
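The same query is easy to script. Below is a minimal Python sketch that fetches the payload and summarizes its sigstore section. The field names mirror the JSON shown above; `fetch_provenance` and `summarize_sigstore` are illustrative helpers of my own, not part of any Trusty client library.

```python
import json
import urllib.request

TRUSTY_URL = ("https://api.trustypkg.dev/v2/provenance"
              "?package_name={name}&package_type={ptype}")

def fetch_provenance(name: str, ptype: str = "npm") -> dict:
    """Fetch the full provenance document for a package from the Trusty API."""
    with urllib.request.urlopen(TRUSTY_URL.format(name=name, ptype=ptype)) as resp:
        return json.load(resp)

def summarize_sigstore(payload: dict) -> str:
    """Summarize the key Sigstore fields, or note their absence."""
    sig = payload.get("sigstore") or {}
    if not sig.get("source_repo"):
        return "no Sigstore provenance available"
    return (f"{sig['source_repo']} built by {sig['workflow']} "
            f"(log: {sig['transparency']})")

# Usage (performs a live request):
# print(summarize_sigstore(fetch_provenance("next")))
```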
Looking at this payload, we can see the source_repo where the release was built, and the actual workflow that built the release, .github/workflows/build_and_deploy.yml. We can also see that the OIDC token issuer was GitHub. Lastly, we get a link to the transparency log record: https://search.sigstore.dev/?logIndex=88381843
Sigstore uses a project called Rekor, which is an immutable, tamper-resistant store (also known as a transparency log). Every signing event within Sigstore produces an X.509 certificate that is placed into the log, with the log itself being a read-only, append-only Merkle tree.
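To make the Merkle tree idea concrete, here is a toy sketch of my own (not Rekor's actual RFC 6962 construction) showing why mutating any logged entry is detectable from the root hash alone:

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Toy Merkle root: hash each leaf, then pair-and-hash up to a single root."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]      # domain-separate leaf hashes
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                      # duplicate odd tail (a simplification)
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

log = [b"cert-entry-1", b"cert-entry-2", b"cert-entry-3"]
honest_root = merkle_root(log)
tampered_root = merkle_root([b"cert-entry-1", b"EVIL", b"cert-entry-3"])
assert honest_root != tampered_root   # any mutation changes the root
```

Because consumers remember only the latest root, an operator cannot silently rewrite history without every subsequent root failing to verify.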
Trusty actually verifies the cryptographic root of trust for you, but if you were particularly paranoid and so inclined, you could dig even deeper.
Let's grab the certificate directly from the Rekor API:
curl -s -X GET "https://rekor.sigstore.dev/api/v1/log/entries?logIndex=88381843" | \
jq -r '.[].body' | \
base64 -d | \
jq -r '.spec.content.envelope | .signatures[0].publicKey' | \
base64 -d | \
openssl x509 -inform PEM -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
77:ac:ac:23:54:40:e2:ac:f5:e0:c6:fc:f6:01:ce:fc:93:66:bf:1b
Signature Algorithm: ecdsa-with-SHA384
Issuer: O = sigstore.dev, CN = sigstore-intermediate
Validity
Not Before: Apr 24 17:11:58 2024 GMT
Not After : Apr 24 17:21:58 2024 GMT
Subject:
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:41:9d:5f:eb:89:3c:6b:72:a3:56:7d:f9:de:a6:
4c:f3:c8:64:58:3a:f5:02:10:0d:77:3e:31:44:63:
49:75:71:49:f7:23:9b:bf:16:4a:70:d0:09:0a:84:
9d:d4:64:1b:fb:ce:5e:d3:9b:6c:cb:46:1f:09:d3:
a9:fd:72:c5:4b
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature
X509v3 Extended Key Usage:
Code Signing
X509v3 Subject Key Identifier:
DF:31:88:1D:36:FF:D1:30:27:F9:24:FC:65:44:11:04:D3:52:EC:A9
X509v3 Authority Key Identifier:
keyid:DF:D3:E9:CF:56:24:11:96:F9:A8:D8:E9:28:55:A2:C6:2E:18:64:3F
X509v3 Subject Alternative Name: critical
URI:https://github.com/vercel/next.js/.github/workflows/build_and_deploy.yml@refs/heads/14-2-1
1.3.6.1.4.1.57264.1.1:
https://token.actions.githubusercontent.com
1.3.6.1.4.1.57264.1.2:
push
1.3.6.1.4.1.57264.1.3:
2e7a96a1e8821d88f210a90b4284fd24b71c1821
1.3.6.1.4.1.57264.1.4:
build-and-deploy
1.3.6.1.4.1.57264.1.5:
vercel/next.js
1.3.6.1.4.1.57264.1.6:
refs/heads/14-2-1
1.3.6.1.4.1.57264.1.8:
.+https://token.actions.githubusercontent.com
1.3.6.1.4.1.57264.1.9:
.Zhttps://github.com/vercel/next.js/.github/workflows/build_and_deploy.yml@refs/heads/14-2-1
1.3.6.1.4.1.57264.1.10:
.(2e7a96a1e8821d88f210a90b4284fd24b71c1821
1.3.6.1.4.1.57264.1.11:
github-hosted .
1.3.6.1.4.1.57264.1.12:
.!https://github.com/vercel/next.js
1.3.6.1.4.1.57264.1.13:
.(2e7a96a1e8821d88f210a90b4284fd24b71c1821
1.3.6.1.4.1.57264.1.14:
..refs/heads/14-2-1
1.3.6.1.4.1.57264.1.15:
..70107786
1.3.6.1.4.1.57264.1.16:
..https://github.com/vercel
1.3.6.1.4.1.57264.1.17:
..14985020
1.3.6.1.4.1.57264.1.18:
.Zhttps://github.com/vercel/next.js/.github/workflows/build_and_deploy.yml@refs/heads/14-2-1
1.3.6.1.4.1.57264.1.19:
.(2e7a96a1e8821d88f210a90b4284fd24b71c1821
1.3.6.1.4.1.57264.1.20:
..push
1.3.6.1.4.1.57264.1.21:
.Dhttps://github.com/vercel/next.js/actions/runs/8819983308/attempts/1
1.3.6.1.4.1.57264.1.22:
..public
CT Precertificate SCTs:
Signed Certificate Timestamp:
Version : v1 (0x0)
Log ID : DD:3D:30:6A:C6:C7:11:32:63:19:1E:1C:99:67:37:02:
A2:4A:5E:B8:DE:3C:AD:FF:87:8A:72:80:2F:29:EE:8E
Timestamp : Apr 24 17:11:58.098 2024 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:45:02:20:19:09:9E:B3:CC:FF:AC:47:5F:81:8B:85:
16:FB:DF:AD:84:99:97:38:8D:AC:D1:14:DB:D2:B7:FF:
26:29:C8:65:02:21:00:9F:1D:01:89:F4:FC:43:EA:AD:
1A:86:8C:F7:9A:F5:D1:3E:56:1C:C7:E0:78:AB:87:28:
3D:59:41:B3:17:96:EF
Signature Algorithm: ecdsa-with-SHA384
30:66:02:31:00:a5:e9:cb:4b:eb:a6:1c:d0:8c:d3:15:8c:f1:
8b:aa:98:9d:79:06:97:46:e1:72:2c:e3:dd:3c:00:99:a7:23:
9b:fc:cd:07:83:40:b3:ee:50:39:be:54:3b:81:0e:a1:fc:02:
31:00:96:4d:98:59:7d:1d:e6:a5:3f:f7:8f:42:49:b1:75:7a:
f7:42:50:dd:89:e0:96:f8:a0:e0:02:dc:85:01:df:09:29:e4:
e9:f5:74:54:2e:5f:8a:fc:c3:62:1e:c9:74:81
Obviously there is a lot going on here, and I don't want to turn this into an X.509 piece, but the key parts of interest are in the X509v3 Subject Alternative Name (SAN).
What this tells us is that GitHub signed an id_token which was used to guarantee that the claims within the X.509 certificate were true at the time the GitHub Actions workflow ran.
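A quick sanity check is to compare the workflow identity in the SAN against the source_repo and workflow fields Trusty reports. This string-level sketch (my own, avoiding real X.509 parsing entirely) splits the SAN URI copied from the certificate output above:

```python
# SAN URI copied from the certificate output above.
san_uri = ("https://github.com/vercel/next.js/"
           ".github/workflows/build_and_deploy.yml@refs/heads/14-2-1")

path, _, git_ref = san_uri.partition("@")
repo = "/".join(path.split("/")[:5])       # scheme + host + org + repo
workflow = path[len(repo) + 1:]            # workflow path within the repo

# These agree with the source_repo and workflow fields in the
# Trusty payload shown earlier.
assert repo == "https://github.com/vercel/next.js"
assert workflow == ".github/workflows/build_and_deploy.yml"
assert git_ref == "refs/heads/14-2-1"
```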
What if Sigstore is not available?
Not all projects, or even package managers, have a system capable of providing guarantees around provenance comparable to Sigstore's.
With this gap in mind, Stacklok (my company) developed a secondary system called Historical Provenance (HP). HP is a method of matching a package to its source repository using Git tags. We discovered a strong correlation between Git tags (created during a project's release) and the matching timestamps in a package manager's published metadata. This is particularly useful because it is difficult to spoof: Git's immutable tree structure means that the longer a project has existed, the more snapshots there are, each corresponding to the time a release was published. To spoof this history you would need either a time machine or the opportunity to compromise a package manager's servers (and if that happens, good lord, we are in for a wild ride).
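To illustrate the idea (this is a sketch of mine, not Stacklok's production algorithm), here is a function that counts Git tags whose creation time lines up with a package publish time; the one-hour tolerance window is an assumption for illustration:

```python
from datetime import datetime, timedelta

def hp_overlap(tag_times: list[datetime],
               publish_times: list[datetime],
               tolerance: timedelta = timedelta(hours=1)) -> float:
    """Percentage of git tags whose timestamp matches some publish event."""
    if not tag_times:
        return 0.0
    matched = sum(
        any(abs(tag - pub) <= tolerance for pub in publish_times)
        for tag in tag_times
    )
    return 100.0 * matched / len(tag_times)

tags = [datetime(2024, 3, 1, 12, 0), datetime(2024, 4, 2, 9, 30)]
releases = [datetime(2024, 3, 1, 12, 10),   # published minutes after tagging
            datetime(2024, 4, 20, 0, 0)]    # unrelated release
print(hp_overlap(tags, releases))  # 50.0 — one of two tags matched
```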
For more details on Historical Provenance, it's best to read the following post, which deep-dives into the algorithm and science at play.
Let's take a look at the Trusty API again for Next.js, but this time using HP:
curl -sS 'https://api.trustypkg.dev/v2/provenance?package_name=next&package_type=npm' | jq '.hp'
{
"overlap": 49.23954372623574,
"common": 259,
"tags": 526,
"versions": 2626,
"over_time": {
"period_type": "month",
"period_count": 12,
"hp_over_time": {
"2023-07-01": {
"tags": 0,
"vers": 6,
"matches": 0
},
"2023-08-01": {
"tags": 0,
"vers": 8,
"matches": 0
},
"2023-09-01": {
"tags": 0,
"vers": 6,
"matches": 0
},
"2023-10-01": {
"tags": 6,
"vers": 7,
"matches": 6
},
"2023-11-01": {
"tags": 3,
"vers": 3,
"matches": 3
},
"2023-12-01": {
"tags": 2,
"vers": 2,
"matches": 2
},
"2024-01-01": {
"tags": 3,
"vers": 3,
"matches": 3
},
"2024-02-01": {
"tags": 1,
"vers": 1,
"matches": 1
},
"2024-03-01": {
"tags": 5,
"vers": 4,
"matches": 3
},
"2024-04-01": {
"tags": 5,
"vers": 6,
"matches": 5
},
"2024-05-01": {
"tags": 2,
"vers": 2,
"matches": 2
},
"2024-06-01": {
"tags": 2,
"vers": 2,
"matches": 2
}
}
}
}
Looking at the above, we can see that out of 526 tags, 259 match package release timestamps, giving a strong overlap of roughly 49%.
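The overlap field is simply the matched fraction of tags expressed as a percentage:

```python
common, tags = 259, 526
overlap = 100 * common / tags
print(round(overlap, 2))  # 49.24, matching the payload's overlap field
```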
As small as it may sound, our research found that a common value of just 2 was sufficient to provide a 95% match success rate, but of course, the more matches, the stronger the signal. There are some outliers: monorepos can sometimes cloud the findings, but overall it's a useful signal that warrants investigating things further. Good security generally draws upon multiple signals when establishing the level of threat.
In Trusty this is one of the signals we use to flag packages as being of interest and requiring deeper analysis.
Next week we will look at some more valuable data points available in the API.
Feel free to build around our APIs, and if you come up with anything cool, please do share it with me!