Version 1.5.0 (April 26, 2023)
Original authors:
- Oliver Chang (ochang@google.com)
- Russ Cox (rsc@google.com)
Purpose
There are many problems to solve industry-wide concerning vulnerability detection, tracking, and response. One low-level problem is that there are many databases and no standard interchange format. A client that wants to aggregate information from multiple databases must handle each database completely separately. Databases that want to exchange information with each other must also each have their own parser for each format. Systematic tracking of dependencies and collaboration between vulnerability database efforts is hampered by not having a common interchange format. See our blog post for more details.
This document defines a draft of a standard interchange format. We hope to define a format that all vulnerability databases can export, to make it easier for users, security researchers, and any other efforts to consume all available databases. Use of this format would also make it easier for the databases themselves to share or cross-check information.
This shared interchange format is not expected to be the internal format for any particular database. We hope only that every vulnerability database will make its entries available in this format to enable interoperability.
The idea for this format originally arose from discussions between the Go vulnerability database team and the OSV team.
This format is stable, but further backwards compatible changes may still be made. Feedback from maintainers of other vulnerability databases and security response teams is most welcome. Please feel free to create an issue in this repo.
Format Overview
The format is a JSON-based encoding format, using the following informal schema. The exact details of each field are elaborated in the next section. All strings contain UTF-8 text.
A JSON Schema for validation is also available here.
{
"schema_version": string,
"id": string,
"modified": string,
"published": string,
"withdrawn": string,
"aliases": [ string ],
"related": [ string ],
"summary": string,
"details": string,
"severity": [ {
"type": string,
"score": string
} ],
"affected": [ {
"package": {
"ecosystem": string,
"name": string,
"purl": string
},
"severity": [ {
"type": string,
"score": string
} ],
"ranges": [ {
"type": string,
"repo": string,
"events": [ {
"introduced": string,
"fixed": string,
"last_affected": string,
"limit": string
} ],
"database_specific": { see description }
} ],
"versions": [ string ],
"ecosystem_specific": { see description },
"database_specific": { see description }
} ],
"references": [ {
"type": string,
"url": string
} ],
"credits": [ {
"name": string,
"contact": [ string ],
"type": string
} ],
"database_specific": { see description }
}
Again, this document is only about the JSON encoding the database serves to consumers, which could be applications or other databases. A database might store its entries in an entirely different format, or it might store them using this schema but in a more human-editable encoding, such as TOML or YAML. For serving, only the JSON encoding format is allowed, not a transliteration into any other encoding.
Overall, the approach of this schema is to define only the fields that absolutely must be shared between databases, leaving customizations to the “ecosystem_specific” and “database_specific” blocks (see below)
Field Details
schema_version field
{
"schema_version": string
}
The schema_version
field is used to indicate which version of the OSV schema
a particular vulnerability was exported with. This can help consumer applications
decide how to import the data for their own systems and offer some protection
against future breaking changes. The value should be a string matching the OSV
Schema version, which follows the SemVer 2.0.0 format, with
no leading “v” prefix. If no value is specified, it should be assumed to be 1.0.0
,
matching version 1.0 of the OSV Schema. Clients can assume that new minor and patch
versions of the schema only add new fields, without changing the meaning of old
fields, so that a client that knows how to read version 1.2.0 can process data
identifying as schema version 1.3.0 by ignoring any unexpected fields.
id, modified fields
{
"id": string,
"modified": string
}
The id
field is a unique identifier for the vulnerability entry. It is a
string of the format <DB>-<ENTRYID>
, where DB
names the database and
ENTRYID
is in the format used by the database. For example: “OSV-2020-111”,
“CVE-2021-3114”, or “GHSA-vp9c-fpxx-744v”.
The defined database prefixes and their “home” databases are:
Prefix | Database | Description |
---|---|---|
A |
Android Vulnerability Database |
|
GO |
Go Vulnerability Database |
|
OSV |
OSV Vulnerability Database |
|
PYSEC |
PyPI Vulnerability Database |
|
RUSTSEC |
RustSec Advisory Database |
|
GSD |
Global Security Database |
|
GHSA |
GitHub Security Advisory Database |
|
LBSEC |
LoopBack Advisory Database |
|
DSA /DLA /DTSA |
Debian Security Advisory Database |
|
RLSA /RXSA |
Rocky Linux Security Advisory Database |
|
ALSA /ALBA /ALEA |
AlmaLinux Security Advisory |
|
Your database here | Send us a PR |
In addition to those prefixes, other databases may serve information about non-database-specific prefixes. For example a language ecosystem might decide to use CVE identifiers to index its database rather than a custom prefix. The known databases operating without custom identifier prefixes are:
- Alpine Security Advisory. Serving
<ID>
in the shared format athttps://storage.googleapis.com/cve-osv-conversion/osv-output/<ID>.json
. - Your database here. Send us a PR.
The modified
field gives the time the entry was last modified, as an
RFC3339-formatted timestamp in UTC (ending in “Z”). Given two
different entries claiming to describe the same id
field, the one with the
later modification time is considered authoritative.
The id
and modified
fields are required. For schema versions above 1.0.0,
the schema_version
field is also required. All other fields are optional,
although of course an entry with no other metadata is not particularly useful.
(It could potentially stand for a reserved ID with no other public information.)
published field
{
"published": string
}
The published
field gives the time the entry should be considered to have been
published, as an RFC3339-formatted time stamp in UTC (ending in “Z”).
withdrawn field
{
"withdrawn": string
}
The withdrawn
field gives the time the entry should be considered to have been
withdrawn, as an RFC3339-formatted timestamp in UTC (ending in “Z”). If the
field is missing, then the entry has not been withdrawn. Any rationale for why
the vulnerability has been withdrawn should go into the summary text.
aliases field
{
"aliases": [ string ]
}
The aliases
field gives a list of IDs of the same vulnerability in other
databases, in the form of the id
field. This allows one database to claim that
its own entry describes the same vulnerability as one or more entries in other
databases. Or if one database entry has been deduplicated into another in the
same database, the duplicate entry could be written using only the id
,
modified
, and aliases
field, to point to the canonical one.
related field
{
"related": [ string ]
}
The related
field gives a list of IDs of closely related vulnerabilities, such
as the same problem in alternate ecosystems.
summary, details fields
{
"summary": string,
"details": string
}
The summary
field gives a one-line, English textual summary of the
vulnerability. It is recommended that this field be kept short, on the order of
no more than 120 characters.
The details
field gives additional English textual details about the
vulnerability.
The summary
field is plain text.
The details
field is CommonMark markdown (a subset of GitHub-Flavored
Markdown). Display code may at its discretion sanitize the input further, such
as stripping raw HTML and links that do not start with http:// or https://.
Databases are encouraged not to include those in the first place. (The goal is
to balance flexibility of presentation with not exposing vulnerability database
display sites to unnecessary vulnerabilities.)
severity field
{
"severity": [ {
"type": string,
"score": string
} ]
}
The severity
field is a JSON array that allows generating systems to describe
the severity of a vulnerability using one or more quantitative scoring methods.
Each severity
item is a JSON object specifying a type
and score
property,
described below.
severity[].type field
The severity[].type
property must be one of the types defined below, which
describes the quantitative method used to calculate the associated score
.
Severity Type | Score Description |
---|---|
CVSS_V2 |
A CVSS vector string representing the unique characteristics and severity of the vulnerability using a version of the Common Vulnerability Scoring System notation that is == 2.0 (e.g."AV:L/AC:M/Au:N/C:N/I:P/A:C" ). |
CVSS_V3 |
A CVSS vector string representing the unique characteristics and severity of the vulnerability using a version of the Common Vulnerability Scoring System notation that is >= 3.0 and < 4.0 (e.g."CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:C/C:H/I:N/A:N" ). |
Your quantitative severity type here. | Send us a PR. |
severity[].score field
The severity[].score
property is a string representing the severity score based
on the selected severity[].type
, as described above.
affected fields
{
"affected": [ {
"package": {
"ecosystem": string,
"name": string,
"purl": string
},
"severity": [ {
"type": string,
"score": string
} ],
"ranges": [ {
"type": string,
"repo": string,
"events": [ {
"introduced": string,
"fixed": string,
"last_affected": string,
"limit": string
} ]
} ],
"versions": [ string ],
"ecosystem_specific": { see description },
"database_specific": { see description }
} ]
}
The affected
field is a JSON array containing objects that describes the
affected package versions, meaning those that contain the vulnerability.
Within each object in the affected
array, the package
field identifies the
package containing the vulnerability. In most cases, there should be exactly one
entry in the affected
array per affected package
to describe all affected
versions. In rare cases, for example if the ecosystem_specific
encodes
platform information that doesn’t apply equally to all listed versions and
ranges, a separate entry with the same package
in the affected
array may be
needed.
The versions
field can enumerate a specific set of affected versions, and the
ranges
field can list ranges of affected versions, under a given defined
ordering. A version is considered affected if it lies within any one of the
ranges or is listed in the versions list. Pseudocode for evaluating if a
given version is affected is available here.
The versions
list is generally recommended to always be present, to allow
software to easily answer the question “is this specific version affected?”
without having to contain code specific to every different ecosystem. If the
affected versions can be accurately summarized by one or more non-overlapping
ranges, they may be encoded using the ranges
field with an appropriate type
(see below). In this case, the ranges act as a kind of compact form of a larger
versions
list. Tooling and infrastructure such as https://osv.dev are able to
expand these ranges for supported ecosystems into the versions
list for easier
consumption. Products or ecosystems that do not use version identifiers that can
be represented as ranges must include the enumerated versions
list instead.
affected[].package field
The affected
object’s package
field is a JSON object identifying the
affected code library or command provided by the package. The object itself has
two required fields, ecosystem
and name
, and an optional purl
field.
The ecosystem
identifies the overall library ecosystem. It must be one of the
strings in the table below. The name
field is a string identifying the library
within its ecosystem. The two fields must both be present, because the
ecosystem
serves to define the interpretation of the name
.
The purl
field is a string following the
Package URL specification that
identifies the package. This field is optional but recommended.
Different ecosystems can define the same names; they identify different packages. For example, these denote different libraries with different sets of versions and different potential vulnerabilities:
{"ecosystem": "npm", "name": "zlib"}
{"ecosystem": "PyPI", "name": "zlib"}
The defined ecosystems are:
Ecosystem | Description |
---|---|
Go |
The Go ecosystem; the name field is a Go module path. |
npm |
The NPM ecosystem; the name field is an NPM package name. |
OSS-Fuzz |
For reports from the OSS-Fuzz project that have no more appropriate ecosystem; the name field is the name assigned by the OSS-Fuzz project, as recorded in the submitted fuzzing configuration. |
PyPI |
the Python PyPI ecosystem; the name field is a normalized PyPI package name. |
RubyGems |
The RubyGems ecosystem; the name field is a gem name. |
crates.io |
The crates.io ecosystem for Rust; the name field is a crate name. |
Packagist |
The PHP package manager ecosystem; the name is a package name. |
Maven |
The Maven Java package ecosystem. The name field is a Maven package name. |
NuGet |
The NuGet package ecosystem. The name field is a NuGet package name. |
Linux |
The Linux kernel. The only supported name is Kernel . |
Debian |
The Debian package ecosystem; the name is the name of the source package. The ecosystem string might optionally have a :<RELEASE> suffix to scope the package to a particular Debian release. <RELEASE> is a numeric version specified in the Debian distro-info-data. For example, the ecosystem string “Debian:7” refers to the Debian 7 (wheezy) release. |
Alpine |
The Alpine package ecosystem; the name is the name of the source package. The ecosystem string must have a :v<RELEASE-NUMBER> suffix to scope the package to a particular Alpine release branch (the v prefix is required). E.g. v3.16 . |
Hex |
The package manager for the Erlang ecosystem; the name is a Hex package name. |
Android |
The Android ecosystem; the name field is the Android component name that the patch applies to, as shown in the Android Security Bulletins such as Framework , Media Framework and Kernel Component . The exhaustive list of components can be found at the Appendix. |
GitHub Actions |
The GitHub Actions ecosystem; the name field is the action’s repository name with owner e.g. {owner}/{repo} . |
Pub |
The package manager for the Dart ecosystem; the name field is a Dart package name. |
ConanCenter |
The ConanCenter ecosystem for C and C++; the name field is a Conan package name. |
Rocky Linux |
The Rocky Linux package ecosystem; the name is the name of the source package. The ecosystem string might optionally have a :<RELEASE> suffix to scope the package to a particular Rocky Linux release. <RELEASE> is a numeric version. |
AlmaLinux |
AlmaLinux package ecosystem; the name is the name of the source package. The ecosystem string might optionally have a :<RELEASE> suffix to scope the package to a particular AlmaLinux release. <RELEASE> is a numeric version. |
Your ecosystem here. | Send us a PR. |
It is permitted for a database name (the DB prefix in the id
field) and an
ecosystem name to be the same, provided they have the same owner who can make
decisions about the meaning of the ecosystem_specific
field (see below).
affected[].severity field
The severity
field is an optional element defined here.
This severity
field applies to a specific package, in cases where affected
packages have differing severities for the same vulnerability. If any package
level severity
fields are set, the top level severity
must not be set.
affected[].versions field
The affected
object’s versions
field is a JSON array of strings. Each string
is a single affected version in whatever version syntax is used by the given
package ecosystem.
affected[].ranges[] field
The affected
object’s ranges
field is a JSON array of objects describing the
affected ranges of versions.
affected[].ranges[].type field
In the ranges
field, the type
field is required. It specifies the type of
version range being recorded and defines the interpretation of the events
object’s introduced
, fixed
, and any type-specific fields.
The defined types and their additional fields are:
-
SEMVER
: The versionsintroduced
andfixed
are semantic versions as defined by SemVer 2.0.0, with no leading “v” prefix. The relationu < v
denotes the precedence order defined in section 11 of SemVer 2.0. Ranges listed with typeSEMVER
should not overlap: since SEMVER is a strict linear ordering, it is always possible to simplify to non-overlapping ranges.Specifying one or more
SEMVER
ranges removes the requirement to specify an explicit enumeratedversions
list (see the discussion above).Some ecosystems may recommend using SemVer 2.0 for versioning without explicitly enforcing it. In those cases you should use the
ECOSYSTEM
type instead. -
ECOSYSTEM
: The versionsintroduced
andfixed
are arbitrary, uninterpreted strings specific to the package ecosystem, which does not conform to SemVer 2.0’s version ordering.It is recommended that you provide an explicitly enumerated
versions
list when specifying one or moreECOSYSTEM
ranges, becauseECOSYSTEM
range inclusion queries may not be able to be answered without reference to the package ecosystem’s own logic and therefore may not be able to be used by ecosystem-independent processors. The infrastructure and tooling provided by https://osv.dev also provides automation for auto-populating theversions
list based on supportedECOSYSTEM
ranges as part of the ingestion process. -
GIT
: The versionsintroduced
andfixed
are full-length Git commit hashes. The repository’s commit graph is needed to evaluate whether a given version is in the range. The relationu < v
is true when commitu
is a (perhaps distant) parent of commitv
.Specifying one or more
GIT
ranges does NOT remove the requirement to specify an explicitly enumeratedversions
list, becauseGIT
range inclusion queries cannot be answered without access to a copy of the underlying Git repository.
affected[].ranges[].events fields
The ranges
object’s events
field is a JSON array of objects. Each object
describes a single version that either:
- Introduces a vulnerability:
{"introduced": string}
- Fixes a vulnerability:
{"fixed": string}
- Describes the last known affected version:
{"last_affected": string}
- Sets an upper limit on the range being described:
{"limit": string}
These events
objects represent a “timeline” of status changes for the affected
package.
The values of “introduced”, “fixed”, “last_affected” and “limit” are version strings
as defined by the affected[].ranges[].type
field.
Special values
"introduced"
allows a version of the value"0"
to represent a version that sorts before any other version."limit"
allows versions containing the string"*"
to represent “infinity”. If no limit events are provided, an implicit{ "limit": "*" }
is assumed to exist. Multiple"limit"
events are allowed in the same range.
Requirements
Only a single type (either "introduced"
, "fixed"
, "last_affected"
,
"limit"
) is allowed in each event object. For instance,
{"introduced": "1.0.0", "fixed": "1.0.2"}
is invalid.
Entries in the events
array can contain either "last_affected"
or "fixed"
events, but not both. It’s strongly recommended to use "fixed"
instead of
"last_affected"
where possible, as it precisely identifies the version which
contains the fix.
There must be at least one "introduced"
object in the events
array. While
not required, it’s also recommended to keep the events
array sorted according
to the affected[].ranges[].type
of the range.
See examples for examples of how to describe affected ranges.
affected[].ranges[].repo field
The ranges
object’s repo
field is the URL of the package’s code
repository. The value should be in a format that’s directly usable as an
argument for the version control system’s clone command (e.g. git clone
).
The affected
object’s ranges
field is a JSON array of objects, each
describing a single range. The range object defines the fields type
,
events
, repo
. introduced
, fixed
, and additional type-specific fields as needed.
This field is required if affected[].ranges[].type
is GIT
.
affected[].ranges[].database_specific field
The ranges
object’s database_specific
field is a JSON object holding
additional information about the range from which the record was obtained. The
meaning of the values within the object is entirely defined by the database and
beyond the scope of this document.
Databases should only use this field to store additional context that may be useful in converting from the OSV format back into the original database representation. Values in this field have no effect on the evaluation algorithm.
affected[].ecosystem_specific field
The affected
object’s ecosystem_specific
field is a JSON object holding
additional information about the vulnerability as defined by the ecosystem for
which the record applies. The meaning of the values within the object is
entirely defined by the ecosystem and beyond the scope of this document.
For example, the Go ecosystem includes here information about the affected functions and which modules the packages were found in, along with severity in the Go project-specific severity scale.
Note that this is a single field with key “ecosystem_specific”, which itself contains a JSON object with unspecified fields.
affected[].database_specific field
The affected
object’s database_specific
field is a JSON object holding
additional information about the vulnerability as defined by the database from
which the record was obtained. The meaning of the values within the object is
entirely defined by the database and beyond the scope of this document.
In general, the canonical database for a particular ecosystem should record its
information in ecosystem_specific
, allowing other aggregator databases to put
their own summaries in database_specific
.
For example, databases that add additional information such as computed CVSS scores for ecosystems that do not provide them could add that information here.
Note that this is a single field with key “database_specific”, which itself contains a JSON object with unspecified fields.
Evaluation
The algorithm to evaluate if a package pkg
at version v
is vulnerable is
given by the IsVulnerable
function in the pseudocode below:
func IsVulnerable(pkg, v, osv)
for affected in osv.affected
if affected.package == pkg
if IncludedInVersions(v, affected.versions) ||
IncludedInRanges(v, affected.ranges)
return true
return false
func IncludedInVersions(v, versions)
for version in versions
if v == version
return true
return false
func IncludedInRanges(v, ranges)
for range in ranges
if BeforeLimits(v, range)
vulnerable = false
for evt in sorted(range.events)
if evt.introduced is present && v >= evt.introduced
vulnerable = true
else if evt.fixed is present && v >= evt.fixed
vulnerable = false
else if evt.last_affected is present && v > evt.last_affected
vulnerable = false
if vulnerable
return true
return false
func BeforeLimits(v, range)
if no limit events in range.events
# implicit "*" entry is assumed
return true
for evt in range.events
if evt.limit is present and v < evt.limit
return true
return false
Examples
The following expresses that “every possible version is affected”.
"ranges": [ {
"type": "SEMVER",
"events": [
{ "introduced": "0" },
]
} ]
The following expresses that “everything before 1.0.2
” is affected.
"ranges": [ {
"type": "SEMVER",
"events": [
{ "introduced": "0" },
{ "fixed": "1.0.2" },
]
} ]
The following expresses that versions in the SemVer ranges [1.0.0,
1.0.2)
or [3.0.0, 3.2.5)
are affected. Everything else is unaffected.
"ranges": [ {
"type": "SEMVER",
"events": [
{ "introduced": "1.0.0" },
{ "fixed": "1.0.2" },
{ "introduced": "3.0.0" },
{ "fixed": "3.2.5" }
]
} ]
"limit"
events are typically not necessary for describing numbered (linear)
version ranges and should not be used. They are more useful for git ranges,
where it has more implications for the evaluation algorithm. Take the following
git commit graph and git range:
"ranges": [ {
"type": "GIT",
"repo": "https://github.com/owner/repo",
"events": [
{ "introduced": "X" },
{ "fixed": "Y" },
]
} ]
Without an explicit limit
, the list of computed affected commits will be X,
A, B, C, D, E, F
. This is the desired behaviour in most cases.
"ranges": [ {
"type": "GIT",
"repo": "https://github.com/owner/repo",
"events": [
{ "introduced": "X" },
{ "limit": "Y" },
]
} ]
If limit
is set to Y
, the list of affected commits will be X, A, B, C
. This
is equivalent to git rev-list X..Y
(but including X
and excluding Y
).
This may be useful if the scope of a vulnerability entry is limited to a small
set of linear branches. Multiple "limit"
events may be specified for each
branch – each expands the scope of the git commit graph to cover.
Note that we did not specify a fixed
event here as limit
makes it redundant.
references field
{
"references": [ {
"type": string,
"url": string
} ]
}
The references
field contains a list of JSON objects describing references.
Each object has a string field type
specifying the type of reference, and a
string field url
. The url
is the fully-qualified URL (including the scheme,
typically “https://”) linking to additional information, advisories, issue
tracker entries, and so on about the vulnerability itself. The type
specifies
what kind of reference the URL is.
The known reference type
values are:
ADVISORY
: A published security advisory for the vulnerability.ARTICLE
: An article or blog post describing the vulnerability.DETECTION
: A tool, script, scanner, or other mechanism that allows for detection of the vulnerability in production environments. e.g. YARA rules, hashes, virus signature, or other scanners.DISCUSSION
: A social media discussion regarding the vulnerability, e.g. a Twitter, Mastodon, Hacker News, or Reddit thread.REPORT
: A report, typically on a bug or issue tracker, of the vulnerability.FIX
: A source code browser link to the fix (e.g., a GitHub commit) Note that thefix
type is meant for viewing by people using web browsers. Programs interested in analyzing the exact commit range would do better to use theGIT
-typedaffected[].ranges
entries (described above).INTRODUCED
: A source code browser link to the introduction of the vulnerability (e.g., a GitHub commit) Note that theintroduced
type is meant for viewing by people using web browsers. Programs interested in analyzing the exact commit range would do better to use theGIT
-typedaffected[].ranges
entries (described above).PACKAGE
: A home web page for the package.EVIDENCE
: A demonstration of the validity of a vulnerability claim, e.g.app.any.run
replaying the exploitation of the vulnerability.WEB
: A web page of some unspecified kind.
credits fields
{
"credits": [ {
"name": string,
"contact": [ string ],
"type": string,
} ]
}
The credits
field is a JSON array providing a way to give credit for the
discovery, confirmation, patch, or other events in the life cycle of a
vulnerability.
Each of the objects in the credits
array must contain at minimum a name
field
specifying the name of the individual or entity being credited, using whatever
notation they prefer. It can also optionally include a contact
JSON array.
credits[].name field
credits[].name
should specify the name, label, or other identifier of the
individual or entity being credited, using whatever notation the creditor prefers.
For instance, this could contain a real name like Kovács János
, an Internet handle
like erikamustermann
, an entity name like GitHub
, or something else. This field
is required for each credits
entry.
credits[].contact[] field
Each credits[].contact[]
entry should be a valid, fully qualified, plain-text URL
at which the credited can be reached. Providing contacts is optional.
credits[].type field
The optional credits[].type
field should specify the type or role of the
individual or entity being credited. It must be one of the following defined
credit types:
FINDER
: identified the vulnerability.REPORTER
: notified the vendor of the vulnerability to a CNA.ANALYST
: validated the vulnerability to ensure accuracy or severity.COORDINATOR
: facilitated the coordinated response process.REMEDIATION_DEVELOPER
: prepared a code change or other remediation plans.REMEDIATION_REVIEWER
: reviewed vulnerability remediation plans or code changes for effectiveness and completeness.REMEDIATION_VERIFIER
: tested and verified the vulnerability or its remediation.TOOL
: names of tools used in vulnerability discovery or identification.SPONSOR
: supported the vulnerability identification or remediation activities.OTHER
: any other type or role that does not fall under the categories described above.
These values and their definitions correspond directly to the credit types defined in the MITRE CVE specification.
Examples
Including a URL and an email address in credits[].contact[]
and a credit type:
{
"credits": [ {
"name": "Janina Kowalska",
"contact": [
"https://twitter.com/JaninaKowalska01",
"mailto:nina@kowalska-family.net"
],
"type": "REMEDIATION_DEVELOPER",
} ]
}
database_specific field
The top-level database_specific
field is a JSON object holding additional
information about the vulnerability as defined by the database from which the
record was obtained. The meaning of the values within the object is entirely
defined by the database and beyond the scope of this document.
Unlike the affected[].database_specific
field which is specific to each
affected product, this top-level field allows aggregator databases to add
custom data that they considered applicable to the whole vulnerability.
Note that this is a single field with key “database_specific”, which itself contains a JSON object with unspecified fields.
Examples
Go vulnerability
The Go vulnerability database uses this format.
See https://go.dev/security/vuln/database for example entries and details of
ecosystem_specific
fields.
NPM vulnerability in GitHub database
GitHub uses this format already for its vulnerabilities. Here is the encoding of one entry:
{
"schema_version": "1.2.0",
"id": "GHSA-r9p9-mrjm-926w",
"modified": "2021-03-08T16:02:43Z",
"published": "2021-03-08T16:06:50Z",
"aliases": [
"CVE-2020-28498"
],
"summary": "Use of a Broken or Risky Cryptographic Algorithm",
"details": "The npm package `elliptic` before version 6.5.4 are vulnerable to Cryptographic Issues via the secp256k1 implementation in elliptic/ec/key.js. There is no check to confirm that the public key point passed into the derive function actually exists on the secp256k1 curve. This results in the potential for the private key used in this implementation to be revealed after a number of ECDH operations are performed.",
"severity": [
{
"type": "CVSS_V3",
"score": "CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:C/C:H/I:N/A:N"
}
],
"affected": [
{
"package": {
"ecosystem": "npm",
"name": "elliptic"
},
"ranges": [
{
"type": "ECOSYSTEM",
"events": [
{
"introduced": "0"
},
{
"fixed": "6.5.4"
}
]
}
]
}
],
"references": [
{
"type": "ADVISORY",
"url": "https://nvd.nist.gov/vuln/detail/CVE-2020-28498"
},
{
"type": "WEB",
"url": "https://github.com/indutny/elliptic/pull/244/commits"
},
{
"type": "WEB",
"url": "https://github.com/indutny/elliptic/commit/441b7428b0e8f6636c42118ad2aaa186d3c34c3f"
},
{
"type": "WEB",
"url": "https://github.com/christianlundkvist/blog/blob/master/2020_05_26_secp256k1_twist_attacks/secp256k1_twist_attacks.md"
},
{
"type": "WEB",
"url": "https://snyk.io/vuln/SNYK-JAVA-ORGWEBJARSNPM-1069836"
},
{
"type": "WEB",
"url": "https://snyk.io/vuln/SNYK-JS-ELLIPTIC-1064899"
},
{
"type": "WEB",
"url": "https://www.npmjs.com/package/elliptic"
}
],
"database_specific": {
"cwe_ids": [
"CWE-327"
],
"severity": "MODERATE",
"github_reviewed": true
}
}
OSV vulnerability
OSV uses this format already for its vulnerabilities. Here is the encoding of one entry:
{
"schema_version": "1.2.0",
"id": "OSV-2020-584",
"published": "2020-07-01T00:00:18.401815Z",
"modified": "2021-03-09T04:49:05.965964Z",
"summary": "Heap-buffer-overflow in collator_compare_fuzzer.cpp",
"details": "OSS-Fuzz report: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15499\nCrash type: Heap-buffer-overflow WRITE 3\nCrash state:\ncollator_compare_fuzzer.cpp\n",
"references": [
{"type": "REPORT", "url": "https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15499"}
],
"affected": [ {
"package": {
"ecosystem": "OSS-Fuzz",
"name": "icu"
},
"ranges": [
{
"type": "GIT",
"repo": "https://github.com/unicode-org/icu.git",
"events": [
{ "introduced": "6e5755a2a833bc64852eae12967d0a54d7adf629" },
{ "fixed": "c43455749b914feef56b178b256f29b3016146eb" }
]
}
]
} ]
}
Rust vulnerability
The Rust advisory DB exports this format. Here’s an example entry:
{
"schema_version": "1.2.0",
"id": "RUSTSEC-2019-0033",
"published": "2019-11-16T00:00:00Z",
"modified": "2021-01-04T19:02:00Z",
"aliases": ["CVE-2020-25574", "CVE-2019-25008"],
"summary": "Integer Overflow in HeaderMap::reserve() can cause Denial of Service",
"details": "HeaderMap::reserve() used usize::next_power_of_two() to calculate\nthe increased capacity. However, next_power_of_two() silently overflows\nto 0 if given a sufficently large number in release mode.\n\nIf the map was not empty when the overflow happens, the library will invoke self.grow(0)\nand start infinite probing. This allows an attacker who controls\nthe argument to reserve() to cause a potential denial of service (DoS).\n\nThe flaw was corrected in 0.1.20 release of http crate.\n",
"references": [
{"type": "REPORT", "url": "https://github.com/hyperium/http/issues/352"},
{"type": "ADVISORY", "url": "https://rustsec.org/advisories/RUSTSEC-2019-0033.html"}
],
"affected": [ {
"package": {
"ecosystem": "crates.io",
"name": "http"
},
"ranges": [
{
"type": "SEMVER",
"events": [
{"introduced": "0"},
{"fixed": "0.1.20"}
]
}
],
"ecosystem_specific": {
"functions": ["http::header::HeaderMap::reserve"],
"keywords": ["http", "integer-overflow", "DoS"],
"categories": ["denial-of-service"],
"severity": "HIGH"
}
} ]
}
Python vulnerability
Python currently has a community vulnerability database using this format. Here is a potential encoding of a vulnerability entry.
{
"schema_version": "1.2.0",
"id": "PYSEC-2021-XXXX",
"published": "2021-04-01T20:15:00Z",
"modified": "2021-04-07T15:14:00Z",
"aliases": ["CVE-2021-29421"],
"summary": "XXE in pikepdf",
"details": "models/metadata.py in the pikepdf package 2.8.0 through 2.9.2 for Python allows XXE when parsing XMP metadata entries.",
"references": [
{"type": "FIX", "url": "https://github.com/pikepdf/pikepdf/commit/3f38f73218e5e782fe411ccbb3b44a793c0b343a"}
],
"affected": [ {
"package": {
"ecosystem": "PyPI",
"name": "pikepdf"
},
"ranges": [
{
"type": "GIT",
"repo": "https://github.com/pikepdf/pikepdf",
"events": [
{ "introduced": "0" },
{ "fixed": "3f38f73218e5e782fe411ccbb3b44a793c0b343a" }
]
},
{
"type": "ECOSYSTEM",
"events": [
{ "introduced": "2.8.0" },
{ "fixed": "2.10.0" }
]
}
],
"versions": [
"2.8.0", "2.8.0.post1", "2.8.0.post2", "2.9.0", "2.9.1", "2.9.2"
],
"ecosystem_specific": {
"severity": "HIGH"
}
} ]
}
Debian vulnerability
Debian security advisory (DSA) does not use this format currently, but using the conversion tool, the DSA can be converted into the OSV format.
{
"id": "DSA-3029-1",
"summary": "nginx - security update",
"affected": [
{
"package": {
"ecosystem": "Debian:7",
"name": "nginx"
},
"ranges": [
{
"type": "ECOSYSTEM",
"events": [
{
"introduced": "0"
},
{
"fixed": "1.2.1-2.2+wheezy3"
}
]
}
]
}
],
"aliases": [
"CVE-2014-3616"
],
"published": "2014-09-20T00:00:00Z",
"modified": "2014-09-20T08:18:07Z",
"details": "\nAntoine Delignat-Lavaud and Karthikeyan Bhargavan discovered that it was\npossible to reuse cached SSL sessions in unrelated contexts, allowing\nvirtual host confusion attacks in some configurations by an attacker in\na privileged network position.\n\n\nFor the stable distribution (wheezy), this problem has been fixed in\nversion 1.2.1-2.2+wheezy3.\n\n\nFor the testing distribution (jessie), this problem has been fixed in\nversion 1.6.2-1.\n\n\nFor the unstable distribution (sid), this problem has been fixed in\nversion 1.6.2-1.\n\n\nWe recommend that you upgrade your nginx packages.\n\n\n",
"references": [
{
"type": "ADVISORY",
"url": "https://www.debian.org/security/2014/dsa-3029"
}
]
}
Ruby vulnerability
Ruby does not use this format currently, but here is a potential translation of one Ruby advisory:
{
"schema_version": "1.2.0",
"id": "CVE-2019-3881",
"published": "2018-04-23T00:00:00Z",
"modified": "2021-05-10T00:00:00Z",
"summary": "Insecure path handling in Bundler",
"details": "Bundler prior to 2.1.0 uses a predictable path in /tmp/, created with insecure permissions as a storage location for gems, if locations under the user's home directory are not available. If Bundler is used in a scenario where the user does not have a writable home directory, an attacker could place malicious code in this directory that would be later loaded and executed.",
"affected": [ {
"package": {
"ecosystem": "RubyGems",
"name": "bundler"
},
"ranges": [ {
"type": "ECOSYSTEM",
"events": [
{ "introduced": "1.14.0" },
{ "fixed": "2.1.0" }
]
} ],
"versions": [
"1.14.0", "1.14.1", "1.14.2", "1.14.3", "1.14.4", "1.14.5",
"1.14.6", "1.15.0.pre.1", "1.15.0.pre.2", "1.15.0.pre.3",
"1.15.0.pre.4", "1.15.0", "1.15.1", "1.15.2", "1.15.3", "1.15.4",
"1.16.0.pre.1", "1.16.0.pre.2", "1.16.0.pre.3", "1.16.0",
"1.16.1", "1.16.2", "1.16.3", "1.16.4", "1.16.5", "1.16.6",
"1.17.0.pre.1", "1.17.0.pre.2", "1.17.0", "1.17.1", "1.17.2",
"1.17.3", "2.0.0.pre.1", "2.0.0.pre.2", "2.0.0.pre.3", "2.0.0",
"2.0.1", "2.0.2", "2.1.0.pre.1", "2.1.0.pre.2", "2.1.0.pre.3"
]
} ],
"references": [
{"type": "ADVISORY", "url": "https://github.com/advisories/GHSA-g98m-96g9-wfjq"}
]
}
Change Log
- 2021-03-29 added “withdrawn” field
- 2021-04-07 changed “details” to Markdown, change “references” to a list of objects with a new “type” field in addition to the URL.
- 2021-04-23 handful of changes, see Status - 2021-04-23 below for details. Corrected examples.
- 2021-04-26 changed
database-specific
andecosystem-specific
todatabase_specific
andecosystem_specific
for easier access from languages that access JSON field keys using x.field notation. - 2021-06-08 Added “purl” to the “package” field and some minor clarifications.
- 2021-06-30 Fixed an incorrect/typoed specification for “affects” from an array of objects to an object.
- 2021-08-17 Support multiple packages per entry by moving
packages
,ecosystem_specific
anddatabase_specific
intoaffected
. Theaffected
field is intentionally named differently to the previousaffects
field to make migration easier. Also use “events” containing single versions to represent affected version ranges instead. - 2021-09-08 Promoted schema to 1.0.
- 2022-01-19 Released version 1.2.0. Includes various changes suggested by
GitHub (
schema_version
, top-leveldatabase_specific
,credits
,severity
, relaxation of version enumeration requirement). - 2022-03-24 Released version 1.3.0. Added
last_affected
event type anddatabase_specific
toaffected[].ranges[]
. Context: https://github.com/ossf/osv-schema/issues/35. - 2023-02-21 Released version 1.4.0. Added per package
severity
and credit types. - 2023-04-26 Released version 1.5.0. Added new reference types.
Status - 2021-04-07
The unresolved open issues boil down to what the use case is for this data.
The use case we had in mind was enabling computer processing of vulnerability databases, so that for example:
A web site can display information about a vulnerability fetched from an unaffiliated database. A security researcher can fetch precise info about which versions are vulnerable for offline analysis. A vulnerability-checking tool can check a build manifest against a collection of these entries to see which are included in the build and then report a summary.
The vision we had for this was that the same underlying vulnerability might be described by multiple databases. We wanted a way for databases to be able to cross-link with each other and understand what the others were publishing.
See also the “Goal: Standard Schema for Vulnerability Databases” section in https://security.googleblog.com/2021/02/know-prevent-fix-framework-for-shifting.html, which I’ll quote here:
Goal: Standard Schema for Vulnerability Databases
Infrastructure and industry standards are needed to track and maintain open source vulnerabilities, understand their consequences, and manage their mitigations. A standard vulnerability schema would allow common tools to work across multiple vulnerability databases and simplify the task of tracking, especially when vulnerabilities touch multiple languages or subsystems.
It was a non-goal to unify the entries in different existing databases into a single entry for a particular vulnerability.
The open issues that remain seem to be pushing toward a new use case, which is to be able to unify the entries in different existing databases into a single entry (for a particular vulnerability). That was a non-goal: the assumption is that there will always be multiple databases, because at the least each ecosystem will have its own database with its own custom metadata that doesn’t really make sense to other databases.
The open issue for “delete severity” illustrates this. If you are only collecting info from an ecosystem’s database, it probably does help to be able to find out what that ecosystem calls the severity of the vulnerability. But obviously if you are trying to provide a one-true-entry, then it’s not going to make sense to have a global idea of severity.
The open issue for being able to give the affected versions across multiple ecosystems also illustrates this. If each ecosystem has its own database and suppose there is a Rust TLS library with a vulnerability and a Go wrapper of that library, then you’d have two entries: one in the Go database and one in the Rust database. There’s no need to try to put the Rust and Go versions into a single entry. Both entries could mention the other in “aliases”, and maybe they’d both also list the same CVE in “aliases” as well. On the other hand if the goal is to define a “one true entry” then obviously you do need to be able to make the affected versions per-ecosystem, along with potentially most of the other fields.
For the purpose of discussion, I assert that “one true entry” is still a non-goal for this schema and that aggregators can easily separate out “affected Rust versions” and “affected Go versions” from the two entries. I’m skeptical about a single global database both because of the overhead of coordination (it bogs down as more and more ecosystems get involved) and how much harder it makes it to customize and experiment on a per-ecosystem basis. I think there’s still a lot to learn about what we want from these databases, and a more distributed, federated model makes more sense to me.
Do people think that’s a mistake? Does someone want to make the counter-argument that we should expand the scope to being able to define the entries in a single global database?
Thanks.
Status - 2021-04-23
Affected versions. I added more explanatory text to the discussion of the
affected field, to make clear what is expected as far as ecosystem-specific
encodings of ranges. I also added a new ECOSYSTEM
range type.
Non-unique database prefixes. I changed the discussion in “id, modified” to handle the case of databases that decide to use identifiers drawn from some larger space, such as a language that only issues CVE numbers, not its own identifiers.
Multi-ecosystem vulnerabilities. There was an open issue, discussed at length in the previous status update, about whether to be able to capture a vulnerability in multiple ecosystems in a single report. As noted in the previous update, attempting to do that seems like it introduces unnecessary coordination problems while also calling into question literally every top-level field that might now have to be duplicated for each ecosystem. That is, even the description may need to be tailored for each ecosystem, meaning that the new multi-ecosystem format ends up being a map from ecosystem to the current format. At that point, it’s indistinguishable from multiple current-format records.
Also, it appears that even GHSA uses distinct entries for the same CVE from different ecosystems. For example, CVE 2019-8331 in Bootstrap is both https://github.com/advisories/GHSA-fxwm-579q-49qq (nuget) and https://github.com/advisories/GHSA-wh77-3x4m-4q9g (npm).
Similarly, the example I happened to pick for NPM above was assigned SNYK-JS-ELLIPTIC-1064899 but has the related entry SNYK-JAVA-ORGWEBJARSNPM-1069836 for that code vendored into the Java world.
I’ve resolved the open issue in favor of single-ecosystem entries rather than multi-ecosystem entries. Following a suggestion from Robert Schultheis, I added a “related” list next to “aliases” to capture this kind of close relationship. I’ve updated the NPM example, which already listed SNYK-JS-ELLIPTIC-1064899 in “aliases”, to list SNYK-JAVA-ORGWEBJARSNPM-1069836 in “related”.
Severity. Based on the discussion, removed severity as a top-level field (it is too ill-specified and not generally interpretable). Databases can provide it as a database-specific field instead.
Extra. Based on the discussion and general confusion about what belonged here, split back into two fields: ecosystem-specific and database-specific.
References. Updated the examples to use the new objects with “type” and “url”. Based on that, added some new reference types.
My plan is to rewrite the italicized paragraph at the top of the doc on Monday and then share the link publicly to gather more feedback from groups that have not yet seen it.
Status - 2021-08-24
The biggest change to the schema is our decision to support multiple packages and ecosystems per entry and the way we specify version ranges.
These changes are primarily in the interests of supporting better interoperability with other vulnerability schemas, such as the CVE JSON schema, where multiple packages are supported in a single entry. We’ve also been suggesting other changes to the CVE schema for better alignment (1, 2, 3, 4).
Supporting multiple packages is a reversal of our decision back in April (see “Status - 2021-04-23” for our rationale at the time).
The other major change is the way we specify ranges. Instead of specifying half open ranges as [introduced, fixed), ranges are encoded with with “events” in a “timeline” of sorts:
Instead of
"ranges": [{
"type: "GIT",
"introduced": "058504edd02667eef8fac9be27ab3ea74332e9b4",
"fixed": "3533e50cbee8ff086bfa04176ac42a01ee3db37d"
}, {
"type": "GIT",
"introduced": "058504edd02667eef8fac9be27ab3ea74332e9b4",
"fixed": "c5157b3e775dac31d51b11f993a06a84dc11fc8c" }
}, {
"type": "SEMVER",
"introduced": "1.0.0",
"fixed": "1.0.2",
}, {
"type": "SEMVER",
"introduced": "1.1.0",
"fixed": "1.1.3",
}]
We now have:
"ranges": [{
"type: "GIT",
"events": [
{ "introduced": "058504edd02667eef8fac9be27ab3ea74332e9b4" },
{ "fixed": "3533e50cbee8ff086bfa04176ac42a01ee3db37d" },
{ "fixed": "c5157b3e775dac31d51b11f993a06a84dc11fc8c" }
]
}, {
"type: "SEMVER",
"events": [
{ "introduced": "1.0.0" },
{ "fixed": "1.0.2" },
{ "introduced": "1.1.0" },
{ "fixed": "1.1.3" }
]
}]
This avoids repetition with the previous approach and better fits non-linear versioning schemes like git ranges. We also introduced a “limit” event, which allows git range specification to be scoped to a small num single branch. This was not possible before.
These are breaking changes, but we hope to make migration easier by renaming the “affects” field to “affected” to allow existing consumers and producers of this data to more easily handle old and new versions of entries.
Status - 2021-09-08
We’ve decided to promote our schema to 1.0, based on positive feedback from existing consumers and producers of this data. From this point on, no backwards incompatible changes will be made in any 1.x version.
Appendix
Android Ecosystem Components
A package name in the Android
ecosystem indicates the component of Android that the advisory applies to. This matches the component names used in Android Security Bulletins (ASB). The complete list of components used in ASB is as follows:
- Framework
- Library
- Media Framework
- System
- Android Runtime
- Kernel
- Kernel LTS
- Android TV
- AMLogic
- ARM
- Broadcom
- FPC
- HTC
- LG
- MediaTek
- Marvell
- NVIDIA
- Qualcomm
- Telecommunication
- Unisoc
- Widevine DRM