I have a particular fascination with the threat of supply chain compromise via package manager operations. Not so much that a malicious library will be embedded into the final product; rather, that when a programmer installs a package, such as from NPM, PyPI, or crates.io, arbitrary code is executed, which may deposit a backdoor that gives an attacker the developer’s access, secrets, etc.
Again, as developers, we should remember that simply installing a source code package from a repository can invoke arbitrary code on our systems.
One way to monitor for these sorts of attacks is to install every available package at scale and observe the resulting behavior. This is one thing that the Open Source Security Foundation (OpenSSF) does.
The OSSF Package Analysis project:
seeks to understand the behavior and capabilities of packages available on open source repositories: what files do they access, what addresses do they connect to, and what commands do they run? The project also tracks changes in how packages behave over time, to identify when previously safe software begins acting suspiciously.
- Package repositories are monitored for new packages.
- Each new package is scheduled to be analyzed by a pool of workers.
- A worker performs dynamic analysis of the package inside a sandbox.
- Results are stored and imported into BigQuery for inspection.
Sandboxing via gVisor containers ensures the packages are isolated. Detonating a package inside the sandbox allows us to capture strace and packet data that can indicate malicious interactions with the system as well as network connections that can be used to leak sensitive data or allow remote access.
Notably, the project exposes its BigQuery data set here.
The data set currently includes packages from:
- npm
- pypi
- rubygems
- packagist
- crates.io
I started exploring crates.io packages first, since I follow the Rust ecosystem and am familiar with the crates.io infrastructure. In my experience, NPM has many more packages and also quite a bit of malware. This can be overwhelming. I think crates.io has enough data to get started while not being exhausting.
Sifting through crates.io packages
I inspected the dynamic analysis results captured by OSSF Package Analysis as it installed and built Rust packages from crates.io. I expect all of the interesting activity will come from `build.rs` scripts that can do arbitrary things at compile time.
My general strategy is to aggregate activity and estimate the prevalence of various techniques, and then look in the “long tail” of uncommon behavior. For example, by reviewing the DNS names resolved during package installation, and filtering out the domains related to crates.io, I can discover when `curl` (or similar) is used to fetch untrusted remote content.
DNS resolutions during crates.io package installation
This query lists the crates.io package name, version, and DNS resolutions during installation. Just to get a sense for the data, I selected from a single recent day and limited the output to 100 rows.
SELECT
  T.Package.Name,
  T.Package.Version,
  Queries.Hostname
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.DNS AS DNS,
  DNS.Queries AS Queries
WHERE
  TIMESTAMP_TRUNC(CreatedTimestamp, DAY) = TIMESTAMP("2023-09-12")
  AND Package.Ecosystem = "crates.io"
LIMIT 100
;
Name | Version | Hostname |
---|---|---|
actix-client-ip-cloudflare | 0.1.0 | github.com |
actix-client-ip-cloudflare | 0.1.0 | crates.io |
actix-client-ip-cloudflare | 0.1.0 | static.crates.io |
actix-client-ip-cloudflare | 0.1.0 | api.github.com |
bnf_sampler | 0.2.0 | api.github.com |
bnf_sampler | 0.2.0 | github.com |
bnf_sampler | 0.2.0 | crates.io |
bnf_sampler | 0.2.0 | static.crates.io |
cargo-scaffold | 0.8.12 | crates.io |
cargo-scaffold | 0.8.12 | static.crates.io |
… | … | … |
Prevalence of DNS resolutions during crates.io package installation
SELECT
  Queries.Hostname,
  COUNT(*) AS `count`
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.DNS AS DNS,
  DNS.Queries AS Queries
WHERE
  TIMESTAMP_TRUNC(CreatedTimestamp, YEAR) = TIMESTAMP("2023-01-01")
  AND Package.Ecosystem = "crates.io"
GROUP BY Queries.Hostname
ORDER BY `count` DESC
;
Hostname | count |
---|---|
github.com | 142785 |
crates.io | 141911 |
static.crates.io | 141894 |
api.github.com | 70351 |
objects.githubusercontent.com | 214 |
proxy.golang.org | 18 |
storage.googleapis.com | 16 |
codeload.github.com | 15 |
software.ditto.live | 11 |
zlib.net | 10 |
static.crates.iocrates | 9 |
download.mosek.com | 6 |
files.pythonhosted.org | 5 |
pypi.org | 5 |
www.fftw.org | 4 |
raw.githubusercontent.com | 4 |
cdn.intrepidcs.net | 3 |
pkg-containers.githubusercontent.com | 2 |
ghcr.io | 2 |
apache.jfrog.io | 2 |
jfrog-prod-usw2-shared-oregon-main.s3.amazonaws.com | 2 |
www.apache.org | 1 |
ip-api.com | 1 |
api.telegram.org | 1 |
archive.apache.org | 1 |
sourceware.org | 1 |
git.openprivacy.ca | 1 |
_pgpkey-http._tcp.keyserver.ubuntu.com | 1 |
drive.google.com | 1 |
www.byond.com | 1 |
doc-08-24-docs.googleusercontent.com | 1 |
keyserver.ubuntu.com | 1 |
dlib.net | 1 |
At least two entries stand out as suspicious:
- `ip-api.com`: “what’s my IP?” to be included in a reconnaissance report?
- `api.telegram.org`: command and control via Telegram?
`static.crates.iocrates` seems a little weird, but is quickly explained by a bug in crates.io described in this incident report: crates.io Postmortem: Broken Crate Downloads.
Uncommon DNS names resolved during crates.io package installation
We can generate the direct download URL for the crate given the name and version. The content is a gzip-compressed tar archive containing the Rust source code and `Cargo.toml` metadata file.
SELECT
  Queries.Hostname,
  T.Package.Name,
  T.Package.Version,
  FORMAT(
    "https://crates.io/api/v1/crates/%s/%s/download",
    T.Package.Name,
    T.Package.Version) AS url
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.DNS AS DNS,
  DNS.Queries AS Queries
WHERE
  Package.Ecosystem = "crates.io"
  AND Queries.Hostname NOT IN (
    "github.com", "crates.io", "static.crates.io",
    "api.github.com", "static.crates.iocrates")
ORDER BY
  Queries.Hostname,
  T.Package.Name,
  T.Package.Version DESC
;
Hostname | Name | Version | url |
---|---|---|---|
_pgpkey-http._tcp.keyserver.ubuntu.com | nginx-sys | 0.2.0 | download |
apache.jfrog.io | kuzu | 0.0.5-pre.1 | download |
apache.jfrog.io | kuzu | 0.0.5 | download |
api.telegram.org | xrvrv | 0.1.1 | download |
archive.apache.org | kuzu | 0.0.5-pre.1 | download |
binaries.soliditylang.org | svm-rs-builds | 0.2.0 | download |
cdn.intrepidcs.net | libicsneo-sys | 0.2.0 | download |
cdn.intrepidcs.net | libicsneo-sys | 0.1.19 | download |
cdn.intrepidcs.net | libicsneo-sys | 0.1.18 | download |
cfhcable.dl.sourceforge.net | libtirpc-sys | 0.2.0 | download |
chromium.googlesource.com | wren | 0.1.12 | download |
chromium.googlesource.com | wren-sys | 0.2.5 | download |
codeload.github.com | cblas-src | 0.1.3 | download |
codeload.github.com | cudd | 0.1.4 | download |
codeload.github.com | cudd | 0.1.3 | download |
codeload.github.com | cudd | 0.1.2 | download |
codeload.github.com | cudd | 0.1.1 | download |
codeload.github.com | cudd-sys | 1.0.0 | download |
codeload.github.com | d4 | 0.3.7 | download |
codeload.github.com | d4-bigwig | 0.3.6 | download |
codeload.github.com | d4-hts | 0.3.9 | download |
codeload.github.com | d4-hts | 0.3.7 | download |
codeload.github.com | ipopt | 0.5.4 | download |
codeload.github.com | ipopt-sys | 0.5.5 | download |
… | … | … | … |
So we can see that `api.telegram.org` was resolved by `xrvrv@0.1.1`, which was nicely described by Phylum here: Rust Malware Staged on Crates.io.
I triaged the remaining uncommon DNS resolutions and didn’t find anything malicious.
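The exclusion list in the query above is maintained by hand. As a sketch of an alternative, the long tail can be surfaced directly by keeping only hostnames that were resolved by a handful of distinct crates. This uses the same table and fields as the queries above; the cutoff of 3 crates and the 5-name sample are arbitrary choices of mine, not something I have tuned against the data.

```sql
-- Hostnames resolved by at most a few distinct crates (the "long tail").
-- The cutoff (3) and the sample size (5) are arbitrary, illustrative values.
SELECT
  Queries.Hostname,
  COUNT(DISTINCT T.Package.Name) AS packages,
  STRING_AGG(DISTINCT T.Package.Name, ", " ORDER BY T.Package.Name LIMIT 5) AS example_packages
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.DNS AS DNS,
  DNS.Queries AS Queries
WHERE
  T.Package.Ecosystem = "crates.io"
GROUP BY Queries.Hostname
HAVING packages <= 3
ORDER BY packages, Queries.Hostname
;
```

Counting distinct crate names rather than rows keeps a crate with many published versions from looking "common". Note that there is no time filter here, so the query scans the whole table; it is worth checking the estimated bytes before running it.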
Commands executed during crates.io package installation
SELECT
  T.Package.Name,
  T.Package.Version,
  ARRAY_TO_STRING(Commands.Command, " ") AS command
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.Commands AS Commands
WHERE
  Package.Ecosystem = "crates.io"
  AND TIMESTAMP_TRUNC(CreatedTimestamp, MONTH) = TIMESTAMP("2023-09-01")
ORDER BY
  T.Package.Name,
  T.Package.Version ASC
LIMIT 10
;
Name | Version | command |
---|---|---|
a1_notation | 0.4.0 | rustc --crate-name a1_notation --edition=2021 /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/a1_notation-0.4.0/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 -C metadata=c1fc3b749240cbbe -C extra-filename=-c1fc3b749240cbbe --out-dir /app/target/debug/deps -L dependency=/app/target/debug/deps --extern serde=/app/target/debug/deps/libserde-8dcf5821f9268294.rmeta --cap-lints allow |
a1_notation | 0.4.0 | rustc --crate-name serde --edition=2018 /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.188/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C embed-bitcode=no -C debuginfo=2 --cfg feature="default" --cfg feature="derive" --cfg feature="serde_derive" --cfg feature="std" -C metadata=8dcf5821f9268294 -C extra-filename=-8dcf5821f9268294 --out-dir /app/target/debug/deps -L dependency=/app/target/debug/deps --extern serde_derive=/app/target/debug/deps/libserde_derive-065cdb214387bc01.so --cap-lints allow |
… | … | … |
…which is kind of noisy and difficult to scan. So, let’s focus on just the executable, not the complete command line.
Prevalence of programs executed during crates.io package installation
SELECT
  Commands.Command[OFFSET(0)] AS exe,
  COUNT(*) AS `count`,
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.Commands AS Commands
WHERE
  Package.Ecosystem = "crates.io"
  AND TIMESTAMP_TRUNC(CreatedTimestamp, MONTH) = TIMESTAMP("2023-09-01")
  AND Commands.Command[OFFSET(0)] NOT LIKE "/app/target/%"
GROUP BY exe
ORDER BY `count` DESC
;
exe | count |
---|---|
rustc | 144349 |
cc | 34076 |
/usr/lib/gcc/x86_64-linux-gnu/11/collect2 | 28135 |
/usr/bin/ld | 27778 |
python3 | 7666 |
sleep | 7656 |
cargo | 7647 |
as | 6464 |
rm | 6329 |
/usr/lib/gcc/x86_64-linux-gnu/11/cc1 | 6171 |
/bin/bash | 2583 |
sed | 2501 |
/bin/sh | 2191 |
/usr/bin/cmake | 1672 |
freebsd-version | 1350 |
clang | 1320 |
mv | 1126 |
grep | 1016 |
dirname | 987 |
/usr/bin/gmake | 966 |
ln | 954 |
/usr/bin/cc | 945 |
/usr/lib/llvm-14/bin/clang | 816 |
ar | 763 |
/usr/lib/gcc/x86_64-linux-gnu/11/cc1plus | 618 |
c++ | 489 |
cat | 410 |
/usr/bin/sed | 402 |
/usr/bin/mkdir | 392 |
pkg-config | 382 |
expr | 267 |
make | 226 |
/usr/bin/clang | 182 |
mkdir | 166 |
/usr/bin/c++ | 158 |
rustfmt | 146 |
/usr/bin/grep | 139 |
git | 135 |
gcc | 88 |
basename | 82 |
uname | 81 |
clang++ | 79 |
cmake | 75 |
tr | 70 |
/usr/bin/uname | 69 |
m4 | 69 |
llvm-config | 68 |
../mpn/m4-ccas | 65 |
rmdir | 57 |
install | 51 |
/usr/bin/nm | 45 |
sort | 44 |
chmod | 44 |
/usr/bin/perl | 43 |
awk | 43 |
g++ | 39 |
date | 37 |
/usr/bin/pkg-config | 32 |
mawk | 31 |
ls | 31 |
mktemp | 31 |
cmp | 30 |
/usr/bin/rustfmt | 30 |
/usr/bin/install | 30 |
sh | 29 |
ranlib | 29 |
touch | 27 |
./conftest | 24 |
cp | 22 |
diff | 22 |
/usr/bin/clang++ | 22 |
echo | 22 |
hostname | 20 |
/usr/lib/git-core/git-sh-i18n--envsubst | 18 |
CMAKE_Fortran_COMPILER-NOTFOUND | 18 |
command | 14 |
print | 14 |
cut | 13 |
/usr/bin/hostinfo | 11 |
/usr/convex/getsysinfo | 11 |
/bin/arch | 11 |
/bin/uname | 11 |
/usr/bin/arch | 11 |
/bin/universe | 11 |
/bin/machine | 11 |
/usr/bin/oslevel | 11 |
/usr/lib/git-core/git-submodule | 9 |
uniq | 9 |
true | 9 |
strip | 9 |
getconf | 9 |
objdump | 9 |
./gen-bases | 8 |
./gen-fib | 8 |
/usr/bin/ar | 8 |
nm | 8 |
/usr/bin/ranlib | 8 |
/usr/bin/dd | 8 |
tar | 7 |
id | 7 |
rustdoc | 6 |
/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/prost-build-0.8.0/third-party/protobuf/protoc-linux-x86_64 | 6 |
bash | 6 |
wc | 6 |
python | 6 |
go | 5 |
./gen-fac | 4 |
file | 4 |
./gen-jacobitab | 4 |
./configure | 4 |
./a.out | 4 |
./gen-trialdivtab | 4 |
od | 4 |
env | 4 |
./gen-sieve | 4 |
./gen-psqr | 4 |
/sbin/ldconfig.real | 4 |
ldconfig | 4 |
../../gmp-src/mpn/m4-ccas | 4 |
which | 3 |
timeout | 3 |
pg_config | 3 |
./runtests.sh | 3 |
/usr/bin/protoc | 3 |
/usr/lib/go-1.18/pkg/tool/linux_amd64/link | 3 |
./runtests-quiet.sh | 3 |
./232c93d07b74.t | 3 |
link | 2 |
../gmp-src/configure | 2 |
fgrep | 2 |
tail | 2 |
/usr/bin/go | 2 |
/usr/lib/go-1.18/pkg/tool/linux_amd64/compile | 2 |
ld | 2 |
readelf | 2 |
cmake3 | 2 |
perl | 2 |
capnp | 2 |
curl-config | 1 |
/usr/lib/go-1.18/pkg/tool/linux_amd64/asm | 1 |
/bin/nasm | 1 |
/usr/local/sbin/nasm | 1 |
/tmp/cguFXqiB/dummy | 1 |
lean | 1 |
/sbin/nasm | 1 |
/usr/bin/nasm | 1 |
/tmp/cgRZejKa/dummy | 1 |
/tmp/go-build3600745051/b001/exe/godeps | 1 |
/usr/sbin/nasm | 1 |
/tmp/cgDqPer7/dummy | 1 |
/usr/local/cargo/bin/nasm | 1 |
gzip | 1 |
/tmp/cgwqnVEe/dummy | 1 |
/tmp/go-build1160633071/b001/exe/godeps | 1 |
/usr/local/bin/nasm | 1 |
/usr/bin/file | 1 |
nasm | 1 |
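The same program shows up more than once in this list depending on how it was invoked: `sed` (2501) and `/usr/bin/sed` (402), for example, are counted separately. Here is a sketch of a variant that groups by the final path component instead. It assumes that stripping everything up to the last `/` is a good enough notion of "program name"; `SAFE_OFFSET` is only there to tolerate an empty command array, which I have not confirmed ever occurs.

```sql
-- Group executables by basename so "/usr/bin/sed" and "sed" are counted together.
SELECT
  REGEXP_EXTRACT(Commands.Command[SAFE_OFFSET(0)], r"[^/]+$") AS exe,
  COUNT(*) AS `count`
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.Commands AS Commands
WHERE
  T.Package.Ecosystem = "crates.io"
  AND TIMESTAMP_TRUNC(T.CreatedTimestamp, MONTH) = TIMESTAMP("2023-09-01")
  AND Commands.Command[SAFE_OFFSET(0)] NOT LIKE "/app/target/%"
GROUP BY exe
ORDER BY `count` DESC
;
```

The trade-off is that the directory is lost, and the directory can itself be a signal: an executable launched from `/tmp/...` deserves a closer look than one launched from `/usr/bin`. I would use the grouped view for prevalence and the full-path view for triage.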
I wonder why some packages use `curl` during installation? Let’s see which packages those are.
crates.io packages invoking a specific program during installation
SELECT
  T.Package.Name,
  T.Package.Version,
  ARRAY_TO_STRING(Commands.Command, " ") AS command,
  Commands.Command[OFFSET(0)] AS exe,
  FORMAT(
    "https://crates.io/api/v1/crates/%s/%s/download",
    T.Package.Name,
    T.Package.Version) AS url
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.Commands AS Commands
WHERE
  Package.Ecosystem = "crates.io"
  AND TIMESTAMP_TRUNC(CreatedTimestamp, YEAR) = TIMESTAMP("2023-01-01")
  AND Commands.Command[OFFSET(0)] = "curl"
ORDER BY
  T.Package.Name,
  T.Package.Version ASC
;
Name | Version | exe | command |
---|---|---|---|
caffe2op-bisect | 0.1.4-alpha.0 | curl | curl http://www.fftw.org/fftw-3.3.6-pl1.tar.gz |
caffe2op-ceil | 0.1.4-alpha.0 | curl | curl http://www.fftw.org/fftw-3.3.6-pl1.tar.gz |
caffe2op-channelbackprop | 0.1.4-alpha.0 | curl | curl http://www.fftw.org/fftw-3.3.6-pl1.tar.gz |
caffe2op-collect | 0.1.4-alpha.0 | curl | curl http://www.fftw.org/fftw-3.3.6-pl1.tar.gz |
cudd | 0.1.1 | curl | curl -L https://github.com/ivmai/cudd/archive/refs/tags/cudd-3.0.0.tar.gz -o /app/target/debug/build/cudd-sys-ad36e18db6f8070c/out/cudd-3.0.0.tar.gz |
d4-hts | 0.3.9 | curl | curl -L http://github.com/madler/zlib/archive/refs/tags/v1.2.11.tar.gz |
d4-hts | 0.3.9 | curl | curl http://sourceware.org/pub/bzip2/bzip2-1.0.8.tar.gz |
deno_url | 0.107.0 | curl | curl -L -f -s -o /app/target/debug/gn_out/obj/librusty_v8.tmp https://github.com/denoland/rusty_v8/releases/download/v0.73.0/librusty_v8_release_x86_64-unknown-linux-gnu.a |
… | … | … | … |
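The exact match on `"curl"` would miss the same tool invoked by absolute path (say, `/usr/bin/curl`) as well as other downloaders. A sketch of a slightly wider net follows; the list of program names (`curl`, `wget`) is my own guess at what is worth looking for, and I have not checked whether the extra cases actually appear in the data.

```sql
-- Match downloaders by basename, regardless of the path used to invoke them.
-- The names in the IN list are an illustrative assumption; extend as needed.
SELECT
  T.Package.Name,
  T.Package.Version,
  ARRAY_TO_STRING(Commands.Command, " ") AS command
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.Commands AS Commands
WHERE
  T.Package.Ecosystem = "crates.io"
  AND TIMESTAMP_TRUNC(T.CreatedTimestamp, YEAR) = TIMESTAMP("2023-01-01")
  AND REGEXP_EXTRACT(Commands.Command[SAFE_OFFSET(0)], r"[^/]+$") IN ("curl", "wget")
ORDER BY
  T.Package.Name,
  T.Package.Version
;
```

Keep in mind that a build script that downloads content in-process, using an HTTP library rather than spawning a program, never appears in the command data at all. For those cases the DNS queries are the better signal.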
Continuing work
Of course, you can apply all the above queries to the other package registries, such as NPM and Rubygems. Expect to see more data in every dimension, including ongoing attacks & malware.
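As a starting point for comparing registries, here is a sketch that summarizes one month of install-time DNS activity per ecosystem. It uses only fields that appear in the queries above and groups on whatever values `Package.Ecosystem` contains, so no ecosystem names are hard-coded; the choice of month is arbitrary.

```sql
-- Per-ecosystem overview of install-time DNS activity for one month.
-- Package versions that made no DNS queries drop out of the comma join,
-- so package_versions counts only those with at least one resolution.
SELECT
  T.Package.Ecosystem AS ecosystem,
  COUNT(DISTINCT CONCAT(T.Package.Name, "@", T.Package.Version)) AS package_versions,
  COUNT(DISTINCT Queries.Hostname) AS distinct_hostnames
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.DNS AS DNS,
  DNS.Queries AS Queries
WHERE
  TIMESTAMP_TRUNC(T.CreatedTimestamp, MONTH) = TIMESTAMP("2023-09-01")
GROUP BY ecosystem
ORDER BY distinct_hostnames DESC
;
```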
I’ll keep this post updated as I craft further useful queries for exploring the OSSF Package Analysis data set. I’m keen to periodically run queries that highlight “new” activity, such as DNS names that haven’t been seen before. Perhaps I can find a way to publish that feed via RSS and encourage everyone to monitor new packages.
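Here is a rough sketch of what such a “new activity” query might look like for DNS names: it reports hostnames whose earliest recorded resolution falls within the last 7 days. The 7-day window and the 5-package sample are arbitrary choices, and “first seen” only means first seen in this data set, using the analysis timestamp rather than the package publication date.

```sql
-- Hostnames first resolved during a crates.io install within the last 7 days.
-- "First seen" is relative to this data set; the window is an arbitrary choice.
SELECT
  Queries.Hostname,
  MIN(T.CreatedTimestamp) AS first_seen,
  STRING_AGG(DISTINCT T.Package.Name, ", " ORDER BY T.Package.Name LIMIT 5) AS example_packages
FROM
  `ossf-malware-analysis.packages.analysis` AS T,
  T.Analysis.install.DNS AS DNS,
  DNS.Queries AS Queries
WHERE
  T.Package.Ecosystem = "crates.io"
GROUP BY Queries.Hostname
HAVING first_seen >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY first_seen DESC
;
```

Because the earliest sighting has to be computed over all history, this cannot be restricted to a recent time range and will scan the full table on every run. If it were run on a schedule, saving the set of known hostnames to a separate table and comparing only new rows against it would likely be cheaper.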