This is just preliminary support for identifying an assortment of
different AI model files.
So far, this detects the following types:
- GGML GGUF (.gguf)
- ONNX AI (.onnx)
- TensorFlow Lite (.tflite)
Additional types to consider:
- SafeTensors (.safetensors)
- TensorFlow (.pb, .ckpt, .tfrecords)
- Keras (.keras)
- pickle (.pkl)
- numpy (.npy, .npz)
- coreml (.coreml)
- PyTorch (.pt, .pth, .bin, .mar, .pte, .pt2, .ptl)
Outside of being able to differentiate by file type, the scanner
will treat CL_TYPE_AI_MODEL the same as CL_TYPE_BINARY_DATA.
We're not adding parsers to further process these files, for now.
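
As a rough illustration of what identifying these files by content can look like, the sketch below checks two well-known markers: GGUF files begin with the ASCII bytes `GGUF`, and TensorFlow Lite FlatBuffers carry the file identifier `TFL3` at byte offset 4. This is not ClamAV's detection logic. The function name `sniff_ai_model` is hypothetical, and ONNX is left out because it is a protobuf serialization without a fixed magic number.

```python
def sniff_ai_model(header: bytes):
    """Return a coarse type label for a file header, or None if unknown."""
    # GGUF: 4-byte ASCII magic at the very start of the file.
    if header[:4] == b"GGUF":
        return "gguf"
    # TFLite: the FlatBuffers file identifier follows the 4-byte root offset.
    if header[4:8] == b"TFL3":
        return "tflite"
    # Anything else is left unclassified in this sketch.
    return None
```

Only the first eight bytes of a file are needed for either check.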
When the --move or --remove options are used, ClamAV carefully traverses
the file path one layer at a time so as to avoid following a directory
that is a symlink or reparse point.
We do this for directories, but could also do it for files.
Only an admin should be able to create a reparse point for a file,
but it is better to be consistent.
Thank you to Maxim Suhanov for reporting this issue.
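
The layer-by-layer traversal described above can be sketched as follows. This is only an illustration in Python, not the ClamAV implementation: the helper name `first_symlink_component` is hypothetical, and a check-then-use walk like this one is still open to races that a real implementation would need to close (for example by opening each layer relative to the previous one).

```python
import os
import stat


def first_symlink_component(path):
    """Return the first prefix of `path` that is a symlink, or None.

    Every layer is inspected with lstat(), including the final file
    component, so neither directories nor files are followed blindly.
    """
    path = os.path.abspath(path)
    drive, rest = os.path.splitdrive(path)
    current = drive + os.sep
    for part in rest.split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        try:
            mode = os.lstat(current).st_mode
        except FileNotFoundError:
            # Nothing exists here, so there is nothing left to follow.
            return None
        if stat.S_ISLNK(mode):
            return current
    return None
```

A caller would refuse to move or remove the target whenever the function returns a non-None prefix.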
The ClamAV inflate64 module is based on zlib 1.2.3 source code with
significant changes to support extracting zip64 and some changes
addressing code quality issues.
This commit adds a zlib v1.2.9 fix for possible undefined behavior:
6a043145ca
Thank you to TITAN Team for reporting this issue.
ClamD's STATS API reports process memory stats on systems that
provide the `mallinfo()` function.
This feature is used by ClamDTOP to show process memory usage.
When we switched to the CMake build system, we neglected to add the
check for `mallinfo()` and so broke ClamD memory usage reporting.
This commit adds the CMake check for `mallinfo()` and sets
HAVE_MALLINFO, if found.
Fixes: https://github.com/Cisco-Talos/clamav/issues/706
Jira: CLAM-2742
The bounds check for the loop iterating an OLE2 block during decryption
may have an integer underflow if `leftover + bytesToWrite` is less
than 16. That results in a significant buffer over-read and a segfault.
The fix is simply to do addition on the left side of the check instead
of subtraction on the right.
Fixes https://issues.oss-fuzz.com/issues/372544101
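
To make the underflow concrete, the sketch below emulates unsigned 32-bit arithmetic in Python. The function names and the exact shape of the comparison are assumptions for illustration; the actual ClamAV expression is not shown in this message, and the real code may use a wider unsigned type.

```python
MASK = 0xFFFFFFFF  # emulate unsigned 32-bit wraparound
BLOCK = 16         # bytes handled per loop iteration


def check_with_subtraction(offset, total):
    # Buggy form: `offset <= total - 16`. When total < 16 the right-hand
    # side wraps around to a huge value, so the check wrongly passes.
    return offset <= ((total - BLOCK) & MASK)


def check_with_addition(offset, total):
    # Fixed form: `offset + 16 <= total`. Nothing is subtracted, so a
    # small total can no longer wrap.
    return ((offset + BLOCK) & MASK) <= total
```

With `total = 10`, the subtracting check accepts offset 0 while the adding check rejects it; for totals of 16 or more the two agree.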
At install, the CMake build may fail if it detects the same library
dependency in two locations. This happened for us with the following
error:
    CMake Error at libfreshclam/cmake_install.cmake:157 (file):
      file Multiple conflicting paths found for libcrypto-3-x64.dll:
        C:/Users/clamav_jenkins_svc.TALOS/clam_dependencies/x64/lib/libcrypto-3-x64.dll
        C:/WINDOWS/system32/libcrypto-3-x64.dll
        C:\WINDOWS\system32/libcrypto-3-x64.dll
    Call Stack (most recent call first):
      cmake_install.cmake:96 (include)
This happens when system-provided DLL names match exactly with the ones
we provide. ClamAV wouldn't prefer that DLL at load time, because it
looks in the EXE directory first. But it does confuse the `file()`
command used to locate build dependencies.
The fix in this commit uses a regex to exclude all libraries found under
C:\Windows.
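
The effect of such an exclusion can be sketched in Python using the three paths from the error above. The pattern here is an assumption written for illustration and is not the regex used in the CMake change; the helper name is hypothetical.

```python
import re

# Assumption: any path under the Windows directory on drive C, with either
# slash style and any letter case, counts as a system-provided copy.
SYSTEM_DIR = re.compile(r"^c:[\\/]+windows[\\/]", re.IGNORECASE)


def drop_system_libraries(paths):
    """Keep only the candidate paths that are not under the system dir."""
    return [p for p in paths if not SYSTEM_DIR.match(p)]
```

Applied to the conflicting candidates, only the copy under `clam_dependencies` remains, so a single path is left to install.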
Occasionally the MD5 hash for RSA-based digital signature
verification begins with zeros. A bug in how we convert the RSA
decoded plain text from a big number back to a hex string causes it
to write the number to the far left of the plain text buffer.
If the number is smaller than a hash, then zero-padding ends up on
the right when it should've been on the left.
Additional fix: BN_bn2bin() will write zero bytes if the bignum is 0.
So there is no point "error checking" the BN_bn2bin() call.
Thanks to Tom Judge for noticing these shenanigans.
Ref: https://github.com/openssl/openssl/issues/2101
Side note: BN_num_bytes() will also return 0 if the bignum is 0,
which is fine.
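
The padding problem is easy to reproduce outside of OpenSSL. The sketch below uses Python integers in place of a bignum; the function names are hypothetical, and `minimal_big_endian()` only mimics the documented behavior that the big-endian output carries no leading zero bytes (and is empty for zero).

```python
def minimal_big_endian(n):
    """Big-endian bytes of n with no leading zeros (empty for n == 0)."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def to_hex_buggy(n, size):
    # Writes the number at the far left of the buffer, so the missing
    # bytes end up as zero-padding on the right.
    raw = minimal_big_endian(n)
    return (raw + b"\x00" * (size - len(raw))).hex()


def to_hex_fixed(n, size):
    # Pads on the left, restoring the leading zero bytes of the hash.
    raw = minimal_big_endian(n)
    return (b"\x00" * (size - len(raw)) + raw).hex()
```

For a 16-byte MD5 digest that starts with `00`, only `to_hex_fixed()` reproduces the original hex string.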
Freshclam may crash if using DatabaseCustomURL for a CVD and multiple
other files. The issue occurs because of a bad index in the "do not
prune" list.
Fixes: https://github.com/Cisco-Talos/clamav/issues/1364
If the 'hexsig' for an image fuzzy hash subsignature contains invalid
Unicode, it may cause a crash. The problem is that we fail to allocate an
error message in this instance, so printing that message dereferences a
NULL pointer.
This is not a security issue.
Fixes: https://issues.oss-fuzz.com/issues/376331488
Store URLs found in HTML `<a>` and `<form>` tags during scan of HTML files
when recording scan metadata.
HTML URL recording will be ON by default, but is a part of the
generate-metadata-json feature.
The generate-metadata-json feature is OFF by default.
This introduces a new general scan option:
- libclamav: `CL_SCAN_GENERAL_STORE_HTML_URLS`.
- ClamD: `JsonStoreHTMLUrls`.
- ClamScan: `--json-store-html-urls`
Thank you Matt Jolly for the helpful comment on the pull request.
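
For readers unfamiliar with the idea, here is a small Python sketch of collecting URLs from `<a>` and `<form>` tags. It is not how libclamav parses HTML. The class name is hypothetical, and the choice of the `href` and `action` attributes is an assumption, since this message does not say which attributes are recorded.

```python
from html.parser import HTMLParser


class UrlCollector(HTMLParser):
    """Collect URLs from <a href=...> and <form action=...> tags."""

    # Assumed mapping of tag name to the attribute that carries the URL.
    URL_ATTRIBUTE = {"a": "href", "form": "action"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRIBUTE.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)
```

Feeding a document with `collector.feed(html_text)` leaves the URLs in `collector.urls`, in document order.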
Add keys to the metadata.json file that inform the user that a scanned
OLE2 file is encrypted. Information about the type of encryption is
provided when it is available. This feature was co-authored by
Micah Snyder.
There is presently no limit for the max-recursion scan option.
Selecting a max-recursion limit that is too high will cause confusing
errors. E.g.:
    /home/aragusa/install.alz/bin/clamscan -d clamav.hdb . --max-recursion=9999999999
    LibClamAV Error: fmap_fd: Attempted to get fd for NULL fmap
    /home/aragusa/issue/clamav.hdb: Can't allocate memory ERROR
    LibClamAV Error: fmap_fd: Attempted to get fd for NULL fmap
    /home/aragusa/issue/test.sh: Can't allocate memory ERROR
This commit prevents setting the max-recursion limit higher than 100.
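
The kind of validation described above might look like the following Python sketch. The function name is hypothetical, and both the lower bound of 1 and the choice to reject the value (rather than silently clamp it) are assumptions; only the upper bound of 100 comes from this message.

```python
MAX_RECURSION_LIMIT = 100  # upper bound stated in this commit message


def parse_max_recursion(text):
    """Parse a max-recursion option value, rejecting out-of-range input."""
    value = int(text)
    # Assumption: values below 1 are treated as invalid in this sketch.
    if value < 1 or value > MAX_RECURSION_LIMIT:
        raise ValueError(
            "max-recursion must be between 1 and %d" % MAX_RECURSION_LIMIT
        )
    return value
```

Rejecting the option up front yields one clear message instead of the allocation errors shown in the example.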
The log module used by clamd and freshclam may follow symlinks.
This is a potential security concern since the log may be owned by
the unprivileged service but may be opened by the service running as
root on startup.
For Windows, we'll define O_NOFOLLOW so the code works, though the issue
does not affect Windows.
Issue reported by Detlef.
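
The same idea can be shown with Python's `os.open()`. This is a sketch and not the ClamAV log module: the function name, the other open flags, and the file mode are assumptions. Falling back to 0 where `O_NOFOLLOW` is missing mirrors the approach of defining the flag on Windows so the code still works.

```python
import os

# Platforms without O_NOFOLLOW get a no-op flag value instead.
O_NOFOLLOW = getattr(os, "O_NOFOLLOW", 0)


def open_log(path):
    """Open a log file for appending, refusing to follow a symlink."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | O_NOFOLLOW
    # Where O_NOFOLLOW is supported, this raises OSError if the final
    # path component is a symlink.
    return os.open(path, flags, 0o640)
```

Note that the flag only covers the final path component; symlinked parent directories are a separate concern.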
The `find_length()` function in the PDF parser incorrectly assumes that
objects found are located in the main PDF file map, and fails to take
into account whether the objects were in fact found in extracted PDF
object streams. The resulting pointer is then invalid, and dereferencing
it may cause an out-of-bounds read.
This issue was found by OSS-Fuzz.
This fix checks if the object is from an object stream, and then
calculates the pointer based on the start of the object stream instead
of based on the start of the PDF.
I've also added extra checks to verify the calculated pointer and object
size are within the stream (or PDF file map). I'm not entirely sure this
is necessary, but better safe than sorry.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=69617
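
The shape of the fix can be sketched in Python with byte strings standing in for the file map and the extracted object stream. All names here are hypothetical and the real parser works with C pointers, so this only illustrates choosing the right base buffer and bounds-checking against it.

```python
def object_bytes(pdf_map, start, size, objstm=None):
    """Return the object's bytes, or None if it falls outside its buffer.

    `start` is relative to the object stream when `objstm` is given,
    and relative to the main PDF map otherwise.
    """
    base = objstm if objstm is not None else pdf_map
    # Verify the start offset and size against the buffer actually used.
    if start < 0 or size < 0 or start > len(base):
        return None
    if size > len(base) - start:
        return None
    return base[start:start + size]
```

An offset that would be valid in the larger PDF map is rejected when it does not fit inside the smaller object stream.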