This is just preliminary support for identifying an assortment of
different AI model files.
So far, this detects the following types:
- GGML GGUF (.gguf)
- ONNX AI (.onnx)
- TensorFlow Lite (.tflite)
Additional types to consider:
- SafeTensors (.safetensors)
- TensorFlow (.pb, .ckpt, .tfrecords)
- Keras (.keras)
- pickle (.pkl)
- numpy (.npy, .npz)
- coreml (.coreml)
- PyTorch (.pt, .pth, .bin, .mar, .pte, .pt2, .ptl)
Outside of being able to differentiate by file type, the scanner
will treat CL_TYPE_AI_MODEL the same as CL_TYPE_BINARY_DATA.
We're not adding parsers to further process these files, for now.
It may be necessary to differentiate between *.pyc and other binary
types in case additional processing is needed.
Outside of being able to differentiate the by file type, the scanner
will treat CL_TYPE_PYTHON_COMPILED the same as CL_TYPE_BINARY_DATA.
That is - we're not adding parser at this time to further break down
.pyc files.
Includes rudimentary support for getting slices from FMap's and for
interacting with libclamav's context structure.
For now will use a Cisco-Talos org fork of the onenote_parser
until the feature to read open a onenote section from a slice (instead
of from a filepath) is added to the upstream.
This commit updates the ordering of the internal FTM sigs to
match what's in daily.ftm today. No FTM signature changes are
included as part of this commit (only re-ordering).
Adds debug output to the JPEG format validator to help resolve issues
with unusually formatted JPEGs and to validate that the JPEG parser is
working correctly.
Relaxes the rules around duplicate application markers or application
markers that appear later than expected, due to prior XMP metadata, etc.
Removed the requirement for an application marker to exist, as some
older JPEGs don't appear to use JFIF, Exif, or SPIFF application
extensions.
I tested against a relatively large data set of JPEGs from Mac & Windows
stock photos, personal photos, and assorted downloaded photos. FP rates
when alerting on broken media should be very low.
Added a new scan option to alert on broken media (graphics) file
formats. This feature mitigates the risk of malformed media files
intended to exploit vulnerabilities in other software. At present
media validation exists for JPEG, TIFF, PNG, and GIF files.
To enable this feature, set `AlertBrokenMedia yes` in clamd.conf, or
use the `--alert-broken-media` option when using `clamscan`.
These options are disabled by default for now.
Application developers may enable this scan option by enabling
`CL_SCAN_HEURISTIC_BROKEN_MEDIA` for the `heuristic` scan option bit
field.
Fixed PNG parser logic bugs that caused an excess of parsing errors
and fixed a stack exhaustion issue affecting some systems when
scanning PNG files. PNG file type detection was disabled via
signature database update for 0.103.0 to mitigate effects from these
bugs.
Fixed an issue where PNG and GIF files no longer work with Target:5
(graphics) signatures if detected as CL_TYPE_PNG/GIF rather than as
CL_TYPE_GRAPHICS. Target types now support up to 10 possible file
types to make way for additional graphics types in future releases.
Scanning JPEG, TIFF, PNG, and GIF files will no longer return "parse"
errors when file format validation fails. Instead, the scan will alert
with the "Heuristics.Broken.Media" signature prefix and a descriptive
suffix to indicate the issue, provided that the "alert broken media"
feature is enabled.
GIF format validation will no longer fail if the GIF image is missing
the trailer byte, as this appears to be a relatively common issue in
otherwise functional GIF files.
Added a TIFF dynamic configuration (DCONF) option, which was missing.
This will allow us to disable TIFF format validation via signature
database update in the event that it proves to be problematic.
This feature already exists for many other file types.
Added CL_TYPE_JPEG and CL_TYPE_TIFF types.
This commit improves the layout of the tmp file output and the JSON
metadata output when using the --leave-temps and --gen-json options.
For all scans, each scan target will get a unique tmp sub-directory. If
using --leave-temps, that subdir will include the basename of the
original file to make it easier to identify. Additionally, when using
--leave-temps option, all extracted objects will have their
subdirectories extracted in recursive subdirectories including filename
prefixes where available. When not using the --leave-temps option, the
layout of the tmp sub-directory will remain flat, so as to alleviate the
possibility of exceeding PATH_MAX.
The JSON metadata generated by the --gen-json option is now generated
for all file types, not just a select few. The format is also
pretty-printed for readability and now includes filenames and file paths
when available.
Also:
- Added missing ALLMATCH check when determining if bytecode hooks should
be run.
- Added cl_engine_get_str API to windows libclamav symbol export file.
EGG extraction support includes deflate, bzip2, and lzma decompression. AZO (LZO?) decompression not yet supported. Solid archives not yet supported. Split archives may have some limited success.
This commit also includes updates to autoconf iconv.m4 file enable detection of libiconv in alternative install locations.