Includes rudimentary support for getting slices from FMap's and for
interacting with libclamav's context structure.
For now will use a Cisco-Talos org fork of the onenote_parser
until the feature to read open a onenote section from a slice (instead
of from a filepath) is added to the upstream.
* Add new clamd and clamscan option --cache-size
This option allows you to set the number of entries the cache can store.
Additionally, introduce CacheSize as a clamd.conf
synonym for --cache-size.
Fixes#867
Add a new cl_engine_set_clcb_vba() function to set a cb_vba callback
function and add clcb_generic_data handler prototype to the clamav.h
public API.
The cb_vba callback function will be run whenever VBA is extracted from
office documents. The provided data will be a normalized copy of the
original VBA. This callback is added to support Sigtool so it can use
the same VBA extraction logic as when scanning documents.
Change the Sigtool temp directory creation for any commands that use
temp directories so that you can select a custom temp directory with the
`--tempdir=PATH` option, and can retain the temp files with the
`--leave-temps` option.
Added `--tempdir` and `--leave-temps` to the Sigtool `--help` output.
Added `--tempdir` and `--leave-temps` to the Sigtool manpage.
* Add a new function cl_cvdgetage() to the libclamav API.
This function will retrieve the age of the youngest file in a
database directory, or the age of a single CVD (or CLD) file.
* Add new clamscan option --fail-if-cvd-older-than=days
When passed, causes clamscan to exit with a non-zero return code
if the virus database is older than the specified number of days.
* Add new clamd option --fail-if-cvd-older-than=days
When passed, causes clamd to exit on start-up with a non-zero
return code if the virus database is older than the specified
number of days.
Additionally, we introduce FailIfCvdOlderThan as a clamd.conf
synonym for --fail-if-cvd-older-than.
Fixes#820
The current implementation sets a "next layer attributes" flag field
in the scan context. This may introduce bugs if accidentally not cleared
during error handling, causing that attribute to be applied to a
different layer than intended.
This commit resolves that by adding an attribute flag to the major
internal scan functions and removing the "next layer attributes" from
the scan context. This attributes flag shares the same flag fields as
the attributes flag in the new file inspection callback and the flags
are defined in `clamav.h`.
libclamav callbacks can be used to access embedded file content at each
layer of extraction during the course of a scan. The existing callbacks
only provide access to the file descriptor and a guess at the file type.
This patch adds a new callback for the purposes of file/archive
inspection that provides additional insight into the embedded file.
This includes:
- ancestors: an array of parent file names
- parent file size: the size of the direct parent layer
- file name: current layer's filename, if any.
- file buffer (pointer)
- file size: size of file buffer
- file type: just a guess at the current file's type
- file descriptor: may be -1 if the layer is in-memory only.
- layer attributes: a flag field. see LAYER_ATTRIBUTE_* defines in clamav.h
Two new example apps are added that are automatically built when
compiling under CMake:
- ex2 demonstrates the prescan callback.
- ex3 demonstrates the new file inspection callback.
The examples are now installed if enabled, so you can test them in the
Docker image, and so that they'll be colocated with the DLLs so you can
test them on Windows. The installed examples should also be able to find
the UnRAR library at run time, without having to set LD_LIBRARY_PATH.
This commit also sets the fmap->name in an fmap-scan using the basname
of the provided filename if the caller provided the filename and the
provided fmap does not have the name set.
Add `cl_cvdunpack()` function to the public API.
This new API has an option to disable verification, but otherwise it
will attempt to verify that the CVD is correctly signed.
The fmap module provides a mechanism for creating a mapping into an
existing map at an offset and length that's used when a file is found
with an uncompressed archive or when embedded files are found with
embedded file type recognition in scanraw(). This is the
"fmap_duplicate()" function. Duplicate fmaps just reference the original
fmap's 'data' or file handle/descriptor while allowing the caller to
treat it like a new map using offsets and lengths that don't account for
the original/actual file dimensions.
fmap's keep track of this with m->nested_offset & m->real_len, which
admittedly have confusing names. I found incorrect uses of these in a
handful of locations. Notably:
- In cli_magic_scan_nested_fmap_type().
The force-to-disk feature would have been checking incorrect sizes and
may have written incorrect offsets for duplicate fmaps.
- In XDP parser.
- A bunch of places from the previous commit when making dupe maps.
This commit fixes those and adds lots of documentation to the fmap.h API
to try to prevent confusion in the future.
nested_offset should never be referenced outside of fmap.c/h.
The fmap_* functions for accessing or reading map data have two
implementations, mem_* or handle_*, depending the data source.
I found issues with some of these so I made a unit test that covers each
of the functions I'm concerned about for both types of data sources and
for both original fmaps and nested/duplicate fmaps.
With the tests, I found and fixed issues in these fmap functions:
- handle_need_offstr(): must account for the nested_offset in dupe maps.
- handle_gets(): must account for nested_offset and use len & real_len
correctly.
- mem_need_offstr(): must account for nested_offset in dupe maps.
- mem_gets(): must account for nested_offset and use len & real_len
correctly.
Moved CDBRANGE() macro out of function definition so for better
legibility.
Fixed a few warnings.
Fixup input output params to be anotated with [in,out], not [in/out].
Note: skipped some other incorrectly annodated [out] params that are
already staged to be fixed in a different PR.
Add progress callbacks to libclamav for:
- database load
- engine compile
- engine free
Add a progress bar to clamscan for load & compile.
These are disabled if you run with --debug or stdout is not a TTY or you
are using one of --quiet, --infected, or --no-summary.
Added code so you can test the engine-free callback by building with
ENABLE_ENGINE_FREE_PROGRESSBAR defined.
The compile & free progress callbacks pre-calculate the number of
tasks to complete to estimate the progress. Some tasks may take longer
than others so the progress speed my appear to vary a little.
The callbacks return type is a cl_error_t but doesn't currently do
anything. It is reserved for future use.
Minor formatting change in matcher-ac.c to counteract weird
clang-format behavior, and to make it easier to read.
Added progress callbacks and clamscan progress bars to the news.
Improvements to use modern block list and allow list verbiage.
blacklist -> block list
whitelist -> allow listed
blacklisted -> blocked
whitelisted -> allowed
In the case of certificate verification, use "trust" or "verify" when
something is allowed.
Also changed domainlist -> domain list (or DomainList) to match.
Added a new scan option to alert on broken media (graphics) file
formats. This feature mitigates the risk of malformed media files
intended to exploit vulnerabilities in other software. At present
media validation exists for JPEG, TIFF, PNG, and GIF files.
To enable this feature, set `AlertBrokenMedia yes` in clamd.conf, or
use the `--alert-broken-media` option when using `clamscan`.
These options are disabled by default for now.
Application developers may enable this scan option by enabling
`CL_SCAN_HEURISTIC_BROKEN_MEDIA` for the `heuristic` scan option bit
field.
Fixed PNG parser logic bugs that caused an excess of parsing errors
and fixed a stack exhaustion issue affecting some systems when
scanning PNG files. PNG file type detection was disabled via
signature database update for 0.103.0 to mitigate effects from these
bugs.
Fixed an issue where PNG and GIF files no longer work with Target:5
(graphics) signatures if detected as CL_TYPE_PNG/GIF rather than as
CL_TYPE_GRAPHICS. Target types now support up to 10 possible file
types to make way for additional graphics types in future releases.
Scanning JPEG, TIFF, PNG, and GIF files will no longer return "parse"
errors when file format validation fails. Instead, the scan will alert
with the "Heuristics.Broken.Media" signature prefix and a descriptive
suffix to indicate the issue, provided that the "alert broken media"
feature is enabled.
GIF format validation will no longer fail if the GIF image is missing
the trailer byte, as this appears to be a relatively common issue in
otherwise functional GIF files.
Added a TIFF dynamic configuration (DCONF) option, which was missing.
This will allow us to disable TIFF format validation via signature
database update in the event that it proves to be problematic.
This feature already exists for many other file types.
Added CL_TYPE_JPEG and CL_TYPE_TIFF types.
Add notices to man pages and help strings cautioning against running
bytecode signatures from untrusted sources.
Also adds missing BytecodeUnsigned option to clamd.conf.sample files.
Some detections, like phishing, are considered heuristic alerts because
they match based on behavior more than on content. A subset of these
are considered "potentially unwanted" (low-severity). These
low-severity alerts include:
- phishing
- PDFs with obfuscated object names
- bytecode signature alerts that start with "BC.Heuristics"
The concept is that unless you enable "heuristic precedence" (a method
of lowing the threshold to immediateley alert on low-severity
detections), the scan should continue after a match in case a higher
severity match is found. Only at the end will it print the low-severity
match if nothing else was found.
The current implementation is buggy though. Scanning of archives does
not correctly bail out for the entire archive if one email contains a
phishing link. Instead, it sets the "heuristic found" flag then and
alerts for every subsequent file in the archive because it doesn't know
if the heuristic was found in an embedded file or the target file.
Because it's just a heuristic and the status is "clean", it keeps
scanning.
This patch corrects the behavior by checking if a low-severity alerts
were found at the end of scanning the target file, instead of at the end
of each embedded file.
Additionally, this patch fixes an in issue with phishing alerts wherein
heuristic precedence mode did not cause a scan to stop after the first
alert.
The above changes required restructuring to create an fmap inside of
cl_scandesc_callback() so that scan_common() could be modified to
require an fmap and set up so that the current *ctx->fmap pointer is
never NULL when scan_common() evaluates match results.
Also fixed a couple minor bugs in the phishing unit tests and cleaned up
the test code for improved legitibility and type safety.
A way is needed to record scanned file names for two purposes:
1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.
2. File names are useful when interpretting tmp directory output when
using the --leave-temps option.
This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.
To store the names in fmaps, an optional name argument has been added to
any internal scan API's that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required. The zip and gpt parsers required some modification to record
file names. The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.
Also:
- Added recursive extraction to the tmp directory when the
--leave-temps option is enabled. When not enabled, the tmp directory
structure remains flat so as to prevent the likelihood of exceeding
MAX_PATH. The current tmp directory is stored in the scan context.
- Made the cli_scanfile() internal API non-static and added it to
scanners.h so it would be accessible outside of scanners.c in order to
remove code duplication within libmspack.c.
- Added function comments to scanners.h and matcher.h
- Converted a TDB-type macros and LSIG-type macros to enums for improved
type safey.
- Converted more return status variables from `int` to `cl_error_t` for
improved type safety, and corrected ooxml file typing functions so
they use `cli_file_t` exclusively rather than mixing types with
`cl_error_t`.
- Restructured the magic_scandesc() function to use goto's for error
handling and removed the early_ret_from_magicscan() macro and
magic_scandesc_cleanup() function. This makes the code easier to
read and made it easier to add the recursive tmp directory cleanup to
magic_scandesc().
- Corrected zip, egg, rar filename extraction issues.
- Removed use of extra sub-directory layer for zip, egg, and rar file
extraction. For Zip, this also involved changing the extracted
filenames to be randomly generated rather than using the "zip.###"
file name scheme.
Add Data-Loss-Prevention option to detect credit cards only, excluding
debit and private label cards where possible.
You can select the credit card-only DLP mode for clamscan with the
`--structured-cc-mode` command-line option.
You can select the credit card-only DLP mode for clamd with the
`StructuredCCOnly` clamd.conf config option.
This patch also adds credit card matching for additional vendors:
- Mastercard 2016
- China Union Pay
- Discover 2009
Signature alerts on content extracted into a new fmap such as normalized
HTML resulted in checking FP signatures against the fmap's hash value
that was initialized to all zeroes, and never computed.
This patch seeks will enable FP signatures of normalized HTML files or
other content that is extracted to a new fmap to work. This patch
doesn't resolve the issue that normal people will write FP signatures
targeting the original file, not the normalized file and thus won't
really see benefit from this bug-fix.
Additional work is needed to traverse the fmap recursion lists and
FP-check all parent fmaps when an alert occurs. In addition, the HTML
normalization method of temporarily overriding the ctx->fmap instead of
increasing the recursion depth and doing ctx->fmap++/-- will need to be
corrected for fmap reverse recursion traversal to work.
Removing problematic call to convert file descriptors to filepaths.
Added filename and tempfile names to scandesc calls in clamd.
Added a general scan option to treat the scan engine as unprivileged,
meaning that the scan engine will not have read access to the file.
Added check to drop a temp file for RAR's where the we don't have
read access to the filepath provided (i.e. unprivileged is set, or
access() check fails).
Instead of checking the Authenticode header as an FP prevention
mechanism, we now check it in the beginning if it exists. Also,
we can now do actual blacklisting with .crb rules (previously, a
blacklist rule just let you override a whitelist rule).
This doesn't add support to actually verify whitelisting rules
against SHA384 signatures, but makes it so that verification
doesn't fail completely if there is a SHA384 certificate somewhere
in the signature.
Updated libclamav documentation detailing new scan options structure.
Renamed references to 'algorithmic' detection to 'heuristic' detection. Renaming references to 'properties' to 'collect metadata'.
Renamed references to 'scan all' to 'scan all match'.
Renamed a couple of 'Hueristic.*' signature names as 'Heuristics.*' signatures (plural) to match majority of other heuristics.