Store URLs found in HTML `<a>` and `<form>` tags during scan of HTML files
when recording scan metadata.
HTML URL recording will be ON by default, but is a part of the
generate-metadata-json feature.
The generate-metadata-json feature is OFF by default.
This introduces a new general scan option:
- libclamav: `CL_SCAN_GENERAL_STORE_HTML_URLS`.
- ClamD: `JsonStoreHTMLUrls`.
- ClamScan: `--json-store-html-urls`
Thank you Matt Jolly for the helpful comment on the pull request.
Add keys to the metadata.json file that informs the user that a scanned
ole2 file is encrypted. Information about the type of encryption is
provided when the information is available. This feature co-authored by
Micah Snyder.
There is presently no limit for the max-recursion scan option.
Selecting a max-recursion limit that is too high will cause confusing
errors. E.g.:
/home/aragusa/install.alz/bin/clamscan -d clamav.hdb . --max-recursion=9999999999
LibClamAV Error: fmap_fd: Attempted to get fd for NULL fmap
/home/aragusa/issue/clamav.hdb: Can't allocate memory ERROR
LibClamAV Error: fmap_fd: Attempted to get fd for NULL fmap
/home/aragusa/issue/test.sh: Can't allocate memory ERROR
This commit prevents setting the max-recursion limit higher than 100.
The `find_length()` function in the PDF parser incorrectly assumes that
objects found are located in the main PDF file map, and fails to take
into account whether the objects were in fact found in extracted PDF
object streams. The resulting pointer is then invalid and may be an out
of bounds read.
This issue was found by OSS-Fuzz.
This fix checks if the object is from an object stream, and then
calculates the pointer based on the start of the object stream instead
of based on the start of the PDF.
I've also added extra checks to verify the calculated pointer and object
size are within the stream (or PDF file map). I'm not entirely sure this
is necessary, but better safe than sorry.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=69617
Clamscan and ClamD will throw an error if you use the
'--fail-if-cvd-older-than=DAYS' / 'FailIfCvdOlderThan' option and
try to load any plaintext signature files.
That is, it throws an error when encountering plain signature files like
`.ign2`, `.ldb`, `.hdb`, etc.
This feature should only verify CVD / CLD files.
The feature (and bug) was introduced in ClamAV 1.1.0, here:
e4fe6654c1
With this change, the `cl_cvdgetage` checks will skip any file that is
not a CVD or CLD.
Fixes: https://github.com/Cisco-Talos/clamav/issues/1174
The clamscan test "assorted_test.py::TC::test_pe_cert_trust" is about to
fail because the "test.exe" test file was signed with a cert set to
expire after only 2 years, and it has been 23 months.
While attempting to generate a new one that will last 73000 days (200
years), I discovered that any signing certificate set to expire after
2038 will fail the trust-check because the `ca.not_after` variable is
maxed out `time_t` incapable of expressing a higher number.
To fix this, I've upgraded the variables to `uint64_t`.
I also had to replace a bunch of generated signatures to match the new
"test.exe".
Finally, I noticed that "ca.not_before" was being set to the token[8]
instead of token[9], which presumably mean the "NotBefore" field for
Trusted and Revoked Certificates was non-functional, as it was treating
the "CertSign" boolean as the "NotBefore" value.
Fixes: https://github.com/Cisco-Talos/clamav/issues/1300
fmap_need_off_once() may return an unaligned pointer. This in return
leads to an unaligned access during the load of the uint32_t variables
loading to failures on architectures not supporting unaligned access.
This was reported to the Debian BTS as #1073128.
[bigeasy: Commit message, reworked the patch a bit].
Link: https://bugs.debian.org/1073128
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
If SCAN_COLLECT_METADATA is enabled, and caching is disabled, we zero-out
the hash after recording it.
This results in a non-NULL and invalid-hash that may be passed to
`cli_scan_fmap()` for the "raw mode" scan.
It's an uncommon code path, but would result in comparing hash-sigs with
a zeroed hash rather than the valid hash.
This bug could result in a missed hash-based sig matches.
There is no reason to invalidate or zero-out the hash if we happen to
calculate it. We avoid the cache-lookup by checking the engine setting,
not by checking if we have a hash.
ClamAV initalization's rarload() function tries to load
libclamunrar_iface from the install path before checking under
LD_LIBRARY_PATH.
This means the unit tests will use the wrong unrar library if testing on
a system where ClamAV is already installed.
In the event there is an ABI break between versions, this will cause a
bunch of tests to fail.
This commit fixes the issue by checking for libclamunrar_iface under
LD_LIBRARY_PATH *first* before checking in the install lib directory.
Note in the previous version we were also checking LD_LIBRARY_PATH on
Windows, which is not a thing. I removed this.
Fixes: https://github.com/Cisco-Talos/clamav/issues/1249
Also removed check for WARN_DLOPEN_FAIL define, which was not used, and
mistakenly set for the unrar library build target.
The C-Rust FFI code is needlessly complex. Now that we are calling into
magic_scan from Rust, we can simply hand off the <style> block contents
to Rust code to handle extraction and scanning.
Immediately store pointers as new pointer type rather than using
intermediate uint8_t pointer.
Also "unneed" some of the "needed" pointers as soon as we're able to
release them rather than holding on until the end of the UDF image.
Add assorted debug messages and code comments.
Make FileSetDescriptor optional as minor step towards supporting
ExtendedFileEntries.
Minor variable name changes for readability.
Use tag_identifier enum for variable type rather than uint16_t and
add "INVALID_DESCRIPTOR" (0) to enum and use it in the switch. This way
we're not comparing enums with ints.
Move GenericVolumeStructureDescriptor to udf.h.
As of ClamAV 0.105, libjson-c is required.
There is also no option to disable libjson-c support.
This commit removes the dead code associated with the old build
option.
As of ClamAV 0.105, libz is required.
There is also no option to disable zlib support.
This commit removes the dead code associated with the old build
option.
As of ClamAV 0.105, libbz2 is required.
There is also no option to disable bz2 support.
This commit removes the dead code associated with the old build
option.
As of ClamAV 0.105, libxml2 is required.
There is also no option to disable PCRE support.
This commit removes the dead code associated with the old build
option.
As of ClamAV 0.105, PCRE2 is required. PCRE (1) is not an option, and
there is also no option to disable PCRE support.
This commit removes the dead code associated with those old build
options.
The in_iconv_u16() function resolves "alignment" issues where the length
of the input string is not mod(4). The solution trims the extra bytes
off the input string. If the input string is total less than 4 bytes,
then those extra bytes are put in a 4-byte array and are converted.
However, if the input string is longer, then those extra bytes are lost.
This fix saves the extra "unaligned" bytes in the 4-byte array and
converts them afterwards so we don't accidentally lose 1 to 2
characters.
The delharc crate used to add LZH archive support appears to add
a dependency on macOS CoreFoundation library.
The error is:
[ 78%] Linking C shared library libclamav.dylib
Undefined symbols for architecture x86_64:
"_CFRelease", referenced from:
iana_time_zone::platform::get_timezone_inner::hc7da204717a39974 in libclamav_rust.a(iana_time_zone-bc4762a47da73d72.iana_time_zone.1863eb20d202562a-cgu.0.rcgu.o)
...
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [libclamav/libclamav.12.0.2.dylib] Error 1
We already link with CoreFoundation for libfreshclam and clamsubmit, so
this commit extends that to libclamav as well.
Primarily this commit fixes an issue with the size of the parameters
passed to cli_checklimits(). The parameters were "unsigned long", which
varies in size depending on platform.
I've switched them to uint64_t / u64.
While working on this, I observed some concerning warnigns on Windows,
and some less serious ones, primarily regarding inconsistencies with
`const` parameters.
Finally, in `scanmem.c`, there is a warning regarding use of `wchar_t *`
with `GetModuleFileNameEx()` instead of `GetModuleFileNameExW()`.
This made me realize this code assumes we're not defining `UNICODE`,
which would have such macros use the 'A' variant.
I have fixed it the best I can, although I'm still a little
uncomfortable with some of this code that uses `char` or `wchar_t`
instead of TCHAR.
I also remove the `if (GetModuleFileNameEx) {` conditional, because this
macro/function will always be defined. The original code was checking a
function pointer, and so this was a bug when integrating into ClamAV.
Regarding the changes to `rijndael.c`, I found that this module assumes
`unsigned long` == 32bits. It does not.
I have corrected it to use `uint32_t`.
We add the _OR_GOTO_DONE suffix to the macros that go to done if the
allocation fails. This makes it obvious what is different about the
macro versus the equivalent function, and that error handling is
built-in.
Renamed the cli_strdup to safer_strdup to make it obvious that it exists
because it is safer than regular strdup. Regular strdup doesn't have the
NULL check before trying to dup, and so may result in a NULL-deref
crash.
Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the
NULL-check is preferred.
Variables like the number of signature parts are considered trusted user input
and so allocations based on those values need not check the memory allocation
limit.
Specifically for the allocation of the normalized buffer in cli_scanscript,
I determined that the size of SCANBUFF is fixed and so safe, and the maxpatlen
comes from the signature load and is therefore also trusted, so we do not
need to check the allocation limit.
Allocations for bytecode signatures to work need not check against the
memory allocation limit, as bytecode signatures are considered trusted
user input.
You may note that I did not remove allocation limits from the bytecode
API functions that may be called by the signatures such as adding json
objects, hashsets, lzma and bz2 decompressors, etc. This is because it
is likely that a bytecode signature may call them more times based on
the structure of the file being scanned - particularly for the json objects.
Bytecode signature's are able to allocate buffers, but should probably
adhere to clamav's max allocation limit. This adds a check to make sure
they don't accidentally alloc too much based on untrusted user input.
The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions
provide a way to protect against allocating too much memory
when the size of the allocation is derived from the untrusted input.
Specifically, we worry about values in the file being scanned being
manipulated to exhaust the RAM and crash the application.
There is no need to check the limits if the size of the allocation
is fixed, or if the size of the allocation is necessary for signature
loading, or the general operation of the applications.
E.g. checking the max-allocation limit for the size of a hash, or
for the size of the scan recursion stack, is a complete waste of
time.
Although we significantly increased the max-allocation limit in
a recent release, it is best not to check an allocation if the
allocation will be safe. It would be a waste of time.
I am also hopeful that if we can reduce the number allocations
that require a limit-check to those that require it for the safe
scan of a file, then eventually we can store the limit in the scan-
context, and make it configurable.
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2
... and these names do not convey their purpose
This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2
The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-free's.
So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2
These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
There are a large number of allocations for fix sized buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be higher than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.
This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC
I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.
I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocaitons for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.
There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.