Primarily this commit fixes an issue with the size of the parameters
passed to cli_checklimits(). The parameters were "unsigned long", which
varies in size depending on platform.
I've switched them to uint64_t / u64.
While working on this, I observed some concerning warnigns on Windows,
and some less serious ones, primarily regarding inconsistencies with
`const` parameters.
Finally, in `scanmem.c`, there is a warning regarding use of `wchar_t *`
with `GetModuleFileNameEx()` instead of `GetModuleFileNameExW()`.
This made me realize this code assumes we're not defining `UNICODE`,
which would have such macros use the 'A' variant.
I have fixed it the best I can, although I'm still a little
uncomfortable with some of this code that uses `char` or `wchar_t`
instead of TCHAR.
I also remove the `if (GetModuleFileNameEx) {` conditional, because this
macro/function will always be defined. The original code was checking a
function pointer, and so this was a bug when integrating into ClamAV.
Regarding the changes to `rijndael.c`, I found that this module assumes
`unsigned long` == 32bits. It does not.
I have corrected it to use `uint32_t`.
We add the _OR_GOTO_DONE suffix to the macros that go to done if the
allocation fails. This makes it obvious what is different about the
macro versus the equivalent function, and that error handling is
built-in.
Renamed the cli_strdup to safer_strdup to make it obvious that it exists
because it is safer than regular strdup. Regular strdup doesn't have the
NULL check before trying to dup, and so may result in a NULL-deref
crash.
Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the
NULL-check is preferred.
Variables like the number of signature parts are considered trusted user input
and so allocations based on those values need not check the memory allocation
limit.
Specifically for the allocation of the normalized buffer in cli_scanscript,
I determined that the size of SCANBUFF is fixed and so safe, and the maxpatlen
comes from the signature load and is therefore also trusted, so we do not
need to check the allocation limit.
Allocations for bytecode signatures to work need not check against the
memory allocation limit, as bytecode signatures are considered trusted
user input.
You may note that I did not remove allocation limits from the bytecode
API functions that may be called by the signatures such as adding json
objects, hashsets, lzma and bz2 decompressors, etc. This is because it
is likely that a bytecode signature may call them more times based on
the structure of the file being scanned - particularly for the json objects.
Bytecode signature's are able to allocate buffers, but should probably
adhere to clamav's max allocation limit. This adds a check to make sure
they don't accidentally alloc too much based on untrusted user input.
The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions
provide a way to protect against allocating too much memory
when the size of the allocation is derived from the untrusted input.
Specifically, we worry about values in the file being scanned being
manipulated to exhaust the RAM and crash the application.
There is no need to check the limits if the size of the allocation
is fixed, or if the size of the allocation is necessary for signature
loading, or the general operation of the applications.
E.g. checking the max-allocation limit for the size of a hash, or
for the size of the scan recursion stack, is a complete waste of
time.
Although we significantly increased the max-allocation limit in
a recent release, it is best not to check an allocation if the
allocation will be safe. It would be a waste of time.
I am also hopeful that if we can reduce the number allocations
that require a limit-check to those that require it for the safe
scan of a file, then eventually we can store the limit in the scan-
context, and make it configurable.
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2
... and these names do not convey their purpose
This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2
The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-free's.
So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2
These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
There are a large number of allocations for fix sized buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be higher than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.
This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC
I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.
I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocaitons for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.
There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
Image fuzzy hashing is enabled by default. The following options have
been added to allow users to disable it, if desired.
New clamscan options:
--scan-image[=yes(*)/no]
--scan-image-fuzzy-hash[=yes(*)/no]
New clamd config options:
ScanImage yes(*)/no
ScanImageFuzzyHash yes(*)/no
New libclamav scan options:
options.parse &= ~CL_SCAN_PARSE_IMAGE;
options.parse &= ~CL_SCAN_PARSE_IMAGE_FUZZY_HASH;
This commit also changes scan behavior to disable image fuzzy hashing
for specific types when the DCONF (.cfg) signatures disable those types.
That is, if DCONF disables the PNG parser, it should not only disable
the CVE/format checker for PNG files, but also disable image fuzzy
hashing for PNG files.
Also adds a DCONF option to disable image fuzzy hashing:
OTHER_CONF_IMAGE_FUZZY_HASH
DCONF allows scanning features to be disabled using a configuration
"signature".
Store temp files with obj id and gen id so analysts know which is which.
Don't dump decoded objects immediately. They'll get dumped later at the
end of pdf_extract_obj().
At the end of PDF object extraction, we don't need to find out the
"dumpid" (aka the object index in our list of pdf objects).
It isn't actually used! So I removed the unused parameter.
It may be necessary to differentiate between *.pyc and other binary
types in case additional processing is needed.
Outside of being able to differentiate the by file type, the scanner
will treat CL_TYPE_PYTHON_COMPILED the same as CL_TYPE_BINARY_DATA.
That is - we're not adding parser at this time to further break down
.pyc files.
Fix Coverity issues 192935, 192932, 192928, and 192917.
None of these are particularly serious. I thought I'd clean them up
while trying to track down a strange crash that occurs in Windows debug
builds with my specific setup when freeing the metadata filename pointer
malloced by the UnRAR iface "peek" function.
I wasn't able to figure out why freeing that causes a crash, so instead
I converted it to an array that need not be freed, and my troubles
melted away.
The fmap structure has some stuff that differs in size in memory between
Linux and Windows, and between 32bit and 64bit architectures.
Notably, `time_t` appears to be defined by the Rust bindgen module as
`ulong` which may be either 8 bytes or 4 bytes, depending architecture
(thanks, C). To resolve this, we'll store time as a uint64_t instead.
The other problem in the fmap structure is the windows file and map
handles should always be exist, and may only be used in Windows, for
consistency in sizing of the structure.
Includes rudimentary support for getting slices from FMap's and for
interacting with libclamav's context structure.
For now will use a Cisco-Talos org fork of the onenote_parser
until the feature to read open a onenote section from a slice (instead
of from a filepath) is added to the upstream.
Some PDF's with an empty password can't be decrypted. Investigation
found that the problem is a strlen check to prevent an overflow rather
than passing down the actual length of the allocated field.
Specifically, the UE buffer may have NULL values in it, so a strlen
check will claim the field is shorter than it is and then later checks
fail because the length is the wrong size.
While at it, I improved code comments on the function reading dictionary
key-value strings and switched a flag use a bool rather than an int.
The --alert-exceeds-max feature should alert for all files larger than
2GB because 2GB is the internal limit for individual files.
This isn't working correctly because the `goto done;` exit condition
after recording the exceeds-max heuristic skips over the logic that
reports the alert.
This fix moves the ">2GB" check up to the location where the
max-filesize engine option is set by clamd or clamscan.
If max-filesize > 2GB - 1 is requested, then max-filesize is set to
2GB - 1.
Additionally, a warning is printed if max-filesize > 2GB is requested
(with an exception for when it's maxed out by setting --max-filesize=0).
Resolves: https://github.com/Cisco-Talos/clamav/issues/1030
Developers of FreeBSD base system are currently working to upgrade its
LLVM/Clang/LLDB/LLD to 17. As a part of it they tried building all
ports in FreeBSD ports collections to check if build of them succeeds
with LLVM/Clang/LLD 17. As a result there are some ports that fail to
be built with it and unfortunately `security/clamav` is one of
them. The build of it fails with link error as following.
```
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_cvdunpack' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_dbgmsg_internal' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'init_domainlist' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'init_whitelist' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_parse_add' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_bytecode_context_clear' failed: symbol not defined
cc: error: linker command failed with exit code 1 (use -v to see invocation)
```
According to the investigation of ClamAV's source code,
`cli_cvdunpack` is a static function so it isn't visible to external
consumers. And other mentioned symbols aren't found anywhere. So fix
link error by removing all of them from linker version script.
Having the filename is useful for certain callbacks, and will likely be
more useful in the future if we can start comparing detected filetypes
with file extensions.
E.g. if filetype is just "binary" or "text" we may be able to do better
by trusting a ".js" extension to determine the type.
Or else if detected file type is "pe" but the extension is ".png" we may
want to say it's suspicious.
Also adjusted the example callback program to disable metadata option.
The CL_SCAN_GENERAL_COLLECT_METADATA is no longer required for the Zip
parser to record filenames for embedded files, and described in the
previous commit.
This program can be used to demonstrate that it is behaving as desired.
Previously clamd would report "Cannot allocate memory" when a file
exceeded max file size. This commit corrects it to report
"Heuristics.Limits.Exceeded.MaxFileSize."
Fixes: https://github.com/Cisco-Talos/clamav/issues/670.
When decompressing a zlib stream, it's possible to reach end of stream
before running out of available bytes. In the DMG parser, this may cause
an infinite loop.
This commit adds a check for the condition where stream has ended before
running out of input.
Fixes: https://github.com/Cisco-Talos/clamav/issues/925
In aspack decrypt function, there's a check to make sure that backbytes
doesn't exceed 57, because it is used as an index in init_array.
However, it is mathematically impossible.
So this commit removes the check.
`cli_getpagesize()` may return -1 in an error condition.
If it does, let's just treat it as 4096.
I believe the actual coverity complaint is a false positive, but it's
fair to account for the error case and this should shut it up.
The bytes_remaining variable may be set negative by mistake, when really
we just want to decrement it.
This issue may result in a 1-byte over read but does not cause any
crash.
We determined that this issue is not a vulnerability.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=58475
In `cli_scanudf()` it's possible to `goto done;` before memset'inng the
fileIdentifierList and fileEntryList. This would likely cause
uninitialized pointer reads.
Instead of memset, just initialize the structs with `= {0};`
These symbols are used by an internal python tool for generating
signatures:
- fuzzy_hash_calculate_image
- ffierror_fmt
`ffierror_fmt` is required to free the error structure passed back in
case of an error.
Since version 1.1.0 started using libclamav.map again, we need to
explicitly export these symbols.