The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions
provide a way to protect against allocating too much memory
when the size of the allocation is derived from untrusted input.
Specifically, we worry that values in the file being scanned could be
manipulated to exhaust RAM and crash the application.
There is no need to check the limit if the size of the allocation
is fixed, or if the allocation is needed for signature loading or
the general operation of the application.
E.g. checking the max-allocation limit for the size of a hash, or
for the size of the scan recursion stack, is a complete waste of
time.
Although we significantly increased the max-allocation limit in
a recent release, it is best not to check an allocation if the
allocation will be safe. It would be a waste of time.
I am also hopeful that if we can reduce the number of allocations
that require a limit-check to those that require it for the safe
scan of a file, then eventually we can store the limit in the scan-
context, and make it configurable.
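As a rough illustration of the limit-check idea, a wrapper might look
like the following. This is a minimal sketch, not the actual libclamav
implementation; the function name and threshold constant are made up.
```
#include <stdio.h>
#include <stdlib.h>

/* Illustrative cap; the real max-allocation limit lives in libclamav
 * and may differ. */
#define MAX_ALLOCATION_SKETCH (182 * 1024 * 1024)

static void *max_malloc_sketch(size_t size)
{
    if (size == 0 || size > MAX_ALLOCATION_SKETCH) {
        /* Refuse sizes derived from untrusted input that exceed the cap,
         * instead of letting a crafted file exhaust RAM. */
        fprintf(stderr, "refusing oversize allocation of %zu bytes\n", size);
        return NULL;
    }
    return malloc(size);
}
```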
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scanned file could contain bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2
... and these names do not convey their purpose
This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2
The realloc ones also have an additional feature: they will not free
your pointer if you try to realloc to 0 bytes. Freeing the memory in
that case is undefined by the C spec and only done by some realloc
implementations, so this standardizes on the behavior of not freeing,
which should prevent accidental double-frees.
So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2
These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
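For illustration, the zero-size behavior described above might look
roughly like this. It is a hedged sketch; the real cli_safer_realloc
may differ in its details.
```
#include <stdlib.h>

static void *safer_realloc_sketch(void *ptr, size_t size)
{
    if (size == 0) {
        /* Never call realloc(ptr, 0): some implementations free ptr and
         * some don't. Leave the caller's pointer untouched so a later
         * free() cannot become a double-free. */
        return ptr;
    }
    /* On failure realloc returns NULL and the original pointer remains
     * valid and owned by the caller. */
    return realloc(ptr, size);
}
```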
There are a large number of allocations for fixed-size buffers using
the `cli_malloc` and `cli_calloc` calls that check if the requested size
is larger than our allocation threshold for allocations based on
untrusted input. These allocations will *never* exceed the threshold, so
the extra stack frame and check for these calls is a waste of CPU.
This commit replaces the needless calls as follows (A -> B):
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC
I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold when MMAP is found/enabled, but the
fallback was set to cli_malloc / cli_calloc when it is disabled. I
changed those as well.
I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocations for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.
There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
Image fuzzy hashing is enabled by default. The following options have
been added to allow users to disable it, if desired.
New clamscan options:
--scan-image[=yes(*)/no]
--scan-image-fuzzy-hash[=yes(*)/no]
New clamd config options:
ScanImage yes(*)/no
ScanImageFuzzyHash yes(*)/no
New libclamav scan options:
options.parse &= ~CL_SCAN_PARSE_IMAGE;
options.parse &= ~CL_SCAN_PARSE_IMAGE_FUZZY_HASH;
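A short usage sketch (not taken from the ClamAV sources) of how a
libclamav caller might clear these flags before scanning; it assumes an
engine that has already been created, loaded, and compiled, and a
header that provides the new CL_SCAN_PARSE_IMAGE* flags.
```
#include <string.h>
#include <clamav.h>

void scan_without_image_parsing(struct cl_engine *engine, const char *path)
{
    struct cl_scan_options options;
    const char *virname   = NULL;
    unsigned long scanned = 0;

    memset(&options, 0, sizeof(options));
    options.parse |= ~0;                              /* enable all parsers... */
    options.parse &= ~CL_SCAN_PARSE_IMAGE;            /* ...but skip image parsing */
    options.parse &= ~CL_SCAN_PARSE_IMAGE_FUZZY_HASH; /* ...and image fuzzy hashing */

    (void)cl_scanfile(path, &virname, &scanned, engine, &options);
}
```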
This commit also changes scan behavior to disable image fuzzy hashing
for specific types when the DCONF (.cfg) signatures disable those types.
That is, if DCONF disables the PNG parser, it should not only disable
the CVE/format checker for PNG files, but also disable image fuzzy
hashing for PNG files.
Also adds a DCONF option to disable image fuzzy hashing:
OTHER_CONF_IMAGE_FUZZY_HASH
DCONF allows scanning features to be disabled using a configuration
"signature".
Store temp files with obj id and gen id so analysts know which is which.
Don't dump decoded objects immediately. They'll get dumped later at the
end of pdf_extract_obj().
At the end of PDF object extraction, we don't need to find out the
"dumpid" (aka the object index in our list of pdf objects).
It isn't actually used! So I removed the unused parameter.
It may be necessary to differentiate between *.pyc and other binary
types in case additional processing is needed.
Outside of being able to differentiate them by file type, the scanner
will treat CL_TYPE_PYTHON_COMPILED the same as CL_TYPE_BINARY_DATA.
That is, we're not adding a parser at this time to further break down
.pyc files.
Fix Coverity issues 192935, 192932, 192928, and 192917.
None of these are particularly serious. I thought I'd clean them up
while trying to track down a strange crash that occurs in Windows debug
builds with my specific setup when freeing the metadata filename pointer
malloced by the UnRAR iface "peek" function.
I wasn't able to figure out why freeing that causes a crash, so instead
I converted it to an array that need not be freed, and my troubles
melted away.
The fmap structure has some stuff that differs in size in memory between
Linux and Windows, and between 32bit and 64bit architectures.
Notably, `time_t` appears to be defined by the Rust bindgen module as
`ulong`, which may be either 8 bytes or 4 bytes, depending on the
architecture (thanks, C). To resolve this, we'll store the time as a
uint64_t instead.
The other problem in the fmap structure is that the Windows file and
map handles should always exist in the structure, even though they are
only used on Windows, so that the structure is sized consistently.
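Sketched out, the sizing fix amounts to something like the following
(field names are hypothetical, not the actual fmap layout):
```
#include <stdint.h>

typedef struct example_map {
    uint64_t mtime;               /* fixed 8 bytes, instead of time_t/ulong */
    void    *windows_file_handle; /* always present; only used on Windows */
    void    *windows_map_handle;  /* always present; only used on Windows */
} example_map_t;
```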
Includes rudimentary support for getting slices from FMap's and for
interacting with libclamav's context structure.
For now we will use a Cisco-Talos org fork of the onenote_parser
until the feature to open a OneNote section from a slice (instead
of from a filepath) is added upstream.
Some PDFs with an empty password can't be decrypted. Investigation
found that the problem is a strlen check to prevent an overflow rather
than passing down the actual length of the allocated field.
Specifically, the UE buffer may have NULL values in it, so a strlen
check will claim the field is shorter than it is and then later checks
fail because the length is the wrong size.
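A tiny illustration of the problem with hypothetical bytes (not the
actual PDF parser code):
```
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* e.g. a decrypted /UE field whose third byte happens to be 0x00 */
    const unsigned char ue[8] = {0x41, 0x42, 0x00, 0x43, 0x44, 0x45, 0x46, 0x47};
    size_t ue_len = sizeof(ue); /* the allocated/parsed length: 8 */

    /* strlen() stops at the first NUL and reports 2, so a length check
     * based on it rejects a perfectly valid 8-byte field. */
    printf("strlen: %zu, actual: %zu\n", strlen((const char *)ue), ue_len);
    return 0;
}
```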
While at it, I improved code comments on the function reading dictionary
key-value strings and switched a flag to use a bool rather than an int.
The --alert-exceeds-max feature should alert for all files larger than
2GB because 2GB is the internal limit for individual files.
This isn't working correctly because the `goto done;` exit condition
after recording the exceeds-max heuristic skips over the logic that
reports the alert.
This fix moves the ">2GB" check up to the location where the
max-filesize engine option is set by clamd or clamscan.
If max-filesize > 2GB - 1 is requested, then max-filesize is set to
2GB - 1.
Additionally, a warning is printed if max-filesize > 2GB is requested
(with an exception for when it's maxed out by setting --max-filesize=0).
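A hedged sketch of the clamping logic described above; the helper name
and warning text are illustrative, not the actual clamd/clamscan code.
```
#include <stdint.h>
#include <stdio.h>

#define MAX_SCANNABLE_FILESIZE 2147483647ULL /* 2GB - 1: internal per-file limit */

static uint64_t apply_max_filesize(uint64_t requested, int maxed_out_by_zero)
{
    if (requested > MAX_SCANNABLE_FILESIZE) {
        if (!maxed_out_by_zero) {
            /* warn only when the user explicitly asked for more than 2GB,
             * not when --max-filesize=0 maxed the limit out */
            fprintf(stderr, "WARNING: max-filesize capped at 2GB - 1\n");
        }
        return MAX_SCANNABLE_FILESIZE;
    }
    return requested;
}
```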
Resolves: https://github.com/Cisco-Talos/clamav/issues/1030
Developers of the FreeBSD base system are currently working to upgrade
its LLVM/Clang/LLDB/LLD to 17. As part of that effort they tried
building all ports in the FreeBSD ports collection to check whether they
build successfully with LLVM/Clang/LLD 17. Some ports fail to build with
it, and unfortunately `security/clamav` is one of them. Its build fails
with link errors like the following.
```
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_cvdunpack' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_dbgmsg_internal' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'init_domainlist' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'init_whitelist' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_parse_add' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_bytecode_context_clear' failed: symbol not defined
cc: error: linker command failed with exit code 1 (use -v to see invocation)
```
Investigation of ClamAV's source code shows that `cli_cvdunpack` is a
static function, so it isn't visible to external consumers, and the
other mentioned symbols aren't defined anywhere. So, fix the link error
by removing all of them from the linker version script.
Having the filename is useful for certain callbacks, and will likely be
more useful in the future if we can start comparing detected filetypes
with file extensions.
E.g. if filetype is just "binary" or "text" we may be able to do better
by trusting a ".js" extension to determine the type.
Or if the detected file type is "pe" but the extension is ".png", we
may want to say it's suspicious.
Also adjusted the example callback program to disable the metadata
option. The CL_SCAN_GENERAL_COLLECT_METADATA option is no longer
required for the Zip parser to record filenames for embedded files, as
described in the previous commit.
This program can be used to demonstrate that it is behaving as desired.
Previously clamd would report "Cannot allocate memory" when a file
exceeded max file size. This commit corrects it to report
"Heuristics.Limits.Exceeded.MaxFileSize."
Fixes: https://github.com/Cisco-Talos/clamav/issues/670.
When decompressing a zlib stream, it's possible to reach end of stream
before running out of available bytes. In the DMG parser, this may cause
an infinite loop.
This commit adds a check for the condition where the stream has ended
before the input runs out.
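The loop shape looks roughly like this (a minimal sketch, not the
actual DMG parser code):
```
#include <string.h>
#include <zlib.h>

static int inflate_all_sketch(const unsigned char *in, size_t in_len)
{
    unsigned char out[4096];
    z_stream strm;
    int ret;

    memset(&strm, 0, sizeof(strm));
    if (inflateInit(&strm) != Z_OK)
        return -1;

    strm.next_in  = (unsigned char *)in;
    strm.avail_in = (uInt)in_len;

    do {
        strm.next_out  = out;
        strm.avail_out = sizeof(out);
        ret = inflate(&strm, Z_NO_FLUSH);
        if (ret != Z_OK && ret != Z_STREAM_END)
            break; /* corrupt stream */
        /* ... hand off (sizeof(out) - strm.avail_out) decompressed bytes ... */
    } while (strm.avail_in > 0 && ret != Z_STREAM_END); /* stop once the stream ends */

    inflateEnd(&strm);
    return (ret == Z_STREAM_END) ? 0 : -1;
}
```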
Fixes: https://github.com/Cisco-Talos/clamav/issues/925
In the aspack decrypt function, there's a check to make sure that
backbytes doesn't exceed 57, because it is used as an index into
init_array. However, exceeding 57 there is mathematically impossible,
so this commit removes the check.
`cli_getpagesize()` may return -1 in an error condition.
If it does, let's just treat it as 4096.
I believe the actual coverity complaint is a false positive, but it's
fair to account for the error case and this should shut it up.
The bytes_remaining variable may be set negative by mistake, when really
we just want to decrement it.
This issue may result in a 1-byte over-read but does not cause any
crash.
We determined that this issue is not a vulnerability.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=58475
In `cli_scanudf()` it's possible to `goto done;` before memset'ing the
fileIdentifierList and fileEntryList. This would likely cause
uninitialized pointer reads.
Instead of memset, just initialize the structs with `= {0};`
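A small sketch of the pattern (the list type and function here are
hypothetical, not the real UDF parser structures):
```
#include <stdlib.h>

typedef struct {
    void **items;
    size_t count;
} pointer_list_t;

static int scan_udf_sketch(int fd)
{
    /* Initialize at declaration so any early `goto done;` sees zeroed
     * structures instead of uninitialized stack garbage. */
    pointer_list_t fileIdentifierList = {0};
    pointer_list_t fileEntryList      = {0};
    int ret = -1;

    if (fd < 0)
        goto done; /* safe: both lists are already zeroed */

    /* ... parse volume descriptors and fill the lists ... */
    ret = 0;

done:
    free(fileIdentifierList.items); /* free(NULL) is a no-op */
    free(fileEntryList.items);
    return ret;
}
```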
These symbols are used by an internal python tool for generating
signatures:
- fuzzy_hash_calculate_image
- ffierror_fmt
`ffierror_fmt` is required to free the error structure passed back in
case of an error.
Since version 1.1.0 started using libclamav.map again, we need to
explicitly export these symbols.
The checks for the encryption info cspName and encryption verifier don't
have the size of the overall file available for the check and may
overflow.
This commit passes in the size of the file to the
initialize_encryption_key() function and does all size checks within
that function instead of doing the overall size check before that
function.
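The kind of check moved into initialize_encryption_key() looks roughly
like this; the helper and its parameters are illustrative, not the
actual OLE2 code.
```
#include <stdbool.h>
#include <stddef.h>

/* True if [offset, offset + length) lies within a file of file_size
 * bytes, written so the addition cannot overflow. */
static bool range_in_file(size_t file_size, size_t offset, size_t length)
{
    return length <= file_size && offset <= file_size - length;
}
```
With the file size passed in, both the cspName and the encryption
verifier reads can be validated with such a check before they are used.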
Resolves: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=60563
The prior fix for the wwunpack overread in commit
89cd0df3d7 was a little too late, but
also removed an earlier, smaller guard for a write.
This commit just moves the larger guard a little earlier to protect
against both.
Resolves: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=60655
If a signature has a pattern that is too short, it will fail to load,
but this does not cause the entire load process to abort.
This is bad for two reasons:
1) It is not immediately apparent that the signature is bad, and so it
could be published accidentally.
2) The signature is partially loaded by the time the bad pattern is
observed and that may cause a crash later.
Because of (1), it is not worth it to try to unload the first part of the
signature. Instead, we should just abort the signature load.
Fixes: https://github.com/Cisco-Talos/clamav/issues/923
We should also abort loading if the filter pattern for the boyer-moore
matcher is shorter than 2 bytes.
Also, do not print the final "Loading" progress bar if an error occurred.
A buffer over-read may occur when unpacking wwpack'd PE files if the
file is very small.
This commit adds a CLI_CONTAINS buffer wrap check to ensure we aren't
reading beyond the exe buffer.
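In spirit, a "contains" check verifies that the region being read lies
entirely within the exe buffer. The sketch below shows the idea; the
function and parameter names are illustrative, not the exact
CLI_CONTAINS macro.
```
#include <stdbool.h>
#include <stddef.h>

static bool buffer_contains(const char *buf, size_t buf_size,
                            const char *ptr, size_t len)
{
    return buf != NULL && ptr != NULL &&
           ptr >= buf &&
           len <= buf_size &&
           (size_t)(ptr - buf) <= buf_size - len;
}
```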
We determined that this issue is not a vulnerability.
Resolves: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=57374
* Add new clamd and clamscan option --cache-size
This option allows you to set the number of entries the cache can store.
Additionally, introduce CacheSize as a clamd.conf
synonym for --cache-size.
Fixes #867
The code to extract CSS from HTML <style> blocks contains an
off-by-one: in case there is no actual content, it will compute a
chunk_size of -1.
Whoops.
Removed the -1 so the size is correct, and added an extra safety check
in case something else crazy happens.
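Illustrated with hypothetical pointer names (not the actual HTML/CSS
extraction code), the fix amounts to:
```
#include <stddef.h>

static long css_chunk_size(const char *content_start, const char *content_end)
{
    /* Previously this subtracted an extra 1, so an empty <style> block
     * (content_end == content_start) produced a chunk_size of -1. */
    long chunk_size = (long)(content_end - content_start);

    if (chunk_size <= 0)
        return 0; /* extra safety: nothing to extract */

    return chunk_size;
}
```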