Prototypes (or the declarations themselves, if there is no
corresponding prototype) for functions that take no arguments are
required by the C standard to specify (void) as their argument list;
for example,
regex_pcre.h:79:1: error: function declaration isn't a prototype
[-Werror=strict-prototypes]
79 | cl_error_t cli_pcre_init_internal();
Future versions of clang may become strict about this, and there's no
harm in conforming to the standard right now, so we fix all such
instances in this commit.
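A before/after sketch of the fix (`cl_error_t` is typedef'd here only so the snippet is self-contained; the real type is ClamAV's status enum):

```c
typedef int cl_error_t; /* stand-in for ClamAV's real status enum */

/* Before: an empty parameter list is an old-style declaration, not a
 * prototype -- the compiler cannot check arguments at call sites:
 *
 *     cl_error_t cli_pcre_init_internal();
 *
 * After: (void) makes it a true prototype taking no arguments. */
cl_error_t cli_pcre_init_internal(void);

cl_error_t cli_pcre_init_internal(void) { return 0; }
```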
The bytecode compiler has a bug that was observed to cause a crash
with a specific signature when testing on FreeBSD.
This PR adds an index size check when looking up a constant to
prevent this crash in case a bugged signature gets through QA.
An additional NULL check is added just to be safe.
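A minimal sketch of the kind of guard this adds (names are illustrative, not the actual bytecode interpreter code):

```c
#include <stddef.h>

typedef enum { CL_SUCCESS = 0, CL_EBYTECODE } cl_error_t; /* stand-ins */

struct constant_pool {
    const long *values;
    size_t count;
};

/* Look up a constant by index, refusing out-of-range or NULL input
 * instead of reading past the end of the array. */
cl_error_t lookup_constant(const struct constant_pool *pool,
                           size_t index, long *out)
{
    if (!pool || !pool->values || !out)
        return CL_EBYTECODE; /* extra NULL check, just to be safe */
    if (index >= pool->count)
        return CL_EBYTECODE; /* bugged signature: index out of range */
    *out = pool->values[index];
    return CL_SUCCESS;
}
```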
While reviewing scan time performance, I found that the time limit checks
have a notable effect on scan time. I tested reducing the number of time checks
to every n'th signature evaluation. For the given sample which has a
large number of very tiny embedded scans, I found that overall scan time
was best when checked every 10th signature. Reducing the number of
checks any further had no noticeable effect.
Bear in mind that these numbers include daily/main/bytecode CVD load
time:
| current           | mod 2             | mod 10            | mod 50            | mod 100           | mod 1000          |
| ----------------- | ----------------- | ----------------- | ----------------- | ----------------- | ----------------- |
| 63.848 s ±1.188 s | 61.773 s ±0.652 s | 59.831 s ±0.975 s | 59.279 s ±1.652 s | 59.198 s ±1.147 s | 59.440 s ±1.304 s |
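The counting trick can be sketched like this (a hypothetical helper, not the actual matcher code; `CHECK_INTERVAL` is illustrative):

```c
#include <stdbool.h>
#include <time.h>

/* Only consult the clock every CHECK_INTERVAL signature evaluations;
 * checking the time per signature was measurably slowing down scans. */
#define CHECK_INTERVAL 10

bool scan_time_exceeded(unsigned *counter, time_t start, time_t limit_secs)
{
    if (++*counter % CHECK_INTERVAL != 0)
        return false; /* skip the comparatively expensive time check */
    return (time(NULL) - start) > limit_secs;
}
```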
ClamAV requires non-default build options for TomsFastMath to support
bigger numbers. Without this change, database and Windows EXE/DLL
authenticode certificate validation may fail.
The option to build with an external TomsFastMath library should be
removed. The `ENABLE_EXTERNAL_TOMSFASTMATH` build option should be
ignored, with some sort of warning.
This commit removes the option. If used, the build will not fail, but
CMake will print a warning that the unused option was ignored.
This bug was introduced during 1.0 development and does not affect any
stable release.
An OLE2 document that claims to have a sector size (big block size) over
28 will cause a signed integer overflow for our local off_t variable,
when multiplied by 4. This will then result in an invalid read and
segfault.
The fix is two-fold:
1. Make sure the sector size (big block size) is not greater
than 28, so we don't accidentally overflow an int32_t.
2. Perform the size check _before_ using the variable.
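The two-fold fix can be sketched roughly like this (illustrative names, not the actual OLE2 parser code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Validate the sector size exponent from the header *before* using it,
 * so the later shift/multiply can never overflow a signed 32-bit value. */
#define OLE2_MAX_LOG2_BLOCK_SIZE 28

bool ole2_block_size_ok(uint32_t log2_block_size)
{
    return log2_block_size <= OLE2_MAX_LOG2_BLOCK_SIZE;
}

/* Caller must have validated log2_block_size first; size_t arithmetic
 * avoids the signed off_t overflow entirely. */
size_t ole2_block_offset(uint32_t log2_block_size, size_t block_index)
{
    return ((size_t)1 << log2_block_size) * block_index;
}
```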
This commit also changes some variable types from `unsigned int` or
`off_t` to the more appropriate `size_t`, and from `int` or
`uint32_t` to `bool`.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=52639
In preparation for the release candidate, changing the version suffix
from devel to rc.
Increment SO current version to 11, because we added to the public API.
Increment SO age to 2, because we didn't change anything or remove
anything from the public API.
Added functionality level values to the bytecode API enum for recent
patch versions.
Coverity found two slightly different buffer length issues
with the memcmp at the end of the verify_key_aes() function.
The problem is that the hash must be a SHA1, but the verifier hash
field is up to 32 bytes, and the structure may also lie about the size
of the hash, saying it's more than 32 bytes.
Realistically, it should be 20 bytes if it is functioning correctly,
so we shouldn't really care about the extra 12 bytes.
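One way to express that is to compare only the SHA1-sized prefix and distrust the claimed length (an illustrative sketch, not the actual verify_key_aes() code):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SHA1_HASH_SIZE 20
#define VERIFIER_HASH_FIELD_SIZE 32

/* Compare only the SHA1-sized prefix, and never trust a hash length
 * claimed by the file beyond the actual field size. */
bool verifier_hash_matches(const uint8_t computed[SHA1_HASH_SIZE],
                           const uint8_t *stored, uint32_t claimed_len)
{
    uint32_t len = claimed_len;
    if (len > VERIFIER_HASH_FIELD_SIZE)
        len = VERIFIER_HASH_FIELD_SIZE; /* structure may lie about size */
    if (len < SHA1_HASH_SIZE)
        return false; /* too short to hold a SHA1 at all */
    return memcmp(computed, stored, SHA1_HASH_SIZE) == 0;
}
```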
Coverity static analysis is complaining that the ctx variable
is checked for NULL after it has already been dereferenced.
This code was copied from elsewhere in the file, so I fixed it
there as well.
Add error checking to make sure we aren't calling into aes decrypt
functions with larger key sizes, that would make decrypt read past the
end of its data array.
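The shape of that check, as an illustrative sketch (not the actual function):

```c
#include <stdbool.h>
#include <stddef.h>

/* Reject key sizes other than the AES-valid 128/192/256 bits before
 * handing the buffer to the decrypt routines, so they can never read
 * past the end of the key data array. */
bool aes_key_size_ok(size_t key_bytes, size_t available_bytes)
{
    if (key_bytes != 16 && key_bytes != 24 && key_bytes != 32)
        return false; /* not a valid AES key length */
    return key_bytes <= available_bytes; /* must fit in the data we have */
}
```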
The function `cli_scan_fmap()` is where we check PE authenticode
certificates. But it was only returning CL_CLEAN, CL_VIRUS, or file
types. It should also propagate errors, as well as CL_VERIFIED when the
authenticode check finds that we should trust the file.
RSA certificate verification was failing because we accidentally lowered
the maximum size of numbers used for multiplication with TomsFastMath
when upgrading the vendored library code.
This commit restores the previous default, raising the limit from 4096
back to 8192.
Adds recursion limits for object extraction, as well as for parsing
strings, arrays, and dictionaries during extraction.
The limit is set to 25.
Places the recursion-depth variable in the PDF parse context structure.
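The pattern looks roughly like this (hypothetical names; the real context structure and limit constant may differ):

```c
#include <stdbool.h>

/* A depth counter carried in the parse context, incremented on entry
 * to each nested object and checked against a fixed cap. */
#define PDF_OBJECT_RECURSION_LIMIT 25

struct pdf_parse_ctx {
    unsigned recursion_depth;
};

bool pdf_enter_object(struct pdf_parse_ctx *ctx)
{
    if (ctx->recursion_depth >= PDF_OBJECT_RECURSION_LIMIT)
        return false; /* too deep: stop extracting this branch */
    ctx->recursion_depth++;
    return true;
}

void pdf_leave_object(struct pdf_parse_ctx *ctx)
{
    if (ctx->recursion_depth > 0)
        ctx->recursion_depth--;
}
```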
Signatures that have conditions on a Container type or Intermediates
types may not match if the target file is scanned outside of the
container first, and cached as clean.
The solution to this is to disable caching for layers that would
have matched except for the Container (or Intermediates) condition.
Increases internal max allocation size to allow support for
files up to 1GB.
Provides a more helpful, less dramatic warning message if this
limit is exceeded.
Some small files that appear to include archive content (zip entries,
RAR archives, etc) later in the file are no longer set to that SFX type
initially. However, the current way we skip those drops the initial
type detection on the floor, meaning that something like a small image
may end up as CL_TYPE_ANY instead of CL_TYPE_PNG.
This commit fixes that by changing where in the process the SFX types
are ignored.
Some files that take longer to scan may exceed the scan time by a fair
bit during the actual pattern matching part of the scan (vs.
decompressing archives or running bytecode functions, or whatever).
ClamAV does pattern matching and hashing of files in chunks.
This change adds a time check before each chunk is matched.
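The control flow can be sketched like this, with the clock modeled as a simple chunk budget so the logic is testable in isolation (illustrative, not the actual matcher code):

```c
#include <stddef.h>

typedef enum { CL_SUCCESS = 0, CL_ETIMEOUT } cl_error_t; /* stand-ins */

/* Walk a buffer in fixed-size chunks, checking the time limit before
 * matching each chunk, so a long match phase cannot overshoot the scan
 * time limit by a whole file's worth. The "clock" here is a budget of
 * chunks, standing in for a real elapsed-time check. */
cl_error_t match_in_chunks(size_t len, size_t chunk_size,
                           size_t budget, size_t *chunks_matched)
{
    *chunks_matched = 0;
    for (size_t off = 0; off < len; off += chunk_size) {
        if (*chunks_matched >= budget)
            return CL_ETIMEOUT; /* time check before each chunk */
        /* ... pattern-match buf[off .. off+chunk) here ... */
        (*chunks_matched)++;
    }
    return CL_SUCCESS;
}
```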
In the logical sig evaluation function, we are calling the run bytecode
function even if the lsig doesn't depend on bytecode. That function
immediately returns CL_ENULLARG, which previously was ignored because we
were only checking for CL_VIRUS. Now that we're checking if NOT success,
the error propagates and scanning of that layer ends early.
The above bug is actually in some duplicated code, so when fixing this I
went ahead and refactored the function to remove the duplication, and added
additional comments and readability improvements.
Error handling for bytecode sig evaluation, as well as logical and yara
sig evaluation only consider CL_VIRUS or CL_SUCCESS and fail to
consider terminal errors such as CL_ETIMEOUT.
This commit fixes the error handling in those functions so we properly
abort instead of continuing to evaluate more sigs.
This commit also adds some scan time limit checks:
1. shortly after a bytecode time limit is exceeded.
2. after evaluating each lsig or yara sig.
3. when processing unzip central directory file headers.
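The fixed error handling follows this shape (a simplified sketch; signature evaluation is modeled as an array of precomputed results):

```c
typedef enum { CL_SUCCESS = 0, CL_VIRUS, CL_ETIMEOUT, CL_EMEM } cl_error_t;

/* Evaluate signatures until one alerts or a terminal error occurs.
 * Previously only CL_VIRUS was checked, so terminal errors such as
 * CL_ETIMEOUT were silently swallowed and evaluation continued. */
cl_error_t evaluate_sigs(const cl_error_t *results, int count)
{
    for (int i = 0; i < count; i++) {
        cl_error_t ret = results[i]; /* stands in for one sig evaluation */
        if (ret != CL_SUCCESS)
            return ret; /* propagate both matches and terminal errors */
    }
    return CL_SUCCESS;
}
```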
The `cli_memcpy` function is a version of memcpy that will catch
exceptions on Windows if the memcpy fails because the memory-mapped file
fails to read (like if a network drive is disconnected or something).
It presently returns an `int`. Since it is only used in one spot, I've
swapped it over to a `cl_error_t`, and created a unique variable so
we're not piggybacking with the `cli_file_t` enum used to check the
result of a filetyping scan.
Fix pretty silly double-free, and fix issue where 'extracted_file' was
not being set to true.
Also rename `uncompressed` variable to `uncompressed_block`, because
there is another variable with the same name, above.
Because of an oversight, the `ctx->abort_scan` flag was propagating
timeout and perhaps other error types to the final verdict.
Fixed this in the `result_should_goto_done()` function.
Some non-fatal errors (notably CL_ETIMEOUT) may appear between the call
to the magic-scan, and the end of scan_common().
We need to filter these status results at this level as well to prevent
this from happening.
I also fixed a warning accidentally introduced earlier caused by trying
to use a string after free'ing it and setting it to NULL.
And I explicitly set the status to CL_SUCCESS if the cache check comes
back clean, rather than trusting that no one had messed with the `ret`
variable since it was last set (to CL_SUCCESS). It's more readable and
less prone to future bugs.
The cli_bc_ctx->outfd struct member was not properly initialized to -1.
Perhaps previous developers figured 0 was invalid-enough. All of the
checks for that file descriptor assumed 0 was the invalid value, going
so far as to explicitly set outfd to 0 if `open()` returned -1.
I didn't know this, so when I cleaned up the error handling in
`cli_unpackelf()` and `cli_unpackmacho()`, I had it `close(outfd)` when
not -1. That of course ended up closing stdin... and then all subsequent
file scans opened the file as fd `0`,... which interestingly caused
`read()` and `stat()` errors, but only after first scanning a Mach-O
or ELF file.
Anyways... this commit fixes the issue by properly initializing outfd to
-1, and by changing any checks from 0 to -1.
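The corrected pattern, as a minimal sketch (struct and function names are illustrative):

```c
#include <fcntl.h>
#include <unistd.h>

/* Treat -1 (not 0) as "no file descriptor". fd 0 is a perfectly valid
 * descriptor (stdin), so using 0 as the sentinel is what eventually
 * made cleanup code close stdin. */
struct bc_ctx_sketch {
    int outfd;
};

void bc_ctx_init(struct bc_ctx_sketch *ctx)
{
    ctx->outfd = -1; /* properly initialized: no descriptor yet */
}

void bc_ctx_cleanup(struct bc_ctx_sketch *ctx)
{
    if (ctx->outfd != -1) { /* check against -1, never against 0 */
        close(ctx->outfd);
        ctx->outfd = -1;
    }
}
```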
I also found that it appears that the bytecode timeout wasn't being
applied to bytecode functions associated with logical signatures (that
is, those run by `cli_bytecode_runlsig()`).
What I see is that `ctx->bytecode_timeout` is only set to a non-zero
value in `cli_bytecode_context_alloc()`.
But for `cli_bytecode_runlsig()`, the bytecode context sits on the stack
and is memset instead. To resolve this, and ensure the bytecode context
is properly initialized, I created a new function that does this and
had it do the memset instead of using a calloc in the allocation
function.
I also removed the `bytecode_context_clear()` function because it simply
called `bytecode_context_reset()` and then did a memset. The memset is
unnecessary, especially since in most cases it's memsetting a stack
structure immediately before a return.
The header parsing / executable metadata collecting functions for the
PE, ELF, and Mach-O file types were using `int` for the return type.
Mostly they were returning 0 for success and -1, -2, -3, or -4 for
failure. But in some cases they were returning cl_error_t enum values
for failure. Regardless, the function using them was treating 0 as
success and non-zero as failure, which it stored as -1 ... every time.
This commit switches them all to use cl_error_t. I am continuing to
store the final result as 0 / -1 in the `peinfo` struct, but outside of
that everything has been made consistent.
While I was working on that, I got a tad sidetracked. I noticed that
the target type isn't an enum, or even a set of #defines. So I made an
enum and then changed the code that uses target types to use the enum.
I also removed the `target` parameter from a number of functions that
don't actually use it at all. Some recursion was masking the fact that
it was an unused parameter which is why there was no warning about it.
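A sketch of such an enum, using the target-type values from ClamAV's signature documentation (the actual enum name and member spelling in the commit may differ):

```c
/* Target types as documented for ClamAV signatures; previously these
 * were bare integers with no enum or #defines to name them. */
typedef enum {
    TARGET_GENERIC  = 0, /* any file */
    TARGET_PE       = 1, /* Windows PE */
    TARGET_OLE2     = 2,
    TARGET_HTML     = 3,
    TARGET_MAIL     = 4,
    TARGET_GRAPHICS = 5,
    TARGET_ELF      = 6,
    TARGET_ASCII    = 7, /* normalized ASCII text */
    TARGET_MACHO    = 9,
    TARGET_PDF      = 10
} cli_target_sketch_t;
```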
Also fixed a number of conditions where magic_scan() critical errors may
be ignored.
To ensure that the scan truly aborts for signature matches (not in
allmatch mode) and for timeouts, the `ctx->abort` option is now set in
these two conditions, and checked in several spots in magic_scan().
Additionally, I've consolidated some of the "scan must halt" type of
checks (mostly large switch statements) into a function so that we can
use the exact same logic in a few places in magic_scan().
I've also fixed a few minor warnings and code format issues.
Signatures that start with "PUA.", "Heuristics.", or "BC.Heuristics."
are perceived to be less serious, or more likely to have false
positives, than other signatures that we would think of as "strong
indicators".
At present, only a subset of "Heuristics." signatures, such as those
added by the phishing module, are added as "potentially unwanted".
Unless you're using heuristic-precedence mode, these "potentially
unwanted" indicators are recorded but not reported unless no other
signature alerts. This behavior should apply to all signatures that
start with "PUA." and "Heuristics.". We already do a string match
comparison on the signature name to apply that behavior to bytecode
matches that start with "BC.Heuristics.".
I moved that string comparison logic used for "BC.Heuristics." into the
main `cl_append_virus()` function and extended it to cover the other two
cases.
I also replaced all hardcoded calls that append "Heuristics." signatures
with calls to the `cli_append_potentially_unwanted()` function, so we
can skip the string compare in these cases. That function will of course
append them as strong indicators if heuristic-precedence mode is
enabled.
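The string-match logic amounts to a prefix test like this (an illustrative sketch, not the actual cl_append_virus() code):

```c
#include <stdbool.h>
#include <string.h>

static bool has_prefix(const char *name, const char *prefix)
{
    return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* Signatures named "PUA.*", "Heuristics.*", or "BC.Heuristics.*" are
 * appended as potentially unwanted rather than as strong indicators
 * (unless heuristic-precedence mode is enabled). */
bool is_potentially_unwanted(const char *signame)
{
    return has_prefix(signame, "PUA.") ||
           has_prefix(signame, "Heuristics.") ||
           has_prefix(signame, "BC.Heuristics.");
}
```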
If initial file type recognition comes back as an SFX type, which may
happen for small files that do not get recognized as any other file type
and contain a zip entry somewhere in the middle, then the type will be
set to that SFX type. This is a problem because later on when we go to
do embedded file type recognition, we explicitly skip SFX types, in
addition to TARs and other types that are parsed elsewhere and have a
high embedded file type recognition FP-rate because they aren't
compressed.
This commit prohibits that initial FTM check from selecting an SFX type.
The SFX type will be rediscovered in `scanraw()` where the type is
handled/parsed.
Added inline documentation and did some general cleanup of the
`cli_scan_buff()`, and then updated the function comment now that I
understand the function a little better.
While doing this, I found that the calls to cli_ac_initdata were being
done regardless of whether or not locally initialized matcher data was
required or used. But the call to free that matcher data was only being
done when AC-data was not provided by the caller, which would be a leak.
I fixed this by only initializing the AC data when AC data is not
provided by the caller.
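The ownership rule can be sketched like this (illustrative names, not the actual cli_ac_initdata interface):

```c
#include <stdbool.h>
#include <stdlib.h>

struct ac_data_sketch { int dummy; };

/* Only allocate local matcher data when the caller didn't provide any,
 * and remember who owns it so we free only what we allocated. */
struct ac_data_sketch *get_ac_data(struct ac_data_sketch *caller_data,
                                   bool *we_own_it)
{
    if (caller_data) {
        *we_own_it = false;
        return caller_data; /* use caller's data; don't init or free it */
    }
    *we_own_it = true;
    return calloc(1, sizeof(struct ac_data_sketch));
}
```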
Significantly tidy the `cli_scan_fmap()` function, and add comments to
explain how it all works.
Add SHA1 and SHA256 digest variables to the FMAP structure in addition
to the existing MD5. Add a function to set the hash so that when we
calculate the hashes for hash matching, we save them for subsequent FP
matching. This enabled me to remove the extra "hash-only" FP check from
`cli_scan_fmap()`. This will also make it easier to switch the clean
cache hash algorithm to SHA256 in the future.
Remove extra allmatch checks that are no longer required.
Add a new header to prevent #include order issues.
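The idea can be sketched like this (illustrative struct; the real FMAP structure differs):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MD5_LEN 16
#define SHA1_LEN 20
#define SHA256_LEN 32

/* Store each digest alongside a "have we computed it" flag, so hash
 * matching computes a digest once and FP checking reuses it instead
 * of rehashing the file. */
struct fmap_hashes_sketch {
    uint8_t md5[MD5_LEN];
    uint8_t sha1[SHA1_LEN];
    uint8_t sha256[SHA256_LEN];
    bool have_md5, have_sha1, have_sha256;
};

void fmap_set_sha256(struct fmap_hashes_sketch *h,
                     const uint8_t digest[SHA256_LEN])
{
    memcpy(h->sha256, digest, SHA256_LEN);
    h->have_sha256 = true;
}
```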
Removed the allmatch checks that are no longer necessary now that it is
done when appending alerts.
Cleaned up error handling in the process to use the `goto done;` method.
I believe this fixes a minor bug wherein a detection in CAB or CHM
metadata scanning would end the scan early in allmatch mode.
Based on changes observed in the commits:
- 08fef61fea
- 1aa8768db2
... I believe that the iptr variable was intended to be used and was
accidentally not used.
In addition, adding the `ULL` suffix in the second location to match the
first, because it appears to have been accidentally omitted.
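As a generic illustration of why the suffix matters (not the actual expression from the commit):

```c
#include <stdint.h>

/* Without the ULL suffix, `1 << 30` is int arithmetic, and multiplying
 * it further can overflow; `1ULL` widens the whole expression to
 * unsigned 64-bit before the shift and multiply happen. */
uint64_t gigabytes(unsigned n)
{
    return n * (1ULL << 30); /* correct: 64-bit multiply */
}
```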
The bytecode source files largely use `int` instead of the appropriate
`cl_error_t` for ClamAV status codes, as well as for boolean variables.
This hides warnings that would indicate bugs, and makes it harder to
read the code.
I haven't gone as in depth as with some other code cleanups. This
largely just replaces function interfaces and ret variables that use
`int` with `cl_error_t`. I also swapped a couple of `int`s to `bool`s.
While doing so I found that the `cli_bytecode_context_setpdf()` function
was incorrectly placed in `bytecode_api.c` instead of next to similar
functions (`cli_bytecode_context_setpe`, etc.) in bytecode.c.
It's not an API function, so I moved it to the correct location.
I also eliminated a couple of compiler warnings:
- LLVM's CFG.h header emits a warning about a multi-line comment, so
that crops up with using LLVM for the bytecode runtime.
I disabled the warning through CMake.
- C doesn't like using the `inline` keyword on cli_dbgmsg in the
declaration in `bytecode2llvm.c` because we're compiling the bytecode
runtimes as a separate object file from the rest of libclamav.
It doesn't appear to be a functional issue, but I swapped that file
over to use `cli_dbgmsg_no_inline()` instead, just in case.
I would hope link-time-optimization will inline it anyways.