I found that the media type in `url(data:...)` values does not matter
to a browser. In addition, whitespace may be placed in a few locations
and the browser will ignore it.
This commit accounts for both, and updates the tests accordingly.
This commit adds a feature to find, decode, and scan each image found
within HTML <style> tags where the image data is embedded in `url()`
function parameters as a base64 blob.
In C, during the HTML normalization process, we extract the style tag
contents into a new buffer for processing. We then call into a new
feature in Rust code to find and decode each image (if there are
multiple).
Once extracted, the images are scanned as contained files of unknown
type, and file type identification will determine the actual type.
The verdict is being recorded before the preclass bytecode hook runs,
meaning that the final verdict may come back as "clean" in allmatch
mode, even if the preclass bytecode hook matches something.
This commit moves the verdict check to occur AFTER the preclass bytecode
hook executes.
While searching for the end of the character class ']', there was no
bounds checking to prevent reading past the end of the regular
expression.
This commit fixes the issue by adding length checking to regex_parsing.
Resolves: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47780
Prototypes (or the declarations themselves, if there is no
corresponding prototype) for functions that take no arguments are
required by the C standard to specify (void) as their argument list;
for example,
regex_pcre.h:79:1: error: function declaration isn't a prototype
[-Werror=strict-prototypes]
79 | cl_error_t cli_pcre_init_internal();
Future versions of clang may become strict about this, and there's no
harm in conforming to the standard right now, so we fix all such
instances in this commit.
The bytecode compiler has a bug that was observed to cause a crash
with a specific signature when testing on FreeBSD.
This PR adds an index size check when looking up a constant to
prevent this crash in case a bugged signature gets through QA.
An additional NULL check is added just to be safe.
While reviewing scan time performance, I found that the time limit
checks have a notable effect on scan time. I tested reducing the number
of time checks to every n'th signature evaluation. For the given
sample, which has a large number of very tiny embedded scans, I found
that overall scan time was best when checking every 10th signature.
Reducing the number of checks any further had no noticeable effect.
Bear in mind that these numbers include daily/main/bytecode CVD load
time:
| variant  | scan time         |
| -------- | ----------------- |
| current  | 63.848 s ±1.188 s |
| mod 2    | 61.773 s ±0.652 s |
| mod 10   | 59.831 s ±0.975 s |
| mod 50   | 59.279 s ±1.652 s |
| mod 100  | 59.198 s ±1.147 s |
| mod 1000 | 59.440 s ±1.304 s |
ClamAV requires non-default build options for TomsFastMath to support
bigger multiple-precision numbers. Without this change, database and
Windows EXE/DLL authenticode certificate validation may fail.
The option to build with an external TomsFastMath library should be
removed. The `ENABLE_EXTERNAL_TOMSFASTMATH` build option should be
ignored, with some sort of warning.
This commit removes the option. If used, the build will not fail, but
CMake will print a warning that the unused option was ignored.
This bug was introduced during 1.0 development and does not affect any
stable release.
An OLE2 document that claims to have a sector size (big block size) over
28 will cause a signed integer overflow for our local off_t variable,
when multiplied by 4. This will then result in an invalid read and
segfault.
The fix is two-fold:
1. Make sure the sector size (big block size) is not greater
than 28, so we don't accidentally overflow an int32_t.
2. Perform the size check _before_ using the variable.
This commit also fixes some variable types from `unsigned int` or
`off_t` over to the more appropriate `size_t`, and from `int` or
`uint32_t`, over to `bool`.
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=52639
In preparation for the release candidate, changing the version suffix
from devel to rc.
Increment SO current version to 11, because we added to the public API.
Increment SO age to 2, because we didn't change anything or remove
anything from the public API.
Added functionality level values to the bytecode API enum for recent
patch versions.
Coverity found two slightly different buffer length issues
with the memcmp at the end of the verify_key_aes() function.
The problem is that the hash must be a SHA1, but the verifier hash
field is up to 32 bytes, and the structure may also lie about the size
of the hash, saying it's more than 32 bytes.
Realistically, it should be 20 bytes if it is functioning correctly,
so we shouldn't really care about the extra 12 bytes.
Coverity static analysis is complaining that the ctx variable
is checked for NULL after it has already been dereferenced.
This code was copied from elsewhere in the file, so I fixed it
there as well.
Add error checking to make sure we aren't calling into the AES decrypt
functions with larger key sizes that would make decrypt read past the
end of its data array.
The function `cli_scan_fmap()` is where we check PE authenticode
certificates, but it only returns CL_CLEAN, CL_VIRUS, or file
types. It should also propagate errors, as well as CL_VERIFIED if the
authenticode check finds that we should trust the file.
RSA certificate verification is failing because we accidentally lowered
the maximum size of numbers for TomsFastMath multiplication when
upgrading the vendored library code.
This commit restores the larger size, raising it from the library
default of 4096 back to 8192.
Adds recursion limits for object extraction, as well as for parsing
strings, arrays, and dictionaries during extraction.
The limit is set to 25.
Places the recursion-depth variable in the pdf parse context structure.
Signatures that have conditions on a Container type or Intermediates
type may not match if the target file is first scanned outside of the
container and cached as clean.
The solution is to disable caching for layers that would have matched
except for the Container (or Intermediates) condition.
Increases internal max allocation size to allow support for
files up to 1GB.
Provides a more helpful, less dramatic warning message if this
limit is exceeded.
Some small files that appear to include archive content (zip entries,
RAR archives, etc.) later in the file are no longer set to that SFX
type initially. However, the current way we skip those drops the
initial type detection on the floor, meaning that something like a
small image may end up as CL_TYPE_ANY instead of CL_TYPE_PNG.
This commit fixes that by changing where in the process the SFX types
are ignored.
Some files that take longer to scan may exceed the scan time limit by a
fair bit during the actual pattern matching part of the scan (vs.
decompressing archives or running bytecode functions, or whatever).
ClamAV does pattern matching and hashing of files in chunks.
This change adds a time check before each chunk is matched.
In the logical sig evaluation function, we were calling the
run-bytecode function even if the lsig doesn't depend on bytecode. That
function immediately returns CL_ENULLARG, which previously was ignored
because we only checked for CL_VIRUS. Now that we check for anything
other than success, the error propagates and scanning of that layer
ends early.
The above bug is actually in some duplicate code, so when fixing this I
went ahead and refactored the function to remove this, and added
additional comments and readability improvements.
Error handling for bytecode sig evaluation, as well as logical and yara
sig evaluation, only considered CL_VIRUS or CL_SUCCESS and failed to
consider terminal errors such as CL_ETIMEOUT.
This commit fixes the error handling in those functions so we properly
abort instead of continuing to evaluate more sigs.
This commit also adds some scan time limit checks:
1. shortly after a bytecode time limit is exceeded.
2. after evaluating each lsig or yara sig.
3. when processing unzip central directory file headers.
The `cli_memcpy` function is a version of memcpy that will catch
exceptions on Windows if the memcpy fails because the memory-mapped file
fails to read (like if a network drive is disconnected or something).
It presently returns an `int`. Since it is only used in one spot, I've
swapped it over to a `cl_error_t`, and created a unique variable so
we're not piggybacking with the `cli_file_t` enum used to check the
result of a filetyping scan.
Fix pretty silly double-free, and fix issue where 'extracted_file' was
not being set to true.
Also rename `uncompressed` variable to `uncompressed_block`, because
there is another variable with the same name, above.
Because of an oversight, the `ctx->abort_scan` flag was propagating
timeout and perhaps other error types to the final verdict.
Fixed this in the `result_should_goto_done()` function.
Some non-fatal errors (notably CL_ETIMEOUT) may appear between the call
to the magic-scan, and the end of scan_common().
We need to filter these status results at this level as well to prevent
this from happening.
I also fixed a warning accidentally introduced earlier caused by trying
to use a string after free'ing it and setting it to NULL.
And I explicitly set the status to CL_SUCCESS if the cache check comes
back clean, rather than trusting that no one had messed with the `ret`
variable since it was last set (to CL_SUCCESS). It's more readable and
less prone to future bugs.
The cli_bc_ctx->outfd struct member was not properly initialized to -1.
Perhaps previous developers figured 0 was invalid-enough. All of the
checks for that file descriptor assumed 0 was the invalid value, going
so far as to explicitly set outfd to 0 if `open()` returned -1.
I didn't know this, so when I cleaned up the error handling in
`cli_unpackelf()` and `cli_unpackmacho()`, I had it `close(outfd)` when
not -1. That of course ended up closing stdin... and then all subsequent
file scans opened the file as fd `0`,... which interestingly caused
`read()` and `stat()` errors, but only after scanning a macho or elf
file, first.
Anyways... this commit fixes the issue by properly initializing outfd to
-1, and by changing any checks from 0 to -1.
I also found that the bytecode timeout apparently wasn't being applied
to bytecode functions associated with logical signatures (that is,
those run by `cli_bytecode_runlsig()`).
What I see is that `ctx->bytecode_timeout` is only set to a non-zero
value in `cli_bytecode_context_alloc()`.
But for `cli_bytecode_runlsig()`, the bytecode context sits on the stack
and is memset instead. To resolve this, and ensure the bytecode context
is properly initialized, I created a new function that does this and
had it do the memset instead of using a calloc in the allocation
function.
I also removed the `bytecode_context_clear()` function because it simply
called `bytecode_context_reset()` and then did a memset. The memset is
unnecessary, especially since in most cases it's memsetting a stack
structure immediately before a return.
The header parsing / executable metadata collecting functions for the
PE, ELF, and Mach-O file types were using `int` for the return type.
Mostly they were returning 0 for success and -1, -2, -3, or -4 for
failure. But in some cases they were returning cl_error_t enum values
for failure. Regardless, the function using them was treating 0 as
success and non-zero as failure, which it stored as -1 ... every time.
This commit switches them all to use cl_error_t. I am continuing to
store the final result as 0 / -1 in the `peinfo` struct, but outside of
that everything has been made consistent.
While I was working on that, I got a tad sidetracked. I noticed that
the target type isn't an enum, or even a set of #defines. So I made an
enum and changed the code that uses target types to use it.
I also removed the `target` parameter from a number of functions that
don't actually use it at all. Some recursion was masking the fact that
it was an unused parameter which is why there was no warning about it.
Also fixed a number of conditions where magic_scan() critical errors may
be ignored.
To ensure that the scan truly aborts for signature matches (not in
allmatch mode) and for timeouts, the `ctx->abort` option is now set in
these two conditions, and checked in several spots in magic_scan().
Additionally, I've consolidated some of the "scan must halt" type of
checks (mostly large switch statements) into a function so that we can
use the exact same logic in a few places in magic_scan().
I've also fixed a few minor warnings and code format issues.