I found mixed types and multiple bugs in the hashtable/map/set code, and
very little documentation.
The most documentation available is the bytecode compiler users manual.
Although I also found one discrepancy there with the return value for
the BC API map_remove function that calls cli_map_removekey() and so put
in an issue with the compiler project for the documentation.
Most notably is that this hashtab.c had a lot of functions returning
negative enum values instead of returning the enums and then having the
caller evaluate the return code to return a negative/0/1 result.
This commit fixes all of that, and adds in a bunch of documentation to
explain the purpose and behavior of each function and structure provided
by hashtab.c/.h.
Specific bugs that I know I fixed outside of code quality improvements:
- cli_hashset_toarray() was returning CL_ENULLARG / CL_EMEM on failure,
when the caller is expecting a ssize_t to indicate how big of an array
is allocated. It now returns -1 on failure.
I also found that an attempt was made to have the same API that takes a
mempool pointer even if mempool is disabled. I preserved that, but made
it so the macro is in all-caps so it's more obvious what is going on.
Rework the append_virus mechanism to store evidence (strong indicators,
pua indicators, and eventually weak indicators) in vectors. When
appending a "virus", we will return CLEAN when in allmatch-mode, and
simply add the indicator to the appropriate vector.
Later we can check if there were any alerts to return a vector by
summing the lengths of the strong and pua indicator vectors.
This does away with storing the latest "virname" in the scan context.
Instead, we can query for the last indicator in the evidence, giving
priority to strong indicators.
When heuristic-precendence is enabled, add PUA as Strong instead of
as PotentiallyUnwanted. This way, they will be treated equally and
reported in order in allmatch mode.
Also document reason for disabling cache with metadata JSON enabled
A copy-paste error in the markdown support for building the macOS-
installer caused an issue where markdown module may be accidentally
used instead of pytest. Strangely, this still worked okay up until we
converted the clamscan tests into a directory of tests, at which point
the `python3 -m markdown` invocation was upset that it was trying to
execute a directory instead of a .py file.
This commit removes the line that overwrites the python test command
with using the markdown module.
Adds a test set authored by Andrew Williams that validates correct
allmatch behavior using as many features as possible to alert on a
test.exe program. Source for building the test.exe program is provided,
for those who are curious what it is and what it does, or in case it
needs to be re-built for some reason.
In addition to adding a test that verifies each of the sigs that should
alert, do alert, this adds a test to verify that if an authenticode
trust signature is added, none of the signatures alert. That test is
presently failing (expected failure, so the tests all pass) and should
be updated when the certificate verification bug is fixed.
Adds a set of clamscan tests to verify trusting files by hash at the
current (normalized, in this case), the parent (non-normalized), and
grandparent (zipped) layers.
This test is added because of work done in the `cli_scan_fmap()`
function that trimmed out extra hash FP-checks.
The `clamscan_test.py` file is getting way too long.
Created a new `unit_tests/clamscan` directory and separated all tests
into separate test files.
I also fixed an issue with the clamscan `ign2` test:
The `ign2` test wasn't written correctly and was actually testing
detection despite using the `-d` parameter to try to ignore a signature.
There is a minor bug where `ign2` files may be loaded after other files
when using the `-d` option. It is only guaranteed to be loaded first if
you load all the sigs from the same directory. I fixed the test.
In the future, we should make it so all database files are sorted in a
list before load time regardless of where they're sourced from.
Add new tests to verify correct behaviour combining:
- a "potentially unwanted" indicator
- a regular alert (strong indicator)
The test file is a zip that:
1. first has a malformed PNG file used to trigger the heuristic alert
when using `--alert-broken-media`.
2. second has the clam.exe program used to trigger the normal alert.
Included are tests with this file for a number of situations:
1. Test the heuristic alert must not alert because neither allmatch
is specified, nor --heuristic-scan-precedence.
2. Test the heuristic alert must alert because we don't use the sig
for the clam.exe file.
3. Test the heuristic alert must alert first because
--heuristic-scan-precedence is enabled.
We won't see the other alert because it's not allmatch mode.
4. Test we use --allmatch but we don't use --heuristic-scan-precedence.
That means the NDB sig should alert first, even though the heuristic
is encountered first.
Note the verify_output() uses STRICT_ORDER.
5. Test we use --allmatch AND we use --heuristic-scan-precedence.
That means the heuristic is encountered first and should be treated
equally, so it should alert first.
Note the verify_output() uses STRICT_ORDER.
There is a bug where because embedded file type recognition is during
the raw-scan (i.e. pattern match of a given layer), if it finds a
malware sig match in the raw-scan it will lose the metadata about
embedded file types found, and will fail to extract them.
In all-match mode, we want it to keep extracting and scanning despite
matches, so this is a bug.
This test verifies that it will match on a signature, and also find the
embedded zip, extract it, and match on the stuff within.
Add tests to verify an alert on the base file in addition to embedded
file type recognition (for ZIPSFX extraction) and then subsequent
detection of content extracted from the embedded zip.
Add tests:
- Test an import hash with wildcard size when all-match mode is
disabled.
- Test that bytecode rules will run after content match alerts in
all-match mode.
Print bytecode signature names in debug log when running, and not just
the bytecode signature id number.
Place bytecode signature-created temp files in the correct temp
sub-directory, and give it a name prefix that is more identifiable.
The current implementation sets a "next layer attributes" flag field
in the scan context. This may introduce bugs if accidentally not cleared
during error handling, causing that attribute to be applied to a
different layer than intended.
This commit resolves that by adding an attribute flag to the major
internal scan functions and removing the "next layer attributes" from
the scan context. This attributes flag shares the same flag fields as
the attributes flag in the new file inspection callback and the flags
are defined in `clamav.h`.
libclamav callbacks can be used to access embedded file content at each
layer of extraction during the course of a scan. The existing callbacks
only provide access to the file descriptor and a guess at the file type.
This patch adds a new callback for the purposes of file/archive
inspection that provides additional insight into the embedded file.
This includes:
- ancestors: an array of parent file names
- parent file size: the size of the direct parent layer
- file name: current layer's filename, if any.
- file buffer (pointer)
- file size: size of file buffer
- file type: just a guess at the current file's type
- file descriptor: may be -1 if the layer is in-memory only.
- layer attributes: a flag field. see LAYER_ATTRIBUTE_* defines in clamav.h
Two new example apps are added that are automatically built when
compiling under CMake:
- ex2 demonstrates the prescan callback.
- ex3 demonstrates the new file inspection callback.
The examples are now installed if enabled, so you can test them in the
Docker image, and so that they'll be colocated with the DLLs so you can
test them on Windows. The installed examples should also be able to find
the UnRAR library at run time, without having to set LD_LIBRARY_PATH.
This commit also sets the fmap->name in an fmap-scan using the basname
of the provided filename if the caller provided the filename and the
provided fmap does not have the name set.
Some basic testing is needed for the new cl_cvdunpack() API, so this
commit adds basic unit tests for that.
For reasons unknown, a number of cl_* API's have stubs for unit tests
that weren't filled out. The CVD load/verify ones in particular
required access to a signed CVD. We actually ship a very basic signed
CVD with the databases now, so I added tests for those while I was at it.
In the interest of using the public API's as much as possible for our
own applications (dog-fooding the API), this commit swaps sigtool and
freshclam `cli_cvdunpack()` calls to `cl_cvdunpack()`.
Add `cl_cvdunpack()` function to the public API.
This new API has an option to disable verification, but otherwise it
will attempt to verify that the CVD is correctly signed.
The CheckFmapFeatures module is failing because HAVE_SYS_STAT_H and
HAVE_SYS_TYPES_H are not defined within the check_c_source_compiles
environment.
This commit adds them in the same way as is done for CheckFDPassing
module.
We already set the default build type to RelWithDebInfo for CMake, but
we were setting it *after* adding the Rust module. We need to do it
before, or else the Rust stuff will still default to Debug, which makes
for really slow scans of images that get fuzzy hashed.
Fix a possible overread in `handle_de()` where we dereference tokens
without boundchecking. The over-read does not cause a crash.
Resolves: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=44849
Also clean up very sloppy bounds checking in `match_parameters()`.
I don't have any evidence of an overflow. The code just looks sus.
Eliminated a benign warning in `cli_js_process_buffer()`.
I was unaware that while Yara rule regex strings may-or-may-not
escape '/' characters in the regex string, Clam logical sigs MUST escape
them. The Yara rule parser automatically removes the unnecessary '/':
https://github.com/Cisco-Talos/clamav/blob/clamav-0.105.1/libclamav/yara_lexer.l#L509-L514
That's a good feature, we don't want to remove that. But the Clam
logical sigs don't have an equivalent feature. So I changed the LDB
version of the regex '/' + ':' test to include the escape '\/'
character.
This commit also adds some new tests to make sure we don't break support
for LDB sigs with multiple PCRE subsignatures in the future, and to test
that the offset feature and the case-insensitive feature work for PCRE
subsignatures.
My recent fix for the issue where a '\' followed by ':' in a Yara regex
string would fail to parse introduced a new issue that broke loading a
signature in the current daily.ldb database.
Unbeknownst to me at the time, you can have multiple PCRE subsignatures
in a logical signature, so long as they're the last subsignatures.
The previous fix made it so the signature parser muddled more than one
PCRE subsignature into one messed up regex string.
This commit essentially reverts the previous fix, while keeping some of
the code readability improvements in that function.
Instead, it addresses the problem a different way. To resolve the
original problem, I'm simply checking if the signame starts with "YARA".
If it does, we don't tokenize it by ':' delimiters.
Precompiling the test executable causes `sudo make install` to fail.
The issue is that because we don't know the OUTPUT file path for the
test executable, CMake can't know if it doesn't need to rerun that
command, so it will run it with any call to `make`. The problem is that
cargo is will fail to run with `sudo` on many systems.
This is generally okay, because we just fixed the issue where we had
been compiling the dependencies for each module. Now that we only build
the dependencies once, building the test executable is actually very
fast. So compiling it during the test isn't so bad.
Right now, each Rust-based target added in CMake is being built in its
own directory under the build path. This causes Rust to build each
module from scratch, meaning any dependencies they have in common are
built twice.
The solution in this commit is to specify the top level build directory
as the target directory for every Rust build and test.
Note this also changes where the `clamav_rust.h` file is generated. It
is now also placed in the top-level build directory, instead of under the
`build/libclamav_rust` directory. That's a bit of a side-effect, and
could be rectified if needed, but it appears to have no ill-effects.
It's the same location that we drop the clamav-types.h file, so I think
it's fine, for now. Note that `clamav_rust.h` is not a public header,
it's just so libclamav functions can call into libclamav_rust functions.
Precompile the test executable during the main build, after libclamav
has built. This will make it so the compile time does not count against
the test time.
It also, unfortunately, make the main build take way longer.
Removed 3 duplicate variables from the test ENVIRONMENT variable.