A bug introduced in the OLE2 BIFF XLM & image extraction code is causing
some file scans to fail when part of the macro extraction fails, such as
failing to transcode UTF16LE (Windows unicode) macros to UTF-8.
This commit allows scanning to continue without failing out if the
expected BIFF temp files aren't found.
I also changed the cli_codepage_to_utf8() "incomplete multibyte
sequence" warning to be a debug message, because it is too common, and
too verbose.
This is a cherry-pick of commit 24f225c21f
Modification to unrar codebase allowing skipping of files within
Solid archives when parsing in extraction mode, enabling us to skip
encrypted files while still scanning metadata and potentially
scanning unencrypted files later in the archive.
Updates to prepare for the 0.104 release candidate:
- Change documentation to explain current bytecode runtime situation.
- Document Python 2 pytest issue.
- Add additional contributors to acknowledgements.
- Update Install instructions to note that Autotools has been removed.
- Add *.cat SHA256 support and PDF bytecode hook bugfix to the News.
- Clarify purpose of the clamscan `--gen-json` option in the
clamscan --help.
In 0.104.0 we added new load/compile/free progress callback APIs to
clamav.h.
This is a backwards compatible change, so we're bumping the current and
age fields, and resetting the revision.
See http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
for more info about libtool style .so versioning.
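As a rough illustration of the libtool update rules referenced above, here is a minimal Python sketch. The starting numbers are hypothetical and are not the real libclamav values; `VersionInfo` and `bump` are names made up for this example.

```python
from typing import NamedTuple


class VersionInfo(NamedTuple):
    current: int
    revision: int
    age: int


def bump(v, source_changed, added, removed_or_changed):
    """Apply the libtool version-info update rules, in order."""
    current, revision, age = v
    if source_changed:
        revision += 1
    if added or removed_or_changed:
        current += 1
        revision = 0
    if added:
        age += 1          # new interfaces, still backwards compatible
    if removed_or_changed:
        age = 0           # compatibility with older interfaces is broken
    return VersionInfo(current, revision, age)


# Hypothetical starting values, not the real libclamav numbers.
before = VersionInfo(current=9, revision=4, age=0)
after = bump(before, source_changed=True, added=True, removed_or_changed=False)
print(after)
```

Adding APIs without removing any bumps current and age and resets revision, which is the case described in this commit.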
Bytecode signatures targeting PDF files (target 10) fail to evaluate
match conditions. This occurs because the raw scan step during a magic
scan currently occurs _after_ the PDF scan, thus at the time the hook is
triggered, no matches have been performed and the logical sig eval is
always negative.
This commit fixes the issue by relocating the PDF parsing step to occur
after the raw scan, thereby enabling the bytecode lsig evaluation to
match and the signature to execute.
The embedded file type recognition feature scans files for embedded
files. This can identify things like self extracting zips (ZIPSFX), as
well as file types like DMG and MHTML files that can't be easily
identified using the start of the file.
It can waste CPU if you detect SFX files and then use the embedded
file type recognition scan on those SFX files, potentially detecting and
processing the same portions over and over.
There is a loop in the SIS scanning parser where an SIS header may point
to the beginning of the file as the start of an archived file.
This would be an infinite loop if not for the scan recursion limit.
This commit fixes that by making sure that both the file records and the
individual file pointers start after the main SIS header.
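The check described above might look roughly like the following Python sketch. The real parser is C; the function name, parameters, and the idea of a single `header_size` value are assumptions made for illustration only.

```python
def offsets_are_sane(header_size, record_table_offset, file_offsets):
    """Return True only if every offset points past the main header.

    An offset inside the header (for example 0) would make the parser
    treat the archive itself as one of its own embedded files.
    """
    if record_table_offset < header_size:
        return False
    return all(off >= header_size for off in file_offsets)


# Hypothetical numbers: a 100-byte header and one pointer back to offset 0.
print(offsets_are_sane(100, 120, [200, 0]))
```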
Python 3.5 compatibility fixes for Debian 9, etc. that lack 3.6+.
Change a python f-string to an old-style `"".format()`.
Convert Path objects to strings for older `shutil` APIs that don't
accept Paths.
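A small sketch of both kinds of change, using made-up file names rather than the actual test code:

```python
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
src = workdir / "sample.txt"
src.write_text("hello")
dst = workdir / "copy.txt"

# An f-string such as f"Copying {src.name}" needs Python 3.6+;
# str.format() works on 3.5 as well.
message = "Copying {} to {}".format(src.name, dst.name)

# Some shutil functions on older Pythons expect str paths,
# so convert the Path objects explicitly.
shutil.copyfile(str(src), str(dst))
```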
Fix missing return values for progress callbacks.
Fix Windows build.
The cli_debug_flag variable is not exported on Windows. The correct way
to check if in debug-mode is to check the command line options.
Added a test to verify that clamscan can extract images from an XLS
document. The document has 2 images: a PNG and JPEG version of the
clamav demon/logo. The test requires the json metadata feature to verify
that the MD5s of the images are correct.
No other image formats were tested because, despite the format allegedly
supporting other image formats, Excel converts TIFF, BMP, and GIF images
to PNG files when you insert them.
The split test files are flagged by some AVs because they look like
broken executables. Instead of splitting the test files to prevent
detections, we should encrypt them. This commit replaces the "reassemble
testfiles" script with a basic "XOR testfiles" script that can be used
to encrypt or decrypt test files. This commit also replaces all the
split files with xor'ed files.
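For readers unfamiliar with the approach, a minimal sketch of XOR "encryption" in Python follows. The key and function name are hypothetical and are not taken from the actual script. Because XOR is its own inverse, the same function both encrypts and decrypts.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the repeating key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical key; applying the function twice restores the input.
key = b"example-key"
plain = b"MZ\x90\x00 pretend this is a test executable"
hidden = xor_bytes(plain, key)
restored = xor_bytes(hidden, key)
```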
The test and unit_tests directories were a bit of a mess, so I
reorganized them all into unit_tests with all of the test files placed
under "unit_tests/input" using subdirectories for different types of files.
Fixup input output params to be annotated with [in,out], not [in/out].
Note: skipped some other incorrectly annotated [out] params that are
already staged to be fixed in a different PR.
The previous image extraction logic would search from the beginning of
the drawing group for the image file type magic bytes and then just
assume the rest of the file is that type. This was super hacky, didn't
support more than one image extraction, and resulted in "image files"
that contain a bunch of extra garbage data (which may include more
images or maybe just some metadata about how the images are used).
This commit implemented part of the office draw file specification to
correctly identify the start and size of each embedded image. Instead of
processing the drawing group as though it is one image to be extracted,
it collects the drawing group data into a single buffer, and then parses
the records within to identify the images within Blip records.
Based on: https://interoperability.blob.core.windows.net/files/MS-ODRAW/%5bMS-ODRAW%5d.pdf
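To give a feel for the record walk, here is a simplified Python sketch, not the C implementation. It assumes the 8-byte MS-ODRAW record header layout (4-bit version plus 12-bit instance, 16-bit type, 32-bit length, little-endian) and that a version of 0xF marks a container. It does not decode the per-Blip fields that precede the image bytes, and `walk_records` is a made-up name.

```python
import struct

HEADER = struct.Struct("<HHI")  # ver/instance, record type, record length


def walk_records(buf, start=0, end=None):
    """Yield (rec_type, data_offset, data_len) for each non-container record."""
    end = len(buf) if end is None else end
    pos = start
    while pos + HEADER.size <= end:
        ver_inst, rec_type, rec_len = HEADER.unpack_from(buf, pos)
        data = pos + HEADER.size
        if data + rec_len > end:
            break  # truncated record: stop rather than read past the buffer
        if ver_inst & 0xF == 0xF:
            # Container record: its payload is a sequence of child records.
            yield from walk_records(buf, data, data + rec_len)
        else:
            yield rec_type, data, rec_len
        pos = data + rec_len
```

A caller would collect the whole drawing group into one buffer first, then keep only the yielded records whose type falls in the Blip range.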
Also resolved the following issue:
If XLM (and now images) are found when parsing an ole2 file, the
following other embedded content may not be processed:
- document summary metadata
- embedded ole10 files
- ole2 temp subdirectories (i.e. recursion)
The logic to process the above ole2 extracted temp files was present in
the function which processes extracted VBA. When we added support for
extracting XLM macros, processing these other data was lost.
Really, the above need to be processed if any temp files were saved.
I fixed this by restructuring the features to extract any type of temp
file into separate functions per type of temp file. I then wrapped
those in an ole2 temp dir scanning function. OLE2 temp directory scanning
is recursive if there are subdirectories.
Added a feature to extract images from OLE2 BIFF streams.
This work was derived from InQuest's blog post about extracting XLM and
images from XLS files:
https://inquest.net/blog/2019/01/29/Carving-Sneaky-XLM-Files
Assorted ole2 parser code cleanup and massive error handling cleanup.
Also fixed the following:
- The XLS parser may fail to process all BIFF records if some of the
records contain unexpected data or are otherwise malformed. Because the
record size is already known, we can skip over the "malformed" record
and continue with the rest.
- Fixed an issue where the ole2 header size was improperly calculated,
failing to account for the new "has_xlm" boolean added for context.
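The record-skipping idea from the first item can be sketched in Python as follows. This is an illustration, not the C parser: it assumes each BIFF record starts with a 2-byte opcode and a 2-byte size (little-endian), and `parse` is a hypothetical callback that raises ValueError on data it does not understand.

```python
import struct


def iter_biff_records(stream, parse):
    """Return parsed records, skipping any that parse() rejects."""
    results = []
    pos = 0
    while pos + 4 <= len(stream):
        opcode, size = struct.unpack_from("<HH", stream, pos)
        body = stream[pos + 4:pos + 4 + size]
        if len(body) < size:
            break  # truncated stream: nothing more can be read safely
        try:
            results.append(parse(opcode, body))
        except ValueError:
            pass  # malformed record: skip it instead of aborting
        # The size field tells us where the next record starts either way.
        pos += 4 + size
    return results
```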
Trusted SHA256-based Authenticode hashes can now be loaded in
from .cat files. In addition:
- Files that are covered by Authenticode hashes loaded in from
.cat files will now be treated as VERIFIED like executables
where the embedded Authenticode sig is deemed to be trusted
based on .crb rules. This fixes a regression introduced in
0.102 (I think).
- The Authenticode hashes for signed EXEs without .crb coverage
will no longer be computed in cli_check_auth_header unless
hashes from .cat rules have been loaded. This fixes a slight
performance regression introduced in 0.102 (I think).
Add progress callbacks to libclamav for:
- database load
- engine compile
- engine free
Add a progress bar to clamscan for load & compile.
These are disabled if you run with --debug or stdout is not a TTY or you
are using one of --quiet, --infected, or --no-summary.
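The conditions above amount to a simple predicate. Here is a Python sketch of the logic only; clamscan itself is C, and the function and parameter names are invented for this example.

```python
import sys


def progress_enabled(debug, quiet, infected, no_summary, is_tty=None):
    """Return True if a progress bar should be drawn."""
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    # Any of these options, or a non-TTY stdout, disables the bar.
    return is_tty and not (debug or quiet or infected or no_summary)
```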
Added code so you can test the engine-free callback by building with
ENABLE_ENGINE_FREE_PROGRESSBAR defined.
The compile & free progress callbacks pre-calculate the number of
tasks to complete to estimate the progress. Some tasks may take longer
than others so the progress speed may appear to vary a little.
The callbacks' return type is cl_error_t, but the returned value isn't
currently used. It is reserved for future use.
Minor formatting change in matcher-ac.c to counteract weird
clang-format behavior, and to make it easier to read.
Added progress callbacks and clamscan progress bars to the news.
Adds a basic test to validate that ExcludePath correctly excludes a
subdirectory but does not exclude subsequent files. As with the other
ClamD/Scan tests, it will test in each mode: regular, stream, and
fdpass (if available).
Unlike the other tests, this one tests ClamDScan with Valgrind instead
of ClamD.
Refactored the clamd_test.py file to reduce duplicate code, and support
enabling and disabling valgrind when running ClamDScan and ClamD.
Add pytest to the github actions environments because the results when
using pytest are far easier to read.
ClamDScan will leak the memory for the scan target filename if using
`--fdpass` or using `--stream`. This commit fixes that leak.
Resolves: https://bugzilla.clamav.net/show_bug.cgi?id=12648
ClamDScan will fail to scan any file after running into an
"ExcludePath" exclusion when using `--fdpass` or `--stream` AND
--multiscan (-m). This happens because the parallel_callback()
function used by the file tree walk (ftw) feature returns an
error code for excluded files rather than "success".
Memory for the accidentally-excluded paths for a given directory also
appears to be leaked.
This commit resolves this accidental-abort issue and the memory leak.
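The accidental abort can be modelled with a toy tree walk in Python. All names here are hypothetical; the sketch assumes only that the walker stops as soon as its callback reports an error, which is the behavior described above.

```python
OK, ERROR = 0, 1


def walk(paths, callback):
    """Visit each path; stop at the first callback error."""
    for p in paths:
        if callback(p) != OK:
            return ERROR
    return OK


def make_callback(excluded, scanned, buggy):
    def callback(path):
        if path in excluded:
            # Buggy: an exclusion is reported as an error and ends the walk.
            # Fixed: an exclusion is not a failure, so report success.
            return ERROR if buggy else OK
        scanned.append(path)
        return OK
    return callback


paths = ["a", "skip-me", "b"]
scanned_buggy, scanned_fixed = [], []
walk(paths, make_callback({"skip-me"}, scanned_buggy, buggy=True))
walk(paths, make_callback({"skip-me"}, scanned_fixed, buggy=False))
```

With the buggy callback nothing after the excluded path is scanned; with the fixed one the walk continues.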
There was an additional single file path memory leak when using
`--fdpass` caused by bad error handling in `cli_ftw()`.
This was fixed by removing the confusing ternaries, and using
separate pointers for each filename copy.
ClamDScan with ExcludePath regex may fail to exclude absolute paths
when performing relative scans because the exclude-check function may
match using the provided relative path (e.g. `/some/path/../another/path`)
rather than an absolute path (e.g. `/some/path/another/path`).
This issue is resolved by getting the real path at the start of the
scan, eliminating `.` and `..` relative pathing from all filepaths.
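A short Python sketch of why resolving the path first matters. It uses `os.path.realpath` as a stand-in for whatever the C code calls, and `is_excluded` is a made-up helper, not a ClamAV function.

```python
import os
import re


def is_excluded(path, patterns):
    """Match exclusion regexes against the resolved, absolute path."""
    real = os.path.realpath(path)  # removes "." and ".." components
    return any(re.search(p, real) for p in patterns)


base = os.path.realpath(os.getcwd())
# A path with ".." in it, as a relative scan might produce.
messy = os.path.join(base, "some", "..", "another", "file")
pattern = "^" + re.escape(os.path.join(base, "another"))

matches_raw = re.search(pattern, messy) is not None
matches_resolved = is_excluded(messy, [pattern])
```

Matching against the unresolved string misses the exclusion, while matching against the resolved path finds it.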
TODO 1: In addition to being recursive (bad for stack safety), the
File Tree Walk (FTW) implementation is spaghetti code and should
be refactored.
TODO 2: ExcludePath will print out "Excluded" for each path that is
excluded when using `--fdpass` or `--stream`, and for each path
directly scanned that is directly excluded. But in a recursive
regular-scan, the "Excluded" message for those paths is missing.
There appear to be minor leaks in clamd that can occur when shutting
down immediately after a command (e.g. RELOAD).
These are causing intermittent clamd test failures.
It seems like they're caused by a thread leaking occasionally,
due to not exiting before the program terminates.
I don't believe these to be a serious issue. Tracking down the exact
cause and crafting a fix for the leaks isn't worth the effort.
This commit adds valgrind suppression rules to stabilize the tests.
Added feature to start FreshClam & Clamd as Windows services
Special thanks to Gianluigi Tiesi for allowing us to integrate this
feature from ClamWin directly into ClamAV.
Added internal --service-mode option for FreshClam and ClamD
This is used when Windows starts FreshClam or ClamD as a service so
that they will register with the service manager.
Code found in service.c.
Windows XP had a maximum section count of 96, and this has been
the max for ClamAV forever as well. Raising this prevents malicious
executables from being able to evade certain ClamAV signatures by
having 97 or more sections.
The non-existent file test has a hack to "expect" a weird error message
caused by the '\v' character rather than the file not actually existing.
Recently something(?) changed and the test started reporting yet a
different message or no message.
Removing the '\v' special character fixes the test so it actually tests
a non-existent file and returns the same message as on other operating
systems.
Previously we'd not clang-formatted the c++ bytecode files because:
A) It's a massive difference in format
B) I wasn't sure, at the time, which code was "ours"
Reformatting now that the LLVM source is all removed and before it gets
updated to support modern LLVM versions.
Add a test where freshclam receives a zero-byte cdiff to trigger a whole
CVD database download, and the CVD served is older than advertised.
This is a regression test for a bug found & fixed by Andrew Williams.
This commit fixes a bug in the libfreshclam error handling where, if
either of the following scenarios are encountered, the CVD download
attempt may be retried multiple times and always result in failure:
Scenario 1:
- Incremental downloads via CDIFFs are stopped because an empty CDIFF
file is encountered, and
- The CVD downloaded from the configured mirror is older than the
version advertised via DNS (for example, due to caching)
Scenario 2:
- Incremental downloads via CDIFFs fail, and
- The local database is more than 1 version out of date, and
- The CVD downloaded from the configured mirror is older than the
version advertised via DNS (for example, due to caching)
This bug was discovered by Coverity:
317956 Logically dead code
In updatedb: Code can never be reached because of a logical
contradiction
Adds 3 tests to validate that:
1. a CDIFF update works
2. a CDIFF partial update (with 1 missing CDIFF) works
and that a subsequent update is ok with being 1 behind
3. a CDIFF partial update (with 2 missing CDIFFs) works
and that a subsequent update will try to get the WHOLE CVD -
because being 2+ CDIFFs behind without any update isn't good enough.
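The behavior the three tests expect can be summarized with a tiny decision function. This is a sketch of the rule as stated in the list above, not the libfreshclam code; the function name and return strings are invented for illustration.

```python
def next_step(local_version, advertised_version):
    """Decide what a follow-up update should do after a partial update."""
    behind = advertised_version - local_version
    if behind <= 0:
        return "up-to-date"
    if behind == 1:
        return "accept"  # being one CDIFF behind is tolerated
    return "download-whole-cvd"  # two or more behind is not good enough
```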
Also fixed a minor bug so that the database name is properly displayed
when a partial update occurs instead of displaying "(null)".
Also changed the freshclam test port to 8001 to deconflict with
CVD-Update, in case that's running in the background.
TODO: Make the tests smarter so they find an open port instead of
hoping that 8001 is available.
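One common way to address this TODO, shown here only as a possible approach, is to bind to port 0 and let the OS choose a free port. `find_free_port` is a hypothetical helper and is not part of the test suite.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 means "pick any free port"
        return s.getsockname()[1]


port = find_free_port()
```

The port is released before the caller uses it, so another process could grab it in between. Retrying on bind failure would narrow that gap.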
The URL registry.hub.docker.com was apparently deprecated for a while,
and started to give 404 errors as of today for some repos. The correct
URL is index.docker.io, so let's use that instead.
Signed-off-by: Olliver Schinagl <oliver@schinagl.nl>
Cloudflare deprecated the __cfduid cookie which caused ClamSubmit
failures on systems that stopped receiving the cookie.
This commit removes support for the __cfduid cookie.
Also made the session cookie optional, in case that disappears too.
Changed error messages over to use the logg() function like our other apps.
Tidied up some of the logic, and changed "cleanup" label to "done" to
match other code.
The for loop in cli_bcomp_scanbuf contains a few "continue" directives
that do not free the three-byte subsigid buffer allocated within the
loop. This code path is triggered only when a signature contains more
than one byte compare subsignature. Over a significant amount of time,
as for example when using clamd, this leads to memory exhaustion.