We add the _OR_GOTO_DONE suffix to the macros that go to done if the
allocation fails. This makes it obvious what is different about the
macro versus the equivalent function, and that error handling is
built-in.
Renamed the cli_strdup to safer_strdup to make it obvious that it exists
because it is safer than regular strdup. Regular strdup doesn't have the
NULL check before trying to dup, and so may result in a NULL-deref
crash.
Also remove unused STRDUP (_OR_GOTO_DONE) macro, since the one with the
NULL-check is preferred.
The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions
provide a way to protect against allocating too much memory
when the size of the allocation is derived from the untrusted input.
Specifically, we worry about values in the file being scanned being
manipulated to exhaust the RAM and crash the application.
There is no need to check the limits if the size of the allocation
is fixed, or if the size of the allocation is necessary for signature
loading, or the general operation of the applications.
E.g. checking the max-allocation limit for the size of a hash, or
for the size of the scan recursion stack, is a complete waste of
time.
Although we significantly increased the max-allocation limit in
a recent release, it is best not to check an allocation if the
allocation will be safe. It would be a waste of time.
I am also hopeful that if we can reduce the number allocations
that require a limit-check to those that require it for the safe
scan of a file, then eventually we can store the limit in the scan-
context, and make it configurable.
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2
... and these names do not convey their purpose
This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2
The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-free's.
So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2
These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
There are a large number of allocations for fix sized buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be higher than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.
This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC
I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.
I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocaitons for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.
There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
The strncpy intentionally is not copying the NULL terminator for the log
message prefix. The NULL will be added by vsnprintf, after.
Switching to memcpy eliminates the warning.
Increases internal max allocation size to allow support for
files up to 1GB.
Provides a more helpful, less dramatic warning message if this
limit is exceeded.
* Windows: Fix utf8 filename support issue
The function used to verify the real path for the file being
scanned fails on some utf8 filenames, like:
file_with_ümlaut.eicar.com
The scan continues despite the failure, but the error reported from
clamd and warnings from clamscan are confusing because it looks like
the scan failed even if a verdict shows up later.
The fix here is to convert the path from utf8 to utf16 and then use
the CreateFileW() API to grab the file handle for the real path check.
Resolves: https://github.com/Cisco-Talos/clamav/issues/418
Resolves: CLAM-1782
* Windows: Fix utf8 libclamav logging issues
Print to the log using rust eprint() instead of fputs()
Rust's eprint() function supports full utf8, while fputs() gets
confused and prints stuff like:
file_with_├╝mlaut.eicar.com
instead of:
file_with_ümlaut.eicar.com
* Added loglevel parameter to logg()
* Fix logg and mprintf internals with new loglevels
* Update all logg calls to set loglevel
* Update all mprintf calls to set loglevel
* Fix hidden logg calls
* Executed clam-format
Use bindgen to generate Rust-bindings for some libclamav internal
functions and structures in an new "sys.rs" module.
"sys.rs" will be generated at build-time and unfortunately must be
dropped in the source/libclamav_rust/src directory rather than under the
build directory. As far as I know, Cargo/Rust provide no way to set an
include path to the build directory to separately place generated files.
TODO: Verify if this is true and move sys.rs to the build directory if
possible.
Using the new bindings with:
- the logging module.
- the cdiff module.
Also:
- Removed clamav_rust.h from .gitignore, because we generate it in the
build directory now.
- Removed the hand-written bindings from the cbindgen exclusions.
lib.rs has an annotation that prevents cbindgen from looking at sys.rs.
- Fixed a `u8` -> `c_char` type issue in cdiff in the cli_getdsig() call
parameters.
Also resolve alleged uninitialized memory use by initializing the
`digest` variable in `cli_md5buff()`. MSAN blames it for putting
uninitialized data in the `name_salt` global, though in debugging and in
review I can't find any evidence that it isn't initialized by the call
to `cl_hash_data()` in `cli_md5buff()`.
This MSAN complaint has been a blocker to enabling MSAN in OSS-Fuzz.
Convert cli_dbgmsg to inline function to ensure ctx check for debug flag
is always run
Add copyright and licensing info
Fix valgrind uninitialized buffer issue in cliunzip.c
Windows build fix
The `realpath()` function on macOS will fail when scanning a symlink
that doesn't exist because it tries to determine the real path of the
thing the symlink points to and fails when that thing doesn't exist.
This behavior is different than on Linux or FreeBSD Unix.
I'm fixing this by opening the symlink with `O_SYMLINK` then getting
realpath of the link using `cli_get_filepath_from_filedesc()`, much
like we do on Windows.
Side note: I did try using macOS's _DARWIN_BETTER_REALPATH variant to
see if that resolved the issue, but it didn't seem to.
This resolves https://bugzilla.clamav.net/show_bug.cgi?id=12792
This commit also removes the problematic "access denied" clamd test from
the test suite. This commit broke that test on macOS because the error
message when clamdscan fails to `open()` the file to check the
realpath() is different than the error message when clamd tries to scan
it, but has access denied.
(It's like "Permission denied" instead of "Access denied")
It's not worth #ifdef'ing around macOS to get a test pass because this
test is already causing problems in 32-bit Docker environments where
access isn't actually denied to the user (!). Honestly, it's not worth
keeping the test that simply verifies ClamD's error message when denied
matches expectations.
I also switched to use the C_DARWIN macro instead of __APPLE__ because
I kno C_DARWIN is defined in clamav-config.h on macOS, and I found that
__APPLE__ wasn't defined when I tested.
docs: Fix a few typos
There are small typos in:
- libclamav/others_common.c
- libclamav/pe.c
- libclamav/unzip.c
Fixes:
- Should read `descriptor` rather than `desriptor`.
- Should read `record` rather than `reocrd`.
- Should read `overarching` rather than `overaching`.
Cause: _wopen() on Windows doesn't work on directories and gives a
Permission Denied error.
The old approach used _wopen() to get a file descriptor and gets the
realpath from that.
The new approach opens a HANDLE with CreateFileA() with
FILE_FLAG_BACKUP_SEMANTICS to support directories.
Refactor the cli_get_filepath_from_filedesc() function by adding
cli_get_filepath_from_handle().
Added a feature to extract images from OLE2 BIFF streams.
This work was derived from InQuests blog post about extracting XLM and
images from XLS files:
https://inquest.net/blog/2019/01/29/Carving-Sneaky-XLM-Files
Assorted ole2 parser code cleanup and massive error handling cleanup.
Also fixed the following:
- The XLS parser may fail to process all BIFF records if some of the
records contain unexpected data or is otherwise malformed. Because the
record size is already known, we can skip over the "malformed" record
and continue with the rest.
- Fixed an issue where the ole2 header size was improperly calculated,
failing to account for the new "has_xlm" boolean added for context.
Adds a basic test to validate that ExcludePath correctly excludes a
subdirectory but does not exclude subsequent files. As with the other
ClamD/Scan tests, it will test in each mode: regular, stream, and
fdpass (if available).
Unlike the other tests, this one tests ClamDScan with Valgrind instead
of ClamD.
Refactored the clamd_test.py file to reduce duplicate code, and support
enabling and disabling valgrind when running ClamDScan and ClamD.
Add pytest to the github actions environments because the results when
using pytest are far easier to read.
ClamDScan will leak the memory for the scan target filename if using
`--fdpass` or using `--stream`. This commit fixes that leak.
Resolves: https://bugzilla.clamav.net/show_bug.cgi?id=12648
ClamDScan will fail to scan any file after running into an
"ExcludePath" exclusion when using `--fdpass` or `--stream` AND
--multiscan (-m). The issue is because the parallel_callback()
callback function used by file tree walk (ftw) feature returns an
error code for excluded files rather than "success".
Memory for the accidentally-excluded paths for a given directory also
appears to be leaked.
This commit resolves this accidental-abort issue and the memory leak.
There was an additional single file path memory leak when using
`--fdpass` caused by bad error handling in `cli_ftw()`.
This was fixed by removing the confusing ternaries, and using
separate pointers for each filename copy.
ClamDScan with ExcludePath regex may fail to exclude absolute paths
when performing relative scans because the exclude-check function may
match using provided relative path (E.g. `/some/path/../another/path`)
rather than an absolute path (E.g. `/some/path/another/path`).
This issue is resolved by getting the real path at the start of the
scan, eliminating `.` and `..` relative pathing from all filepaths.
TODO 1: In addition to being recursive (bad for stack safety), the
File Tree Walk (FTW) implementation is a spaghetti code and should
be refactored.
TODO 2: ExcludePath will print out "Excluded" for each path that is
excluded when using `--fdpass` or `--stream`, and for each path
directly scanned that is directly excluded. But in a recursive
regular-scan, the "Excluded" message for the those paths is missing.
Coverity warnings:
- 293628 Uninitialized pointer read - In reload_db: Reads
an uninitialized pointer or its target. A fail case
could lead to `rldata` being used before initialization
- 293627 Uninitialized pointer read - In reload_th: Reads
an uninitialized pointer or its target. A fail case could
lead to `engine` being used before initialization
- 265483 Uninitialized pointer write - In parseEmailFile:
Write to target of an uninitialized pointer. A fail case
could lead `ret` to be dereferenced and written to
- 265482 Resource leak - In parseEmailFile: Leak of memory
or pointers to system resources. A fail case could lead
to `head` being leaked
- 225221 Resource leak - In onas_get_opt_list: Leak of memory
or pointers to system resources. A fail case could lead to
`opt_list` being leaked
- 225181 Resource leak - In onas_ht_rm_hierarchy: Leak of
memory or pointers to system resources. A fail case could
lead to `prntname` being leaked
- 193874 Resource leak - In cli_genfname: Leak of memory
or pointers to system resources. A fail case could lead
to `sanitized_prefix` being leaked
- 225196 Resource leak - In onas_fan_eloop: Leak of memory
or pointers to system resources. A fail cases could lead
to `event_data` being leaked
Also, I added some unresolved comments regarding clamonacc
functionality, and added a version compatibility check that
is shown in the example code in the `fanotify` man page
If you set an ExcludePath regex in clamd.conf and then perform a
ClamDScan scan with --fdpass --multiscan, it will segfault.
The same issue also affects --fdpass --multiscan scans when using
ExcludePath when scanning a patch that doesn't exist.
The issue is that the filepath isn't being passed along for the path
exclusion regex match, resulting in a NULL deref.
This commit also fixes a possible memory leak if by duplicating the path
for the handle_entry() call _after_ the callback() runs, in case ret
isn't CL_SUCCESS and the function exits without every using the entry
structure or free'ing the copied filename.
The above work temporarily caused a test failure in check_clamd and a
valgrind failure in clamd for the nonexistent file test due to a minor
memory leak. This made it apparent that there were a few other nearby
possible memory leaks.
This commit fixes the above plus cleans up the error handling in clamd's
the file tree walk functions.
The security improvement to perform file realpath lookups prior to a
scan has the adverse effect of causing file scans to fail on Windows
when scanning on some filesystems.
Specifically, it was observed that the ImDisk driver doesn't handle the
IRP_MJ_QUERY_INFORMATION message so the call to look up the realpath
using GetFinalPathNameByHandleW() doesn't work.
There are two other API's I've found which can query the real file path.
The first is to create a file mapping of the target file and then use
GetMappedFileNameW() to get the file path. The other is to use the
NtQueryObject() undocumented NT API to get the file path. Each of
these should return roughly the same thing. For files in an ImDisk
RAM-disk drive, the resulting filepath for R:\clam.exe would
be \\Device\ImDisk0\clam.exe. The trouble is, mapping
\\Device\ImDisk0\clam.exe back to R:\clam.exe would rely on an
assumption that ImDisk is using the default drive letter, which is a tad
hacky.
Instead, this patch simply allows the scan to proceed if the realpath
lookup failed. If the user is using the quarantine (remove/move)
features AND if the scan target filepath has a directory junction (soft
link), then the quarantine action will fail. It's not ideal but it is
quite unlikely.
At least some unicode filenames may fail to scan in 0.102.4+ because
while Windows char* strings may be UTF8, the GetFinalPathNameByHandleA
function does not return UTF8 strings and instead does lossy conversion
to ASCII. To fix this, we need to use GetFinalPathNameByHandleW instead
and then convert from UTF16-LE to UTF8.
While fixing this bug, I found and fixed a couple other serious issues
with the Win32 implementation of cli_codepage_to_utf8().
If a file is on a network share, the realpath comes back with a path
name that looks like "\\\\?\\UNC\\<host>\\<share>\\...". In thi scase,
the "\\\\?\\UNC\\" prefix is critical or else clamscan.exe won't be able
to open the file. This patch checks for the "\\\\?\\UNC" prefix and if
it exists, it keeps the prefix, else it trims the "\\\\?\\" portion as
before. This should fix scanning of files on network shares.
This patch adds experimental-quality CMake build tooling.
The libmspack build required a modification to use "" instead of <> for
header #includes. This will hopefully be included in the libmspack
upstream project when adding CMake build tooling to libmspack.
Removed use of libltdl when using CMake.
Flex & Bison are now required to build.
If -DMAINTAINER_MODE, then GPERF is also required, though it currently
doesn't actually do anything. TODO!
I found that the autotools build system was generating the lexer output
but not actually compiling it, instead using previously generated (and
manually renamed) lexer c source. As a consequence, changes to the .l
and .y files weren't making it into the build. To resolve this, I
removed generated flex/bison files and fixed the tooling to use the
freshly generated files. Flex and bison are now required build tools.
On Windows, this adds a dependency on the winflexbison package,
which can be obtained using Chocolatey or may be manually installed.
CMake tooling only has partial support for building with external LLVM
library, and no support for the internal LLVM (to be removed in the
future). I.e. The CMake build currently only supports the bytecode
interpreter.
Many files used include paths relative to the top source directory or
relative to the current project, rather than relative to each build
target. Modern CMake support requires including internal dependency
headers the same way you would external dependency headers (albeit
with "" instead of <>). This meant correcting all header includes to
be relative to the build targets and not relative to the workspace.
For example, ...
```c
include "../libclamav/clamav.h"
include "clamd/clamd_others.h"
```
... becomes:
```c
// libclamav
include "clamav.h"
// clamd
include "clamd_others.h"
```
Fixes header name conflicts by renaming a few of the files.
Converted the "shared" code into a static library, which depends on
libclamav. The ironically named "shared" static library provides
features common to the ClamAV apps which are not required in
libclamav itself and are not intended for use by downstream projects.
This change was required for correct modern CMake practices but was
also required to use the automake "subdir-objects" option.
This eliminates warnings when running autoreconf which, in the next
version of autoconf & automake are likely to break the build.
libclamav used to build in multiple stages where an earlier stage is
a static library containing utils required by the "shared" code.
Linking clamdscan and clamdtop with this libclamav utils static lib
allowed these two apps to function without libclamav. While this is
nice in theory, the practical gains are minimal and it complicates
the build system. As such, the autotools and CMake tooling was
simplified for improved maintainability and this feature was thrown
out. clamdtop and clamdscan now require libclamav to function.
Removed the nopthreads version of the autotools
libclamav_internal_utils static library and added pthread linking to
a couple apps that may have issues building on some platforms without
it, with the intention of removing needless complexity from the
source. Kept the regular version of libclamav_internal_utils.la
though it is no longer used anywhere but in libclamav.
Added an experimental doxygen build option which attempts to build
clamav.h and libfreshclam doxygen html docs.
The CMake build tooling also may build the example program(s), which
isn't a feature in the Autotools build system.
Changed C standard to C90+ due to inline linking issues with socket.h
when linking libfreshclam.so on Linux.
Generate common.rc for win32.
Fix tabs/spaces in shared Makefile.am, and remove vestigial ifndef
from misc.c.
Add CMake files to the automake dist, so users can try the new
CMake tooling w/out having to build from a git clone.
clamonacc changes:
- Renamed FANOTIFY macro to HAVE_SYS_FANOTIFY_H to better match other
similar macros.
- Added a new clamav-clamonacc.service systemd unit file, based on
the work of ChadDevOps & Aaron Brighton.
- Added missing clamonacc man page.
Updates to clamdscan man page, add missing options.
Remove vestigial CL_NOLIBCLAMAV definitions (all apps now use
libclamav).
Rename Windows mspack.dll to libmspack.dll so all ClamAV-built
libraries have the lib-prefix with Visual Studio as with CMake.
This patch relocates the real-path check from clamdscan and clamonacc
to clamd. While clamonacc is unlikely to send directories or symlinks
to be scanned, clamdscan may send directories. Real-path checks have
to be performed on the files, not the directories -- both because the
directories may contain symlinks and because the cli_realpath()
function wasn't written to support directories on Windows.
Notably the commit adds a heuristic alert when VBA is extracted using
the new VBA extraction code and similarly adds "HasMacros":true to the
JSON scan properties.
In addition, a change was added to the cli_sanitize_filepath() function
so it converts posix pathseps to Windows pathseps on Windows and also
outputs a sanitized basename pointer (optional) which is used when
generating a temporary filename so that using a prefix with pathseps in
it won't cause file creation failures (observed with --leave-temps where
original filenames are incorporated into temporarily filenames).
Included soem error handling improvements for cli_vba_scandir() to
better track alert and macro detections.
Downgraded utf8 conversion error messages to debug messages because they
are too verbose in files with invalid filenames (observed in some
malware).
Changed the xlm macro and vba project temp filenames to include
"xlm_macros" and "vba_project" prefix, to make it easier to find them.
Relocated XLM and VBA temp files from the top-level tmp directory to the
current sub_tmpdir, so tempfiles for a given scan are more organized.
A malicious user could replace a scan target's directory with a symlink
to another path to trick clamscan, clamdscan, or clamonacc into removing
or moving a different file (eg. a critical system file). The issue would
affect users that use the `--move` or `--remove` options for clamscan,
clamdscan, and clamonacc.
This patch gets the real path for the scan target before the scan,
and if the file alerts and the --move or --remove quarantine features
are used, it mitigates the symlink attack by traversing the path one
directory at a time until reaching the leaf directory where the scan
target file resides before unlinking (or renaming) the file directly.
This commit applies a similar tactic used in the previous commit for
Windows builds, using the Win32 Native API to traverse a path and delete
or move files by handle rather than by file path.
I had some trouble using SetFileInformationByHandle to rename a file by
handle, so for Windows instead it will copy the file to the new location
and then use the safe unlink technique to remove the old file. If the
symlink attack occurs, the unlink will fail, and the system will not be
damaged.
For more information about AV quarantine attacks using links, see the
[RACK911 Lab's report](https://www.rack911labs.com/research/exploiting-almost-every-antivirus-software)
In regression testing we observed occasional errors creating files or
directories. It appears as though these are caused by tmp files that use
identical prefixes. Such prefixed tmp file names currently have a
5-character hash suffix. Though these temporary files or directories are
deleted when no longer needed, there is still the possibility that there
could be a hash collision which causes the scan to error out.
This patch doubles the size of the short-hash to 10-characters to reduce
the chance of a collision.
A way is needed to record scanned file names for two purposes:
1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.
2. File names are useful when interpretting tmp directory output when
using the --leave-temps option.
This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.
To store the names in fmaps, an optional name argument has been added to
any internal scan API's that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required. The zip and gpt parsers required some modification to record
file names. The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.
Also:
- Added recursive extraction to the tmp directory when the
--leave-temps option is enabled. When not enabled, the tmp directory
structure remains flat so as to prevent the likelihood of exceeding
MAX_PATH. The current tmp directory is stored in the scan context.
- Made the cli_scanfile() internal API non-static and added it to
scanners.h so it would be accessible outside of scanners.c in order to
remove code duplication within libmspack.c.
- Added function comments to scanners.h and matcher.h
- Converted a TDB-type macros and LSIG-type macros to enums for improved
type safey.
- Converted more return status variables from `int` to `cl_error_t` for
improved type safety, and corrected ooxml file typing functions so
they use `cli_file_t` exclusively rather than mixing types with
`cl_error_t`.
- Restructured the magic_scandesc() function to use goto's for error
handling and removed the early_ret_from_magicscan() macro and
magic_scandesc_cleanup() function. This makes the code easier to
read and made it easier to add the recursive tmp directory cleanup to
magic_scandesc().
- Corrected zip, egg, rar filename extraction issues.
- Removed use of extra sub-directory layer for zip, egg, and rar file
extraction. For Zip, this also involved changing the extracted
filenames to be randomly generated rather than using the "zip.###"
file name scheme.
This commit improves the layout of the tmp file output and the JSON
metadata output when using the --leave-temps and --gen-json options.
For all scans, each scan target will get a unique tmp sub-directory. If
using --leave-temps, that subdir will include the basename of the
original file to make it easier to identify. Additionally, when using
--leave-temps option, all extracted objects will have their
subdirectories extracted in recursive subdirectories including filename
prefixes where available. When not using the --leave-temps option, the
layout of the tmp sub-directory will remain flat, so as to alleviate the
possibility of exceeding PATH_MAX.
The JSON metadata generated by the --gen-json option is now generated
for all file types, not just a select few. The format is also
pretty-printed for readability and now includes filenames and file paths
when available.
Also:
- Added missing ALLMATCH check when determining if bytecode hooks should
be run.
- Added cl_engine_get_str API to windows libclamav symbol export file.