The cli_max_malloc, cli_max_calloc, and cli_max_realloc functions
provide a way to protect against allocating too much memory
when the size of the allocation is derived from the untrusted input.
Specifically, we worry about values in the file being scanned being
manipulated to exhaust the RAM and crash the application.
There is no need to check the limits if the size of the allocation
is fixed, or if the size of the allocation is necessary for signature
loading, or the general operation of the applications.
E.g. checking the max-allocation limit for the size of a hash, or
for the size of the scan recursion stack, is a complete waste of
time.
Although we significantly increased the max-allocation limit in
a recent release, it is best not to check an allocation if the
allocation will be safe. It would be a waste of time.
I am also hopeful that if we can reduce the number allocations
that require a limit-check to those that require it for the safe
scan of a file, then eventually we can store the limit in the scan-
context, and make it configurable.
We have some special functions to wrap malloc, calloc, and realloc to
make sure we don't allocate more than some limit, similar to the
max-filesize and max-scansize limits. Our wrappers are really only
needed when allocating memory for scans based on untrusted user input,
where a scan file could have bytes that claim you need to allocate
some ridiculous amount of memory. Right now they're named:
- cli_malloc
- cli_calloc
- cli_realloc
- cli_realloc2
... and these names do not convey their purpose
This commit renames them to:
- cli_max_malloc
- cli_max_calloc
- cli_max_realloc
- cli_max_realloc2
The realloc ones also have an additional feature in that they will not
free your pointer if you try to realloc to 0 bytes. Freeing the memory
is undefined by the C spec, and only done with some realloc
implementations, so this stabilizes on the behavior of not doing that,
which should prevent accidental double-free's.
So for the case where you may want to realloc and do not need to have a
maximum, this commit adds the following functions:
- cli_safer_realloc
- cli_safer_realloc2
These are used for the MPOOL_REALLOC and MPOOL_REALLOC2 macros when
MPOOL is disabled (e.g. because mmap-support is not found), so as to
match the behavior in the mpool_realloc/2 functions that do not make use
of the allocation-limit.
There are a large number of allocations for fix sized buffers using the
`cli_malloc` and `cli_calloc` calls that check if the requested size is
larger than our allocation threshold for allocations based on untrusted
input. These allocations will *always* be higher than the threshold, so
the extra stack frame and check for these calls is a waste of CPU.
This commit replaces needless calls with A -> B:
- cli_malloc -> malloc
- cli_calloc -> calloc
- CLI_MALLOC -> MALLOC
- CLI_CALLOC -> CALLOC
I also noticed that our MPOOL_MALLOC / MPOOL_CALLOC are not limited by
the max-allocation threshold, when MMAP is found/enabled. But the
alternative was set to cli_malloc / cli_calloc when disabled. I changed
those as well.
I didn't change the cli_realloc/2 calls because our version of realloc
not only implements a threshold but also stabilizes the undefined
behavior in realloc to protect against accidental double-free's.
It may be worth implementing a cli_realloc that doesn't have the
threshold built-in, however, so as to allow reallocaitons for things
like buffers for loading signatures, which aren't subject to the same
concern as allocations for scanning possible malware.
There was one case in mbox.c where I changed MALLOC -> CLI_MALLOC,
because it appears to be allocating based on untrusted input.
Use the existing clamscan image fuzzy hash tests to verify that both
--scan-image=no and also --scan-image-fuzzy-hash=no will disable image
fuzzy hash based detection (at least for PNG files).
The --force-to-disk option is missing from the clamscan --help and
clamscan manpage documentation.
Also change clamd.conf.sample suggestions to differ the from default
settings so that the sample is easier to use.
Image fuzzy hashing is enabled by default. The following options have
been added to allow users to disable it, if desired.
New clamscan options:
--scan-image[=yes(*)/no]
--scan-image-fuzzy-hash[=yes(*)/no]
New clamd config options:
ScanImage yes(*)/no
ScanImageFuzzyHash yes(*)/no
New libclamav scan options:
options.parse &= ~CL_SCAN_PARSE_IMAGE;
options.parse &= ~CL_SCAN_PARSE_IMAGE_FUZZY_HASH;
This commit also changes scan behavior to disable image fuzzy hashing
for specific types when the DCONF (.cfg) signatures disable those types.
That is, if DCONF disables the PNG parser, it should not only disable
the CVE/format checker for PNG files, but also disable image fuzzy
hashing for PNG files.
Also adds a DCONF option to disable image fuzzy hashing:
OTHER_CONF_IMAGE_FUZZY_HASH
DCONF allows scanning features to be disabled using a configuration
"signature".
Make the usage clear to the user that the option specified by
'--datadir' must be an absolute path to a directory that already exists,
and is writeable by freshclam and readable by clamscan/clamd.
The '%f' filename format character has been disabled and will no longer
be replaced with the file name, due to command injection security concerns.
Use the 'CLAM_VIRUSEVENT_FILENAME' environment variable instead.
For the same reason, you should NOT use the environment variables in the
command directly, but should use it carefully from your executed script.
The CMake code to install the empty database directory doesn't appear to
do anything. Further, it is causing build issues for some users.
I tested some with trying to install the database directory directly,
like this:
```cmake
if(IS_ABSOLUTE ${DATABASE_DIRECTORY})
INSTALL(DIRECTORY DESTINATION $ENV{DESTDIR}${DATABASE_DIRECTORY} COMPONENT programs)
else()
INSTALL(DIRECTORY DESTINATION $ENV{DESTDIR}${CMAKE_INSTALL_PREFIX}/${DATABASE_DIRECTORY} COMPONENT programs)
endif()
```
However, this causes build failures for Windows and macOS installers.
Windows:
```
CMake Error at C:/Users/build/Documents/jenkins/workspace/test-pipelines_build-1.3/clamav-devel/build64/freshclam/cmake_install.cmake:66 (message):
ABSOLUTE path INSTALL DESTINATION forbidden (by caller):
C:/Users/build/Documents/jenkins/workspace/test-pipelines_build-1.3/clamav-devel/build64/install/database/
Call Stack (most recent call first):
C:/Users/build/Documents/jenkins/workspace/test-pipelines_build-1.3/clamav-devel/build64/cmake_install.cmake:133 (include)
```
macOS:
```
CMake Error at /Users/****/jenkins/workspace/test-pipelines_build-1.3/clamav-devel/build-pkg/freshclam/cmake_install.cmake:111 (file):
file INSTALL cannot make directory "/usr/local/clamav/share/clamav": No
such file or directory.
Call Stack (most recent call first):
/Users/****/jenkins/workspace/test-pipelines_build-1.3/clamav-devel/build-pkg/cmake_install.cmake:122 (include)
CPack Error: Error when generating package: ClamAV
```
I think my best option is just to remove this line of code since it's causing
problems for some and isn't working for anyone. Even if it would be
nice to have the database directory created at install time automatically.
Fixes: https://github.com/Cisco-Talos/clamav/issues/1142
Store temp files with obj id and gen id so analysts know which is which.
Don't dump decoded objects immediately. They'll get dumped later at the
end of pdf_extract_obj().
At the end of PDF object extraction, we don't need to find out the
"dumpid" (aka the object index in our list of pdf objects).
It isn't actually used! So I removed the unused parameter.
Some builds using the tarball with the vendored Rust dependencies are
failing.
The onenote-rs dependency is presently tied to a git branch from github
rather than using a release from crates.io. This is the differing factor
though I'm unsure why it is causing the build to try to update the repo
rather than just building the vendored source.
This commit adds a `--offline` parameter to the build options if the
vendored source is detected, in an attempt to force Cargo to use what it
has and stay offline.
It may be necessary to differentiate between *.pyc and other binary
types in case additional processing is needed.
Outside of being able to differentiate the by file type, the scanner
will treat CL_TYPE_PYTHON_COMPILED the same as CL_TYPE_BINARY_DATA.
That is - we're not adding parser at this time to further break down
.pyc files.
Fix Coverity issues 192935, 192932, 192928, and 192917.
None of these are particularly serious. I thought I'd clean them up
while trying to track down a strange crash that occurs in Windows debug
builds with my specific setup when freeing the metadata filename pointer
malloced by the UnRAR iface "peek" function.
I wasn't able to figure out why freeing that causes a crash, so instead
I converted it to an array that need not be freed, and my troubles
melted away.
The fmap structure has some stuff that differs in size in memory between
Linux and Windows, and between 32bit and 64bit architectures.
Notably, `time_t` appears to be defined by the Rust bindgen module as
`ulong` which may be either 8 bytes or 4 bytes, depending architecture
(thanks, C). To resolve this, we'll store time as a uint64_t instead.
The other problem in the fmap structure is the windows file and map
handles should always be exist, and may only be used in Windows, for
consistency in sizing of the structure.
In order to generate Rust bindings for C code, Rust's bindgen module
needs to know where to find all headers included by the API.
If they're all inside the project or inside the standard include path
(e.g. /usr/include and /usr/local/include) that's fine. But for third-
party C library headers from outside the standard include path, that's
a problem.
We didn't really notice this problem when generating on Unix systems
until we switched to use OpenSSL 3.1 and tested on systems that have
the OpenSSL 1.1.1 dev package installed.
The ability to find headers outside the project path is also needed to
generate bindings on Windows, if desired.
This commit solves the problem by passing include directories for the
ClamAV::libclamav CMake build target to the Rust build via the
CARGO_INCLUDE_DIRECTORIES environment variable.
Then, in the `libclamav_rust/build.rs` script, where we run bindgen,
we split that `;` separated string into invididual paths and add each
to the bindgen builder.
Includes rudimentary support for getting slices from FMap's and for
interacting with libclamav's context structure.
For now will use a Cisco-Talos org fork of the onenote_parser
until the feature to read open a onenote section from a slice (instead
of from a filepath) is added to the upstream.