Some builds using the tarball with the vendored Rust dependencies are
failing.
The onenote-rs dependency is presently tied to a git branch on GitHub
rather than a release from crates.io. That is the differing factor,
though I'm unsure why it causes the build to try to update the repo
instead of just building the vendored source.
This commit adds a `--offline` parameter to the build options if the
vendored source is detected, in an attempt to force Cargo to use what it
has and stay offline.
It may be necessary to differentiate between *.pyc and other binary
types in case additional processing is needed.
Outside of being able to differentiate them by file type, the scanner
will treat CL_TYPE_PYTHON_COMPILED the same as CL_TYPE_BINARY_DATA.
That is, we're not adding a parser at this time to further break down
.pyc files.
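For reference, a .pyc file begins with a 4-byte magic value whose last
two bytes are "\r\n", while the first two vary with the Python version.
A minimal, hypothetical sketch of such a check (not the scanner's actual
type-detection code) could look like this:

```rust
/// Rough illustration only: report whether a buffer plausibly starts with a
/// CPython bytecode (.pyc) header. The first two magic bytes change with the
/// Python version; the trailing "\r\n" is constant.
fn looks_like_pyc(header: &[u8]) -> bool {
    header.len() >= 4 && header[2] == b'\r' && header[3] == b'\n'
}
```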
Fix Coverity issues 192935, 192932, 192928, and 192917.
None of these are particularly serious. I thought I'd clean them up
while trying to track down a strange crash that occurs in Windows debug
builds with my specific setup when freeing the metadata filename pointer
malloced by the UnRAR iface "peek" function.
I wasn't able to figure out why freeing that causes a crash, so instead
I converted it to an array that need not be freed, and my troubles
melted away.
The fmap structure has some fields that differ in size in memory between
Linux and Windows, and between 32-bit and 64-bit architectures.
Notably, `time_t` appears to be defined by the Rust bindgen module as
`ulong`, which may be either 8 bytes or 4 bytes depending on the
architecture (thanks, C). To resolve this, we'll store the time as a
uint64_t instead.
The other problem in the fmap structure is that the Windows file and map
handles should always exist, even though they are only used on Windows,
so that the structure's size stays consistent.
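As a rough illustration (the names below are hypothetical, not the real
fmap layout), fixed-width fields and unconditionally-present handle
members keep the C struct and the bindgen-generated Rust struct in
agreement on every platform:

```rust
/// Hypothetical sketch, not libclamav's actual fmap definition.
#[repr(C)]
struct FmapSketch {
    /// Stored as uint64_t in C (u64 here) instead of the platform-dependent time_t.
    mtime: u64,
    /// Windows file/map handles exist on all platforms, even though they are
    /// only used on Windows, so the structure's size never depends on #ifdefs.
    windows_file_handle: *mut core::ffi::c_void,
    windows_map_handle: *mut core::ffi::c_void,
}
```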
In order to generate Rust bindings for C code, Rust's bindgen module
needs to know where to find all headers included by the API.
If they're all inside the project or inside the standard include path
(e.g. /usr/include and /usr/local/include) that's fine. But for third-
party C library headers from outside the standard include path, that's
a problem.
We didn't really notice this problem when generating on Unix systems
until we switched to using OpenSSL 3.1 and tested on systems that have
the OpenSSL 1.1.1 dev package installed.
The ability to find headers outside the project path is also needed to
generate bindings on Windows, if desired.
This commit solves the problem by passing include directories for the
ClamAV::libclamav CMake build target to the Rust build via the
CARGO_INCLUDE_DIRECTORIES environment variable.
Then, in the `libclamav_rust/build.rs` script, where we run bindgen,
we split that `;`-separated string into individual paths and add each
to the bindgen builder.
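A sketch of that build.rs logic, assuming the environment variable is
set by CMake as described (the "wrapper.h" header name is only a
placeholder):

```rust
// build.rs sketch: forward CMake's include directories to bindgen.
fn main() {
    let mut builder = bindgen::Builder::default()
        .header("wrapper.h"); // placeholder header name

    // CMake passes the ClamAV::libclamav include directories as a
    // semicolon-separated list in CARGO_INCLUDE_DIRECTORIES.
    if let Ok(dirs) = std::env::var("CARGO_INCLUDE_DIRECTORIES") {
        for dir in dirs.split(';').filter(|d| !d.is_empty()) {
            builder = builder.clang_arg(format!("-I{}", dir));
        }
    }

    let bindings = builder.generate().expect("failed to generate bindings");
    bindings
        .write_to_file(
            std::path::Path::new(&std::env::var("OUT_DIR").unwrap()).join("bindings.rs"),
        )
        .expect("failed to write bindings");
}
```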
Includes rudimentary support for getting slices from FMaps and for
interacting with libclamav's context structure.
For now, we will use a Cisco-Talos org fork of the onenote_parser
until the feature to open a OneNote section from a slice (instead
of from a filepath) is added upstream.
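The FMap-slice plumbing can be pictured roughly like this (the function
name and safety contract below are hypothetical, not libclamav's actual
FFI surface):

```rust
use std::slice;

/// Hypothetical illustration of handing an fmap's bytes to a Rust parser:
/// given a pointer and length obtained from the C side, build a borrowed
/// slice without copying.
unsafe fn fmap_bytes<'a>(ptr: *const u8, len: usize) -> &'a [u8] {
    // Caller must guarantee `ptr` is valid for `len` bytes for lifetime 'a.
    slice::from_raw_parts(ptr, len)
}
```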
HTML files with <style> blocks containing non-UTF-8 sequences cause
warnings when we process them to extract base64-encoded images.
To resolve this, we can use the to_string_lossy() method, which may
allocate and sanitize a copy of the content if non-UTF-8 characters
are encountered.
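The commit uses to_string_lossy(); the same lossy-copy behavior on a raw
byte slice is what the standard library's String::from_utf8_lossy
provides, which only allocates when it has to replace invalid sequences:

```rust
use std::borrow::Cow;

fn style_block_to_str(bytes: &[u8]) -> Cow<'_, str> {
    // Valid UTF-8 is borrowed as-is; invalid sequences trigger an allocated,
    // sanitized copy with U+FFFD replacement characters.
    String::from_utf8_lossy(bytes)
}
```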
Resolves: https://github.com/Cisco-Talos/clamav/issues/1082
Some PDFs with an empty password can't be decrypted. Investigation
found that the problem is a strlen check intended to prevent an
overflow, used instead of passing down the actual length of the
allocated field.
Specifically, the UE buffer may contain NUL bytes, so a strlen check
will claim the field is shorter than it really is, and later checks
then fail because the length is wrong.
While at it, I improved the code comments on the function that reads
dictionary key-value strings and switched a flag to use a bool rather
than an int.
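To illustrate the underlying mistake (in Rust rather than the C parser
code): a strlen-style scan stops at the first NUL byte, so it
under-reports the length of a binary field like UE, which is why the
actual allocated length must be passed down instead.

```rust
/// Count bytes up to the first NUL, the way strlen() would.
fn strlen_like(buf: &[u8]) -> usize {
    buf.iter().position(|&b| b == 0).unwrap_or(buf.len())
}

fn main() {
    // A made-up UE-style value containing interior NUL bytes.
    let ue = [0x9fu8, 0x00, 0x42, 0x17, 0x00, 0x3c, 0xaa, 0x01];
    assert_eq!(strlen_like(&ue), 1); // strlen stops at the first NUL...
    assert_eq!(ue.len(), 8);         // ...but the field is really 8 bytes.
}
```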
Prevent allocating more than 1GB regardless of what is requested.
RAR dictionary sizes may not be larger than 1GB, at least in the current
version.
This is a cherry-pick of commit 9b444e7e02
This is a cherry-pick of commit 24f225c21f
Modification to the UnRAR codebase to allow skipping files within
solid archives when parsing in extraction mode, enabling us to skip
encrypted files while still scanning metadata and, potentially,
unencrypted files later in the archive.
UnRAR logic replaces directory symlinks found within archive file entry
file paths with actual directories by deleting them after they're
extracted.
Unfortunately, this logic extends to deleting existing directories if you
set the `DestName` instead of the `DestPath` in this API:
rc = RARProcessFile(hArchive, RAR_EXTRACT, NULL, destFilePath);
In the future UnRAR may change to disable the `LinksToDirs()` feature
if using the `DestName` parameter. In the meantime, this commit
completely disables it for our use case.
The --alert-exceeds-max feature should alert for all files larger than
2GB because 2GB is the internal limit for individual files.
This isn't working correctly because the `goto done;` exit condition
after recording the exceeds-max heuristic skips over the logic that
reports the alert.
This fix moves the ">2GB" check up to the location where the
max-filesize engine option is set by clamd or clamscan.
If max-filesize > 2GB - 1 is requested, then max-filesize is set to
2GB - 1.
Additionally, a warning is printed if max-filesize > 2GB is requested
(with an exception for when it's maxed out by setting --max-filesize=0).
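A hedged sketch of that option-handling logic (names and messages are
illustrative, not clamd/clamscan's actual code):

```rust
const TWO_GB: u64 = 2 * 1024 * 1024 * 1024;
const MAX_FILESIZE_LIMIT: u64 = TWO_GB - 1;

fn apply_max_filesize(requested: u64) -> u64 {
    // --max-filesize=0 means "max it out": cap silently, no warning.
    if requested == 0 {
        return MAX_FILESIZE_LIMIT;
    }
    // Warn when a value above 2GB was explicitly requested.
    if requested > TWO_GB {
        eprintln!("Warning: max-filesize is limited to 2GB - 1");
    }
    requested.min(MAX_FILESIZE_LIMIT)
}
```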
Resolves: https://github.com/Cisco-Talos/clamav/issues/1030
Developers of the FreeBSD base system are currently working to upgrade
its LLVM/Clang/LLDB/LLD to version 17. As part of that effort, they
tried building everything in the FreeBSD ports collection to check
whether each port still builds with LLVM/Clang/LLD 17. Some ports fail
to build with it, and unfortunately `security/clamav` is one of them.
Its build fails with the following link errors.
```
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_cvdunpack' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_dbgmsg_internal' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'init_domainlist' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'init_whitelist' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_parse_add' failed: symbol not defined
ld: error: version script assignment of 'CLAMAV_PRIVATE' to symbol 'cli_bytecode_context_clear' failed: symbol not defined
cc: error: linker command failed with exit code 1 (use -v to see invocation)
```
Investigation of ClamAV's source code shows that `cli_cvdunpack` is a
static function, so it isn't visible to external consumers, and the
other mentioned symbols aren't defined anywhere. So fix the link error
by removing all of them from the linker version script.
json-c 0.17 defines the ssize_t type using a typedef on Windows.
We have been setting ssize_t for Windows to a different type using
a #define instead of a typedef.
We should have been using a typedef since it is a type.
However, we must also match the exact type they use, or else the
compiler will balk because the types differ.
Note: in C11 it's fine to typedef the same type more than once, so
long as it is defined the same way every time.
Having the filename is useful for certain callbacks, and will likely be
more useful in the future if we can start comparing detected filetypes
with file extensions.
E.g. if the file type is just "binary" or "text", we may be able to do
better by trusting a ".js" extension to determine the type.
Or, if the detected file type is "pe" but the extension is ".png", we
may want to call it suspicious.
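Purely as an illustration of that idea (none of these names exist in
libclamav today):

```rust
/// Hypothetical heuristic: flag files whose detected type contradicts the
/// extension, e.g. a PE executable named like an image.
fn extension_mismatch(detected_type: &str, filename: &str) -> bool {
    let ext = filename
        .rsplit_once('.')
        .map(|(_, ext)| ext.to_ascii_lowercase())
        .unwrap_or_default();
    detected_type == "pe" && matches!(ext.as_str(), "png" | "jpg" | "jpeg" | "gif")
}
```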
Also adjusted the example callback program to disable the metadata
option. CL_SCAN_GENERAL_COLLECT_METADATA is no longer required for the
Zip parser to record filenames for embedded files, as described in the
previous commit.
This program can be used to demonstrate that it is behaving as desired.