We must pass the LLVM library dependencies to the libclamav_rust
build.rs script so it links the libclamav_rust unit test executable with
LLVM.
Also:
- We can remove the libtinfo dependency that was hardcoded for LLVM
3.6 support, and must remove it for the build to work on Alpine and macOS.
- Also, increased the libcheck default timeout from 60s to 300s after
experiencing a failure while testing this.
- Also made one of the valgrind suppressions more generic to account for
inline optimization differences observed in testing this.
* Broke out the variants of error/result handling in `frs_error.rs`.
Made syntax slightly cleaner for `frs_call!`, explicitly moving
the output variables *out* of the function call so as not to make
the parameter order confusing.
* Wrapped the FuzzyHash map into a container rather than exposing
the HashMap directly. Simplifies casting, and allows it to feel
more like a class with methods.
* Fixed various clippy complaints regarding unsafe, etc.
* Renamed `frs_error.rs` to `ffi_utils.rs` and migrated ffi-specific
features like the `validate_str_param!()` macro to this new module.
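A minimal sketch of the wrapper idea, assuming a 64-bit hash stored as `[u8; 8]` and a map from each hash to the IDs of the logical signatures that reference it. `FuzzyHashMap`, `ImageFuzzyHash`, `insert`, `lookup`, `into_raw`, and `from_raw` are hypothetical names, not the actual libclamav_rust API:

```rust
use std::collections::HashMap;

/// Hypothetical 64-bit image fuzzy hash.
pub type ImageFuzzyHash = [u8; 8];

/// Hypothetical container: callers use methods instead of the raw HashMap.
#[derive(Default)]
pub struct FuzzyHashMap {
    map: HashMap<ImageFuzzyHash, Vec<u32>>,
}

impl FuzzyHashMap {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that logical signature `lsig_id` has a subsignature for `hash`.
    pub fn insert(&mut self, hash: ImageFuzzyHash, lsig_id: u32) {
        self.map.entry(hash).or_default().push(lsig_id);
    }

    /// Logical signature IDs referencing `hash` (exact match only).
    pub fn lookup(&self, hash: &ImageFuzzyHash) -> &[u32] {
        self.map.get(hash).map(Vec::as_slice).unwrap_or(&[])
    }

    /// C only ever sees an opaque pointer, so there is a single cast each way.
    pub fn into_raw(self) -> *mut FuzzyHashMap {
        Box::into_raw(Box::new(self))
    }

    /// # Safety
    /// `ptr` must come from `into_raw` and must not be used afterwards.
    pub unsafe fn from_raw(ptr: *mut FuzzyHashMap) -> Box<FuzzyHashMap> {
        Box::from_raw(ptr)
    }
}
```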
Extends the new frs_error module to provide variants of the
frs_result!() macro that accept a Result as input instead of calling a
function on your behalf. This enables us to use the macro in conditions
where we don't want to return on success, and want to do other things
before we return.
Use the new frs_error module to return errors to the C calling functions
rather than logging the error in Rust-land.
Notably, this enables us to store more meaningful error messages in the
JSON output if we fail to calculate the image fuzzy hash.
The new frs_error module makes it so you can painlessly transfer Rust
errors to-and-from C.
On the Rust-side, on failure it returns an error struct to the C caller.
The C caller detects the failure by checking the return value.
On failure, the caller may use an additional frs_error Rust API to get a
formatted error message.
Two variants are provided:
- One for functions that return a boolean and have an out-parameter for
function output.
- One for functions that return a pointer, where NULL means failure and
non-NULL contains the successful function output.
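The two variants can be sketched as follows. This is an illustration under assumptions, not the actual frs_error API: `FfiError`, `example_parse_u32`, `example_to_upper`, `example_error_message`, `example_error_free`, and `example_string_free` are all hypothetical. A real export would also need `#[no_mangle]` (or `#[unsafe(no_mangle)]` on the 2024 edition) so that C can link against these symbols.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical opaque error object handed to C callers.
pub struct FfiError {
    message: CString,
}

/// Box an error message and give ownership to the caller as a raw pointer.
fn new_error(msg: &str) -> *mut FfiError {
    // CString::new rejects interior NUL bytes, so replace them first.
    let message = CString::new(msg.replace('\0', " ")).expect("NUL bytes were replaced");
    Box::into_raw(Box::new(FfiError { message }))
}

/// Variant 1: returns `true` on success and writes the output to `out`.
/// On failure, returns `false` and stores an error in `*err` if `err` is non-NULL.
pub unsafe extern "C" fn example_parse_u32(
    input: *const c_char,
    out: *mut u32,
    err: *mut *mut FfiError,
) -> bool {
    let result = if input.is_null() || out.is_null() {
        Err("null argument".to_string())
    } else {
        CStr::from_ptr(input)
            .to_str()
            .map_err(|e| e.to_string())
            .and_then(|s| s.trim().parse::<u32>().map_err(|e| e.to_string()))
    };
    match result {
        Ok(value) => {
            *out = value;
            true
        }
        Err(msg) => {
            if !err.is_null() {
                *err = new_error(&msg);
            }
            false
        }
    }
}

/// Variant 2: returns a newly allocated C string on success, or NULL on failure.
pub unsafe extern "C" fn example_to_upper(
    input: *const c_char,
    err: *mut *mut FfiError,
) -> *mut c_char {
    let result = if input.is_null() {
        Err("null argument".to_string())
    } else {
        CStr::from_ptr(input).to_str().map_err(|e| e.to_string())
    };
    match result {
        // The input came from a CStr, so it has no interior NUL to trip CString::new.
        Ok(s) => CString::new(s.to_uppercase()).expect("no interior NUL").into_raw(),
        Err(msg) => {
            if !err.is_null() {
                *err = new_error(&msg);
            }
            ptr::null_mut()
        }
    }
}

/// Borrow the formatted message. It stays valid until `example_error_free`.
pub unsafe extern "C" fn example_error_message(err: *const FfiError) -> *const c_char {
    (*err).message.as_ptr()
}

/// Release an error returned through an out-parameter.
pub unsafe extern "C" fn example_error_free(err: *mut FfiError) {
    if !err.is_null() {
        drop(Box::from_raw(err));
    }
}

/// Release a string returned by variant 2.
pub unsafe extern "C" fn example_string_free(s: *mut c_char) {
    if !s.is_null() {
        drop(CString::from_raw(s));
    }
}
```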
Updates to the image fuzzy hash algorithm to mimic the phash() function
from the Python imagehash package as closely as possible.
The primary difference is that the DCT is now a 2D DCT-2 instead of
a standard DCT-2 on a flattened array of the image data.
We also needed to:
- drop the alpha channel for images with transparency.
- use the grayscale coefficients used by Python's Pillow library.
- round rather than truncate when converting from f32 -> u8.
- use a median instead of a mean for finding the average frequency in
the low-frequency vector. Previously I had thought we were
implementing phash_simple(), which uses a mean. Sorry Mickey!
- multiply the Rust DCT-2 results by 2 to match the DCT results from the
Python package.
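The following is a rough sketch of the DCT and median steps above. It uses a naive O(N²) DCT-II in f64 rather than a DCT crate. It assumes the image has already been converted to greyscale and resized; imagehash's phash() defaults to a 32x32 input and an 8x8 hash. The formula follows the unnormalized DCT-II of scipy.fftpack.dct, which imagehash uses. Its leading factor of 2 is why the results must be multiplied by 2 to match. All function names are hypothetical.

```rust
use std::f64::consts::PI;

/// Unnormalized DCT-II: y[k] = 2 * sum_n x[n] * cos(pi * k * (2n + 1) / (2N)).
pub fn dct2_1d(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    (0..x.len())
        .map(|k| {
            2.0 * x
                .iter()
                .enumerate()
                .map(|(i, &v)| v * (PI * k as f64 * (2.0 * i as f64 + 1.0) / (2.0 * n)).cos())
                .sum::<f64>()
        })
        .collect()
}

/// 2D DCT-II: transform every column (axis 0), then every row (axis 1),
/// instead of running a 1D DCT over the flattened image.
pub fn dct2_2d(img: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let rows = img.len();
    let cols = img[0].len();
    let mut tmp = vec![vec![0.0; cols]; rows];
    for c in 0..cols {
        let column: Vec<f64> = img.iter().map(|row| row[c]).collect();
        for (r, v) in dct2_1d(&column).into_iter().enumerate() {
            tmp[r][c] = v;
        }
    }
    tmp.iter().map(|row| dct2_1d(row)).collect()
}

/// Median of the values, averaging the two middle values for even lengths.
fn median(values: &mut [f64]) -> f64 {
    values.sort_by(|a, b| a.partial_cmp(b).expect("no NaN values"));
    let mid = values.len() / 2;
    if values.len() % 2 == 0 {
        (values[mid - 1] + values[mid]) / 2.0
    } else {
        values[mid]
    }
}

/// Keep the top-left `hash_size` x `hash_size` low-frequency block and set
/// each bit when its coefficient is above the block's median.
pub fn phash_bits(img: &[Vec<f64>], hash_size: usize) -> Vec<bool> {
    let dct = dct2_2d(img);
    let low: Vec<f64> = dct
        .iter()
        .take(hash_size)
        .flat_map(|row| row.iter().take(hash_size).copied())
        .collect();
    let mut sorted = low.clone();
    let med = median(&mut sorted);
    low.iter().map(|&v| v > med).collect()
}
```

For a constant input such as `[1.0; 4]`, `dct2_1d` yields 8.0 (that is, 2 × 4 × 1.0) in the first slot and zeros elsewhere. This makes the factor of 2 easy to see.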
Note:
The image-rs crate (v0.24) doesn't currently support using custom
RGB->Luma constants and doesn't round when converting Luma f32 back to
Luma u8 values.
So, this commit includes a custom greyscale() function just for rgb8
images so we can match the greyscale algorithm used in the Python
Pillow package.
See also: https://github.com/image-rs/image/issues/1554
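Here is a minimal sketch of that greyscale conversion. It assumes the ITU-R 601-2 luma weights that Pillow documents for `convert("L")`, namely L = R * 299/1000 + G * 587/1000 + B * 114/1000. Pillow's C code may compute this in fixed-point arithmetic, so this f32 version could differ by one in rare cases. `greyscale_pixel`, `greyscale_rgb8`, and `greyscale_rgba8` are hypothetical names. The RGBA variant simply drops the alpha channel.

```rust
/// Convert one RGB pixel to luma, rounding to the nearest integer
/// rather than truncating.
pub fn greyscale_pixel(r: u8, g: u8, b: u8) -> u8 {
    let luma = f32::from(r) * 0.299 + f32::from(g) * 0.587 + f32::from(b) * 0.114;
    // A bare `as u8` would truncate (149.685 -> 149); rounding first gives 150.
    luma.round().clamp(0.0, 255.0) as u8
}

/// Packed RGB8 data, 3 bytes per pixel.
pub fn greyscale_rgb8(pixels: &[u8]) -> Vec<u8> {
    pixels
        .chunks_exact(3)
        .map(|p| greyscale_pixel(p[0], p[1], p[2]))
        .collect()
}

/// Packed RGBA8 data, 4 bytes per pixel. The alpha byte is ignored.
pub fn greyscale_rgba8(pixels: &[u8]) -> Vec<u8> {
    pixels
        .chunks_exact(4)
        .map(|p| greyscale_pixel(p[0], p[1], p[2]))
        .collect()
}
```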
The change in the code that converts the hash bit-vector to a Vec<u8>,
removing map_while(), was done to support older Rust versions,
such as the one currently in Alpine 3.15.0.
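One way to pack the hash bits into bytes without `map_while()` is to use only `chunks()` and `fold()`. The most-significant-bit-first order is an assumption, and `bits_to_bytes` is a hypothetical name.

```rust
/// Pack bits into bytes, most significant bit first.
/// A trailing chunk shorter than 8 bits ends up in the low bits of the last byte.
pub fn bits_to_bytes(bits: &[bool]) -> Vec<u8> {
    bits.chunks(8)
        .map(|chunk| chunk.iter().fold(0u8, |acc, &bit| (acc << 1) | bit as u8))
        .collect()
}
```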
Add a new logical signature subsignature type for matching on images
with image fuzzy hashes.
Image fuzzy hash subsignatures follow this format:
fuzzy_img#<hash>#<dist>
In this initial implementation, the hamming distance (dist) is ignored
and only exact fuzzy hash matches will alert.
Fuzzy hash matching is only performed for supported image types.
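This sketch parses that subsignature format and performs the exact-match check. It assumes a 64-bit hash written as 16 hex digits. `FuzzyImgSubsig`, `parse_fuzzy_img`, and `matches_exact` are hypothetical. As in this initial implementation, the distance is parsed but ignored when matching.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct FuzzyImgSubsig {
    pub hash: [u8; 8],
    pub distance: u32,
}

/// Parse `fuzzy_img#<hash>#<dist>`.
pub fn parse_fuzzy_img(subsig: &str) -> Result<FuzzyImgSubsig, String> {
    let mut fields = subsig.split('#');
    if fields.next() != Some("fuzzy_img") {
        return Err("expected the fuzzy_img prefix".to_string());
    }
    let hex = fields.next().ok_or("missing hash field")?;
    let dist = fields.next().ok_or("missing distance field")?;
    if fields.next().is_some() {
        return Err("too many fields".to_string());
    }
    if hex.len() != 16 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("expected 16 hex digits, got {:?}", hex));
    }
    let mut hash = [0u8; 8];
    for (i, byte) in hash.iter_mut().enumerate() {
        // Slicing is safe: every byte was checked to be an ASCII hex digit.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).map_err(|e| e.to_string())?;
    }
    let distance = dist.parse::<u32>().map_err(|e| e.to_string())?;
    Ok(FuzzyImgSubsig { hash, distance })
}

/// Exact match only. The hamming distance is not consulted.
pub fn matches_exact(sig: &FuzzyImgSubsig, image_hash: &[u8; 8]) -> bool {
    sig.hash == *image_hash
}
```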
Also: removed some excessive debug log messages on start-up.
Fixed an issue where the signature name (virname) was being allocated and
stored for every subsignature, or even every sub-pattern in an AC-pattern
(i.e. NDB sig or LDB subsig) containing a `{n-m}` or `*` wildcard.
This fix is only for LDB subsigs, though. NDB signatures are still
allocating one virname per sub-pattern.
This fix was required because I needed a place to store the virname with
fuzzy-hash subsignatures. Storing it in the fuzzy-hash subsig
metadata, the way AC-pattern, PCRE, and BComp subsigs were doing it,
wouldn't work because it would cross the C-Rust FFI boundary, and handing
out pointers to Rust-allocated stuff is dicey. Not to mention native Rust
strings are different from C strings. Anyway, the correct thing to do
was to store the virname with the actual logical signature.
TODO: Keep track of NDB signatures in the same way and store the virname
for NDB sigs there instead of in AC-patterns so that we can get rid of
the virname field in the AC-pattern struct.
The current method of trying to determine the target triple based on
the architecture, operating system, etc. is difficult to get right
and maintain. In particular, I found that some installations like the
Alpine package will use "alpine" in the name of the triple instead of
"unknown".
E.g. "aarch64-alpine-linux-musl" instead of "aarch64-unknown-linux-musl".
This makes it nearly impossible to figure out the exact target name
based on target system specifications.
I believe it will be better to use the default target 99% of the time
and require users who wish to cross-compile to use a new CMake
variable `-D RUST_TARGET_TRIPLE=<target>` to choose the target.
This is basically how it works for C/C++ anyways.
* Work with data as &[u8] instead of String/&str to avoid unnecessary UTF-8
validation and reuse read buffers.
* Make error handling more concise
* Address Clippy-raised issues
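Here is a small sketch of the byte-oriented approach. A single `Vec<u8>` buffer is reused for every line read with `BufRead::read_until`. As a result, no per-line `String` is allocated and no UTF-8 validation takes place. `count_prefixed_lines` is a hypothetical helper used only for illustration.

```rust
use std::io::{self, BufRead};

/// Count lines starting with `prefix`, reading raw bytes into one reused buffer.
pub fn count_prefixed_lines<R: BufRead>(mut reader: R, prefix: &[u8]) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        buf.clear(); // keep the allocation, drop the previous line
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break; // end of input
        }
        if buf.starts_with(prefix) {
            count += 1;
        }
    }
    Ok(count)
}
```

Lines containing invalid UTF-8, such as a stray `0xff` byte, are handled like any other line.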
In testing, I found that libclang.so/clang.dll is required by bindgen
and was not found on all of our test machines. To resolve this we will
only generate sys.rs bindings when CMake MAINTAINER_MODE option is "ON".
It is unfortunate that we have to commit generated source to version
control. But as a benefit, it makes rust-analyzer happier when editing a
workspace that hasn't yet been compiled. And it makes it more reasonable
that the sys.rs file is generated in the source directory and not
the build directory (something we hadn't resolved yet).
Use bindgen to generate Rust-bindings for some libclamav internal
functions and structures in a new "sys.rs" module.
"sys.rs" will be generated at build-time and unfortunately must be
dropped in the source/libclamav_rust/src directory rather than under the
build directory. As far as I know, Cargo/Rust provide no way to set an
include path to the build directory to separately place generated files.
TODO: Verify if this is true and move sys.rs to the build directory if
possible.
Using the new bindings with:
- the logging module.
- the cdiff module.
Also:
- Removed clamav_rust.h from .gitignore, because we generate it in the
build directory now.
- Removed the hand-written bindings from the cbindgen exclusions.
lib.rs has an annotation that prevents cbindgen from looking at sys.rs.
- Fixed a `u8` -> `c_char` type issue in cdiff in the cli_getdsig() call
parameters.
Rustify cdiff_apply() and clean up error handling:
- Restore [some] safety and clean up error handling.
- Use rust-crypto sha2 instead of OpenSSL's.
Fix signedness of cli_versig2() dsig parameter.
c_char may be an i8 or u8 depending on platform:
https://doc.rust-lang.org/src/std/os/raw/mod.rs.html#91-133
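To illustrate the signedness point: `c_char` is `i8` on x86_64 Linux and `u8` on aarch64 Linux. Passing Rust bytes to a C `char *` parameter is therefore best done with a pointer cast rather than by assuming one type. `bytes_as_c_ptr`, `c_char_to_u8`, and `as_cstr` are hypothetical helpers.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Reinterpret Rust bytes as a C `char *`. The cast compiles whether
/// c_char is i8 or u8 on the target.
pub fn bytes_as_c_ptr(bytes: &[u8]) -> *const c_char {
    bytes.as_ptr() as *const c_char
}

/// Read a C char back as a byte without depending on its signedness.
pub fn c_char_to_u8(c: c_char) -> u8 {
    c as u8
}

/// Borrow a NUL-terminated byte slice as a &CStr, if it is well-formed.
pub fn as_cstr(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}
```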
Rustify cmd_close():
- Consolidate DEL/XCHG records.
- Tidy up ADD handling.
- Various error handling cleanup, etc.
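The following sketch shows what consolidating DEL and XCHG records might look like. Both target a line number, so a single ordered map keyed by line can hold either edit. The sketch assumes 1-based line numbers and that ADD appends to the end of the file. It omits any check that the original line content matches. `LineEdit` and `apply_edits` are hypothetical.

```rust
use std::collections::BTreeMap;

/// One consolidated record type for both line-targeting commands.
#[derive(Debug, Clone, PartialEq)]
pub enum LineEdit {
    /// DEL: drop the line.
    Delete,
    /// XCHG: replace the line with new content.
    Exchange(String),
}

/// Rebuild the file: apply per-line edits (keys are 1-based line numbers),
/// then append any ADD lines at the end.
pub fn apply_edits(lines: &[&str], edits: &BTreeMap<usize, LineEdit>, adds: &[String]) -> Vec<String> {
    let mut out: Vec<String> = lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match edits.get(&(i + 1)) {
            Some(LineEdit::Delete) => None,
            Some(LineEdit::Exchange(new)) => Some(new.clone()),
            None => Some(line.to_string()),
        })
        .collect();
    out.extend(adds.iter().cloned());
    out
}
```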
Remove some extra clones that don't seem to be necessary.
Replace writeln format macro with plain write command.
Use a buffered writer for deletes, exchanges, & writes.
Switching from individual `write` syscalls per change to a
buffered writer appears to speed up cdiff-apply by about 2x.
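Here is a minimal sketch of the buffered approach. It writes raw bytes with `write_all` instead of formatting with `writeln!`. `write_lines` is a hypothetical helper, and terminating each line with `\n` is an assumption.

```rust
use std::io::{self, BufWriter, Write};

/// Write lines through a BufWriter so that many small writes become a few large ones.
pub fn write_lines<W: Write>(out: W, lines: &[String]) -> io::Result<()> {
    let mut writer = BufWriter::new(out);
    for line in lines {
        writer.write_all(line.as_bytes())?;
        writer.write_all(b"\n")?;
    }
    // Flush explicitly so that write errors are reported instead of being
    // swallowed when the BufWriter is dropped.
    writer.flush()
}
```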
Apply both .cdiff and .script CVD patches.
Note: A script is a non-compressed and unsigned file containing cdiff
commands. There is no header or footer that should be processed.
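Here is a hypothetical sketch of telling the two patch kinds apart by file extension. It is based only on the differences stated above: a .script file is neither compressed nor signed. `PatchKind` and its methods are hypothetical names.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PatchKind {
    Cdiff,
    Script,
}

impl PatchKind {
    /// Classify a patch file by its extension. Returns None for anything else.
    pub fn from_path(path: &Path) -> Option<PatchKind> {
        match path.extension()?.to_str()? {
            "cdiff" => Some(PatchKind::Cdiff),
            "script" => Some(PatchKind::Script),
            _ => None,
        }
    }

    /// Only .cdiff content needs gzip decoding.
    pub fn is_gzip_compressed(self) -> bool {
        self == PatchKind::Cdiff
    }

    /// Only .cdiff files carry a signature to validate.
    pub fn has_signature(self) -> bool {
        self == PatchKind::Cdiff
    }
}
```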
This Rust-based implementation of the cdiff-apply feature includes
equivalent features as found in the C-based implementation:
- cdiff file signature validation against sha256 of the file contents
- Gz decoding of file contents
- File open command
- File close command
- Signature add command
- Line delete command
- Xchg command
- Move command
- Unlink command
This Rust implementation adds cdiff-apply unit tests to verify correct
functionality.
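Here is a simplified parser for a subset of the commands listed above. The command keywords (OPEN, ADD, DEL, XCHG, CLOSE, UNLINK) and their argument layouts are assumptions made for illustration. MOVE is omitted for brevity, and `CdiffCmd` and `parse_cmd` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum CdiffCmd<'a> {
    Open(&'a str),
    Add(&'a str),
    Del { line: usize, start: &'a str },
    Xchg { line: usize, old: &'a str, new: &'a str },
    Close,
    Unlink(&'a str),
}

fn parse_line_no(s: &str) -> Result<usize, String> {
    s.parse::<usize>()
        .map_err(|e| format!("bad line number {:?}: {}", s, e))
}

/// Parse one command line; arguments borrow from the input.
pub fn parse_cmd(line: &str) -> Result<CdiffCmd<'_>, String> {
    let (name, rest) = line.split_once(' ').unwrap_or((line, ""));
    match name {
        "OPEN" => Ok(CdiffCmd::Open(rest)),
        "ADD" => Ok(CdiffCmd::Add(rest)),
        "DEL" => {
            let (n, start) = rest
                .split_once(' ')
                .ok_or("DEL needs a line number and text")?;
            Ok(CdiffCmd::Del { line: parse_line_no(n)?, start })
        }
        "XCHG" => {
            let mut parts = rest.splitn(3, ' ');
            match (parts.next(), parts.next(), parts.next()) {
                (Some(n), Some(old), Some(new)) => Ok(CdiffCmd::Xchg {
                    line: parse_line_no(n)?,
                    old,
                    new,
                }),
                _ => Err("XCHG needs a line number, old text, and new text".to_string()),
            }
        }
        "CLOSE" => Ok(CdiffCmd::Close),
        "UNLINK" => Ok(CdiffCmd::Unlink(rest)),
        other => Err(format!("unknown command {:?}", other)),
    }
}
```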
Add a basic unit test for the new libclamav_rust `logging.rs` module.
This test simply initializes logging and then prints out a message with
each of the `log` macros.
Also set the Rust edition to 2018 because the default is the 2015
edition in which using external crates is very clunky.
For the Rust test support in CMake this commit adds support for
cross-compiling the Rust tests.
Rust tests must be built for the same LLVM triple (target platform) as
the rest of the project. In particular this is needed to build both
x64 and x86 packages on a 64-bit Windows host.
For Alpine, we observed that the LLVM triple for the host platform tools
may be either:
- x86_64-unknown-linux-musl, or
- x86_64-alpine-linux-musl
To support it either way, we look up the host triple with `rustc -vV`
and use that if the musl libc exists. This is a bit hacky and
unfortunately means that we probably can't cross-compile to other
platforms when running on a musl libc host. Cross-compiling support
could probably be improved.
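The host-triple lookup itself happens in CMake, but the parsing step can be sketched in Rust. `rustc -vV` prints a `host: <triple>` line, which the hypothetical `host_triple` helper extracts. `query_host_triple` shows one way it might be invoked.

```rust
use std::process::Command;

/// Extract the triple from the `host: ...` line of `rustc -vV` output.
pub fn host_triple(rustc_vv: &str) -> Option<&str> {
    rustc_vv
        .lines()
        .find_map(|line| line.strip_prefix("host: "))
        .map(str::trim)
}

/// Run `rustc -vV` and return the host triple, if rustc is available.
pub fn query_host_triple() -> Option<String> {
    let output = Command::new("rustc").arg("-vV").output().ok()?;
    if !output.status.success() {
        return None;
    }
    let text = String::from_utf8(output.stdout).ok()?;
    host_triple(&text).map(str::to_owned)
}
```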
The Rust test programs must link with libclamav, libclammspack, and
possibly libclamunrar_iface and libclamunrar plus all of the library
dependencies for those libraries.
To do this, we pass the path of each library in environment variables
when building the libclamav_rust unit test program.
Within `libclamav_rust/build.rs`, we read those environment variables.
If set, we parse each into library path and name components to use
as directives for how to build the unit test program.
See: https://doc.rust-lang.org/cargo/reference/build-scripts.html
Our `build.rs` file ignores the library path environment variables if
they're not set. This is necessary when building the libclamav_rust
library itself, when libclamunrar isn't static, and when not linking with
a libiconv external to libc.
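The following sketch shows how build.rs might turn such a variable into Cargo directives. It assumes Unix-style file names such as `libclamav.so.9` or `libiconv.a`. Windows names such as `libclamav.lib` would need different handling. `split_lib_path` and `emit_link_directives` are hypothetical. `cargo:rerun-if-env-changed`, `cargo:rustc-link-search`, and `cargo:rustc-link-lib` are standard Cargo build-script directives.

```rust
use std::env;
use std::path::Path;

/// Split a library file path into (search directory, link name).
pub fn split_lib_path(path: &str) -> Option<(String, String)> {
    let p = Path::new(path);
    let dir = p.parent()?.to_str()?.to_owned();
    let file_name = p.file_name()?.to_str()?;
    // Drop everything from the first '.', so "libclamav.so.9" -> "libclamav".
    let stem = file_name.split('.').next()?;
    // Unix-style naming assumed: "libclamav" -> "clamav".
    let name = stem.strip_prefix("lib").unwrap_or(stem);
    if name.is_empty() {
        None
    } else {
        Some((dir, name.to_owned()))
    }
}

/// Emit link directives for each variable that is set; unset ones are skipped.
pub fn emit_link_directives(vars: &[&str]) {
    for var in vars {
        println!("cargo:rerun-if-env-changed={}", var);
        if let Ok(path) = env::var(var) {
            if let Some((dir, name)) = split_lib_path(&path) {
                println!("cargo:rustc-link-search=native={}", dir);
                println!("cargo:rustc-link-lib={}", name);
            }
        }
    }
}
```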
Rust test programs are built and executed in a subdirectory under:
<target>/<llvm triple>/<config>/deps
where "target" for libclamav_rust tests is set to <build>/unit_tests
For example:
clamav/build/unit_tests/x86_64-pc-windows-msvc/debug/deps/clamav_rust-7e1343f8a2bff1cc.exe
Since this program isn't co-located with the rest of the libraries
we also have to set environment variables so the test program can find and
load the shared libraries:
- Windows: PATH
- macOS: DYLD_LIBRARY_PATH
We already set LD_LIBRARY_PATH when not Windows for similar reasons.
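This sketch picks the loader search variable for each platform and prepends a directory to it with `std::env::join_paths`. The resulting value could be passed to `std::process::Command::env` when launching the test program. `library_path_var` and `prepend_library_dir` are hypothetical helpers.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Variable searched by the dynamic loader on the current platform.
pub fn library_path_var() -> &'static str {
    if cfg!(windows) {
        "PATH"
    } else if cfg!(target_os = "macos") {
        "DYLD_LIBRARY_PATH"
    } else {
        "LD_LIBRARY_PATH"
    }
}

/// Put `dir` in front of the variable's current entries, using the
/// platform's separator (':' or ';').
pub fn prepend_library_dir(dir: PathBuf) -> Result<OsString, env::JoinPathsError> {
    let mut paths = vec![dir];
    if let Some(existing) = env::var_os(library_path_var()) {
        paths.extend(env::split_paths(&existing));
    }
    env::join_paths(paths)
}
```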
Note: In build.rs, we iterate references to LIB_ENV_LINK & Co because
older Rust versions do not implement `IntoIterator` for arrays by value (e.g. `[&str; N]`).
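For illustration, using hypothetical variable names: iterating a reference to the array works on both old and new compilers. Iterating the array by value relies on `IntoIterator` for arrays, which was stabilized in Rust 1.53.

```rust
/// Hypothetical variable names, for illustration only.
const LIB_VARS: [&str; 3] = ["EXAMPLE_LIB_A", "EXAMPLE_LIB_B", "EXAMPLE_LIB_C"];

/// Iterate `&LIB_VARS`, which yields `&&str` items and compiles on
/// compilers that predate by-value array iteration.
pub fn lib_var_names() -> Vec<&'static str> {
    let mut names = Vec::new();
    for name in &LIB_VARS {
        names.push(*name);
    }
    names
}
```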
Add a top-level Cargo.toml.
Remove the vestigial libclamav_rust/Makefile.am.
Place Rust source under a libclamav_rust/src directory as is canonical
for Rust projects.
Convert cli_dbgmsg to inline function to ensure ctx check for debug flag
is always run
Add copyright and licensing info
Fix valgrind uninitialized buffer issue in cliunzip.c
Windows build fix