The Jenkinsfile renames the tarball, removing the version string suffix.
This is problematic because A) we want that suffix when we publish
release candidates and B) the tarball should extract with the same
directory name as the tarball name.
CMake/CPack is already used to build:
- TGZ source tarball
- WiX-based installer (Windows)
- ZIP install packages (Windows)
This commit adds support for building:
- macOS PKG installer
- DEB package
- RPM package
This should also enable building FreeBSD packages, but while I was able
to build all of the static dependencies using Mussels, CMake/CPack 3.20
doesn't appear to provide the FreeBSD generator, despite it appearing in
the documentation.
The package names will be in this format:
clamav-<version><suffix>.<os>.<arch>.<extension>
This includes changing the Windows .zip and .msi installer names.
E.g.:
- clamav-0.104.0-rc.macos.x86_64.pkg
- clamav-0.104.0-rc.win.win32.msi
- clamav-0.104.0-rc.win.win32.zip
- clamav-0.104.0-rc.win.x64.msi
- clamav-0.104.0-rc.linux.x86_64.deb
- clamav-0.104.0-rc.linux.x86_64.rpm
Notes about building the packages:
I've only tested this when building ClamAV with static dependencies that
I built using the clamav_deps "host-static" recipes from the "clamav"
Mussels cookbook. E.g.:
```sh
msl build clamav_deps -t host-static
```
Here's an example configuration to build ClamAV in this way, installing to
/usr/local/clamav:
```sh
cmake .. \
-D CMAKE_FIND_PACKAGE_PREFER_CONFIG=TRUE \
-D CMAKE_PREFIX_PATH=$HOME/.mussels/install/host-static \
-D CMAKE_INSTALL_PREFIX="/usr/local/clamav" \
-D CMAKE_MODULE_PATH=$HOME/.mussels/install/host-static/lib/cmake \
-D CMAKE_BUILD_TYPE=RelWithDebInfo \
-D ENABLE_EXAMPLES=OFF \
-D JSONC_INCLUDE_DIR="$HOME/.mussels/install/host-static/include/json-c" \
-D JSONC_LIBRARY="$HOME/.mussels/install/host-static/lib/libjson-c.a" \
-D ENABLE_JSON_SHARED=OFF \
-D BZIP2_INCLUDE_DIR="$HOME/.mussels/install/host-static/include" \
-D BZIP2_LIBRARY_RELEASE="$HOME/.mussels/install/host-static/lib/libbz2_static.a" \
-D OPENSSL_ROOT_DIR="$HOME/.mussels/install/host-static" \
-D OPENSSL_INCLUDE_DIR="$HOME/.mussels/install/host-static/include" \
-D OPENSSL_CRYPTO_LIBRARY="$HOME/.mussels/install/host-static/lib/libcrypto.a" \
-D OPENSSL_SSL_LIBRARY="$HOME/.mussels/install/host-static/lib/libssl.a" \
-D LIBXML2_INCLUDE_DIR="$HOME/.mussels/install/host-static/include/libxml2" \
-D LIBXML2_LIBRARY="$HOME/.mussels/install/host-static/lib/libxml2.a" \
-D PCRE2_INCLUDE_DIR="$HOME/.mussels/install/host-static/include" \
-D PCRE2_LIBRARY="$HOME/.mussels/install/host-static/lib/libpcre2-8.a" \
-D CURSES_INCLUDE_DIR="$HOME/.mussels/install/host-static/include" \
-D CURSES_LIBRARY="$HOME/.mussels/install/host-static/lib/libncurses.a" \
-D ZLIB_INCLUDE_DIR="$HOME/.mussels/install/host-static/include" \
-D ZLIB_LIBRARY="$HOME/.mussels/install/host-static/lib/libz.a" \
-D LIBCHECK_INCLUDE_DIR="$HOME/.mussels/install/host-static/include" \
-D LIBCHECK_LIBRARY="$HOME/.mussels/install/host-static/lib/libcheck.a"
```
Set CPACK_PACKAGING_INSTALL_PREFIX to customize the resulting package's
install location. This can be different from the install prefix. E.g.:
```sh
-D CMAKE_INSTALL_PREFIX="/usr/local/clamav" \
-D CPACK_PACKAGING_INSTALL_PREFIX="/usr/local/clamav" \
```
Then `make` and then one of these, depending on the platform:
```sh
cpack # macOS: productbuild is default
cpack -G DEB # Debian-based
cpack -G RPM # RPM-based
```
On macOS you'll need to `pip3 install markdown` so that the NEWS.md file
can be converted to HTML and will render in the installer.
On RPM-based systems, you'll need rpmbuild (install the rpm-build package).
This commit also fixes an issue where the HTML manual (if present) was
not correctly added to the Windows (and now other) install packages.
Fix num to hex function for Windows installer guid
Fix win32 cpack build
Fix macOS cpack build
The access-denied and excludepath tests both relied on the full
path of the test file appearing in the expected results. This fails if
you're working within a path that contains a symlink, because clamd and
clamdscan determine real paths before scanning and end up sending
back the real path in the results, not the original path.
This fixes the tests by removing the full paths from the expected
results.
I also cleaned up some type safety warnings.
The CURL_CA_BUNDLE environment variable used by freshclam & clamsubmit to
specify a custom path to a CA bundle is undocumented.
Feature was added here: https://bugzilla.clamav.net/show_bug.cgi?id=12504
Resolves: https://github.com/Cisco-Talos/clamav/issues/175
Also document:
- clamd/clamscan: using LD_LIBRARY_PATH to find libclamunrar_iface.so/dylib
- sigtool: using SIGNDUSER, SIGNDPASS for auth creds when building CVD
This info also needs to be added to the online documentation.
* Changed rename() on Windows to go through w32_rename(), because
rename() doesn't work on Windows if the destination file already exists.
* Changed access() and buildcld() to support UNC paths:
access() now uses CreateFileA(), and buildcld() opens the absolute path to the tmpdir.
Move all step-by-step instructions for installing dependencies to
docs.clamav.net.
INSTALL.md serves to direct folks to our online documentation (or
the offline copy in the release tarball), and as a reference for
all custom config options.
Add some introductory CMake material to help people new to CMake.
Add un-install instructions.
Also fix broken links in README.md.
For reference, version 0.103 started at 120 and we're already at 124
with v0.103.3.
Ordinarily we would reserve 10 FLEVELs for each feature release, but
we're implementing a new Long Term Support (LTS) program and will be
starting with 0.103, which means additional critical bug fixes for the
0.103 series for the next 2-3 years.
This commit pushes v0.104's FLEVEL to 140 to ensure that there will be
enough FLEVELs for future 0.103 patch versions.
docs: Fix a few typos
There are small typos in:
- libclamav/others_common.c
- libclamav/pe.c
- libclamav/unzip.c
Fixes:
- Should read `descriptor` rather than `desriptor`.
- Should read `record` rather than `reocrd`.
- Should read `overarching` rather than `overaching`.
Cause: _wopen() on Windows doesn't work on directories and gives a
Permission Denied error.
The old approach used _wopen() to get a file descriptor and got the
realpath from that.
The new approach opens a HANDLE with CreateFileA() with
FILE_FLAG_BACKUP_SEMANTICS to support directories.
Refactor the cli_get_filepath_from_filedesc() function by adding
cli_get_filepath_from_handle().
This fixes a fatal issue that would occur when unable to queue events due to
clamonacc improperly using all available fds.
It also fixes the core fd socket leak issue at the heart of the segfault by
properly cleaning up after a failed curl connection.
Lastly, worst case recovery code now allows more time for consumer queue
to catchup. It accomplishes this by increasing wait time and adding
retry logic.
More info: https://github.com/Cisco-Talos/clamav/issues/184
In openSUSE Tumbleweed, this test always fails because it compiles with
`-Werror=return-type` by default. Fix this by adding a return value in
the test program to keep the compiler happy.
Since strcpy() only writes a single terminating null byte, valgrind
throws uninitialized-read errors for strncpy() calls that could
potentially read beyond that null byte. Initializing the whole array
to 0 resolves this.
A bug introduced in the OLE2 BIFF XLM & image extraction code is causing
some file scans to fail when part of the macro extraction fails, such as
failing to transcode UTF16LE (Windows unicode) macros to UTF-8.
This commit allows scanning to continue without failing out if the
expected BIFF temp files aren't found.
I also changed the cli_codepage_to_utf8() "incomplete multibyte
sequence" warning to be a debug message, because it is too common, and
too verbose.
This is a cherry-pick of commit 24f225c21f
Modification to unrar codebase allowing skipping of files within
Solid archives when parsing in extraction mode, enabling us to skip
encrypted files while still scanning metadata and potentially
scanning unencrypted files later in the archive.
Updates to prepare for the 0.104 release candidate:
- Change documentation to explain current bytecode runtime situation.
- Document Python 2 pytest issue.
- Add additional contributors to acknowledgements.
- Update Install instructions to note that Autotools has been removed.
- Add *.cat SHA256 support and PDF bytecode hook bugfix to the News.
- Clarify purpose of the clamscan `--gen-json` option in the
clamscan --help.
In 0.104.0 we added new load/compile/free progress callback APIs to
clamav.h
This is a backwards compatible change, so we're bumping the current and
age fields, and resetting the revision.
See http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html
for more info about libtool style .so versioning.
Bytecode signatures targeting PDF files (target 10) fail to evaluate
match conditions. This occurs because the raw scan step during a magic
scan currently occurs _after_ the PDF scan, thus at the time the hook is
triggered, no matches have been performed and the logical sig eval is
always negative.
This commit fixes the issue by relocating the PDF parsing step to occur
after the raw scan, thereby enabling the bytecode lsig evaluation to
match and the signature to execute.
The embedded file type recognition feature scans files for embedded
files. This can identify things like self extracting zips (ZIPSFX), as
well as file types like DMG and MHTML files that can't be easily
identified using the start of the file.
It can waste CPU if you detect SFX files and then use the embedded
file type recognition scan on those SFX files, potentially detecting and
processing the same portions over and over.
There is a loop in the SIS scanning parser where an SIS header may point
to the beginning of the file as the start of an archived file.
This would be an infinite loop if not for the scan recursion limit.
This commit fixes that by making sure that both the file records and the
individual file pointers start after the main SIS header.
Python 3.5 compatibility fixes for Debian 9 and other distributions that lack Python 3.6+.
Change a python f-string to an old-style `"".format()`.
Convert Path objects to strings for older `shutil` APIs that don't
accept Paths.
Fix missing return values for progress callbacks.
Fix Windows build.
The cli_debug_flag variable is not exported on Windows. The correct way
to check if in debug-mode is to check the command line options.
Added a test to verify that clamscan can extract images from an XLS
document. The document has 2 images: a PNG and JPEG version of the
clamav demon/logo. The test requires the JSON metadata feature to verify
that the MD5s of the images are correct.
No other image formats were tested because, despite the format allegedly
supporting other image formats, Excel converts TIFF, BMP, and GIF images
to PNG files when you insert them.
The split test files are flagged by some AVs because they look like
broken executables. Instead of splitting the test files to prevent
detections, we should encrypt them. This commit replaces the "reassemble
testfiles" script with a basic "XOR testfiles" script that can be used
to encrypt or decrypt test files. This commit also of course then
replaces all the split files with xor'ed files.
The test and unit_tests directories were a bit of a mess, so I
reorganized them all into unit_tests with all of the test files placed
under "unit_tests/input" using subdirectories for different types of files.
Fix up input/output params to be annotated with [in,out], not [in/out].
Note: skipped some other incorrectly annotated [out] params that are
already staged to be fixed in a different PR.
The previous image extraction logic would search from the beginning of
the drawing group for the image file type magic bytes and then just
assume the rest of the file was that type. This was super hacky, didn't
support more than one image extraction, and resulted in "image files"
that contain a bunch of extra garbage data (which may include more
images or maybe just some metadata about how the images are used).
This commit implements part of the Office Drawing file specification to
correctly identify the start and size of each embedded image. Instead of
processing the drawing group as though it were one image to be extracted,
it collects the drawing group data into a single buffer, then parses
the records within to identify the images in Blip records.
Based on: https://interoperability.blob.core.windows.net/files/MS-ODRAW/%5bMS-ODRAW%5d.pdf
Also resolved the following issue:
If XLM (and now images) are found when parsing an OLE2 file, the
following other embedded content may not be processed:
- document summary metadata
- embedded ole10 files
- ole2 temp subdirectories (i.e. recursion)
The logic to process the above OLE2 extracted temp files was present in
the function that processes extracted VBA. When we added support for
extracting XLM macros, processing of this other data was lost.
Really, the above need to be processed if any temp files were saved.
I fixed this by restructuring the features that extract each type of temp
file into separate functions, one per type. I then wrapped those in an
OLE2 temp-dir scanning function. OLE2 temp directory scanning is
recursive if there are subdirectories.
Added a feature to extract images from OLE2 BIFF streams.
This work was derived from InQuest's blog post about extracting XLM and
images from XLS files:
https://inquest.net/blog/2019/01/29/Carving-Sneaky-XLM-Files
Assorted ole2 parser code cleanup and massive error handling cleanup.
Also fixed the following:
- The XLS parser may fail to process all BIFF records if some of the
  records contain unexpected data or are otherwise malformed. Because
  each record's size is already known, we can skip over the "malformed"
  record and continue with the rest.
- Fixed an issue where the ole2 header size was improperly calculated,
failing to account for the new "has_xlm" boolean added for context.
Trusted SHA256-based Authenticode hashes can now be loaded in
from .cat files. In addition:
- Files that are covered by Authenticode hashes loaded in from
.cat files will now be treated as VERIFIED like executables
where the embedded Authenticode sig is deemed to be trusted
based on .crb rules. This fixes a regression introduced in
0.102 (I think).
- The Authenticode hashes for signed EXEs without .crb coverage
will no longer be computed in cli_check_auth_header unless
hashes from .cat rules have been loaded. This fixes a slight
performance regression introduced in 0.102 (I think).