* Added loglevel parameter to logg()
* Fix logg and mprintf internals with new loglevels
* Update all logg calls to set loglevel
* Update all mprintf calls to set loglevel
* Fix hidden logg calls
* Executed clam-format
You should be able to disable the maxfilesize limit by setting it to
zero. When "disabled", ClamAV should defer to inherent limitations, which
at this time is INT_MAX - 2 bytes.
This works okay for ClamScan and ClamD because our option parser
converts max-filesize=0 to 4294967295 (4GB). But it is presently broken
for other applications using the libclamav C API, like this:
```c
cl_engine_set_num(engine, CL_ENGINE_MAX_FILESIZE, 0);
```
The limit checks added for cl_scanmap_callback and cl_scanfile_callback
in 0.103.4 and 0.104.1 broke this ability because we forgot to check that
`maxfilesize > 0` before enforcing it.
This commit adds that guard so you can disable the limit by setting it to `0`.
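A minimal sketch of the guard, with hypothetical variable names for the
engine setting and the file size:
```c
/* Hypothetical sketch: only enforce max-filesize when it is non-zero. */
if ((engine->maxfilesize > 0) && (filesize > engine->maxfilesize)) {
    return CL_EMAXSIZE; /* file exceeds the configured limit */
}
```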
While working on this, I also found that the `max_size` variables in our
libmspack scanner code use an `off_t` type, which is a SIGNED integer
that may be 32 bits wide even on some 64-bit platforms, or may be 64 bits
wide. AND the default `max_size` when `maxfilesize == 0` was being set to
UINT_MAX (0xffffffff), aka `-1` when `off_t` is 32 bits.
This commit addresses this related issue by:
- changing the `max_size` to use `uint64_t`, like our other limits.
- verifying that `maxfilesize > 0` before using it.
- checking that using `UINT32_MAX` as a backup will not exceed the
max-scansize, in the same way that we do with the maxfilesize (see the
sketch below).
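A rough sketch of the resulting selection logic (field names assumed, not
verbatim from the libmspack scanner code):
```c
/* Hypothetical sketch: pick a safe 64-bit max_size. */
uint64_t max_size;

if (ctx->engine->maxfilesize > 0) {
    /* The limit is enabled (non-zero), so use it directly. */
    max_size = ctx->engine->maxfilesize;
} else {
    /* Limit disabled: fall back to UINT32_MAX, but don't let the
     * backup value exceed max-scansize if that limit is enabled. */
    max_size = UINT32_MAX;
    if ((ctx->engine->maxscansize > 0) && (max_size > ctx->engine->maxscansize)) {
        max_size = ctx->engine->maxscansize;
    }
}
```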
The max bytes supplied to strftime should be the length of the result
string, including the terminating null byte.
Without the extra byte for the terminating null byte, the buffer is one
byte too small, which results in undefined behavior. Per the strftime man
page:
> If the length of the result string (including the terminating null
> byte) would exceed max bytes, then strftime() returns 0, and the
> contents of the array are undefined.
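For example, a correct call reserves the full buffer size, terminating
null byte included:
```c
#include <stdio.h>
#include <time.h>

void print_timestamp(void)
{
    char buf[26]; /* the max bytes argument must count the null byte */
    time_t now = time(NULL);

    /* "%a %b %d %H:%M:%S %Y" yields 24 chars + 1 null byte = 25 <= 26. */
    if (0 != strftime(buf, sizeof(buf), "%a %b %d %H:%M:%S %Y", localtime(&now))) {
        printf("%s\n", buf);
    } else {
        /* Result would exceed sizeof(buf): buf contents are undefined. */
        fprintf(stderr, "strftime failed\n");
    }
}
```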
The fixes to the fmap bounds for nested (duplicate) fmaps added recently
introduced a subtle arithmetic bug that was detected by OSS-Fuzz:
```c
scanat = m->nested_offset + *at % m->pgsz;
```
should have been:
```c
scanat = (m->nested_offset + *at) % m->pgsz;
```
Without the parentheses, `scanat` could be > `m->pgsz`, which would
overflow in the subsequent `memchr()` call.
See:
- https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40452
- https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40455
This commit also tightens up some of the other bounds checks done with
`CLI_ISCONTAINED()` macro so the check limits the bounds to the nested
fmap and not the original map.
In addition, I've added a `CLI_ISCONTAINED_0_TO()` macro that skips the
redundant lower-bound checks when the "bigger" buffer starts at offset 0.
This should silence a bunch of (benign) warnings and medium-severity
Coverity issues.
Also fixed a possible use of an uninitialized variable
(`old_hook_lsig_matches`) in `cli_magic_scan()`.
Finally, I also removed an unnecessary NULL-check on `filebase` in
`fmap_dup_to_file()` that Coverity was unhappy with.
The previous commit broke alerting when exceeding the recursion limit:
recursion tracking is now effective enough that limiting the final layer
of recursion to a scan of the fmap prevented us from ever hitting the
recursion limit.
This commit removes the restriction that files at the recursion limit
only get an fmap scan (aka "raw scan"), so that we can actually hit the
recursion limit and alert as intended.
Also tidied up the cache_clean check so it checks the
`fmap->dont_cache_flag` at the right point (before caching) instead of
before setting the "CLEAN" verdict.
Note: The `cache_clean` variable appears to be used to record the clean
status so the `ret` variable can be re-used without losing the verdict.
This is of course only required because the verdict is stored in the
error enum. *cough*
Also fixed a couple typos.
The fmap module provides a mechanism for creating a mapping into an
existing map at an offset and length that's used when a file is found
within an uncompressed archive or when embedded files are found with
embedded file type recognition in scanraw(). This is the
"fmap_duplicate()" function. Duplicate fmaps just reference the original
fmap's 'data' or file handle/descriptor while allowing the caller to
treat it like a new map using offsets and lengths that don't account for
the original/actual file dimensions.
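A minimal usage sketch (signatures approximate; the nested-scan call's
exact parameters are an assumption):
```c
/* Create a view into an existing map and scan it as if it were a new
 * file. Offsets passed to fmap_* APIs on `dup` are relative to the
 * view, not to the original/actual file dimensions. */
cl_fmap_t *dup = fmap_duplicate(map, offset, length, "embedded.bin");
if (NULL != dup) {
    ret = cli_magic_scan_nested_fmap_type(dup, 0, dup->len, ctx,
                                          CL_TYPE_ANY, "embedded.bin");
    free_duplicate_fmap(dup);
}
```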
fmaps keep track of this with m->nested_offset & m->real_len, which
admittedly have confusing names. I found incorrect uses of these in a
handful of locations. Notably:
- In cli_magic_scan_nested_fmap_type().
The force-to-disk feature would have been checking incorrect sizes and
may have written incorrect offsets for duplicate fmaps.
- In XDP parser.
- A bunch of places from the previous commit when making dupe maps.
This commit fixes those and adds lots of documentation to the fmap.h API
to try to prevent confusion in the future.
nested_offset should never be referenced outside of fmap.c/h.
The fmap_* functions for accessing or reading map data have two
implementations, mem_* or handle_*, depending on the data source.
I found issues with some of these so I made a unit test that covers each
of the functions I'm concerned about for both types of data sources and
for both original fmaps and nested/duplicate fmaps.
With the tests, I found and fixed issues in these fmap functions:
- handle_need_offstr(): must account for the nested_offset in dupe maps.
- handle_gets(): must account for nested_offset and use len & real_len
correctly.
- mem_need_offstr(): must account for nested_offset in dupe maps.
- mem_gets(): must account for nested_offset and use len & real_len
correctly.
Moved the CDBRANGE() macro out of the function definition for better
legibility.
Fixed a few warnings.
Scan recursion is the process of identifying files embedded in other
files and then scanning them, recursively.
Internally this process is more complex than it may sound because a file
may have multiple layers of types before finding a new "file".
At present we treat the recursion count in the scanning context as an
index into both our fmap list AND our container list. These two lists
are conceptually a part of the same thing and should be unified.
But what's concerning is that the "recursion level" isn't actually
incremented or decremented at the same time that we add a layer to the
fmap or container lists but instead is more touchy-feely, increasing
when we find a new "file".
To account for this shadiness, the size of the fmap and container lists
has always been a little longer than our "max scan recursion" limit so
we don't accidentally overflow the fmap or container arrays (!).
I've implemented a single recursion-stack as an array, similar to before,
which includes a pointer to each fmap at each layer, along with the size
and type. Push and pop functions add and remove layers whenever a new
fmap is added. A boolean argument when pushing indicates if the new layer
represents a new buffer or new file (descriptor). A new buffer will reset
the "nested fmap level" (described below).
This commit also provides a solution for an issue where we detect
embedded files more than once during scan recursion.
For illustration, imagine a tarball named foo.tar.gz with this structure:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| └── baz.exe | PE | 2 | 1 |
But suppose baz.exe embeds a ZIP archive and a 7Z archive, like this:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| baz.exe | PE | 0 | 0 |
| ├── sfx.zip | ZIP | 1 | 1 |
| │ └── hello.txt | ASCII | 2 | 0 |
| └── sfx.7z | 7Z | 1 | 1 |
| └── world.txt | ASCII | 2 | 0 |
(A) If we scan for embedded files at any layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| ├── foo.tar | TAR | 1 | 0 |
| │ ├── bar.zip | ZIP | 2 | 1 |
| │ │ └── hola.txt | ASCII | 3 | 0 |
| │ ├── baz.exe | PE | 2 | 1 |
| │ │ ├── sfx.zip | ZIP | 3 | 1 |
| │ │ │ └── hello.txt | ASCII | 4 | 0 |
| │ │ └── sfx.7z | 7Z | 3 | 1 |
| │ │ └── world.txt | ASCII | 4 | 0 |
| │ ├── sfx.zip | ZIP | 2 | 1 |
| │ │ └── hello.txt | ASCII | 3 | 0 |
| │ └── sfx.7z | 7Z | 2 | 1 |
| │ └── world.txt | ASCII | 3 | 0 |
| ├── sfx.zip | ZIP | 1 | 1 |
| └── sfx.7z | 7Z | 1 | 1 |
(A) is bad because it scans content more than once.
Note that for the GZ layer, it may detect the ZIP and 7Z if the
signature hits on the compressed data, which it might, though
extracting the ZIP and 7Z will likely fail.
The reason the above doesn't happen now is that we restrict embedded
type scans for a bunch of archive formats to include GZ and TAR.
(B) If we scan for embedded files at the foo.tar layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| ├── baz.exe | PE | 2 | 1 |
| ├── sfx.zip | ZIP | 2 | 1 |
| │ └── hello.txt | ASCII | 3 | 0 |
| └── sfx.7z | 7Z | 2 | 1 |
| └── world.txt | ASCII | 3 | 0 |
(B) is almost right. But we can achieve it easily enough by only
scanning for embedded content in the current fmap when the "nested fmap
level" is 0.
The upside is that it should safely detect all embedded content, even if
it may think the sfx.zip and sfx.7z are in foo.tar instead of in baz.exe.
The biggest risk I can think of affects ZIPs. SFXZIP detection
is identical to ZIP detection, which is why we don't allow SFXZIP to be
detected if inside of a ZIP. If we only allow embedded type scanning at
fmap-layer 0 in each buffer, this will fail to detect the embedded ZIP
if bar.exe was not compressed in foo.zip and if non-compressed files
extracted from ZIPs aren't extracted as new buffers:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.zip | ZIP | 0 | 0 |
| └── bar.exe | PE | 1 | 1 |
| └── sfx.zip | ZIP | 2 | 2 |
Provided that we ensure all files extracted from zips are scanned in
new buffers, option (B) should be safe.
(C) If we scan for embedded files at the baz.exe layer, we may detect:
| description | type | rec level | nested fmap level |
| ------------------------- | ----- | --------- | ----------------- |
| foo.tar.gz | GZ | 0 | 0 |
| └── foo.tar | TAR | 1 | 0 |
| ├── bar.zip | ZIP | 2 | 1 |
| │ └── hola.txt | ASCII | 3 | 0 |
| └── baz.exe | PE | 2 | 1 |
| ├── sfx.zip | ZIP | 3 | 1 |
| │ └── hello.txt | ASCII | 4 | 0 |
| └── sfx.7z | 7Z | 3 | 1 |
| └── world.txt | ASCII | 4 | 0 |
(C) is right. But it's harder to achieve. For this example we can get it
by restricting 7ZSFX and ZIPSFX detection to occur only when scanning an
executable.
But that may mean losing detection of archives embedded elsewhere.
And we'd have to identify allowable container types for each possible
embedded type, which would be very difficult.
So this commit aims to solve the issue the (B)-way.
Note that in all situations, we still have to scan with file typing
enabled to determine if we need to reassign the current file type, such
as re-identifying a Bzip2 archive as a DMG that happens to be Bzip2-
compressed. Detection of DMG and a handful of other types relies on
finding data partway through or near the end of a file before
reassigning the entire file as the new type.
Other fixes and considerations in this commit:
- The utf16 HTML parser has weak error handling, particularly with respect
to creating a nested fmap for scanning the ascii decoded file.
This commit cleans up the error handling and wraps the nested scan with
the recursion-stack push()/pop() for correct recursion tracking.
Before this commit, each container layer had a flag to indicate if the
container layer is valid.
We need something similar so that the cli_recursion_stack_get_*()
functions ignore normalized layers. Details...
Imagine an LDB signature for HTML content that specifies a ZIP
container. If the signature actually alerts on the normalized HTML and
you don't ignore normalized layers for the container check, it will
appear as though the alert is in an HTML container rather than a ZIP
container.
This commit accomplishes this with a boolean you set in the scan context
before scanning a new layer. When the new fmap is created, it uses that
flag to set a similar flag for the layer, and the context flag is then
reset so that subsequent layers don't inherit it.
The flag allows the new recursion_stack_get() function to ignore
normalized layers when iterating the stack to return a layer at a
requested index, negative or positive.
Scanning extracted/normalized javascript and VBA should also use the
'layer is normalized' flag.
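A hypothetical sketch of the flag handoff (names assumed; not the exact
implementation):
```c
/* In the parser, before scanning the normalized buffer: */
ctx->next_layer_is_normalized = true; /* hypothetical context flag */
ret = cli_magic_scan_buff(normalized, normalized_len, ctx, NULL);

/* In fmap/layer creation: copy the flag to the new layer, then clear
 * it so layers created afterwards aren't marked as normalized. */
layer->is_normalized          = ctx->next_layer_is_normalized;
ctx->next_layer_is_normalized = false;
```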
- This commit also fixes Heuristic.Broken.Executable alert for ELF files
to make sure that:
A) these only alert if cli_append_virus() returns CL_VIRUS (aka it
respects the FP check).
B) all broken-executable alerts for ELF only happen if the
SCAN_HEURISTIC_BROKEN option is enabled.
- This commit also cleans up the error handling in cli_magic_scan_dir().
This was needed so we could correctly apply the layer-is-normalized-flag
to all VBA macros extracted to a directory when scanning the directory.
- Also fix an issue where exceeding scan maximums wouldn't cause embedded
file detection scans to abort. Granted we don't actually want to abort
if max filesize or max recursion depth are exceeded... only if max
scansize, max files, and max scantime are exceeded.
Add an 'abort_scan' flag to the scan context to protect against
depending on correct error propagation for fatal conditions. Setting this
flag in the scan context should guarantee that a fatal condition deep in
scan recursion isn't lost, which would result in more stuff being scanned
instead of aborting. This shouldn't be necessary, but some status codes
like CL_ETIMEOUT never used to be fatal and it's easier to do this than
to verify every parser only returns CL_ETIMEOUT and other "fatal
status codes" in fatal conditions.
- Remove the duplicate is_tar() prototype from filetypes.c and include
is_tar.h instead.
- Presently we create the fmap hash when creating the fmap.
This wastes a bit of CPU if the hash is never needed.
Now that we're creating fmaps for all embedded files discovered with
file type recognition scans, this is a much more frequent occurrence and
really slows things down.
This commit fixes the issue by only creating fmap hashes as needed, as
sketched below. This should not only resolve the performance impact of
creating fmaps for all embedded files, but should also improve
performance in general.
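A minimal sketch of the lazy-hash idea (field and helper names
hypothetical):
```c
/* Compute the hash on first use instead of at fmap creation time. */
const unsigned char *fmap_need_hash(fmap_t *map)
{
    if (!map->have_hash) {              /* hypothetical "already hashed" flag */
        hash_whole_map(map, map->hash); /* hypothetical hashing helper */
        map->have_hash = true;
    }
    return map->hash;
}
```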
- Add an allmatch check to the zip parser after the central-header meta
match, so that we don't get multiple alerts with the same match except in
allmatch mode. Also clean up error handling in the zip parser a tiny bit.
- Fixes to ensure that the scan limits such as scansize, filesize,
recursion depth, # of embedded files, and scantime are always reported
if AlertExceedsMax (--alert-exceeds-max) is enabled.
- Fixed an issue where non-fatal alerts for exceeding scan maximums may
mask signature matches later on. I changed it so these alerts use the
"possibly unwanted" alert-type and thus only alert if no other alerts
were found or if all-match or heuristic-precedence are enabled.
- Added the "Heuristics.Limits.Exceeded.*" events to the JSON metadata
when the --gen-json feature is enabled. These will show up once under
"ParseErrors" the first time a limit is exceeded. In the present
implementation, only one limits-exceeded events will be added, so as to
prevent a malicious or malformed sample from filling the JSON buffer
with millions of events and using a tonne of RAM.
Bytecode signatures targeting PDF files (target 10) fail to evaluate
match conditions. This occurs because the raw scan step during a magic
scan currently occurs _after_ the PDF scan, thus at the time the hook is
triggered, no matches have been performed and the logical sig eval is
always negative.
This commit fixes the issue by relocating the PDF parsing step to occur
after the raw scan, thereby enabling the bytecode lsig evaluation to
match and the signature to execute.
The embedded file type recognition feature scans files for embedded
files. This can identify things like self extracting zips (ZIPSFX), as
well as file types like DMG and MHTML files that can't be easily
identified using the start of the file.
It can waste CPU if you detect SFX files and then use the embedded
file type recognition scan on those SFX files, potentially detecting and
processing the same portions over and over.
Also resolved the following issue:
If XLM (and now images) are found when parsing an ole2 file, the
following other embedded content may not be processed:
- document summary metadata
- embedded ole10 files
- ole2 temp subdirectories (i.e. recursion)
The logic to process the above ole2 extracted temp files was present in
the function which processes extracted VBA. When we added support for
extracting XLM macros, processing of this other data was lost.
Really, the above need to be processed if any temp files were saved.
I fixed this by restructuring the extraction features into separate
functions per type of temp file, then wrapped those in an ole2 temp dir
scanning function. OLE2 temp directory scanning is recursive if there
are subdirectories.
Added a feature to extract images from OLE2 BIFF streams.
This work was derived from InQuest's blog post about extracting XLM and
images from XLS files:
https://inquest.net/blog/2019/01/29/Carving-Sneaky-XLM-Files
Assorted ole2 parser code cleanup and massive error handling cleanup.
Also fixed the following:
- The XLS parser may fail to process all BIFF records if some of the
records contain unexpected data or are otherwise malformed. Because the
record size is already known, we can skip over the "malformed" record
and continue with the rest.
- Fixed an issue where the ole2 header size was improperly calculated,
failing to account for the new "has_xlm" boolean added for context.
Add progress callbacks to libclamav for:
- database load
- engine compile
- engine free
Add a progress bar to clamscan for load & compile.
These are disabled if you run with --debug or stdout is not a TTY or you
are using one of --quiet, --infected, or --no-summary.
Added code so you can test the engine-free callback by building with
ENABLE_ENGINE_FREE_PROGRESSBAR defined.
The compile & free progress callbacks pre-calculate the number of
tasks to complete to estimate the progress. Some tasks may take longer
than others, so the progress speed may appear to vary a little.
The callbacks' return type is cl_error_t, but the return value doesn't
currently do anything. It is reserved for future use.
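For example, registering a load-progress callback might look like this (a
sketch based on the callback style described above; the `clcb_progress`
typedef and setter name are assumptions):
```c
/* Print a crude progress indicator while signatures load. */
cl_error_t my_progress(size_t total_items, size_t now_completed, void *context)
{
    fprintf(stderr, "\rLoading: %zu/%zu", now_completed, total_items);
    return CL_SUCCESS; /* return value is reserved for future use */
}

/* ... after cl_engine_new(), before cl_load(): */
cl_engine_set_clcb_sigload_progress(engine, my_progress, NULL);
```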
Minor formatting change in matcher-ac.c to counteract weird
clang-format behavior, and to make it easier to read.
Added progress callbacks and clamscan progress bars to the news.
If zip content is detected within a file by way of the embedded file
type recognition scan (in `scanraw()`), a raw scan of that "ZIPSFX" will
detect all subsequent zip entries as new ZIPSFX's. Though they aren't
actually scanned later, it shows up in the metadata JSON. This commit
prevents embedded file type detection for ZIPSFX like we already have
for ZIP.
Semi-related, the mach-o unibin parser presently allows scanning of FAT
partitions anywhere in the fmap, to include the very beginning of the
fmap. This would be an infinite loop, scanning the same file over and
over again, were it not for the scan recursion limit. With the recursion
limit, it's ok, but still bad behavior. This commit prevents scanning
FAT files from the mach-o unibin parser where the offset is less than
the end of the headers.
Also fixed an unsigned integer comparison in the OLE2 parser that
might overflow.
Improvements to use modern block list and allow list verbiage.
blacklist -> block list
whitelist -> allow list
blacklisted -> blocked
whitelisted -> allowed
In the case of certificate verification, use "trust" or "verify" when
something is allowed.
Also changed domainlist -> domain list (or DomainList) to match.
There is a scan logic issue where the main libclamav scanning functions
create an extra "nested" fmap for each file being scanned. This is
slightly inefficient for a normal scan, but causes a major performance
issue when using ENGINE_OPTIONS_FORCE_TO_DISK. It causes every scanned
file to be duplicated in the temp directory before the scan.
We fix this by using `cli_magic_scan()` in `scan_common()` instead
of `cli_magic_scan_nested_fmap_type()`. We can do this now that the
`cl_scandesc_callback()` API creates an fmap for the caller, instead of
the old logic where `scan_common()` called different APIs depending on
whether or not we have an fmap or a file descriptor.
This commit resolves https://bugzilla.clamav.net/show_bug.cgi?id=12673
Changes in 0.103 to the order of operations for creating fmaps and
performing hashes of fmaps resulted in errors when scanning files that
are 4096M, and a different (but related) error when scanning files
> 4096M.
This is despite the fact that scanning is supposed to be limited to
--max-scansize (MaxScanSize) and was also apparently limited to
INT_MAX - 2 (aka ~1.999999G) back in 2014 to alleviate reported crashes
for a few large file formats.
(see https://bugzilla.clamav.net/show_bug.cgi?id=10960)
This last limitation was not documented, so I added it to the sample
clamd.conf.
Anyways, the main issue is that the fmap module was using "unsigned int"
and was enforcing a limitation (with verbose error messages) when a map
length exceeded the capacity of an unsigned int. This commit switches
the associated variables over to uint64_t, and while fmaps are still
limited to size_t in other places, the fmap module will at least work
with files > 4G on 64-bit systems.
In testing this, I found that the time to hash a file, particularly when
hashing a file on an NTFS partition from Linux was really slow because
we were hashing in FILEBUFF chunks (about 8K) at a time. Increasing
this to 10MB chunks speeds up scanning of large files.
Finally, now that hashing is performed immediately when an fmap is
created for a file, hashing of files larger than max-scansize was
occurring. This commit adds checks to bail out early if the file size
exceeds the maximum before creating an fmap. It will alert with the
Heuristics.Limits.Exceeded name if the heuristic is enabled.
Also fixed CheckFmapFeatures.cmake module that detects if
sysconf(_SC_PAGESIZE) is available.
The clamd TOCTOU access-check fix introduced an expectation that the
scanfile API will set errno if access was denied. We should instead use
the cl_error_t error code enum.
Also added Duane Waddle to the 0.104 contributors acknowledgements.
The fmap_duplicate function is used to create a new fmap with a view
into an existing fmap. When the new view is a different size than the old
fmap, a new hash must be calculated for the duplicate fmap. However, when
the duplicated fmap is the same size as the original fmap, the hash will
be the same and there's no point recalculating it.
The issue is apparent when scanning large EXE files because the hash was
being calculated at the beginning and end of the scan.
Digging into this issue revealed that hash calculations for fmaps were
also being performed in the wrong place. For scans of maps, we use
fmap_duplicate() early in the process to apply the name API argument to
the duplicate fmap. Fixing the logic so we don't recalculate the hash
revealed that we never calculated hashes for fmaps created from buffers
in the first place, so that also had to be fixed by relocating where the
hash is calculated.
I also found that fmap_duplicate()'s offset argument used an off_t,
though it and all caller offsets are not allowed to be negative. This
led to a bit of a tangent, fixing a bunch of off_t variables and
parameters that should've been size_t.
Added a couple unit tests to verify that making duplicate fmaps, and
duplicate-duplicate fmaps works as expected after the change.
Changed the CLI_ISCONTAINED() and CLI_ISCONTAINED2() macros to cast to
size_t, because pointers and buffer sizes may not be negative, and these
two macros do not rely on subtraction.
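Roughly, the idea is as follows (illustrative only, not the exact macro):
```c
/* Casting to size_t makes "negative" inputs enormous, so the bounds
 * checks fail safely instead of passing due to signed arithmetic. */
#define CLI_ISCONTAINED(bb, bb_size, sb, sb_size)                      \
    ((size_t)(bb_size) > 0 && (size_t)(sb_size) > 0 &&                 \
     (size_t)(sb_size) <= (size_t)(bb_size) &&                         \
     (size_t)(sb) >= (size_t)(bb) &&                                   \
     (size_t)(sb) + (size_t)(sb_size) <= (size_t)(bb) + (size_t)(bb_size))
```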
Reduced the verbosity of a GPT parser warning that occurs frequently
when parsing DMG files prior to DMG file type recognition.
DMG files support a handful of compression formats. File type
recognition for DMG presently works by doing "embedded" file type
recognition during the raw scan after having already identified the file
type by traditional file type magic checks. I found that when DMG uses
bzip2 for compression, we identify an MBR type containing a BZ type, at
which point the raw scan detects it as DMG. The previous commits broke
this by disabling embedded file type recognition for BZ and other
compression & archive types. Ideally the fix would be to do DMG file
type detection by checking the end of the file; perhaps adding negative
offset support for FTM sigs could fix it. Until we can implement that or
another/better solution for DMG file type detection, we'll have to allow
embedded file type recognition for BZ files.
Also added some comments to narrate the scan process.
ClamAV's embedded file type recognition detects some files found in
non-archive formats but for archive formats and compressed data streams
like bzip2 and gzip, it will often detect file type magic bytes of
compressed files and then attempt to parse the compressed data as if
they were whole files, resulting in wasted CPU cycles and confusing
warnings.
This patch prevents embedded file type recognition for CL_TYPE_GZ and
CL_TYPE_BZ.
Also revert the UTF8 Byte Order Mark (BOM) detection and associated
scanning of all text types as HTML files that had been added in 0.103.
Scanning a file as HTML is not performant because it creates temp files
and normalizes the original file 3 ways.
Better text type detection, transcoding, and HTML detection is probably
still needed, but will have to wait. Scanning any embedded content that
looked like text with the HTML parser impacts performance too much.
Integrated the JPEG exploit check into the JPEG parser and removed it
from special.c.
As a happy consequence of this, the photoshop file detection and
embedded JPEG thumbnail exploit check was merged in as well, which means
that the embedded thumbnails can also be scanned as embedded JPEG files.
Adds debug output to the JPEG format validator to help resolve issues
with unusually formatted JPEGs and to validate that the JPEG parser is
working correctly.
Relaxes the rules around duplicate application markers or application
markers that appear later than expected, due to prior XMP metadata, etc.
Removed the requirement for an application marker to exist, as some
older JPEGs don't appear to use JFIF, Exif, or SPIFF application
extensions.
I tested against a relatively large data set of JPEGs from Mac & Windows
stock photos, personal photos, and assorted downloaded photos. FP rates
when alerting on broken media should be very low.
Added a new scan option to alert on broken media (graphics) file
formats. This feature mitigates the risk of malformed media files
intended to exploit vulnerabilities in other software. At present
media validation exists for JPEG, TIFF, PNG, and GIF files.
To enable this feature, set `AlertBrokenMedia yes` in clamd.conf, or
use the `--alert-broken-media` option when using `clamscan`.
These options are disabled by default for now.
Application developers may enable this scan option by enabling
`CL_SCAN_HEURISTIC_BROKEN_MEDIA` for the `heuristic` scan option bit
field.
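For instance, a libclamav user might enable the option like this (a
minimal sketch; engine setup and error handling omitted):
```c
#include <string.h>
#include <clamav.h>

struct cl_scan_options options;
memset(&options, 0, sizeof(struct cl_scan_options));
options.parse     = ~0;                             /* enable all parsers */
options.heuristic = CL_SCAN_HEURISTIC_BROKEN_MEDIA; /* alert on malformed media */

const char *virname   = NULL;
unsigned long scanned = 0;
cl_error_t ret = cl_scanfile("photo.jpg", &virname, &scanned, engine, &options);
```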
Fixed PNG parser logic bugs that caused an excess of parsing errors
and fixed a stack exhaustion issue affecting some systems when
scanning PNG files. PNG file type detection was disabled via
signature database update for 0.103.0 to mitigate effects from these
bugs.
Fixed an issue where PNG and GIF files no longer worked with Target:5
(graphics) signatures if detected as CL_TYPE_PNG/GIF rather than as
CL_TYPE_GRAPHICS. Target types now support up to 10 possible file
types to make way for additional graphics types in future releases.
Scanning JPEG, TIFF, PNG, and GIF files will no longer return "parse"
errors when file format validation fails. Instead, the scan will alert
with the "Heuristics.Broken.Media" signature prefix and a descriptive
suffix to indicate the issue, provided that the "alert broken media"
feature is enabled.
GIF format validation will no longer fail if the GIF image is missing
the trailer byte, as this appears to be a relatively common issue in
otherwise functional GIF files.
Added a TIFF dynamic configuration (DCONF) option, which was missing.
This will allow us to disable TIFF format validation via signature
database update in the event that it proves to be problematic.
This feature already exists for many other file types.
Added CL_TYPE_JPEG and CL_TYPE_TIFF types.
This patch adds experimental-quality CMake build tooling.
The libmspack build required a modification to use "" instead of <> for
header #includes. This will hopefully be included in the libmspack
upstream project when adding CMake build tooling to libmspack.
Removed use of libltdl when using CMake.
Flex & Bison are now required to build.
If -DMAINTAINER_MODE, then GPERF is also required, though it currently
doesn't actually do anything. TODO!
I found that the autotools build system was generating the lexer output
but not actually compiling it, instead using previously generated (and
manually renamed) lexer c source. As a consequence, changes to the .l
and .y files weren't making it into the build. To resolve this, I
removed generated flex/bison files and fixed the tooling to use the
freshly generated files. Flex and bison are now required build tools.
On Windows, this adds a dependency on the winflexbison package,
which can be obtained using Chocolatey or may be manually installed.
CMake tooling only has partial support for building with external LLVM
library, and no support for the internal LLVM (to be removed in the
future). I.e. The CMake build currently only supports the bytecode
interpreter.
Many files used include paths relative to the top source directory or
relative to the current project, rather than relative to each build
target. Modern CMake support requires including internal dependency
headers the same way you would external dependency headers (albeit
with "" instead of <>). This meant correcting all header includes to
be relative to the build targets and not relative to the workspace.
For example, ...
```c
include "../libclamav/clamav.h"
include "clamd/clamd_others.h"
```
... becomes:
```c
// libclamav
#include "clamav.h"
// clamd
#include "clamd_others.h"
```
Fixes header name conflicts by renaming a few of the files.
Converted the "shared" code into a static library, which depends on
libclamav. The ironically named "shared" static library provides
features common to the ClamAV apps which are not required in
libclamav itself and are not intended for use by downstream projects.
This change was required for correct modern CMake practices but was
also required to use the automake "subdir-objects" option.
This eliminates warnings when running autoreconf which, in the next
version of autoconf & automake are likely to break the build.
libclamav used to build in multiple stages where an earlier stage is
a static library containing utils required by the "shared" code.
Linking clamdscan and clamdtop with this libclamav utils static lib
allowed these two apps to function without libclamav. While this is
nice in theory, the practical gains are minimal and it complicates
the build system. As such, the autotools and CMake tooling was
simplified for improved maintainability and this feature was thrown
out. clamdtop and clamdscan now require libclamav to function.
Removed the nopthreads version of the autotools
libclamav_internal_utils static library and added pthread linking to
a couple apps that may have issues building on some platforms without
it, with the intention of removing needless complexity from the
source. Kept the regular version of libclamav_internal_utils.la
though it is no longer used anywhere but in libclamav.
Added an experimental doxygen build option which attempts to build
clamav.h and libfreshclam doxygen html docs.
The CMake build tooling also may build the example program(s), which
isn't a feature in the Autotools build system.
Changed C standard to C90+ due to inline linking issues with socket.h
when linking libfreshclam.so on Linux.
Generate common.rc for win32.
Fix tabs/spaces in shared Makefile.am, and remove vestigial ifndef
from misc.c.
Add CMake files to the automake dist, so users can try the new
CMake tooling w/out having to build from a git clone.
clamonacc changes:
- Renamed FANOTIFY macro to HAVE_SYS_FANOTIFY_H to better match other
similar macros.
- Added a new clamav-clamonacc.service systemd unit file, based on
the work of ChadDevOps & Aaron Brighton.
- Added missing clamonacc man page.
Updates to clamdscan man page, add missing options.
Remove vestigial CL_NOLIBCLAMAV definitions (all apps now use
libclamav).
Rename Windows mspack.dll to libmspack.dll so all ClamAV-built
libraries have the lib-prefix with Visual Studio as with CMake.
Using file type recognition scan mode for disk images and other raw
archive formats is problematic. One simple reason is that the contained
files will be detected, parsed, and scanned twice: first when detected
by the type recognition scan, and later when the archive is extracted
and the files are properly scanned. Another reason is an increased
likelihood of incorrect type recognition, as seen with supposed MHTML
files (they weren't) found in GPT disk images.
Though a previous patch disabled embedded type recognition for GPT
files, this one extends this to the following:
- CL_TYPE_CPIO_OLD
- CL_TYPE_ZIP
- CL_TYPE_OLD_TAR
- CL_TYPE_POSIX_TAR
ZIP is included because file entries in a ZIP are incorrectly detected
as ZIPSFX's and though we also ensure not to scan ZIPSFX's found in
ZIP's, it's more efficient not to do the type recognition in the first
place and it prevents us from adding those bogus ZIPSFX entries into the
scan properties JSON.
This patch also fixes what appears to be a copy-paste typo, where
CL_TYPE_ISHIELD_MSI types were accidentally having their container value
set to CL_TYPE_AUTOIT.
Exit early from VBA scanning loop if virus found.
Add VBA/XLM suffix to ContainsMacros heuristics.
Fix setting status code for error and virus conditions.
Increment/decrement recursion counter when scanning vba dir.
Notably the commit adds a heuristic alert when VBA is extracted using
the new VBA extraction code and similarly adds "HasMacros":true to the
JSON scan properties.
In addition, a change was added to the cli_sanitize_filepath() function
so it converts posix pathseps to Windows pathseps on Windows and also
outputs a sanitized basename pointer (optional), which is used when
generating a temporary filename so that using a prefix with pathseps in
it won't cause file creation failures (observed with --leave-temps, where
original filenames are incorporated into temporary filenames).
Included some error handling improvements for cli_vba_scandir() to
better track alert and macro detections.
Downgraded utf8 conversion error messages to debug messages because they
are too verbose in files with invalid filenames (observed in some
malware).
Changed the xlm macro and vba project temp filenames to include
"xlm_macros" and "vba_project" prefix, to make it easier to find them.
Relocated XLM and VBA temp files from the top-level tmp directory to the
current sub_tmpdir, so tempfiles for a given scan are more organized.
Fix an infinite loop in the new XLM macro parser.
Fix error handling, resource cleanup in OLE2 parser.
Fix issues tracking detected "viruses" in VBA & OLE2 parsers affecting
non-allmatch (regular) scan mode, wherein multiple viruses may be found
but each record is lost and the overall detection comes up clean.
Also silence switch() fall-through warning for WORD/PPT/XL/HWP (OOXML)
file type fall-throughs to the ZIP parser (because they are zips).
Also silence switch() fall-through warning when handling the limits-
exceeded error types, checking for the limits-exceeded heuristic, and
continuing on to bail out with a clean verdict.
Changes cli_checkfp_virus to a recursive function which checks all
parent fmaps in the context for false positives.
Simplifies the params needed for cli_checkfp_virus by using the current
digest and fmap length that reside within the fmap struct itself.
- 290424 Missing break in switch - In hash_match: Missing break
  statement between cases in switch statement.
- 290414 Resource leak - In cli_scanishield_msi: Leak of memory or
  pointers to system resources. Memory leak in a fail case.
- 288197 Resource leak - In decrypt_any: Leak of memory or pointers
  to system resources. Memory leak in a fail case.
- 290426 Resource leak - In cli_magic_scan: Leak of memory or pointers
  to system resources. Leaked a file prefix when running with
  --save-temps.
- 192923 Resource leak - In cli_scanrar: Leak of memory or pointers to
  system resources. Leaked a file descriptor if a virus was found in
  a RAR file comment.
- 225146 Resource leak - In cli_scanegg: Leak of memory or pointers
  to system resources. Leaked a file descriptor if unable to write
  a comment file to disk.
- 290425 Resource leak - In scan_common: Leak of memory or pointers
  to system resources. Memory leaks in various fail cases.
Also changes cli_scanrar to write out the file comment only if
--leave-temps is specified and scan the buffer (like what is done
in cli_scanegg) instead of writing the file out, scanning that,
and then deleting the file if --leave-temps is not specified.
The unit tests stopped working when correcting an issue with a
switch statement that determined what type of signature had matched
on a Google SafeBrowsing GDB rule. Looking into the unit tests, it
looks like the code had always assumed that the test cases would be
detected by a malware test rule in unit_tests/input/daily.gdb, but
now some of the tests get matched on the phishing test rule.
I updated the test logic to be more clear, and added tests for both
cases now.
Fix some memory leaks in libclamav/scanners.c
Some detections, like phishing, are considered heuristic alerts because
they match based on behavior more than on content. A subset of these
are considered "potentially unwanted" (low-severity). These
low-severity alerts include:
- phishing
- PDFs with obfuscated object names
- bytecode signature alerts that start with "BC.Heuristics"
The concept is that unless you enable "heuristic precedence" (a method
of lowering the threshold to immediately alert on low-severity
detections), the scan should continue after a match in case a higher
severity match is found. Only at the end will it report the low-severity
match, if nothing else was found.
The current implementation is buggy though. Scanning of archives does
not correctly bail out for the entire archive if one email contains a
phishing link. Instead, it sets the "heuristic found" flag then and
alerts for every subsequent file in the archive because it doesn't know
if the heuristic was found in an embedded file or the target file.
Because it's just a heuristic and the status is "clean", it keeps
scanning.
This patch corrects the behavior by checking if any low-severity alerts
were found at the end of scanning the target file, instead of at the end
of each embedded file.
Additionally, this patch fixes an issue with phishing alerts wherein
heuristic precedence mode did not cause a scan to stop after the first
alert.
The above changes required restructuring to create an fmap inside of
cl_scandesc_callback() so that scan_common() could be modified to
require an fmap and set up so that the current *ctx->fmap pointer is
never NULL when scan_common() evaluates match results.
Also fixed a couple minor bugs in the phishing unit tests and cleaned up
the test code for improved legibility and type safety.
If using a bytecode signature that makes use of the BC_PRECLASS hook and
if it alerts, a NULL dereference may occur. This change fixes that.
Also fixed unrelated memory leaks introduced recently when adding file
name extraction to the zip parser and rar parser.
scan_common must be passed either an fmap (map) or a file
descriptor (desc) corresponding to the file being scanned.
In the case where map is NULL, scan_common will create an
fmap in order to execute the BC_PRECLASS bytecode hook, and
this fmap wasn't being unmapped afterward.
Fixed a copy-paste bug where duplicated fmap names were assigned to the
parent instead of the dup/child fmap.
Fixed file descriptor initialization issue in the HTML normalizer.
At present many parsers create tmp subdirectories to store extracted
files. For parsers like the vba parser, this is required as the
directory is later scanned. For other parsers, these subdirectories are
probably not helpful now that we provide recursive sub-dirs when
--leave-temps is enabled. It's not quite as simple as removing the extra
subdirectories, however. Certain parsers, like autoit, don't create very
unique filenames and would result in file name collisions when
--leave-temps is not enabled.
The best thing to do would be to make sure each parser uses unique
filenames and doesn't rely on cli_magic_scan_dir() to scan extracted
content before removing the extra subdirectory. In the meantime, this
commit gives the extra subdirectories meaningful names to improve
readability.
This commit also:
- Provides the 'bmp' prefix for extracted PE icons.
- Removes empty tmp subdirs when extracting rtf files, to eliminate
clutter.
- The PDF parser sometimes creates tmp files when decompressing streams
before it knows if there is actually any content to decompress. This
resulted in a large number of empty files. While it would be best to
avoid creating empty files in the first place, that's not quite as simple
as it sounds. This commit does the next best thing and deletes the
tmp files if nothing was actually extracted, even if --leave-temps is
enabled.
- Removes the "scantemp" prefix for unnamed fmaps scanned with
cli_magic_scan(). The 5-character hashes given to tmp files with
prefixes resulted in occasional file name collisions when extracting
certain file types with thousands of embedded files.
- The VBA and TAR parsers mistakenly used NAME_MAX instead of PATH_MAX,
resulting in truncated file paths and failed extraction when
--leave-temps is enabled and a lot of recursion is in play. This commit
switches them from NAME_MAX to PATH_MAX.
HTML normalization creates a tmp directory for storing rfc2397-style
links. The vast majority of html does not make use of rfc2397, and thus
an excess of empty tmp directories was generated. This commit alters the
behavior to only create the rfc2397 directory when required, if it does
not already exist.
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.
| current name | new name |
| ------------------------- | --------------------------------- |
| magic_scandesc() | cli_magic_scan() |
| cli_magic_scandesc_type() | <delete> |
| cli_magic_scandesc() | cli_magic_scan_desc() |
| cli_base_scandesc() | cli_magic_scan_desc_type() |
| cli_partition_scandesc() | <delete> |
| cli_map_scandesc() | magic_scan_nested_fmap_type() |
| cli_map_scan() | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc() | cli_magic_scan_buff() |
| cli_scanbuff() | cli_scan_buff() |
| cli_scandesc() | cli_scan_desc() |
| cli_fmap_scandesc() | cli_scan_fmap() |
| cli_scanfile() | cli_magic_scan_file() |
| cli_scandir() | cli_magic_scan_dir() |
| cli_filetype2() | cli_determine_fmap_type() |
| cli_filetype() | cli_compare_ftm_file() |
| cli_partitiontype() | cli_compare_ftm_partition() |
| cli_scanraw() | scanraw() |