HTML normalization creates a tmp directory for storing rfc2397-style
links. The vast majority of HTML does not make use of rfc2397, so an
excess of empty tmp directories is generated. This commit alters the
behavior to create the rfc2397 directory only when required, if it does
not already exist.
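A minimal sketch of the lazy-creation approach (the helper name and
signature are illustrative, not ClamAV's actual code):

```c
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>

/* Build the rfc2397 subdirectory path and create it only on first use. */
static int ensure_rfc2397_dir(const char *tmpdir, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s/rfc2397", tmpdir);
    if (mkdir(out, 0700) != 0 && errno != EEXIST)
        return -1; /* failed for a reason other than "already exists" */
    return 0;
}
```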
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function name now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type, rather than using
file type magic for type recognition.
| current name | new name |
| ------------------------- | --------------------------------- |
| magic_scandesc() | cli_magic_scan() |
| cli_magic_scandesc_type() | <delete> |
| cli_magic_scandesc() | cli_magic_scan_desc() |
| cli_base_scandesc() | cli_magic_scan_desc_type() |
| cli_partition_scandesc() | <delete> |
| cli_map_scandesc() | magic_scan_nested_fmap_type() |
| cli_map_scan() | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc() | cli_magic_scan_buff() |
| cli_scanbuff() | cli_scan_buff() |
| cli_scandesc() | cli_scan_desc() |
| cli_fmap_scandesc() | cli_scan_fmap() |
| cli_scanfile() | cli_magic_scan_file() |
| cli_scandir() | cli_magic_scan_dir() |
| cli_filetype2() | cli_determine_fmap_type() |
| cli_filetype() | cli_compare_ftm_file() |
| cli_partitiontype() | cli_compare_ftm_partition() |
| cli_scanraw() | scanraw() |
The metadata properties JSON structure isn't recording file types found
embedded within a file, such as self-extracting (SFX) types and office
document types (DOCX, PPTX, etc.). This presents a problem...
At present there's no way to know if the current file has ended and a
new file is found tacked on to the end of the first file. If there
were, we could simply check whether the type found by the raw scan
exists within the first file, or after it.
If within the first file, and the type is an archive, then it's
reasonable to conclude we're either observing zip headers (for SFXZIP
detections) or other files that are not compressed.
If the type ISN'T found within the first file, then we definitely have
a whole new file to parse, and we should do so with cli_magic_scan()
rather than only using these embedded-type scanners.
At present we can't ignore SFXZIP detections even if the original file
type is a ZIP because we may have found two ZIPs appended together to
evade detection (a legitimate trick). As a consequence, we will
effectively parse every zip entry twice. The same issue applies to
types found within non-compressed archives.
This commit adds an EmbeddedObjects list to the metadata JSON object so
that the existence of these types is noted.
Additionally, this commit removes the two-part int64 cli_jsonint64()
implementation, as json_object_new_int64() should be available
everywhere and the macro to detect such support was never set.
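With that assumption, the wrapper reduces to a direct call into json-c.
A minimal sketch (the function name and return convention here are
simplified, not ClamAV's exact API):

```c
#include <json-c/json.h>
#include <stdint.h>

/* Attach a 64-bit integer to a JSON object in one call, rather than
 * splitting the value into two 32-bit halves. */
int jsonint64_sketch(struct json_object *obj, const char *key, int64_t val)
{
    struct json_object *jval = json_object_new_int64(val);
    if (!jval)
        return -1;
    json_object_object_add(obj, key, jval);
    return 0;
}
```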
A way is needed to record scanned file names for two purposes:
1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.
2. File names are useful when interpreting tmp directory output when
using the --leave-temps option.
This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.
To store the names in fmaps, an optional name argument has been added to
all internal scan APIs that create fmaps, and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required. The zip and gpt parsers required some modification to record
file names. The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.
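Illustrative only: the shape of the change, with simplified names that
are not ClamAV's exact definitions.

```c
#include <stdlib.h>
#include <string.h>

typedef struct example_fmap {
    /* ... existing fmap members ... */
    char *name; /* original file name, or NULL if none was available */
} example_fmap;

/* Callers pass NULL when no file name is known or required. */
static void fmap_set_name(example_fmap *map, const char *name)
{
    map->name = name ? strdup(name) : NULL;
}
```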
Also:
- Added recursive extraction to the tmp directory when the
--leave-temps option is enabled. When not enabled, the tmp directory
structure remains flat so as to reduce the likelihood of exceeding
MAX_PATH. The current tmp directory is stored in the scan context.
- Made the cli_scanfile() internal API non-static and added it to
scanners.h so it would be accessible outside of scanners.c in order to
remove code duplication within libmspack.c.
- Added function comments to scanners.h and matcher.h
- Converted the TDB-type macros and LSIG-type macros to enums for improved
type safety.
- Converted more return status variables from `int` to `cl_error_t` for
improved type safety, and corrected ooxml file typing functions so
they use `cli_file_t` exclusively rather than mixing types with
`cl_error_t`.
- Restructured the magic_scandesc() function to use gotos for error
handling and removed the early_ret_from_magicscan() macro and the
magic_scandesc_cleanup() function. This makes the code easier to
read and makes it easier to add the recursive tmp directory cleanup to
magic_scandesc().
- Corrected zip, egg, rar filename extraction issues.
- Removed use of extra sub-directory layer for zip, egg, and rar file
extraction. For Zip, this also involved changing the extracted
filenames to be randomly generated rather than using the "zip.###"
file name scheme.
This commit improves the layout of the tmp file output and the JSON
metadata output when using the --leave-temps and --gen-json options.
For all scans, each scan target will get a unique tmp sub-directory. If
using --leave-temps, that subdir will include the basename of the
original file to make it easier to identify. Additionally, when using
the --leave-temps option, all extracted objects will be written into
recursive subdirectories, with filename prefixes where available. When
not using the --leave-temps option, the layout of the tmp sub-directory
will remain flat, so as to reduce the possibility of exceeding PATH_MAX.
The JSON metadata generated by the --gen-json option is now generated
for all file types, not just a select few. The format is also
pretty-printed for readability and now includes filenames and file paths
when available.
Also:
- Added missing ALLMATCH check when determining if bytecode hooks should
be run.
- Added cl_engine_get_str API to windows libclamav symbol export file.
A missing return statement in png.c, in a function that should return a
status code, results in undefined behavior.
In this patch, I also added ".PNG" to one of the new heuristic signatures
to match the others.
Add missing size checks to validate size data parsed from a VBA file.
This fixes a possible buffer overflow read that was caught by oss-fuzz
before it made it into any release.
Fix for an out-of-bounds read in the PDF parser when initializing
aes crypto routines that may result in a crash.
Bug found by OSS-Fuzz.
Also added checks for the arc4 init routine to mitigate the risk of a
similar issue.
Fix for an out-of-bounds read in the ARJ parser, accidentally introduced
when adding text normalization and bounds checking for the filename and
comment fields parsed from file headers.
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.
This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.
The code is based on Didier Stevens' plugin_biff for oledump.py.
Fixes a bug in the PtrVerifier pass when using LLVM >= v3.5 for the
bytecode signature runtime.
LLVM 3.5 changed the meaning of "use" and introduced "user". This fix
swaps out "use" keywords for "user" so the code functions correctly when
using LLVM 3.5+.
An integer overflow causes an out-of-bounds read that results in
a crash. The crash may occur when using the optional
Data-Loss-Prevention (DLP) feature to block content that contains credit
card numbers. This commit fixes the issue by using a signed index variable.
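A minimal sketch of the failure mode (not the actual DLP code): when
scanning backwards over a candidate digit run with an unsigned index,
`i--` past zero wraps around instead of ending the loop, producing an
out-of-bounds read. A signed index terminates as intended.

```c
#include <stddef.h>
#include <ctype.h>

/* Find the first digit of the run ending at buf[pos]. */
static size_t run_start(const unsigned char *buf, size_t pos)
{
    ptrdiff_t i; /* signed: the i >= 0 test below actually works */
    for (i = (ptrdiff_t)pos; i >= 0 && isdigit(buf[i]); i--)
        ;
    return (size_t)(i + 1);
}
```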
Add Data-Loss-Prevention option to detect credit cards only, excluding
debit and private label cards where possible.
You can select the credit card-only DLP mode for clamscan with the
`--structured-cc-mode` command-line option.
You can select the credit card-only DLP mode for clamd with the
`StructuredCCOnly` clamd.conf config option.
This patch also adds credit card matching for additional vendors:
- Mastercard 2016
- China Union Pay
- Discover 2009
Adds LZMA and BZip2 decompression routines to the bytecode API.
The ability to decompress LZMA and BZip2 streams is particularly
useful for bytecode signatures that extend clamav executable
unpacking capabilities.
Of note, the LZMA format is not well standardized. This API
expects the stream to start with the LZMA_Alone header.
Also fixed a bug in LZMA dictionary size setting.
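For reference, a sketch of the 13-byte LZMA_Alone header the API expects
at the start of the stream (field layout per the LZMA SDK; the struct
and parsing code here are illustrative, not ClamAV's):

```c
#include <stdint.h>
#include <string.h>

struct lzma_alone_header {
    uint8_t  properties;        /* lc, lp, pb packed as (pb * 5 + lp) * 9 + lc */
    uint32_t dict_size;         /* little-endian dictionary size */
    uint64_t uncompressed_size; /* little-endian; UINT64_MAX = unknown */
};

static int parse_lzma_alone(const uint8_t *buf, size_t len,
                            struct lzma_alone_header *hdr)
{
    if (len < 13)
        return -1;
    hdr->properties = buf[0];
    memcpy(&hdr->dict_size, buf + 1, 4);         /* assumes little-endian host */
    memcpy(&hdr->uncompressed_size, buf + 5, 8); /* assumes little-endian host */
    return 0;
}
```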
- Existing VBA extraction code uses undocumented cache structures.
This code uses the documented way of accessing VBA projects.
- Adds additional detail to the dumped information:
Project name, Project doc string, ...
All VBA projects are dumped into a single file.
- Malware authors are currently evading detection by spreading
malicious code over several projects. It is hard to write
signatures if only part of the malicious code is visible.
Fixes an fmap leak in the bytecode switch_input() API. The
switch_input() API provides a way to read from an extracted file instead
of reading from the current file. The issue is that the current
implementation fails to free the fmap created to read from the extracted
file on cleanup or when switching back to the original fmap. In
addition, it fails to use the cli_bytecode_context_setfile() function
to restore the file_size in the context for the current fmap.
Fixes a couple of fmap leaks in the unit tests. Specifically, this fixes
the use of cli_map_scandesc().
The cli_map_scandesc() function used to override the current fmap
settings with a new size and offset, performing a scan of the embedded
content. This broke the ability to iterate backwards through the fmap
recursion array when an alert occurs to check each map's hash for
whitelist matches.
In order to fix this issue, it needed to be possible to duplicate an
fmap header for the scan of the embedded file without duplicating the
actual map/data. This wasn't feasible with the POSIX fmap handle
implementation, where the fmap header, bitmap array, and memory map
were all contiguous. This commit makes it possible by extracting the
fmap header and bitmap array from the mmap region, using instead a
pointer for both the bitmap array and the mmap/data. As a result, the
POSIX fmap handle implementation ended up working more similarly to the
existing Windows implementation.
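An illustrative sketch of the layout change, with simplified names that
are not ClamAV's exact definitions: because the header now holds
pointers rather than living in one contiguous mmap'd region, a nested
scan can duplicate just the header (with a new offset and size) while
sharing the underlying map.

```c
#include <stdlib.h>
#include <stdint.h>

typedef struct example_fmap {
    uint64_t offset;  /* view offset into the shared data */
    size_t   len;     /* view length */
    uint64_t *bitmap; /* page-tracking bitmap: a pointer, no longer inline */
    void     *data;   /* mmap'd region: shared, never duplicated */
} example_fmap;

/* Duplicate only the header; the bitmap and data stay shared. */
static example_fmap *fmap_duplicate(const example_fmap *src,
                                    uint64_t offset, size_t len)
{
    example_fmap *dup = malloc(sizeof(*dup));
    if (!dup)
        return NULL;
    *dup = *src;
    dup->offset = src->offset + offset;
    dup->len    = len;
    return dup;
}
```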
In addition to the above changes, this commit fixes:
- fmap recursion tracking for cli_scandesc()
- a recursion tracking issue in cli_scanembpe() error handling
Signature alerts on content extracted into a new fmap such as normalized
HTML resulted in checking FP signatures against the fmap's hash value
that was initialized to all zeroes, and never computed.
This patch will enable FP signatures of normalized HTML files or
other content that is extracted to a new fmap to work. This patch
doesn't resolve the issue that most users will write FP signatures
targeting the original file, not the normalized file, and thus won't
really see benefit from this bug-fix.
Additional work is needed to traverse the fmap recursion lists and
FP-check all parent fmaps when an alert occurs. In addition, the HTML
normalization method of temporarily overriding the ctx->fmap instead of
increasing the recursion depth and doing ctx->fmap++/-- will need to be
corrected for fmap reverse recursion traversal to work.
ClamAV doesn't handle the compressed attribute for HFS+ file catalog
entries.
This patch adds support for FLATE compressed files.
To accomplish this, we had to find and parse the root/header node
of the attributes file, if one exists. Then, parse the attribute map
to check if the compressed attribute exists. If compressed, parse the
compression header to determine how to decompress it. Support is
included for both inline compressed files as well as compressed
resource forks.
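For reference, a sketch of the compression header carried in the
"com.apple.decmpfs" attribute (fields are little-endian; the struct name
is illustrative and on-disk packing directives are omitted). The
compression_type field distinguishes inline data stored in the attribute
itself from data stored in a compressed resource fork.

```c
#include <stdint.h>

struct decmpfs_header {
    uint32_t magic;             /* 'cmpf' */
    uint32_t compression_type;  /* e.g. 3 = zlib inline, 4 = zlib resource fork */
    uint64_t uncompressed_size; /* size of the file once inflated */
};
```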
Inflating inline compressed files is straightforward.
Inflating a compressed resource fork requires more work:
- Find location and size of the resource.
- Parse the resource block table.
- Inflate and write each block to a temporary file to be scanned.
Additional changes needed for this work:
- Make hfsplus_fetch_node work for both catalog and attributes.
- Figure out node size.
- Handle nodes that span several blocks.
- If the attributes are missing, or invalid, extraction continues.
This behavior is to support malformed files which would also
extract on macOS and perhaps other systems.
This patch also:
- Adds filename extraction for the hfs+ parser.
- Skips embedded file type detection for GPT image file types. This
prevents double extraction of embedded files, or misclassification
of GPT images as MHTML, for example. This resolves bb12335.
The PDF parser currently prints verbose error messages when attempting
to shrink a buffer down to the actual data length after decoding, if it
turns out that the decoded stream was empty (0 bytes). Aside from the
verbose error messages, there's no real behavioral issue.
This commit fixes the issue by checking if any bytes were decoded before
attempting to shrink the buffer.
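A minimal sketch of the check (names hypothetical, not the actual PDF
parser code):

```c
#include <stdlib.h>
#include <stdint.h>

/* Shrink the decode buffer to the decoded length, but only when the
 * decoder actually produced bytes; an empty stream is left alone. */
static uint8_t *shrink_decoded(uint8_t *buf, size_t decoded_len)
{
    uint8_t *tmp;
    if (decoded_len == 0)
        return buf;
    tmp = realloc(buf, decoded_len);
    return tmp ? tmp : buf; /* on failure, keep the original buffer */
}
```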
Scans performed in the RTC SCAN_CLEANUP macro by the state.cb_end()
callback function never save the return value and thus fail to record a
detection. This patch sets `ret` so the detection isn't lost.
These opcodes specify a function or keyword by number
instead of by name. The corresponding lookup tables
still have a few entries without names, but the majority
of them have been determined and verified.
The PROFILE_HASHTABLE preprocessor definition can be set at build
time and is intended to be used to enable profiling capabilities
for developers working with hash table and set data structure
profiling. This hashtable profiling functionality was added into
the code a while back and isn't currently functional, but would
ultimately be nice to have. This commit is a first step towards
getting it working.
When PROFILE_HASHTABLE is set, it causes several counters used for
collecting performance metrics to be inserted into the core hashtable
structures. When PROFILE_HASHTABLE is not set, however, these
counters are omitted, and the other members of the structure only
ever contain constant data. I'm guessing that at some point, as an
optimization in the latter case, ClamAV began declaring the hashtable
structures `const`, causing gcc (and maybe other compilers) to put
the structures in the read-only data section. Thus, the code
crashes when PROFILE_HASHTABLE is defined and the counters in the
read-only data section try to get incremented. The fix for this is
to just not mark these structures as `const` if PROFILE_HASHTABLE
is defined.
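A sketch of the fix (the macro name is illustrative): only declare the
tables const when the profiling counters are compiled out, so that
incrementing a counter can never fault on a read-only data section.

```c
#ifdef PROFILE_HASHTABLE
#define HTABLE_CONST /* counters are mutated, so the table must be writable */
#else
#define HTABLE_CONST const /* no counters: the compiler may use .rodata */
#endif

struct example_hashtable {
    unsigned capacity;
#ifdef PROFILE_HASHTABLE
    unsigned long lookups, collisions; /* profiling counters */
#endif
};

static HTABLE_CONST struct example_hashtable example_table = {1024};
```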
Disable line wrap when printing the progress bar so that small terminal
windows do not see excessive lines printed.
Reduce the number of characters in the progress bar to accommodate
80-character-wide terminals.
Correctly display the number of kibibytes (KiB) in the progress bar.
Previously it was showing the number of MiB but printing "KiB".
Removed a problematic call to convert file descriptors to filepaths.
Added filename and tempfile names to scandesc calls in clamd.
Added a general scan option to treat the scan engine as unprivileged,
meaning that the scan engine will not have read access to the file.
Added a check to drop a temp file for RARs when we don't have
read access to the filepath provided (i.e., unprivileged is set, or the
access() check fails).
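An illustrative sketch of that decision (helper name hypothetical):

```c
#include <unistd.h>

/* Fall back to dumping a temp file when the RAR library can't be handed
 * a readable path: no path at all, unprivileged mode, or access() fails. */
static int need_tempfile(const char *filepath, int unprivileged)
{
    return !filepath || unprivileged || access(filepath, R_OK) != 0;
}
```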