scan_common must be passed either an fmap (map) or a file
descriptor (desc) corresponding to the file being scanned.
In the case where map is NULL, scan_common will create an
fmap in order to execute the BC_PRECLASS bytecode hook, but
this fmap wasn't being unmapped afterward.
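A minimal sketch of the intended lifecycle is shown below. The helper names
and signatures are simplified stand-ins, not the actual libclamav prototypes;
the point is that an fmap created locally from the descriptor must be
released once the hook has run.

```c
#include <stddef.h>

/* Stand-ins for libclamav's fmap type and helpers (illustrative only). */
typedef struct fmap_stub fmap_t;
fmap_t *fmap_from_desc(int desc);       /* hypothetical: build an fmap from a descriptor */
void    release_fmap(fmap_t *map);      /* stands in for funmap() */
int     run_preclass_hook(fmap_t *map); /* stands in for the BC_PRECLASS hook */

int scan_common_sketch(int desc, fmap_t *map)
{
    fmap_t *local_map = NULL;
    int ret;

    if (map == NULL) {
        /* No fmap supplied: create one from the descriptor. */
        local_map = fmap_from_desc(desc);
        if (local_map == NULL)
            return -1;
        map = local_map;
    }

    ret = run_preclass_hook(map);

    /* The fix: unmap the fmap only if it was created here. */
    if (local_map != NULL)
        release_fmap(local_map);

    return ret;
}
```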
Fixed a copy-paste bug where duplicated fmap names were assigned to the
parent instead of the dup/child fmap.
Fixed file descriptor initialization issue in the HTML normalizer.
Disables run-time warning messages emitted by libxml2 when parsing
HTML email content for the JSON metadata feature.
Fixed a compile-time warning caused by libjson-c API changes from int to
size_t.
At present many parsers create tmp subdirectories to store extracted
files. For parsers like the vba parser, this is required as the
directory is later scanned. For other parsers, these subdirectories are
probably not helpful now that we provide recursive sub-dirs when
--leave-temps is enabled. It's not quite as simple as removing the extra
subdirectories, however. Certain parsers, like autoit, don't create
sufficiently unique filenames and would cause file name collisions when
--leave-temps is not enabled.
The best thing to do would be to make sure each parser uses unique
filenames and doesn't rely on cli_magic_scan_dir() to scan extracted
content before removing the extra subdirectory. In the meantime, this
commit gives the extra subdirectories meaningful names to improve
readability.
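As a rough illustration of the "meaningful names" idea, a parser can prepend
its own prefix to the tmp subdirectory it creates; the prefix and helper
below are hypothetical, not the exact names used in libclamav.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build a tmp subdirectory whose name carries a parser-specific prefix
 * (e.g. "vba-tmp") instead of an anonymous random name. */
static int make_named_tmp_subdir(const char *tmpdir, const char *prefix,
                                 char *out, size_t outlen)
{
    if (snprintf(out, outlen, "%s/%s", tmpdir, prefix) >= (int)outlen)
        return -1;           /* path would be truncated */
    return mkdir(out, 0700); /* 0 on success, -1 on failure */
}
```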
This commit also:
- Provides the 'bmp' prefix for extracted PE icons.
- Removes empty tmp subdirs when extracting rtf files, to eliminate
clutter.
- The PDF parser sometimes creates tmp files when decompressing streams
before it knows if there is actually any content to decompress. This
resulted in a large number of empty files. While it would be best to
avoid creating empty files in the first place, that's not quite as simple
as it sounds. This commit does the next best thing and deletes the
tmp files if nothing was actually extracted, even if --leave-temps is
enabled.
- Removes the "scantemp" prefix for unnamed fmaps scanned with
cli_magic_scan(). The 5-character hashes given to tmp files with
prefixes resulted in occasional file name collisions when extracting
certain file types with thousands of embedded files.
- The VBA and TAR parsers mistakenly used NAME_MAX instead of PATH_MAX,
resulting in truncated file paths and failed extraction when
--leave-temps is enabled and a lot of recursion is in play. This commit
switches them from NAME_MAX to PATH_MAX.
HTML normalization creates a tmp directory for storing rfc2397-style
links. The vast majority of HTML does not make use of rfc2397, and thus
an excess of empty tmp directories is generated. This commit alters the
behavior to create the rfc2397 directory only when it is required and
does not already exist.
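A sketch of the lazy-creation behavior, assuming a hypothetical helper (this
is not the actual htmlnorm code): the directory is created the first time a
data: ("rfc2397") link is seen, and only if it does not already exist.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>

/* Create the rfc2397 output directory on first use only. */
static int ensure_rfc2397_dir(const char *dirname, int *created)
{
    if (*created)
        return 0; /* already created for this scan */

    if (mkdir(dirname, 0700) != 0 && errno != EEXIST)
        return -1; /* genuine failure */

    *created = 1;
    return 0;
}
```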
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned, whether it's "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.
| current name | new name |
| ------------------------- | --------------------------------- |
| magic_scandesc() | cli_magic_scan() |
| cli_magic_scandesc_type() | <delete> |
| cli_magic_scandesc() | cli_magic_scan_desc() |
| cli_base_scandesc() | cli_magic_scan_desc_type() |
| cli_partition_scandesc() | <delete> |
| cli_map_scandesc() | magic_scan_nested_fmap_type() |
| cli_map_scan() | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc() | cli_magic_scan_buff() |
| cli_scanbuff() | cli_scan_buff() |
| cli_scandesc() | cli_scan_desc() |
| cli_fmap_scandesc() | cli_scan_fmap() |
| cli_scanfile() | cli_magic_scan_file() |
| cli_scandir() | cli_magic_scan_dir() |
| cli_filetype2() | cli_determine_fmap_type() |
| cli_filetype() | cli_compare_ftm_file() |
| cli_partitiontype() | cli_compare_ftm_partition() |
| cli_scanraw() | scanraw() |
The metadata properties JSON structure isn't recording file types found
embedded within a file such as self-extracting (SFX) types and office
document types (DOCX, PPTX, etc). This presents a problem...
At present there's no way to know if the current file has ended and a
new file is found tacked on to the end of the first file. If there
were, we could simply check if the type found by the raw-scan exists
within the first file, or after.
If the type is found within the first file and it is an archive type,
then it's reasonable to conclude we're either observing zip headers (for
SFXZIP detections) or other files that are not compressed.
If the type ISN'T found within the first file, then we definitely have a
whole new file to parse, and we should do so with cli_magic_scan()
rather than only using these embedded type scanners.
At present we can't ignore SFXZIP detections even if the original file
type is a ZIP because we may have found two ZIPs appended together to
evade detection (a legitimate trick). As a consequence, we will
effectively parse every zip entry twice. The same issue applies to
types found within non-compressed archives.
This commit adds an EmbeddedObjects list to the metadata JSON object so
that the existence of these types is noted.
Additionally, this commit removes the two-part int64 cli_jsonint64()
implementation as json_object_new_int64() should be available
everywhere and the macro to detect such support was never set.
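As an illustration of the idea (the key names and structure here are
assumptions, not necessarily the exact metadata keys libclamav emits), an
embedded type can be appended to an "EmbeddedObjects" array using the
json-c API directly, including json_object_new_int64() for offsets:

```c
#include <json-c/json.h>
#include <stdint.h>
#include <stdio.h>

/* Record an embedded file type in an "EmbeddedObjects" array. */
static void record_embedded_type(json_object *metadata,
                                 const char *type, int64_t offset)
{
    json_object *list = NULL;

    if (!json_object_object_get_ex(metadata, "EmbeddedObjects", &list)) {
        list = json_object_new_array();
        json_object_object_add(metadata, "EmbeddedObjects", list);
    }

    json_object *entry = json_object_new_object();
    json_object_object_add(entry, "FileType", json_object_new_string(type));
    /* json_object_new_int64() is used directly now that the two-part
     * cli_jsonint64() fallback has been removed. */
    json_object_object_add(entry, "Offset", json_object_new_int64(offset));
    json_object_array_add(list, entry);
}

int main(void)
{
    json_object *metadata = json_object_new_object();
    record_embedded_type(metadata, "CL_TYPE_ZIPSFX", 4096);
    printf("%s\n",
           json_object_to_json_string_ext(metadata, JSON_C_TO_STRING_PRETTY));
    json_object_put(metadata);
    return 0;
}
```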
A way is needed to record scanned file names for two purposes:
1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.
2. File names are useful when interpreting tmp directory output when
using the --leave-temps option.
This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.
To store the names in fmaps, an optional name argument has been added to
any internal scan APIs that create fmaps, and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required. The zip and gpt parsers required some modification to record
file names. The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.
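The sketch below illustrates the idea of carrying an optional name in the
fmap header; the struct and helper are illustrative stand-ins, not the real
libclamav definitions.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for an fmap header with an optional name. */
typedef struct example_fmap {
    /* ... existing fmap fields ... */
    char *name; /* original file name, or NULL if none is known */
} example_fmap;

static int example_fmap_set_name(example_fmap *m, const char *name)
{
    if (name == NULL) {
        m->name = NULL; /* e.g. in-memory buffers with no file name */
        return 0;
    }
    m->name = strdup(name);
    return (m->name != NULL) ? 0 : -1;
}
```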
Also:
- Added recursive extraction to the tmp directory when the
--leave-temps option is enabled. When not enabled, the tmp directory
structure remains flat so as to reduce the likelihood of exceeding
MAX_PATH. The current tmp directory is stored in the scan context.
- Made the cli_scanfile() internal API non-static and added it to
scanners.h so it would be accessible outside of scanners.c in order to
remove code duplication within libmspack.c.
- Added function comments to scanners.h and matcher.h
- Converted the TDB-type macros and LSIG-type macros to enums for improved
type safety.
- Converted more return status variables from `int` to `cl_error_t` for
improved type safety, and corrected ooxml file typing functions so
they use `cli_file_t` exclusively rather than mixing types with
`cl_error_t`.
- Restructured the magic_scandesc() function to use goto's for error
handling and removed the early_ret_from_magicscan() macro and
magic_scandesc_cleanup() function. This makes the code easier to
read and made it easier to add the recursive tmp directory cleanup to
magic_scandesc().
- Corrected zip, egg, rar filename extraction issues.
- Removed use of extra sub-directory layer for zip, egg, and rar file
extraction. For Zip, this also involved changing the extracted
filenames to be randomly generated rather than using the "zip.###"
file name scheme.
This commit improves the layout of the tmp file output and the JSON
metadata output when using the --leave-temps and --gen-json options.
For all scans, each scan target will get a unique tmp sub-directory. If
using --leave-temps, that subdir will include the basename of the
original file to make it easier to identify. Additionally, when using
the --leave-temps option, all extracted objects will be written to
recursive subdirectories, with filename prefixes where available. When
not using the --leave-temps option, the
layout of the tmp sub-directory will remain flat, so as to alleviate the
possibility of exceeding PATH_MAX.
The JSON metadata generated by the --gen-json option is now generated
for all file types, not just a select few. The format is also
pretty-printed for readability and now includes filenames and file paths
when available.
Also:
- Added missing ALLMATCH check when determining if bytecode hooks should
be run.
- Added cl_engine_get_str API to windows libclamav symbol export file.
A missing return statement in png.c for a function that should return a
status code is resulting in undefined behavior.
In this patch I also added ".PNG" to one of the new heuristic signatures
to match the others.
Add missing size checks to validate size data parsed from a VBA file.
This fixes a possible buffer overflow read that was caught by oss-fuzz
before it made it into any release.
Fix for an out-of-bounds read in the PDF parser when initializing
aes crypto routines that may result in a crash.
Bug found by OSS-Fuzz.
Also added checks for the arc4 init routine to mitigate the risk of a
similar issue.
Fix for an out-of-bounds read in the ARJ parser accidentally introduced
when adding text normalization and bound checking when parsing filename
and comment fields from file headers.
On some systems, the VirusEvent feature doubles the amount of RAM being used
because fork() duplicates the loaded signature database in the new process.
This commit changes fork() to vfork() so that VirusEvent won't fail if these systems
don't have enough memory.
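A minimal sketch of the fork()-to-vfork() change, assuming a shell-based
command invocation (illustrative, not clamd's exact VirusEvent code): with
vfork(), the child borrows the parent's address space until it calls exec,
so the loaded signature database is not duplicated. waitpid() handling and
other details are omitted.

```c
#include <sys/types.h>
#include <unistd.h>

/* Run an external command without duplicating the parent's memory. */
static pid_t run_virusevent_sketch(const char *command, char *const envp[])
{
    pid_t pid = vfork();

    if (pid == 0) {
        /* Child: only exec*() or _exit() may be called before the exec. */
        execle("/bin/sh", "sh", "-c", command, (char *)NULL, envp);
        _exit(1); /* exec failed */
    }
    return pid; /* parent may waitpid() on this */
}
```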
Fixes a shell compatibility issue with string comparisons in the
clamonacc and libclamav-only M4 files:
test(1) uses `=` for string equality. (`==` is a bashism)
XLM is a macro language in Excel that was used before VBA (before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.
This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.
The code is based on Didier Stevens' plugin_biff for oletools.py.
Fixes a bug in the PtrVerifier pass when using LLVM >= v3.5 for the
bytecode signature runtime.
LLVM 3.5 changed the meaning of "use" and introduced "user". This fix
swaps out "use" keywords for "user" so the code functions correctly when
using LLVM 3.5+.
Add the credit card-only DLP option "StructuredCCOnly" to the win32
sample clamd config.
Also update NEWS.md to credit John Schember and Alexander Sulfrian for
the DLP CC-only mode contribution.
An integer overflow causes an out-of-bounds read that results in
a crash. The crash may occur when using the optional
Data-Loss-Prevention (DLP) feature to block content that contains credit
card numbers. This commit fixes the issue by using a signed index variable.
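The sketch below illustrates the class of bug, not the actual dlp.c code:
with an unsigned loop index, the "went below zero" condition can never be
observed, so the index wraps and the read goes out of bounds; a signed
index terminates correctly.

```c
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Count trailing digits, scanning backwards through the buffer. */
static int count_trailing_digits(const char *buf, size_t len)
{
    int digits = 0;

    /* With "size_t i", the condition i >= 0 is always true and the loop
     * would wrap past zero; ssize_t makes termination well-defined. */
    for (ssize_t i = (ssize_t)len - 1; i >= 0; i--) {
        if (buf[i] < '0' || buf[i] > '9')
            break;
        digits++;
    }
    return digits;
}
```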
Add Data-Loss-Prevention option to detect credit cards only, excluding
debit and private label cards where possible.
You can select the credit card-only DLP mode for clamscan with the
`--structured-cc-mode` command-line option.
You can select the credit card-only DLP mode for clamd with the
`StructuredCCOnly` clamd.conf config option.
This patch also adds credit card matching for additional vendors:
- Mastercard 2016
- China Union Pay
- Discover 2009
Adds LZMA and BZip2 decompression routines to the bytecode API.
The ability to decompress LZMA and BZip2 streams is particularly
useful for bytecode signatures that extend clamav executable
unpacking capabilities.
Of note, the LZMA format is not well standardized. This API
expects the stream to start with the LZMA_Alone header.
Also fixed a bug in LZMA dictionary size setting.
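For reference, the LZMA_Alone (.lzma) header is 13 bytes: one properties
byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian
uncompressed size (all 0xFF bytes meaning "unknown"). The parsing sketch
below is illustrative, not the bytecode API's implementation.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t  properties;        /* (pb * 5 + lp) * 9 + lc, must be < 225 */
    uint32_t dict_size;
    uint64_t uncompressed_size; /* UINT64_MAX == unknown / stream-terminated */
} lzma_alone_header;

static int parse_lzma_alone_header(const uint8_t *buf, size_t len,
                                   lzma_alone_header *hdr)
{
    if (len < 13 || buf[0] >= 225)
        return -1;

    hdr->properties = buf[0];
    hdr->dict_size = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
                     ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
    hdr->uncompressed_size = 0;
    for (int i = 0; i < 8; i++)
        hdr->uncompressed_size |= (uint64_t)buf[5 + i] << (8 * i);
    return 0;
}
```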
- Existing VBA extraction code uses undocumented cache structures.
This code uses the documented way of accessing VBA projects.
- Adds additional detail to the dumped information:
Project name, Project doc string, ...
All VBA projects are dumped into a single file.
- Malware authors are currently evading detection by spreading
malicious code over several projects. It is hard to write
signatures if only part of the malicious code is visible.
Fixes an fmap leak in the bytecode switch_input() API. The
switch_input() API provides a way to read from an extracted file instead
of reading from the current file. The issue is that the current
implementation fails to free the fmap created to read from the extracted
file on cleanup or when switching back to the original fmap. In
addition, it fails to use the cli_bytecode_context_setfile() function
to restore the file_size in the context for the current fmap.
Fixes a couple of fmap leaks in the unit tests.
Specifically, this fixes the use of cli_map_scandesc().
The cli_map_scandesc() function used to override the current fmap
settings with a new size and offset, performing a scan of the embedded
content. This broke the ability to iterate backwards through the fmap
recursion array when an alert occurs to check each map's hash for
whitelist matches.
In order to fix this issue, it needed to be possible to duplicate an
fmap header for the scan of the embedded file without duplicating the
actual map/data. This wasn't feasible with the posix fmap handle
implementation where the fmap header, bitmap array, and memory map
were all contiguous. This commit makes it possible by extracting the
fmap header and bitmap array from the mmap region and instead using
pointers for both the bitmap array and the mmap/data. As a result, the
posix fmap handle implementation ended up working more like the existing
Windows implementation.
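The struct layouts below are illustrative only, not the real fmap
definitions; they show why separating the header from the bitmap and data
makes it possible to duplicate just the header for a nested scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: header, paged-bitmap, and mapped data shared one contiguous
 * mmap'd allocation, so a header could not be duplicated without
 * duplicating the data. */
struct fmap_contiguous_sketch {
    size_t        len;
    unsigned char trailer[]; /* bitmap followed by the mapped bytes */
};

/* After: the header holds pointers, so a second header can reference the
 * same bitmap and data for a nested scan at a different offset/length. */
struct fmap_split_sketch {
    size_t    len;
    size_t    nested_offset;
    uint32_t *bitmap; /* separate allocation */
    void     *data;   /* the mmap'd (or malloc'd) bytes */
};
```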
In addition to the above changes, this commit fixes:
- fmap recursion tracking for cli_scandesc()
- a recursion tracking issue in cli_scanembpe() error handling
Signature alerts on content extracted into a new fmap such as normalized
HTML resulted in checking FP signatures against the fmap's hash value
that was initialized to all zeroes, and never computed.
This patch enables FP signatures of normalized HTML files or other
content that is extracted to a new fmap to work. This patch doesn't
resolve the issue that most people will write FP signatures targeting
the original file, not the normalized file, and thus won't really see a
benefit from this bug-fix.
Additional work is needed to traverse the fmap recursion lists and
FP-check all parent fmaps when an alert occurs. In addition, the HTML
normalization method of temporarily overriding the ctx->fmap instead of
increasing the recursion depth and doing ctx->fmap++/-- will need to be
corrected for fmap reverse recursion traversal to work.
If the clamd.conf enables the LocalSocket option and sets the unix
socket file in a directory that does not exist, clamd creates the
missing directory but with invalid 000 permissions bits, causing socket
creation to fail.
This patch sets the umask temporarily to allow creation of the
directory with drwxrw-rw- (766) permissions.
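A minimal sketch of the fix (not the exact clamd code): clear the umask
around the mkdir() call so the requested mode isn't masked down to 000,
then restore it.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* mkdir() without the current umask masking the requested mode. */
static int mkdir_with_mode(const char *path, mode_t mode)
{
    mode_t old_umask = umask(0);
    int    ret       = mkdir(path, mode);

    umask(old_umask);
    return ret;
}
```

For example, mkdir_with_mode(sockdir, 0766) would yield the drwxrw-rw-
permissions mentioned above.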
ClamAV doesn't handle the compressed attribute for hfs+ file catalog
entries.
This patch adds support for FLATE compressed files.
To accomplish this, we had to find and parse the root/header node
of the attributes file, if one exists. Then, parse the attribute map
to check if the compressed attribute exists. If compressed, parse the
compression header to determine how to decompress it. Support is
included for both inline compressed files as well as compressed
resource forks.
Inflating inline compressed files is straightforward.
Inflating a compressed resource fork requires more work:
- Find location and size of the resource.
- Parse the resource block table.
- Inflate and write each block to a temporary file to be scanned.
Additional changes needed for this work:
- Make hfsplus_fetch_node work for both catalog and attributes.
- Figure out node size.
- Handle nodes that span several blocks.
- If the attributes are missing, or invalid, extraction continues.
This behavior is to support malformed files which would also
extract on macOS and perhaps other systems.
This patch also:
- Adds filename extraction for the hfs+ parser.
- Skips embedded file type detection for GPT image file types. This
prevents double extraction of embedded files, or misclassification
of GPT images as MHTML, for example. This resolves bb12335.
The PDF parser currently prints verbose error messages when attempting
to shrink a buffer down to actual data length after decoding if it turns
out that the decoded stream was empty (0 bytes). Aside from the
verbose error messages, there's no real behavioral issue.
This commit fixes the issue by checking if any bytes were decoded before
attempting to shrink the buffer.
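A sketch of the check (not the actual pdf.c code): only attempt to shrink
the decode buffer when at least one byte was decoded, so the zero-length
shrink path never runs.

```c
#include <stdlib.h>

/* Shrink the decode buffer to the actual decoded length, if any. */
static unsigned char *shrink_decoded(unsigned char *buf, size_t decoded_len)
{
    unsigned char *smaller;

    if (decoded_len == 0)
        return buf; /* nothing decoded; caller discards the empty buffer */

    smaller = realloc(buf, decoded_len);
    return smaller ? smaller : buf;
}
```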
Scans performed in the RTF SCAN_CLEANUP macro by the state.cb_end()
callback function never save the return value and thus fail to record a
detection. This patch sets `ret` so the detection isn't lost.
Fixed a leak where host and port were not being properly cleaned up.
Cleaned up error handling for the make_connection_real function.
Added various null param checks.
A problem existed in which specifying --enable-libclamav-only would fail
if curl was not installed on the system.
This fix puts a check in place to ensure the curl check code is not run
if the option is turned on.
In the future, if curl becomes required in libclamav, this check will
need to be removed.
The newer freshclam uses libcurl for downloads and downloads the
updates via https. There are systems which don't have a "default CA
store" but instead the administrator maintains a CA-bundle of certs
they trust.
This patch allows users to specify their own CA cert path by
setting the environment variable CURL_CA_BUNDLE to the path of their
choice.
Patch courtesy of Sebastian A. Siewior
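Illustrative of the behavior described (not the actual freshclam code):
read CURL_CA_BUNDLE from the environment and, if set, point libcurl at
that bundle instead of the default CA store.

```c
#include <curl/curl.h>
#include <stdlib.h>

/* Honor the CURL_CA_BUNDLE environment variable, if present. */
static void apply_ca_bundle_env(CURL *curl)
{
    const char *bundle = getenv("CURL_CA_BUNDLE");

    if (bundle != NULL && *bundle != '\0')
        curl_easy_setopt(curl, CURLOPT_CAINFO, bundle);
}
```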