clamav

Commit Graph

Author	SHA1	Message	Date
Micah Snyder (micasnyd)	c637de532b	Disable embedded type recognition for disk images. Using file type recognition scan mode for disk images and other raw archive formats is problematic. One simple reason is that the contained files will be detected, parsed, and scanned twice: first when detected by the type recognition scan, and later when the archive is extracted and the files are properly scanned. Another reason is an increased likelihood of incorrect type recognition, as seen with supposed MHTML files (they weren't) found in GPT disk images. Though a previous patch disabled embedded type recognition for GPT files, this one extends it to the following types: CL_TYPE_CPIO_OLD, CL_TYPE_ZIP, CL_TYPE_OLD_TAR, and CL_TYPE_POSIX_TAR. ZIP is included because file entries in a ZIP are incorrectly detected as ZIPSFXs. Though we also ensure not to scan ZIPSFXs found in ZIPs, it's more efficient not to do the type recognition in the first place, and it prevents us from adding those bogus ZIPSFX entries to the scan properties JSON. This patch also fixes what appears to be a copy-paste typo, where CL_TYPE_ISHIELD_MSI types were accidentally having their container value set to CL_TYPE_AUTOIT.	5 years ago
Micah Snyder	5d7e54c0bf	Code review fixes. Exit early from VBA scanning loop if virus found. Add VBA/XLM suffix to ContainsMacros heuristics. Fix setting status code for error and virus conditions. Increment/decrement recursion counter when scanning vba dir.	5 years ago
Micah Snyder	07a66adc75	Fix bug added in previous patch, fixup unit tests to use newly added sanitized_basename parameter.	5 years ago
Micah Snyder	860764eb16	Heuristic macro detection for imp VBA extraction. Notably, the commit adds a heuristic alert when VBA is extracted using the new VBA extraction code and similarly adds "HasMacros":true to the JSON scan properties. In addition, the cli_sanitize_filepath() function was changed so it converts POSIX path separators to Windows path separators on Windows and also outputs an optional sanitized basename pointer. That pointer is used when generating a temporary filename, so that a prefix containing path separators won't cause file creation failures (observed with --leave-temps, where original filenames are incorporated into temporary filenames). Included some error handling improvements for cli_vba_scandir() to better track alert and macro detections. Downgraded UTF-8 conversion error messages to debug messages because they are too verbose in files with invalid filenames (observed in some malware). Changed the XLM macro and VBA project temp filenames to include "xlm_macros" and "vba_project" prefixes, to make them easier to find. Relocated XLM and VBA temp files from the top-level tmp directory to the current sub_tmpdir, so tempfiles for a given scan are more organized.	5 years ago
Micah Snyder	b1dbf93f0b	Fix newly introduced VBA/XLM OLE2 bugs. Fix an infinite loop in the new XLM macro parser. Fix error handling and resource cleanup in the OLE2 parser. Fix issues tracking detected "viruses" in the VBA & OLE2 parsers affecting non-allmatch (regular) scan mode, wherein multiple viruses may be found but each record is lost and the overall detection comes up clean. Also silence the switch() fall-through warning for WORD/PPT/XL/HWP (OOXML) file types that fall through to the ZIP parser (because they are zips). Also silence the switch() fall-through warning when handling the limits-exceeded error types, checking for the limits-exceeded heuristic, and continuing on to bail out with a clean verdict.	5 years ago
Micah Snyder	78c8cb0fe1	fuzz-20756: Fix memory leak in mail parser. Fixes at least one memory leak in the mail parser caused by improper cleanup of multipart messages.	5 years ago
Micah Snyder	e0132def09	Formatting touchup	5 years ago
Mickey Sola	9ea3b93018	Recurse all fmaps when doing fpchecks. Changes cli_checkfp_virus to a recursive function which checks all parent fmaps in the context for false positives. Simplifies the params needed for cli_checkfp_virus by using the current digest and fmap length that reside within the fmap struct itself.	5 years ago
Micah Snyder	e409920298	Fix assorted warnings. Add missing ping_clamd() declaration in client.h. Fix check for ping option to first check if the ping option is NULL before strdup'ing it and checking if the alloc failed. Fix format string for uint64_t print. Correctly assign name pointer to stack buffer in cpio parser. Remove vestigial variables from the insert_list() function in matcher-ac.c, left over from before the load-time optimizations completely restructured everything. Silence warnings about unused parameters in the progress bar callback function.	5 years ago
Micah Snyder	42ac62edd2	HTML-normalization valgrind fix for Alpine. Valgrind reports an uninitialized `tag` stack buffer being used. While this appears to be a false positive, it can't hurt to initialize this and similar buffers in this function.	5 years ago
Micah Snyder (micasnyd)	b35d1e1bec	fuzz-24354: Fix unknown read in VBA parser. Fixes bounds checks in recently rewritten VBA parser code (i.e. the issue does not affect prior versions). Also improves VBA terminator header parsing to better match the spec, per recommendation by Jonas Zaddach.	5 years ago
Micah Snyder (micasnyd)	205d8dcd6e	fuzz-24408: Fix NULL-deref bug in PDF parser. Fixes a NULL-dereference bug recently added when improving support for PDF decryption. Issue does not affect prior versions.	5 years ago
Andy Ragusa (aragusa)	3c556dc1a2	Fixed pdf timeout. The function to parse the object dictionaries fails if the dictionary ends during a comment string. Set the end in that condition.	5 years ago
Jan Smutny	b33e4be3ea	zip: Fix false negative w. HeuristicScanPrecedence	5 years ago
Micah Snyder	e2f59af30a	Clang-format touchup	5 years ago
Micah Snyder (micasnyd)	8db5fcae6f	Add unit tests for conv to UTF-8. Also relocated codepage table from msdoc.h to entconv.h. Also adds new macros for codepages to reduce use of magic numbers when referencing code pages elsewhere in libclamav.	5 years ago
Micah Snyder (micasnyd)	244ff86cad	XLM: Fix Coverity memory corruption warning. 294429: Negative check for fd_out occurs after a call to fdopen where the value must not be negative. Coverity interprets this as a high-severity issue, even though it really isn't. Removing the needless check should silence the false positive.	5 years ago
Andy Ragusa (aragusa)	0b0fc4bb92	Fix build issue with newer versions of Bison. Somewhere between Bison versions 3.5 and 3.6.4, having multiple characters in a single-quoted string went from a warning to an error. This commit corrects that by changing a literal to a string.	5 years ago
Micah Snyder	1db4787f8a	Remove autotools generated files, add autogen.sh. Removed all autotools generated files. Autotools (autoconf, automake, libtool, pkg-config, m4) will be required from now on for builds from git clones. Added autogen.sh to be run before ./configure. Significant update to main .gitignore file. Removed extraneous .gitignore files. A Git repository only needs one .gitignore file.	5 years ago
Micah Snyder	9b03090a0a	Autotools compatibility fixes. Fixes breaking issues when using autoconf 2.69 and automake 1.15.	5 years ago
lutianxiong	38622da97f	Fix int64 overflow check. Overflow check "(value >> 32) * 10 < INT32_MAX" may not work in certain conditions, e.g. when value is 0xcccccccdbcdc9cc. Note: This fixes oss-fuzz bug 16117.	5 years ago
Clement Lecigne	4c96f017f9	pdf: do not override pdf->fileIDlen if there is no new fileID.	5 years ago
Micah Snyder (micasnyd)	6198778903	Additional XLM parser error handling fixes. Improve error handling for functions that read the XLM BIFF temp-files. Improve resource cleanup to alleviate Coverity false positive issue.	5 years ago
Andrew	319bfb51a5	Fix several Coverity warnings. 290424 Missing break in switch - In hash_match: Missing break statement between cases in switch statement. 290414 Resource leak - In cli_scanishield_msi: Leak of memory or pointers to system resources. Memory leak in a fail case. 288197 Resource leak - In decrypt_any: Leak of memory or pointers to system resources. Memory leak in a fail case. 290426 Resource leak - In cli_magic_scan: Leak of memory or pointers to system resources. Leaked a file prefix when running with --save-temps. 192923 Resource leak - In cli_scanrar: Leak of memory or pointers to system resources. Leaked a file descriptor if a virus was found in a RAR file comment. 225146 Resource leak - In cli_scanegg: Leak of memory or pointers to system resources. Leaked a file descriptor if unable to write a comment file to disk. 290425 Resource leak - In scan_common: Leak of memory or pointers to system resources. Memory leaks in various fail cases. Also changes cli_scanrar to write out the file comment only if --leave-temps is specified and to scan the buffer (like what is done in cli_scanegg) instead of writing the file out, scanning that, and then deleting the file if --leave-temps is not specified. The unit tests stopped working after correcting an issue with a switch statement that determined what type of signature had matched on a Google SafeBrowsing GDB rule. Looking into the unit tests, it looks like the code had always assumed that the test cases would be detected by a malware test rule in unit_tests/input/daily.gdb, but now some of the tests get matched on the phishing test rule. I updated the test logic to be clearer and added tests for both cases. Fix some memory leaks in libclamav/scanners.c.	5 years ago
Micah Snyder (micasnyd)	e830b45ca7	Fix uninitialized name buffer in CPIO parser. Fixes a possible stack buffer overflow introduced in 0.103 development when we added optional names to file maps (fmaps). The CPIO parser uses a stack buffer to store the name (if present). If no name was present, the stack buffer was passed uninitialized to the fmap scanning function, which could cause an overflow. This fix both initializes the buffer and uses a pointer so the scan function gets NULL instead of a buffer when a name isn't present, as that's the intended way to use the API, rather than passing an empty string name buffer.	5 years ago
Andy Ragusa	65e3394aa6	Fixed uninitialized variable warning found by coverity	5 years ago
Andy Ragusa (aragusa)	2049078622	fuzz-22348: null deref in egg utf8 conversion. Corrected memory leaks and a null dereference in the egg utf8 conversion.	5 years ago
Micah Snyder (micasnyd)	8081a6b06c	Fix new XLM parser stack overflow. Fixes a stack overflow that resulted in stack corruption and general mayhem. This bugfix only applies to the 0.103 dev branch. The issue was caused by buffering formatted XLM macro content to a small buffer without regard for possible overflow. Instead of buffering manually, the use of snprintf and later cli_writen was replaced with direct calls to fwrite / fprintf / fputc.	5 years ago
Micah Snyder (micasnyd)	16c6e3740f	bb12569: Fix json-c include path issue. Fix include path issue for systems that don't have /usr/local/include in their default include path.	5 years ago
Andy Ragusa (aragusa)	305df4091a	fuzz-22211: correct fix to heap over-read in ARJ parser. The ARJ parser fix from 0.102.3 was insufficient and still allowed for some overflow. This commit passes correct buffer sizes into text_normalize functions.	5 years ago
Micah Snyder	cd2f2975b9	Docs: Warn against running untrusted bytecode. Add notices to man pages and help strings cautioning against running bytecode signatures from untrusted sources. Also adds missing BytecodeUnsigned option to clamd.conf.sample files.	5 years ago
Micah Snyder (micasnyd)	407407c98c	clamd clients: Mitigate move/remove symlink attack. A malicious user could replace a scan target's directory with a symlink to another path to trick clamscan, clamdscan, or clamonacc into removing or moving a different file (e.g. a critical system file). The issue would affect users that use the `--move` or `--remove` options for clamscan, clamdscan, and clamonacc. This patch gets the real path for the scan target before the scan. If the file alerts and the `--move` or `--remove` quarantine features are used, it mitigates the symlink attack by traversing the path one directory at a time until reaching the leaf directory where the scan target file resides, and then unlinking (or renaming) the file directly. For Windows builds, this commit applies a similar tactic to the one used in the previous commit, using the Win32 Native API to traverse a path and delete or move files by handle rather than by file path. I had some trouble using SetFileInformationByHandle to rename a file by handle, so on Windows it will instead copy the file to the new location and then use the safe unlink technique to remove the old file. If the symlink attack occurs, the unlink will fail, and the system will not be damaged. For more information about AV quarantine attacks using links, see the [RACK911 Lab's report](https://www.rack911labs.com/research/exploiting-almost-every-antivirus-software).	5 years ago
Micah Snyder (micasnyd)	cdbc833a32	PDF: Delay Javascript detection until JS found. PDFs may contain Javascript actions in objects. Those may actually be indirect references to other objects where the Javascript resides, rather than storing the Javascript in the current object. The thing is, sometimes the indirect object is empty (no actual script), in which case the PDF may not have any active content. This commit changes the Javascript detection logic so it only records the stats/JSON after detecting the "Javascript" and "JS" tags in the object (indirect or direct) where the Javascript is supposed to reside. This moves the logic out of an action callback, invoked when the object dictionary key "Javascript" is detected, and into the object extraction logic, after all of the objects have been parsed.	5 years ago
Andrew	f19f69c7ee	Comment out the filter_search call in regex_list_match. Reviewing Coverity bug reports, we found that the return value of this filter_search call was effectively being ignored, causing no filtering to occur. Fixing this issue caused a unit test that uses the following match list regex to fail when searching for `ebay.com`: .+\\.paypal\\.(com|de|fr|it)([/?].*)?:.+\\.ebay\\.(at|be|ca|ch|co\\.uk|de|es|fr|ie|in|it|nl|ph|pl|com(\\.(au|cn|hk|my|sg))?)/ After further investigation, this is because the regex_list_add_pattern call, which parses the regex for suffixes and attempts to add these to the filter, can't handle the `com(\\.(au|cn|hk|my|sg))?` portion of the regex. As a result, it only adds `ebay.at`, `ebay.be`, `ebay.ca`, up through `ebay.pl` into the filter. With the code returning if no filter match is found, the `ebay.com` suffix not existing in the filter causes incoming URLs to be treated as if there are no corresponding regexes for ebay.com, which results in no regex rules being evaluated against them. We should get the regex parsing code working (and ensure it handles any other complex cases in daily.cdb) before re-enabling this code. The code has had no effect for 12+ years at this point, though, so it's probably safe to wait a bit longer without it.	5 years ago
Andrew	2429b8dfa7	More Coverity bug fixes. Fixed the following Coverity issues: - 225236 - In cli_egg_extract_file: Dereference of an explicit null value (CWE-476). The first fail case checked handle for NULL and then dereferenced it in the done block. - 225209 - In executeIfNewVersion: Leak of memory or pointers to system resources (CWE-404). modifiedCommand was defined twice, with the inner instance being assigned to and the outer instance being freed. - 225201 - In regex_list_match: Code can never be reached because of a logical contradiction (CWE-561). The code had logic off to the side that may have been missed: filter_search_rc = filter_search(&matcher->filter, (const unsigned char *)bufrev, buffer_len) != -1; if (filter_search_rc == -1) { - 225198 - In phishingCheck: Leak of memory or pointers to system resources (CWE-404). A fail case caused by malloc failing would leak previously allocated memory. - 225197 - In updatecustomdb: A pointer to freed memory is dereferenced, used as a function argument, or otherwise used (CWE-416). In a fail case, a pointer was freed and then used in a debug print statement. - 225190 - In updatedb: A pointer to freed memory is dereferenced, used as a function argument, or otherwise used (CWE-416). In a fail case, a pointer was freed and then used in a debug print statement. - 225195 - In cli_egg_open: The sizeof operator is used on a wrong argument that incidentally has the same size (CWE-467). sizeof(char ) was being used instead of sizeof(char ) - 225193 - In egg_parse_comment_header: Code can never be reached because of a logical contradiction (CWE-561). A cleanup case for the variable comment was unnecessary; to fix this, comment was removed entirely. - 225147 - In get_server_node: Code can never be reached because of a logical contradiction (CWE-561). A cleanup case for the variable url was unnecessary. - 225168 - In download_complete_callback: Missing break statement between cases in switch statement (CWE-484). In the case where forking failed, freshclam would check the database without forking but then continue on to execute the code intended to be done in the child process because of a missing break statement. - 225152 - In cli_egg_lzma_decompress: Use of an uninitialized variable (CWE-457). Certain fail cases would call cli_LzmaShutdown on an uninitialized stream. Now it’s only called after initialization occurs.	5 years ago
Andrew	1d66184a7d	More Coverity bug fixes. Looking through the list of issues, I spotted some easy ones and submitted some fixes: - 225229 - In cli_rarload: Leak of memory or pointers to system resources. If finding the necessary libunrar functions fails (should be rare), we now dlclose libunrar. - 225224 - In main (freshclam.c): A copied piece of code is inconsistent with the original (CWE-398). A minor copy-paste error was present, and optOutList could be cleaned up in one of the failure edge cases. - 225228 - In decodecdb: Out-of-bounds access to a buffer (CWE-119). Off-by-one error when tokenizing certain CDB sig fields for printing with sigtool. Ex: $ cat test.cdb a:CL_TYPE_7Z:1-2-3:/./:1-2-3:1-2-3:0:1-2-3:: $ cat test.cdb | ../installed/bin/sigtool --decode VIRUS NAME: a CONTAINER TYPE: CL_TYPE_7Z CONTAINER SIZE: WITHIN RANGE 1 to 2 FILENAME REGEX: /./ COMPRESSED FILESIZE: WITHIN RANGE 1 to 2 UNCOMPRESSED FILESIZE: WITHIN RANGE 1 to 2 ENCRYPTION: NO FILE POSITION: ================================================================= ==17245==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffe3136d10 at pc 0x7f0f31c3f414 bp 0x7fffe3136c70 sp 0x7fffe3136c60 WRITE of size 8 at 0x7fffe3136d10 thread T0 #0 0x7f0f31c3f413 in cli_strtokenize ../../libclamav/str.c:524 #1 0x559e9797dc91 in decodecdb ../../sigtool/sigtool.c:2929 #2 0x559e9797ea66 in decodesig ../../sigtool/sigtool.c:3058 #3 0x559e9797f31e in decodesigs ../../sigtool/sigtool.c:3162 #4 0x559e97981fbc in main ../../sigtool/sigtool.c:3638 #5 0x7f0f3100fb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96) #6 0x559e9795a1d9 in _start (/home/zelda/workspace/clamav-devel/installed/bin/sigtool+0x381d9) Address 0x7fffe3136d10 is located in stack of thread T0 at offset 48 in frame #0 0x559e9797d113 in decodecdb ../../sigtool/sigtool.c:2840 This frame has 1 object(s): [32, 48) 'range' <== Memory access at offset 48 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions are supported) SUMMARY: AddressSanitizer: stack-buffer-overflow ../../libclamav/str.c:524 in cli_strtokenize - 225223 - In cli_egg_deflate_decompress: Reads an uninitialized pointer or its target (CWE-457). Certain fail cases would call inflateEnd on an uninitialized stream. Now it’s only called after initialization occurs. - 225220 - In buildcld: Use of an uninitialized variable (CWE-457). Certain fail cases would result in oldDir being used before initialization. It now gets zeroed before the first fail case. - 225219 - In cli_egg_open: Leak of memory or pointers to system resources (CWE-404). If certain reallocs failed, several structures would not be cleaned up. - 225218 - In cli_scanhwpml: Code block is unreachable because of the syntactic structure of the code (CWE-561). With certain macros set, there could be two consecutive return statements.	5 years ago
Andrew	035265b96f	Bug fixes related to the recent HFS+/VBA/OLE2/XLM code changes. This commit includes bug fixes and minor modifications based on warnings generated by Coverity. These include: - 287096 - In cli_xlm_extract_macros: Leak of memory or pointers to system resources (CWE-404). This was a legitimate leak of a generated temp filename and could occur frequently. - 287095 - In scan_for_xlm_macros: Use of an uninitialized variable. The uninitialized value (state.length) was likely never used uninitialized, but we now initialize it just in case. - 287094 - In cli_vba_readdir_new: Out-of-bounds access to a buffer (CWE-119). This looks like a copy-paste error and was a legitimate read past the bounds of a buffer in an error case. - 284479 - In hfsplus_walk_catalog: All paths that lead to this null pointer comparison already dereference the pointer earlier (CWE-476). In certain cases a NULL pointer could be returned in the success case of hfsplus_scanfile, which was not handled correctly. This case may have been prevented in practice by an earlier check, but a check for NULL was added just in case. - 284478 - In hfsplus_walk_catalog: A value assigned to a variable is never used. ret would be set if zlib's inflateEnd function fails. The fix is to just not set ret in this case, since the error doesn't seem fatal (although it would result in a memory leak in the zlib code...). - 284477 - In hfsplus_check_attribute: Pointer is checked against null but then dereferenced anyway. I just took out the NULL check of record and recordSize, since the code requires these values to not be NULL elsewhere and there's no way an error could occur as currently used (stack var addresses are passed via these parameters). I also fixed up some of the function identifiers in debug print messages.	5 years ago
Micah Snyder	e01ba94e36	bb12506: Fix phishing/heuristic alert verbosity. Some detections, like phishing, are considered heuristic alerts because they match based on behavior more than on content. A subset of these are considered "potentially unwanted" (low-severity). These low-severity alerts include phishing, PDFs with obfuscated object names, and bytecode signature alerts that start with "BC.Heuristics". The concept is that unless you enable "heuristic precedence" (a method of lowering the threshold to immediately alert on low-severity detections), the scan should continue after a match in case a higher-severity match is found. Only at the end will it print the low-severity match if nothing else was found. The current implementation is buggy, though. Scanning of archives does not correctly bail out for the entire archive if one email contains a phishing link. Instead, it sets the "heuristic found" flag and then alerts for every subsequent file in the archive because it doesn't know if the heuristic was found in an embedded file or the target file. Because it's just a heuristic and the status is "clean", it keeps scanning. This patch corrects the behavior by checking if any low-severity alerts were found at the end of scanning the target file, instead of at the end of each embedded file. Additionally, this patch fixes an issue with phishing alerts wherein heuristic precedence mode did not cause a scan to stop after the first alert. The above changes required restructuring to create an fmap inside of cl_scandesc_callback() so that scan_common() could be modified to require an fmap and set up so that the current *ctx->fmap pointer is never NULL when scan_common() evaluates match results. Also fixed a couple of minor bugs in the phishing unit tests and cleaned up the test code for improved legibility and type safety.	5 years ago
Micah Snyder	053ce64c6f	Reduce likelihood of tmp file name collisions. In regression testing we observed occasional errors creating files or directories. These appear to be caused by tmp files that use identical prefixes. Such prefixed tmp file names currently have a 5-character hash suffix. Though these temporary files or directories are deleted when no longer needed, there is still the possibility of a hash collision, which causes the scan to error out. This patch doubles the size of the short hash to 10 characters to reduce the chance of a collision.	5 years ago
Micah Snyder	d1f209e879	Fix fmap NULL deref in preclass bytecode hook. If a bytecode signature that makes use of the BC_PRECLASS hook alerts, a NULL dereference may occur. This change fixes that. Also fixed unrelated memory leaks introduced recently when adding file name extraction to the zip parser and rar parser.	5 years ago
Andrew	ead920501b	Fix fmap leak in scan_common when map parameter is NULL. scan_common must either be passed an fmap (map) or a file descriptor (desc) corresponding to the file being scanned. In the case where map is NULL, scan_common will create an fmap in order to execute the BC_PRECLASS bytecode hook, and this fmap wasn't being unmapped afterward.	5 years ago
Micah Snyder	e0dae24fcc	Fix dupl. fmap name bug, fix fd init in HTML norm. Fixed copy-paste bug with duplicated fmap names being assigned to the parent instead of the dup/child fmap. Fixed file descriptor initialization issue in the HTML normalizer.	5 years ago
Micah Snyder	c110392780	Change permission for new tmp files from RWX to RW	5 years ago
Micah Snyder	52098c5f57	Eliminate warnings in mail parser. Disables runtime warning messages emitted by libxml2 when parsing HTML email content for the JSON metadata feature. Fixed compile-time warning caused by libjson-c API changes from int to size_t.	5 years ago
Micah Snyder	11ef77007b	Improve tmp sub-directory names. At present many parsers create tmp subdirectories to store extracted files. For parsers like the vba parser, this is required as the directory is later scanned. For other parsers, these subdirectories are probably not helpful now that we provide recursive sub-dirs when --leave-temps is enabled. It's not quite as simple as removing the extra subdirectories, however. Certain parsers, like autoit, don't create very unique filenames and would result in file name collisions when --leave-temps is not enabled. The best thing to do would be to make sure each parser uses unique filenames and doesn't rely on cli_magic_scan_dir() to scan extracted content before removing the extra subdirectory. In the meantime, this commit gives the extra subdirectories meaningful names to improve readability. This commit also: - Provides the 'bmp' prefix for extracted PE icons. - Removes empty tmp subdirs when extracting rtf files, to eliminate clutter. - The PDF parser sometimes creates tmp files when decompressing streams before it knows if there is actually any content to decompress. This resulted in a large number of empty files. While it would be best to avoid creating empty files in the first place, that's not quite as simple as it sounds. This commit does the next best thing and deletes the tmp files if nothing was actually extracted, even if --leave-temps is enabled. - Removes the "scantemp" prefix for unnamed fmaps scanned with cli_magic_scan(). The 5-character hashes given to tmp files with prefixes resulted in occasional file name collisions when extracting certain file types with thousands of embedded files. - The VBA and TAR parsers mistakenly used NAME_MAX instead of PATH_MAX, resulting in truncated file paths and failed extraction when --leave-temps is enabled and a lot of recursion is in play. This commit switches them from NAME_MAX to PATH_MAX.	5 years ago
Micah Snyder	c545cad161	Only create rfc2397 tmp directory when needed. HTML normalization creates a tmp directory for storing rfc2397 style links. The vast majority of html does not make use of rfc2397, and thus an excess of empty tmp directories are generated. This commit alters the behavior to only create the rfc2397 directory when required, if it does not already exist.	5 years ago
Micah Snyder	9b9999d778	Rename core scanning functions. Many of the core scanning functions' names no longer represent their specific purpose or arguments. This commit aims to make the names more intuitive. Names are now prefixed with "magic" if they involve file-typing and file-type parsing. In addition, each function name now includes the type of input being scanned, whether it's "desc", "fmap", or "buff". Some of the APIs also now specify "type" to indicate that a type other than "ANY" may be passed in to select the type rather than use file type magic for type recognition. Renames (current name → new name): magic_scandesc() → cli_magic_scan(); cli_magic_scandesc_type() → <delete>; cli_magic_scandesc() → cli_magic_scan_desc(); cli_base_scandesc() → cli_magic_scan_desc_type(); cli_partition_scandesc() → <delete>; cli_map_scandesc() → magic_scan_nested_fmap_type(); cli_map_scan() → cli_magic_scan_nested_fmap_type(); cli_mem_scandesc() → cli_magic_scan_buff(); cli_scanbuff() → cli_scan_buff(); cli_scandesc() → cli_scan_desc(); cli_fmap_scandesc() → cli_scan_fmap(); cli_scanfile() → cli_magic_scan_file(); cli_scandir() → cli_magic_scan_dir(); cli_filetype2() → cli_determine_fmap_type(); cli_filetype() → cli_compare_ftm_file(); cli_partitiontype() → cli_compare_ftm_partition(); cli_scanraw() → scanraw().	5 years ago
Micah Snyder	ae77e87880	Add EmbeddedObjects to JSON. The metadata properties JSON structure isn't recording file types found embedded within a file, such as self-extracting (SFX) types and office document types (DOCX, PPTX, etc.). This presents a problem... At present there's no way to know if the current file has ended and a new file is found tacked on to the end of the first file. If there were, we could simply check if the type found by the raw-scan exists within the first file, or after. If within the first, and the type is an archive, then it's reasonable to conclude we're either observing zip headers (for SFXZIP detections) or other files that are not compressed. If the type ISN'T found within the first file, then we definitely have a whole new file to parse and we should do so with cli_magic_scan() rather than only using these embedded type scanners. At present we can't ignore SFXZIP detections even if the original file type is a ZIP, because we may have found two ZIPs appended together to evade detection (a legitimate trick). As a consequence, we will effectively parse every zip entry twice. The same issue applies to types found within non-compressed archives. This commit adds an EmbeddedObjects list to the metadata JSON object so that the existence of these types is noted. Additionally, this commit removes the two-part int64 cli_jsonint64() implementation, as json_object_new_int64() should be available everywhere and the macro to detect such support was never set.	5 years ago
Micah Snyder	005cbf5a37	Record names of extracted files. A way is needed to record scanned file names for two purposes: 1. File names (and extensions) must be stored in the JSON metadata properties recorded when using the --gen-json clamscan option. Future work may use this to compare file extensions with detected file types. 2. File names are useful when interpreting tmp directory output when using the --leave-temps option. This commit enables file name retention for later use by storing file names in the fmap header structure, if a file name exists. To store the names in fmaps, an optional name argument has been added to any internal scan APIs that create fmaps, and every call to these APIs has been modified to pass a file name, or NULL if a file name is not required. The zip and gpt parsers required some modification to record file names. The NSIS and XAR parsers fail to collect file names at all and will require future work to support file name extraction. Also: - Added recursive extraction to the tmp directory when the --leave-temps option is enabled. When not enabled, the tmp directory structure remains flat so as to reduce the likelihood of exceeding MAX_PATH. The current tmp directory is stored in the scan context. - Made the cli_scanfile() internal API non-static and added it to scanners.h so it would be accessible outside of scanners.c in order to remove code duplication within libmspack.c. - Added function comments to scanners.h and matcher.h. - Converted TDB-type macros and LSIG-type macros to enums for improved type safety. - Converted more return status variables from `int` to `cl_error_t` for improved type safety, and corrected ooxml file typing functions so they use `cli_file_t` exclusively rather than mixing types with `cl_error_t`. - Restructured the magic_scandesc() function to use goto's for error handling and removed the early_ret_from_magicscan() macro and magic_scandesc_cleanup() function. This makes the code easier to read and made it easier to add the recursive tmp directory cleanup to magic_scandesc(). - Corrected zip, egg, rar filename extraction issues. - Removed use of extra sub-directory layer for zip, egg, and rar file extraction. For Zip, this also involved changing the extracted filenames to be randomly generated rather than using the "zip.###" file name scheme.	5 years ago
Micah Snyder	9f2de39e04	New tmp sub-dir per scan; JSON meta improvements. This commit improves the layout of the tmp file output and the JSON metadata output when using the --leave-temps and --gen-json options. For all scans, each scan target will get a unique tmp sub-directory. If using --leave-temps, that subdir will include the basename of the original file to make it easier to identify. Additionally, when using the --leave-temps option, all extracted objects will be extracted into recursive subdirectories, including filename prefixes where available. When not using the --leave-temps option, the layout of the tmp sub-directory will remain flat, so as to alleviate the possibility of exceeding PATH_MAX. The JSON metadata generated by the --gen-json option is now generated for all file types, not just a select few. The format is also pretty-printed for readability and now includes filenames and file paths when available. Also: - Added missing ALLMATCH check when determining if bytecode hooks should be run. - Added cl_engine_get_str API to windows libclamav symbol export file.	5 years ago

4603 Commits (c637de532b790aede5f338ab41e1055a7cdc84ac)
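
Commit 38622da97f notes that the overflow check `(value >> 32) * 10 < INT32_MAX` misses some cases. Take the value from the commit message, 0xcccccccdbcdc9cc. Its upper 32 bits are 0xccccccc (214748364), and 214748364 * 10 = 2147483640, which is still below INT32_MAX, so the check passes. Yet the full value is larger than INT64_MAX / 10, so multiplying it by 10 overflows.

The sketch below assumes the guarded operation is the `value * 10 + digit` step of decimal parsing; the commit does not show the surrounding code. It uses a hypothetical helper, `append_digit_checked`, that compares the whole 64-bit value against the limit before multiplying. It is an illustration of the technique, not the ClamAV patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not ClamAV code): compute *value * 10 + digit for a
 * non-negative int64_t accumulator, refusing instead of overflowing.
 * The whole 64-bit value is compared against the limit *before* the
 * multiplication, so signed overflow (undefined behavior in C) never happens.
 */
static bool append_digit_checked(int64_t *value, int digit)
{
    if (*value < 0 || digit < 0 || digit > 9)
        return false;

    /* value * 10 + digit <= INT64_MAX  <=>  value <= (INT64_MAX - digit) / 10 */
    if (*value > (INT64_MAX - digit) / 10)
        return false; /* would overflow; *value is left unchanged */

    *value = *value * 10 + digit;
    return true;
}
```

For 0xcccccccdbcdc9cc this helper refuses even a trailing 0, because the value already exceeds INT64_MAX / 10. A value of exactly INT64_MAX / 10 still accepts the digit 7 (reaching INT64_MAX) but rejects 8.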
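
Commit 1d66184a7d (Coverity 225228) describes an off-by-one in sigtool. Tokenizing the CDB field `1-2-3` wrote a third pointer past the two-element `range` array, as the AddressSanitizer report shows. A general way to rule out this class of bug is to pass the capacity of the output array to the tokenizer and stop splitting once the array is full.

The function `tokenize_bounded` below is a hypothetical illustration of that idea. It does not reproduce the signature or behavior of ClamAV's `cli_strtokenize`. In this sketch, any text left after the last slot stays attached to the final token.

```c
#include <stddef.h>

/*
 * Hypothetical bounded tokenizer (not ClamAV's cli_strtokenize): splits buf
 * in place on delim and stores at most max_tokens pointers in tokens[].
 * Once tokens[] is full, splitting stops and the remaining text stays
 * attached to the last token, so nothing is written past the array.
 * Returns the number of tokens stored.
 */
static size_t tokenize_bounded(char *buf, char delim, size_t max_tokens,
                               const char **tokens)
{
    size_t count = 0;

    if (buf == NULL || tokens == NULL || max_tokens == 0)
        return 0;

    tokens[count++] = buf; /* first token starts at the beginning */
    for (char *p = buf; *p != '\0' && count < max_tokens; p++) {
        if (*p == delim) {
            *p = '\0';               /* terminate the previous token */
            tokens[count++] = p + 1; /* next token starts after delim */
        }
    }
    return count;
}
```

With a two-slot array, as in the sigtool report, `1-2-3` yields `1` and `2-3` instead of an out-of-bounds write. A caller that needs all three fields has to size the array for three.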
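
Commit 407407c98c mitigates the quarantine symlink attack in two steps. It resolves the scan target's real path before the scan. Then, before unlinking, it walks that path one directory at a time down to the leaf directory.

A minimal POSIX sketch of that walk follows. It assumes an absolute, already canonicalized path, for example one returned by `realpath()`. It opens each directory with `openat()` and `O_NOFOLLOW`, so a directory that has since been replaced by a symlink makes the walk fail. It then removes the leaf with `unlinkat()`.

The function name `unlink_no_follow` is hypothetical, and this is not ClamAV's implementation. The Windows approach described in the commit (Native API handles, then copy and unlink) is not covered.

```c
#define _XOPEN_SOURCE 700 /* expose openat(), unlinkat(), strdup(), O_NOFOLLOW */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not ClamAV's implementation): remove the file at an
 * absolute, already-canonicalized path without following a symlink in any
 * component. Each directory is opened relative to the previous one with
 * O_NOFOLLOW, so a directory swapped for a symlink after the scan makes
 * openat() fail instead of redirecting the removal to another file.
 * Returns 0 on success, or -1 with errno set.
 */
static int unlink_no_follow(const char *abs_path)
{
    char *copy = NULL, *component, *slash;
    int dirfd = -1, ret = -1, saved_errno;

    if (abs_path == NULL || abs_path[0] != '/') {
        errno = EINVAL;
        return -1;
    }
    if ((copy = strdup(abs_path)) == NULL)
        return -1;

    dirfd = open("/", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd < 0)
        goto done;

    /* Walk every directory component, one level at a time. */
    component = copy + 1;
    while ((slash = strchr(component, '/')) != NULL) {
        *slash = '\0';
        if (component[0] != '\0') { /* skip empty components from "//" */
            int next = openat(dirfd, component,
                              O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
            if (next < 0)
                goto done; /* missing, not a directory, or a symlink */
            close(dirfd);
            dirfd = next;
        }
        component = slash + 1;
    }

    /* component is now the leaf name; unlinkat() removes that directory
     * entry itself and never follows a symlink stored under that name. */
    ret = unlinkat(dirfd, component, 0);

done:
    saved_errno = errno;
    if (dirfd >= 0)
        close(dirfd);
    free(copy);
    errno = saved_errno;
    return ret;
}
```

Each step is resolved relative to a descriptor for a directory that was already checked. So if any intermediate directory is swapped for a symlink, the walk stops with an error rather than removing a file somewhere else.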
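
Commit 053ce64c6f doubles the tmp file name short hash from 5 to 10 characters. Commit 11ef77007b mentions collisions when extracting file types with thousands of embedded files. The birthday problem gives a rough sense of why: if n names are drawn independently and uniformly from N possible suffixes, the expected number of colliding pairs is C(n, 2) / N.

The sketch below computes that expectation. It assumes hexadecimal suffix characters (16 symbols) and uniformly distributed hash output; neither assumption is stated in the commits.

```c
#include <stdint.h>

/*
 * Expected number of colliding pairs when n names are drawn independently
 * and uniformly from alphabet^len possible suffixes (birthday problem):
 *     E = C(n, 2) / alphabet^len
 * Assumption for this sketch: suffix characters are hexadecimal
 * (alphabet = 16) and the hash output is uniformly distributed.
 */
static double expected_collisions(uint64_t n, unsigned alphabet, unsigned len)
{
    double space = 1.0;
    double pairs;

    if (n < 2)
        return 0.0;
    for (unsigned i = 0; i < len; i++)
        space *= (double)alphabet; /* alphabet^len possible suffixes */
    pairs = (double)n * (double)(n - 1) / 2.0; /* C(n, 2) pairs of names */
    return pairs / space;
}
```

Under those assumptions, 1,000 names sharing a prefix give about 0.48 expected colliding pairs with a 5-character suffix. With 10 characters the figure drops to about 4.5e-7, a reduction by a factor of 16^5 = 1,048,576.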