The current implementation sets a "next layer attributes" flag field
in the scan context. This may introduce bugs if accidentally not cleared
during error handling, causing that attribute to be applied to a
different layer than intended.
This commit resolves that by adding an attribute flag to the major
internal scan functions and removing the "next layer attributes" from
the scan context. This attributes flag shares the same flag fields as
the attributes flag in the new file inspection callback and the flags
are defined in `clamav.h`.
Many of the core scanning functions' names no longer represent their
specific purpose or arguments. This commit aims to make the names more
intuitive. Names are now prefixed with "magic" if they involve
file-typing and file-type parsing. In addition, each function now
includes the type of input being scanned whether its "desc", "fmap", or
"buff". Some of the APIs also now specify "type" to indicate that a type
other than "ANY" may be passed in to select the type rather than use
file type magic for type recognition.
| current name | new name |
| ------------------------- | --------------------------------- |
| magic_scandesc() | cli_magic_scan() |
| cli_magic_scandesc_type() | <delete> |
| cli_magic_scandesc() | cli_magic_scan_desc() |
| cli_base_scandesc() | cli_magic_scan_desc_type() |
| cli_partition_scandesc() | <delete> |
| cli_map_scandesc() | magic_scan_nested_fmap_type() |
| cli_map_scan() | cli_magic_scan_nested_fmap_type() |
| cli_mem_scandesc() | cli_magic_scan_buff() |
| cli_scanbuff() | cli_scan_buff() |
| cli_scandesc() | cli_scan_desc() |
| cli_fmap_scandesc() | cli_scan_fmap() |
| cli_scanfile() | cli_magic_scan_file() |
| cli_scandir() | cli_magic_scan_dir() |
| cli_filetype2() | cli_determine_fmap_type() |
| cli_filetype() | cli_compare_ftm_file() |
| cli_partitiontype() | cli_compare_ftm_partition() |
| cli_scanraw() | scanraw() |
A way is needed to record scanned file names for two purposes:
1. File names (and extensions) must be stored in the json metadata
properties recorded when using the --gen-json clamscan option. Future
work may use this to compare file extensions with detected file types.
2. File names are useful when interpretting tmp directory output when
using the --leave-temps option.
This commit enables file name retention for later use by storing file
names in the fmap header structure, if a file name exists.
To store the names in fmaps, an optional name argument has been added to
any internal scan API's that create fmaps and every call to these APIs
has been modified to pass a file name or NULL if a file name is not
required. The zip and gpt parsers required some modification to record
file names. The NSIS and XAR parsers fail to collect file names at all
and will require future work to support file name extraction.
Also:
- Added recursive extraction to the tmp directory when the
--leave-temps option is enabled. When not enabled, the tmp directory
structure remains flat so as to prevent the likelihood of exceeding
MAX_PATH. The current tmp directory is stored in the scan context.
- Made the cli_scanfile() internal API non-static and added it to
scanners.h so it would be accessible outside of scanners.c in order to
remove code duplication within libmspack.c.
- Added function comments to scanners.h and matcher.h
- Converted a TDB-type macros and LSIG-type macros to enums for improved
type safey.
- Converted more return status variables from `int` to `cl_error_t` for
improved type safety, and corrected ooxml file typing functions so
they use `cli_file_t` exclusively rather than mixing types with
`cl_error_t`.
- Restructured the magic_scandesc() function to use goto's for error
handling and removed the early_ret_from_magicscan() macro and
magic_scandesc_cleanup() function. This makes the code easier to
read and made it easier to add the recursive tmp directory cleanup to
magic_scandesc().
- Corrected zip, egg, rar filename extraction issues.
- Removed use of extra sub-directory layer for zip, egg, and rar file
extraction. For Zip, this also involved changing the extracted
filenames to be randomly generated rather than using the "zip.###"
file name scheme.