Add a new logical signature subsignature type for matching on images
with image fuzzy hashes.
Image fuzzy hash subsigantures follow this format:
fuzzy_img#<hash>#<dist>
In this initial implementation, the hamming distance (dist) is ignored
and only exact fuzzy hash matches will alert.
Fuzzy hash matching is only performed for supported image types.
Also: removed some excessive debug log messages on start-up.
Fixed an issue where the signature name (virname) is being allocated and
stored for every subsignature or even ever sub-pattern in an AC-pattern
(i.e. NDB sig or LDB subsig) containing a `{n-m}` or `*` wildcard.
This fix is only for LDB subsigs though. NDB signatures are still
allocaing one virname per sub-pattern.
This fix was required because I needed a place to store the virname with
fuzzy-hash subsignatures. Storing it in the fuzzy-hash subsig
metadatathe way AC-pattern, PCRE, and BComp subsigs were doing it
wouldn't work because it would cross the C-Rust FFI boundary and giving
pointers to Rust allocated stuff is dicey. Not to mention native Rust
strings are different thatn C strings. Anyways, the correct thing to do
was to store the virname with the actual logical signature.
TODO: Keep track of NDB signatures in the same way and store the virname
for NDB sigs there instead of in AC-patterns so that we can get rid of
the virname field in the AC-pattern struct.
The logic for parsing a logical subsignature isn't clearly identified
and has been, perhaps mistakenly or out of convenience, used to when
parsing NDB signatures in addition to LDB subsignatures. What this means
is that you can technically use a PCRE subsignature in an NDB file and
clam won't complain about it. It won't work however, because a PCRE
subsignature requires another matching subsignature to trigger it, but
it will parse. The same is likely true for byte-compare subsignatures.
This commit restructures that logic a bit so subsignature parsing has
its own function and is more organized.
I also renamed the functions a little bit and added lots of comments.
I fixed a few minor warnings relating to format string characters.
The change in str.c:cli_ldbtokenize is to prevent a buffer under-read if
you were to use the function on the start of a buffer, as is now down in
this commit.
Changes include:
- Fixing several memory leaks noticed when running with ASan
- Adds documentation for several functions and structs
- Simplifies the interface for using cli_targetinfo_init/destroy
and cli_exe_info_init/destroy
- A few other minor changes
Consolidate the PE parsing code into one function. I tried to preserve all existing functionality from the previous, distinct implementations to a large extent (with the exceptions mentioned below). If I noticed potential bugs/improvements, I added a TODO statement about those so that they can be fixed in a smaller commit later. Also, there are more TODOs in places where I'm not entirely sure why certain actions are performed - more research is needed for these.
I'm submitting a pull request now so that regression testing can be done, and because merging what I have thus far now will likely have fewer conflicts than if I try to merge later
PE parsing code improvements:
- PEs without all 16 data directories are parsed more appropriately now
- Added lots more debug statements
Also:
- Allow MAX_BC and MAX_TRACKED_PCRE to be specified via CFLAGS
When doing performance testing with the latest CVD, MAX_BC and
MAX_TRACKED_PCRE need to be raised to track all the events.
Allow these to be specified via CFLAGS by not redefining them
if they are already defined
- Fix an issue preventing wildcard sizes in .MDB/.MSB rules
I'm not sure what the original intent of the check I removed was,
but it prevents using wildcard sizes in .MDB/.MSB rules. AFAICT
these wildcard sizes should be handled appropriately by the MD5
section hash computation code, so I don't think a check on that
is needed.
- Fix several issues related to db loading
- .imp files will now get loaded if they exist in a directory passed
via clamscan's '-d' flag
- .pwdb files will now get loaded if they exist in a directory passed
via clamscan's '-d' flag even when compiling without yara support
- Changes to .imp, .ign, and .ign2 files will now be reflected in calls
to cl_statinidir and cl_statchkdir (and also .pwdb files, even when
compiling without yara support)
- The contents of .sfp files won't be included in some of the signature
counts, and the contents of .cud files will be
- Any local.gdb files will no longer be loaded twice
- For .imp files, you are no longer required to specify a minimum flevel for wildcard rules, since this isn't needed
New API calls:
int cl_init(unsigned int options);
struct cl_engine *cl_engine_new(unsigned int options);
int cl_engine_compile(struct cl_engine *engine);
struct cl_engine *cl_engine_dup(struct cl_engine *engine);
int cl_engine_free(struct cl_engine *engine);
more to come..
WARNING: THE BRANCH IS CURRENTLY BROKEN AND SHOULD NOT BE USED
git-svn-id: file:///var/lib/svn/clamav-devel/branches/newapi@4370 77e5149b-7576-45b1-b177-96237e5ba77b