Some detections, like phishing, are considered heuristic alerts because
they match based on behavior more than on content. A subset of these
are considered "potentially unwanted" (low-severity). These
low-severity alerts include:
- phishing
- PDFs with obfuscated object names
- bytecode signature alerts that start with "BC.Heuristics"
The concept is that unless you enable "heuristic precedence" (a method
of lowering the threshold to immediately alert on low-severity
detections), the scan should continue after a match in case a higher
severity match is found. Only at the end will it print the low-severity
match if nothing else was found.
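
The intended behavior described above can be sketched roughly as follows.
This is a minimal illustration, not the actual ClamAV code: severity_t,
detection_t and scan_detections() are hypothetical names invented for
this example, and the list of detections stands in for matches produced
while scanning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity levels; not ClamAV's actual types. */
typedef enum { SEV_LOW, SEV_HIGH } severity_t;

typedef struct {
    const char *name;    /* signature name to report */
    severity_t severity; /* low = "potentially unwanted" heuristic */
} detection_t;

/*
 * Walk the detections in the order they were found and return the name
 * that should be reported, or NULL if the scan is clean.
 *
 * - A high-severity match is reported immediately.
 * - A low-severity match is reported immediately only when
 *   heuristic_precedence is set; otherwise the first one is remembered
 *   and reported at the end if nothing more severe turned up.
 */
const char *scan_detections(const detection_t *hits, size_t n,
                            bool heuristic_precedence)
{
    const char *deferred = NULL;
    size_t i;

    assert(hits != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (hits[i].severity == SEV_HIGH)
            return hits[i].name;
        if (heuristic_precedence)
            return hits[i].name;
        if (deferred == NULL)
            deferred = hits[i].name;
    }
    return deferred;
}
```

With a low-severity hit followed by a high-severity hit, the sketch
reports the high-severity name unless heuristic precedence is enabled,
in which case it stops at the first hit.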
The current implementation is buggy though. Scanning of archives does
not correctly bail out for the entire archive if one email contains a
phishing link. Instead, it sets the "heuristic found" flag and then
alerts for every subsequent file in the archive because it doesn't know
if the heuristic was found in an embedded file or the target file.
Because it's just a heuristic and the status is "clean", it keeps
scanning.
This patch corrects the behavior by checking whether any low-severity alerts
were found at the end of scanning the target file, instead of at the end
of each embedded file.
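
A simplified sketch of the corrected check is shown here. It assumes a
recursion depth counter to tell the target file apart from embedded
files; node_t, scan_ctx_t and scan_node() are hypothetical names for
illustration and do not reflect how the real scan context is laid out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file tree: a target file and the files embedded in it. */
typedef struct node {
    bool low_severity_hit;       /* e.g. a phishing heuristic fired */
    const struct node *children; /* embedded files, may be NULL */
    size_t n_children;
} node_t;

/* Hypothetical scan context shared across the whole recursive scan. */
typedef struct {
    bool heuristic_found; /* set once, anywhere in the tree */
    unsigned depth;       /* 0 while at the target file */
    unsigned alerts;      /* number of alerts emitted */
} scan_ctx_t;

void scan_node(scan_ctx_t *ctx, const node_t *f)
{
    size_t i;

    assert(ctx != NULL && f != NULL);

    if (f->low_severity_hit)
        ctx->heuristic_found = true;

    ctx->depth++;
    for (i = 0; i < f->n_children; i++)
        scan_node(ctx, &f->children[i]);
    ctx->depth--;

    /* Evaluate the deferred low-severity result only once, after the
     * target file (depth 0) has been scanned completely, instead of
     * after every embedded file. */
    if (ctx->depth == 0 && ctx->heuristic_found)
        ctx->alerts++;
}
```

For an archive with three members where only the first triggers a
heuristic, this yields a single alert for the archive rather than one
per subsequent member.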
Additionally, this patch fixes an issue with phishing alerts wherein
heuristic precedence mode did not cause a scan to stop after the first
alert.
The above changes required restructuring to create an fmap inside of
cl_scandesc_callback() so that scan_common() could be modified to
require an fmap and set up so that the current *ctx->fmap pointer is
never NULL when scan_common() evaluates match results.
Also fixed a couple minor bugs in the phishing unit tests and cleaned up
the test code for improved legibility and type safety.
* use a suffix AC-trie and a shift-or FSM to filter
* rewrite the URL regex in C
* use a perfect hash to look up TLDs and ccTLDs, instead of a regex
* TODO: suffixes having a common prefix: loop over all of them
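
For readers unfamiliar with the shift-or technique mentioned in the list
above, here is a generic textbook sketch of shift-or (bitap) exact
matching for a single pattern. It is not the filter from this commit;
shift_or_search() is a hypothetical name, and the sketch assumes
patterns of 1 to 64 bytes so that the state fits in one 64-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of pattern in text, or -1
 * if there is none (or if the pattern length is outside 1..64).
 *
 * Bit j of the state is 0 exactly when pattern[0..j] matches the text
 * ending at the current position.
 */
long shift_or_search(const char *text, const char *pattern)
{
    uint64_t mask[256];
    uint64_t state = ~(uint64_t)0; /* all ones: nothing matched yet */
    size_t m, i;

    assert(text != NULL && pattern != NULL);

    m = strlen(pattern);
    if (m == 0 || m > 64)
        return -1;

    /* mask[c] has bit j cleared wherever pattern[j] == c. */
    for (i = 0; i < 256; i++)
        mask[i] = ~(uint64_t)0;
    for (i = 0; i < m; i++)
        mask[(unsigned char)pattern[i]] &= ~((uint64_t)1 << i);

    for (i = 0; text[i] != '\0'; i++) {
        /* Shift in a 0 (a fresh candidate match), then knock out every
         * candidate that the current character does not extend. */
        state = (state << 1) | mask[(unsigned char)text[i]];
        if ((state & ((uint64_t)1 << (m - 1))) == 0)
            return (long)(i + 1 - m);
    }
    return -1;
}
```

Each input byte costs one shift and one OR regardless of pattern length,
which is what makes the approach attractive as a cheap pre-filter before
a more expensive match.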
cli_ac_free: multiple virnames pointing to the same location
git-svn: trunk@3978
Don't care why a URL is clean; just state that it is clean.
Various cleanups resulting from this.
Prepare to introduce selectively turning on sub-features.
git-svn: trunk@3369
This code is licensed under the 3-clause BSD license.
This will be used instead of system provided regexec()/regcomp() to
have consistent behaviour across platforms.
git-svn: trunk@3225