Also relocates the codepage table from msdoc.h to entconv.h.
Also adds new codepage macros to reduce the use of magic numbers when
referencing code pages elsewhere in libclamav.
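As a hedged illustration of the macro approach: the macro names below are hypothetical and may not match entconv.h. The values are the standard Windows code page identifiers, which Office formats store as 16-bit integers. A consumer can then switch on named constants instead of bare numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro names; the real ones in entconv.h may differ.
 * The values are the standard Windows code page identifiers. */
#define CODEPAGE_US_ASCII     20127
#define CODEPAGE_WINDOWS_1252  1252
#define CODEPAGE_UTF16_LE      1200
#define CODEPAGE_UTF16_BE      1201
#define CODEPAGE_UTF8         65001

/* Example consumer: how many bytes one code unit occupies, decided
 * without scattering literal code page numbers through the code. */
static size_t codepage_unit_size(uint16_t codepage)
{
    switch (codepage) {
        case CODEPAGE_UTF16_LE:
        case CODEPAGE_UTF16_BE:
            return 2;
        case CODEPAGE_US_ASCII:
        case CODEPAGE_WINDOWS_1252:
        case CODEPAGE_UTF8: /* variable width; 1 byte is the minimum */
            return 1;
        default:
            return 0; /* code page unknown to this sketch */
    }
}
```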
XLM is an Excel macro language that predates VBA (it was used before
1996). It is still parsed and executed by modern Excel and is gaining
popularity with malware authors.
This patch adds rudimentary support for detecting and extracting
Excel 4.0 (XLM) macros.
The code is based on Didier Stevens' plugin_biff for oledump.py.
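A minimal sketch of how XLM macro sheets can be spotted, assuming the Workbook stream has already been extracted from the OLE2 container. In the BIFF8 record stream, each BoundSheet8 record (type 0x0085 in [MS-XLS]) carries a sheet-type byte, and the value 0x01 denotes an Excel 4.0 macro sheet. The function name is hypothetical; this is not the patch's code.

```c
#include <stddef.h>
#include <stdint.h>

#define BIFF_BOUNDSHEET   0x0085 /* BoundSheet8 record type */
#define BIFF_SHEET_MACRO  0x01   /* sheet type (dt) of an Excel 4.0 macro sheet */

/* Walk a BIFF8 record stream and count BoundSheet8 records that describe
 * XLM macro sheets. Each record is: uint16 type, uint16 length (both
 * little-endian), followed by `length` bytes of data. */
static int count_xlm_macro_sheets(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int count = 0;

    while (len - pos >= 4) {
        uint16_t type = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
        uint16_t size = (uint16_t)(buf[pos + 2] | (buf[pos + 3] << 8));
        const uint8_t *data = buf + pos + 4;

        if (size > len - pos - 4)
            break; /* truncated record: stop instead of reading past the end */

        /* BoundSheet8 data: lbPlyPos (4 bytes), hsState (1 byte), dt (1 byte), name */
        if (type == BIFF_BOUNDSHEET && size >= 6 && data[5] == BIFF_SHEET_MACRO)
            count++;

        pos += 4 + (size_t)size;
    }
    return count;
}
```

Detecting the sheet is only the first step; extracting the macro formulas themselves means decoding the records inside each macro sheet, which this sketch does not attempt.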
- Existing VBA extraction code uses undocumented cache structures.
This code uses the documented way of accessing VBA projects.
- Adds more detail to the dumped information:
project name, project doc string, ...
All VBA projects are dumped into a single file.
- Malware authors are currently evading detection by spreading
malicious code over several projects. It is hard to write
signatures if only part of the malicious code is visible.
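The documented way of accessing VBA projects is presumably the one described in [MS-OVBA], where the project's dir stream and the module source code are stored using a simple LZ77-style compression. A sketch of that decompression step, written from the specification rather than taken from libclamav (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decompress an [MS-OVBA] CompressedContainer: signature byte 0x01, then
 * chunks. Returns bytes written to out, or -1 on malformed input/overflow. */
static long ovba_decompress(const uint8_t *in, size_t in_len,
                            uint8_t *out, size_t out_cap)
{
    size_t ip = 1, op = 0;

    if (in_len < 1 || in[0] != 0x01)
        return -1;

    while (ip + 2 <= in_len) {
        uint16_t header = (uint16_t)(in[ip] | (in[ip + 1] << 8));
        size_t chunk_end = ip + (size_t)(header & 0x0FFF) + 3; /* size includes header */
        size_t chunk_start_out = op;

        if (((header >> 12) & 0x7) != 0x3) /* chunk signature must be 0b011 */
            return -1;
        if (chunk_end > in_len)
            chunk_end = in_len; /* tolerate a truncated final chunk */
        ip += 2;

        if (!(header & 0x8000)) {
            /* Flag clear: the chunk holds raw, uncompressed bytes. */
            size_t n = chunk_end - ip;
            if (n > out_cap - op)
                return -1;
            memcpy(out + op, in + ip, n);
            op += n;
            ip = chunk_end;
            continue;
        }

        while (ip < chunk_end) {
            uint8_t flags = in[ip++]; /* one bit per following token, LSB first */
            for (unsigned bit = 0; bit < 8 && ip < chunk_end; bit++) {
                if (!(flags & (1u << bit))) {
                    /* Literal token: copy one byte. */
                    if (op >= out_cap)
                        return -1;
                    out[op++] = in[ip++];
                } else {
                    /* Copy token: back-reference into the current chunk. */
                    uint16_t token;
                    size_t diff = op - chunk_start_out, length, offset;
                    unsigned bit_count = 4;

                    if (chunk_end - ip < 2)
                        return -1;
                    token = (uint16_t)(in[ip] | (in[ip + 1] << 8));
                    ip += 2;

                    /* bit_count = max(ceil(log2(diff)), 4), capped at 12. */
                    while (bit_count < 12 && ((size_t)1 << bit_count) < diff)
                        bit_count++;
                    length = (size_t)(token & (0xFFFFu >> bit_count)) + 3;
                    offset = (size_t)(token >> (16 - bit_count)) + 1;

                    if (offset > diff || length > out_cap - op)
                        return -1;
                    for (size_t k = 0; k < length; k++, op++)
                        out[op] = out[op - offset]; /* byte-wise: ranges may overlap */
                }
            }
        }
    }
    return (long)op;
}
```

The byte-by-byte copy matters: a copy token may reference bytes it is itself producing (here, offset 3 with length 6 turns "abc" into "abcabcabc").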
ClamAV doesn't handle the compressed attribute for HFS+ file catalog
entries.
This patch adds support for FLATE-compressed files.
To accomplish this, we find and parse the root/header node of the
attributes file, if one exists, and then parse the attribute map to
check whether the compressed attribute exists. If it does, we parse
the compression header to determine how to decompress the file.
Support is included for both inline compressed files and compressed
resource forks.
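A sketch of the compression-header check described above. It assumes the layout commonly documented for the com.apple.decmpfs extended attribute: a 16-byte little-endian header holding the magic 'cmpf', a compression type, and the uncompressed size, where type 3 means zlib data stored inline after the header and type 4 means zlib data in the resource fork. Names and constants are illustrative, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed decmpfs header layout; unlike the rest of HFS+, these fields
 * are little-endian. */
#define DECMPFS_MAGIC           0x636D7066u /* 'cmpf' */
#define DECMPFS_TYPE_ZLIB_ATTR  3 /* zlib data inline in the attribute */
#define DECMPFS_TYPE_ZLIB_RSRC  4 /* zlib data in the resource fork */

struct decmpfs_info {
    uint32_t type;
    uint64_t uncompressed_size;
};

static uint32_t decmpfs_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills *info on success, -1 if the header is invalid. */
static int decmpfs_parse_header(const uint8_t *attr, size_t len,
                                struct decmpfs_info *info)
{
    if (len < 16 || decmpfs_le32(attr) != DECMPFS_MAGIC)
        return -1;
    info->type = decmpfs_le32(attr + 4);
    info->uncompressed_size = (uint64_t)decmpfs_le32(attr + 8) |
                              ((uint64_t)decmpfs_le32(attr + 12) << 32);
    return 0;
}
```

A caller would then branch on info->type: inflate the bytes following the header for type 3, or go to the resource fork for type 4.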
Inflating inline compressed files is straightforward.
Inflating a compressed resource fork requires more work:
- Find location and size of the resource.
- Parse the resource block table.
- Inflate and write each block to a temporary file to be scanned.
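The block-table step might look like the following. The table layout used here (a little-endian block count followed by {offset, size} pairs, with offsets relative to the start of the table) is an assumption based on public decmpfs write-ups, not something stated in this patch, and the function name is hypothetical. Each resulting block would then be passed to zlib's inflate() and appended to the temporary file.

```c
#include <stddef.h>
#include <stdint.h>

struct rsrc_block {
    uint32_t offset; /* assumed: relative to the start of the block table */
    uint32_t size;   /* compressed size of this block */
};

static uint32_t blk_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse an assumed table of the form
 *   uint32 count; struct { uint32 offset; uint32 size; } entries[count];
 * from tbl[0..tbl_len). Returns the number of blocks stored in out, or -1
 * if the table is malformed or does not fit in max_blocks. */
static long parse_block_table(const uint8_t *tbl, size_t tbl_len,
                              struct rsrc_block *out, size_t max_blocks)
{
    uint32_t count;

    if (tbl_len < 4)
        return -1;
    count = blk_le32(tbl);
    if (count > max_blocks || count > (tbl_len - 4) / 8)
        return -1; /* the entries themselves would not fit */

    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *e = tbl + 4 + (size_t)i * 8;
        out[i].offset = blk_le32(e);
        out[i].size   = blk_le32(e + 4);
        /* Each block's compressed bytes must lie inside the resource data. */
        if (out[i].offset > tbl_len || out[i].size > tbl_len - out[i].offset)
            return -1;
    }
    return (long)count;
}
```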
Additional changes needed for this work:
- Make hfsplus_fetch_node work for both catalog and attributes.
- Figure out node size.
- Handle nodes that span several blocks.
- If the attributes are missing or invalid, extraction continues.
This supports malformed files that would also extract on macOS
and perhaps other systems.
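For the node-size and spanning-node points, the B-tree header node is the natural source: per Apple's TN1150, node 0 of both the catalog and attributes files is a header node whose BTHeaderRec, following the 14-byte node descriptor, records rootNode and nodeSize (big-endian). A hedged sketch, with hypothetical function names, that reads those fields and works out which allocation blocks of the B-tree file a node covers; mapping those file-relative blocks to the disk through the fork's extents is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* HFS+ on-disk structures are big-endian. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

struct btree_info {
    uint32_t root_node;
    uint16_t node_size;
};

/* node0: 14-byte BTNodeDescriptor (kind at offset 8), then BTHeaderRec
 * with rootNode at offset 2 and nodeSize at offset 18. */
static int btree_read_header(const uint8_t *node0, size_t len, struct btree_info *bi)
{
    if (len < 14 + 20)
        return -1;
    if ((int8_t)node0[8] != 1) /* kind must be kBTHeaderNode */
        return -1;
    bi->root_node = be32(node0 + 14 + 2);
    bi->node_size = be16(node0 + 14 + 18);
    /* Node sizes are powers of two of at least 512 bytes. */
    if (bi->node_size < 512 || (bi->node_size & (bi->node_size - 1)) != 0)
        return -1;
    return 0;
}

/* A node may span several allocation blocks: return the first block of
 * the B-tree file it touches and how many blocks it covers. */
static void btree_node_blocks(uint32_t node, uint16_t node_size, uint32_t block_size,
                              uint64_t *first_block, uint32_t *nblocks)
{
    uint64_t start = (uint64_t)node * node_size;
    uint64_t end = start + node_size; /* exclusive */

    *first_block = start / block_size;
    *nblocks = (uint32_t)((end + block_size - 1) / block_size - *first_block);
}
```

For example, with 4096-byte nodes on a volume using 512-byte blocks, node 3 starts at byte 12288 and covers blocks 24 through 31.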
This patch also:
- Adds filename extraction for the HFS+ parser.
- Skips embedded file type detection for GPT image file types. This
prevents double extraction of embedded files, or misclassification
of GPT images as MHTML, for example. This resolves bb12335.
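For filename extraction, HFS+ catalog keys store names as HFSUniStr255: a big-endian 16-bit length followed by that many big-endian UTF-16 code units (TN1150). A sketch of converting such a name to UTF-8, with a hypothetical function name; HFS+ stores names in a decomposed Unicode form, which this conversion leaves as-is.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert an HFSUniStr255 to a NUL-terminated UTF-8 string. Unpaired
 * surrogates become '?'. Returns bytes written (excluding the NUL), or -1
 * if the input is malformed or out is too small. */
static long hfs_name_to_utf8(const uint8_t *in, size_t in_len,
                             char *out, size_t out_cap)
{
    size_t count, o = 0;

    if (in_len < 2 || out_cap == 0)
        return -1;
    count = ((size_t)in[0] << 8) | in[1];
    if (count > 255 || in_len - 2 < 2 * count)
        return -1;

    for (size_t i = 0; i < count; i++) {
        const uint8_t *u = in + 2 + 2 * i;
        uint32_t cp = ((uint32_t)u[0] << 8) | u[1];
        unsigned char enc[4];
        size_t k;

        /* Combine a high surrogate with a following low surrogate. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < count) {
            uint32_t lo = ((uint32_t)u[2] << 8) | u[3];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = '?'; /* unpaired surrogate */

        if (cp < 0x80) {
            enc[0] = (unsigned char)cp;
            k = 1;
        } else if (cp < 0x800) {
            enc[0] = (unsigned char)(0xC0 | (cp >> 6));
            enc[1] = (unsigned char)(0x80 | (cp & 0x3F));
            k = 2;
        } else if (cp < 0x10000) {
            enc[0] = (unsigned char)(0xE0 | (cp >> 12));
            enc[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            enc[2] = (unsigned char)(0x80 | (cp & 0x3F));
            k = 3;
        } else {
            enc[0] = (unsigned char)(0xF0 | (cp >> 18));
            enc[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            enc[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            enc[3] = (unsigned char)(0x80 | (cp & 0x3F));
            k = 4;
        }
        if (out_cap - o < k + 1) /* keep room for the terminating NUL */
            return -1;
        memcpy(out + o, enc, k);
        o += k;
    }
    out[o] = '\0';
    return (long)o;
}
```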
use only cli_readline(); we don't need exact conversion
drop unused functions
simplify encoding_norm_readline() and rename it to encoding_normalize_toascii()
git-svn: trunk@3571
* use fewer entities; browsers don't support all of them either.
* update to generate code for new entconv.
* no need for configure; just use a simple Makefile
(it is an internal tool)
libclamav/entconv.c, hashtab.c, htmlnorm.c:
* don't allocate memory for each entity_norm call.
* don't touch the length of the mmapped area (bb #785)
* update htmlnorm to use new entity_norm
git-svn: trunk@3515
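To illustrate the point about not allocating memory for each entity_norm call: a decoder can write into a caller-supplied buffer instead of returning freshly allocated memory. The sketch below is hypothetical (it does not reproduce entity_norm or the generated hashtab code) and uses a deliberately tiny linear-scan table in place of a hash table. It decodes a few named entities plus numeric references in the ASCII range and copies anything else through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Deliberately small table; a real implementation would use a hash table. */
static const struct { const char *name; char ch; } entities[] = {
    {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'}, {"apos", '\''},
};

/* Parse the digits of a numeric reference ("65" or "x41"); 1 on success. */
static int parse_numeric(const char *s, size_t len, unsigned long *val)
{
    int base = 10;
    size_t k = 0;
    unsigned long v = 0;

    if (len > 0 && (s[0] == 'x' || s[0] == 'X')) { base = 16; k = 1; }
    if (k == len)
        return 0; /* no digits */
    for (; k < len; k++) {
        char ch = s[k];
        int d;
        if (ch >= '0' && ch <= '9') d = ch - '0';
        else if (base == 16 && ch >= 'a' && ch <= 'f') d = ch - 'a' + 10;
        else if (base == 16 && ch >= 'A' && ch <= 'F') d = ch - 'A' + 10;
        else return 0;
        v = v * (unsigned long)base + (unsigned long)d;
        if (v > 0x10FFFF)
            return 0; /* beyond the Unicode range */
    }
    *val = v;
    return 1;
}

/* Decode entities from in[0..in_len) into out (NUL-terminated).
 * Returns bytes written, or -1 if out is too small. No allocation. */
static long entity_decode(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (out_cap == 0)
        return -1;
    while (i < in_len) {
        char c = in[i];
        size_t consumed = 1;

        if (c == '&' && i + 1 < in_len) {
            const char *name = in + i + 1;
            const char *semi = memchr(name, ';', in_len - i - 1);
            if (semi != NULL) {
                size_t nlen = (size_t)(semi - name);
                unsigned long v;
                if (nlen >= 2 && name[0] == '#') {
                    if (parse_numeric(name + 1, nlen - 1, &v) && v < 0x80) {
                        c = (char)v;
                        consumed = nlen + 2;
                    }
                } else {
                    for (size_t t = 0; t < sizeof entities / sizeof entities[0]; t++) {
                        if (strlen(entities[t].name) == nlen &&
                            memcmp(entities[t].name, name, nlen) == 0) {
                            c = entities[t].ch;
                            consumed = nlen + 2;
                            break;
                        }
                    }
                }
            }
        }
        if (out_cap - o < 2) /* room for this byte plus the NUL */
            return -1;
        out[o++] = c;
        i += consumed;
    }
    out[o] = '\0';
    return (long)o;
}
```

Unknown entities such as "&foo;" pass through untouched, which matches the idea of supporting only the entities browsers actually handle.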
entconv changes to improve security and performance
Part I for (bb #686, #386)
TODO:
* optimize entity_norm
* create testfiles for unicode encoding variants
* create a regression test
* check for memory leaks
git-svn: trunk@3511