LibClamAV
Libclamav provides an easy and effective way to add virus protection to your software. The library is thread-safe and transparently recognizes and scans within archives, mail files, MS Office document files, executables and other special formats.
License
Libclamav is licensed under the GNU GPL v2 license. This means you are not allowed to link commercial, closed-source software against it. All software using libclamav must be GPL compliant.
Supported formats and features
Executables
The library has built-in support for 32- and 64-bit Portable Executable, ELF and Mach-O files. Additionally, it can handle PE files compressed or obfuscated with the following tools:
- Aspack (2.12)
- UPX (all versions)
- FSG (1.3, 1.31, 1.33, 2.0)
- Petite (2.x)
- PeSpin (1.1)
- NsPack
- wwpack32 (1.20)
- MEW
- Upack
- Y0da Cryptor (1.3)
Mail files
Libclamav can handle almost every mail file format including TNEF (winmail.dat) attachments.
Archives and compressed files
The following archive and compression formats are supported by internal handlers:
- Zip (+ SFX)
- RAR (+ SFX)
- 7Zip
- Tar
- CPIO
- Gzip
- Bzip2
- DMG
- IMG
- ISO 9660
- PKG
- HFS+ partition
- HFSX partition
- APM disk image
- GPT disk image
- MBR disk image
- XAR
- XZ
- MS OLE2
- MS Cabinet Files (+ SFX)
- MS CHM (Compiled HTML)
- MS SZDD compression format
- BinHex
- SIS (SymbianOS packages)
- AutoIt
- NSIS
- InstallShield
Documents
The most popular file formats are supported:
- MS Office and MacOffice files
- RTF
- PDF
- HTML
In the case of Office, RTF and PDF files, libclamav will only extract the embedded objects and will not decode the text data itself. Text decoding and normalization are performed only for HTML files.
Data Loss Prevention
Libclamav includes a DLP module which can detect U.S. social security numbers and credit card numbers from the following issuers inside text files: AMEX, VISA, MasterCard, Discover, Diner’s Club, and JCB.
Future versions of Libclamav may include additional features to detect other credit cards and other forms of PII (Personally Identifiable Information) which may be transmitted without the benefit of being encrypted.
Others
Libclamav can also handle various obfuscators and encoders, as well as file types that are prone to exploits, such as:
- JPEG (exploit detection)
- RIFF (exploit detection)
- uuencode
- ScrEnc obfuscation
- CryptFF
API
Header file
Every program using libclamav must include the header file clamav.h:
#include <clamav.h>
Initialization
Before using libclamav, you should call cl_init() to initialize it. CL_INIT_DEFAULT is a macro that can be passed to cl_init() representing the default initialization settings. Once that is done, you’re ready to create a new scan engine by calling cl_engine_new(). To free resources allocated by the engine use cl_engine_free(). Function prototypes:
int cl_init(unsigned int options);
struct cl_engine *cl_engine_new(void);
int cl_engine_free(struct cl_engine *engine);
cl_init() and cl_engine_free() return CL_SUCCESS on success or another code on error. cl_engine_new() returns a pointer to the new engine, or NULL if there’s not enough memory to allocate a new engine structure.
Database loading
The following set of functions provides an interface for loading the virus database:
const char *cl_retdbdir(void);
int cl_load(const char *path, struct cl_engine *engine,
    unsigned int *signo, unsigned int options);
cl_retdbdir() returns the default (hardcoded) path to the directory with ClamAV databases. cl_load() loads a single database file or all databases from a given directory (when path points to a directory). The second argument is a pointer to an engine previously allocated with cl_engine_new(). The number of loaded signatures is added to signo, so the counter should be initialized to 0 before the first call. The last argument can pass the following flags:
- CL_DB_STDOPT This is an alias for a recommended set of database loading options.
- CL_DB_PHISHING Load phishing signatures.
- CL_DB_PHISHING_URLS Initialize the phishing detection module and load .wdb and .pdb files.
- CL_DB_PUA Load signatures for Potentially Unwanted Applications.
- CL_DB_OFFICIAL_ONLY Only load official signatures from digitally signed databases.
- CL_DB_BYTECODE Load bytecode.
cl_load() returns CL_SUCCESS on success and another code on failure.
...
struct cl_engine *engine;
unsigned int sigs = 0;
int ret;
if((ret = cl_init(CL_INIT_DEFAULT)) != CL_SUCCESS) {
    printf("cl_init() error: %s\n", cl_strerror(ret));
    return 1;
}
if(!(engine = cl_engine_new())) {
    printf("Can't create new engine\n");
    return 1;
}
ret = cl_load(cl_retdbdir(), engine, &sigs, CL_DB_STDOPT);
Error handling
Use cl_strerror() to convert error codes into human-readable messages. The function returns a statically allocated string:
if(ret != CL_SUCCESS) {
    printf("cl_load() error: %s\n", cl_strerror(ret));
    cl_engine_free(engine);
    return 1;
}
Engine structure
When all required databases are loaded, you should prepare the detection engine by calling cl_engine_compile(). If it fails, you should still free the memory allocated to the engine with cl_engine_free():
int cl_engine_compile(struct cl_engine *engine);
In our example:
if((ret = cl_engine_compile(engine)) != CL_SUCCESS) {
    printf("cl_engine_compile() error: %s\n", cl_strerror(ret));
    cl_engine_free(engine);
    return 1;
}
Limits
When you create a new engine with cl_engine_new(), it will have all internal settings set to default values as recommended by the ClamAV authors. It’s possible to check and modify the values (numbers and strings) using the following set of functions:
int cl_engine_set_num(struct cl_engine *engine,
    enum cl_engine_field field, long long num);
long long cl_engine_get_num(const struct cl_engine *engine,
    enum cl_engine_field field, int *err);
int cl_engine_set_str(struct cl_engine *engine,
    enum cl_engine_field field, const char *str);
const char *cl_engine_get_str(const struct cl_engine *engine,
    enum cl_engine_field field, int *err);
Please don’t modify the default values unless you know what you’re doing. Refer to the ClamAV sources (clamscan, clamd) for examples.
Database checks
It’s very important to keep the internal instance of the database up to date. You can watch database changes with the cl_stat..() family of functions.
int cl_statinidir(const char *dirname, struct cl_stat *dbstat);
int cl_statchkdir(const struct cl_stat *dbstat);
int cl_statfree(struct cl_stat *dbstat);
Initialization:
...
struct cl_stat dbstat;
memset(&dbstat, 0, sizeof(struct cl_stat));
cl_statinidir(cl_retdbdir(), &dbstat);
To check for a change, call cl_statchkdir() and check its return value (0 - no change, 1 - some change occurred). Remember to reset the cl_stat structure after reloading the database.
if(cl_statchkdir(&dbstat) == 1) {
    /* reload the database here */
    cl_statfree(&dbstat);
    cl_statinidir(cl_retdbdir(), &dbstat);
}
Libclamav (>= 0.96) includes an additional call to check the number of signatures that can be loaded from a given directory:
int cl_countsigs(const char *path, unsigned int countoptions,
    unsigned int *sigs);
The first argument points to the database directory, the second one specifies what signatures should be counted: CL_COUNTSIGS_OFFICIAL (official signatures), CL_COUNTSIGS_UNOFFICIAL (third party signatures), CL_COUNTSIGS_ALL (all signatures). The last argument points to the counter to which the number of detected signatures will be added (therefore the counter should be initially set to 0). The call returns CL_SUCCESS or an error code.
Data scan functions
It’s possible to scan a file or descriptor using:
int cl_scanfile(const char *filename, const char **virname,
    unsigned long int *scanned, const struct cl_engine *engine,
    unsigned int options);
int cl_scandesc(int desc, const char **virname,
    unsigned long int *scanned, const struct cl_engine *engine,
    unsigned int options);
Both functions store the virus name under the pointer virname. The virus name is part of the engine structure and must not be released directly. If the third argument (scanned) is not NULL, the functions will increase its value by the size of the scanned data (in CL_COUNT_PRECISION units). The last argument (options) specifies the scan options and supports the following flags (which can be combined using bit operators):
- CL_SCAN_STDOPT This is an alias for a recommended set of scan options. You should use it to make your software ready for new features in the future versions of libclamav.
- CL_SCAN_RAW Use it alone if you want to disable support for special files.
- CL_SCAN_ARCHIVE This flag enables transparent scanning of various archive formats.
- CL_SCAN_BLOCKENCRYPTED With this flag the library will mark encrypted archives as viruses (Encrypted.Zip, Encrypted.RAR).
- CL_SCAN_MAIL Enable support for mail files.
- CL_SCAN_OLE2 Enables support for OLE2 containers (used by MS Office and .msi files).
- CL_SCAN_PDF Enables scanning within PDF files.
- CL_SCAN_SWF Enables scanning within SWF files, notably compressed SWF.
- CL_SCAN_PE This flag enables deep scanning of Portable Executable files and allows libclamav to unpack executables compressed with run-time unpackers.
- CL_SCAN_ELF Enable support for ELF files.
- CL_SCAN_BLOCKBROKEN libclamav will try to detect broken executables and mark them as Broken.Executable.
- CL_SCAN_HTML This flag enables HTML normalisation (including ScrEnc decryption).
- CL_SCAN_ALGORITHMIC Enable algorithmic detection of viruses.
- CL_SCAN_PHISHING_BLOCKSSL Phishing module: always block SSL mismatches in URLs.
- CL_SCAN_PHISHING_BLOCKCLOAK Phishing module: always block cloaked URLs.
- CL_SCAN_STRUCTURED Enable the DLP module which scans for credit card and SSN numbers.
- CL_SCAN_STRUCTURED_SSN_NORMAL Search for SSNs formatted as xxx-yy-zzzz.
- CL_SCAN_STRUCTURED_SSN_STRIPPED Search for SSNs formatted as xxxyyzzzz.
- CL_SCAN_PARTIAL_MESSAGE Scan RFC1341 messages split over many emails. You will need to periodically clean up the $TemporaryDirectory/clamav-partial directory.
- CL_SCAN_HEURISTIC_PRECEDENCE Allow heuristic match to take precedence. When enabled, if a heuristic scan (such as phishingScan) detects a possible virus/phish it will stop the scan immediately. Recommended, saves CPU scan-time. When disabled, virus/phish detected by heuristic scans will be reported only at the end of a scan. If an archive contains both a heuristically detected virus/phish and real malware, the real malware will be reported.
- CL_SCAN_BLOCKMACROS OLE2 containers, which contain VBA macros will be marked infected (Heuristics.OLE2.ContainsMacros).
Both functions return CL_CLEAN when the file seems clean, CL_VIRUS when a virus is detected and another value on failure.
...
const char *virname;
if((ret = cl_scanfile("/tmp/test.exe", &virname, NULL, engine,
        CL_SCAN_STDOPT)) == CL_VIRUS) {
    printf("Virus detected: %s\n", virname);
} else {
    printf("No virus detected.\n");
    if(ret != CL_CLEAN)
        printf("Error: %s\n", cl_strerror(ret));
}
Memory
Because the engine structure occupies a few megabytes of system memory, you should release it with cl_engine_free() if you no longer need to scan files.
Forking daemons
If you’re using libclamav with a forking daemon you should call srand() inside a forked child before making any calls to the libclamav functions. This will avoid possible collisions with temporary filenames created by other processes of the daemon. This procedure is not required for multi-threaded daemons.
clamav-config
Use clamav-config to check compilation information for libclamav.
$ clamav-config --libs
-L/usr/local/lib -lz -lbz2 -lgmp -lpthread
$ clamav-config --cflags
-I/usr/local/include -g -O2
Example
You will find an example scanner application in the clamav source package (/example). Provided you have ClamAV already installed, execute the following to compile it:
gcc -Wall ex1.c -o ex1 -lclamav
CVD format
CVD (ClamAV Virus Database) is a digitally signed tarball containing one or more databases. The header is a 512-byte string with colon-separated fields:
ClamAV-VDB:build time:version:number of signatures:functionality level required:MD5 checksum:digital signature:builder name:build time (sec)
sigtool --info displays detailed information on CVD files:
$ sigtool -i daily.cvd
File: daily.cvd
Build time: 10 Mar 2008 10:45 +0000
Version: 6191
Signatures: 59084
Functionality level: 26
Builder: ccordes
MD5: 6e6e29dae36b4b7315932c921e568330
Digital signature: zz9irc9irupR3z7yX6J+OR6XdFPUat4HIM9ERn3kAcOWpcMFxq
Fs4toG5WJsHda0Jj92IUusZ7wAgYjpai1Nr+jFfXHsJxv0dBkS5/XWMntj0T1ctNgqmiF
+RLU6V0VeTl4Oej3Aya0cVpd9K4XXevEO2eTTvzWNCAq0ZzWNdjc
Verification OK.
Graphics
The current ClamAV logo was created by Alicia Willet, Talos.
OpenAntiVirus
Our database includes the virus database (about 7000 signatures) from OpenAntiVirus (http://OpenAntiVirus.org).