mirror of https://github.com/Cisco-Talos/clamav
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
368 lines
15 KiB
368 lines
15 KiB
\documentclass[a4paper,titlepage,12pt]{article}
|
|
\usepackage{amssymb}
|
|
\usepackage{pslatex}
|
|
\usepackage[dvips]{graphicx}
|
|
\usepackage{wrapfig}
|
|
\usepackage{url}
|
|
\date{}
|
|
|
|
\begin{document}
|
|
|
|
\begin{center}
|
|
\huge Creating signatures for ClamAV\\
|
|
\vspace{2cm}
|
|
\end{center}
|
|
|
|
\noindent
|
|
\section{Introduction}
|
|
CVD (ClamAV Virus Database) is a digitally signed container that
|
|
includes signature databases in various text formats. The header
|
|
of the container is a 512 bytes long string with colon separated fields:
|
|
\begin{verbatim}
|
|
ClamAV-VDB:build time:version:number of signatures:functionality
|
|
level required:MD5 checksum:digital signature:builder name:build
|
|
time (sec)
|
|
\end{verbatim}
|
|
\verb+sigtool --info+ displays detailed information about a given CVD file:
|
|
\begin{verbatim}
|
|
zolw@localhost:/usr/local/share/clamav$ sigtool -i main.cvd
|
|
File: main.cvd
|
|
Build time: 09 Dec 2007 15:50 +0000
|
|
Version: 45
|
|
Signatures: 169676
|
|
Functionality level: 21
|
|
Builder: sven
|
|
MD5: b35429d8d5d60368eea9630062f7c75a
|
|
Digital signature: dxsusO/HWP3/GAA7VuZpxYwVsE9b+tCk+tPN6OyjVF/U8
|
|
JVh4vYmW8mZ62ZHYMlM903TMZFg5hZIxcjQB3SX0TapdF1SFNzoWjsyH53eXvMDY
|
|
eaPVNe2ccXLfEegoda4xU2TezbGfbSEGoU1qolyQYLX674sNA2Ni6l6/CEKYYh
|
|
Verification OK.
|
|
\end{verbatim}
|
|
The ClamAV project distributes two CVD files: \emph{main.cvd} and
|
|
\emph{daily.cvd}.
|
|
|
|
\section{Signature formats}
|
|
|
|
\subsection{MD5}
|
|
The easiest way to create signatures for ClamAV is to use MD5 checksums,
|
|
however this method can be only used against static malware. To create
|
|
a signature for \verb+test.exe+ use the \verb+--md5+ option of sigtool:
|
|
\begin{verbatim}
|
|
zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
|
|
zolw@localhost:/tmp/test$ cat test.hdb
|
|
48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
|
|
\end{verbatim}
|
|
That's it! The signature is ready to use:
|
|
\begin{verbatim}
|
|
zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
|
|
test.exe: test.exe FOUND
|
|
|
|
----------- SCAN SUMMARY -----------
|
|
Known viruses: 1
|
|
Scanned directories: 0
|
|
Engine version: 0.92.1
|
|
Scanned files: 1
|
|
Infected files: 1
|
|
Data scanned: 0.02 MB
|
|
Time: 0.024 sec (0 m 0 s)
|
|
\end{verbatim}
|
|
You can change the name (by default sigtool uses the name of the file)
|
|
and place it inside a \verb+*.hdb+ file. A single database file can
|
|
include any number of signatures. To get them automatically loaded
|
|
each time clamscan/clamd starts just copy the database file(s) into
|
|
the local virus database directory (eg. /usr/local/share/clamav).
|
|
|
|
\subsection{MD5, PE section based}
|
|
You can create a MD5 signature for a specific section in a PE file.
|
|
Such signatures shall be stored inside \verb+.mdb+ files in the
|
|
following format:
|
|
\begin{verbatim}
|
|
PESectionSize:MD5:MalwareName
|
|
\end{verbatim}
|
|
The easiest way to generate MD5 based section signatures is to extract
|
|
target PE sections into separate files and then run sigtool with the
|
|
option \verb+--mdb+
|
|
|
|
\subsection{Hexadecimal signatures}
|
|
ClamAV stores all signatures in a hexadecimal format. By a hex-signature
|
|
here we mean a fragment of a malware's body converted into a hexadecimal
|
|
string which can be additionally extended with various wildcards.
|
|
|
|
\subsubsection{Hexadecimal format}
|
|
You can use \verb+sigtool --hex-dump+ to convert any data into a hex-string:
|
|
\begin{verbatim}
|
|
zolw@localhost:/tmp/test$ sigtool --hex-dump
|
|
How do I look in hex?
|
|
486f7720646f2049206c6f6f6b20696e206865783f0a
|
|
\end{verbatim}
|
|
|
|
\subsubsection{Wildcards}
|
|
ClamAV supports the following extensions inside hex signatures:
|
|
\begin{itemize}
|
|
\item \verb+??+\\
|
|
Match any byte.
|
|
\item \verb+a?+\\
|
|
Match a high nibble (the four high bits).\\ \textbf{IMPORTANT NOTE:}
|
|
The nibble matching is only available in libclamav with the
|
|
functionality level 17 and higher therefore please only use it with
|
|
.ndb signatures followed by ":17" (MinEngineFunctionalityLevel,
|
|
see \ref{ndb}).
|
|
\item \verb+?a+\\
|
|
Match a low nibble (the four low bits).
|
|
\item \verb+*+\\
|
|
Match any number of bytes.
|
|
\item \verb+{n}+\\
|
|
Match $n$ bytes.
|
|
\item \verb+{-n}+\\
|
|
Match $n$ or less bytes.
|
|
\item \verb+{n-}+\\
|
|
Match $n$ or more bytes.
|
|
\item \verb+{n-m}+\\
|
|
Match between $n$ and $m$ bytes ($m > n$).
|
|
\item \verb+(aa|bb|cc|..)+\\
|
|
Match aa or bb or cc..
|
|
\item \verb+HEXSIG[x-y]aa+ or \verb+aa[x-y]HEXSIG+\\
|
|
Match aa anchored to a hex-signature, see
|
|
\url{https://wwws.clamav.net/bugzilla/show_bug.cgi?id=776} for
|
|
a discussion and examples.
|
|
\end{itemize}
|
|
The range signatures \verb+*+ and \verb+{}+ virtually separate
|
|
a hex-signature into two parts, eg. \verb+aabbcc*bbaacc+ is treated
|
|
as two sub-signatures \verb+aabbcc+ and \verb+bbaacc+ with any number
|
|
of bytes between them. It's a requirement that each sub-signature
|
|
includes a block of two static characters somewhere in its body.
|
|
|
|
\subsubsection{Basic signature format}
|
|
The simplest (and now deprecated) signature format is:
|
|
\begin{verbatim}
|
|
MalwareName=HexSignature
|
|
\end{verbatim}
|
|
ClamAV will scan the entire file looking for HexSignature. All
|
|
signatures of this type must be placed inside \verb+*.db+ files.
|
|
|
|
\subsubsection{Extended signature format}\label{ndb}
|
|
The extended signature format allows for specification of additional
|
|
information such as a target file type, virus offset or engine version,
|
|
making the detection more reliable. The format is:
|
|
\begin{verbatim}
|
|
MalwareName:TargetType:Offset:HexSignature[:MinEngineFunctionalityLevel:[Max]]
|
|
\end{verbatim}
|
|
where \verb+TargetType+ is one of the following numbers specifying
|
|
the type of the target file:
|
|
\begin{itemize}
|
|
\item 0 = any file
|
|
\item 1 = Portable Executable, both 32- and 64-bit.
|
|
\item 2 = file inside OLE2 container (e.g. image, embedded executable,
|
|
VBA script). The OLE2 format is primarily used by MS Office and MSI
|
|
installation files.
|
|
\item 3 = HTML (normalized: whitespace transformed to spaces, tags/tag
|
|
attributes normalized, all lowercase), Javascript is normalized too:
|
|
all strings are normalized (hex encoding is decoded), numbers are
|
|
parsed and normalized, local variables/function names are normalized
|
|
to 'n001' format, argument to eval() is parsed as JS again,
|
|
unescape() is handled, some simple JS packers are handled,
|
|
output is whitespace normalized.
|
|
\item 4 = Mail file
|
|
\item 5 = Graphics
|
|
\item 6 = ELF
|
|
\item 7 = ASCII text file (normalized)
|
|
\item 8 = Disassembler data
|
|
\item 9 = Mach-O files
|
|
\end{itemize}
|
|
And \verb+Offset+ is an asterisk or a decimal number \verb+n+ possibly
|
|
combined with a special modifier:
|
|
\begin{itemize}
|
|
\item \verb+*+ = any
|
|
\item \verb+n+ = absolute offset
|
|
\item \verb+EOF-n+ = end of file minus \verb+n+ bytes
|
|
\end{itemize}
|
|
Signatures for PE, ELF and Mach-O files additionally support:
|
|
\begin{itemize}
|
|
\item \verb#EP+n# = entry point plus n bytes (\verb#EP+0# for \verb+EP+)
|
|
\item \verb#EP-n# = entry point minus n bytes
|
|
\item \verb#Sx+n# = start of section \verb+x+'s (counted from 0)
|
|
data plus \verb+n+ bytes
|
|
\item \verb#Sx-n# = start of section \verb+x+'s data minus \verb+n+ bytes
|
|
\item \verb#SL+n# = start of last section plus \verb+n+ bytes
|
|
\item \verb#SL-n# = start of last section minus \verb+n+ bytes
|
|
\end{itemize}
|
|
All the above offsets except \verb+*+ can be turned into
|
|
\textbf{floating offsets} and represented as \verb+Offset,MaxShift+ where
|
|
\verb+MaxShift+ is an unsigned integer. A floating offset will match every
|
|
offset between \verb+Offset+ and \verb#Offset+MaxShift#, eg. \verb+10,5+
|
|
will match all offsets from 10 to 15 and \verb#EP+n,y# will match all
|
|
offsets from \verb#EP+n# to \verb#EP+n+y#. Versions of ClamAV older than
|
|
0.91 will silently ignore the \verb+MaxShift+ extension and only use
|
|
\verb+Offset+.\\
|
|
|
|
\noindent
|
|
All signatures in the extended format must be placed inside \verb+*.ndb+ files.
|
|
|
|
\subsubsection{Logical signatures}\label{ndb}
|
|
Logical signatures allow combining of multiple signatures in extended
|
|
format using logical operators. They can provide both more detailed and
|
|
flexible pattern matching. The logical sigs are stored inside \verb+*.ldb+
|
|
files in the following format:
|
|
\begin{verbatim}
|
|
SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;
|
|
Subsig1;Subsig2;...
|
|
\end{verbatim}
|
|
where:
|
|
\begin{itemize}
|
|
\item \verb+TargetDescriptionBlock+ provides information about the
|
|
engine and target file with comma separated \verb+Arg:Val+ pairs,
|
|
currently (as of 0.95.1) only \verb+Target:X+ and \verb+Engine:X-Y+
|
|
are supported.
|
|
\item \verb+LogicalExpression+ specifies the logical expression
|
|
describing the relationship between \verb+Subsig0...SubsigN+.\\
|
|
\textbf{Basis clause:} 0,1,...,N decimal indexes are SUB-EXPRESSIONS
|
|
representing \verb+Subsig0, Subsig1,...,SubsigN+ respectively.\\
|
|
\textbf{Inductive clause:} if \verb+A+ and \verb+B+ are
|
|
SUB-EXPRESSIONS and \verb+X, Y+ are decimal numbers then
|
|
\verb+(A&B)+, \verb+(A|B)+, \verb+A=X+, \verb+A=X,Y+, \verb+A>X+,
|
|
\verb+A>X,Y+, \verb+A<X+ and \verb+A<X,Y+ are SUB-EXPRESSIONS
|
|
\item \verb+SubsigN+ is n-th subsignature in extended format possibly
|
|
preceded with an offset. There can be specified up to 64 subsigs.
|
|
\end{itemize}
|
|
Modifiers for subexpressions:
|
|
\begin{itemize}
|
|
\item \verb+A=X+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must get matched exactly X times; if it refers to
|
|
a (logical) block of signatures then this block must generate exactly
|
|
X matches (with any of its sigs).
|
|
\item \verb+A=0+ specifies negation (signature or block of signatures
|
|
cannot be matched)
|
|
\item \verb+A=X,Y+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must be matched exactly X times; if it refers to
|
|
a (logical) block of signatures then this block must generate X matches
|
|
and at least Y different signatures must get matched.
|
|
\item \verb+A>X+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must get matched more than X times; if it refers to
|
|
a (logical) block of signatures then this block must generate more
|
|
than X matches (with any of its sigs).
|
|
\item \verb+A>X,Y+: If the SUB-EXPRESSION A refers to a single signature
|
|
then this signature must get matched more than X times; if it refers to
|
|
a (logical) block of signatures then this block must generate more than
|
|
X matches and at least Y different signatures must be matched.
|
|
\item \verb+A<X+ and \verb+A<X,Y+ as above with the change of "more"
|
|
to "less".
|
|
\end{itemize}
|
|
Examples:
|
|
\begin{verbatim}
|
|
Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;7374656
|
|
6616e;deadbeef
|
|
|
|
Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;737
|
|
46566616e
|
|
|
|
Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;737
|
|
46566616e;deadbeef
|
|
|
|
Sig4;Target:1,Engine:18-20;((0|1)&(2|3))&4;EP+123:33c06834f04100
|
|
f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573
|
|
(63|64)61706528;S+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58d
|
|
cf43987e4f519d629b103375;SL+550:6300680065005c0046006900
|
|
\end{verbatim}
|
|
|
|
\subsection{Signatures based on archive metadata}
|
|
Signatures based on metadata inside archive files can provide an effective
|
|
protection against malware that spreads via encrypted zip or rar
|
|
archives. The format of a metadata signature is:
|
|
\begin{verbatim}
|
|
virname:encrypted:filename:normal size:csize:crc32:cmethod:fileno:max depth
|
|
\end{verbatim}
|
|
where the corresponding fields are:
|
|
\begin{itemize}
|
|
\item Virus name
|
|
\item Encryption flag (1 -- encrypted, 0 -- not encrypted)
|
|
\item File name (this is a regular expression - * to ignore)
|
|
\item Normal (uncompressed) size (* to ignore)
|
|
\item Compressed size (* to ignore)
|
|
\item CRC32 (* to ignore)
|
|
\item Compression method (* to ignore)
|
|
\item File position in archive (* to ignore)
|
|
\item Maximum number of nested archives (* to ignore)
|
|
\end{itemize}
|
|
The database file should have the extension of \verb+.zmd+ or
|
|
\verb+.rmd+ for zip or rar metadata respectively.
|
|
|
|
\subsection{Whitelist databases}
|
|
To whitelist a specific file use the MD5 signature format and place
|
|
it inside a database file with the extension of \verb+.fp+.\\
|
|
|
|
\noindent
|
|
To whitelist a specific signature inside main.cvd add the following
|
|
entry into daily.ign or a local file local.ign:
|
|
\begin{verbatim}
|
|
db_name:line_number:signature_name
|
|
\end{verbatim}
|
|
|
|
\subsection{Signature names}
|
|
ClamAV uses the following prefixes for signature names:
|
|
\begin{itemize}
|
|
\item \emph{Worm} for Internet worms
|
|
\item \emph{Trojan} for backdoor programs
|
|
\item \emph{Adware} for adware
|
|
\item \emph{Flooder} for flooders
|
|
\item \emph{HTML} for HTML files
|
|
\item \emph{Email} for email messages
|
|
\item \emph{IRC} for IRC trojans
|
|
\item \emph{JS} for Java Script malware
|
|
\item \emph{PHP} for PHP malware
|
|
\item \emph{ASP} for ASP malware
|
|
\item \emph{VBS} for VBS malware
|
|
\item \emph{BAT} for BAT malware
|
|
\item \emph{W97M}, \emph{W2000M} for Word macro viruses
|
|
\item \emph{X97M}, \emph{X2000M} for Excel macro viruses
|
|
\item \emph{O97M}, \emph{O2000M} for generic Office macro viruses
|
|
\item \emph{DoS} for Denial of Service attack software
|
|
\item \emph{DOS} for old DOS malware
|
|
\item \emph{Exploit} for popular exploits
|
|
\item \emph{VirTool} for virus construction kits
|
|
\item \emph{Dialer} for dialers
|
|
\item \emph{Joke} for hoaxes
|
|
\end{itemize}
|
|
Important rules of the naming convention:
|
|
\begin{itemize}
|
|
\item always use a -zippwd suffix in the malware name for signatures of type zmd,
|
|
\item always use a -rarpwd suffix in the malware name for signatures
|
|
of type rmd,
|
|
\item only use alphanumeric characters, dash (-), dot (.), underscores
|
|
(\_) in malware names, never use space, apostrophe or quote mark.
|
|
\end{itemize}
|
|
|
|
\section{Special files}
|
|
|
|
\subsection{HTML}
|
|
ClamAV contains a special HTML normalisation code which helps to detect
|
|
HTML exploits. Running \verb+sigtool --html-normalise+ on a HTML file
|
|
should generate the following files:
|
|
\begin{itemize}
|
|
\item nocomment.html - the file is normalized, lower-case, with all
|
|
comments and superflous white space removed
|
|
\item notags.html - as above but with all HTML tags removed
|
|
\end{itemize}
|
|
The code automatically decodes JScript.encode parts and char ref's (e.g.
|
|
\verb+f+). You need to create a signature against one of the created
|
|
files. To eliminate potential false positive alerts the target type should
|
|
be set to 3.
|
|
|
|
\subsection{Text files}
|
|
Similarly to HTML all ASCII text files get normalized (converted
|
|
to lower-case, all superflous white space and control characters removed,
|
|
etc.) before scanning. Use \verb+clamscan --leave-temps+ to obtain
|
|
a normalized file then create a signature with the target type 7.
|
|
|
|
\subsection{Compressed Portable Executable files}
|
|
If the file is compressed with UPX, FSG, Petite or other PE packer
|
|
supported by libclamav, run \verb+clamscan+ with
|
|
\verb+--debug --leave-temps+. Example output for a FSG compressed file:
|
|
\begin{verbatim}
|
|
LibClamAV debug: UPX/FSG/MEW: empty section found - assuming compression
|
|
LibClamAV debug: FSG: found old EP @119e0
|
|
LibClamAV debug: FSG: Unpacked and rebuilt executable saved in
|
|
/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c
|
|
\end{verbatim}
|
|
Next create a type 1 signature for \verb+/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c+
|
|
|
|
\end{document}
|
|
|