clamav/libclammspack/ChangeLog
2023-02-03 Stuart Caie <kyzer@cabextract.org.uk>
* configure.ac: do AC_CHECK_SIZEOF([off_t]) test only after
AC_SYS_LARGEFILE, because the latter can alter the size of off_t.
* cabd_extract(): file->offset and file->length are unsigned ints,
both of them and their sum are checked to be <= CAB_LENGTHMAX. But
recent code stuffs file->length into an off_t and checks that instead.
On 32-bit architectures, if file->length > 2GiB then the off_t is
negative, evading the check. Ultimately this causes the decompression
functions to return MSPACK_ERR_ARGS as they already guard against
being asked to decompress a negative number of bytes.
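A minimal sketch of the failure mode described above, not the actual cabd.c code. It uses int32_t as a stand-in for a 32-bit off_t. The names demo_length_ok_signed(), demo_range_ok() and DEMO_LENGTH_MAX are hypothetical, and the limit of 65535*32768 is an assumption standing in for CAB_LENGTHMAX.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for CAB_LENGTHMAX: 65535 blocks of 32768 bytes. */
#define DEMO_LENGTH_MAX (65535u * 32768u)

/* Flawed check: the unsigned length is copied into a signed 32-bit
 * type first (standing in for a 32-bit off_t). Converting a value
 * above INT32_MAX is implementation-defined; on common compilers it
 * wraps to a negative number, which then passes the "<=" test. */
static int demo_length_ok_signed(uint32_t length)
{
    int32_t as_off;
    assert(DEMO_LENGTH_MAX <= (uint32_t) INT32_MAX);
    as_off = (int32_t) length;
    return as_off <= (int32_t) DEMO_LENGTH_MAX;
}

/* Safer check: stay in unsigned arithmetic, and test the sum by
 * subtraction so that offset + length cannot wrap around. */
static int demo_range_ok(uint32_t offset, uint32_t length)
{
    return offset <= DEMO_LENGTH_MAX
        && length <= DEMO_LENGTH_MAX - offset;
}
```

With these definitions, a length of 0x90000000 slips through demo_length_ok_signed() on typical compilers but is rejected by demo_range_ok().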
2023-02-01 Stuart Caie <kyzer@cabextract.org.uk>
* readbits.h, readhuff.h, cabd.c, kwajd.c, lzxd.c, mszipd.c, qtmd.c:
ensure bit operations (including intermediary ones) are considered
as unsigned int, so UBSan is happy.
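To illustrate the kind of change involved (a sketch, not the actual macros in readbits.h or the other files): an unsigned char operand is promoted to int before shifting, so shifting a byte of 0x80 or more left by 24 bits moves a 1 into the sign bit, which UBSan reports. Converting each operand to unsigned int first keeps the whole expression unsigned. demo_get_u32le() is a hypothetical name, and the sketch assumes unsigned int is at least 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/* Read a 32-bit little-endian value. Every byte is converted to
 * unsigned int before shifting, so no intermediate result is a
 * signed int and the top byte can safely occupy bits 24..31. */
static unsigned int demo_get_u32le(const unsigned char *p)
{
    assert(p != NULL);
    return  (unsigned int) p[0]
         | ((unsigned int) p[1] << 8)
         | ((unsigned int) p[2] << 16)
         | ((unsigned int) p[3] << 24);
}
```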
2023-01-31 Stuart Caie <kyzer@cabextract.org.uk>
* chmd.c: replace READ_ENCINT() macro with stricter read_encint()
function that reads no more than 63 or 31 bits so ENCINTs can never
be negative.
I'd prefer to use unsigned types, but off_t is used for file offsets
and lengths to match the environment's file I/O, so changing it is
tricky and would change the current public API.
Additionally, UBSan complains about shifting a 1 into a signed
type's MSB. https://www.cs.utah.edu/~regehr/papers/tosem15.pdf
notes that this is legal in ANSI C and "fairly benign (and well-
defined until C99)", but C99 made it undefined for no good reason.
I don't agree with this, but I don't want someone else using a C99
compiler to end up miscompiling the code.
* chmd_read_headers(): the CHM's internally declared file length is
compared against its actual file length and a warning is printed if
they don't match.
* chmd_extract(): files in the uncompressed section will print a
warning if their declared length goes beyond the declared end of the
CHM file. This may not match the actual CHM file length. You will
still get seek or read errors if a file's offset or length goes beyond
the actual CHM file length.
Files in the compressed section will now cause a decrunch error if
their declared offset goes beyond the uncompressed length of the
section. If their offset is OK but their declared length goes beyond
the end, they will print a warning and then decompress as much as
possible before causing an error.
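A rough sketch of the bounded ENCINT reader described in the chmd.c item above, not the library's actual read_encint(). It assumes an ENCINT stores 7 value bits per byte, most significant group first, with the top bit of each byte set when more bytes follow. It uses int64_t in place of a 64-bit off_t and rejects anything longer than 9 bytes (63 bits) outright, which may be stricter than the real function. demo_read_encint() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one ENCINT from [p, end). Returns a pointer to the byte
 * after the ENCINT, or NULL if the input runs out or the encoding
 * carries more than 63 bits. The result is therefore never negative. */
static const unsigned char *demo_read_encint(const unsigned char *p,
                                             const unsigned char *end,
                                             int64_t *result)
{
    uint64_t value = 0;
    int bits = 0;
    unsigned char c;

    assert(p != NULL && end != NULL && result != NULL);
    do {
        if (p >= end) return NULL;        /* truncated input */
        bits += 7;
        if (bits > 63) return NULL;       /* would not fit in 63 bits */
        c = *p++;
        value = (value << 7) | (uint64_t) (c & 0x7F);
    } while (c & 0x80);

    *result = (int64_t) value;
    return p;
}
```

Accumulating into an unsigned 64-bit value and capping the bit count means the final conversion to the signed type can never set the sign bit.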
2023-01-02 Stuart Caie <kyzer@cabextract.org.uk>
* kwajd_extract(): KWAJ compression method #2 is the QBasic variant
of the SZDD compression algorithm. Thanks to Jason Summers for finding
this and providing examples.
2021-07-20 Stuart Caie <kyzer@cabextract.org.uk>
* lzxd_decompress(): simplified the code that decodes match_offset.
Thanks to Jasper St. Pierre for prompting me to look at it.
2020-12-30 Stuart Caie <kyzer@cabextract.org.uk>
* cabd_read_string(): libmspack no longer rejects CAB files with
empty previnfo/nextinfo strings. Thanks to Simon Tatham for the
patch, and for noting that WiX v4 currently generates such files.
2020-08-10 Stuart Caie <kyzer@cabextract.org.uk>
* lzxd_decompress(): merged the code for decoding aligned and
verbatim blocks, also verified there is no significant performance
penalty.
2020-08-07 Stuart Caie <kyzer@cabextract.org.uk>
* read_sys_file(): in a CHM file, the ControlData and ResetTable
files are loaded entirely into memory, regardless of file size.
This is not in the spirit of letting users control memory usage.
ControlData previously had to be at least 28 bytes (in case a new,
larger version of the file ever appeared), but is now rejected
if not exactly 28 bytes.
ResetTable can theoretically be huge; the longest LZX stream of
16 exabytes could have a 4 petabyte ResetTable. Practically, the
largest seen in the wild is 46 kilobytes (PHP manuals). I picked
an arbitrary upper limit of 1MB; please get in contact if you
know of any CHM files in the wild that are larger than this.
Thanks to seviezhou on Github for reporting this.
2020-04-13 Stuart Caie <kyzer@cabextract.org.uk>
* system.h: clear up libmspack's large file support.
To support large files, do this:
1. add any defines that your compiler needs to enable large file
support. It may be supported by default.
2. Define HAVE_FSEEKO if fseeko() and ftello() are available.
3. Define SIZEOF_OFF_T to the value of sizeof(off_t); it must be a
literal value because sizeof() can't be used in preprocessor tests.
libmspack uses the off_t datatype for all file offsets. If off_t is
less than 64 bits, libmspack will return an error when processing
CHM files with offsets beyond 2GB, and won't search for CAB headers
beyond 2GB into a file. In both cases, it prints a warning message
that the library doesn't support large files.
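The reason for step 3 can be sketched as follows. The preprocessor evaluates #if before types exist, so a test has to use a literal such as SIZEOF_OFF_T rather than sizeof(off_t). DEMO_SIZEOF_OFF_T, DEMO_LARGEFILE, DEMO_OFFSET_MAX and demo_offset_supported() are hypothetical names used only for this illustration; they are not libmspack's own.

```c
#include <assert.h>
#include <stdint.h>

/* Normally supplied by the build system, e.g. -DDEMO_SIZEOF_OFF_T=8.
 * The fallback of 4 is an assumption for this demo only. */
#ifndef DEMO_SIZEOF_OFF_T
# define DEMO_SIZEOF_OFF_T 4
#endif

/* "#if sizeof(off_t) >= 8" would not compile; a literal works. */
#if DEMO_SIZEOF_OFF_T >= 8
# define DEMO_LARGEFILE 1
# define DEMO_OFFSET_MAX INT64_MAX
#else
# define DEMO_LARGEFILE 0
# define DEMO_OFFSET_MAX INT32_MAX
#endif

/* Returns 1 if a file offset can be represented by the (signed)
 * offset type of the configured size, 0 otherwise. */
static int demo_offset_supported(uint64_t offset)
{
    assert(DEMO_SIZEOF_OFF_T == 4 || DEMO_SIZEOF_OFF_T >= 8);
    return offset <= (uint64_t) DEMO_OFFSET_MAX;
}
```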
2020-04-13 Stuart Caie <kyzer@cabextract.org.uk>
* macros.h: new header for the D(), LD/LU and EndGet???() macros.
Use this instead of system.h.
* system.h: if MSPACK_NO_DEFAULT_SYSTEM is defined, define
inline versions of the only standard C functions used in
mspack (strlen, memcmp, memset), so that no standard C library
functions are needed at all.
2020-01-08 Stuart Caie <kyzer@cabextract.org.uk>
* lzxd_decompress(): do not apply the E8 transformation on the
32769th LZX frame! Thanks to Cezary Sliwa for discovering this
bug and providing an example cab file (which is
http://download.windowsupdate.com/d/msdownload/update/driver/
drvs/2019/11/016c7f3e-809d-4720-893b-
e0d74f10c39d_35e12507628e8dc8ae5fb3332835f4253d2dab23.cab)
* cabd_compare: use EXPAND.EXE instead of EXTRACT.EXE when
testing files in a directory called 'expand'. The example
cab file above is extracted wrongly by EXTRACT.EXE, but
correctly by EXPAND.EXE because they take different approaches
to E8 transformations:
- EXTRACT.EXE writes "E8E8E8E8E8E8" to the last 6 bytes of a
frame, looks for E8 bytes up to the last 6 bytes, then restores
the last 6 bytes, leaving partial transforms of 1-3 bytes if an
E8 byte is found near the end of the frame
- EXPAND.EXE looks for E8 bytes up to the last 10 bytes of a
frame, therefore the last 6 bytes are never altered and all
transforms are 4 bytes
2019-02-18 Stuart Caie <kyzer@cabextract.org.uk>
* chmd_read_headers(): a CHM file name beginning "::" but shorter
than 33 bytes will lead to reading past the freshly-allocated name
buffer - checks for specific control filenames didn't take length
into account. Thanks to ADLab of Venustech for the report and
proof of concept.
2019-02-18 Stuart Caie <kyzer@cabextract.org.uk>
* chmd_read_headers(): CHM files can declare their chunks are any
size up to 4GB, and libmspack will attempt to allocate that to
read the file.
This is not a security issue; libmspack doesn't promise how much
memory it'll use to unpack files. You can set your own limits by
returning NULL in a custom mspack_system.alloc() implementation.
However, it would be good to validate chunk size further. With no
official specification, only empirical data is available. All files
created by hhc.exe have a chunk size of 4096 bytes, and this is
matched by all the files I've found in the wild, except for one
which has a chunk size of 8192 bytes. That file was created by someone
developing a CHM file creator 15 years ago, which they appear to
have abandoned, so it seems 4096 is a de-facto standard.
I've changed the "chunk size is not a power of two" warning to
"chunk size is not 4096", and now only allow chunk sizes between
22 and 8192 bytes. If you have CHM files with a larger chunk size,
please send them to me and I'll increase this upper limit.
Thanks to ADLab of Venustech for the report.
2019-02-18 Stuart Caie <kyzer@cabextract.org.uk>
* oabd.c: replaced one-shot copying of uncompressed blocks (which
requires allocating a buffer of the size declared in the header,
which can be 4GB) with a fixed-size buffer. The buffer size is
user-controllable with the new msoab_decompressor::set_param()
method (check you have version 2 of the OAB decompressor), and
also controls the input buffer used for OAB's LZX decompression.
Reminder: compression formats can dictate how much memory is
needed to decompress them. If memory usage is a security concern
to you, write a custom mspack_system.alloc() that returns NULL
if "too much" memory is requested. Do not rely on libmspack adding
special heuristics to know not to request "too much".
Thanks to ADLab of Venustech for the report.
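As a rough illustration of the reminder above, here is a capped allocator. To keep the sketch self-contained it uses a hypothetical demo_system struct rather than the real struct mspack_system from mspack.h; it assumes the alloc callback receives the system pointer and a byte count, and that the free callback accepts NULL. DEMO_MAX_ALLOC is an arbitrary limit chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the memory callbacks of an
 * mspack_system-style interface. */
struct demo_system {
    void *(*alloc)(struct demo_system *self, size_t bytes);
    void  (*free)(void *ptr);
};

/* Arbitrary per-allocation cap for this example: 16 MiB. */
#define DEMO_MAX_ALLOC ((size_t) 16 * 1024 * 1024)

/* Refuse any single request above the cap by returning NULL, which
 * the caller is expected to treat as an out-of-memory condition. */
static void *demo_capped_alloc(struct demo_system *self, size_t bytes)
{
    assert(self != NULL);
    if (bytes > DEMO_MAX_ALLOC) return NULL;
    return malloc(bytes);
}

/* free(NULL) is a no-op in standard C, so no NULL test is needed. */
static void demo_free(void *ptr)
{
    free(ptr);
}

static struct demo_system demo_sys = { demo_capped_alloc, demo_free };
```

A per-allocation cap is the simplest policy; an application that needs a total memory budget would have to track outstanding allocations itself.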
2018-11-03 Stuart Caie <kyzer@cabextract.org.uk>
* configure.ac, doc/Makefile.in, doc/Doxyfile.in: remove these
template files and replace with static files. You can still build
the documentation with make -C doc
2018-11-03 Stuart Caie <kyzer@cabextract.org.uk>
* Makefile.am, src: move the "useful" programs in src/ to examples/
and don't auto-install them. Even though they're useful, they are
intended as examples and aren't productised (no command-line
options, no man pages, etc.) -- if you disagree, feel free to
send in a patch
2018-11-01 Stuart Caie <kyzer@cabextract.org.uk>
* cabd_extract(): would not do decompression for random-access
offsets if the folder type was LZX. This is a fairly major bug,
and affects any decompression where you skip directly to a file,
or decompress data out-of-order. Thanks to austin987 for alerting
me to this.
This bug was introduced by the recent 'salvage mode' patch. Even
though I'd reviewed all the differences in clamav's copy of
libmspack and said "wtf" to this particular change, I didn't
notice it was still in the resulting patch I merged. Mea culpa :)
* test/cabd_test.c: now has a regression test to cover this
2018-10-31 Stuart Caie <kyzer@cabextract.org.uk>
* Makefile.am, test/*_test.c: use the automake test-suite system
with the test-suite programs (cabd_test, chmd_test, kwajd_test).
This also fixes a longstanding bugbear that these programs don't
access their test files using an absolute path. Now this is passed
to them and you can run them from any directory. Thanks to Richard
Jones for requesting this.
2018-10-31 Stuart Caie <kyzer@cabextract.org.uk>
* configure.ac: require at least automake 1.11, use AM_SILENT_RULES
unconditionally
2018-10-30 Stuart Caie <kyzer@cabextract.org.uk>
* configure.ac: remove obsolescent C library tests. AC_HEADER_STDC is
removed, and so are most checks for standard C headers. libmspack now
makes these assumptions:
- <ctype.h> <limits.h> <stdlib.h> <string.h> exist
- <ctype.h> defines tolower()
- <string.h> defines memset(), memcmp(), strlen()
- if towlower() exists, it's defined in <wctype.h>
2018-10-22 Stuart Caie <kyzer@cabextract.org.uk>
* cabd.c: remove the only use of assert()
2018-10-20 Stuart Caie <kyzer@cabextract.org.uk>
* src/chmextract.c: add anti "../" and leading slash protection to
chmextract. I'm not pleased about this. All the sample code provided
with libmspack is meant to be simple examples of library use, not
"productised" binaries. Making the "useful" code samples install
as binaries was a mistake. They were never intended to protect you
from unpacking archive files with relative/absolute paths, and I
would prefer that they never will be.
2018-10-17 Stuart Caie <kyzer@cabextract.org.uk>
* cab.h: Make the CAB block input buffer one byte larger, to allow
a maximum-allowed-size input block and the special extra byte added
after the block by cabd_sys_read_block to help Quantum alignment.
Thanks to Henri Salo for reporting this.
2018-10-17 Stuart Caie <kyzer@cabextract.org.uk>
* chmd_read_headers(): again reject files with blank filenames, this
time because their 1st or 2nd byte is null, not because their length
is zero. Thanks again to Hanno Böck for finding the issue.
2018-10-16 Stuart Caie <kyzer@cabextract.org.uk>
* Makefile.am: using automake _DEPENDENCIES for chmd_test appears to
override the default dependencies (e.g. sources), so libchmd.la was no
longer considered a dependency of chmd_test. This breaks parallel
builds like "make -j4". Added libchmd.la explicitly to dependencies.
Thanks to Thomas Deutschmann for reporting this.
2018-10-16 Stuart Caie <kyzer@cabextract.org.uk>
* cabd.c: add new parameter, MSCABD_PARAM_SALVAGE, which makes CAB file
reading and extraction more lenient, to allow damaged or mangled CABs
to be extracted. When enabled:
- cabd->open() won't reject cabinets with files that have invalid
folder indices or filenames. These files will simply be skipped
- cabd->extract() won't reject files with invalid lengths, but will
limit them to the maximum possible
- block output sizes over 32768 bytes won't be rejected
- invalid data block checksums won't be rejected
It's still possible for corrupted files to fail extraction, but more
data can be extracted before they do.
This new parameter doesn't affect the existing MSCABD_PARAM_FIXMSZIP
parameter, which ignores MSZIP decompression failures. You can enable
both at once.
Thanks to Micah Snyder from ClamAV for working with me to get this
feature into libmspack. This also helps ClamAV move towards using a
vanilla copy of libmspack without needing their own patchset.
2018-08-13 Stuart Caie <kyzer@cabextract.org.uk>
* mspack.h: clarify that mspack_system.free() should allow NULL. If your
mspack_system implementation doesn't, it would already have crashed, as
there are several places where libmspack calls sys->free(NULL). This
change makes it official, and amends a few "if (x) sys->free(x)" cases
to the simpler "sys->free(x)" to make it clearer.
2018-08-09 Stuart Caie <kyzer@cabextract.org.uk>
* Makefile.am: the test file cve-2015-4467-reset-interval-zero.chm is
detected by ClamAV as BC.Legacy.Exploit.CVE_2012_1458-1 "infected".
My hosting deletes anything that ClamAV calls "infected", so has been
continually deleting the official libmspack 0.7alpha release.
CVE-2012-1458 is the same issue as CVE-2015-4467: both libmspack, and
ClamAV using libmspack, could get a division-by-zero crash when the LZX
reset interval was zero. This was fixed years ago, but ClamAV still has
it as a signature, which today prevents me from releasing libmspack.
BC.Legacy.Exploit.CVE_2012_1458-1 is a bytecode signature, so I can't
see the exact trigger conditions, but I can see that it looks for the
"LZXC" signature of the LZX control file, so I've changed this to
"lzxc" and added a step in the Makefile to change it back to LZXC, so
I can release libmspack whether or not ClamAV keeps the signature.
2018-04-26 Stuart Caie <kyzer@cabextract.org.uk>
* read_chunk(): the test that chunk numbers are in bounds was off
by one, so read_chunk() returned a pointer taken from outside
allocated memory that usually crashes libmspack when accessed.
Thanks to Hanno Böck for finding the issue and providing a sample.
* chmd_read_headers(): reject files with blank filenames. Thanks
again to Hanno Böck for finding the issue and providing a sample file.
2018-02-06 Stuart Caie <kyzer@cabextract.org.uk>
* chmd.c: fixed an off-by-one error in the TOLOWER() macro, reported
by Dmitry Glavatskikh. Thanks Dmitry!
2017-11-26 Stuart Caie <kyzer@cabextract.org.uk>
* kwajd_read_headers(): fix up the logic of reading the filename and
extension headers to avoid a one or two byte overwrite. Thanks to
Jakub Wilk for finding the issue.
* test/kwajd_test.c: add tests for KWAJ filename.ext handling
2017-10-16 Stuart Caie <kyzer@cabextract.org.uk>
* test/cabd_test.c: update the short string tests to expect not only
MSPACK_ERR_DATAFORMAT but also MSPACK_ERR_READ, because of the recent
change to cabd_read_string(). Thanks to maitreyee43 for spotting this.
* test/msdecompile_md5: update the setup instructions for this script,
and also change the script so it works with current Wine. Again, thanks
to maitreyee43 for trying to use it and finding it not working.
2017-08-13 Stuart Caie <kyzer@cabextract.org.uk>
* src/chmextract.c: support MinGW one-arg mkdir(). Thanks to AntumDeluge
for reporting this.
2017-08-13 Stuart Caie <kyzer@cabextract.org.uk>
* read_spaninfo(): a CHM file can have no ResetTable and have a
negative length in SpanInfo, which then feeds a negative output length
to lzxd_init(), which then sets frame_size to a value of your choosing,
the lower 32 bits of output length, larger than LZX_FRAME_SIZE. If the
first LZX block is uncompressed, this writes data beyond the end of the
window. This issue was raised by ClamAV as CVE-2017-6419. Thanks to
Sebastian Andrzej Siewior for finding this by chance!
* lzxd_init(), lzxd_set_output_length(), mszipd_init(): due to the issue
mentioned above, these functions now reject negative lengths
2017-08-05 Stuart Caie <kyzer@cabextract.org.uk>
* cabd_read_string(): add missing error check on result of read().
If an mspack_system implementation returns an error, it's interpreted
as a huge positive integer, which leads to reading past the end of the
stack-based buffer. Thanks to Sebastian Andrzej Siewior for explaining
the problem. This issue was raised by ClamAV as CVE-2017-11423
2016-04-20 Stuart Caie <kyzer@cabextract.org.uk>
* configure.ac: change my email address to kyzer@cabextract.org.uk
2015-05-10 Stuart Caie <kyzer@4u.net>
* cabd_read_string(): correct rejection of empty strings. Thanks to
Hanno Böck for finding the issue and providing a sample file.
2015-05-10 Stuart Caie <kyzer@4u.net>
* Makefile.am: Add subdir-objects option as suggested by autoreconf.
* configure.ac: Add AM_PROG_AR as suggested by autoreconf.
2015-01-29 Stuart Caie <kyzer@4u.net>
* system.h: if C99 inttypes.h exists, use its PRI{d,u}{32,64} macros.
Thanks to Johnathan Kollasch for the suggestion.
2015-01-18 Stuart Caie <kyzer@4u.net>
* lzxd_decompress(): the byte-alignment code for reading uncompressed
block headers presumed it could wind i_ptr back 2 bytes, but this
hasn't been true since READ_BYTES was allowed to read bytes straddling
two blocks, leaving just 1 byte in the read buffer. Thanks to Jakub
Wilk for finding the issue and providing a sample file.
* inflate(): off-by-one error. Distance codes are 0-29, not 0-30.
Thanks to Jakub Wilk again.
* chmd_read_headers(), search_chunk(): another fix for checking pointer
is within a chunk, thanks again to Jakub Wilk.
2015-01-17 Stuart Caie <kyzer@4u.net>
* GET_UTF8_CHAR(): Remove 5/6-byte encoding support and check decoded
chars are no more than U+10FFFF.
* chmd_init_decomp(): A reset interval of 0 is invalid. Thanks to
Jakub Wilk for finding the issue and providing a sample and patch.
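A minimal sketch of a UTF-8 decoder with the restrictions described in the GET_UTF8_CHAR() item above: sequences of at most 4 bytes, and no code point above U+10FFFF. It is not the macro itself; demo_get_utf8() is a hypothetical function, and unlike a full validator it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 character from [p, end). On success stores the
 * code point in *out and returns the number of bytes consumed (1-4).
 * Returns 0 for truncated input, a bad lead or continuation byte,
 * 5/6-byte lead bytes, or a value above U+10FFFF. */
static int demo_get_utf8(const unsigned char *p, const unsigned char *end,
                         unsigned long *out)
{
    unsigned long ch;
    int extra, i;

    assert(p != NULL && end != NULL && out != NULL);
    if (p >= end) return 0;

    if      (p[0] < 0x80) { ch = p[0];        extra = 0; }
    else if (p[0] < 0xC0) return 0;            /* stray continuation byte */
    else if (p[0] < 0xE0) { ch = p[0] & 0x1F; extra = 1; }
    else if (p[0] < 0xF0) { ch = p[0] & 0x0F; extra = 2; }
    else if (p[0] < 0xF8) { ch = p[0] & 0x07; extra = 3; }
    else return 0;                             /* 5/6-byte forms rejected */

    if ((size_t) (end - p) < (size_t) extra + 1) return 0;
    for (i = 1; i <= extra; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;
        ch = (ch << 6) | (unsigned long) (p[i] & 0x3F);
    }
    if (ch > 0x10FFFFUL) return 0;

    *out = ch;
    return extra + 1;
}
```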
2015-01-15 Stuart Caie <kyzer@4u.net>
* chmd_read_headers(): add a bounds check to prevent over-reading data,
which caused a segfault on 32-bit architectures. Thanks to Jakub Wilk.
* search_chunk(): change the order of pointer arithmetic operations to
avoid overflow during bounds checks, which led to segfaults on 32-bit
architectures. Again, thanks to Jakub Wilk for finding this issue,
providing sample files and a patch.
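The reordering can be illustrated with a small sketch (hypothetical names, not the code in search_chunk()). Computing p + len first can move the pointer past the end of the buffer, which is undefined behaviour and may wrap around on 32-bit systems so that the comparison wrongly passes. Comparing len against the remaining space, end - p, avoids forming an out-of-range pointer at all.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if len bytes starting at p lie inside [p, end), else 0.
 * Requires p <= end. The length is compared with the space that is
 * left (end - p) instead of testing "p + len > end", so that no
 * out-of-bounds pointer is ever computed. */
static int demo_in_bounds(const unsigned char *p, const unsigned char *end,
                          size_t len)
{
    assert(p != NULL && end != NULL && p <= end);
    return len <= (size_t) (end - p);
}
```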
2015-01-08 Stuart Caie <kyzer@4u.net>
* cabd_extract(): No longer uses broken state data if extracting from
folder 1, 2, 1 and setting up folder 2 fails. This prevents a jump to
null and thus segfault. Thanks to Jakub Wilk again.
* cabd_read_string: reject empty strings. They are not found in any
valid CAB files. Thanks to Hanno Böck for sending me an example.
2015-01-05 Stuart Caie <kyzer@4u.net>
* cabd_can_merge_folders(): disallow folder merging if the combined
folder would have more than 65535 data blocks.
* cabd_decompress(): disallow files if their offset, length or
offset+length is more than 65535*32768, the maximum size of any
folder. Thanks to Jakub Wilk for identifying the problem and providing
a sample file.
2014-04-20 Stuart Caie <kyzer@4u.net>
* readhuff.h: fixed the table overflow check, which allowed one more
code after capacity had been reached, resulting in a read of
uninitialized data inside the decoding table. Thanks to Denis Kroshin
for identifying the problem and providing a sample file.
2013-05-27 Stuart Caie <kyzer@4u.net>
* test/oabx.c: added new example command for unpacking OAB files.
2013-05-17 Stuart Caie <kyzer@4u.net>
* mspack.h: Support for decompressing a new file format, the Exchange
Offline Address Book (OAB). Thanks to David Woodhouse for writing
the implementation. I've bumped the version to 0.4alpha in celebration.
2012-04-15 Stuart Caie <kyzer@4u.net>
* chmd_read_headers(): More thorough validation of CHM header values.
Thanks to Sergei Trofimovich for finding sample files.
* read_reset_table(): Better test for overflow. Thanks again to
Sergei Trofimovich for generating a good example.
* test/chminfo.c: this test program reads the reset table by itself
and was also susceptible to the same overflow problems.
2012-03-16 Stuart Caie <kyzer@4u.net>
* Makefile.am, configure.ac: make the GCC warning flags conditional
on using the GCC compiler. Thanks to Dagobert Michelsen for letting
me know.
2011-11-25 Stuart Caie <kyzer@4u.net>
* lzxd_decompress(): Prevent matches that go beyond the start
of the LZX stream. Thanks to Sergei Trofimovich for testing
with valgrind and finding a corrupt sample file that exercises
this scenario.
2011-11-23 Stuart Caie <kyzer@4u.net>
* chmd_fast_find(): add a simple check against infinite PMGL
loops. Thanks to Sergei Trofimovich for finding sample files.
Multi-step PMGL/PMGI infinite loops remain possible.
2011-06-17 Stuart Caie <kyzer@4u.net>
* read_reset_table(): wasn't reading the right offset for getting
the LZX uncompressed length. Thanks to Sergei Trofimovich for
finding the bug.
2011-05-31 Stuart Caie <kyzer@4u.net>
* kwajd.c, mszipd.c: KWAJ type 4 files (MSZIP) are now supported.
Thanks to Clive Turvey for sending me the format details.
* doc/szdd_kwaj_format.html: Updated documentation to cover
KWAJ's MSZIP compression.
2011-05-11 Stuart Caie <kyzer@4u.net>
* cabd_find(): rethought how large vs small file support is
handled, as users were getting "library not compiled to support
large files" message on some small files. Now checks for actual
off_t overflow, rather than trying to preempt it.
2011-05-10: Stuart Caie <kyzer@4u.net>
* chmd.c: implemented fast_find()
* test/chmx.c: removed the multiple extraction orders, now it just
extracts in the fastest order
* test/chmd_order.c: new program added to test that different
extraction orders don't affect the results of extraction
* test/chmd_find.c: new program to test that fast_find() works.
Either supply your own filename to find, or it will try finding
every file in the CHM.
* configure.ac: because CHM fast find requires case-insensitive
comparisons, tolower() or towlower() are used where possible.
These functions and their headers are checked for.
* mspack.h: exposed struct mschmd_sec_mscompressed's spaninfo
and struct mschmd_header's first_pmgl, last_pmgl and chunk_cache
to the world. Check that the CHM decoder version is v2 or higher
before using them.
* system.c: set CHM decoder version to v2
2011-04-27: Stuart Caie <kyzer@4u.net>
* many files: Made C++ compilers much happier with libmspack.
Changed char * to const char * where possible.
* mspack.h: Changed user-supplied char * to const char *.
Unless you've written your own mspack_system implementation,
you will likely be unaffected.
If you have written your own mspack_system implementation:
1: change open() so it takes a const char *filename
2: change message() so it takes a const char *format
If you cast your function into the mspack_system struct,
you can change the cast instead of the function.
2011-04-27: Stuart Caie <kyzer@4u.net>
* Makefile.am: changed CFLAGS from "-Wsign-compare -Wconversion
-pedantic" to "-W -Wno-unused". This enables more warnings, and
disables these specific warnings which are now a hindrance.
2011-04-27: Stuart Caie <kyzer@4u.net>
* test/cabrip.c, test/chminfo.c: used macros from system.h for
printing offsets and reading 64-bit values, rather than
reinvent the wheel.
* cabd_can_merge_folders(): declare variables at the start of
a block so older C compilers won't choke.
* cabd_find(): avoid compiler complaints about non-initialised
variables. We know they'll get initialised before use, but the
compiler can't reverse a state machine to draw the same conclusion.
2011-04-26: Stuart Caie <kyzer@4u.net>
* configure.ac, mspack/system.h: Added a configure test to get
the size of off_t. If off_t is 8 bytes or more, we presume this
system has large file support. This fixes LFS detection for Fedora
x86_64 and Darwin/Mac OS X, neither of which declare FILESIZEBITS in
<limits.h>. It's not against the POSIX standard to do this: "A
definition of [FILESIZEBITS] shall be omitted from the <limits.h>
header on specific implementations where the corresponding value is
equal to or greater than the stated minimum, but where the value can
vary depending on the file to which it is applied."
(http://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html)
Thanks to Edward Sheldrake for the patch.
2011-04-26: Stuart Caie <kyzer@4u.net>
* chmd.c: all 64-bit integer reads are now consolidated into
the read_off64() function
* chmd_read_headers(): this function has been made resilient
against accessing memory past the end of a chunk. Thanks to
Sergei Trofimovich for sending me examples and analysis.
* chmd_init_decomp(): this function now reads the SpanInfo file
if the ResetTable file isn't available, it also checks that each
system file it needs is large enough before accessing it, and
some of its code has been split into several new functions:
find_sys_file(), read_reset_table() and read_spaninfo()
2011-04-26: Stuart Caie <kyzer@4u.net>
* mspack.h, chmd.c: now reads the SpanInfo system file if the
ResetTable file isn't available. This adds a new spaninfo pointer
into struct mschmd_sec_mscompressed
2011-04-26: Stuart Caie <kyzer@4u.net>
* test/chminfo.c: more sanity checks for corrupted CHM files where
entries go past the end of a PMGL/PMGI chunk, thanks to
Sergei Trofimovich for sending me examples and analysis.
2011-04-25: Stuart Caie <kyzer@4u.net>
* cabd_merge(): Drew D'Addesio showed me spanning cabinets which
don't have all the CFFILE entries they should, but otherwise have
all necessary data for extraction. Changed the merging folders
test to be less strict; if folders don't exactly match, warn which
files are missing, but allow merging if at least one necessary
file is present.
2010-09-24: Stuart Caie <kyzer@4u.net>
* readhuff.h: Don't let build_decode_table() allow empty trees.
It's meant to be a special case just for the LZX length tree, so
move that logic out to the LZX code. Thanks to Danny Kroshin for
discovering the bug.
* lzxd.c: Allow empty length trees, but not other trees. If
the length tree is empty, fail if asked to decode a length symbol.
Again, thanks to Danny Kroshin for discovering the bug.
2010-09-20: Stuart Caie <kyzer@4u.net>
* Makefile.am: Set EXTRA_DIST so it doesn't include .svn
directories in the distribution, but does include docs.
2010-09-20: Stuart Caie <kyzer@4u.net>
* Makefile.am, configure.ac: Use modern auto* practises; turn on
automake silent rules where possible, use "m4" directory for libtool
macros, use LT_INIT instead of AC_PROG_LIBTOOL and use AM_CPPFLAGS
instead of INCLUDES. Thanks to Sergei Trofimovich for the patch.
2010-09-15: Stuart Caie <kyzer@4u.net>
* many files: Made the code compile with C++
- Renamed all 'this' variables/parameters to 'self'
- Added casts to all memory allocations.
- Added extern "C" to header files with extern declarations.
- Made system.c include system.h.
- Changed the K&R-style headers to ANSI-style headers in md5.c
2010-08-04: Stuart Caie <kyzer@4u.net>
* many files: removed unnecessary <unistd.h> include
2010-07-19: Stuart Caie <kyzer@4u.net>
* cabd_md5.c, chmd_md5.c: Replace writing files to disk then
MD5summing them, with an MD5summer built into mspack_system.
Much, much faster results.
* qtmd_decompress(): Robert Riebisch pointed out a Quantum
data integrity check that could never be tripped, because
frame_todo is unsigned, so it will never be decremented
below zero. Replaced the check with one that assumes that
decrementing past zero wraps frame_todo round to a number
more than its maximum value (QTM_FRAME_SIZE).
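The replacement check relies on unsigned wraparound. A sketch of the idea, assuming a frame size of 32768 bytes for QTM_FRAME_SIZE and using hypothetical names rather than the code in qtmd_decompress():

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FRAME_SIZE 32768u  /* assumed value of QTM_FRAME_SIZE */

/* Subtract the bytes just produced from the bytes still expected in
 * the frame. Returns 0 on success and -1 if more bytes were produced
 * than expected. A test such as "frame_todo < 0" can never be true for
 * an unsigned value; instead, the subtraction wraps around to a huge
 * number, which is detected by comparing against the frame size. */
static int demo_consume(unsigned int *frame_todo, unsigned int produced)
{
    assert(frame_todo != NULL);
    *frame_todo -= produced;
    if (*frame_todo > DEMO_FRAME_SIZE) return -1;  /* wrapped past zero */
    return 0;
}
```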
2010-07-18: Stuart Caie <kyzer@4u.net>
* cabd.c: Special logic to pass cabd_sys_read() errors back
to cabd_extract() wasn't compatible with the decompressor
logic of returning the same error repeatedly once unpacking
fails. This meant that if decompressing failed because of
a read error, then the next file in the same folder would
come back as "no error", but the decompressor wouldn't have
even attempted to decompress the file. Added a new state
variable, read_error, with the same lifespan as a decompressor,
to pass the underlying reason for MSPACK_ERR_READ errors back.
* mszipd.c: improve MS-ZIP recovery by saving all the bytes
decoded prior to a block failing. This requires remembering
how far we got through the block, so the code has been made
slightly slower (about 0.003 seconds slower per gigabyte
unpacked) by removing the local variable window_posn
and keeping it in the state structure instead.
2010-07-16: Stuart Caie <kyzer@4u.net>
* Makefile.am: strange interactions. When -std=c99 is used,
my Ubuntu's <stdio.h> (libc6-dev 2.11.1-0ubuntu7.2) does NOT
define fseeko() unless _LARGEFILE_SOURCE is also defined. But
configure always uses -std=gnu99, not -std=c99, so its test
determines _LARGEFILE_SOURCE isn't needed but HAVE_FSEEKO is
true. The implicit fseeko definition has a 32-bit rather than
64-bit offset, which means the mode parameter is interpreted
as part of the offset, and the mode is taken from the stack,
which is generally 0 (SEEK_SET). This breaks all SEEK_CURs.
The code works fine when -std=c99 is not set, so just remove
it for the time being.
2010-07-12: Stuart Caie <kyzer@4u.net>
* system.c: Reject reading/writing a negative number of bytes.
* chmd.c: allow zero-length files to be seen. Previously they were
skipped because they were mistaken for directory entries.
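A sketch of the guard described in the system.c item above, using hypothetical names. It assumes a write callback that takes a signed int byte count; without the guard, a negative count converted to size_t becomes a very large number.

```c
#include <assert.h>
#include <stdio.h>

/* Write "bytes" bytes from buffer to fh. Returns the number of bytes
 * written, or -1 on error. Negative counts are rejected up front, so
 * they are never converted to a huge size_t and handed to fwrite(). */
static int demo_write(FILE *fh, const void *buffer, int bytes)
{
    size_t done;
    assert(fh != NULL && buffer != NULL);
    if (bytes < 0) return -1;
    done = fwrite(buffer, 1, (size_t) bytes, fh);
    return (done == (size_t) bytes) ? bytes : -1;
}
```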
2010-07-08: Stuart Caie <kyzer@4u.net>
* qtmd.c: Larry Frieson found an important bug in the Quantum
decoder. Window wraps flush all unwritten data to disk.
However, sometimes less data is needed, which makes
out_bytes negative, which is then passed to write(). Some
write() implementations treat negative sizes as a large
positive integer and segfault trying to write the buffer.
* Makefile.am, test/*.c: fixed automake file so that the
package passes a "make distcheck".
2010-07-07: Stuart Caie <kyzer@4u.net>
* doc/szdd_kwaj_format.html: explain SZDD/KWAJ file format.
* lzssd.c: fixed SZDD decompression bugs.
* test/chmd_compare: Add scripts for comparing chmd_md5 against
Microsoft's own code.
* test/chmd_md5.c: remove the need to decompress everything
twice, as this is already in chmx.c if needed.
2010-07-06: Stuart Caie <kyzer@4u.net>
* many files: added SZDD and KWAJ decompression support.
2010-06-18: Stuart Caie <kyzer@4u.net>
* system.h: expanded the test for 64-bit largefile support so
it also works on 64-bit native operating systems where you
don't have to define _FILE_OFFSET_BITS.
2010-06-17: Stuart Caie <kyzer@4u.net>
* libmspack.pc.in: Added pkg-config support. Thanks to
Patrice Dumas for the patch.
2010-06-14: Stuart Caie <kyzer@4u.net>
* qtmd.c, lzxd.c, mszipd.c: created new headers, readbits.h and
readhuff.h, which bundle up the bit-reading and huffman-reading
code found in the MSZIP, LZX and Quantum decoders.
2010-06-11: Stuart Caie <kyzer@4u.net>
* qtmd_static_init(): Removed function in favour of static const
tables, same rationale as for lzxd_static_init().
* qtmd_read_input(), zipd_read_input(): After testing against my
set of CABs from the wild, I've found both these functions _need_
an extra EOF flag, like lzxd_read_input() has. So I've added
it. This means CABs get decoded properly AND there's no reading
fictional bytes.
2010-06-03: Stuart Caie <kyzer@4u.net>
* test/cabd_md5.c: updated this so it has better output and
doesn't need to be in the same directory as the files for multi-
part sets.
2010-05-20: Stuart Caie <kyzer@4u.net>
* qtmd_read_input(), zipd_read_input(): Both these functions are
essentially copies of lzxd_read_input(), but that has a feature
they don't have - an extra EOF flag. So if EOF is
encountered (sys->read() returns 0 bytes), these don't pass on the
error. Their respective bit-reading functions that called them
then go on to access at least one byte of the input buffer, which
doesn't exist as sys->read() returned 0. Thanks to Michael
Vidrevich for spotting this and providing a test case.
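A simplified sketch of the EOF-flag idea described above. The struct and
function names are hypothetical and the input is an in-memory buffer rather
than sys->read(). It assumes the first EOF is tolerated by supplying two
zero bytes of padding and setting a flag, and that a second attempt to read
past EOF is reported as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified decoder state; not libmspack's real struct. */
struct demo_stream {
    const unsigned char *src;   /* in-memory stand-in for the input file */
    size_t src_len, src_pos;
    unsigned char inbuf[8];     /* input buffer used by the bit reader */
    size_t in_len;              /* number of valid bytes in inbuf */
    int input_end;              /* set once EOF has been seen */
};

/* Refill inbuf. Returns 0 on success, -1 on a second read past EOF. */
int demo_read_input(struct demo_stream *s)
{
    size_t n = s->src_len - s->src_pos;
    if (n > sizeof(s->inbuf)) n = sizeof(s->inbuf);
    memcpy(s->inbuf, s->src + s->src_pos, n);
    s->src_pos += n;

    if (n == 0) {
        if (s->input_end) return -1;   /* already padded once: give up */
        /* First EOF: provide real (zero) bytes so the bit reader never
         * touches memory that was not filled, and remember we did so. */
        s->inbuf[0] = s->inbuf[1] = 0;
        n = 2;
        s->input_end = 1;
    }
    s->in_len = n;
    assert(s->in_len > 0);   /* callers may always read at least one byte */
    return 0;
}
```

The point of the flag is that a successful return always leaves at least
one valid byte in the buffer, while a stream that keeps demanding input
after its end still fails.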
2010-05-20: Stuart Caie <kyzer@4u.net>
* system.h: It turns out no configure.ac tests are needed to
decide between __func__ and __FUNCTION__, so I put the standard
one (__func__) back into the D() macro, along with some
special-case ifdefs for old versions of GCC.
* lzxd_static_init(): Removed function in favour of static const
tables. Jorge Lodos thinks it causes multithreading problems, I
disagree. However, there are speed benefits to declaring the
tables as static const.
* cabd_init_decomp(): Fixed code which never runs but would write
to a null pointer if it could. Changed it to an assert() as it
will only trip if someone rewrites the internals of cabd.c. Thanks
to Jorge Lodos for finding it.
* inflate(): Fixed an off-by-one error: if the LITERAL table
emitted code 286, this would read one byte past the end of
lit_extrabits[]. Thanks to Jorge Lodos for finding it.
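A sketch of the bounds check implied by the inflate() entry above, not the
library's code. The table name len_extrabits and the function
length_extra_bits() are hypothetical. The values are the extra-bit counts
for DEFLATE length codes 257 to 285 from RFC 1951, which is 29 entries, so
code 286 would index one past the end.

```c
#include <assert.h>
#include <stddef.h>

/* Extra-bit counts for DEFLATE length codes 257..285 (RFC 1951).
 * 29 entries: valid indices are 0..28. */
static const unsigned char len_extrabits[29] = {
    0, 0, 0, 0, 0, 0, 0, 0,   /* codes 257-264 */
    1, 1, 1, 1,               /* codes 265-268 */
    2, 2, 2, 2,               /* codes 269-272 */
    3, 3, 3, 3,               /* codes 273-276 */
    4, 4, 4, 4,               /* codes 277-280 */
    5, 5, 5, 5,               /* codes 281-284 */
    0                         /* code 285 */
};

/* Returns the number of extra bits for a length code, or -1 if the code
 * is not a valid length code (286 and 287 can be emitted by a Huffman
 * table but have no entry here). */
int length_extra_bits(unsigned int code)
{
    if (code < 257 || code > 285) return -1;
    assert(code - 257 < sizeof(len_extrabits));
    return len_extrabits[code - 257];
}
```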
2010-05-06: Stuart Caie <kyzer@4u.net>
* test/cabrip.c, test/chminfo.c: add fseeko() support.
2009-06-01: Stuart Caie <kyzer@4u.net>
* README: clarify the extended license terms
* doc, Makefile.am: make the doxygen makefile work when using
an alternate build directory
2006-09-20: Stuart Caie <kyzer@4u.net>
* system.h: I had a choice of adding more to configure.ac to
test for __func__ and __FUNCTION__, or just removing __FUNCTION__
from the D() macro. I chose the latter.
* Makefile.am: Now the --enable-debug in configure will actually
apply -DDEBUG to the sources.
2006-09-20: Stuart Caie <kyzer@4u.net>
* qtmd_decompress(): Fixed a major bug in the QTM decoder, as
reported by Tomasz Kojm last year. Removed the restriction on
window sizes as a result. Correctly decodes the XLVIEW cabinets.
2006-08-31: Stuart Caie <kyzer@4u.net>
* lzxd_decompress(): Two major bugs fixed. Firstly, the R0/R1/R2
local variables weren't set to 1 after lzxd_reset_state().
Secondly, the LZX decompression stream can sometimes become
odd-aligned (after an uncompressed block) and the next 16 bit
fetch needs to be split across two input buffers; ENSURE_BITS()
didn't cover this case. Many thanks to Igor Glucksmann for
discovering both these bugs.
2005-06-30: Stuart Caie <kyzer@4u.net>
* cabd_search(): fixed problems with searching files > 4GB for
cabinets.
2005-06-23: Stuart Caie <kyzer@4u.net>
* qtmd_init(): The QTM decoder is broken for QTM streams with a
window size less than the frame size. Until this is fixed, fail
to initialise QTM window sizes less than 15. Thanks to Tomasz Kojm
for finding the bug.
2005-03-22: Stuart Caie <kyzer@4u.net>
* system.h: now undefs "read", as the latest glibc defines read()
as a macro which messes everything up. Thanks to Ville Skyttä for
the update.
2005-03-14: Stuart Caie <kyzer@4u.net>
* test/multifh.c: write an mspack_system implementation that can
handle normal disk files, open file handles, open file descriptors
and raw memory all at the same time.
2005-02-24: Stuart Caie <kyzer@4u.net>
* chmd_read_headers(): avoid infinite loop when chmhs1_ChunkSize is
zero. Thanks to Serge Semashko for the research and discovery.
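A generic illustration of why a zero chunk size can hang a loop. The entry
above does not say how chmd_read_headers() iterates, so count_chunks() is a
hypothetical stand-in: it steps through a region one chunk at a time and
rejects a zero chunk size up front, since stepping by zero never advances.

```c
#include <assert.h>

/* Hypothetical helper: count how many whole chunks fit in total_len
 * bytes. Returns -1 if chunk_size is zero. */
long count_chunks(unsigned long total_len, unsigned long chunk_size)
{
    unsigned long pos;
    long n = 0;

    if (chunk_size == 0) return -1;   /* would otherwise loop forever */

    /* pos never exceeds total_len, so total_len - pos cannot wrap. */
    for (pos = 0; total_len - pos >= chunk_size; pos += chunk_size) {
        n++;
    }
    assert((unsigned long) n == total_len / chunk_size);
    return n;
}
```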
2005-02-18: Stuart Caie <kyzer@4u.net>
* mspack.h: renamed the "interface" parameter of mspack_version() to
"entity", as interface is a reserved word in C++. Thanks to Yuriy Z
for the discovery.
2004-12-09: Stuart Caie <kyzer@4u.net>
* lzss.h, szdd.h, szddd.h: more work on the SZDD/LZSS design.
2004-06-12: Stuart Caie <kyzer@4u.net>
* lzxd_static_init(): removed write to lzxd_extra_bits[52], thanks
to Nigel Horne from the ClamAV project.
2004-04-23: Stuart Caie <kyzer@4u.net>
* mspack.h: changed 'this' parameters to 'self' to allow compiling in
C++ compilers, thanks to Michal Cihar for the suggestion.
* mspack.h, system.h, mspack.def, winbuild.sh: integrated some changes
from Petr Blahos to let libmspack build as a Win32 DLL.
* chmd_fast_find(): added the first part of this code, and comments
sufficient to finish it :)
2004-04-08 Stuart Caie <kyzer@4u.net>
* test/chminfo.c: added a program for dumping useful data from CHM
files, e.g. index entries and reset tables. I wrote this a while ago
for investigating a corrupt cabinet, but I never committed it.
2004-03-26 Stuart Caie <kyzer@4u.net>
* test/cabd_memory.c: added a new test example which shows an
mspack_system implementation that reads and writes from memory only,
no file I/O. Even the source code has a little cab file embedded in it.
2004-03-10 Stuart Caie <kyzer@4u.net>
* cabd.c: updated the location of the CAB SDK.
* cabd.c: changed a couple of MSPACK_ERR_READ errors not based on
read() failures into MSPACK_ERR_DATAFORMAT errors.
* mszipd_decompress(): repair mode now aborts after writing a
repaired block if the error was a hard error (e.g. read error, out
of blocks, etc)
2004-03-08 Stuart Caie <kyzer@4u.net>
* Makefile.am: now builds and installs a versioned library.
* mszipd.c: completed a new MS-ZIP and inflate implementation.
* system.c: added mspack_version() and committed to a versioned
ABI for the library.
* cabd.c: made mszip repair functionality work correctly.
* cabd.c: now identifies invalid block headers.
* doc/: API documentation is now included with the library, not
just on the web.
* chmd.c: fixed error messages and 64-bit debug output.
* chmd.c: now also catches NULL files in section 1.
* test/chmx.c: now acts more like cabextract.
2003-08-29 Stuart Caie <kyzer@4u.net>
* ChangeLog: started keeping a ChangeLog :)