[gs-devel] Massive increase of filesize after merging two PDFs
Ken Sharp
ken.sharp at artifex.com
Tue Jun 17 01:06:54 PDT 2008
At 00:29 17/06/2008 -0700, Marc Muehlfeld wrote:
>I also found a difference between the documents. If I copy/paste text from
>the original file with my Adobe Reader to Notepad, I get the text. If I do
>this with the merged file, I just get hieroglyphs.
The fonts have been re-encoded, and possibly subset. In the absence of a
ToUnicode CMap, Acrobat simply uses the character codes. If the font has
been re-encoded so that (e.g.) /A is at position 1, then what you paste will
be 0x01. Low ASCII values are reserved for control and other special
characters, so the pasted text comes out as garbage.
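To make the effect concrete, here is a simplified model, not Ghostscript's actual algorithm. It assumes a hypothetical writer that re-encodes a font by handing out character codes in order of first use, starting at 1. Without a ToUnicode CMap, the "pasted" text is just the raw codes, while with one, the codes map back to the original characters. The helper names (`reencode`, `tounicode_bfchar`, and so on) are made up for illustration. Real viewers may also fall back on glyph names.

```python
def reencode(text):
    """Hypothetical re-encoding: codes assigned in first-use order, starting at 1.

    Returns (codes, code_to_char).
    """
    code_of = {}
    codes = []
    for ch in text:
        if ch not in code_of:
            code_of[ch] = len(code_of) + 1  # first distinct glyph gets code 1
        codes.append(code_of[ch])
    code_to_char = {code: ch for ch, code in code_of.items()}
    return codes, code_to_char


def paste_without_tounicode(codes):
    # No ToUnicode CMap: the raw character codes are all the viewer has.
    return "".join(chr(c) for c in codes)


def paste_with_tounicode(codes, cmap):
    # With a ToUnicode CMap, each code maps back to its Unicode value.
    return "".join(cmap[c] for c in codes)


def tounicode_bfchar(code_to_char):
    """Emit the bfchar section of a ToUnicode CMap for single-byte codes.

    Real CMaps limit each bfchar block to 100 entries; this sketch ignores that.
    """
    lines = [f"{len(code_to_char)} beginbfchar"]
    for code in sorted(code_to_char):
        lines.append(f"<{code:02X}> <{ord(code_to_char[code]):04X}>")
    lines.append("endbfchar")
    return "\n".join(lines)
```

For example, `reencode("ABBA")` gives codes `[1, 2, 2, 1]`. Pasting those without a CMap yields the control characters `"\x01\x02\x02\x01"`, which is the kind of "hieroglyph" output described above.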
>Any idea what went wrong?
I'd need to see the before and after files to have any chance of saying why
this occurs. Be aware that you are not simply concatenating files, not even
in the sense in which most PDF tools merge PDFs. The input files are
completely interpreted, and a brand new file is created from the result.
Effectively nothing from the original files is preserved.
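The original command isn't shown in the thread. The sketch below assumes it was the usual pdfwrite invocation, with placeholder file names. Every input is run through the interpreter and re-emitted by the `pdfwrite` device, which is why nothing from the originals survives verbatim.

```python
import subprocess


def build_merge_command(output, inputs, gs="gs"):
    """Build a Ghostscript command that re-renders all inputs into one new PDF."""
    return [
        gs,
        "-dBATCH",            # exit when all inputs are processed
        "-dNOPAUSE",          # don't pause between pages
        "-q",                 # suppress the startup banner
        "-sDEVICE=pdfwrite",  # interpret the input and write a brand new PDF
        f"-sOutputFile={output}",
        *inputs,
    ]


def run_merge(output, inputs, gs="gs"):
    """Run the merge; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(build_merge_command(output, inputs, gs), check=True)
```

Usage, with hypothetical file names: `run_merge("merged.pdf", ["first.pdf", "second.pdf"])`.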
Some elements may therefore be represented differently. For
example, a smooth shading might be turned into an image (this doesn't
happen in current versions, but might have in older ones; it is only an
example).
Most PDF merging tools simply renumber the objects from the second
file, add them to the first file, and update the page array and the xref table.
The actual content is generally not interpreted.
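By contrast, the renumber-and-append approach can be sketched on a toy object model. This is not a real PDF parser. It makes the following assumptions:

* Objects live in a dict keyed by object number.
* References are `Ref(n)`.
* Each file has a flat page tree, meaning one Pages node whose Kids are leaf pages.

Content streams are copied as opaque bytes and never interpreted. Writing the new xref table is left out.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy stand-in for an indirect reference 'n 0 R'."""
    num: int


def shift_refs(value, offset):
    """Recursively renumber every Ref inside a (nested) object by offset."""
    if isinstance(value, Ref):
        return Ref(value.num + offset)
    if isinstance(value, dict):
        return {k: shift_refs(v, offset) for k, v in value.items()}
    if isinstance(value, list):
        return [shift_refs(v, offset) for v in value]
    return value  # numbers, names and stream bytes are copied untouched


def merge(objs_a, pages_a, objs_b, catalog_b):
    """Append file B's objects to file A without interpreting any content.

    pages_a: object number of A's Pages node; catalog_b: B's Catalog number.
    Returns a new object table; the inputs are not modified.
    """
    offset = max(objs_a)                 # B's numbers start after A's highest
    merged = copy.deepcopy(objs_a)
    for num, obj in objs_b.items():
        merged[num + offset] = shift_refs(obj, offset)
    new_catalog_b = catalog_b + offset
    pages_b = merged[new_catalog_b]["Pages"].num
    b_kids = merged[pages_b]["Kids"]
    for kid in b_kids:                   # re-parent B's pages under A
        merged[kid.num]["Parent"] = Ref(pages_a)
    root = merged[pages_a]
    root["Kids"] = root["Kids"] + b_kids  # update A's page array
    root["Count"] = len(root["Kids"])     # valid only for a flat page tree
    del merged[pages_b], merged[new_catalog_b]  # B's old tree root is unused
    return merged
```

The only "understanding" of the file needed here is how to find references. This is why such tools preserve the original fonts and content byte for byte, and why they don't change the file size much.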
>I tried it with GNU gs 7.05 on RHEL and GNU gs 8.15 on openSUSE 10.3.
Those are both pretty old (8.15 is 3 years old, and 7.05 is more
than 4 years old). GNU Ghostscript is now up to date with GPL Ghostscript
at 8.62. I would try a newer version to see if there are any improvements
that might help you.
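Before re-testing, you can check which version is actually installed with `gs --version`. Below is a small sketch that wraps it. It assumes the version string is dot-separated integers, for example "8.15". The threshold of 8.62 is simply the current release mentioned above.

```python
import subprocess


def parse_gs_version(text):
    """Turn output like '8.62' or '9.56.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_gs_version(gs="gs"):
    """Ask Ghostscript for its version via `gs --version`."""
    out = subprocess.run([gs, "--version"], capture_output=True,
                         text=True, check=True)
    return parse_gs_version(out.stdout)


def is_older_than(version, minimum=(8, 62)):
    """True if version sorts before minimum (tuple comparison)."""
    return version < minimum
```

With this, `is_older_than(parse_gs_version("8.15"))` is true for both versions mentioned in the original question.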
Ken