[gs-devel] Massive increase of filesize after merging two PDFs

Ken Sharp ken.sharp at artifex.com
Tue Jun 17 01:06:54 PDT 2008


At 00:29 17/06/2008 -0700, Marc Muehlfeld wrote:


>I also found a difference between the documents. If I copy/paste text from
>the original file with my Adobe Reader to Notepad, I get the text. If I do
>this with the merged file, I just get hieroglyphs.

The fonts have been re-encoded, and possibly subset. In the absence of a 
ToUnicode CMap, Acrobat simply uses the character codes. If the font has 
been re-encoded so that (e.g.) /A is at position 1, then what you paste will 
be 0x01. Low ASCII values are reserved for control and other special 
characters, which is why the pasted text looks like garbage.
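
As a rough illustration of the effect described above, here is a toy model 
in Python. It is only a sketch: the helper names are made up for this 
example, the "assign codes in order of first use" scheme is an assumption 
about how a subset font might be re-encoded, and it does not reproduce 
Acrobat's actual text extraction.

```python
def build_subset_encoding(text):
    """Assign each distinct character a code, starting at 1, in order of first use.

    This mimics the "/A is at position 1" example; real encodings may differ.
    """
    encoding = {}
    for ch in text:
        if ch not in encoding:
            encoding[ch] = len(encoding) + 1
    return encoding


def encode(text, encoding):
    """Character codes as they would appear in the page content."""
    return bytes(encoding[ch] for ch in text)


def extract_without_tounicode(codes):
    """No ToUnicode CMap: fall back to treating each raw code as a character."""
    return "".join(chr(c) for c in codes)


def extract_with_tounicode(codes, to_unicode):
    """With a code-to-character map available, the original text comes back."""
    return "".join(to_unicode[c] for c in codes)


text = "ABBA"
encoding = build_subset_encoding(text)                    # A -> 1, B -> 2
codes = encode(text, encoding)
to_unicode = {code: ch for ch, code in encoding.items()}  # stand-in for a ToUnicode CMap

print(repr(extract_without_tounicode(codes)))  # control characters, not letters
print(repr(extract_with_tounicode(codes, to_unicode)))
```

The page still renders correctly in both cases, because the font maps code 
1 to the glyph for "A". Only copy/paste and text search are affected.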


>Any idea what went wrong?

I'd need to see the before and after files to have any chance of saying why 
this occurs. Be aware that you are not simply concatenating files, not even 
in the sense in which most PDF tools merge PDFs. The input files are 
completely interpreted, and a brand new file is created from the result. 
Effectively nothing from the original files is preserved.
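
For reference, a merge through the pdfwrite device is typically invoked 
along the following lines. The exact command used in this case isn't shown 
in the thread, so the file names and the build_merge_command helper below 
are hypothetical; the switches themselves are standard Ghostscript ones.

```python
import shlex


def build_merge_command(output_path, input_paths):
    """Build a Ghostscript argument list that writes all inputs to one new PDF.

    Every input is interpreted and the output is generated from scratch by
    the pdfwrite device, so this is not a byte-level concatenation.
    """
    return [
        "gs",
        "-dBATCH",                        # exit after the last input file
        "-dNOPAUSE",                      # do not prompt between pages
        "-sDEVICE=pdfwrite",              # produce PDF output
        f"-sOutputFile={output_path}",
        *input_paths,
    ]


# Hypothetical file names, for illustration only.
cmd = build_merge_command("merged.pdf", ["first.pdf", "second.pdf"])
print(shlex.join(cmd))
```

To actually run it, the list can be handed to subprocess.run(cmd, check=True) 
on a machine where the gs executable is installed.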

Some elements may therefore be represented differently in the output. For 
example, a smooth shading might be turned into an image (this doesn't 
happen in current versions, but might have in older ones, and is only an 
example).

Most PDF merging tools will simply renumber the objects from the second 
file, add them to the first file, and update the page array and the xref 
table. The actual content will not generally be interpreted.
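
The renumbering approach can be sketched roughly as follows. This is a 
simplified in-memory model, not real PDF parsing: the Ref class, the 
dictionary layout and the merge function are all invented for the 
illustration, and rebuilding the xref table is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Stand-in for an indirect reference such as '2 0 R'."""
    num: int


def renumber(value, offset):
    """Shift every reference inside value by offset; leave other data alone."""
    if isinstance(value, Ref):
        return Ref(value.num + offset)
    if isinstance(value, dict):
        return {k: renumber(v, offset) for k, v in value.items()}
    if isinstance(value, list):
        return [renumber(v, offset) for v in value]
    return value  # content streams, names, numbers: copied without interpretation


def merge(first, second):
    """Append second's objects and pages to first.

    Each document is modelled as {"objects": {num: obj}, "pages": [Ref, ...]}.
    """
    offset = max(first["objects"])  # second file's numbers start after these
    objects = dict(first["objects"])
    for num, obj in second["objects"].items():
        objects[num + offset] = renumber(obj, offset)
    pages = first["pages"] + [renumber(ref, offset) for ref in second["pages"]]
    return {"objects": objects, "pages": pages}


doc_a = {"objects": {1: {"Type": "Page", "Contents": Ref(2)}, 2: "stream A"},
         "pages": [Ref(1)]}
doc_b = {"objects": {1: {"Type": "Page", "Contents": Ref(2)}, 2: "stream B"},
         "pages": [Ref(1)]}

merged = merge(doc_a, doc_b)
print(merged["pages"])
```

Note that the content streams are carried over unchanged, so fonts, 
encodings and any ToUnicode CMaps in the inputs survive this kind of merge.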


>I tried it with GNU gs 7.05 on RHEL and GNU gs 8.15 on openSUSE 10.3.

Those are both pretty old (8.15 is 3 years old, and 7.05 is more than 4 
years old). GNU Ghostscript is now up to date with GPL Ghostscript at 8.62. 
I would try a newer version to see if there are any improvements that 
might help you.



                                 Ken


