Log of #ghostscript at irc.freenode.net.

20220429
artifexirc-bot <qwertynik> Understood a bit about Type 3 fonts here: https://www.prepressure.com/fonts/basics/type3. It is now clearer why they were not converted to black - probably because they are drawn. 05:15.38 
  <qwertynik> If the information that a Type 3 font was used is available, can a flag be added to convert Type 3 font 'drawings' too?05:15.39 
  <qwertynik> Yes @KenSharp detecting the background would be complex. However the goal here is to convert all text to black, remove all drawings and images and output the PDF.05:17.00 
  <mvrhel> Maybe05:23.09 
  <mvrhel> But I am not a font guy. And I am sure it will not work for PDF output05:23.37 
  <mvrhel> or pretty sure anyway05:23.46 
  <mvrhel> Rendering maybe05:23.56 
  <RayJohnston> IANAFG is sort of like IANAL (I Am Not A Font Guy / I Am Not A Lawyer). I am neither 🙂05:26.17 
  <mvrhel> funny05:28.44 
  <RayJohnston> AFGIAN (A Font Guy I Am Not) is shorter. ALIAN (A Lawyer I Am Not), not to be confused with ALIEN05:29.02 
  <RayJohnston> I suspect that IANAL caught on because of the ANAL part, and how legal interpretations of things tend to be ...05:30.37 
  <qwertynik> Ok @mvrhel. Is this because of a limitation in the PDF format, or, the current ghostscript code does not support it?05:31.30 
  <mvrhel> I believe the pdf format and how the color is set for type3 fonts05:33.32 
  <mvrhel> but that is for Ken to tell you when he gets here in a couple hours05:33.48 
  <mvrhel> I think it is like the uncolored pattern stuff05:34.30 
  <mvrhel> which relies upon what ever color is set in the graphic state05:34.45 
  <mvrhel> as to what gets drawn05:34.56 
  <mvrhel> easy enough to deal with when rendering05:35.04 
  <RayJohnston> Type 3 fonts are painted with generalized vector and image operations. In Ghostscript, they are all painted as text (using the text enumerator), but I suspect the problem is that the image/bitmap and vector operations don't know that they are being painted as part of text05:35.08 
  <mvrhel> hard to pack into the output pdf file05:35.09 
  <RayJohnston> and it gets even more hairy when going to the pdfwrite device which tries to preserve the input graphic state as much as possible.05:36.37 
  <RayJohnston> easy for you, maybe 🙂05:37.04 
  <RayJohnston> knowing that painting (at some point) came from text, particularly when colorspaces may be pattern or indexed or whatever, doesn't seem simple.05:38.02 
  <mvrhel> we added -dBlackVector recently05:38.16 
  <mvrhel> and that packs into PDF output too05:38.25 
  <mvrhel> but type3 fonts05:38.41 
  <mvrhel> no05:38.42 
  <RayJohnston> what does BlackVector do with colored Patterns and images ?05:38.57 
  <mvrhel> images are not converted05:39.09 
  <mvrhel> some patterns are05:39.14 
  <mvrhel> not uncolored ones05:39.19 
  <mvrhel> well not to pdf output05:39.28 
  <RayJohnston> "some" ...05:39.31 
  <mvrhel> rendered yes05:39.32 
  <mvrhel> yes. if you have a pattern that has an image in it05:39.50 
  <mvrhel> it is not going to be converted to black05:39.55 
  <mvrhel> if it has a vector drawing in it05:40.03 
  <mvrhel> that content will be rendered black05:40.10 
  <mvrhel> it is a difficult problem05:40.45 
  <mvrhel> a customer wanted this05:40.50 
  <mvrhel> and that was the best I could do without charging them a hefty NRE05:41.05 
  <RayJohnston> right, makes sense, because BlackVector will capture the vector parts of the pattern as black (I assume)05:41.20 
  <mvrhel> yes05:41.25 
  <mvrhel> but there are uncolored patterns05:41.38 
  <mvrhel> which use the current graphic state color value05:41.47 
  <mvrhel> those don't translate easily to PDF output05:41.59 
  <mvrhel> but render black05:42.08 
  <RayJohnston> yep. bitmapped (uncolored "stencil") patterns don't have a color05:42.51 
  <mvrhel> right05:43.24 
  <mvrhel> off to bed05:45.43 
  <RayJohnston> for raster output, probably a "tagged" output would retain the text tag, but that doesn't help with PDF output.05:45.45 
  <RayJohnston> me, too (bed)05:46.12 
  <qwertynik> From the conversation so far, it is mostly clear that the usage of Type 3 fonts makes it difficult to change the text's color (as it appears to the human eye) to black.05:51.01 
  <qwertynik> Before finding the -dBlackText flag in Ghostscript, was using MuPDF via a Python library to extract the text and then create a new PDF with black text. However, this was challenging considering the different orientation, and other attributes associated with text rendering.05:51.10 
  <qwertynik> But, since the text using Type3 font was extracted as 'text' using MuPDF, wondering if a supplementary flag can be added that states: even if it is a Type 3 font rendered 'object', render it as black - assuming it is text. Do not care about it being a drawing.05:51.27 
  <RayJohnston> @qwertynik what about using -dFILTERIMAGE -dFILTERVECTOR to just leave the text -- will that help you?05:52.09 
  <qwertynik> Yes certainly, that has helped so far. Have been using this command `gs -o op.pdf -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dBlackText -f BlackTextGenerationTesting.pdf` so far. And now bumped across Type3 fonts in the PDF 🙂05:53.34 
  <RayJohnston> so that, with -dBlackVector ??05:54.26 
  <RayJohnston> and/or -sColorConversionStrategy=Gray ??05:57.53 
  <RayJohnston> (pdfwrite options are quite myriad and have lots of "interesting" corner cases in themselves)05:58.40 
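A minimal sketch of the two variations suggested above, written as shell functions around the command already quoted in this log. The function names and the GS variable (which only allows the gs binary to be swapped out) are hypothetical; the switches are the ones named in the conversation. Note that -sColorConversionStrategy=Gray converts colours to shades of gray rather than forcing black, and nothing in this log confirms that either variant changes uncoloured Type 3 glyphs.

```shell
# Variant 1: the text-only command from the log, plus -dBlackVector.
# usage: text_only_black input.pdf output.pdf
text_only_black() {
    "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR \
        -dBlackText -dBlackVector -f "$1"
}

# Variant 2: the same text-only command with a gray colour conversion added.
# usage: text_only_gray input.pdf output.pdf
text_only_gray() {
    "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR \
        -dBlackText -sColorConversionStrategy=Gray -f "$1"
}
```

For example, `text_only_black BlackTextGenerationTesting.pdf op.pdf` would reproduce the earlier command with -dBlackVector appended.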
  <qwertynik> With this, wouldn't the other vectors ('actually' drawings) that are to be removed also remain?05:59.20 
  <RayJohnston> the "filter" subclass device filters BEFORE any color conversion done by -dBlackVector (I am pretty sure -- @mvrhel, or testing, would have to confirm that)06:01.02 
  <RayJohnston> There are several pseudo devices that can affect the processing: "FirstPage LastPage" that swallow ALL operations for a page, "ObjectHandler" that filters object type calls, and "Nup" that places objects on a "master" page.06:06.00 
  <RayJohnston> These all happen before any "target" device (including pdfwrite) sees the operation.06:06.51 
  <RayJohnston> so, I would expect -dFILTERIMAGE -dFILTERVECTOR to only leave the text06:09.15 
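One way to check that expectation for a particular file is to write one output per object class and see where a given piece of content ends up. The sketch below is an assumption-laden illustration: the function name, the output naming scheme and the GS variable are hypothetical, and -dFILTERTEXT is the text counterpart of the two filter switches mentioned in the log.

```shell
# Write three PDFs, each keeping only one class of object, so it is
# possible to see whether a glyph survives as text, vector or image.
# usage: split_by_type input.pdf prefix
split_by_type() {
    # Keep text only (drop images and vectors).
    "${GS:-gs}" -o "$2-text.pdf" -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -f "$1" &&
    # Keep vectors only (drop images and text).
    "${GS:-gs}" -o "$2-vector.pdf" -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERTEXT -f "$1" &&
    # Keep images only (drop vectors and text).
    "${GS:-gs}" -o "$2-image.pdf" -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT -f "$1"
}
```

Running `split_by_type MIAA-emailer6.pdf check` would produce check-text.pdf, check-vector.pdf and check-image.pdf for side-by-side inspection.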
  <qwertynik> In that case, wouldn't the vector be filtered out - in this case the relevant text too? 06:09.37 
  <qwertynik> Attempted this command and the required text (using Type 3 font) was **removed**06:09.39 
  <qwertynik> `gs -o op.pdf -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dBlackText -dBlackVector -f MIAA-emailer6.pdf`06:09.40 
  <qwertynik> This is the test PDF.06:10.09 
  <qwertynik> https://cdn.discordapp.com/attachments/773567375458828329/969480693627375636/MIAA-emailer6.pdf06:10.10 
  <RayJohnston> @qwertynik if you are looking to get the text "MERZ INSTITUTE", etc. it is NOT text. It is part of an embedded image. The "Dear Doctor" stuff IS text06:20.59 
  <RayJohnston> The "MERZ NEWS" is also text06:25.47 
  <qwertynik> @RayJohnston 06:29.34 
  <qwertynik> Yes, that's a part of the image. Not looking to extract this. 06:29.35 
  <qwertynik> Looking to get the highlighted portion as text06:29.36 
  <qwertynik> https://cdn.discordapp.com/attachments/773567375458828329/969485371584950312/unknown.png06:29.37 
  <KenSharp> The current architecture of the pdfwrite device does not support this feature. The way it is implemented for rendering means that it mostly works for pdfwrite, but not always.07:32.09 
  <KenSharp> Specifically it is known (and expected) not to work for uncoloured type 3 fonts and uncoloured patterns, due to the way colour processing is lazily written.07:34.36 
  <qwertynik> Can a different device and command be used before running this command `gs -o op.pdf -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR -dBlackText -f BlackTextGenerationTesting.pdf` to account for the lazy color processing?07:36.49 
  <KenSharp> No.07:37.21 
  <KenSharp> Or at least, not without rendering the text to an image07:37.35 
  <KenSharp> But then BlackText won't work07:37.49 
  <KenSharp> You might be able to convert the text to outlines using NoOutputFonts, and then use -dBlackVector07:38.19 
  <KenSharp> That's a 2-step process you understand ? First run to pdfwrite with NoOutputFonts, and then run the result through pdfwrite with -dBlackVector07:38.57 
  <KenSharp> Of course the text will no longer be text after that, and the file will be considerably larger07:39.19 
  <qwertynik> Sounds plausible. However, wouldn't this also keep other vectors that are actually drawings?07:40.41 
  <KenSharp> If you start with FILTERIMAGE and FILTERVECTOR along with NoOutputFonts, no07:41.07 
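The two-step process described above could be sketched roughly as follows. This is an illustration only, not a tested recipe: the function names, file names and the GS variable are hypothetical, while -dNoOutputFonts, -dFILTERIMAGE, -dFILTERVECTOR and -dBlackVector are the switches named in the conversation. As noted in the log, the result no longer contains real text and is likely to be considerably larger.

```shell
# Pass 1: drop images and vector drawings, and have pdfwrite emit the
# remaining text as outlines instead of fonts.
# usage: outline_text input.pdf outlined.pdf
outline_text() {
    "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite -dFILTERIMAGE -dFILTERVECTOR \
        -dNoOutputFonts -f "$1"
}

# Pass 2: the former text is now vector content, so ask for black vectors.
# usage: blacken_vectors outlined.pdf black.pdf
blacken_vectors() {
    "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite -dBlackVector -f "$1"
}
```

A run would chain the two, for example `outline_text in.pdf outlined.pdf && blacken_vectors outlined.pdf black.pdf`. Whether Type 3 glyphs survive the first pass depends on the file, which is exactly what gets checked next in the conversation.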
  <qwertynik> I think the text using Type 3 font is also being filtered out with the FILTERVECTOR flag. Will recheck now07:43.14 
  <KenSharp> It shouldn't be, it wasn't previously, was it ?07:43.42 
  <qwertynik> Yes, it is being filtered out.07:44.08 
  <qwertynik> Just re-verified.07:44.14 
  <KenSharp> Well you're stuck then07:44.24 
  <qwertynik> Reposting earlier message in case it is missed07:45.33 
  <qwertynik> Before finding the -dBlackText flag in Ghostscript, was using MuPDF via a Python library to extract the text and then create a new PDF with black text. However, this was challenging considering the different orientation, and other attributes associated with text rendering.07:45.34 
  <qwertynik> But, since the text using Type3 font was extracted as 'text' using MuPDF, wondering if a supplementary flag can be added that states: even if it is a Type 3 font rendered 'object', render it as black - assuming it is text. Do not care about it being a drawing.07:45.57 
  <KenSharp> No, I saw it, but I don't see the relevance, if you want to ask questions about MuPDF you'd be better off in the #mupdf channel07:46.18 
  <qwertynik> Ok sure. Posted here assuming they could be related and it would 'click' some idea.07:47.30 
  <KenSharp> The MuPDF developers mostly don't read #ghostscript and vice-versa, we've generally got enough to do with our own products.....07:48.03 
  <qwertynik> Is supporting such a flag technically possible in Ghostscript?07:49.24 
  <KenSharp> Supporting what flag ?07:59.34 
  <KenSharp> The problem isn't anything to do with recognising text, we know it's text, the problem is writing something other than the current colour as the colour to use for the text. While not destroying the current colour in case it also gets used for something else, like a fill.08:03.56 
  <KenSharp> Is it possible to do that ? Almost certainly, it's software. But I don't intend to invest the amount of effort required to support that.08:04.36 
  <qwertynik> Ok. Just realized that most text in the generated PDF has had its color changed to black. But for some text, annotated and highlighted, the color **isn't changed** - because Type3 font is used 🤦‍♂️09:31.39 
  <qwertynik> https://cdn.discordapp.com/attachments/773567375458828329/969528497464815636/unknown.png09:31.40 
  <Robin_Watts> @chrisl OK, I think what is there now should work.15:11.50 
  <KenSharp> possibly in #ghostscript-tech ?15:12.22 