IRC Logs

Log of #ghostscript at irc.freenode.net.

2014/10/10
rsc So I can influence the font name for example.09:50.29 
  Right now the "font name" is "DejaVuLGCSans-Identity-H"09:50.43 
kens Well tell it to use DejaVuLGCSans then. But I have no idea if that will work at all09:51.07 
  The PostScript has 2 byte encoded text09:51.32 
rsc Is it enough to do a search and replace in the *.ps?09:51.37 
kens God no.09:51.43 
  PostScript is a programming language09:51.54 
  As I said, your text is double byte, you would need a type 0 or CIDFont to be able to handle that properly09:52.27 
  The application generating the PostScript needs to not do that in order for you to get something which will work09:52.55 
rsc How to get "type 0"? I thought CIDFont is an issue?09:54.47 
kens Type 0 is what a CIDFont turns into when you load it.09:55.13 
rsc No I am totally confused. I thought CIDFont is the reason why I can't copy & paste without garbled results.09:55.43 
  *Now09:55.46 
kens But it can also be manufactured by other means. However I doubt you can do it from the application09:55.48 
  The fact that the application is using a CIDFont is why the copy/paste doesn't work, yes09:56.08 
  And because the application is using a CIDFont, it emits the text in a form encoded suitably for the CIDFont. That form will *NOT* work with a type 1 or type 42 (what you would think of as TrueType) font.09:56.57 
  So you cannot simply search and replace the font name, replacing a CIDFont with a regular font, because the text will not then be suitably encoded for that font.09:57.44 
  FWIW the CIDSystemInfo attached to the CIDFont in the PDF file does say that the Ordering is Unicode, so a smart application could use that to figure out what the text is09:58.33 
rsc Then neither evince nor Adobe is smart.09:58.54 
kens Well it's a heuristic, and not totally reliable. It would be effort to code that, so I guess most people don't bother09:59.22 
  The chances of it being present, and correct, are small09:59.34 
  I must admit I'm not completely sure how our own txtwrite device is getting mostly useful text out, and I wrote that device.....10:00.10 
rsc The copy & paste result from Adobe looks "correct" but it is "10 00 41" - while "41" is "A"10:00.39 
kens Like I said, you are using 2 byte encodings (Unicode) so the 1st byte is always going to be 0x00 for Western languages10:01.27 
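The leading zero byte can be reproduced with a few lines of Python. This is only a sketch: it assumes the 2-byte codes are UTF-16BE code units, which is what the "Ordering is Unicode" remark suggests but which the log does not confirm for this particular file. The function name is made up for illustration.

```python
def two_byte_codes(text: str) -> bytes:
    """Encode text as big-endian 2-byte codes (assumed UTF-16BE)."""
    return text.encode("utf-16-be")


# "A" is U+0041, so its 2-byte form is 00 41: the first byte is zero.
print(two_byte_codes("A").hex(" "))

# For Western (Latin-1 range) text every first byte is 0x00.
high_bytes = two_byte_codes("Hello")[0::2]
print(all(b == 0 for b in high_bytes))
```

This is why a consumer that treats the data as single-byte text sees a stray 0x00 in front of every character.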
  Frankly there is no way you are going to get a PDF file you can reliably cut and paste from, starting from the PostScript you are using. All the PDF consumers are going to be forced to fall back on guesswork (because there is no ToUnicode information available) so therefore unreliable10:03.25 
  Some may work better than others.10:03.43 
rsc But the text for copy & paste is not separately supplied? It is generated from the pdf writer?10:07.03 
  And can't I simply make it trim the first byte? :)10:07.36 
kens trimming the first byte from what exactly ? The PostScript file ? The PDF file ? The cut and pasted text ?10:08.25 
  I don't know what you mean by the text being separately supplied10:08.53 
rsc Copy & paste result is 0x10 0x00 0x41 if I am not mistaken. So where does this exactly come from?10:09.03 
  Is it possible to have some kind of hackish workaround there to only have 0x41 or 0x00 0x41 instead?10:09.42 
  (to get a correct copy & paste result)10:09.54 
kens Well it comes from the application doing the cut and paste I guess. Where it comes from exactly I can't guess. However the text is present in the PDF file so it comes (basically) from there10:09.58 
rsc If I have an "A" in a PDF, is it there twice? Once for representation and once for copy & paste?10:10.42 
kens rsc by changing which file ? The only thing you can change is the cut and pasted text, change either the PostScript file by removing the bytes and it will give you an error when you try to process it, change the PDF file by removing the bytes and it will not open10:11.07 
  rsc, no the text is only there once.10:11.17 
  Cut/paste/search is done by examining the text in the PDF file. First you look up which font is being used (also in the PDF file) then you take the correct number of bytes and make a numeric character code from it.10:12.24 
  What happens after that depends on the font and the information available.10:12.35 
  If there's a ToUnicode CMap then you take the character code as an index, and that tells you the corresponding Unicode code point.10:13.03 
  That's 100% reliable and the way most things work10:13.14 
  If you don't have a ToUnicode CMap then you are left with guessing.10:13.28 
rsc So if I have to stick to CIDFont (which is likely because I can not change the application fundamentally), I definitely need a ToUnicode CMap to get rid of this, right?10:13.53 
kens You can use the glyph names from type 1 fonts. You can look up the POST table (if it's present) from a TrueType font. If neither of those is available then most apps simply say 'let's hope it's ASCII'10:14.19 
  rsc yes, if you are using a CIDFont the *only* reliable mechanism is a ToUnicode CMap10:14.45 
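The lookup kens describes can be sketched roughly as follows. This is an illustration, not the code of any real PDF consumer: the function name, the plain dict standing in for a parsed ToUnicode CMap, and the choice of U+FFFD for unknown codes are all assumptions. The glyph-name and POST-table fallbacks are left out; only the ToUnicode path and the final "hope it's ASCII" guess are shown.

```python
from typing import Dict, Optional


def extract_text(data: bytes, bytes_per_code: int,
                 to_unicode: Optional[Dict[int, str]]) -> str:
    """Turn the raw bytes of a text string into Unicode text.

    bytes_per_code comes from the font in use (1 for a simple font,
    2 for an Identity-H CIDFont). to_unicode is a hypothetical,
    already-parsed ToUnicode CMap, or None if the file has none.
    """
    out = []
    for i in range(0, len(data) - bytes_per_code + 1, bytes_per_code):
        # Take the correct number of bytes and make a character code.
        code = int.from_bytes(data[i:i + bytes_per_code], "big")
        if to_unicode is not None:
            # Reliable path: the character code indexes the CMap.
            out.append(to_unicode.get(code, "\ufffd"))
        else:
            # Guesswork: assume the code is printable ASCII.
            out.append(chr(code) if 0x20 <= code < 0x7F else "\ufffd")
    return "".join(out)


# A subset font may number its glyphs 1, 2, ...; only the CMap knows
# what they mean, which is why no single generic mapping can exist.
print(extract_text(b"\x00\x01\x00\x02", 2, {1: "H", 2: "i"}))
print(extract_text(b"\x00\x01\x00\x02", 2, None))
```

The second call shows the failure mode under discussion: without the map, the codes of a subset font carry no usable meaning.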
rsc Okay. For that usecase it would be enough if I cover characters from Western Europe.10:15.07 
kens You are using subset fonts, so you can't produce a 'one size fits all' ToUnicode CMap10:15.55 
rsc Let me go one step back. That fscking application here supports either "latin1" only by using Type1 fonts or "unicode" by using TTF. 10:17.20 
kens If you say so.10:17.35 
rsc Can I somehow figure out if it uses CID for the "latin1 only" stuff?10:17.43 
kens Look at the PostScript and see what font name it uses10:17.57 
  If it's a name of the form <font name>-Identity-H or similar then it's a CIDFont10:18.24 
  Also you can look at the text in the PostScript and see if it's single byte or double byte encoded10:18.44 
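Both checks can be approximated with a small script. It is a rough heuristic under stated assumptions: the font name is taken as a plain string, "or similar" is narrowed to the -Identity-H and -Identity-V suffixes, and the text is assumed to appear as a PostScript hex string of Western characters, where double-byte encoding shows up as a zero in every first byte. Both function names are invented for this sketch.

```python
import re


def looks_like_cidfont(font_name: str) -> bool:
    """True for names of the form <font name>-Identity-H (or -V)."""
    return re.search(r"-Identity-[HV]$", font_name) is not None


def looks_double_byte(hex_string: str) -> bool:
    """Guess whether hex-encoded Western text uses 2-byte codes.

    Heuristic only: even length, and every first byte is 0x00.
    """
    data = bytes.fromhex(hex_string)
    return (len(data) >= 2 and len(data) % 2 == 0
            and all(b == 0 for b in data[0::2]))


print(looks_like_cidfont("DejaVuLGCSans-Identity-H"))
print(looks_like_cidfont("NimbusSanL-Regu"))
print(looks_double_byte("00410042"))
print(looks_double_byte("4142"))
```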
rsc NimbusSanL-Regu, Type 1C tells Evince here.10:19.30 
kens If your PostScript contained a GlyphNames2Unicode entry in the font dictionary then you would get a ToUnicode CMap generated for you, but since the PostScript doesn't actually have the font embedded, that can't happen10:20.01 
  rsc yes that's a type 2 font, but it's basically the same10:20.26 
  You should have single byte encoded text, I would guess it will copy/paste/search as you expect10:20.48 
rsc Copy/paste/search works, thus likely single byte encoded.10:21.17 
kens Yes.10:21.23 
  Like I said, in the absence of any other information, applications will usually assume ASCII, and Latin1 is basically ASCII10:21.54 
rsc How can I generate such a "ToUnicode CMap"?10:22.52 
kens Like I said, you can't, it needs to be done programmatically by the application embedding the font.10:23.24 
  In case you hadn't guessed, you're in a very complicated area of PDF here10:23.45 
rsc Can't I provide some mapping list to Ghostscript?10:25.00 
kens Not really, no.10:25.14 
rsc Means a "ToUnicode CMap" is only a hypothetical but not practical solution?10:25.35 
kens It's highly practical for certain tasks; starting from another PDF file, or PostScript generated on Windows for instance.10:26.16 
  But if your application isn't generating it, it's not easy to add afterwards.10:26.35 
rsc What would the application have to do exactly?10:27.33 
kens OK well there is no concept of a ToUnicode CMap in PostScript. The Windows PostScript driver has a specific extension which includes a /GlyphNames2Unicode entry in an embedded font dictionary and we support that extension.10:28.38 
  So an application (or PostScript producer) would have to firstly embed the font (your app doesn't so it fails at the first hurdle) then it would have to add the entry to the dictionary and fill it in so that the character codes are matched to Unicode (actually UTF-16) values. In your case that would be an identity mapping of course.10:29.49 
rsc Uhm. Nothing that can be easily done as non-C-programmer I guess.10:32.40 
kens No, I'm afraid not.10:32.49 
  Just embedding the font would be a complex task10:32.58 
rsc But it is generic and not really application specific?10:35.17 
  So is it something where Artifex could stick a price to it?10:35.42 
kens The ToUnicode CMap is part of the PDF specification, the GlyphNames2Unicode extension is specific to the Adobe PostScript driver on Windows10:35.54 
rsc No Windows involved here, just Linux.10:36.09 
kens I'm not sure what you are asking about....10:36.25 
chrisl We'd have to modify every applications that emits Postscript......10:36.45 
rsc I thought if it could be an option to let you change the application to include the GlyphNames2Unicode entries to the PostScript.10:37.40 
kens As chrisl says, we would have to modify every application that emits PostScript. We would also have to change at least the one you are using to embed the fonts too. We don't have that kind of manpower10:38.23 
rsc Why every application that emits PostScript? I thought the issue is that my application here doesn't just do the right thing?10:38.56 
kens You seem to be talking generically, not about a specific application10:39.20 
rsc Oh, sorry if I was imprecise about that.10:39.47 
kens If you mean your specific application then it would need to be modified to embed fonts in the output, and add the relevant GlyphNames2Unicode information10:39.53 
chrisl And, frankly, especially right now, we don't have the man power to take on work like that10:40.05 
kens It would be a major undertaking for the people who maintain that application, well outside of anything we could undertake, especially at the moment.10:40.43 
rsc kens: okay, because it takes months to change that?10:41.03 
  kens: can you give me a very rough estimation how huge it would be?10:41.16 
  I anyway need to run to somebody and ask for budget etc.10:41.34 
kens Well we don't have any background in that application, so we would first have to understand it. Embedding fonts is a *very* complicated process and that in itself would take an experienced engineer (experienced with fonts and PostScript) months to write and test fully.10:42.17 
rsc Okay, so months.10:42.44 
kens Please don't ask us to undertake such a task, we would have to say no.10:42.50 
  rsc months *if* you have an engineer experienced in PostScript and fonts.10:43.12 
rsc kens: yes, I got this.10:43.20 
kens There are very few of those in the world.10:43.20 
rsc Is changing the application from CIDFont to something else better faster done?10:44.03 
  s/better/10:44.11 
kens I imagine the application uses CIDFonts for the very excellent reason that its the only way to support non-Western languages10:44.40 
rsc Is it? But how does say, libreoffice, solve this? I don't see "TrueType (CID)" there in such PDFs.10:45.29 
kens So changing to another font type probably isn't an option. I imagine that text is stored internally as Unicode code point values, so it would be hard to change10:45.37 
  rsc You can include 2 methods of course, one for Western text and one for non-Western (>256 characters in the language)10:46.26 
  More complex of course10:46.34 
  Supporting two methods for achieving the same end is usually something engineers ahte10:46.59 
  s/ahte/hate/10:47.40 
chrisl kens: what's the procedure when a bountiable bug is resolved?10:50.02 
kens I don't recall right now10:50.15 
  Probably best to notify henry10:50.33 
chrisl I'll do that....10:50.45 
kens Doesn't Shelly already know the procedure ? He must have claimed before.....10:51.27 
chrisl Yeh, I wasn't sure if it's a "pull" procedure from Shelly's end, or a "push" procedure from henrys 10:52.02 
kens I have a suspicion it's a pull, but I could easily be mistaken, no harm in contacting henry anyway10:52.22 
chrisl Okay, I've let both Henry and Shelly know......11:07.24 
kens Seems reasonable11:07.32 
nsz tor8: yesterday i tried the urls on http://git.ghostscript.com/?p=user/tor/mujs.git;a=summary but could not clone them11:25.08 
  looking at the commit diff in browser looked ok, except i'd use isalpha instead of manual 'a'<=c && ...11:25.50 
  libc isalpha generates smaller and faster code11:26.04 
  http://git.musl-libc.org/cgit/musl/tree/include/ctype.h#n3011:26.23 
  this is how isalpha should be implemented11:26.35 
  hm actually libc isalpha is not correct semantically but the musl implementation is how to do efficiently what you do there11:30.20 
tor8 nsz: libc isalpha is setlocale dependent, so unusable11:39.12 
  and musl's isalpha (while minimal and elegant) only tests A-Za-z, not the full unicode range11:40.31 
  nsz: I'm concerned that you couldn't clone the repo though11:41.00 
nsz i mean you could do muslisalpha(c) || isalpharune(c)11:42.16 
  but it's just a minor nitpick11:42.29 
tor8 nsz: ahem, my bad. I'd confused git-export-daemon-ok and git-daemon-export-ok. should be able to clone now.11:42.45 
nsz :)11:42.55 
tor8 nsz: true, but as you said, it's a minor nitpick :)11:43.00 
nsz i'd assume the current code is just optimization and isalpharune handles the ascii case as well11:43.34 
tor8 nsz: yeah. isalpharune handles ascii as well, but it's quite a bit slower since it involves a binary search through a table11:44.03 
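The trade-off being discussed, a cheap ASCII test in front of a slower table search, looks roughly like this. The sketch is in Python rather than C, and the range table is a tiny made-up excerpt, not the table mujs actually ships; the function names are illustrative too. The fast path mirrors the musl idea of folding case with `| 32` before a single range check.

```python
from bisect import bisect_right

# Hypothetical, heavily truncated table of (first, last) alphabetic
# code point ranges, sorted by first. A real table is far longer.
ALPHA_RANGES = [(0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6),
                (0xD8, 0xF6), (0x391, 0x3A1)]
_STARTS = [first for first, _ in ALPHA_RANGES]


def is_alpha_table(c: int) -> bool:
    """Slow path: binary search for the range that could contain c."""
    i = bisect_right(_STARTS, c) - 1
    return i >= 0 and c <= ALPHA_RANGES[i][1]


def is_alpha(c: int) -> bool:
    """Fast ASCII test first, table lookup only if that fails."""
    # Setting bit 5 maps 'A'..'Z' onto 'a'..'z', so one range check
    # covers both cases and does not depend on the locale.
    if ord("a") <= (c | 32) <= ord("z"):
        return True
    return is_alpha_table(c)


print(is_alpha(ord("A")), is_alpha(ord("1")), is_alpha(0xE9))
```

Because the table also covers A-Z and a-z, the fast path changes only the cost of the common case, never the result.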
nsz clone works but i can only check things later11:49.07 
tor8 nsz: no rush11:49.45 
nsz btw locale is not an issue with isalpha unless setlocale is called (and the libc supports more than one 8bit encodings)11:49.58 
  the problem is that c>255 is ub11:50.19 
rsc kens: okay, thanks so far.11:52.31 
tor8 nsz: we're a library, we have no control over whether the user has called setlocale or not :(12:42.29 
  hence we need to reimplement strtof and printf. such a stupid design, setlocale.12:42.49 
nsz yes that's a shame13:05.34 
  btw strtof and float printf are tricky to implement correctly13:06.32 
  (musl libc has correctly rounded implementations of these in c)13:06.54 
henrys chrisl, kens: shelly usually batches up a few and send me email then I review them. You don't need to do anything14:19.12 
kens thanks henrys14:19.48 
zx hello 14:20.38 
ghostbot Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.14:20.38 
chrisl henrys: okay, cool, thanks!14:21.04 
zx I want to insert an image into an existing pdf file with mupdf. But there is no documentation, I hope you can help me. Can you give me some examples? I use C language. I have added an annotation, I can see the rect but can't find the image.14:21.43 
  anyone here?14:23.27 
Robin_Watts zx: yes, people are here14:23.58 
kens I see we're getting customer emails not cc'ed to support again.14:24.14 
  Halfway through some conversation :-(14:24.23 
zx can you give me a good way to insert an image into an existing pdf file14:26.43 
Robin_Watts zx: Not using current mupdf, no.14:27.13 
kens Adobe Illustrator,possibly Photoshop14:27.17 
Robin_Watts zx: You could try to use the new filter stuff in mupdf.14:27.46 
  That would enable you to tack on arbitrary content to the end of the content streams.14:28.15 
zx could you give me an example14:28.20 
Robin_Watts but that requires a degree of PDF knowledge.14:28.25 
  No examples, no, it's still very new code.14:28.36 
  It was written to allow people to add watermarks.14:28.42 
  It may still only be on my repo...14:29.06 
  zx: http://git.ghostscript.com/?p=user/robin/mupdf.git;a=summary14:30.56 
  The 'add post processing option to page operator cleaning' commit is the one you need.14:31.13 
zx ok thanks a lot14:32.13 
Robin_Watts essentially you call pdf_clean_page_contents and pass in the page you want to work with.14:32.28 
  You also pass in a pdf_page_contents_process_fn.14:32.38 
  That is called back after the page contents are cleaned, with the page contents in a buffer.14:33.02 
  You can then append to the buffer.14:33.09 
  Let me know how you get on with it. It's very new (almost completely untested) code.14:33.24 
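As for what one might append to the buffer: the PDF operators that paint an image are short. The sketch below only builds those bytes; it does not call mupdf, and the callback wiring is not shown because the log gives no signature for it. The helper name is invented, and the resource name Im0 is an assumption: the image XObject still has to be added to the page's /Resources /XObject dictionary under that name, which is the part that needs PDF knowledge.

```python
def image_draw_ops(name: str, x: float, y: float,
                   width: float, height: float) -> bytes:
    """Content stream fragment that paints image XObject /name.

    q  saves the graphics state,
    cm scales the unit square to width x height and moves it to (x, y),
    Do paints the named XObject,
    Q  restores the graphics state.
    """
    ops = f"\nq {width:g} 0 0 {height:g} {x:g} {y:g} cm /{name} Do Q\n"
    return ops.encode("ascii")


# Stand-in for the cleaned page contents handed to the callback.
page_contents = bytearray(b"BT /F1 12 Tf (Hello) Tj ET")
page_contents += image_draw_ops("Im0", 50, 50, 200, 100)
print(page_contents.decode("ascii"))
```

Wrapping the fragment in q ... Q keeps the scaling matrix from leaking into anything drawn afterwards.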
kens gives up on the customer email, one for marcos to sanitise14:35.36 
henrys chrisl: NOCACHE doesn't work in pcl because it is done in gs_init.ps. so I need a call to 0 setcachelimit in all the other languages when we parse the parameter.15:47.40 
chrisl henrys: you could implement NOCACHE in pcl15:48.14 
  or I can do it....15:48.45 
henrys chrisl: no I got it.15:51.21 
mvrhel_laptop good morning15:53.11 
kens morning15:53.19 
henrys chrisl: I hate booleans that start with NO but I guess NOCACHE is something we are stuck with.17:10.18 
rayjj henrys: there's a lot of NO... options in the Ghostscript set17:11.36 
henrys we should try and be more positive17:12.09 
rayjj I think Peter sort of changed styles over time.17:12.16 
  but I agree that -dUseCache=/false would be better (or even better -dUseFontCache=/false so we know which cache)17:13.06 
chrisl I guess the preference was for options that didn't need a "=....."17:16.07 
kens is amused by the Good emails :-)19:53.27 