IRC Logs

Log of #ghostscript at

2013/07/03
henrys ray_laptop, mvrhel_laptop: some chance the new employee will show up this evening; if I'm not here, say hi. 01:53.39
  or say hi if I'm here, too ;-)01:54.06 
mvrhel_laptop will do01:56.48 
  I am actually on the east coast the rest of this week so I will probably not be up too late 01:57.15 
  which reminds me, I need to send out an email that I will be out tomorrow01:57.27
tor8 Robin_Watts: paulgardiner: ping.09:21.04 
Robin_Watts pong09:21.20 
paulgardiner sorry biab09:21.23 
Robin_Watts tor8: Feeling any less achy?09:21.36 
tor8 my feet hurt!09:21.42 
  so sitting still helps :)09:21.51 
  looking through the patches I missed yesterday and monday, I spotted one function that could use a better name09:22.10 
Robin_Watts go on...09:22.19 
tor8 pdf_set_objects_parent_num -> pdf_set_obj_parent(_num)09:22.20 
  I didn't foresee the type3 mess... sorry about that one09:23.17 
Robin_Watts tor8: Me either, but it worked out OK.09:23.44 
tor8 good! :)09:23.56 
  what's the status on the pagetree branch?09:24.01 
  I've swapped out where we were09:24.07 
Robin_Watts robin/pagetree is the latest version I think.09:24.33 
  It's rebased onto master, and has various little fixes in that we discussed.09:24.49 
tor8 page tree creation in the pdf device, and/or rebalancing it could do with some work09:25.04 
Robin_Watts My progressive stuff is now based on the end of that.09:25.08 
tor8 the pdfclean retainpages builds a new pagetree array from scratch09:26.16 
  the pdf_insert/delete_page calls only work if there is an existing non-empty page tree09:26.50 
Robin_Watts tor8: right.09:28.40 
  My pdfwrite code had some stuff with "needs_page_tree_rebuild" which would then rebuild a page tree just-in-time before writes.09:29.27 
tor8 robin/pagetree doesn't build09:29.40 
Robin_Watts but that's going to be problematic now we lookup stuff.09:29.48 
  tor8: really?09:29.52 
tor8 pdf_close_document is missing an "int i;"09:30.10 
paulgardiner name change seems sensible to me.09:32.09 
Robin_Watts Updated pagetree branch.09:34.32 
  Who wants to do the name change?09:35.46 
  I will then.09:37.37 
paulgardiner Oops! sorry. Happy to do it later. I have to do my accounts now, before I go away Fri09:43.05
Robin_Watts no worries.09:47.40 
  tor8: renaming commit on robin/master09:47.47 
  tor8: We had a free user open bug 694375. I've put a proposed patch on it. Any thoughts ?09:53.45 
Robin_Watts runs10:02.28 
maxspot Hi all!10:08.00 
  I'm new to this IRC channel, I came here for one simple question...10:09.09
  does muPDF support Layers/OCG?10:09.37
  anybody here?10:17.56 
tor8 maxspot: there is some support, but it's not extensive. what behavior in particular regarding optional content groups do you want to know about?10:25.26 
Irys hello11:25.30 
ghostbot hi, irys11:25.30 
Irys i noticed a problem when i merge two pdf files into one11:26.02
  some numbers in the merged pdf file are different than in the original files, can't figure out why11:27.23
chrisl Irys: with Ghostscript?11:28.08 
Irys yes11:28.14 
  i used command gs -dNOCACHE -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=FV.pdf -dBATCH fv_2013-07_41035.pdf fv_2013-07_41036.pdf11:28.31 
chrisl Irys: you probably have two or more files with two or more font subsets with the same name, so pdfwrite cannot differentiate between the different font subsets, hence the wrong glyphs are used11:29.12
Irys is there any workaround to this ?11:30.25 
  or can i paste link to those files so u could look at it ?11:30.45 
chrisl Irys: sort of - it's not great, and may cause some loss of quality, depending on the content of the files......11:31.25 
  You can convert to PostScript with ps2write, then convert the result back to PDF 11:32.20
Irys the only thing that changes is in one place: the numbers are changed, as if it has them in a cache and uses them in the next pdf file (page)11:32.26
chrisl Irys: The "cache" is the font that pdfwrite accumulates to write to the output11:33.00 
  Irys: note that the "bug" is with the application that produced the PDFs, not with pdfwrite11:33.32 
Irys well it's not my application, and i can't change the pdfs11:34.29
chrisl Irys: you could try something actually designed for this kind of PDF manipulation, like pdftk, for example11:36.37 
Irys when i merge 100 files then about 5 pages are corrupted11:38.32 
  all files generated by the same program11:38.59 
  can't figure out which files cause this in general, just found 2 that make it happen11:40.46
  and i have ~10k those files monthly ... and i need to merge some of them everyday :/11:41.40 
chrisl I would expect each file to work fine on its own, and it would require two or more to show the problem11:42.04 
Irys so its hard to 'edit' the corrupted one when i dont know which are corrupted11:43.01 
  seems like easiest solution is to do pdf2jpg, and then merge jpgs to pdf ? :)11:44.18 
chrisl Irys: why not do as I suggested and convert the files to PS, then back to PDF? You'll lose less doing that than converting to JPG11:45.00 
Irys this will 'fix' the pdf file ?11:45.30 
chrisl ps2write needs to regenerate the font subset, so it also generates a new subset prefix for the name - and ps2write does it properly11:46.16 
  Irys: it's worth a try, I'd say. The problems that can arise are if your PDFs have transparency in them they will get "flattened" to a large image, or if you have CIDFonts they will be converted to Type 3 fonts11:47.59 
Irys i'll try this then, thank you11:48.05 
chrisl Irys: IIRC, you have to convert each PDF to a PS, then "merge" the PS files through pdfwrite into a PDF11:48.48 
Irys well there is an image + text over image11:49.05 
chrisl Is the image visible "through" the text?11:49.45 
Irys its under the text11:54.01 
chrisl Irys: you might be okay, then. Best thing is to try it.....11:54.29 
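Note on the workaround chrisl describes above: a minimal sketch, assuming the standard Ghostscript switches `-sDEVICE=ps2write`, `-sDEVICE=pdfwrite`, `-sOutputFile=`, `-dNOPAUSE` and `-dBATCH`. It only builds the command lines (one PDF-to-PS conversion per input, then a single merge of the PS files through pdfwrite); the helper names are hypothetical and not part of Ghostscript.

```python
from pathlib import Path


def ps2write_cmd(pdf_path, ps_path):
    """Command that converts one PDF to PostScript via the ps2write device."""
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=ps2write",
            f"-sOutputFile={ps_path}", str(pdf_path)]


def merge_cmd(ps_paths, out_pdf):
    """Command that merges several PostScript files into one PDF via pdfwrite."""
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={out_pdf}", *[str(p) for p in ps_paths]]


def roundtrip_commands(pdf_paths, out_pdf):
    """One ps2write command per input PDF, followed by a single merge command."""
    ps_paths = [Path(p).with_suffix(".ps") for p in pdf_paths]
    cmds = [ps2write_cmd(p, ps) for p, ps in zip(pdf_paths, ps_paths)]
    cmds.append(merge_cmd(ps_paths, out_pdf))
    return cmds
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order. As chrisl warns, transparency may be flattened and CIDFonts converted to Type 3 on the way through PostScript.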
Robin_Watts tor8: ping11:54.37 
tor8 Robin_Watts: yes?11:54.54 
Robin_Watts so, you're back at work now?11:55.01 
  Are you looking into the page tree rebuild/rebalancing?11:55.13 
tor8 in theory, lots of distractions going on at the moment so not getting much done :(11:55.36 
Robin_Watts Got a few minutes to talk about progressive loading?11:55.40 
tor8 but yes, I have the editor open trying to work on the page tree rebuild11:55.55 
Irys chrisl: it did the job, thank you11:56.37 
Robin_Watts So, I have my progressive code rebased on top of the pagetree stuff.11:56.38 
  and it loads the first page and displays that.11:56.45 
chrisl Irys: cool, np11:56.48 
Robin_Watts and finds subsequent pages as it reads through the file.11:56.56 
  but the problem I'm hitting is that linearisation does NOT send all the resources needed for a page at the same time.11:57.19 
  Any resource shared between pages is sent AFTER all the pages.11:57.38 
  so any fonts or shared images or shared graphic states etc all come late.11:58.02 
tor8 ewww11:58.31 
Robin_Watts So with the PDF reference manual, I can't properly display (say) page 3 until I have loaded the whole file.11:58.34 
tor8 why on earth would they do that?11:58.36 
Robin_Watts beats me.11:58.54 
  but that's unquestionably the case.11:59.03 
tor8 they probably only tested with the 1st page...11:59.09 
  or with a plugin that can do http sub-requests for different parts of the file11:59.23 
Robin_Watts The stated aim of linearisation is to be able to display the first page immediately, and then the rest later.11:59.37 
maxspot tor8: I'm a programmer, using muPDF.dll to read geometrical data from PDF...11:59.51 
  tor8: simply I want to know the Layer for every Path...12:00.15 
Robin_Watts So, I was wondering about having a rendering mode that did the best it could, ignoring any errors.12:00.16 
maxspot tor8: what support is there?12:01.12
Robin_Watts maxspot: For free users, "none".12:01.25 
  where the exact definition of "none" varies according to how we are feeling.12:01.50 
  Alternatively, we do commercial licenses (which can come with or without support)12:02.06 
  and support contracts.12:02.09 
  maxspot: It's been ages since I looked at OCG stuff.12:02.59 
tor8 Robin_Watts: right. a mode for the font and image loading to just ignore errors (use substitute fonts and skip images)?12:03.19 
Robin_Watts and gstates, yes.12:03.33 
tor8 and skip xobjects and extgstates12:03.35 
Robin_Watts yes.12:03.38 
tor8 go for it12:03.46 
  maybe we should do that in any case...12:03.59 
kens chrisl Irys, I believe that you can get the same effect by 'converting' each PDF to a new PDF, then merging those. Because the fonts are rewritten, and a sensible unique prefix generated for each subset font name, the trick still works. And transparency is preserved.12:04.10
Robin_Watts It could be a hint, or it could be an entry in the cookie.12:04.14 
tor8 but I guess switch the flag on ERROR_TRYLATER12:04.21 
maxspot Robin: If I buy commercial license...I've support ok... but then I can see the Layers ?12:04.53 
Robin_Watts I'd like to record the fact that an error was hit in the cookie, so the calling app knows that the page is incomplete.12:05.11 
tor8 I think we'd want to save the TRYLATER error, do our best, and somehow tell the error cookie that there were TRYLATER errors12:05.23 
  okay, cookie->trylater error count?12:05.34 
Robin_Watts tor8: yeah, something like that.12:05.43 
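Note on the "do our best and count the TRYLATER errors" idea: a minimal sketch in Python rather than MuPDF's C, purely to illustrate the control flow being discussed. `TryLater`, `Cookie`, its `incomplete` field, `load_resource` and `render_page` are all hypothetical names, not the MuPDF API.

```python
from dataclasses import dataclass


class TryLater(Exception):
    """Hypothetical error: the bytes for a resource have not arrived yet."""


@dataclass
class Cookie:
    errors: int = 0       # ordinary rendering errors
    incomplete: int = 0   # TryLater conditions hit while rendering


def load_resource(available, name):
    """Return a resource, or raise TryLater if it has not been downloaded."""
    if name not in available:
        raise TryLater(name)
    return available[name]


def render_page(resource_names, available, cookie):
    """Render with whatever is available, recording what had to be skipped."""
    drawn = []
    for name in resource_names:
        try:
            drawn.append(load_resource(available, name))
        except TryLater:
            # Skip the missing resource but tell the caller the page is incomplete.
            cookie.incomplete += 1
    return drawn
```

A caller that sees `cookie.incomplete > 0` knows the page was drawn from partial data and can redraw it once more of the file has arrived.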
  maxspot: The code is the same for everyone, supported/commercial/non-supported.12:06.05 
  but we could make changes for a supported customer.12:06.14 
maxspot Ah, ok... now I understand :)12:06.30 
Robin_Watts maxspot: So, in these files, are you sure the information you want is OCG ?12:06.55 
  rather than marked content ?12:06.58 
tor8 maxspot: if you dig into the code that runs the pdf interpreter, we pass in a "usage" parameter which matches the OCG layers to enable/disable12:07.05 
maxspot Robin: with Acrobat I see different layers in my PDF file...12:08.00 
  Robin: Layers == OCG ?12:08.18 
Robin_Watts maxspot: That's what I'm asking :)12:09.22 
maxspot I'm not sure...12:09.53 
Robin_Watts I'd need to look at a file to be 100% sure that my memory here is right.12:09.57 
maxspot ok, I've a very little file here.. how I can send U ?12:10.24 
Robin_Watts maxspot: Upload it somewhere and paste the link?12:10.37 
  or mail me at robin.watts at artifex.com12:10.45 
maxspot ok, a moment...12:11.03 
  via mail...12:11.13
  Robin: Mail sent!12:13.55
Robin_Watts mail receiverised.12:16.17
maxspot Robin: these are the lines:12:16.46 
  Robin: 13 0 obj <</Intent 22 0 R/Name(Livello 1)/Type/OCG/Usage 23 0 R>> endobj 14 0 obj <</Intent 24 0 R/Name(Livello 2)/Type/OCG/Usage 25 0 R>> endobj12:16.58 
Robin_Watts Ok, so, yes that's OCG.12:18.24 
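Note on spotting OCGs in a file: a rough sketch of how the two objects pasted above can be recognised as optional content groups, by looking for `/Type/OCG` and reading `/Name`. It is a naive regex scan that assumes uncompressed objects with flat dictionaries like the ones quoted; it is not how MuPDF parses files, and `list_ocgs` is a hypothetical helper.

```python
import re

# Matches "N G obj << ... >> endobj" for simple, non-nested dictionaries only.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\s*<<(.*?)>>\s*endobj", re.S)
NAME_RE = re.compile(rb"/Name\s*\((.*?)\)")
OCG_RE = re.compile(rb"/Type\s*/OCG\b")


def list_ocgs(data):
    """Map object number -> layer name for every object typed as /OCG."""
    layers = {}
    for num, _gen, body in OBJ_RE.findall(data):
        if OCG_RE.search(body):
            m = NAME_RE.search(body)
            layers[int(num)] = m.group(1).decode("latin-1") if m else None
    return layers
```

Run on the two objects maxspot pasted, this yields object 13 named "Livello 1" and object 14 named "Livello 2".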
maxspot good...12:18.45 
Robin_Watts So we have *some* OCG support in the code. It's largely untested, and may well be unfinished though.12:18.59 
  You can try it out, and if it doesn't work, open a bug about it.12:19.58 
  If you fancy fixing the code then we'd be very interested in seeing what you come up with.12:20.19 
maxspot ok !12:21.06 
Robin_Watts It may even be possible to open up the bug under our bug bounty program so you could get paid something for fixing it (but this would need to be checked with my boss, so don't take this as a promise at this stage)12:21.08 
  Feel free to ask us questions here, and we'll help as much as we can, but we're a bit buried in customer work at the moment, so probably can't spare the time to do much direct coding ourselves.12:21.59 
maxspot Robin: For the moment, using muPDF.dll I don't know where to read the info about OCG :)12:23.32
  Robin: In your structure.12:23.44 
Robin_Watts maxspot: OK. in source/pdf/pdf-xref.c12:25.50 
  when we pdf_open_document we end up calling pdf_init_document which calls pdf_read_ocg12:26.07 
  That builds the ocg structure within pdf_document.12:26.30 
  and we call pdf_ocg_set_config12:26.41 
  and that sets some state flags...12:27.06 
maxspot ok, perfect, thanks a lot...12:27.14 
Robin_Watts and then we run into a forest of FIXMEs.12:27.23 
  so this code looks incomplete :(12:27.31 
  but hopefully it points the way towards something that can do what you need.12:28.05 
maxspot now I'll go and read some code... then I'll come back with new questions :)12:28.07
Robin_Watts good luck.12:28.14 
maxspot thanks!12:28.22 
Robin_Watts np.12:28.36 
maxspot A curiosity: what is the relationship between muPDF and Ghostscript ?12:38.15 
kens siblings :-)12:38.35 
  Both are commercially licenced through Artifex12:38.45 
Robin_Watts with an age difference :)12:38.49 
kens Some of the same developers are involved12:38.53 
maxspot ah, ok.12:40.16 
paulgardiner Robin_Watts: did you get a chance to look at the annotation fix?13:18.42 
Robin_Watts paulgardiner: I thought I'd pushed it ?13:18.53 
paulgardiner No. But I should be able to now.13:19.13 
Robin_Watts hold on.13:19.21 
  You should be using fz_caught(ctx) not ctx->error->errcode, I think.13:20.10 
paulgardiner Oh right. Let me fix that.13:20.49 
Robin_Watts lunches.13:21.21 
  paulgardiner: New version needs pushing ?13:59.59 
paulgardiner s'ok. I should be able to now.14:00.31 
Robin_Watts looks good to me.14:01.05 
paulgardiner ta14:01.31 
  Robin_Watts, tor8: I've updated pdf_write_document to support incremental save. patch is on paul/incremental-save15:47.56 
Robin_Watts I have progressive loading working with pdf_reference17.pdf.15:51.03 
  pages display in the wrong font as the stuff downloads.15:51.22 
  I'd love to know whether Chrome's PDF stuff does byte range fetching...15:51.42
kens You'd need to check the server end15:52.04 
  or inspect packets :-)15:52.12 
Robin_Watts kens: Or redirect via a proxy.15:52.18 
henrys hmm no sign from the new guy? I'm wondering if he received my email - it wasn't really from me but sent from google as part of setting up his new account 15:52.20 
Robin_Watts I have some code that runs a forwarding port and prints out what goes through it.15:52.43 
  henrys: He's been replying to welcome emails this morning, (but from his non artifex address)15:54.26 
  paulgardiner: What does fread do if you feed it a NULL file pointer?15:56.31 
  Likewise fwrite.15:56.43 
  I suspect the android code needs some more safety checking in case someone deletes a file/directory in the background.15:57.11 
  I wonder if the bytewise copy of the original file should be part of fz_write_document ?15:57.48 
paulgardiner Robin_Watts: I think it does exactly what a programmer who ignores the possibility deserves :-)15:58.01 
Robin_Watts ofla!15:58.11 
  paulgardiner: Could we push the pathname for the old file into fz_write_options ?15:59.28 
  That way fz_write_document would have all it needed.15:59.45 
paulgardiner yeah, but might you wish to do the copying differently on different platforms16:00.22 
  I didn't understand the bit about deleting directories in the background. Are you thinking the original file might no longer be there?16:01.45 
Robin_Watts paulgardiner: Yes.16:03.31 
paulgardiner So checking fin and fout for NULL would catch that.16:04.25 
Robin_Watts Yes16:04.34 
paulgardiner Ah right. Fine.16:04.41 
Robin_Watts Maybe we could push a FILE * into the options?16:04.50 
  if it's non NULL then that's the base file for the incremental save ?16:05.01 
  I really dislike shelling out to cp.16:05.09 
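Note on copying the base file without shelling out to cp: a minimal sketch of an in-process copy that reports failure instead of crashing when the original has vanished. In C the corresponding check is `fopen` returning NULL before any `fread`/`fwrite`; in Python the same condition surfaces as `OSError`. `copy_base_file` is a hypothetical helper, not part of MuPDF.

```python
import shutil


def copy_base_file(src_path, dst_path, chunk_size=64 * 1024):
    """Copy the original file to the output path before appending an
    incremental update. Returns False if either file cannot be opened
    (for example because the original was deleted in the background)."""
    try:
        with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
            shutil.copyfileobj(fin, fout, chunk_size)
    except OSError:
        return False
    return True
```

The caller would only go on to append the incremental section when this returns True.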
malc_ paulgardiner: annotations are still "broken"16:05.18 
Robin_Watts malc_: Even after b3f1971 ?16:05.45 
paulgardiner Robin_Watts: I have limited time left to sort this out. Maybe I could fix the NULL file pointer possibility and leave deciding about handling the copy internally until I'm back.16:08.27
Robin_Watts paulgardiner: Sure.16:08.43 
  I'm still reading the review. It just feels slightly herniated for the copy to be outside the api.16:09.12 
  paulgardiner: So in writexref, you're called with a from/to range of objects to write.16:11.00 
  and you attempt to shrink that range to subfrom/subto.16:11.21 
  Oh, I see. You break it down into a set of contiguous object blocks.16:12.06 
  That makes sense.16:12.09 
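Note on breaking a range into contiguous object blocks: a PDF xref section is written as subsections, each headed by a first object number and a count, so an incremental save that only touches some objects needs one subsection per run of consecutive changed objects. A minimal sketch of that splitting, with `contiguous_blocks` and the `include` predicate as hypothetical names (this is not the code under review):

```python
def contiguous_blocks(start, end, include):
    """Yield (first, count) for each maximal run of object numbers in
    [start, end) for which include(num) is true."""
    first = None
    for num in range(start, end):
        if include(num):
            if first is None:
                first = num          # a new run begins here
        elif first is not None:
            yield first, num - first  # the run ended just before num
            first = None
    if first is not None:
        yield first, end - first      # run extends to the end of the range
```

For example, if objects 3, 4, 5 and 9 were altered, the blocks are (3, 3) and (9, 1), matching xref subsection headers `3 3` and `9 1`.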
  De Morgan's law would make "to_include" easier to understand - at least to me.16:15.10
paulgardiner I could change that, but I wrote it that way thinking it was more readable :-)16:17.46 
Robin_Watts fair enough :)16:17.55 
paulgardiner I should probably add a comment "Include it unless we are doing incremental update and this isn't an altered object"16:18.57 
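Note on the two readings of "to_include": the actual expression in the patch is not quoted in this log, so the following is only a sketch built from paulgardiner's proposed comment. It shows the "include unless..." form next to its De Morgan equivalent; both function names and both parameters are hypothetical.

```python
def to_include_unless(incremental, altered):
    """Include it unless we are doing an incremental update and this
    isn't an altered object."""
    return not (incremental and not altered)


def to_include_demorgan(incremental, altered):
    """Same condition after De Morgan: include it if this is a full
    write, or if the object was altered."""
    return (not incremental) or altered
```

The two forms agree for all four input combinations, so the choice between them is purely about which one reads more clearly.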
Robin_Watts Is opts.use_list valid for incremental stuff?16:20.01
paulgardiner Yeah, although I may be using it differently16:20.43
Robin_Watts use_list is for linearisation and/or garbage collection currently, neither of which are compatible with incremental mode.16:21.03 
paulgardiner I could avoid using it if I just check the entry again instead. Yeah, might be better16:21.43 
  I know exactly where you are looking. I thought it was a bit odd to use it, although it gives the correct results16:22.16 
Robin_Watts Why the change in pdf_get_xref_entry ?16:23.03 
paulgardiner Because the pdf_copy_dict might throw16:23.22 
Robin_Watts oh, cos the copy... yes.16:23.26 
  fz_var(new_table); ?16:23.39 
  no, ignore me.16:23.46 
  ok, so it all looks OK to me, apart from the use of use_list which confuses me.16:24.47 
  And I think I can even cope with the copy being outside the api if I think of an incremental save as being 'adding this new bit onto an existing file', which I guess is the intent.16:25.58 
paulgardiner I'll sort out use_list, NULL file pointer, and add a comment about to_include16:26.00 
Robin_Watts cool.16:26.07 
  paulgardiner: You're off on friday, right?16:26.17 
paulgardiner Great thanks16:26.20 
Robin_Watts but you're about tomorrow ?16:26.26 
paulgardiner Yeah off fri. About tomorrow but working only part of the day16:26.48 
Robin_Watts malc_: ping?16:33.25 
  paulgardiner, tor8: 2 small changes on robin/master16:33.40 
paulgardiner Both look good to me16:42.01 
Robin_Watts Thanks.16:57.14 
malc_ Robin_Watts: aye17:21.11 
Robin_Watts malc_: Can you give me a file, and a command line to reproduce the problem please?17:21.57 
  Was that "aye" acknowledging my ping, or confirming that the problem still occurs post b3f1971 ?17:22.47 
malc_ Robin_Watts: both :)17:23.48 
  And DDI0406C_arm_architecture_reference_manual.pdf will do17:24.49 
Robin_Watts malc_: command line?17:25.07 
  I see it.17:26.19 
malc_ Robin_Watts: you "see it" as in command line no longer needed17:27.58 
Robin_Watts yeah.17:28.03 
malc_ cool17:28.07 
Robin_Watts paulgardiner, tor8: So, fz_throw is a combined logging and error interface, right?17:28.45 
  If we throw an error with a message, that message gets printed, even if we then catch the error and behave normally.17:29.09 
  That's what malc_ is seeing I believe.17:29.22 
  malc_: So the messages are NOT spurious. There are annotations with no appearance stream.17:30.55 
  but it's probably reasonable for link annotations to have no appearance.17:31.55 
malc_ it's definitely reasonable; apart from the visual clutter, the speed hit taken by constant longjumping is noticeable17:39.31
Robin_Watts tor8, paulgardiner: Fix on robin/master for the annotation thing.18:01.04 
  tor8: paulgardiner has stopped for the night, but I discussed the problem and my proposed solution on the phone with him.18:01.25 
  tor8, paulgardiner, mvrhel_laptop: I've just pushed the latest version of my progressive stuff to robin/progressive18:53.19 
  henrys, and anyone else who wants to keep a high level view on how it will work:;a=blob;f=docs/progressive.txt;h=c0964a45b32e0e0e270b2114f008b91e6d2961d4;hb=3bac751597d0fcc7819926a9e0fe2d845bd1cc8618:54.06 
henrys Robin_Watts: thanks that is useful18:55.29 
sebras Robin_Watts: is that the annotation bug I mentioned?19:23.53 
Robin_Watts sebras: It is.19:24.40 
sebras yey! <- the happy yey! of today! :)19:25.03 
  Robin_Watts: do you want to take a sanity look at sebras/master and ask tor8 to have a look tomorrow?19:26.05 
  Robin_Watts: it's the debian platform files.19:26.19 
Robin_Watts looking.19:28.24 
sebras Robin_Watts: btw, I like your patch better than pauls even though both effectively solve the problem. to me it seems we should be wary of adding error codes.19:28.35 
Robin_Watts sebras: Indeed.19:28.45 
sebras one question though. what is all this FIXME: TryLater?19:28.48 
Robin_Watts sebras: See the link I posted to henrys above :)19:29.02 
sebras clicks.19:29.16 
henrys sebras:did you get your nda yet?19:29.27 
sebras well. highlights and pastes (hello x11).19:29.29 
  henrys: I did get it. thanks! now it is just a matter of passing a signed version of it to miles.19:30.01 
henrys scan and email right?19:30.22 
Robin_Watts seems plausible.19:30.23 
sebras henrys: probably yes. I'll make sure tor8 gets one when he goes to the next meeting as well. somehow lawyers and accountants tend to prefer original documents over copied ones. :)19:31.30 
  Robin_Watts: thanks for the review. now I'll just pester tor8 to merge it as well.19:32.12
  I think he's got both the debian installation prerequisites and the package building knowledge now to verify my claims.19:33.01 
  Robin_Watts: hm.. is it really correct that each page's specific objects appear before all the shared objects in a linearized file?19:34.43 
Robin_Watts sebras: Yes. Mind bogglingly stupid, but true.19:35.07 
sebras that seems a bit strange. I can see why the shared objects would precede the pages themselves, but not why the objects for page 2 would be before page1 ...19:35.17
  did they have a summer intern developing the linearization stuff?! :-P19:35.41 
  Robin_Watts: it's Content-Length, with a dash. and what do you mean by "verify that the linearized object is not out of date."?19:38.59 
  I can see how mupdf can determine if it has got the full file or not using the Content-Length though.19:39.18 
Robin_Watts The linearized object contains an entry with the file length in.19:40.00 
  If the length does not agree, then someone has added another xref to the end and so the file is not linearised.19:40.24 
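Note on the "not out of date" check: the linearization dictionary stores the total file length under the `/L` key, so comparing it with the real length (for example the HTTP Content-Length) detects a file that has had an incremental update appended. A minimal sketch, assuming the dictionary sits uncompressed near the start of the file; the regex is a naive stand-in for a real PDF parser and both function names are hypothetical.

```python
import re

# Find the /L entry that follows /Linearized in the first object's dictionary.
LINEARIZED_L_RE = re.compile(rb"/Linearized\b.*?/L\s+(\d+)", re.S)


def linearized_length(header_bytes):
    """Return the file length declared by the linearization dictionary,
    or None if no such dictionary is found in the given bytes."""
    m = LINEARIZED_L_RE.search(header_bytes)
    return int(m.group(1)) if m else None


def is_still_linearized(header_bytes, actual_length):
    """True only if the declared length matches the real file length."""
    declared = linearized_length(header_bytes)
    return declared is not None and declared == actual_length
```

If the lengths disagree, the safe assumption is the one Robin_Watts states: treat the file as not linearised.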
sebras Robin_Watts: aha. I see. the document is excellent so far, but I'd suggest adding these two sentences.19:40.50 
  would it make sense to add an option to pdfclean to make linearized documents sane from a design point of view...?19:43.06 
  i.e. something that optimizes them for mupdf rather than the standard?19:43.17 
Robin_Watts sebras: If I do that, I run the risk of having other document readers behave badly.19:43.51 
  It's a crap standard, but it's the one we've got.19:43.58 
sebras Robin_Watts: I agree, but there might be reasons for a customer to do something like this anyway..19:44.26 
Robin_Watts I'll ponder it.19:45.42 
sebras this is just from the top of my head, I'm not sure whether it is useful.19:46.07 
Robin_Watts I think that chrome may use byte-range requests to pull down the bits it needs.19:46.08 
  We can do that too.19:46.15 
  I need to write that bit of the doc tomorrow :)19:46.33 
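Note on byte-range requests: a minimal sketch of what fetching one piece of a remote file looks like at the HTTP level, using Python's standard `urllib.request`. Whether Chrome actually does this is left open in the conversation above; `range_header` and `fetch_range` are hypothetical helpers, and the URL passed in is up to the caller.

```python
import urllib.request


def range_header(offset, length):
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    # HTTP byte ranges are inclusive at both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_range(url, offset, length):
    """Fetch one byte range; returns None if the server ignored the Range
    header and answered with something other than 206 Partial Content."""
    req = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            return None
        return resp.read()
```

A progressive loader could use this to pull the late-arriving shared objects for a page as soon as their offsets are known, instead of waiting for the whole stream.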
sebras whoa.. I remember when the cookie was an int...19:47.14 
Robin_Watts cookie was never an int. It may have been a structure with an int in it :)19:47.50 
Robin_Watts has to run an errand...19:47.59 
sebras yeah, probably. :)19:48.01 
  Robin_Watts: progressive.txt looks good to me. just that minor thing I mentioned about the Content-Length typo and the linearization object verification.19:49.43 
tor8 Robin_Watts: some its / it's mistakes in progressive.txt20:02.02 
  and the semicolon (after Adobe does NOT do this) should be a colon20:03.04
  incomplete_ok should be incomplete_okay (we (the royal we) spell it okay, not ok)20:04.30 
  the (see later) reference about fetching byte ranges is pointing to text that doesn't exist at the moment?20:05.56
Robin_Watts yeah, I have to write that tomorrow20:06.13 
tor8 looks good so far (apart from the minor language corrections noted above)20:06.48 
  I'm too knackered to look at the actual code today though...20:07.14 
Robin_Watts I understand. I'm done for the night too.20:08.38 