IRC Logs

Log of #ghostscript at irc.freenode.net.

2015/09/10
srini Mudraw html does not recognize decorations like underlines...?06:43.11 
kens chrisl, you having any network problems this morning?06:54.17 
  I'm wondering if my ISP's DNS is shafted06:54.32 
chrisl kens: I haven't so far, but then I just sat down06:56.10 
  Nope, everything seems fine06:56.56 
kens2 Weird06:57.02 
  Shifting to the Google DNS server helps, in that I can now resolve the IP addresses, but I can't ping a boatload of sites06:57.34 
  E.g. http://www.bbc.co.uk is OK, but http://www.dilbert.com isn't06:58.08 
  It does seem to be progressively improving, I suspect my ISP has a problem06:58.44 
chrisl dilbert.com works fine for me - actually, everything seems quite quick this morning06:59.07 
kens2 It is fine for me too, now.....06:59.28 
  Oh it seems dilbert just doesn't respond to pings07:00.06 
  I am having to use the Google DNS though07:00.26 
chrisl Well, lots of sites ignore pings these days07:00.50 
kens2 yeah, but I know it used to be OK, which is why I'm using it, they must have changed07:01.10 
chrisl Yeh, it's not responding to ping for me either.07:02.18 
kens2 Not to worry, now I switched DNS everything is working. I've had to do that before with my ISP's DNS07:02.46 
srini What is the best time to catch up with mupdf developers?07:11.23 
kens2 an hour or two from now07:11.33 
srini kens2, Thank you!07:11.47 
tor8 Robin_Watts: so you're finally doing that forwarding device? I remember running into some trouble doing that before.08:02.58 
  the way paths and a few other things require back-and-forth communication08:03.20 
  s/paths/patterns/08:03.26 
  but I guess enough time might have passed for those issues to have been fixed in the passing08:04.55 
Robin_Watts tor8: they do?08:34.07 
tor8 to be fair I was trying a 'tee' device, to feed two devices from the same source08:34.34 
  that didn't fare well with the pattern device call having a return code that the interpreter can change behaviour based on08:35.01 
Robin_Watts tor8: Ah, yes.08:35.10 
tor8 a straight up forwarding/decorating device shouldn't have the same troubles08:35.33 
  when I get some time over I would like to expose the device and interpreter interfaces to javascript, so we could script this kind of stuff without having to recompile08:36.17 
  a "mutool run script.js"08:36.35 
Robin_Watts tor8: Yeah, I have some prototype code somewhere that exposes the device interface into android java.08:43.51 
tor8 Robin_Watts: is there a quick way to get the bounding box for a display list?08:44.15 
  I'm thinking of adding two fz_rects to the display list struct08:44.33 
  one for the mediabox passed to the fz_begin_page call and another for the actual inked area08:44.43 
  as would be calculated by the bounding box device08:44.59 
Robin_Watts What if you have more than one beginpage?08:45.03 
tor8 you shouldn't, ever08:45.12 
Robin_Watts Suppose you open a display device, and then fz_run_contents, then fz_run_annotations.08:45.41 
tor8 but yes, that is a fair question. I guess we could take the union.08:45.43 
  fz_run_annotations wouldn't call beginpage08:45.51 
Robin_Watts So fz_end_page is not the last thing in a display list?08:46.07 
tor8 only for actual pages08:46.41 
Robin_Watts Or suppose you want to 'merge' 2 pages, by running 2 pages into the same device with different CTMs ?08:46.55 
tor8 fz_run_page wraps the page contents and the annots with begin/end page calls08:47.06 
  that's a better scenario08:47.30 
Robin_Watts tor8: Right, so fz_run_page gives different results from running the contents and annots separately.08:47.34 
tor8 and in that case, taking the union is the right thing to do, but I haven't really considered how to handle that case on the backend side with the pdfwrite device etc08:48.12 
Robin_Watts tor8: I agree that taking the union would be the right thing to do.08:48.38 
  I'm disquieted by the difference in behaviour between fz_run_page and fz_run_contents/annots.08:49.06 
  Holding a couple of top level rects doesn't offend me massively.08:49.38 
tor8 if you're calling fz_run_page_contents, I expect you to call fz_begin_page manually as part of the contract for doing things the low level way08:49.41 
Robin_Watts I worry that computing the second one might slow down display list collection.08:50.00 
tor8 I'm looking through the code and wondering what the different rects that the display list collection calculates are08:50.30 
  if it might be easy to just stash them away at an opportune spot into a high level rect08:50.58 
Robin_Watts pdfapp.c does not call fz_begin_page, end_page then.08:51.35 
tor8 Robin_Watts: on tor/master is the main reason for this ... a few high level functions to wrap the most common ways of creating and using devices08:51.46 
  Robin_Watts: and in adding a few more to that I ran into the issue with the bounding box missing from the display list.08:52.33 
  I want to call fz_new_display_list_from_page() and then fz_new_pixmap_from_display_list()08:52.53 
  without having to query the page mediabox manually, so the display list needs a way to bound itself for the latter call08:53.14 
Robin_Watts Nor does platform/android/jni/mupdf.c08:53.18 
tor8 preferably using the mediabox08:53.19 
  those bits of code predate the addition of the begin_page/end_page calls08:53.39 
  it might be that we should call begin_page/end_page for the run_page_contents call08:54.02 
Robin_Watts right, but it's an area that should be cleaned up. Not blaming anyone.08:54.03 
tor8 Robin_Watts: agreed.08:54.07 
Robin_Watts I agree that we'd like a way to get the media box out of the displaylist.08:55.07 
  A top level rect seems reasonable.08:55.26 
  which is the union of all the begin_pages in the list.08:55.40 
tor8 but if we're going to use the beginpage/endpage calls in high level output devices then getting the endpage before the annotations is going to be awkward08:55.49 
  yes, that sounds ideal08:55.59 
Robin_Watts It's the other calculation (the inked area) which may be trickier to achieve without doing excess work.08:56.20 
tor8 but also means that if I make separate display lists for annotations, which bounding box do I use for those?08:56.34 
Robin_Watts I'd be tempted to just use fz_bound_page for that one, and let it work the usual way.08:56.40 
tor8 could be that each annotation has its own bounding box that I could feed into the begin_page calls when creating the display list for them08:57.07 
Robin_Watts Could we add a flag to begin_page where we can say if it's a page or an annotation?08:57.30 
  Actually... do we ever want annotations to change the media size of the page ?08:57.54 
tor8 we don't08:58.16 
  so going to be a bit of a hack08:58.42 
Robin_Watts So, there are 3 bboxes we need.08:58.52 
  1) the mediabox (the union of the page boxes)08:59.03 
tor8 I need the page mediabox and the annotation for rendering annotations to temporary bitmaps08:59.32 
Robin_Watts 2) the inked region (the union of all the marking operations for a page intersected with 1))08:59.40 
  3) the bbox that completely covers both the page and the annotations.08:59.59 
tor8 I'm planning to do several separate bitmaps and then compose the final screen from those rather than re-rendering the page + annotations each update from display lists09:00.06 
Robin_Watts tor8: Right.09:00.32 
tor8 I don't think we need 3)09:00.37 
Robin_Watts Well, we might be able to live without 3) if we can fz_bound_annot.09:00.47 
tor8 ah, we do have a fz_bound_annot function09:01.03 
Robin_Watts Maybe hold separate displaylists for each annotation?09:01.13 
tor8 and I can live with just 1) for now09:01.16 
  I was going to hold separate display lists for each annotation and the page09:01.25 
  so I should just need to make sure to begin/endpage each display list with its appropriate bbox09:01.39 
Robin_Watts So each annot displaylist will have no 'mediabox', as it won't have any begin_page/end_page.09:01.49 
tor8 being the mediabox or annotation box09:01.50 
  or as you said, add a flag to begin_page09:02.18 
Robin_Watts Either annotations have begin/end_page, or they don't.09:02.31 
  If they do have it, then we can't get a proper mediabox from a displaylist generated by run_page.09:02.57 
tor8 how about making sure the devices can cope with nested begin/endpage calls?09:03.04 
Robin_Watts Not sure that helps.09:03.25 
tor8 take the union of the top level ones and intersect child beginpage boxes09:04.00 
  we could pass a 'mediabox' to fz_new_display_list...09:04.49 
Robin_Watts I still can't see how that nesting scheme solves this situation, and it would require changes to the bbox device to properly clip, or it'd be nonsensical.09:05.33 
tor8 but I rather like the begin_page / end_page calls as they are09:05.54 
Robin_Watts tor8: But we don't necessarily know the mediabox at display list creation time.09:06.02 
tor8 yeah, ignore the nesting thing, that's just adding bad complexity09:06.12 
  actually, I think we do09:06.29 
Robin_Watts I am tempted to say leave begin_page as it is (except properly insert it on run_page_contents).09:06.51 
  We can keep the media box generated by that.09:07.01 
tor8 but that was one of the reasons for adding the begin_page/end_page calls09:07.02 
Robin_Watts We should either add a begin/end_annotation call (or add a flag to begin_page).09:08.17 
  annotations don't include a mediabox.09:08.27 
  Actually, let me start again.09:09.23 
  Add a flag to begin_page to say if it's a page or an annotation.09:09.39 
  We keep 3 rects in the displaylist.09:09.48 
  1) The mediabox of the pages.09:09.52 
  2) the mediabox of the annotations09:09.58 
  3) the union of the inked regions of the page09:10.43 
  That gives us everything we could possibly need I think.09:11.06 
  and every call is clearly defined.09:11.19 
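[Editor's sketch] The three-rect bookkeeping proposed above can be illustrated as follows. This is a minimal Python sketch, not MuPDF code: the class `DisplayListBounds`, its methods, and the `is_annot` argument are hypothetical names, and rects are plain (x0, y0, x1, y1) tuples.

```python
# Hypothetical sketch: none of these names exist in MuPDF.
EMPTY = None  # stands for "no rect recorded yet"

def union(a, b):
    """Smallest rect covering both a and b; EMPTY is the identity."""
    if a is EMPTY:
        return b
    if b is EMPTY:
        return a
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def intersect(a, b):
    """Overlap of a and b, or EMPTY if they do not overlap."""
    if a is EMPTY or b is EMPTY:
        return EMPTY
    r = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return r if r[0] < r[2] and r[1] < r[3] else EMPTY

class DisplayListBounds:
    def __init__(self):
        self.page_mediabox = EMPTY   # 1) union of the page begin_page boxes
        self.annot_mediabox = EMPTY  # 2) union of the annotation boxes
        self.inked = EMPTY           # 3) union of the inked regions of the page
        self._current_page = EMPTY   # box of the page currently open, if any

    def begin_page(self, box, is_annot=False):
        if is_annot:
            self.annot_mediabox = union(self.annot_mediabox, box)
        else:
            self.page_mediabox = union(self.page_mediabox, box)
            self._current_page = box

    def mark(self, bbox):
        # Only marks made inside an open page count, clipped to that page.
        self.inked = union(self.inked, intersect(bbox, self._current_page))

    def end_page(self):
        self._current_page = EMPTY
```

Keeping the flag on begin_page means a list produced by running contents and annotations separately still records where each box came from, which is the property asked for in the discussion.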
tor8 Robin_Watts: okay. how do you envision the call nesting for the begin_page(is_annot) calls?09:12.55 
Robin_Watts tor8: every fz_run_annot starts with a begin annot and ends with an end_annot.09:19.03 
  every fz_run_contents starts with a begin_page and ends with an end_page09:19.25 
  So it's simple to see from the display list where each thing came from.09:19.38 
tor8 so basically: begin_page, page contents, end_page, begin_annot, ..., end_annot, begin_annot ... end_annot09:20.13 
  I was otherwise thinking begin_page ... begin_annot ... end_annot begin_annot ... end_annot end_page09:20.52 
  Robin_Watts: on tor/master, a basic first step that just records page bounding boxes and doesn't attempt to cope with per-annotation separate display lists09:23.32 
Robin_Watts tor8: I prefer the former to the latter.09:24.27 
tor8 hm, maybe we shouldn't be passing the ctm to fz_bound_display_list09:24.48 
  Robin_Watts: actually, now I remember another reason why I added the calls09:25.04 
Robin_Watts the latter makes it impossible to do fz_run_page_contents then fz_run_annot.09:25.06 
tor8 so that we can use them in the pdfwrite device so you can get natural page delineations without having to recreate the device for each page09:25.41 
  which makes a lie of my earlier statement that I only expect to see one begin_page call09:26.04 
  and means that we really ought not to call begin_page automatically in run_page_contents, since that would foul up any n-up style devices09:26.37 
Robin_Watts tor8: ok... just mulling that.09:29.25 
  tor8: No... I think I disagree.09:30.14 
  when we call fz_run_page, that must call fz_begin_page, ...contents ..., fz_end_page.09:30.33 
  If I run that to a display list and then run the display list out again, I want to get the exact same sequence of events.09:31.03 
tor8 fz_run_page draws the whole page and wraps it with begin_page/end_page and that's fine09:31.18 
Robin_Watts If we have an n-up device, then that should swallow the begin_page/end_pages, and regenerate a new one.09:31.39 
tor8 but when calling run_page_contents, I think we should make the begin_page/end_page wrapping the client's responsibility09:31.41 
Robin_Watts ok, I can't immediately see an objection to that.09:32.17 
tor8 run_page_contents is for the guts of one page without the begin/end_page package09:32.24 
  and n-up can be done in two ways here I think09:32.51 
  1) at the fz_document level, by calling run_page_contents with various matrices09:33.07 
  2) at the fz_device level by eating begin_page/end_page calls and decorating and forwarding calls09:33.39 
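[Editor's sketch] Option 2 above, a device that eats begin_page/end_page calls and decorates and forwards everything else, might look roughly like this. It is a Python illustration only: `RecordingDevice`, `TwoUpDevice`, and the single `fill` call are hypothetical stand-ins for the much larger real device interface.

```python
class RecordingDevice:
    """Stand-in for a real output device: just logs the calls it receives."""
    def __init__(self):
        self.calls = []

    def begin_page(self, box):
        self.calls.append(("begin_page", box))

    def fill(self, bbox):
        self.calls.append(("fill", bbox))

    def end_page(self):
        self.calls.append(("end_page",))


class TwoUpDevice:
    """Forwarding device that places pairs of source pages side by side."""
    def __init__(self, target, page_w, page_h):
        self.target = target
        self.w, self.h = page_w, page_h
        self.slot = 0  # 0 = left half of the sheet, 1 = right half

    def begin_page(self, box):
        # Swallow the source begin_page; open a new sheet only for slot 0.
        if self.slot == 0:
            self.target.begin_page((0, 0, 2 * self.w, self.h))

    def fill(self, bbox):
        # Decorate: shift content into the current slot, then forward.
        dx = self.slot * self.w
        self.target.fill((bbox[0] + dx, bbox[1], bbox[2] + dx, bbox[3]))

    def end_page(self):
        # Close the sheet only after the right-hand slot has been used.
        if self.slot == 1:
            self.target.end_page()
        self.slot = 1 - self.slot

    def close(self):
        # Flush a half-filled sheet.
        if self.slot == 1:
            self.target.end_page()
            self.slot = 0
```

The target device only ever sees one begin_page/end_page pair per output sheet, regardless of how many source pages were run into the wrapper.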
kens ah drat, I just created a branch on master I didn't mean to09:44.39 
  tor8 or Robin_Watts would one of you mind killing the branch dynamic_PJL_allocs please ?09:45.34 
Robin_Watts kens: on golden?09:47.25 
kens yes please09:47.33 
Robin_Watts git push golden :dynamic_PJL_allocs09:47.37 
srini I need help in mupdf text extraction to html09:47.45 
kens Having screwed it up I don't think I trust myself to undo it09:47.45 
srini is there a way to get the text-decorations? Underline StrikeThrough etc.... and also is there any patch on h1 h2 h3 detection?09:48.24 
Robin_Watts done.09:48.25 
kens thanks Robin_Watts09:48.30 
srini font-color also09:48.41 
Robin_Watts srini: The text decorations aren't defined in PDF.09:48.44 
srini Robin_Watts, so we have not way to detect underlined text?09:49.11 
  *no way09:49.32 
kens You could detect the presence of a thin line parallel to, and in close proximity to, a glyph09:50.10 
  Which 'might' be an underlined piece of text, or it might be the top of the frame for a figure right underneath it09:50.39 
  So you would have to further test that the path extended from the beginning of the text to the end of the text, and no further09:51.14 
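[Editor's sketch] The test kens describes (a thin line, parallel to the text, close to it, and spanning it from start to end and no further) can be written down as a heuristic. This Python sketch assumes horizontal text, a y axis that grows downwards, and made-up thresholds; the function name and the dict layout are hypothetical and not part of MuPDF.

```python
def looks_like_underline(run, seg, tol=0.15):
    """run: dict with x0, x1 (start/end of the text), baseline y, font size.
    seg: dict with x0, y0, x1, y1, width describing a stroked line.
    All thresholds are illustrative guesses, expressed relative to font size."""
    size = run["size"]
    # Thin: the stroke is much lighter than the glyphs are tall.
    if seg["width"] > 0.1 * size:
        return False
    # Parallel to the baseline (which is horizontal under our assumption).
    if abs(seg["y1"] - seg["y0"]) > tol * size:
        return False
    # Close proximity: sitting just below the baseline.
    offset = (seg["y0"] + seg["y1"]) / 2 - run["y"]
    if not (0 <= offset <= 0.3 * size):
        return False
    # Extends from the beginning of the text to the end, and no further.
    left, right = sorted((seg["x0"], seg["x1"]))
    slack = tol * size
    return abs(left - run["x0"]) <= slack and abs(right - run["x1"]) <= slack
```

A frame edge that happens to sit under a heading fails the last check because it runs past the text, which is the false positive raised above. Decorations baked into a font would never reach this test at all.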
srini How about h1 h2 h3 detections09:52.10 
kens No idea what you are talking about there09:52.22 
Robin_Watts srini: In the PDF file, all we get is 'put this glyph in this font, at this position'.09:52.30 
  And things like underlines/strikeouts are added on later using line art.09:52.45 
  There is no concept of 'structure'.09:52.55 
  Hence we have no h1/h2/h3 information.09:53.03 
srini Hmmm ... got it... 09:53.09 
Robin_Watts We don't even have paragraph information.09:53.11 
  Now, when we do text extraction we do some funky heuristics that try to reassemble chars to words, words to lines, lines to paragraphs, etc.09:53.40 
  but those are all prone to failure.09:53.47 
kens chrisl, looks like I was suffering from this problem this morning:09:53.51 
  http://www.theregister.co.uk/2015/09/10/major_plusnet_outage_dns_goes_titsup/09:53.53 
srini I actually tried a way sometime.... find most used font and size, considering this a base font size, continue searching fonts greater than base font size and mark them into h1 h2 h3 etc 09:54.15 
  but did not actually work well that way09:54.27 
Robin_Watts srini: That's exactly the kind of think you'd need to do.09:54.44 
  your best bet is to operate on the structures you get back from the fz_stext_device.09:55.12 
kens All these things are, in the absence of data, heuristic. If your heuristic doesn't work well, then you need to refine it09:55.13 
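[Editor's sketch] The font-size approach srini describes could be sketched like this in Python. The input format, a list of (text, font_size) pairs, is an assumption standing in for whatever is read out of the extracted text structures, and `heading_levels` is a hypothetical name. As noted in the conversation, this kind of heuristic is prone to failure and will need tuning per document.

```python
from collections import Counter

def heading_levels(spans, max_level=3):
    """spans: list of (text, font_size). Returns a list of (tag, text).
    The most common size, weighted by text length, is taken as body text;
    the distinct larger sizes become h1, h2, ... from the largest down."""
    if not spans:
        return []
    weight = Counter()
    for text, size in spans:
        weight[size] += len(text)
    base = weight.most_common(1)[0][0]
    larger = sorted((s for s in weight if s > base), reverse=True)[:max_level]
    tag_for = {s: "h%d" % (i + 1) for i, s in enumerate(larger)}
    # Anything at or below the base size (or beyond max_level) stays a paragraph.
    return [(tag_for.get(size, "p"), text) for text, size in spans]
```

Weighting by text length rather than by span count keeps a document with many short headings from having a heading size picked as the base.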
Robin_Watts s/think/thing/09:55.28 
srini Any official/unofficial patch available for text decorations?09:56.49 
  And, how about font-color?09:57.17 
Robin_Watts nope.09:57.20 
kens I thought the current color was in the XML output (I could be wrong)09:57.41 
Robin_Watts kens: Not in the text extraction output, I thought.09:58.04 
kens Ah....09:58.11 
  It's been a while (ie more than 10 minutes) and therefore I cannot recall for certain09:58.34 
Robin_Watts oh, wait, you may be right.09:58.50 
  the color might be in the style.09:58.56 
srini Robin_Watts, kens - many things are not available in the text extraction... they are there in the library09:59.01 
Robin_Watts no, the text color is passed to fz_lookup_text_style, but is not actually used.09:59.50 
  You could extend that fairly simply if you wanted.09:59.58 
kens thinks PlusNet is having a *very* bad day, I'm glad I'm not on their Hell Desk10:01.26 
srini I am not a seasoned hand on coding... decoration especially underline... can someone help me add code?10:03.33 
Robin_Watts srini: Forget about underline. That ain't gonna happen easily.10:04.22 
kens likewise strikethrough I'd have thought10:04.35 
Robin_Watts same deal, yes.10:04.41 
  color should be a simple addition for anyone that knows C.10:04.53 
kens that one might be easier to detect, but it's still a project10:04.54 
Robin_Watts kens: strikethrough and underline are both basically the same problem (dot products with the baseline and examine the offset)10:05.47 
kens Actually, I wonder if the underline and strikethrough are done as part of the font definition, in which case they are inherently undetectable in PDF anyway10:05.54 
  Robin_Watts : that assumes that the work isn't done by defining a type 3 font which first draws the underlying font glyph, then draws a line10:06.25 
Robin_Watts kens: true.10:06.38 
kens I'd be fairly certain that's how QuarkXPress would do it, but I doubt if MS Office does (for instance), but I might be wrong10:06.55 
  There's a bunch of weird stuff in the MS PS output that I've never needed to look at10:07.23 
srini mostly I come across text merging issues.... they are generally because of very small space <less than general tab space> at the beginning of the paragraphs...10:25.46 
  can we have a method to alter them from the command line...10:26.19 
hj__ is there a difference in handling file names between 9.16 and 9.07? we have an input file name using Chinese characters and ghostscript 9.16 gives an open error13:27.14 
kens You probably need to use the latest code. There may be a work-around, I don't recall13:27.55 
hj__ in our debug it looks that "os_status" is called with the UTF8 name passed to stat. are there no specific os_status/os_rename etc for windows doing the wide char version13:31.07 
  when converting the utf8 name in os_status to wchar and calling _wstat instead of stat it seems to work13:34.18 
  a quick look in the code...the os_rename/os_status/os_delete should have a windows specific implementation13:35.14 
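[Editor's sketch] The conversion hj__ describes, UTF-8 name in and wide-character name out before calling the wide variant of stat, comes down to re-encoding the string. The Python below only illustrates that re-encoding; `utf8_to_wide` is a hypothetical helper, and it does not reproduce the actual Ghostscript fix.

```python
def utf8_to_wide(name_utf8: bytes) -> bytes:
    """Decode a UTF-8 file name and re-encode it as NUL-terminated UTF-16LE,
    which is the in-memory layout of a wchar_t string on Windows."""
    return name_utf8.decode("utf-8").encode("utf-16-le") + b"\x00\x00"
```

Passing the raw UTF-8 bytes to a narrow-character call leaves them to be interpreted in the current code page, which is consistent with the open error reported for the Chinese file name.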
chrisl hj__: as kens said, that's been fixed, a few months ago13:40.35 
hj__ thanks.13:41.37 
chrisl hj__: here's the commit: http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=1572afcd13:43.00 
hj__ chrisl thanks for the stat commit. do you know if the rename/remove also have got a similar change?13:46.52 
  i just looked in the git source. the os_delete and os_rename just call the default c functions. this might need some changes for windows wide char13:51.54 
kens If you think you've found a bug, feel free to open a bug report. Don't forget to include a sample file and instructions for reproducing the bug.13:53.57 
hj__ oke13:54.20 
henrys kens: begrudgingly I'm good with it, it was intended never to fail, only doing allocations at startup; the job manager should always be able to run on a pcl printer, but I realize that's too inflexible.14:02.31 
kens henrys I could go with that if it wasn't for the fact that we don't action the PJL immediately but save it all up. To do that, I need a bigger (potentially much bigger) allocation14:04.50 
  By the way, it won't fail if it can't allocate a new string or table, it just keeps on using the old one14:05.36 
  If it only runs at startup, a failure to allocate one of these small strings would be only one step ahead of the whole interpreter folding up anyway I'd have thought14:06.20 
  Anyway, I'll move ahead to the distillerparams14:07.16 
henrys kens: sounds good.14:08.21 
kens Well, this will be the more interesting part :)14:08.48 
henrys kens: where are the device parameter setting going to take place exactly as I'm rewriting much of plmain and friends.14:09.00 
  ?14:09.09 
kens It'll be simple strings14:09.22 
  I have yet to decide on delimiters for arrays and dicts, probably I'll just stick with the PS ones14:09.57 
  henrys, what do you see with C705.bin and -dFirstPage=2 -dLastPage=4 ?14:10.28 
  D'oh I clearly cannot read14:10.42 
  I will plan to process the distillerparams at the same time as all the other device stuff14:11.05 
  So all the params will get assembled into a big set of storage, then we'll call some magic procedure which will turn it all into param lists and call putdeviceparams14:11.37 
  IIRC this is pretty much what happens for the existing settings ?14:12.02 
henrys kens: I haven't looked at C705.bin but it can't work, it installs XL's device in pcl's state the page count is gone.14:14.56 
kens The page count is held by the First/Last page device14:15.25 
  It's independent of the interpreter14:15.31 
  Err, I think.....14:15.50 
  It does work on Windows14:16.02 
henrys XL and PCL have their own device and their own subclass.14:16.28 
kens Well each should operate independently, assuming you pass the device params to both14:17.09 
  Like I said, I did try this on Windows and it seemed to do what I expected14:17.36 
henrys oh maybe I should look again.14:18.52 
kens It *didn't* work on Linux, but it didn't work without -dFirstPage either14:19.12 
  I have no clue why14:19.28 
henrys I will look at it, it was "a note to self" type bug. It can wait.14:21.52 
kens No problem, I'll leave it with you14:22.30 
mvrhel_laptop tor8: is there any particular reason that fz_keep_page does not have a prototype in document.h?16:48.14 
  I was maybe going to make use of it in my page cache16:49.04 
  or Robin_Watts if tor8 is not around...16:50.11 
Robin_Watts mvrhel_laptop: Sounds like an omission to me.16:50.37 
mvrhel_laptop ok16:50.52 
  I will add it16:51.47 
  Robin_Watts: the tiny minor commit is on my mupdf repos if you can take a 2 sec look17:00.07 
Robin_Watts looking.17:00.30 
  Can't argue with that.17:01.01 
mvrhel_laptop I hope not :)17:01.10 
  what day do you fly in to chicago?17:01.24 
  Robin_Watts: are you guys coming early for the show?17:02.22 
Robin_Watts mvrhel_laptop: No, flying on wednesday.17:03.14 
  We have to stay til sunday night, so that's quite long enough away from home (or so I'm told).17:03.35 
  Who will deal with the spiders in my absence? :)17:03.44 
mvrhel_laptop Robin_Watts: :)17:03.50 
Robin_Watts mvrhel_laptop: I have tea.17:04.05 
mvrhel_laptop Robin_Watts: I may try to meet up with you all for dinner on Wed. what time do you arrive?17:04.05 
  oh great. do you want anything like dark chocolate malt balls, or dark chocolate pb cups?17:04.28 
Robin_Watts We land at 12:3017:04.30 
mvrhel_laptop oh early17:04.33 
Robin_Watts mvrhel_laptop: Definitely not!17:04.39 
mvrhel_laptop I need to bring something17:04.45 
Robin_Watts Still working off the holiday fat :(17:04.52 
  Helen loved the last ones but said "Tell Michael never to bring these again!"17:05.09 
mvrhel_laptop they are my favorite17:05.26 
rayjj tor8: Scott is on the phone and a potential customer wants to know if we support EPUB 4 OS17:11.08 
Robin_Watts wtf is EPUB 4 OS ?17:11.23 
mvrhel_laptop it's the one after 317:11.53 
Robin_Watts MuPDF supports epub v1.17:12.06 
  v2 is 'fixed format' rather than 'reflowable' so moving towards pdf. We don't support that properly.17:12.32 
rayjj that's what I didn't know either. As far as I knew there is a v3 (that we can't handle)17:12.58 
Robin_Watts hmm. Maybe we support v2, but not v3.17:13.54 
  Certainly I've never heard of a v4.17:14.00 
  Googling "EPUB 4 OS" is unhelpful.17:14.13 
  I suspect the customer is parroting something requested by their engineering team, and filtered through scott it's arriving corrupted.17:14.53 
  No idea where the corruption might be occurring.17:15.02 
rayjj Robin_Watts: it's a .jp company, so that doesn't help17:15.51 
Robin_Watts Get scott to ask them to forward him some links about what they want supported.17:16.21 
  and we can look at those.17:16.30 
rayjj It might be a question about which OS's 17:19.16 
henrys I hope our implementation is compatible with epub 2, that's what we are saying to the world17:19.50 
Robin_Watts and a link about the OS they want supported would make that clear.17:19.53 
rayjj Scott is asking them. I had him list that we support Windows, unix/linux, Mac OS X, iOS and Android in case that answers their question17:20.31 
Robin_Watts henrys: Right. I think I was wrong earlier. We have a v2 implementation, but we don't support the new stuff in v3.17:20.32 
henrys yeah what we've said is 2 compatible and we'd add 3 features as time permits if they are sane17:21.25 
rayjj Robin_Watts: that's what I recalled as well. But I also hadn't heard about v4. V3 was just released 2014, so it's a little soon for a v4 :-)17:21.25 
  henrys: Robin_Watts: thanks. That's what I told Scott. v2, not v3, but if they need v3 we'd have to discuss it (and get sample files)17:22.29 
henrys sounds fine rayjj 17:22.46 
nvs Hi! Are there any mupdf guys here?17:33.02 
mvrhel_laptop bbiaw17:46.32 