IRC Logs

Log of #ghostscript at irc.freenode.net.

20160223
mvrhel_laptop Robin_Watts: for the logs, there is a commit on my mupdf repos for you to review. It fixes some mem leaks in the pdf-device. When I was checking for leaks in my pdf_create branch I stumbled upon these in the master00:19.43 
Robin_Watts let me look.00:21.37 
  looks good to me.00:22.32 
mvrhel_laptop oh Thanks Robin_Watts did not expect you to be up01:06.45 
  I will push to golden then01:06.52 
halabund Can Ghostscript convert everything in a PDF to RGB (even if it is CMYK)? Also, can it get rid of layers in the PDF (whatever those are)?10:14.02
kens It can certainly colour convert, it's in the documentation.10:14.23
  As for layers, it depends what you mean by layers10:14.31 
halabund My new colleagues don’t know LaTeX, so I am forced to work in that abomination of MS Word and it keeps destroying some embedded PDFs. Google tells me that these two things may be the reason10:14.33 
kens But basically Ghostscript will attempt to maintain the content of a PDF file when processing it. So probably not, but like I said, it depends what you mean by layers10:15.17 
  Oh of course you could always convert it into an EPS and embed that into MS Word10:18.08 
  But depending on the content of the PDF you may not be happy with the result10:18.30
Robin_Watts tor8: You here?11:55.10 
tor8 Robin_Watts: yes.11:56.41 
Robin_Watts So, was pondering this shapy texty thing when running this morning.11:56.56 
tor8 okay.11:57.24 
Robin_Watts At the moment we're managing to keep stuff fairly high level in the device interface.11:57.40 
  It would seem to be a bit of a shame if we lose the ability to pass high level text through the device interface.11:58.17 
  So, how would you feel about fz_text_spans gaining text direction information?11:58.49 
tor8 you mean the bidi direction?11:59.08 
Robin_Watts yes, ish.11:59.35 
  For every piece of text, we potentially have some extra information.11:59.56 
  1) What language it's specified to be in in the source text.12:00.06 
  2) The direction given to it in the source text.12:00.16 
  3) The direction of the text (unset/l2r/r2l/number-in-r2l-context)12:01.09 
tor8 I am not entirely sold on the idea ... for PDF, XPS, etc we won't have any of this information set12:01.18 
Robin_Watts tor8: And so for PDF/XPS we can ignore it.12:01.30 
  But for html (and other sources) we can pass it through.12:01.50 
tor8 which means you'll get varying results depending on what you do on text that looks identical but comes from different sources12:01.52 
Robin_Watts tor8: Yes.12:02.07 
  but with the language information, text may not look identical, for instance.12:03.01 
  (once we get that hooked up to harfbuzz)12:03.09
  We already have wmode as an int.12:03.46 
  If we change that to be a bitfield, then we get all the other stuff included for no extra size.12:03.59 
tor8 Robin_Watts: ahem. I just zapped the wmode (and put it back into the fz_font where it belongs)...12:04.09 
  so there's space for another field to take its place12:04.15 
Robin_Watts ok.12:04.20 
tor8 on the other hand, if we make it into a bitfield, we could put the wmode and all other extra bits you want here back into it12:04.47 
Robin_Watts We can also have a field in there for 'should be shaped' or not.12:05.04 
  so PDF can leave that blank.12:05.23 
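A minimal sketch of the bitfield idea discussed above: wmode, a text direction, a "should be shaped" flag and a language id packed into one word, so a PDF source can leave everything but wmode at zero. Every name and the bit layout here are assumptions made for illustration; none of them are MuPDF identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout: none of these are MuPDF identifiers. */
enum text_dir {
    DIR_UNSET = 0,
    DIR_L2R = 1,
    DIR_R2L = 2,
    DIR_NUM_IN_R2L = 3   /* number in a right-to-left context */
};

/* Bit layout of the packed word (an assumption for this sketch):
 *   bit  0      wmode (0 = horizontal, 1 = vertical)
 *   bits 1-2    text direction
 *   bit  3      "should be shaped" flag
 *   bits 4-11   language id (0 = unspecified)
 */
static uint32_t pack_text_flags(int wmode, enum text_dir dir, int shaped, unsigned lang)
{
    assert(wmode == 0 || wmode == 1);
    assert((unsigned)dir <= (unsigned)DIR_NUM_IN_R2L);
    assert(lang < 256u);
    return (uint32_t)wmode
        | ((uint32_t)dir << 1)
        | ((uint32_t)(shaped != 0) << 3)
        | ((uint32_t)lang << 4);
}

/* Accessors that undo the packing above. */
static int flags_wmode(uint32_t f) { return (int)(f & 1u); }
static enum text_dir flags_dir(uint32_t f) { return (enum text_dir)((f >> 1) & 3u); }
static int flags_shaped(uint32_t f) { return (int)((f >> 3) & 1u); }
static unsigned flags_lang(uint32_t f) { return (unsigned)((f >> 4) & 0xffu); }
```

With this layout a source that knows nothing about direction or language simply packs zeros for those fields, which is the "PDF can leave that blank" case.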
tor8 I was just being annoyed by passing around the wmode argument to all the text functions just because of XPS having IsSideways as an extra attribute not part of the font12:05.32 
Robin_Watts and we can have a routine that takes an unshaped fz_text to a shaped one.12:05.43 
tor8 but I could change my mind if we want to pass around other extra bits of information12:05.43 
  hmmm, you mean stuffing raw unpositioned text into the fz_text and then shaping that to position it? not sure that belongs in there.12:06.47 
  maybe if we add the extra bits of info that pdf text objects have, like the leading and charspace etc12:07.04 
  so we can replicate the PDF text commands that go between BT and ET with the fz_text functions12:07.29 
  text layout interfaces are complicated ... I would like to keep the fz_text simple, as a plain container for already laid out text12:08.28 
  http://git.ghostscript.com/?p=user/tor/mupdf.git;a=blob;f=source/fitz/text.c;h=de4211cc8569eb61bcd30f9df7073e7cae43a5a5;hb=a1066e62b3337e3cb4c1108070f5f4b89d8fab3b#l9912:09.17 
  I'm pretty sure vertical text layout with that function is still "broken" -- we don't offset the origins using the metrics, etc12:10.44 
  but if you could stuff harfbuzz into that function, that's all I wanted to start with12:11.11 
  annotating the text spans with language and bidi levels; we could do that to let the text extraction device be smarter12:11.38 
  or rather, let it be dumber by reading that info instead of trying to guess12:11.54 
Robin_Watts The problem is that if we have fz_text as a really dumb low-level "just put this text here" block of data, it means that text extraction etc or html-write or whatever has to work MUCH harder to extract the original information.12:13.46 
tor8 Robin_Watts: yeah, but it already needs to work that hard for PDF12:14.18 
Robin_Watts Having something that carries high level information which the low level info can be easily obtained from covers both ends.12:14.32 
tor8 still, I can see that having some high level information in there, about bidi at least, would be useful12:14.53
  seeing as we already carry along the unicode values12:15.06 
Robin_Watts yes, it needs to work hard for PDF, but it would be nice to remove some of the guesswork for cases that we can get away with.12:15.09 
  yeah.12:15.10 
  The bidi stuff is enough that we can work backwards from the shaped stuff losslessly, I think.12:15.42 
tor8 then I'm okay with adding bidi levels; and simply make the pdf/xps guess the bidi info12:16.40 
  and then simplify the structured text bidi reversing stuff12:17.00 
  in fact, I'd be perfectly okay with starting over the structured text extraction from scratch :)12:17.19 
Robin_Watts tor8: The text extraction falls loosely into 2 parts.12:44.26 
  There is the gluing of text fragments back into spans, and then the derivation of things like columns etc from those spans.12:44.55 
  I'm broadly happy with the approach we take for the first half of that problem (certainly it's better than things we've done before, cos it copes with text at an angle etc).12:45.31 
  but it could probably be improved a bit.12:45.49 
  The second half of the problem is a horrible nightmare though. I started it with good intentions and ended up just happy to get out alive.12:46.21 
tor8 Robin_Watts: yes, the first half is probably okay... it's the second half I'm having doubts about.12:49.09 
Robin_Watts tor8: The second half is a horrible problem. I am absolutely sure that it's possible to do a better job.12:49.43 
  I'm also sure that it's a potential black hole for time.12:49.59 
  It feels like a university level research project to me.12:50.28 
  i.e. go away, and spend some time on it, and at the end of 3 years you might not have anything that works, but you should have enough stuff to write up a thesis on things you tried.12:51.07 
tor8 Robin_Watts: yes. not really something that belongs in a shipping product...12:54.06 
Robin_Watts The only saving graces of the stuff we have is that 1) it kinda works, and 2) it's optional.12:55.07 
  You text extract, then you can call the analysis or not.12:55.22 
ediee Hi13:05.38 
ghostbot Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.13:05.38 
ediee I want to know that whether the images can be extracted from the pdf page??13:06.07 
tor8 Robin_Watts: I wonder if (with the bidi flags added) we could skip the extraction step for stuff like search and copy&paste13:06.23 
kens If by 'images' you mean bitmaps, then yes13:06.26 
ediee ok13:06.48 
tor8 just get the fz_text objects and work from there. we'd still need to add the space insertion heuristics for pdf files that don't emit spaces.13:06.56 
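One way a space insertion heuristic of the kind mentioned above could look, as a sketch only: insert a space wherever the horizontal gap between two consecutive glyphs exceeds some fraction of the font size. The glyph record, the function name and the threshold parameter are all hypothetical, and this is not claimed to be what MuPDF's extraction code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glyph record for this sketch; not MuPDF's fz_text layout. */
struct glyph {
    char ch;        /* character (ASCII only, to keep the sketch short) */
    float x;        /* left edge on the baseline */
    float advance;  /* horizontal advance width */
};

/* Copy the glyph characters into out, inserting a space wherever the gap
 * between one glyph's right edge and the next glyph's left edge exceeds
 * gap_ratio * font_size. Returns the number of characters written, not
 * counting the terminating NUL. out must hold at least 2*n + 1 bytes. */
static size_t join_with_spaces(const struct glyph *g, size_t n,
                               float font_size, float gap_ratio,
                               char *out, size_t out_size)
{
    size_t len = 0;
    size_t i;

    assert(out_size >= 2 * n + 1);
    for (i = 0; i < n; i++) {
        if (i > 0) {
            float gap = g[i].x - (g[i - 1].x + g[i - 1].advance);
            /* Do not double up if the file already emitted a space. */
            if (gap > gap_ratio * font_size &&
                g[i - 1].ch != ' ' && g[i].ch != ' ')
                out[len++] = ' ';
        }
        out[len++] = g[i].ch;
    }
    out[len] = '\0';
    return len;
}
```

The threshold is the hard part in practice: too small and kerned text gains spurious spaces, too large and real word gaps are missed, which is why the discussion calls it a heuristic.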
ediee In mupdf's reflow mode why the images is not showing??13:07.16 
  In mupdf's reflow mode why the images is not showing??13:08.39 
kens tor8 Robin_Watts that question is for you13:08.50 
Robin_Watts ediee: Dunno.13:09.04 
  Presumably this is on Android ?13:09.32 
ediee yess...13:09.39 
Robin_Watts Is it all images, or just specific ones?13:09.48 
ediee In mupdf's reflow mode if the images exists in pdf page then it will not show13:09.57 
  all the images13:10.05 
  it shows only text13:10.09 
Robin_Watts I would have expected jpegs to work. Others should be converted to PNGs and get shown too.13:10.36 
ediee but how??13:10.53 
  bcoz we wont get any images array to do that13:11.11 
kens Are you writing an app ediee ?13:11.40
ediee yess... planning to do so13:12.06 
  but reflow mode is in dilemma13:12.16 
kens http://www.bbc.co.uk/news/education-35631030 OK are you clear on the licencing terms13:12.18
Robin_Watts ediee: OK, so before we go any further, let's just check you understand the licensing terms.13:12.24 
kens D'oh13:12.25 
Robin_Watts MuPDF is released under 2 licences. You must use one of the licenses, or you can't distribute your app at all.13:12.53 
ediee ok... i didnt know that13:13.12 
Robin_Watts The first license is the GNU AGPL.13:13.19 
ediee wat are the 2 licenses... ??13:13.24 
Robin_Watts This is a free license. It says (basically) that you can use the code for free, but in exchange, you must be prepared to give away the source for your ENTIRE app to any end user of your app that asks for it.13:14.09 
  i.e. if fred bloggs gets your app, he gets the right to ask for the entire source code, which he can then pass on to anyone else he wants.13:15.03 
  So, most people writing commercial applications think that that's a non-starter.13:15.24 
  If you're writing a free app, then that may be fine though.13:15.48 
ediee im writing a free app only..13:16.09 
Robin_Watts And you're happy to give away the full source code too ?13:16.24 
ediee and wat abt the second license?13:17.09 
Robin_Watts (Some people write free apps that talk to their own specific services, so they are unhappy to give away the source code.)13:17.15 
  The second license is the Artifex Commercial license.13:17.29 
  This costs money, but in exchange you are freed from all the strictures of the GNU AGPL.13:17.57 
ediee ok... wat are all the features I will get in v13:18.43 
  Commercial license13:18.44 
  ok... wat are all the features I will get in Commercial license??13:18.58 
Robin_Watts ediee: Exactly the same code.13:19.03 
  Exactly the same features.13:19.09 
  Just you get to distribute it without having to abide by the terms of the GNU AGPL.13:19.32 
ediee ok13:19.42 
  can I get solved with the reflow issue?13:19.50 
  wat i described previously?13:19.57 
Robin_Watts ediee: Some commercial licenses come with support included.13:20.17 
ediee means?13:20.28 
Robin_Watts (or you can buy a separate support contract).13:20.30 
  ediee: We're generally a friendly bunch, and will (time permitting) help out where we can.13:20.51 
  Problems for commercial customers take priority of course.13:21.06 
ediee ok13:21.11 
  so can u solve my problem?13:21.18 
  for reflow mode?13:21.24 
Robin_Watts So, the way the reflow stuff works is that the page is run through the text extraction device.13:21.36 
  This gives us a set of structures at the end (lines of text on the page etc).13:21.59 
ediee ok... but text extraction has no issues... the issue is with images13:22.11 
Robin_Watts We then have some code that converts those structures back into HTML.13:22.18 
  And that's what the reflow code uses.13:22.25 
ediee if the page has images like mathematical formulaes, scientific notations, etc....13:22.41 
  they all wont be displayed in reflow mode13:22.52 
Robin_Watts If you set a flag on the text extraction device then it will keep images as part of that text extraction process too.13:22.54 
  This did all work fine before.13:23.07 
ediee set a flag?13:23.16 
  whr?13:23.20 
Robin_Watts It's possible it's been broken and we haven't noticed it.13:23.24 
ediee can u show some sample code?13:23.25 
Robin_Watts ediee: Are you using our example MuPDF app as a basis?13:23.51 
kens At this point, sharing an example file that does not work might be helpful13:24.16
ediee Robin : yesss13:24.45 
  kens : ok.. then can u plz share some links13:24.56 
  which i can refer13:25.01 
kens No, I'm suggesting you share a file with us13:25.12 
Robin_Watts ediee: OK, so in platform/android/jni/mupdf.c13:25.27 
ediee ok13:25.45 
  robin : can u plz elaborate?13:26.01 
Robin_Watts ediee: I'm telling you to load that into an editor.13:26.46 
  Then look for the JNI_FN(MuPDFCore_textAsHtml) function13:27.03 
  In there, you should see a call:13:27.19 
  dev = fz_new_stext_device(ctx, sheet, text);13:27.27 
  After that, try adding:13:27.32 
  fz_disable_device_hints(ctx, dev, FZ_IGNORE_IMAGE);13:27.56 
  That should tell the text extraction to stop ignoring images.13:28.20 
  Then try that out.13:28.36 
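The advice above is a double negative (disable the "ignore image" hint so that images are kept), which a toy model may make easier to follow. This sketch assumes hints are a plain bitmask stored on the device and that the extraction device starts with an ignore-images hint set; the struct, the functions and the HINT_* names are hypothetical stand-ins, not MuPDF's definitions.

```c
#include <assert.h>

/* Hypothetical model of per-device hint flags; the names below are
 * illustrative and are not MuPDF's definitions. */
enum {
    HINT_IGNORE_IMAGE = 1,
    HINT_IGNORE_SHADE = 2
};

struct device {
    int hints;   /* bitwise OR of HINT_* values */
};

/* Set the given hint bits on the device. */
static void enable_hints(struct device *dev, int hints)
{
    assert(dev);
    dev->hints |= hints;
}

/* Clear the given hint bits, leaving the others untouched. */
static void disable_hints(struct device *dev, int hints)
{
    assert(dev);
    dev->hints &= ~hints;
}

/* A device would consult its hints before doing any work for an image. */
static int device_keeps_images(const struct device *dev)
{
    assert(dev);
    return (dev->hints & HINT_IGNORE_IMAGE) == 0;
}
```

In this model, clearing the ignore-image bit is what makes the device start keeping images, while any other hints stay as they were.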
ediee ok13:29.07 
  Robin : let me try and get back to u13:29.17 
  textAsHtml is used for reflow mode??13:29.40 
Robin_Watts I believe so.13:32.19 
ediee Robin : stop ignoring images means I assume that it should include image... right?13:33.09 
Robin_Watts Yes.13:33.18 
ediee Robin : what happens if the pdf page itself is an image.. for e.g., a scan copy...13:34.58 
Robin_Watts ediee: Then reflow ain't gonna help much :)13:38.28 
ediee ok... but it will display the page... i presume13:38.45 
Robin_Watts ediee: Should do.13:38.52 
ediee ok... :)13:39.27 
  Robin : let me try this13:39.37 
  Robin : it does not shows images13:42.07 
  i have tried13:42.10 
  i think there is no img tag in JNI_FN(MuPDFCore_textAsHtml)13:42.34 
  there we write all the html 13:42.42 
Robin_Watts fz_print_stext_page_html(ctx, out, text) knows how to write img tags.13:42.58 
  OK, so presumably you are either on a windows or a linux box ?13:43.30 
ediee linux13:43.43 
Robin_Watts OK, so build "mutool" for linux.13:43.56 
ediee but I want so file... to include in my android app13:44.23 
Robin_Watts Should be as easy as doing "make build=debug" in the top level.13:44.30 
  ediee: Yes, I know what you want, this is a test.13:44.54 
ediee how to build mutool13:45.30 
  ?13:45.31 
Robin_Watts Should be as easy as doing "make build=debug" in the top level.13:45.47 
  Once you've built that, run: mutool draw -o out.html in.pdf13:46.09 
ediee ok13:46.09 
Robin_Watts and then hopefully there should be images in the out.html file.13:46.37 
ediee Robin : ok let me check 13:47.00 
  im getting fatal error while doing "make build=debug"13:48.13 
  error is : fatal error: X11/Xcursor/Xcursor.h: No such file or directory13:48.18 
Robin_Watts make build=debug HAVE_X11=no13:49.12 
HenryStiles kens: I meant to tell you sometime ago there isn't intended to be a "set" in pjl. The only way to set something is through the language. I wanted to keep that as is. Do you need that for some reason, it looked like you just added it for completeness.13:52.16 
kens I don't remember adding a SET, is this the C code ?13:52.53 
  Because as I recall it only works with DEFAULT13:53.11 
ediee Robin : I cant able to use mutool draw13:55.21 
Robin_Watts ediee: Why not?13:55.45 
ediee i dont know13:55.58 
  the draw option is not there13:56.05 
Robin_Watts ediee: What version of mupdf are you using?13:56.18 
ediee 1.813:56.24 
Robin_Watts Do you have build/debug/mudraw ?13:57.03 
  (You were running build/debug/mutool draw, right?)13:57.28 
HenryStiles kens: you added pjl_set_envvar and pjl_set_defvar, no?13:57.57 
kens Err probably13:58.09 
  And yes, I think I added set_defvar for completeness13:58.37 
  Also possibly because there was a C warning, but I'm unsure of that now. If it's a problem then you can pull it back out13:59.00
HenryStiles kens: yeah just verifying you didn't need it for something with PDF/A13:59.46 
ediee Robin : yess it got work now13:59.57 
  im checking the output14:00.08 
  Robin : no the output is not as like as pdf page14:01.49 
Robin_Watts ediee: That's not what I asked.14:02.07 
  What I asked was "are there images in the output" ?14:02.13 
ediee yess.. there is images in the output14:02.33 
Robin_Watts ediee: Right.14:02.47 
kens HenryStiles : If I need it I'd be calling it, so removing them will stop it compiling :-)14:03.01
ediee Robin : but the page is not is the format what the original pdf has14:03.44 
  ?14:03.46 
Robin_Watts ediee: So, if you've done the alteration to mupdf.c as I described above, and rebuilt correctly, then there will be images in the page that is sent to the webview for reflow.14:04.13 
  The layout not being correct is an entirely different question :)14:04.32 
ediee ok... now wat abt the layout??14:04.59 
  its getting different 14:05.03 
Robin_Watts ediee: Well, I can't comment on that without seeing an example file.14:05.14 
  And even then, this is likely to be something that will require me to invest some time into looking at it.14:05.38 
ediee Robin : ok14:07.20 
  I will try 14:07.26 
  and let u knw14:07.30 
  thanks you for ur support14:08.51 
  will u be available tomorrow?14:08.59 
Robin_Watts ediee: I will be here tomorrow, yes.14:11.12 
ediee ok let me try today.. I will chat with u tomorrow abt today's progress14:11.45 
inarus Hi15:06.44 
ghostbot Welcome to #ghostscript, the channel for Ghostscript and MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.15:06.44 
inarus I need concatenate pdf. I am working for a company. Can I use the publicly available soft or do I need the commercial one?15:10.27 
kens Which software are you referring to ?15:10.50 
  In either case (MuPDF, Ghostscript) the software is provided under the terms of the GNU AGPL, provided you abide by the terms of the licence you can use it. Otherwise you need a commercial licence.15:11.40 
  Please note that if you are referring to Ghostscript it does *NOT* concatenate PDF files.15:12.00 
inarus Ok. My bad, I read some Web pages explaining how to concatenate pdf files with ghostscript. I must have misunderstood15:14.12 
kens Many people think that Ghostscript concatenates PDF files, it does not. It interprets the input and can create a *new* PDF file which is visually the same as the input(s). However, the actual contents of the PDF files are not reflected in the output, so it is not concatenating the files.15:15.24 
inarus Do you mean that content and/or formating could be missed?15:19.37 
kens The *visual* appearance should be the same. Metadata may not be carried over and the internal representation will not be the same15:20.06
Robin_Watts inarus: Stuff like Outlines or Annotations etc15:20.23 
kens No, Outlines and Annotations are preserved15:20.33
Robin_Watts kens: Stuff *like* Outlines and Annotations :)15:21.00 
kens But the Creator won't be, nor will some other elements, and the fonts may be described differently, the character codes could be different, images may be compressed differently, etc15:21.07
inarus Ok I get it. That might be a major issue for me, thank you15:22.21 
kens NP15:22.27 
rayjj inarus: the logs may not have caught up, but for most purposes, gs can combine PDFs into a single PDF. Links that specify a page number may be a problem (kens can address that)15:27.12
  kens: does pdfwrite adjust the page number destination in links for PDFs after the first input ?15:28.00
kens Up to a point yes15:28.37 
tor8 Robin_Watts: a bunch of commits on tor/master for review. sebras' stuff is LGTM but a second pair of eyes wouldn't hurt.21:34.40 
marcosw HenryStiles: ping23:06.44 
HenryStiles marcosw: hi23:38.15 
ghostscript.com