| | 20170201 |
Heiko | Hello! | 13:05.48 |
| I'm trying to open a PDF document in MuPDF | 13:06.38 |
| In this document some strange characters are shown instead of the correct text. | 13:07.14 |
| the pdf was produced by Acrobat from a Word document | 13:07.28 |
| I ran mutool draw -Ftrace on the document and saw that the replacement character 0xFFFD is shown instead of the correct char. | 13:08.46 |
| <span font="Verdana,Bold" wmode="0" trm="0 9 -9 0"> <g unicode="U+fffd" glyph="V" x="177.66" y="72.4585" /></span> | 13:09.25 |
| Is there any chance to get this working? | 13:09.33 |
chrisl | Heiko: It probably doesn't have the fonts (or CIDFonts) embedded | 13:09.39 |
| The PDF that is | 13:10.00 |
Heiko | the document is using the Verdana font and the exact same text is shown correctly on page 2 (with the same Verdana,Bold font) | 13:11.10 |
chrisl | Heiko: Well, head over to https://bugs.ghostscript.com/ open a bug, and attach the PDF | 13:12.02 |
Heiko | chrisl: thanks I'll do so! is there anything I can/should try first? | 13:13.35 |
chrisl | Heiko: It's almost impossible to tell without being able to poke at the innards of the PDF | 13:14.05 |
| Heiko: actually, what you could do is build the latest code from git, and see if the problem is still there | 13:14.59 |
Heiko | ok, I already tried the latest git core on different systems (Linux, macOS), so I'll open a bug report. | 13:16.10 |
chrisl | Great. Not everyone does try the latest release, never mind the fully up to date git code | 13:16.44 |
mulover | hey guys. imho strange issue here: mupdf-1.10a in use on two machines, self-compiled on "a": debian linux and "b": gentoo linux. Under low server load, a "mutool draw -w 900 -h 900 -o testout/%d.png calendaria.pdf" performs consistently at about 1800ms on debian linux. But in about 9 out of 10 tries with the same call and PDF on gentoo linux it performs at about 4500ms, also quite constant values! At about one try out of the 10 tho | 13:18.31 |
| similar to the debian system. | 13:18.40 |
| the PDF has 14 pages. The effect is also there for smaller PDF, but one can see it better with 14 pages. | 13:19.33 |
| I can provide stack traces / detailed timings? | 13:20.24 |
| The effect is also there with older versions of mupdf, just with slightly different timings, but the same, huge difference. In rare cases, I get two fast runs in a row on gentoo linux too. | 13:21.30 |
| One try was to use the gentoo package instead of the self-compiled version, which made no difference: most of the time way too slow, only sometimes fast. Again: without significant server load! | 13:23.12 |
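To put numbers on the fast/slow pattern described above, a small loop can time repeated runs of the same command. This is only a sketch: the helper name `time_runs` is made up for illustration, and it assumes GNU `date` (for `%N` nanoseconds), which is typical on Linux but not guaranteed elsewhere.

```shell
# Hypothetical helper: run a command N times and print the wall-clock
# time of each run in milliseconds. Assumes GNU date (supports %N).
time_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}

# Example with the command from the log (the output directory must exist):
# time_runs 10 mutool draw -w 900 -h 900 -o testout/%d.png calendaria.pdf
```

Collecting ten or more runs on each machine makes it easier to see whether the timings really fall into two distinct groups.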
chrisl | mulover: given that the mupdf doesn't change, and the mupdf makefiles don't change, it's probably going to be down to compiler, compiler version, compiler configuration, compiler options (from the environment), glibc version...... hard to see how we'd know | 13:29.49 |
| s/given that the mupdf doesn't change/given that the mupdf code doesn't change | 13:30.09 |
mulover | it was built from the same source package, no changes to the makefile were made. What can i do to provide you with the needed information? | 13:31.26 |
chrisl | Well, as I say, it comes down to what the distros themselves are doing, so I don't think there's a lot we could do - I don't see how this is a mupdf problem | 13:33.59 |
mulover | I was afraid to hear that, but hoped for some hints. | 13:35.27 |
chrisl | Well, checking the compiler being used would be a start | 13:35.43 |
| If one system is using gcc and the other using clang, that would be a hint | 13:36.03 |
Heiko | mulover: you could also check the versions of the used libraries in the ebuild file on gentoo and compare the installed versions with your debian system. | 13:37.44 |
mulover | afaik it's gcc on both, but I am not sure. actually I am a sysadmin noob. | 13:38.09 |
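One way to check the compiler and C library on each machine is to print a short "fingerprint" and compare the two outputs. A rough sketch follows; the function names are invented here, and it assumes a glibc-style `ldd` that answers `--version` on stdout. A tool that is not installed is reported as "not found".

```shell
# Hypothetical helpers: print one line per build-relevant component so the
# output from two machines can be compared with diff.
first_line() {
    # Print the first line of a command's output, or "not found".
    out=$("$@" 2>/dev/null | head -n 1)
    echo "${out:-not found}"
}

env_fingerprint() {
    echo "kernel: $(uname -sr)"
    echo "gcc:    $(first_line gcc --version)"
    echo "clang:  $(first_line clang --version)"
    echo "libc:   $(first_line ldd --version)"
}

env_fingerprint
```

Saving the output to a file on each system and running `diff` on the two files shows the differences at a glance. Running `ldd "$(command -v mutool)"` additionally lists the shared libraries the binary actually loads.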
Heiko | if you open the pdf in X, you could also run "eselect opengl list" and look, if the correct openGL engine is selected | 13:38.31 |
mulover | there is no X on either system. Is there some command line to try instead? | 13:39.27 |
Heiko | mulover: you could also try another gcc version (on my gentoo box I have gcc-5.4.0 installed, and to my feeling mupdf runs fast) | 13:39.55 |
chrisl | I was assuming this wasn't using anything other than basic "make" for the build - ebuild and similar is just going to confuse things | 13:40.12 |
Heiko | mulover: what you can try is to install e.g. gcc-5.4.0, select it with gcc-config and then recompile all dependent packages and mupdf last (or, if your box is fast and you have time, remerge the entire system, so everything is compiled using the same gcc/glibc/<whatever> versions) | 13:42.45 |
deekej | hello guys! :) question... What exactly is the difference between 'put' and '.forceput' in this commit: | 13:50.54 |
| http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=99e331527d541a8f01ad5455c4eb2aabd67281a6;hp=0d4644c003067fc14ca1db9c600dce420c06e6b1 | 13:50.55 |
| I'm trying to understand the reasoning behind the change (except that it fixes the bug), I would like to know *how* it actually fixes it | 13:51.43 |
| or does anyone still remember the bug number associated with it, so I could read through it? :) | 13:52.22 |
kens | No I have no useful memory | 13:52.37 |
| But bluntly, .forceput is a Ghostscript internal, don't use it | 13:52.56 |
chrisl | deekej: .forceput allows our *internal* postscript to write to a write-protected dictionary. The definition is only available during initialisation. | 13:54.28 |
kens | It's probably bug #696817 | 13:55.51 |
chrisl | deekej: from the context, I'm going to assume that, due to the slightly odd way that GSView 5 drives Ghostscript, it causes the .locksafe procedure to be called *after* the early part of initialisation, after systemdict is made read-only | 13:56.49 |
mulover | I just copied the whole directory mupdf-1.10a-source containing the compiled binaries from the debian to the gentoo system. It even works, but the effect is still there :-( | 14:05.02 |
kens | Seems to me that pretty much points the finger at the gentoo system then | 14:05.26 |
| Whether it be hardware or software | 14:05.41 |
mulover | will be software then, i can see the effect on two different gentoo servers. | 14:06.15 |
Robin_Watts | mulover: Try outputting to /dev/null and see if the issue goes away. | 14:06.38 |
chrisl | mulover: You are building as we ship it? Not using shared libraries instead of the included "thirdparty" ones? | 14:07.17 |
deekej | chrisl, kens: the reason why I'm asking is that after backporting the recent patches for CVEs, the older versions of ghostscript suddenly stopped displaying *.ps content via evince (https://bugzilla.redhat.com/show_bug.cgi?id=1410260) | 14:08.51 |
| turned out that the commit I mentioned above fixes the issue, but even though it's just a one-liner, I would like to be sure about the possible side-effects of using '.forceput' instead of 'put' | 14:09.51 |
Heiko | chrisl: ok, I just recreated the PDF with a newer Acrobat Version and it worked now | 14:10.06 |
kens | <sigh> There are no side effects to speak of, other than the fact that it seems evince is using the same hackiness as GSView. | 14:10.35 |
chrisl | FFS, 8.70 is seven years old! No one should be using it | 14:10.37 |
Heiko | chrisl: thanks for your help! | 14:10.45 |
chrisl | Heiko: NP - you did the important part! | 14:11.07 |
kens | If you try to 'put' into a read-only dictionary then you get an invalidaccess | 14:11.09 |
| .forceput forces the operation to complete without error | 14:11.25 |
| It breaks the language hideously, which is why it's only available during initialisation | 14:11.41 |
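The 'put' half of this can be reproduced with ordinary PostScript. A minimal sketch, assuming a `gs` binary is available for the final, commented-out step; the helper name and file name are made up. It makes a dictionary read-only, attempts a `put`, and catches the resulting error with `stopped`, so the error name it reports should be the invalidaccess mentioned above. `.forceput` is deliberately not shown, since it is internal and only available during initialisation.

```shell
# Hypothetical helper: write a small PostScript program that tries to
# 'put' into a read-only dictionary and reports the error name.
write_put_demo() {
    cat > "$1" <<'EOF'
%!PS
/d 1 dict def        % create a dictionary
d readonly pop       % make it read-only
{ d /key 42 put } stopped
{ (put failed: ) print $error /errorname get = } if
EOF
}

write_put_demo put_demo.ps
# To run it (assumes Ghostscript is installed):
# gs -q -dNODISPLAY -dBATCH put_demo.ps
```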
deekej | ok, thank you guys for the clarification | 14:13.09 |
kens | People should really update to something a little less ancient though | 14:13.32 |
deekej | I would like to switch to newer ghostscript as well, but I can't force that on our customers :-/ | 14:13.37 |
| I'm not the one making the decision here | 14:13.52 |
chrisl | Stop back porting fixes. They'll soon change their minds | 14:14.08 |
Robin_Watts | chrisl: too right. | 14:14.24 |
deekej | :D :D well, I guess if I would stop doing that, I would soon lose my job :D | 14:15.10 |
chrisl | deekej: there's no sane argument for continuing to use an old, unsupported version of a package | 14:15.53 |
mulover | Writing to /dev/null makes no difference. In total maybe a bit faster. | 14:15.54 |
deekej | but I'm working on getting the latest ghostscript into at least RHEL8, whenever that comes out | 14:16.03 |
mulover | I compiled from http://mupdf.com/downloads/mupdf-1.10a-source.tar.gz | 14:16.10 |
deekej | chrisl: well, if it would be just for me, I would vote for rebase in RHEL-6 and RHEL-7. But I'm not the one making the decision here :-/ | 14:17.37 |
chrisl | deekej: I realise that, and I'm not ranting at you, just ranting at the idiocy that makes people think pulling random, untested changes into an old release is somehow "safer" than using a newer, properly tested release | 14:18.58 |
mulover | it makes me crazy that even the page timings are consistent. On a fast run each page is consistently fast, on a slow run consistently slow, resulting in consistent totals, slow or fast. It feels like some random choice: fast / slow ;-) | 14:22.18 |
Robin_Watts | mulover: I'd guess it's either an input or an output thing. | 14:23.12 |
deekej | chrisl: yeah, I see your point. Usually, the argument of our management and our customers is that they "need" stability, something that does not change so often. But in the case of ghostscript, it does not make so much sense, because it's not a critical running user-space application. However, I know some people use it in their scripts for production. And AFAIK, there are even still people running on RHEL-5. I hope to get | 14:25.04 |
| a stronger "voice" in this matter in the future, so we could at least do rebases until it is decided that RHEL needs to start to stabilize itself. | 14:25.04 |
mulover | So i need to try to "strace" or something like that and talk to the sysadmin? | 14:25.55 |
chrisl | mulover: So, one thing you could check is the filesystems in use - if one is using ext2 and one using ext4, or something, that could account for the difference | 14:26.18 |
Robin_Watts | mulover: That's why I was suggesting using /dev/null. | 14:27.03 |
chrisl | Or even worse, over NFS/CIFS etc...... | 14:29.29 |
Heiko | mulover: you can also try it on a tmpfs mount point, this should also be fast | 14:30.58 |
mulover | fs: ext4, definitely no network mount or something like that. Writing to /dev/null makes no difference, just a bit faster in total for both cases. | 14:31.57 |
chrisl | deekej: the thing is, we have a fairly extensive regression testing system, which tests every commit we make. But that system has never seen the combination of commits you currently have on top of the 8.70 release. Worse, it is not always obvious when a change/fix relies on a (possibly much) earlier one | 14:32.16 |
| mulover: that rather points the finger at glibc | 14:32.51 |
deekej | chrisl: yes, I see what you mean. You even have your own instance of coverity running to check the code, AFAIK. | 14:34.30 |
mulover | OK. thank you, I will have to talk to the sysadmin. | 14:35.45 |
deekej | chrisl: I guess I will try to use all these points to get an exception for ghostscript, so it could be rebased to the latest version whenever possible | 14:35.47 |
chrisl | deekej: As I said, I wasn't really aiming all that at you, it's just it does come up shockingly often. And I don't think the argument is limited to Ghostscript. I get that "stable" is good for a while, but I sure wouldn't want to be using a 7+ year old kernel on a live server | 14:38.01 |
| Am I the only one who has a really strong dislike of the SOURCE_DATE_EPOCH thing? | 14:39.00 |
Robin_Watts | no. | 14:39.21 |
chrisl | I'm stunned it seems to have garnered the traction that it's got :-( | 14:40.06 |
Robin_Watts | actually, I should backtrack a bit. My dislike is not as strong as yours I suspect. | 14:43.09 |
jogux | chrisl: I don't particularly like it, but I support the notion of verifiably reproducible builds. | 14:43.52 |
deekej | chrisl: anyway, it seems we have found another bug related to the .locksafe (in version 9.20), I will submit it later today for you to see the details | 14:43.58 |
chrisl | I dislike the fact it's a defined way to embed false information in a build.... | 14:44.22 |
ray_laptop | deekej: thanks, as one of the designers of the whole locking junk, I'm curious what you've found | 14:58.09 |
| and raph isn't at Artifex anymore | 14:59.27 |
deekej | ray_laptop: actually, it might not be connected to .locksafe (we just saw an error mentioning it before) | 15:02.57 |
| ray_laptop: here's the BZ https://bugs.ghostscript.com/show_bug.cgi?id=697529 | 15:03.07 |
kens | I think my answer there is going to be 'don't use pdf2dsc' | 15:04.07 |
| If you want DSC output use ps2write | 15:04.14 |
deekej | kens: can you elaborate on why is that, please? | 15:05.02 |
kens | Because pdf2dsc isn't a supported product, whereas ps2write is | 15:05.21 |
deekej | ah, ok | 15:05.27 |
kens | pdf2dsc is a hacky PostScript program | 15:05.35 |
| The contents of the 'lib' folder aren't truly part of Ghostscript, they are potentially useful extras | 15:06.33 |
deekej | in that case, is there a reason why it is still being kept around? should I remove it from Fedora 25 distribution (and make a note in man page)? | 15:06.36 |
kens | Like I said, they are potentially useful extras | 15:06.54 |
| IMO if you want PostScript from a PDF file, use the driver that produces PostScript | 15:07.46 |
| I'll take a quick look, but I suspect it's simply that you are trying to access a file outside the limited scope of the paths when you have set SAFER | 15:09.17 |
| I imagine if you turn off safer it will work | 15:09.25 |
chrisl | deekej: the SAFER restrictions are being more consistently, and more rigorously applied, now | 15:10.43 |
kens | Well the content of the 'dsc' file includes: | 15:11.47 |
| (input.pdf) (r) file | 15:12.01 |
| I imagine if you try to run that with SAFER then it will give you an invalidfileaccess error. | 15:12.27 |
| Looks to me like 'works as intended' | 15:12.33 |
chrisl | Yeh, I don't think that will have changed in 9.20, though | 15:12.38 |
kens | The output file 'out.dsc' works fine for me | 15:13.12 |
deekej | ok, so this in the end is NOTABUG, and user has to convert it some other way (like the ps2write you mentioned) | 15:13.41 |
kens | Or don't use -dSAFER | 15:13.53 |
| On the .dwsc file | 15:13.58 |
| Sorry, .dsc file | 15:14.04 |
| Which I guess would mean changing evince. | 15:14.15 |
chrisl | It *might* work if you use full, absolute paths | 15:14.21 |
deekej | well, I think we should use -dSAFER consistently, as you do :) | 15:14.24 |
kens | It's kind of more than NOTABUG, it's WASABUGNOWFIXED | 15:14.31 |
| It looks like there was a problem where -dSAFER wasn't being applied, and now it is. | 15:14.59 |
| The application of SAFER causes the file to be unavailable | 15:15.11 |
| If you instead use ps2write to turn it into a real, genuine PostScript program, then it will work fine. | 15:15.36 |
| Note that the '.dsc' file uses Ghostscript specific operators, and so will not work on any other PostScript consumer | 15:15.54 |
chrisl | Oh, ick..... it expects to be run with -dDELAYSAFER | 15:16.38 |
kens | Err, no, it just sets .setsafe if it is run with DELAYSAFER | 15:17.17 |
chrisl | Yes, but that means it *explicitly* won't work with -dSAFER | 15:17.45 |
kens | Right, so not going to work :-) | 15:17.56 |
chrisl | And indeed, it fails as I would expect | 15:18.14 |
deekej | ok | 15:19.14 |
| I'm trying to find syntax on how to convert it with ps2write, but google is not returning anything useful so far. Do you have some link that I could check? | 15:20.09 |
kens | gs -sDEVICE=psw2write -sOutputFile=out.ps input.pdf | 15:20.58 |
| err ps2write there, not psw2write | 15:21.17 |
deekej | ok | 15:21.28 |
chrisl | I'm rather baffled why you'd want to do that at all - evince reads PDFs as PDFs, why convert a PDF to fake/hacky Postscript? | 15:21.52 |
kens | Presumably just to prove that it's a broken PS file | 15:22.32 |
| Which it isn't, exactly. It's a PostScript program which only works with Ghostscript, and only if you don't set -dSAFER :-) | 15:22.56 |
deekej | honestly, I don't know. I'm just trying to deal with another bug report :) the OP did not provide any valid usecase for that AFAIK | 15:22.58 |
chrisl | deekej: if there is a way to influence the parameters that evince passes to Ghostscript, then there is a solution | 15:24.03 |
deekej | chrisl: what do you mean exactly? I'm still trying to wrap my head around it. Running: | 15:27.56 |
| gs -dSAFER -sDEVICE=ps2write -sOutputFile=output.dsc input.pdf | 15:28.09 |
| will produce a file which is readable by evince without problem | 15:28.20 |
kens | Yes, but that's not the same as the output from pdf2dsc | 15:28.33 |
deekej | so why isn't this approach used in 'pdf2dsc' in the first place? | 15:28.39 |
| ah, ok | 15:28.43 |
kens | pdf2dsc is not the same as the pdf2ps script | 15:28.58 |
chrisl | What ps2write outputs is "real" Postscript. What pdf2dsc outputs is a pseudo Postscript file which directs Ghostscript to read the original PDF | 15:29.20 |
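The distinction can be checked mechanically, at least roughly. The sketch below looks for a PostScript `file` operator call that opens a named file for reading, like the `(input.pdf) (r) file` line quoted earlier in the log. The function name and the grep pattern are illustrative assumptions; real pdf2dsc output may vary between versions, and the pattern is not a PostScript parser.

```shell
# Hypothetical check: does a PostScript file open another file for reading
# via "(name) (r) file"? The pattern is modelled on the single line quoted
# in the log, not on a full PostScript parse.
opens_external_file() {
    grep -q '([^()]*) *(r) *file' "$1"
}

# Usage:
# if opens_external_file out.dsc; then
#     echo "out.dsc reads another file at run time"
# fi
```

A file that matches depends on the original PDF still being present and readable when it is run, which is the access the log describes SAFER as restricting.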
deekej | ok, now I get it :) | 15:29.42 |
kens | And decorates it with DSC (Document Structure Convention) comments | 15:29.43 |
deekej | thank you very much for your clarification and time :) | 15:29.51 |
chrisl | Hence my confusion about why *anyone* would want to use it! | 15:30.05 |
deekej | yeah, I see your point | 15:30.29 |
kens | I admit I can't see the utility of pdf2dsc. Especially now that ps2write produces DSC PostScript. I wonder if we should deprecate and remove it | 15:30.33 |
| It appears the last update was nearly 10 years ago | 15:31.17 |
chrisl | deekej: So, the problem with the pdf2dsc output combined with -dSAFER is that the pdf2dsc output tries to open the original PDF. By that stage, it's trying to open (as far as gs is concerned) some arbitrary file - which is exactly what SAFER is meant to protect against | 15:31.36 |
| kens: I wonder if it is used by gv | 15:32.10 |
kens | I do see that it was a gv-related bug report | 15:32.21 |
| Does anyone still use gv ? :-D | 15:32.37 |
Robin_Watts | Would it be possible to replace pdf2dsc with a call to gs -sDEVICE=ps2write -o out.ps ? | 15:32.59 |
kens | Robin_Watts : we could alter the script | 15:33.11 |
chrisl | The problem with that is that it doesn't do the same thing | 15:33.18 |
kens | And ditch the pdf2dsc.ps file | 15:33.18 |
| Yeah was going to say that | 15:33.23 |
| Though I question how valuable the current program is | 15:33.41 |
Robin_Watts | chrisl: "doesn't produce identical results", true. "does what the script name suggests it does" though? | 15:33.59 |
chrisl | Well, I'm not sure there is such a thing as a "dsc".... | 15:34.37 |
kens | Hmm it looks like this might behave a little differently with multi-page PDF files | 15:34.59 |
chrisl | There was a gv commit in 2014 - wow! | 15:35.30 |
kens | Ah, I see, it does call DoPDFPage for each page in the PDF file | 15:35.35 |
| TBH I'm doubtful about the utility of this script/program | 15:35.55 |
| chrisl I suspect this is so that dumb applications like gv can treat a PDF file like a DSC-compliant PostScript file, and navigate through it. | 15:37.37 |
chrisl | It does *seem* that gv uses pdf2dsc.ps | 15:37.49 |
kens | Well it wraps each page in comments appropriate for the page media, and draws the requisite page from the PDF file. So in effect it makes the PDF file look like a DSC-compliant PostScript file | 15:38.34 |
| Now we *could* replace that with ps2write, which would have the same effect, but.... It's not the same thing. Because ps2write will render at a specific resolution for transparency, whereas what's here will render the PDF page at the 'current' resolution. | 15:39.43 |
| If GV does some kind of 'zooming in' then the difference might be visible. | 15:40.02 |
chrisl | Basically, gv doesn't understand PDF, but it does understand DSC compliant Postscript. Hence it uses pdf2dsc.ps | 15:40.16 |
kens | Indded. | 15:40.26 |
| Or even indeed | 15:40.31 |
chrisl | So, how about we remove the script, and leave the Postscript file | 15:40.39 |
kens | We can do that, certainly | 15:40.49 |
| I'd be inclined to have the program spit out a warning that the output of the program is only suitable for use with GV or something. | 15:41.12 |
chrisl | That won't affect gv (and others who really understand what they're doing), but would put off casual users | 15:41.25 |
Robin_Watts | ok, if I limit the amount of memory available to gpcl on windows, appropriately, I can get it to die with memory corruption too. | 15:41.25 |
kens | But I guess that might make gv unhappy ;-) | 15:41.26 |
chrisl | kens: if you make it a warning, gv users would never see it | 15:42.04 |
kens | Doesn't it come out on stdout somewhere? | 15:42.40 |
| Anyway, I think this basically boils down to 'works as expected' | 15:43.11 |
| If the original bug reporter wants to discuss it further, they can come here :-) | 15:43.38 |
chrisl | kens: gv sets -dQUIET when it does the conversion. And besides, gv is a gui app, so probably usually runs without a direct connection to a visible tty | 15:44.13 |
kens | OK then having it spit something on stdout would be OK (except that the program recommends running with -q....) | 15:44.46 |
chrisl | Well, just have it write to stdout..... | 15:45.49 |