Ghostscript IRC logs

	<<<Back 1 day (to 2022/01/30)	Fwd 1 day (to 2022/02/01) >>>	20220131
artifexirc-bot	<ShakespeareFan00> Hi.		10:38.21
	<ShakespeareFan00> Are there commands in Ghostscript that can be used to find out the DPI of individual pages in a PDF without needing to render the whole page?		10:39.06
	<ShakespeareFan00> I'll have a look at the manual though		10:39.14
	<KenSharp> PDF doesn't have a dpi		10:39.17
	<ShakespeareFan00> Erm..		10:39.46
	<KenSharp> PDF is a vector format (though it includes the possibility of bitmap image data); you can rescale the vectors arbitrarily. The DPI of the bitmap data depends on the number of samples in the data in each direction, and the area of the media that the image covers.		10:40.58
	<ShakespeareFan00> https://phabricator.wikimedia.org/T224355 was why I was asking		10:41.00
	<ShakespeareFan00> In the specific PDF concerned, I think the PDF has raster scans (originally compressed from JP2/TIFF)		10:41.46
	<KenSharp> Right, so the image on the page has an 'effective' dpi, which is given by the number of image samples in each direction, divided by the size of the media it covers in each direction		10:41.47
	<KenSharp> If the quality of the image is decreasing when you run it through tools then the most likely reason is that it is using lossy compression.		10:42.38
	<ShakespeareFan00> https://bugs.ghostscript.com/show_bug.cgi?id=702531#c1		10:42.58
	<ShakespeareFan00> I am suspecting that it's an issue on the Wikimedia side of things.		10:43.22
	<KenSharp> And that is rendering the PDF to a specific resolution. Obviously the resolution is then given by the output image, not the PDF file.		10:43.43
	<ShakespeareFan00> But was asking if there was a command in Ghostscript and related tools that would enable a 'better' approach to deciding on a more sensible rendering value.		10:44.07
	<ShakespeareFan00> ( The Ghostscript ticket also mentioned MuPDF)		10:44.23
	<KenSharp> Well without spending time looking into the problem, then I cannot answer the question, especially not without some idea what is 'bad' about the current rendering.		10:44.46
	<KenSharp> Peter says the output is fine when rendered at 150 dpi (which is low resolution)		10:44.59
	<KenSharp> @ShakespeareFan00 I'm really not at all clear on what you want. You asked a question which doesn't have a meaning (dpi of PDF pages)		10:46.16
	<KenSharp> Now you've moved on to quoting a closed bug report.		10:46.33
	<KenSharp> The answer in the bug is that there is nothing wrong with Ghostscript's rendering.		10:46.50
	<ShakespeareFan00> Let's step back a bit		10:46.55
	<KenSharp> Without knowing what you are doing and what your problem is there is nothing we can say to help		10:47.03
	<ShakespeareFan00> Short explanation...		10:47.19
	<ShakespeareFan00> Images from PDF don't render at high quality..		10:47.31
	<KenSharp> Well there's your first problem.		10:47.42
	<KenSharp> What do you mean by 'high quality' ?		10:47.48
	<ShakespeareFan00> At a suitable resolution.		10:48.05
	<KenSharp> quality and resolution are not the same thing		10:48.27
	<ShakespeareFan00> The PDFs are old documents, which Wikisource (a Wikimedia Project) was wanting to transcribe.		10:49.18
	<ShakespeareFan00> If the output rendering from the PDF is of very low quality (I think the term in one of the tickets I linked is 'undersampled'), then some characters are not possible to read reliably.		10:50.10
	<KenSharp> Well looking at the original PDF file in the Ghostscript bug report you pointed at, the original scan is of very low quality. Are you seriously expecting Ghostscript to improve it ?		10:50.49
	<ShakespeareFan00> No.		10:51.08
	<ShakespeareFan00> But it was found that upping the nominal DPI in the rendering made the image more readable		10:51.53
	<ShakespeareFan00> The PDF itself isn't low quality, because I can get a good image in other readers.		10:52.13
	<KenSharp> The PDF I'm looking at there, the one attached to the Ghostscript bug report #70253 is of low quality (IMO)		10:52.42
	<ShakespeareFan00> I don't think Ghostscript is wrong here.		10:52.52
	<KenSharp> For example the capital 'B' in BOOKS has a gap in the lower portion, and in 'Boyd's Directory' the top line is missing in the B		10:53.12
	<KenSharp> Peter suggested using anti-aliasing to 'blur' the text which would tend to reduce that because adjacent pixels would tend to fill the gaps		10:53.58
	<KenSharp> Now it is possible that you are rendering to an image format (using GS) at a lower 'effective' resolution than the original image, which will reduce the quality		10:55.03
	<ShakespeareFan00> I don't think Ghostscript's wrong here, (as the closure of the linked ticket confirms)		10:55.33
	<ShakespeareFan00> However, I was asking if there was a Ghostscript supported way of asking about nominal page/image dimensions, so that other tools can make more appropriate rendering choices.		10:55.58
	<ShakespeareFan00> The alternative to getting better renderings from the PDF, is to ask someone on the Wikisource project to regenerate it, directly from TIFF/JP2.		10:56.47
	<ShakespeareFan00> For single volumes that's fine, not for 674 volumes...		10:57.04
	<KenSharp> The way to discover the effective resolution of this kind of image is to determine the width and height of the image, then divide that by the number of inches in each direction. That gives you the effective resolution of the image. If you render at that same resolution then the pixels in the image will map 1:1 to pixels in the output (actually they won't, exactly, because of rounding but more or less). So basically you wo		10:57.09
	<ShakespeareFan00> Yes.		10:57.22
	<ShakespeareFan00> Does PDF store a recomended/nominal size of an embedded image?		10:57.53
	<KenSharp> No		10:58.00
	<ShakespeareFan00> Bother:(		10:58.06
	<KenSharp> Images are drawn using the Current Transformation Matrix and the Matrix of the image.		10:58.16
	<KenSharp> However.....		10:58.21
	<KenSharp> For simple cases like this where the image covers the entire page, you only need the dimensions of the media		10:58.39
	<KenSharp> Which is given in the Page dictionary		10:58.46
	<KenSharp> One moment and I'll tell you what that example file has		10:58.58
	<ShakespeareFan00> Some educated guesses can also be made about possible original media sizes		10:59.34
	<ShakespeareFan00> based on what the content of the document is, but I'd of course not want to do that unless there isn't another method.		11:00.06
	<KenSharp> So a6.pdf has a MediaBox of 0 0 474 646 which is 6.5833 x 8.972 inches		11:00.20
	<KenSharp> The image (which is a JPX, and therefore is lossy compressed) is 3292x4490 samples		11:01.07
	<ShakespeareFan00> Okay.. so to get a more suitable 'DPI/rendering' value, the MediaBox should be read from the Page dictionary? (And then used in conjunction with information about the JPX?)		11:02.01
	<ShakespeareFan00> Okay.		11:02.05
	<KenSharp> So that's 500x500 effective dpi		11:02.14
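The calculation Ken describes can be written out as a short Python sketch. It assumes, as in a6.pdf, that a single image covers the whole page, that the page is not rotated (a /Rotate of 90 or 270 would swap the axes), and that /UserUnit has its default value, so one PDF unit is 1/72 inch. The function name `effective_dpi` is hypothetical.

```python
# Effective resolution of a full-page scanned image.
# Assumptions: the image covers the whole MediaBox, the page is not rotated,
# and /UserUnit is the default 1, so 1 PDF unit = 1/72 inch.
PDF_UNITS_PER_INCH = 72.0


def effective_dpi(mediabox, samples_w, samples_h):
    """Return (x_dpi, y_dpi) for an image of samples_w x samples_h samples
    stretched over the MediaBox [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = mediabox
    width_in = abs(x1 - x0) / PDF_UNITS_PER_INCH   # 474 units -> ~6.58 in
    height_in = abs(y1 - y0) / PDF_UNITS_PER_INCH  # 646 units -> ~8.97 in
    return samples_w / width_in, samples_h / height_in


# MediaBox and image size reported for a6.pdf in this log.
x_dpi, y_dpi = effective_dpi([0, 0, 474, 646], 3292, 4490)
print(f"{x_dpi:.1f} x {y_dpi:.1f} dpi")  # 500.1 x 500.4 dpi
```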
	<ShakespeareFan00> I think one of the tickets at Wikimedia Phabricator had arrived at a similar conclusion		11:02.32
	<KenSharp> So clearly if you render at 150 dpi then you are reducing the resolution by a factor of about 3		11:02.38
	<KenSharp> If you render at 600 dpi then you should get a decent result		11:02.53
	<KenSharp> This only works easily for full-page images		11:03.36
	<ShakespeareFan00> So the tool that invokes Ghostscript needs to use a 'higher' resolution.		11:03.39
	<ShakespeareFan00> Thanks 🙂		11:03.43
	<ShakespeareFan00> As you say this of course will ONLY work for raster scan based full pages.		11:03.57
	<KenSharp> You can get the information from the file by using a variety of tools. MuPDF will report on the content of PDF files		11:03.58
	<ShakespeareFan00> Thanks 🙂 That answers my question 🙂		11:04.16
	<KenSharp> So you could use that to find out how each file was made (the effective resolution) and then use that to determine a reasonable rendering resolution		11:04.37
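A minimal sketch of turning the effective resolution into a rendering resolution and a Ghostscript command line. The rounding policy (pick the smallest value from a hypothetical list of common resolutions that is not below the effective dpi) is an assumption, chosen because it reproduces the 600 dpi suggestion above for a roughly 500 dpi scan; the helper names are hypothetical too. The Ghostscript switches used (`-dSAFER`, `-dBATCH`, `-dNOPAUSE`, `-sDEVICE=png16m`, `-r`, `-sOutputFile`) are standard, but on Windows the executable is usually `gswin64c` rather than `gs`.

```python
import math

# Hypothetical list of "common" rendering resolutions to round up to.
COMMON_DPIS = (72, 96, 150, 200, 300, 400, 600, 1200)


def choose_render_dpi(x_dpi, y_dpi, choices=COMMON_DPIS):
    """Smallest listed resolution that is not below the effective resolution."""
    needed = math.ceil(max(x_dpi, y_dpi))
    for dpi in choices:
        if dpi >= needed:
            return dpi
    return needed  # above every listed value: use the effective dpi itself


def gs_render_args(pdf_path, dpi, out_pattern="page-%03d.png"):
    """Ghostscript argv for rendering every page to 24-bit RGB PNG files."""
    return [
        "gs",  # usually "gswin64c" on Windows
        "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m",
        f"-r{dpi}",
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


dpi = choose_render_dpi(500.05, 500.43)  # effective dpi of a6.pdf
print(dpi)  # 600
print(" ".join(gs_render_args("a6.pdf", dpi)))
# To actually render: subprocess.run(gs_render_args("a6.pdf", dpi), check=True)
```

Rounding up rather than down keeps every image sample represented in the output; rendering at 150 dpi instead, as noted in the discussion, cuts the resolution of this scan by a factor of about 3.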
	<ShakespeareFan00> On something else..		11:04.53
	<KenSharp> I think mutool info would give you the information, but I am not entirely certain		11:05.05
	<ShakespeareFan00> Do you know of a Djvu->PDF conversion tool?		11:05.17
	<KenSharp> Umm, no.		11:05.28
	<KenSharp> The DjVu people have a PDF->DjVu device for Ghostscript		11:05.51
	<KenSharp> You could always ask them		11:05.58
	<ShakespeareFan00> I'll consider asking...		11:06.07
	<ShakespeareFan00> Generally, what I want to do is the DjVu->PDF direction, not PDF->DjVu		11:06.31
	<KenSharp> Here's the output from mutool for the a6.pdf file:		11:06.58
	<KenSharp> D:\temp>\mupdf\mupdf\platform\win32\debug\mutool info a6.pdf		11:06.59
	<KenSharp> a6.pdf:		11:07.00
	<KenSharp>		11:07.02
	<KenSharp> PDF-1.5		11:07.03
	<KenSharp>		11:07.04
	<KenSharp> Pages: 1		11:07.06
	<KenSharp>		11:07.07
	<KenSharp> Retrieving info from pages 1-1...		11:07.08
	<KenSharp> Mediaboxes (1):		11:07.09
	<KenSharp> 1 (3 0 R): [ 0 0 474 646 ]		11:07.11
	<KenSharp>		11:07.12
	<KenSharp> Images (1):		11:07.13
	<KenSharp> 1 (3 0 R): [ JPX ] 3292x4490 8bpc DevRGB (4 0 R)		11:07.15
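The `mutool info` text above can be scraped to automate the same calculation across many files. This Python sketch assumes the exact layout shown in this log (section headers such as `Mediaboxes (1):`, then one entry per line, possibly indented); other MuPDF versions may format it differently. It also assumes the leading number on each entry is the page number. `parse_mutool_info` is a hypothetical helper, not part of MuPDF.

```python
import re

# Assumes the `mutool info` layout shown in this log; other MuPDF versions may differ.
HEADER_RE = re.compile(r"^(\w+) \((\d+)\):")
MEDIABOX_RE = re.compile(
    r"^\s*(\d+)\s+\(\d+ \d+ R\):\s*\[\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*\]")
IMAGE_RE = re.compile(
    r"^\s*(\d+)\s+\(\d+ \d+ R\):\s*\[\s*([^\]]+?)\s*\]\s+(\d+)x(\d+)")


def parse_mutool_info(text):
    """Return ({page: [x0, y0, x1, y1]}, {page: [image dicts]}).

    The leading number on each entry is assumed to be the page number."""
    section = None
    mediaboxes, images = {}, {}
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            section = header.group(1)  # e.g. "Mediaboxes", "Images", "Fonts"
            continue
        if section == "Mediaboxes":
            m = MEDIABOX_RE.match(line)
            if m:
                mediaboxes[int(m.group(1))] = [float(v) for v in m.group(2, 3, 4, 5)]
        elif section == "Images":
            m = IMAGE_RE.match(line)
            if m:
                images.setdefault(int(m.group(1)), []).append({
                    "filter": m.group(2),
                    "width": int(m.group(3)),
                    "height": int(m.group(4)),
                })
    return mediaboxes, images


SAMPLE = """Mediaboxes (1):
1 (3 0 R): [ 0 0 474 646 ]

Images (1):
1 (3 0 R): [ JPX ] 3292x4490 8bpc DevRGB (4 0 R)
"""

boxes, imgs = parse_mutool_info(SAMPLE)
for page, (x0, y0, x1, y1) in boxes.items():
    for img in imgs.get(page, []):
        # Only meaningful when the image covers the whole page (1 unit = 1/72 inch).
        x_dpi = img["width"] * 72 / abs(x1 - x0)
        y_dpi = img["height"] * 72 / abs(y1 - y0)
        print(f"page {page}: {img['filter']} image, about {x_dpi:.0f} x {y_dpi:.0f} dpi")
```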
	<ShakespeareFan00> Yes... I see what you mean 🙂		11:07.31
	<ShakespeareFan00> Does mutool produce only plaintext, or can it be asked to generate a structured report in something like JSON format?		11:08.16
	<KenSharp> You're getting beyond my limited knowledge there 🙂		11:08.23
	<ShakespeareFan00> My apologies.		11:08.33
	<KenSharp> MuPDF has a Java interface but I know nothing about it.		11:08.39
	<ShakespeareFan00> I'll go RTFM.		11:08.39
	<KenSharp> You would get a more useful answer on the #mupdf channel 😄		11:08.59
	<KenSharp> My colleagues over there would have a much better idea than me		11:09.40
	<RayJohnston> actually JavaScript, not Java		16:51.18
	<RayJohnston> the mutool run command executes JavaScript and allows JS to open documents, examine contents, etc.		16:52.41
	<RayJohnston> it's not too hard to use editing tools to change the mutool info output into JSON format: awk, python, emacs, ...		16:54.51
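Following that suggestion, a generic Python conversion of `mutool info` text into JSON might look like the sketch below. It assumes the layout of the a6.pdf output earlier in this log: `Key: value` lines become top-level fields, the `PDF-x.y` line becomes a version field, and each `Section (N):` header collects the entries that follow it, with the rest of each entry kept as an unparsed `details` string. The helper name and the JSON shape are hypothetical, not a format defined by MuPDF.

```python
import json
import re

# Assumed layout (as in the a6.pdf output in this log):
#   "Pages: 1"                      -> top-level field
#   "Images (1):"                   -> section header
#   "1 (3 0 R): [ JPX ] 3292x4490"  -> entry in the current section
HEADER_RE = re.compile(r"^(\w+) \((\d+)\):\s*$")
ENTRY_RE = re.compile(r"^\s*(\d+)\s+\((\d+ \d+ R)\):\s*(.*)$")
FIELD_RE = re.compile(r"^(\w+):\s*(.+)$")


def mutool_info_to_dict(text):
    """Convert `mutool info` plain text into a JSON-friendly dict (hypothetical shape)."""
    result = {"sections": {}}
    section = None  # list collecting entries for the current section header
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("PDF-"):
            result["version"] = stripped
        elif (m := HEADER_RE.match(line)):
            section = result["sections"].setdefault(m.group(1), [])
        elif (m := ENTRY_RE.match(line)):
            if section is not None:
                section.append({
                    "page": int(m.group(1)),
                    "ref": m.group(2),
                    "details": m.group(3).strip(),
                })
        elif (m := FIELD_RE.match(line)):
            result[m.group(1)] = m.group(2).strip()
            section = None
        # Anything else (file name line, progress messages) is ignored.
    return result


SAMPLE = """a6.pdf:

PDF-1.5

Pages: 1

Retrieving info from pages 1-1...
Mediaboxes (1):
1 (3 0 R): [ 0 0 474 646 ]

Images (1):
1 (3 0 R): [ JPX ] 3292x4490 8bpc DevRGB (4 0 R)
"""

print(json.dumps(mutool_info_to_dict(SAMPLE), indent=2))
```

Keeping `details` unparsed makes the converter tolerant of sections it does not know about; a section-specific parser can then refine only the fields it needs.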
	<ShakespeareFan00> PERL... Not that anyone still uses that..		16:57.47
	<RayJohnston> Talk to @Robin_Watts 🙂		17:23.26
	<RayJohnston> @ShakespeareFan00 I see that you moved your discussion over to #mupdf. That's a better tool for you, probably.		17:24.44

Log of #ghostscript at irc.freenode.net.