MuPDF IRC logs

	<<<Back 1 day (to 2020/05/26)	Fwd 1 day (to 2020/05/28)>>>	20200527
pedr0	hi all, I came across the 'Postscript Tutorial and Cookbook' where there is a section where it is explained how to connect to a Postscript interpreter of a printer device, in that case it is an apple laserwriter. Does anything of that sort still exist in the modern market ? Are there around printers where you can connect to their PS interpreter ?		07:20.01
kens	pedr0: rarely, but yes there are probably still some		07:32.20
	Is there a reason you want to do that ?		07:32.35
pedr0	not really, it just makes me interested, curious		07:40.04
	would you know a model's name, or where I can get some more information ?		07:40.25
kens	It may well only exist as a debugging aid these days, it's not something most people would want to do.		07:40.30
	I don't know of any printers that actually allow it I'm afraid, sorry.		07:40.46
pedr0	well, it's not very practical indeed ;-)		07:40.53
kens	If you wanted to play with PostScript you'd be better off using Ghostscript		07:41.03
pedr0	but in those days, you connect to the PS inter. and you run commands, but would you be able to produce a page that way ?		07:41.57
kens	Technically, yes		07:42.09
pedr0	fantastic		07:42.14
kens	If you're good enough to write program code on the fly, without making any errors		07:42.24
pedr0	impossible		07:42.32
	:-)		07:42.36
kens	Merely difficult, especially in PostScript		07:42.48
pedr0	not for me for sure		07:42.51
kens	PostScript is widely regarded as a write-only language		07:43.05
	and its stack-based mechanism puzzles most people		07:43.24
	Though it does mean that if you are using it interactively you can at least look at the state		07:43.45
pedr0	but, it shouldn't I reckon. Isn't a JVM working that way too ?		07:43.51
	Java Virtual Machine		07:43.58
kens	Yes, probably, though I'm not a Java programmer		07:44.12
pedr0	but is PS only used in conjunction with printing devices ? Or software rendering on a screen ?		07:45.22
kens	These days, only for printing. There was a system called NeWS which used PostScript for screen drawing.		07:45.52
	Display PostScript		07:46.03
	It had some extensions for multiple canvases and so on		07:46.14
pedr0	it's a niche but interesting area I reckon - technologically 'intense', dense. I've recently started to work with PDFs and I've found them very different from what I thought, and much more difficult, hence the curiosity which led me to .. PostScript		07:47.35
kens	Well PostScript is a programming language, unlike PDF, and although they started with the same imaging model (more or less), they are different now, mostly because of transparency		07:48.16
pedr0	NeWS, were you referring to this : https://en.wikipedia.org/wiki/NeWS		07:49.51
kens	Yep, that's it		07:50.02
	Ghostscript still has some legacy Display PostScript support, but I doubt it actually still works.		07:51.17
pedr0	If I had more time, and I were less overworked than I am I'd like to experiment with PS a bit, as a way to understand the whole thing better. I think it would help my understanding of the PDF as well, but you know ... you need a goal and I can't find one on PS yet..		08:09.29
kens	Not really surprising :-)		08:09.50
pedr0	:-(		08:10.04
	if the name of device springs to mind, let me know :-)		08:21.51
	*of a		08:21.59
kens	To be honest, I don't get to play with printers these days, and I suspect that's the only way to find out if one supports it		08:22.26
	You need (obviously) a bi-directional communication port, so network or serial interfaces are probably the only way		08:22.57
	USB that is, for serial		08:23.10
	It looks like HP printers with PostScript support allow you to telnet to port 9100 and enter the executive from there. I've not tried this, my HP printer is downstairs and not currently turned on		08:26.55
	NB you have to start the interactive session by sending executive to the printer		08:27.39
	Obviously you have to type that without seeing any feedback.... I recall typing it very carefully when talking to printers in the distant past		08:28.12
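A minimal sketch of what kens describes, assuming the printer accepts raw connections on port 9100 and answers the `executive` command. Nothing here has been tried against a real device; the helper names `executive_request` and `open_executive` are made up for illustration, and the host is whatever address your printer has.

```python
import socket

def executive_request() -> bytes:
    # The command is typed blind: the printer echoes nothing until the
    # interactive executive has actually started.
    return b"executive\n"

def open_executive(host: str, port: int = 9100, timeout: float = 5.0) -> bytes:
    """Send `executive` and return the first bytes the printer sends back, if any."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(executive_request())
        try:
            return conn.recv(4096)
        except socket.timeout:
            # No banner or prompt arrived within the timeout.
            return b""
```

Calling `open_executive` with the printer's address would return its banner or prompt if the executive starts, and an empty bytes object if nothing comes back in time.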
pedr0	but I can be sure that an HP printer comes with PostScript support ?		08:31.21
	thanks a lot by the way		08:31.28
kens	Well you'd have to look up the model number, I don't think all HP printers come with PostScript support, I know mine does because I was careful to make sure it did (it's useful for me for testing).		08:32.04
	.msg chanserv info #mupdf		09:17.59
myopia	does mupdf support djvu files?		14:06.29
kens	myopia: no, not supported		14:08.29
myopia	is there some way I can lobby you for including support though? iirc, djvu had long been a libre format before pdf		14:09.36
kens	I very much doubt there is any interest here for supporting it, though I am not the final authority. Being an open format isn't really a criterion for us		14:10.18
sebras	myopia: there is a patent on (parts of?) djvu, but djvulibre has been granted a patent license. you can read more about the details here: http://djvu.sourceforge.net/licensing.html		14:11.12
	myopia: sumatrapdf on windows can decode djvu (and uses mupdf to render PDFs), while okular and evince both are able to decode djvu.		14:12.53
myopia	I guess even a prestigious lineage, even with exceptions to GPL cannot dispel industrial inertia and commercial use licensing... thanks for pointing me to sumatra though		14:15.30
kens	We only have a limited number of developers, we can't do everything		14:17.11
myopia	(I think it's more of *tex projects failing to move from pdf dooming the whole thing in the first place)		14:18.15
ator	myopia: I've looked into it. the patent license for djvulibre only applies if you use that actual library.		14:58.09
	which is GPL and thus cannot be used by mupdf and/or ghostscript's commercial customers		14:58.39
	myopia: djvu is a fancy bitmap image format, unlike PDF it can't do proper vector graphics for text and line art, etc.		14:59.41
myopia	(yeah, I said "commercial use licensing..." which is sad. djvu does ebooks right though, esp. those scanned ones)		15:00.06
ator	myopia: it's a good high quality format for scanned text with a few images, shame about the really hostile patent policy of lizardtech (the former djvu owners)		15:01.12
	they tried to commercialize the format, and in doing so doomed it to obscurity		15:01.30
	we do occasionally see the DjVu compression ideas used in PDF files, but these files are always slow		15:02.43
	they draw each page with two JPEG2000 encoded color images, with a JBIG2 bitmap used to mask/select between the two color images		15:03.33
	one is a background image, with the text blanked out. the other is the text with the background blacked out.		15:04.27
	the background image is drawn first, then the foreground image using the bitmap as a mask		15:04.43
	thus the background and foreground images can be compressed to both ignore the text outlines and be very small		15:05.05
	but ... since these PDF files use JPEG2000 for the color images, they're horribly slow (JPEG2000 is a TERRIBLE image format)		15:05.23
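The layering ator describes can be sketched as a per-pixel selection. This is a toy model, assuming all three planes have the same resolution (in real files the two color images are typically lower resolution than the bitmap) and using plain lists in place of decoded JPEG2000 and JBIG2 data; `composite` is a hypothetical helper, not a MuPDF function.

```python
def composite(background, foreground, mask):
    """Take the foreground pixel where the mask bit is set, else the background pixel."""
    return [
        [fg if bit else bg for bg, fg, bit in zip(bg_row, fg_row, mask_row)]
        for bg_row, fg_row, mask_row in zip(background, foreground, mask)
    ]

paper = (250, 245, 230)   # background plane: page color with the text blanked out
ink = (20, 20, 20)        # foreground plane: text color with the background blanked out

background = [[paper] * 4 for _ in range(2)]
foreground = [[ink] * 4 for _ in range(2)]
mask = [[0, 1, 1, 0],     # 1 marks a pixel that belongs to a glyph
        [0, 1, 0, 0]]

page = composite(background, foreground, mask)
```

Because the sharp glyph edges live only in the mask, both color planes stay smooth, which is what lets them compress so well.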
sebras	ator: they could choose to use normal JPEG. I wonder why they opt for j2k.		15:06.15
ator	sebras: my guess -- because DjVu uses a precursor to the wavelet compression that made it into JPEG2000		15:08.40
	IW44		15:08.43
	likewise, the JB2 compression for the DjVu bitmap is a precursor to JBIG2		15:08.59
sebras	ator: so you think they chose it to come as close to DjVu as possible?		15:10.13
ator	I suspect the software that makes these may be the same that was used to compress DjVu files, with the same 'ignore these pixel values' in the wavelet compressor algorithms		15:10.17
	but targeting JPEG2000 + JBIG2 in a PDF rather than IW44+JB2 in a DjVu		15:10.30
	or just a misguided attempt to make the files smaller by using JPEG2000 (which, in fairness, does result in smaller files than jpeg)		15:11.10
sebras	perhaps they are valuing compression more than rendering speed.		15:11.14
kens	I think they are, IIRC djvu's claim to fame was its small size		15:11.42
	back when that mattered		15:11.46
sebras	to some (like archive.org) it probably still matters, but not to casual users.		15:12.35
ator	sebras: the difference between JPEG2000 and JPEG is not big enough to matter, IMO		15:13.16
kens	I'm not certain the compression is that much better than just a regular compressed file any more, but I could be wrong, I certainly haven't tested it		15:13.21
ator	splitting the scanned image into planes separated by high frequency details like text and using an imagemask to select between them, is the big win		15:13.47
sebras	ator: yes, that I buy.		15:14.05
ator	because then you have two low-res high compression lossy color images, and one bitmap with compression that's tailored for text scans		15:14.15
kens	That was always the argument in favour, yes. I'm just not certain that it matters that much these days.		15:14.51
ator	and you still have all the coffee stains and marginalia and color illustrations preserved		15:15.02
	without getting the terrible ringing artefacts from compressing a scanned text page as JPEG		15:15.23
sebras	ator: I'm surprised to see in mujs/utf.c that one is not allowed to encode a small value in multiple continuation bytes.		15:20.39
	didn't know that that was illegal.		15:20.45
ator	There shall be only one! (way to represent a code point)		15:21.21
	though there is one exception I added (modified UTF-8), for the 0 byte		15:21.39
sebras	except NUL.		15:21.41
	yes, I saw it.		15:21.47
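A small illustration of the "only one way to represent a code point" rule, using Python's strict UTF-8 decoder as a stand-in (this is not the mujs code). `naive_two_byte_payload` and `is_rejected` are hypothetical helpers. Note that Python's decoder does not have the modified UTF-8 exception for the 0 byte that ator mentions, so it rejects that form as well.

```python
def naive_two_byte_payload(data: bytes) -> int:
    """Extract the payload bits of a two-byte sequence without any validity checks."""
    return ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)

def is_rejected(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder refuses the byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False

shortest_slash = b"\x2f"      # U+002F in its one-byte form
overlong_slash = b"\xc0\xaf"  # the same value padded out to two bytes
modified_nul = b"\xc0\x80"    # two-byte NUL, the form used by modified UTF-8
```

The overlong form carries the same payload as the one-byte form (`naive_two_byte_payload(overlong_slash)` is 0x2F), which is exactly why a validating decoder has to refuse it.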
	ator: this comment in runetochar() is a bit questionable too:		15:28.47
	* four character sequence		15:28.52
	* 010000-1FFFFF => T4 Tx Tx Tx		15:28.52
	the range only goes up to Runemax, not to 1FFFFF.		15:29.07
	https://en.wikipedia.org/wiki/Tags_(Unicode_block) "new shiny! oops, we made a mistake, they're deprecated! double oops, we made a mistake mistake, they're undeprecated but should be used differently!"		15:41.10
*sebras*	just discovered IVD and variation selectors. unicode is anything but simple.		15:45.38
	there besides the comment above and the number of set bits in the comment to Rune4 I'm now good with "		15:49.44
	Support 4-byte UTF-8 sequences.".		15:49.50
myopia	I would suggest unicode MES-1/2 subset if you are trying to use unicode internally		15:50.07
ator	ah, Runemax is 10FFFF not 1FFFFF. my bad.		15:50.41
myopia	note that utf8 in its 4-byte form produces 3 + 6 + 6 + 6 = 21 bits of output which can go as high as 1FFFFF, leaving 15 additional planes unrepresentable in utf-16		15:52.14
	which is why utf-8 parsing can be very hard to get right. the difficulty propagates to anything that uses utf-8 in place of utf-16		15:53.07
sebras	myopia: yes, but mujs only allows characters up to 10FFFF, so anything above that will be encoded as U+FFFD		15:53.08
	at least that's how I read the changes I just reviewed. :)		15:53.27
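A sketch of the behaviour sebras describes, written from the chat rather than from the mujs source: values above 10FFFF are replaced with U+FFFD before encoding. `RUNE_MAX`, `RUNE_ERROR` and `rune_to_utf8` are hypothetical names standing in for Runemax and runetochar(), and surrogate handling is deliberately left out.

```python
RUNE_MAX = 0x10FFFF    # highest code point Unicode defines
RUNE_ERROR = 0xFFFD    # replacement character

FOUR_BYTE_CAPACITY = (1 << (3 + 6 + 6 + 6)) - 1   # 21 payload bits

def rune_to_utf8(rune: int) -> bytes:
    """Encode one code point; anything outside 0..RUNE_MAX becomes U+FFFD."""
    if rune < 0 or rune > RUNE_MAX:
        rune = RUNE_ERROR
    if rune < 0x80:
        return bytes([rune])
    if rune < 0x800:
        return bytes([0xC0 | (rune >> 6), 0x80 | (rune & 0x3F)])
    if rune < 0x10000:
        return bytes([0xE0 | (rune >> 12),
                      0x80 | ((rune >> 6) & 0x3F),
                      0x80 | (rune & 0x3F)])
    return bytes([0xF0 | (rune >> 18),
                  0x80 | ((rune >> 12) & 0x3F),
                  0x80 | ((rune >> 6) & 0x3F),
                  0x80 | (rune & 0x3F)])
```

The four-byte bit layout could hold values up to `FOUR_BYTE_CAPACITY` (1FFFFF), but the encoder never emits anything past `RUNE_MAX`.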
myopia	I would suggest SUB (was it U+001A) in place of U+FFFD if you work with utf-8 very often (U+FFFD is fine if you are working with utf-16 though)		15:54.15
	then for every invalid byte you produce a SUB instead of three more bytes		15:54.38
	keeping the byte-to-byte mapping. SUB is usually rendered as a box anyway.		15:55.20
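myopia's point about byte counts can be checked with a custom decode error handler in Python. This is only an illustration of the size difference; the handler name `sub-per-byte` is invented here, and nothing in it reflects how mujs handles invalid input.

```python
import codecs

def _sub_one_byte(err):
    """Replace exactly one offending byte with SUB (U+001A) and resume after it."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    return ("\x1a", err.start + 1)

codecs.register_error("sub-per-byte", _sub_one_byte)

damaged = b"ok\xff\xfe!"   # two bytes that can never appear in UTF-8

with_fffd = damaged.decode("utf-8", errors="replace")
with_sub = damaged.decode("utf-8", errors="sub-per-byte")

fffd_size = len(with_fffd.encode("utf-8"))   # each U+FFFD re-encodes as 3 bytes
sub_size = len(with_sub.encode("utf-8"))     # each SUB re-encodes as 1 byte
```

With SUB the repaired text has the same length as the damaged input, whereas U+FFFD grows it by two bytes per invalid byte.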
sebras	myopia: one would hope that codepoints that cannot be encoded in UTF-8 are relatively rare for mujs.		15:59.28
myopia	(iirc javascript uses UCS-2 charset)		16:02.49
	(meaning anything beyond BMP chars is illegal)		16:03.03
	citing http://es5.github.io/x2.html#x2 it's "either UCS-2 or UTF-16"		16:05.26
sebras	http://git.ghostscript.com/?p=mujs.git;a=commitdiff;h=832e0690493eaa6b9875e477c79ea3200c2c4310		16:25.21
	ator: I'm confused as to whether ArrayBuffer_slice() adheres to the spec.		17:44.24
pedr0	hi guys - I am reading the PDF guide you've written and I've found "The PDF Output device is still a work in progress, as its handling of fonts is incomplete". I've started using the document writer interface and I've found a lot of messages where it goes "Cannot create ToUnicode mapping for ....". Is there anything I can do to work around the problem ?		18:20.26
	incomplete. Nonetheless for certain classes of files it can be useful.		18:20.27
sebras	ator: maybe I'm silly but in the TypedArray commit, why is there no js_tofloatarray and js_todoublearray?		19:12.59
	ator: I can't fault the code as it is written, but there are a few corners where I'm not fully clear on what is happening.		19:14.44
	ator: I'd have a hard time seeing problems with the stack indices, getglobal, js_call, etc.		19:23.30
	it all looks probable though.		19:23.37
Zsolt	hello		22:01.18
mubot	Welcome to #mupdf, the channel for MuPDF. If you have a question, please ask it, don't ask to ask it. Do be prepared to wait for a reply as devs will check the logs and reply when they come on line.		22:01.18
Zsolt	what kind of lossless image compression is available for pdf files, beside the jpeg2000 lossless?		22:02.00
	I know for black and white there is CCITT G4 and JBIG2 (lossless)		22:02.31
	but for color I can select between zip and jpeg2000		22:03.03
	I don't know whether the zip compression is a pdf specific zip?		22:03.35
	or the usual zip? as for zip archive files . . .		22:03.45
sebras	Zsolt: zip based on the DEFLATE compression algorithm.		22:07.00
	Zsolt: which is what PDF uses (i.e. without the zip archive trailers)		22:07.31
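To make sebras's distinction concrete: PDF's FlateDecode filter stores a bare zlib/DEFLATE stream, not a zip archive. A minimal sketch with Python's `zlib` module, using made-up pixel data:

```python
import zlib

# 64 pure-red RGB pixels, standing in for uncompressed image samples.
pixels = bytes([255, 0, 0] * 64)

stream = zlib.compress(pixels)        # what a FlateDecode stream body holds
restored = zlib.decompress(stream)    # lossless: every byte comes back

starts_like_zip = stream.startswith(b"PK")   # zip archives begin with "PK"
```

The compressed bytes begin with a zlib header rather than the `PK` signature of a zip file, and there is no archive directory at the end, so an unzip tool will not open them even though the underlying algorithm is the same.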
Zsolt	sebras, is there any other lossless compression method for color images beside zip and jpeg2000?		22:30.27
sebras	Zsolt: jpeg2000 is normally not lossless, but run length encoding is.		22:33.49
	Zsolt: LZW falls into the lossless category too.		22:34.46
Zsolt	so LZW and zip compression is different?		23:04.04
	is there a rule related to the resolution of an image scan? like 300ppi at least for image scans		23:13.27

Log of #mupdf at irc.freenode.net.