| <<<Back 1 day (to 2020/10/05) | Fwd 1 day (to 2020/10/07)>>> | 20201006 |
Wizzup | Hi! I'm trying to find out if mupdf can write PDF/A files, and google is failing me, so I figured I'd ask in here. :-) | 10:53.12 |
ator | Wizzup: we can't convert random files to PDF/A compliant files, if that's what you're looking for. | 10:59.59 |
Wizzup | Yeah, I think the answer is "yes, mupdf can write PDF/A files if you use it right". I wasn't looking to convert random documents. | 11:00.33 |
| My use case is turning scans of books into PDFs, inserting OCR results, storing the JPEG2000 (input) images in the PDFs (I checked and it looks like mupdf can write jpx), tagging the PDF somewhat for accessibility purposes, and saving as PDF/A-2. | 11:01.30 |
ator | We write files in a way that is compatible with PDF/A, so if the input is compliant we won't mess it up if care is taken. | 11:01.41 |
Wizzup | It looks like this should be possible (as opposed to ghostscript, which doesn't seem to do JPEG2000) | 11:01.43 |
| Great. | 11:01.48 |
ator | Small things like which line endings are used, stuff like that. | 11:02.05 |
| But we make no checks or guarantees that the accessibility, colorspace, and compression options are valid for PDF/A. | 11:02.46 |
Wizzup | Check, but it seems like being in control of the input materials and the code itself, it should be possible to create PDF/A files. | 11:04.16 |
| (for those interested) I'm looking at replacing a big old stack that uses abbyy ocr and foxit/luratech pdf tools and switching it all over to tesseract and (preferably) mupdf. Will have to look at MRC compression, but other than that, it looks like it's quite feasible. | 11:06.58 |
ator | Yes. | 11:11.16 |
malc_ | Wizzup: why are you switching? | 11:28.22 |
Wizzup | Moving to open source software is a big part of it. Getting away from annoying licensing deals, too. Tesseract has come a long way as far as OCR goes, too. It's a mix of quality, annoying licensing, and control (which open source provides). | 11:32.33 |
malc_ | Wizzup: foxit's source is sorta open (chinese scumm), abbyy is closed (semi-russki scumm).. in any case it looks like you want to sever all ties to communism... okay | 11:41.33 |
Wizzup | I wouldn't phrase it like that, but it did give me a smile. :-) | 11:43.33 |
sebras | ator: what's the release plan? | 11:49.46 |
ator | sebras: see if pete's problem is something we can easily fix or if it's holding up the release, otherwise I think we're good with what's on origin/master now | 11:50.22 |
sebras | ator: since they're branched out, does it matter? | 11:50.47 |
ator | let's move to #artifex | 11:51.39 |
sebras | ok | 11:51.46 |
ator | pete's not in this channel :) | 11:52.26 |
sebras | ah. | 11:56.07 |
malc_ | https://idioms.thefreedictionary.com/for+Pete%27s+sake | 11:58.53 |
| Wizzup: isn't your use-case Déjà vu's raison d'être, if you'll pardon my french (finally i can use the expression literally) | 12:06.37 |
Wizzup | (This might go off-topic real soon, so might want to move the discussion?) PDF is more convenient for viewing, I believe. And apparently there are quite a few users who like the PDFs, also for accessibility reasons. | 12:15.12 |
ator | DjVu was designed for this, but the format died on the vine due to draconian licensing and patent issues that meant no ecosystem (proprietary or open source) of software to support it. | 12:16.45 |
malc_ | Bottom line - i was right | 12:19.54 |