| <<<Back 1 day (to 2020/05/12) | Fwd 1 day (to 2020/05/14)>>> | 20200513 |
paulgardiner | sebras: thanks for the review and picking up the typo. The potential crash bug didn't impinge on the release. In any case, refactorings tend to introduce one or two of those. | 09:22.02 |
ator | pedr0: I can't find any examples that use "toIndirect". where did you see that? | 10:22.47 |
| pedr0: see docs/examples/pdf-portfolio.js for some code that reads a stream and saves it to disk | 10:23.40 |
pedr0 | It may be that I have an older release | 10:44.14 |
| I found that in pdf-merge.js | 10:44.32 |
| line 8 | 10:44.36 |
| I've managed to read a stream into a Buffer - now what I'd like to do is to use that as a string, so I can run some regexes on it. Forgive my utter ignorance in JS - how can I do that ? What's not really clear to me is where the line between JS types and MU types is - is a Buffer a MU-defined type ? | 10:46.27 |
| I am on 1.16.1 | 10:50.29 |
darkdragon-001 | What is the difference between the android versions -mini and -viewer? They both have the same size... | 11:06.43 |
| 2. Is it possible to implement select & copy functionality with your library? | 11:07.07 |
ator | darkdragon-001: 1. slightly different ui, mini doesn't have slick animations and swiping. mini has much simpler source code, it's designed as a jumping off point/tutorial for how to use the library. viewer is more refined, but much harder to understand code. | 11:16.16 |
| 2. yes. | 11:16.19 |
| in terms of actual functionality, they're pretty much equivalent, and which one you prefer is up to you. | 11:16.59 |
darkdragon-001 | thanks! | 11:24.16 |
pedr0 | is there any alternative other than using the Buffer as an array and looping through it to build a string ? | 11:40.09 |
ator | yes, that sounds like a much older release. we changed toIndirect into asIndirect to be compatible with the naming in the Java version of the same APIs | 11:49.34 |
| a long time ago. | 11:49.38 |
| pedr0: there is no function to turn the buffer into a JS string. | 11:52.00 |
| even if there was, what encoding would you expect? JS strings are unicode strings. a Buffer is just an array of bytes. | 11:52.24 |
pedr0 | okay, but when I use mutool clean I can get the stream to be 'readable' in a regular editor like vim | 11:52.52 |
| isn't there anyway to achieve the same effect in the JS world ? Is the buffer containing the stream's data as it is in the file or has it gone through some processing first ? | 11:53.41 |
ator | some streams. the stream for a JPEG image would be hard to read in a text editor. | 11:53.53 |
pedr0 | *any[space]way | 11:53.58 |
| I am talking about page's content streams | 11:54.16 |
| sorry, I should have clarified that | 11:54.28 |
| what I am after is the flow of PDF instructions that draw a page | 11:55.01 |
ator | page content streams can also contain random non-ascii bytes. like when drawing strings using 8 or 16-bit font encodings. | 11:55.02 |
| inline image data can also be binary | 11:55.11 |
| I have on my TODO (on a rainy day) list an item to expose the PDF operator processor API to JS and Java | 11:56.42 |
| which would possibly help you do what you're looking to do | 11:57.01 |
pedr0 | I am probably missing something, or more than something here. When I use mutool clean with the correct options I can see the flow of pdf instructions there. What has happened to the stream to be readable ? I thought it would be de-compressed but I never considered the encoding aspect | 11:57.13 |
ator | pedr0: well, it really depends on the pdf instructions | 11:57.35 |
| some PDF instructions can contain binary data | 11:57.42 |
| like the instruction to draw text can contain binary data in the string that is drawn | 11:57.54 |
pedr0 | I get that, but I am happy to see the operators only. | 11:58.10 |
ator | and the instruction to draw an image (BI, ID, EI set of instructions) embeds a binary data stream in the middle | 11:58.13 |
| so you want the content stream tokenized? | 11:58.31 |
pedr0 | what do you exactly mean by tokenized ? | 11:58.45 |
ator | turned into tokens | 11:58.59 |
pedr0 | okay but what's a token, you meant instruction by instruction ? | 11:59.29 |
ator | numbers, strings, and operators | 11:59.54 |
| and comments | 12:00.11 |
| etc. | 12:00.16 |
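To make "tokenized" concrete, here is a minimal sketch of a tokenizer for a content stream that has already been turned into a JS string (one character per byte). It is not MuPDF code: `tokenize` and its token `type` labels are hypothetical names chosen for illustration. It is written in ES5 style so it should also be usable from a `mutool run` script. It deliberately does not handle inline images (BI/ID/EI), whose binary data would need special treatment, and it classifies keywords such as `true` or `null` as operators.

```javascript
// Hypothetical sketch, not part of MuPDF: split a content stream into tokens.
// Assumes src holds one character per byte; BI/ID/EI inline images are NOT handled.
function tokenize(src) {
    var tokens = [];
    var i = 0, n = src.length;
    var WS = "\x00\t\n\f\r ";      // PDF whitespace characters
    var DELIM = "()<>[]{}/%";      // PDF delimiter characters
    function isRegular(c) { return WS.indexOf(c) < 0 && DELIM.indexOf(c) < 0; }
    while (i < n) {
        var c = src.charAt(i);
        var start = i;
        if (WS.indexOf(c) >= 0) { i++; continue; }
        if (c === "%") {
            // comment: runs to the end of the line
            while (i < n && src.charAt(i) !== "\n" && src.charAt(i) !== "\r") i++;
            tokens.push({ type: "comment", text: src.substring(start, i) });
        } else if (c === "(") {
            // literal string: parentheses may nest, backslash escapes one character
            var depth = 0;
            do {
                var d = src.charAt(i);
                if (d === "\\") i++;
                else if (d === "(") depth++;
                else if (d === ")") depth--;
                i++;
            } while (i < n && depth > 0);
            tokens.push({ type: "string", text: src.substring(start, i) });
        } else if (c === "<" && src.charAt(i + 1) !== "<") {
            // hex string: <48 69>
            while (i < n && src.charAt(i) !== ">") i++;
            i++;
            tokens.push({ type: "hexstring", text: src.substring(start, i) });
        } else if (c === "/") {
            i++;
            while (i < n && isRegular(src.charAt(i))) i++;
            tokens.push({ type: "name", text: src.substring(start, i) });
        } else if (DELIM.indexOf(c) >= 0) {
            // [ ] { } and the two-character << >> dictionary delimiters
            var two = (c === "<") || (c === ">" && src.charAt(i + 1) === ">");
            i += two ? 2 : 1;
            tokens.push({ type: "delimiter", text: src.substring(start, i) });
        } else {
            while (i < n && isRegular(src.charAt(i))) i++;
            var text = src.substring(start, i);
            var isNum = /^[+-]?(\d+\.?\d*|\.\d+)$/.test(text);
            tokens.push({ type: isNum ? "number" : "operator", text: text });
        }
    }
    return tokens;
}
```

To see "the operators only", filter the result on `type === "operator"`. Strings and hex strings come back as single opaque tokens, so any binary bytes inside them stay out of the way of regexes run on the operators.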
pedr0 | I'd like to have the same stuff that I get from mutool clean, really. | 12:00.44 |
ator | what do you want to do with bytes that are not ASCII bytes? | 12:01.25 |
| that's my original question. | 12:01.30 |
pedr0 | leave them as they are | 12:02.02 |
ator | that does not compute. a JS string is an array of unicode characters. | 12:02.38 |
| a content stream in PDF is an array of bytes. | 12:02.52 |
pedr0 | unicode stored in memory un-encoded (32 bits) per-char ? Oh, right | 12:05.55 |
| but I still don't get it and sorry if I ask again | 12:06.12 |
ator | JS is 16-bits per character | 12:07.06 |
pedr0 | but when I run: mutool show pdf pages/1/Contents and I get the stream printed out in the terminal- is that the stream as it is simply un-compressed | 12:07.21 |
| ? | 12:07.38 |
ator | if you just want to turn the bytes into characters and use latin-1 encoding, then you'll have trouble trying to save that out (because writing it will encode as UTF-8) | 12:07.38 |
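A minimal sketch of the Latin-1 approach mentioned here, which is the byte-by-byte loop pedr0 asked about: each byte becomes the character with the same code value. `bytesToLatin1` and `latin1ToBytes` are hypothetical helper names, not MuPDF functions. The code assumes only that the input has a `length` and can be indexed like an array, as pedr0 describes for the Buffer. It is ES5, so it should also run under `mutool run`.

```javascript
// Hypothetical helper, not part of MuPDF: decode bytes as Latin-1,
// i.e. map each byte to the UTF-16 code unit with the same value (0..255).
function bytesToLatin1(bytes) {
    var chunks = [];
    var CHUNK = 4096; // build the string in pieces rather than one long chain
    for (var start = 0; start < bytes.length; start += CHUNK) {
        var end = Math.min(start + CHUNK, bytes.length);
        var s = "";
        for (var i = start; i < end; i++)
            s += String.fromCharCode(bytes[i] & 0xFF);
        chunks.push(s);
    }
    return chunks.join("");
}

// Inverse: recover the original byte values from such a string.
// Writing the bytes one at a time avoids the UTF-8 re-encoding problem.
function latin1ToBytes(str) {
    var out = [];
    for (var i = 0; i < str.length; i++)
        out.push(str.charCodeAt(i) & 0xFF);
    return out;
}
```

The resulting string is safe to run regexes on, but as noted above it must not be written out as text, since bytes above 0x7F would be re-encoded as UTF-8. Converting back with `latin1ToBytes` and writing the bytes individually sidesteps that.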
| mutool show pages/1/Contents converts non-printable and non-ASCII characters into "." and wraps the lines at 80 columns | 12:08.51 |
| mutool show -b pages/1/Contents will print the raw bytes | 12:09.03 |
| which your terminal may or may not make sense of. use at your own risk. | 12:09.36 |
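If a lossy, text-safe view is enough (for example to look at operators only), the substitution described for plain `mutool show` can be approximated in a script. This is a sketch under assumptions, not a copy of what mutool does: `bytesToPrintable` is a hypothetical name, the choice to keep tab, LF and CR is a guess, and the 80-column wrapping is not reproduced.

```javascript
// Hypothetical helper, not part of MuPDF: replace every byte that is not
// printable ASCII with "." so the result is plain text.
// Assumption: tab, LF and CR are kept so that line structure survives.
function bytesToPrintable(bytes) {
    var parts = [];
    for (var i = 0; i < bytes.length; i++) {
        var b = bytes[i] & 0xFF;
        var keep = (b >= 0x20 && b <= 0x7E) || b === 0x09 || b === 0x0A || b === 0x0D;
        parts.push(keep ? String.fromCharCode(b) : ".");
    }
    return parts.join("");
}
```

Unlike a Latin-1 decode, this cannot be reversed, so it suits inspection and matching but not editing a stream and writing it back.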
pedr0 | stty sane is your friend in that case | 12:10.06 |
| :-) | 12:10.11 |
| so the stream in a buffer is what it is in the file - uncompressed if I am using readRawStream() | 12:13.36 |
| *compressed | 12:14.00 |
| thanks for your help, it seems I can't really do what I wanted to do then | 12:14.20 |
| sometimes I can hack a stream by using a combination of mutool clean + sed and I was hoping to consolidate the whole thing into a JS script | 12:17.45 |
ator | yes, compressed (but decrypted) if using readRawStream | 12:18.33 |
| readStream is decompressed as well | 12:18.45 |