| | 20180804 |
sebras | hm... I have a PDF with a cmap where in the beginbfchar block I see "<30> <00> <00SD> <003D>" | 11:33.29 |
| mupdf interprets this as src=<30> dst=<00> and discards this because dst does not contain 16 bits. | 11:33.58 |
| next it attempts to parse <00SD>, only to discard "S" because it is not a hex character. | 11:34.20 |
| the "D" after that is discarded because it is not a full hex byte | 11:34.47 |
| so in the end src=<00> dst=<003D> | 11:35.12 |
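A minimal Python sketch of the behaviour described above, for illustration only. It is not MuPDF's code: the names `lenient_hex_bytes` and `parse_bfchar_lenient` are hypothetical, and the "dst must be at least 16 bits" check is an assumption based on the description in this log.

```python
import re

HEX_DIGITS = set("0123456789abcdefABCDEF")


def lenient_hex_bytes(token: str) -> bytes:
    """Decode the text between '<' and '>' as described in the chat."""
    # Non-hex characters (the "S" in "00SD") are silently dropped.
    digits = [ch for ch in token if ch in HEX_DIGITS]
    # A trailing half byte (the "D" in "00SD") is dropped as well.
    if len(digits) % 2:
        digits.pop()
    return bytes.fromhex("".join(digits))


def parse_bfchar_lenient(block: str):
    """Pair up hex strings as (src, dst), discarding pairs with a short dst."""
    tokens = [lenient_hex_bytes(t) for t in re.findall(r"<([^>]*)>", block)]
    mappings = []
    for src, dst in zip(tokens[0::2], tokens[1::2]):
        if len(dst) < 2:  # assumption: dst needs at least 16 bits
            continue
        mappings.append((src, dst))
    return mappings


print(parse_bfchar_lenient("<30> <00> <00SD> <003D>"))
```

With the sample block, the first pair is dropped and only the mapping from `<00>` to `<003D>` survives, which matches the result described above.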
| I guess we _could_ parse the initial "<30> <00>" as if it were "<30> <0000>", i.e. prepending a "00" to the dstcode. and the "<00SD> <003D>" could be parsed as "<00D> <003D>" (dropping only the non-hex "S"), which would then be treated as "<0D> <003D>" because the codespace range defined in begincodespacerange is <00> <FF> | 11:37.17 |
| i.e. my idea is that we pad hex strings that are too short with leading "0"s until they are long enough, and truncate hex strings that are too long at the MSB end. | 11:38.11 |
| of course all of this ought to come with suitable warnings because the file does not appear to adhere to the specification. | 11:38.31 |
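The proposed padding and truncation could be sketched roughly as follows. This is a Python illustration under stated assumptions, not an existing MuPDF function: `normalize_hex` is a hypothetical name, and the caller is assumed to know the expected width (1 byte for source codes given the <00> <FF> codespace range, 2 bytes for destination codes).

```python
import warnings

HEX_DIGITS = set("0123456789abcdefABCDEF")


def normalize_hex(token: str, width_bytes: int) -> bytes:
    """Force a hex string to width_bytes, warning about every repair."""
    digits = "".join(ch for ch in token if ch in HEX_DIGITS)
    if digits != token:
        warnings.warn(f"non-hex characters dropped from <{token}>")
    want = 2 * width_bytes  # number of hex digits expected
    if len(digits) < want:
        warnings.warn(f"<{token}> too short, padding with leading zeros")
        digits = digits.rjust(want, "0")
    elif len(digits) > want:
        warnings.warn(f"<{token}> too long, truncating at the MSB end")
        digits = digits[-want:]  # keep the least significant digits
    return bytes.fromhex(digits)


# The two pairs from the sample bfchar block: (src, dst).
for src, dst in [("30", "00"), ("00SD", "003D")]:
    print(normalize_hex(src, 1).hex(), normalize_hex(dst, 2).hex())
```

Under these assumptions the sample block yields the mappings 30 to 0000 and 0d to 003d, each repair accompanied by a warning.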
| doing this would affect _every_ hex string in a PDF though, so I'm not sure if this is a good idea. opinions? | 11:45.32 |