Properly handle strings with PDF Doc Encoding (fixes qpdf#179)
The QPDF_String::getUTF8Val() method was not treating strings that weren't explicitly Unicode as PDF Doc Encoded. This only affects characters in the range 0x80 through 0xa0.
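The behavior described above can be sketched as follows. This is an illustrative sketch only, not the code from this commit: the names `kPdfDocHigh`, `append_utf8`, and `pdf_doc_to_utf8` are hypothetical, and the sketch assumes that every byte outside 0x80 through 0xa0 can be treated as the Unicode code point of the same value.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Unicode code points for PDF Doc Encoding bytes 0x80..0xa0, in order.
// 0x9f is undefined, so it maps to U+FFFD (replacement character).
static const std::array<std::uint32_t, 33> kPdfDocHigh = {{
    0x2022, 0x2020, 0x2021, 0x2026, 0x2014, 0x2013, 0x0192, 0x2044,
    0x2039, 0x203a, 0x2212, 0x2030, 0x201e, 0x201c, 0x201d, 0x2018,
    0x2019, 0x201a, 0x2122, 0xfb01, 0xfb02, 0x0141, 0x0152, 0x0160,
    0x0178, 0x017d, 0x0131, 0x0142, 0x0153, 0x0161, 0x017e, 0xfffd,
    0x20ac
}};

// Append one code point (Basic Multilingual Plane only) as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xc0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    }
    else
    {
        out += static_cast<char>(0xe0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    }
}

// Hypothetical helper: convert a PDF Doc Encoded byte string to UTF-8.
// Assumption: bytes outside 0x80..0xa0 keep their numeric value as the
// code point; only the 0x80..0xa0 range goes through the lookup table.
std::string pdf_doc_to_utf8(std::string const& in)
{
    std::string out;
    for (unsigned char ch : in)
    {
        std::uint32_t cp = ch;
        if (ch >= 0x80 && ch <= 0xa0)
        {
            cp = kPdfDocHigh[ch - 0x80];
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, the single byte 0x9e would come out as the two-byte UTF-8 sequence for U+017E (ž), which is the kind of character reported in qpdf#179.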
1 parent 2780a18, commit 4bb3046
Showing 11 changed files with 201 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,8 @@ | ||
2018-02-17 Jay Berkenbilt <[email protected]> | ||
|
||
* Fix QPDFObjectHandle::getUTF8Val() to properly handle strings | ||
that are encoded with PDF Doc Encoding. Fixes #179. | ||
|
||
* Add qpdf_check_pdf to the "C" API. This method just attempts to | ||
read the entire file and produce no output, making it possible to | ||
assess whether the file has any errors that qpdf can detect. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
ž | ||
žč | ||
žđ | ||
žć | ||
žš | ||
ž ajklyghvbnmxcseqwuioprtzdf | ||
š | ||
šč | ||
šđ | ||
šć | ||
šž | ||
š ajklyghvbnmxcseqwuioprtzdf |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
� 128 0x80 0200 U+2022 BULLET | ||
� 129 0x81 0201 U+2020 DAGGER | ||
� 130 0x82 0202 U+2021 DOUBLE DAGGER | ||
� 131 0x83 0203 U+2026 HORIZONTAL ELLIPSIS | ||
� 132 0x84 0204 U+2014 EM DASH | ||
� 133 0x85 0205 U+2013 EN DASH | ||
� 134 0x86 0206 U+0192 SMALL LETTER F WITH HOOK | ||
� 135 0x87 0207 U+2044 FRACTION SLASH (solidus) | ||
� 136 0x88 0210 U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK | ||
� 137 0x89 0211 U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK | ||
� 138 0x8a 0212 U+2212 MINUS SIGN | ||
� 139 0x8b 0213 U+2030 PER MILLE SIGN | ||
� 140 0x8c 0214 U+201E DOUBLE LOW-9 QUOTATION MARK (quotedblbase) | ||
� 141 0x8d 0215 U+201C LEFT DOUBLE QUOTATION MARK (double quote left) | ||
� 142 0x8e 0216 U+201D RIGHT DOUBLE QUOTATION MARK (quotedblright) | ||
� 143 0x8f 0217 U+2018 LEFT SINGLE QUOTATION MARK (quoteleft) | ||
� 144 0x90 0220 U+2019 RIGHT SINGLE QUOTATION MARK (quoteright) | ||
� 145 0x91 0221 U+201A SINGLE LOW-9 QUOTATION MARK (quotesinglbase) | ||
� 146 0x92 0222 U+2122 TRADE MARK SIGN | ||
� 147 0x93 0223 U+FB01 LATIN SMALL LIGATURE FI | ||
� 148 0x94 0224 U+FB02 LATIN SMALL LIGATURE FL | ||
� 149 0x95 0225 U+0141 LATIN CAPITAL LETTER L WITH STROKE | ||
� 150 0x96 0226 U+0152 LATIN CAPITAL LIGATURE OE | ||
� 151 0x97 0227 U+0160 LATIN CAPITAL LETTER S WITH CARON | ||
� 152 0x98 0230 U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS | ||
� 153 0x99 0231 U+017D LATIN CAPITAL LETTER Z WITH CARON | ||
� 154 0x9a 0232 U+0131 LATIN SMALL LETTER DOTLESS I | ||
� 155 0x9b 0233 U+0142 LATIN SMALL LETTER L WITH STROKE | ||
� 156 0x9c 0234 U+0153 LATIN SMALL LIGATURE OE | ||
� 157 0x9d 0235 U+0161 LATIN SMALL LETTER S WITH CARON | ||
� 158 0x9e 0236 U+017E LATIN SMALL LETTER Z WITH CARON | ||
� 159 0x9f 0237 U+FFFD UNDEFINED | ||
� 160 0xa0 0240 U+20AC EURO SIGN |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
• 128 0x80 0200 U+2022 BULLET | ||
† 129 0x81 0201 U+2020 DAGGER | ||
‡ 130 0x82 0202 U+2021 DOUBLE DAGGER | ||
… 131 0x83 0203 U+2026 HORIZONTAL ELLIPSIS | ||
— 132 0x84 0204 U+2014 EM DASH | ||
– 133 0x85 0205 U+2013 EN DASH | ||
ƒ 134 0x86 0206 U+0192 SMALL LETTER F WITH HOOK | ||
⁄ 135 0x87 0207 U+2044 FRACTION SLASH (solidus) | ||
‹ 136 0x88 0210 U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK | ||
› 137 0x89 0211 U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK | ||
− 138 0x8a 0212 U+2212 MINUS SIGN | ||
‰ 139 0x8b 0213 U+2030 PER MILLE SIGN | ||
„ 140 0x8c 0214 U+201E DOUBLE LOW-9 QUOTATION MARK (quotedblbase) | ||
“ 141 0x8d 0215 U+201C LEFT DOUBLE QUOTATION MARK (double quote left) | ||
” 142 0x8e 0216 U+201D RIGHT DOUBLE QUOTATION MARK (quotedblright) | ||
‘ 143 0x8f 0217 U+2018 LEFT SINGLE QUOTATION MARK (quoteleft) | ||
’ 144 0x90 0220 U+2019 RIGHT SINGLE QUOTATION MARK (quoteright) | ||
‚ 145 0x91 0221 U+201A SINGLE LOW-9 QUOTATION MARK (quotesinglbase) | ||
™ 146 0x92 0222 U+2122 TRADE MARK SIGN | ||
fi 147 0x93 0223 U+FB01 LATIN SMALL LIGATURE FI | ||
fl 148 0x94 0224 U+FB02 LATIN SMALL LIGATURE FL | ||
Ł 149 0x95 0225 U+0141 LATIN CAPITAL LETTER L WITH STROKE | ||
Œ 150 0x96 0226 U+0152 LATIN CAPITAL LIGATURE OE | ||
Š 151 0x97 0227 U+0160 LATIN CAPITAL LETTER S WITH CARON | ||
Ÿ 152 0x98 0230 U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS | ||
Ž 153 0x99 0231 U+017D LATIN CAPITAL LETTER Z WITH CARON | ||
ı 154 0x9a 0232 U+0131 LATIN SMALL LETTER DOTLESS I | ||
ł 155 0x9b 0233 U+0142 LATIN SMALL LETTER L WITH STROKE | ||
œ 156 0x9c 0234 U+0153 LATIN SMALL LIGATURE OE | ||
š 157 0x9d 0235 U+0161 LATIN SMALL LETTER S WITH CARON | ||
ž 158 0x9e 0236 U+017E LATIN SMALL LETTER Z WITH CARON | ||
� 159 0x9f 0237 U+FFFD UNDEFINED | ||
€ 160 0xa0 0240 U+20AC EURO SIGN |
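The three numeric columns in the two files above appear to be the same byte value written in decimal, hexadecimal, and octal. A minimal sketch of that relationship, using a hypothetical helper named `describe_byte` that is not part of this commit:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format a byte value the way the numeric columns
// of the test data appear to be laid out (decimal, hex, octal).
std::string describe_byte(unsigned int b)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%u 0x%02x %04o", b, b, b);
    return std::string(buf);
}
```

For example, `describe_byte(128)` yields `128 0x80 0200`, matching the numeric part of the BULLET row.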
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
#include <qpdf/QUtil.hh> | ||
#include <qpdf/QPDFObjectHandle.hh> | ||
#include <iostream> | ||
#include <stdlib.h> | ||
#include <string.h> | ||
|
||
static char const* whoami = 0; | ||
|
||
void usage() | ||
{ | ||
std::cerr << "Usage: " << whoami << " infile" << std::endl; | ||
exit(2); | ||
} | ||
|
||
int main(int argc, char* argv[]) | ||
{ | ||
if ((whoami = strrchr(argv[0], '/')) == NULL) | ||
{ | ||
whoami = argv[0]; | ||
} | ||
else | ||
{ | ||
++whoami; | ||
} | ||
// For libtool's sake.... | ||
if (strncmp(whoami, "lt-", 3) == 0) | ||
{ | ||
whoami += 3; | ||
} | ||
|
||
if (argc != 2) | ||
{ | ||
usage(); | ||
} | ||
char const* infilename = argv[1]; | ||
std::list<std::string> lines = | ||
QUtil::read_lines_from_file(infilename); | ||
for (std::list<std::string>::iterator iter = lines.begin(); | ||
iter != lines.end(); ++iter) | ||
{ | ||
QPDFObjectHandle str = QPDFObjectHandle::newString(*iter); | ||
std::cout << str.getUTF8Value() << std::endl; | ||
} | ||
return 0; | ||
} |
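The commit message distinguishes strings that are "explicitly Unicode" from those treated as PDF Doc Encoded. In PDF, a text string that begins with the bytes 0xFE 0xFF is UTF-16BE. A minimal sketch of that check follows; `TextStringKind` and `classify_text_string` are hypothetical names chosen for illustration and are not qpdf's API.

```cpp
#include <string>

// Hypothetical classification of a raw PDF text string.
enum class TextStringKind
{
    utf16_be, // starts with the UTF-16BE byte order mark 0xFE 0xFF
    pdf_doc   // anything else is treated as PDF Doc Encoding
};

// Decide how a raw string should be decoded before converting to UTF-8.
TextStringKind classify_text_string(std::string const& raw)
{
    if (raw.size() >= 2 &&
        static_cast<unsigned char>(raw[0]) == 0xfe &&
        static_cast<unsigned char>(raw[1]) == 0xff)
    {
        return TextStringKind::utf16_be;
    }
    return TextStringKind::pdf_doc;
}
```

Under this sketch, every line of the test input read by the driver above would be classified as `pdf_doc`, assuming none of them starts with a byte order mark.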