
Talk:Ascii85


RFC 1924 is an April Fools' joke. Either the section about RFC 1924 should be removed, or it should be clearly marked that this is not a serious RFC. -- AP 27.12.2013. — Preceding unsigned comment added by 213.157.86.178 (talk) 13:27, 27 December 2013 (UTC)[reply]


I think Base85 would be a more natural name for this article.

The German Wikipedia has a more complete article titled Base85 at http://de.wikipedia.org/wiki/Base85 which should be reconciled with this one. (But I can't read German to any significant degree, so I'm not a good candidate for doing so.)

A version of Base85 is used in the Git system.

A version of Base85 for encoding IPv6 addresses is defined in RFC 1924 (published April 1, 1996).

64.81.70.227 02:13, 28 November 2006 (UTC)[reply]

"A version of Base85 is used in the Git system." Really? Where? I don't believe that's true. 71.41.210.146 (talk) 22:59, 7 January 2008 (UTC)[reply]
Git binary diffs use base 85. Mercurial git-style diffs too. — Preceding unsigned comment added by 193.52.208.118 (talk) 17:06, 12 October 2011 (UTC)[reply]
Git's base85.c is at https://raw.github.com/git/git/987460611ab08ebac83ead9a5ee316d3b3823e7f/base85.c and Mercurial's base85.c is at http://selenic.com/hg/raw-file/1d07bf106c2a/mercurial/base85.c
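For reference, a minimal sketch of the 4-bytes-to-5-characters core that these implementations share (the digit alphabet shown is the RFC 1924 ordering used by Git's base85.c; the function name and framing are illustrative, not Git's actual API):

 # Sketch of the base85 core used in git-style binary diffs.
 # The digit alphabet matches RFC 1924 / Git's base85.c.
 ALPHABET = ("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")

 def b85_encode_group(four_bytes):
     """Encode exactly 4 octets as 5 base-85 characters."""
     n = int.from_bytes(four_bytes, "big")  # pack big-endian into 32 bits
     out = []
     for _ in range(5):
         n, d = divmod(n, 85)
         out.append(ALPHABET[d])
     return "".join(reversed(out))

 print(b85_encode_group(b"\x00\x00\x00\x01"))  # prints "00001"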

The Ascii85 encoding as it is described on Wikipedia and in the PLRM (See PostScript) produces incorrect output for trailing bytes. See:

[http://www.suehappycowboy.org/blog/?p=63 Example of how the specification fails] ([http://web.archive.org/web/20070519011120/http://www.suehappycowboy.org/blog/?p=63 archived copy])

The main page should include a description that accounts for this blunder, as well as the correct algorithm for generating Ascii85 data.

Ascii85 is a better name because the characters used in the encoding are sequential in the ASCII table, whereas Base64 reorders the characters. Also, the PLRM (see PostScript) refers to it as Ascii85.

The specification doesn't fail; it just doesn't give a detailed explanation of how to write a correct decoder. The standard programming trick (pad with the maximum digit, "u", to compensate for the rounding-down effect of truncation) is already in the article.
As for the name, I don't think much of the sequential argument; looking at the relative popularity of Adobe's usage versus the original btoa, I think Ascii85 is the most common name these days. 71.41.210.146 (talk) 22:59, 7 January 2008 (UTC)[reply]
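To make the padding trick concrete, here is a minimal decode sketch for a trailing partial group (standard Adobe digits "!".."u", no "z" handling; an illustration, not a full decoder):

 # Decode a final partial group of 2-5 characters: pad with "u"
 # (the largest digit, value 84), decode the full 5-character group,
 # then drop as many output bytes as characters were padded.
 def decode_final_group(group):
     assert 2 <= len(group) <= 5  # a 1-character final group is invalid
     missing = 5 - len(group)
     n = 0
     for c in group + "u" * missing:
         n = n * 85 + (ord(c) - 33)
     return n.to_bytes(4, "big")[:4 - missing]

 print(decode_final_group("5l"))  # prints b'A'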

How can an Ascii85 encoder give us 337 characters, not 340? —Preceding unsigned comment added by 195.218.217.69 (talk) 20:11, 15 August 2009 (UTC)[reply]
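That is the truncation of the final partial group described above. Assuming, for illustration, a 269-byte input: the 67 complete 4-byte groups produce 67 × 5 = 335 characters, and the final single byte is encoded as 1 + 1 = 2 characters, giving 337, whereas the naive estimate ⌈269/4⌉ × 5 = 340 over-counts by the 3 truncated characters. In general, an input of n bytes with n mod 4 = r ≠ 0 encodes to 5⌊n/4⌋ + r + 1 characters (before delimiters and line breaks).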

Size cost & compatibility


While I appreciate Verdy_p's contribution, I do not think that the "Size cost of the Adobe and possible compressions" and "Compatibility of the Adobe version of Ascii85 with other text-based protocols" sections add value to the article.

Do we really need these extremely detailed sections? They seem like Original Research to me, and the provided information can, IMHO, only be useful for someone actually implementing the encoding. I have written an implementation myself, and even so, the information seems overly detailed and relatively useless. I would be happy to have an external link to a document that includes these minutiae, but they should not clutter up the article.

This is not original research, given that PostScript (level 3) already supports (and even implements) this special compression; there is no limit on the number of Ascii85-encoded sections in an embedded resource, the Ascii85 encoding being usable anywhere a non-encoded section is possible. For PostScript, whose language is ASCII-based, leaving such sections unencoded saves the 25% cost (in practice about 23-24% in many images or embedded TrueType fonts, when encoding them with the supported "z" extension). And many binary images and fonts frequently contain long runs of values that do not require the encoding, which reduces the average cost to about 10% in those images. This is independent of any extra compression by generic algorithms like deflate, which also produce slightly better results when the input is not restricted to the 85 characters.
What I meant by original research was the explanation of how file size can be minimized and the calculation of the exact overhead down to the byte. Yes, it is possible to make files smaller that way, or to estimate the resulting size more accurately, but is this documented somewhere? From Wikipedia:OR:

The term "original research" refers to material—such as facts, allegations, ideas, and stories—not already published by reliable sources."

For example, the "Size cost" section details how the encoding size changes in very special circumstances. The lede already mentions the general case of a 25% increase in size, which covers the vast majority of input.

But the 25% figure is not exact even in the simplest case: the effective size depends on whether the input is an exact multiple of 4 bytes, and the 4 characters for the required leading and trailing sequences must be counted (even if they amount to less than 0.1% for inputs larger than 1 KB), as well as the line breaks typically introduced after every 75 output characters (including the leading and trailing sequences), which add about 1.3% (if line breaks are encoded as a single LF) or 2.6% (if they are encoded as CR+LF), so that the encoded binary object can be embedded in text and transported over MIME-like protocols that use limited line lengths.
The 25% figure only applies to a generic Base-85 encoding (which is not implemented as such even in the btoa or Adobe versions...). The exact sizes only apply to the standard Adobe version; btoa has different metrics, discussed in less detail only because btoa is rarely used, whereas the Adobe version is used very extensively in PostScript.
These sizes are estimated by many PostScript document generators (including printer drivers, and even some PDF generators, although the PDF format is a binary envelope that should generally not need to embed binary objects in text; it typically does the reverse, embedding and compressing text within the binary envelope).
All these things must be discussed appropriately, because they are part of the differences between the Adobe and btoa variants. Otherwise we could just discuss Base-85, and no section about the btoa or Adobe variants would be necessary... verdy_p (talk) 17:30, 16 August 2010 (UTC)[reply]
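For what it's worth, the arithmetic above condenses into a short estimate. A sketch under the assumptions stated in this thread (Adobe framing with "<~"/"~>" delimiters, a line break after every 75 output characters, LF line endings, no "z" compression; the function name and the choice to count delimiters toward wrapping are assumptions):

 # Rough output size for Adobe-style Ascii85, per the framing above:
 # 5 chars per 4-byte group, r + 1 chars for a final r-byte group,
 # 4 delimiter chars ("<~" and "~>"), one line break per 75 chars.
 def adobe_ascii85_size(n_bytes, line_len=75, eol_len=1):
     full, rem = divmod(n_bytes, 4)
     chars = full * 5 + (rem + 1 if rem else 0) + 4  # body + delimiters
     breaks = (chars - 1) // line_len                # line breaks needed
     return chars + breaks * eol_len

 size = adobe_ascii85_size(1024 * 1024)
 print(size, size / (1024 * 1024) - 1)  # 1328200 chars, ~26.7% overhead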
To me, "appropriately" means mentioning them in a sentence or two, not devoting more space to it than the entire remainder of the article combined. Apart from it being original research, my argument is that this information is simply unnecessary for the article itself. The article is about Ascii85 not how you can optimize filesize by not using Ascii85 some of the time -- unless this is commonly done and you can provide a source.

I propose that the sections be deleted, though it may be possible to abbreviate some of the content and add it to the existing sections (e.g. the compatibility information could be abbreviated and added to the last paragraph of the "Basic idea" section).

— DataWraith (talk) 16:48, 16 August 2010 (UTC)[reply]
Compatibility is NOT part of the basic idea, because it depends on the exact choice of alphabet used to represent base-85 digits. It is part of the specification of each variant (just as when discussing and comparing the variants of Base64...).
It's good to indicate that Ascii85 does not fit within SGML/XML documents (in fact, none of the variants presented in the article can be safely embedded in SGML/XML documents), and to explain why.
The Basic idea section already points out that Ascii85 uses characters that may be reserved in certain contexts, such as text-based protocols (or SGML/XML, even though this is not mentioned explicitly). Since Ascii85 is not commonly embedded in, for example, XML, pointing out that it is incompatible is neither necessary nor useful. This contrasts with Base64, where compatible variants were created and specified, and thus can be usefully discussed.
It is also likely that other variants exist that don't have this limitation (I've not found them, so I can't discuss how they do it, as that would really be original research). verdy_p (talk) 17:30, 16 August 2010 (UTC)[reply]
"It is expected" is not enough. Please provide a citation from a reliable source. — DataWraith (talk) 20:09, 16 August 2010 (UTC)[reply]

Why not "Ascii96"?


We can encode digit d (0 ≤ d ≤ 94) as the ASCII code d + 32, and use "¥" for 95, thus using all printable ASCII characters.

base 10   base 96    equals
0         (space)    0 × 96⁰
1         !          1 × 96⁰
2         "          2 × 96⁰
3         #          3 × 96⁰
4         $          4 × 96⁰
5         %          5 × 96⁰
6         &          6 × 96⁰
...       ...        ...
16        0          16 × 96⁰
17        1          17 × 96⁰
...       ...        ...
25        9          25 × 96⁰
26        :          26 × 96⁰
27        ;          27 × 96⁰
...       ...        ...
32        @          32 × 96⁰
33        A          33 × 96⁰
34        B          34 × 96⁰
...       ...        ...
58        Z          58 × 96⁰
...       ...        ...
65        a          65 × 96⁰
66        b          66 × 96⁰
...       ...        ...
90        z          90 × 96⁰
...       ...        ...
94        ~          94 × 96⁰
95        ¥          95 × 96⁰
96        !(space)   1 × 96¹ + 0 × 96⁰
97        !!         1 × 96¹ + 1 × 96⁰
98        !"         1 × 96¹ + 2 × 96⁰
...       ...        ...
191       !¥         1 × 96¹ + 95 × 96⁰
...       ...        ...
255       "_         2 × 96¹ + 63 × 96⁰
...       ...        ...
4095      J_         42 × 96¹ + 63 × 96⁰
...       ...        ...
65535     '*_        7 × 96² + 10 × 96¹ + 63 × 96⁰
...       ...        ...
First: "¥" is not part of ASCII. And the space character is not considered a printable character. Also, with your positional encoding you would need arbitrarily large numbers as your data grows. That's why all these binary-to-text encodings encode data in chunks, usually encoding n octets in n+k characters. For 96 (or 94) printable ASCII characters you would need at least n = 9 and k = 2, so you encode 9 octets in 11 printable characters. That gives an encoding overhead of 22.22%. That is a bit better than the 25% overhead of Base85 (it would save about 29 KiB for each encoded mebibyte of binary data), but it requires 72-bit arithmetic, which is much more expensive than the 32-bit arithmetic that Base85 needs. --RokerHRO (talk) 11:37, 19 May 2015 (UTC)[reply]
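A sketch of that n = 9, k = 2 scheme (mapping digit d to ASCII d + 33, i.e. "!".."~", is an illustrative choice that skips the space character; none of this is a published standard):

 # Sketch of the 9-octets-in-11-characters base-94 idea: pack 9 bytes
 # into one 72-bit integer and emit 11 base-94 digits (94**11 > 2**72).
 # Digit d is written as ASCII d + 33, i.e. "!".."~" (illustrative only).
 def encode94_group(block):
     assert len(block) == 9
     n = int.from_bytes(block, "big")
     out = []
     for _ in range(11):
         n, d = divmod(n, 94)
         out.append(chr(d + 33))
     return "".join(reversed(out))

 print(encode94_group(b"\x00" * 9))  # prints "!!!!!!!!!!!"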
Someone has published a proposed Base91 standard: http://base91.sourceforge.net/. However, one of the strengths of base encoding is being able to store and transfer data in situations where non-printable or non-ASCII characters (0-32, 127-255) aren't allowed. The more characters you add to the encoding set, the less cross-platform it becomes; for example, even Base64 has a "Base64URL" alternative, because the former uses unsafe URL characters. In addition, the closer you get to base-256/binary, the less the relative overhead is reduced (the approximate overheads per base are: 16 → 100%, 32 → 60%, 64 → 33%, 85 → 25%, 94 → 23%). 174.20.251.155 (talk) 02:19, 2 March 2016 (UTC)[reply]
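Those overhead figures follow from the usual groupings (k octets packed into the smallest m characters with b^m ≥ 256^k); a quick check, with the groupings themselves being an assumption:

 # Overhead of packing k octets into m base-b characters, using the
 # smallest m with b**m >= 256**k (the groupings are assumed here).
 def overhead(b, k, m):
     assert b ** m >= 256 ** k
     return m / k - 1

 for b, k, m in [(16, 1, 2), (32, 5, 8), (64, 3, 4), (85, 4, 5), (94, 9, 11)]:
     print(f"base {b}: {overhead(b, k, m):.0%}")
 # base 94 prints 22%, close to the 23% quoted above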

About Ascii85 example


Anything not using the ASCII order can't really be called Ascii85; it is rather a Base85 variant.

Now, about the example: [EDIT] After a second look, it depends on the implementation, so there is nothing wrong with it. AveYo (talk) 13:16, 20 June 2015 (UTC)[reply]

For a true Ascii85 encoding, the safest choice would be to pick the last 85 printable characters, codes 42 {*} to 126 {~}.

Only two problematic characters remain for XML, code 60 {<} and code 62 {>}, which can be replaced with code 40 {(} and code 41 {)}.

--AveYo (talk) 01:09, 10 June 2015 (UTC)[reply]
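A quick sketch of the alphabet AveYo describes (codes 42-126 with "<" and ">" swapped for "(" and ")"; purely illustrative):

 # Build the suggested XML-safer alphabet: the 85 printable characters
 # from "*" (42) to "~" (126), substituting "(" and ")" (codes 40, 41)
 # for the XML-reserved "<" (60) and ">" (62).
 alphabet = [chr(c) for c in range(42, 127)]
 alphabet[alphabet.index("<")] = "("
 alphabet[alphabet.index(">")] = ")"
 assert len(alphabet) == 85 and len(set(alphabet)) == 85
 print("".join(alphabet))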

I have no problem with renaming the article from Ascii85 to Base85. --RokerHRO (talk) 13:50, 10 June 2015 (UTC)[reply]

-- First, says who? And second, Ascii85 is the title of a specific implementation, regardless of its accuracy. But that aside, the page covers more than just Ascii85, so it should probably be renamed. 174.20.251.155 (talk) 02:17, 2 March 2016 (UTC)[reply]

-- It's been 3 years since this discussion about renaming started. Why isn't it getting any traction? Bureaucracy? Ssg (talk) 00:06, 17 June 2018 (UTC)[reply]