Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency on base64-js #339

Open
jonkoops opened this issue Jan 17, 2024 · 20 comments
Open

Remove dependency on base64-js #339

jonkoops opened this issue Jan 17, 2024 · 20 comments
Assignees

Comments

@jonkoops
Copy link

The base64-js package has not been maintained for more than 4 years, so it should probably be removed as a dependency.

@dcousens dcousens self-assigned this Jan 23, 2024
@dcousens
Copy link
Collaborator

dcousens commented Jan 23, 2024

@jonkoops do we actually need it for browsers in 2024? Could we simply use atob and btoa now?

@jonkoops
Copy link
Author

You are correct @dcousens, that is in fact what we did for Keycloak.

@dcousens
Copy link
Collaborator

dcousens commented Jan 23, 2024

https://nodejs.org/api/globals.html#atobdata was added in Node 16, so I think we're good if we're assuming LTS for the library (we are)

@jonkoops
Copy link
Author

Yeah, this is also an opportunity to align the implementations between Node.js and the browser, might be a good code-cleanup. Did some quick testing, but I was unable to get things to work in a 15 min time-span (had some trouble running the tests on a browser as well).

@dcousens
Copy link
Collaborator

dcousens commented Feb 5, 2024

@jonkoops if you want to help review #349 🧡

@chjj
Copy link
Contributor

chjj commented Feb 5, 2024

I suspect atob and btoa will have more performance overhead than a custom base64 implementation due to the need for string preprocessing (to match node.js behavior) and the extra conversion back and forth between a binary string a buffer.

I don't have proof because I haven't benchmarked it, but the implementation in #349 is probably the fastest/best we can get.

edit: On second thought, if we optimistically parse base64 with atob and fallback to preprocessing on error, it may be faster? Something like:

try {
  return Buffer.from(atob(str), 'binary');
} catch (e) {
  return Buffer.from(atob(removeInvalidCharacters(str)), 'binary')
}

I'll have a crack at it at some point.

@dcousens
Copy link
Collaborator

dcousens commented Feb 5, 2024

Using atob/btoa is strongly preferred, if at least for the first attempt for easier code review.
We can optimize afterwards.

@chjj
Copy link
Contributor

chjj commented Feb 5, 2024

Here are the numbers for atob (in node.js):

$ node perf/write-base64.js
BrowserBuffer#write(8192, "base64") x 1,638 ops/sec ±0.50% (93 runs sampled)

Compared to the implementation in #349:

$ node perf/write-base64.js
BrowserBuffer#write(8192, "base64") x 24,542 ops/sec ±0.02% (97 runs sampled)

This was the code used (with the optimized version of asciiWrite):

function base64Write(buf, string, offset, length) {
  return asciiWrite(buf, atob(string), offset, length);
}

If you're scratching your head like I was, look no further than the insanely broken implementation of atob in node.js itself.

This line in particular is the problem:

    const index = ArrayPrototypeIndexOf(
      kForgivingBase64AllowedChars,
      StringPrototypeCharCodeAt(input, n));

In other words, the node.js atob function runs with O(n^2) time complexity! How code this irresponsible made it into the node.js codebase is beyond me.

I'll have to benchmark this in an actual web browser to get some real numbers.

@jonkoops
Copy link
Author

jonkoops commented Feb 5, 2024

In other words, the node.js atob function runs with O(n^2) time complexity! How code this irresponsible made it into the node.js codebase is beyond me.

Oh damn, nice find! I wonder how other run-times such as Bun and Deno compare in this regard. Did you log an issue with the Node.js team?

@dcousens
Copy link
Collaborator

dcousens commented Feb 5, 2024

What is the browser performance like? We could probably still rely on atob and btoa and then log the nodejs issue upstream?

@vweevers
Copy link

vweevers commented Feb 5, 2024

In other words, the node.js atob function runs with O(n^2) time complexity! How code this irresponsible made it into the node.js codebase is beyond me.

Because Buffer exists as the better alternative (from Node.js' perspective that is). The atob function is marked as Legacy:

Stability: 3 - Legacy. Although this feature is unlikely to be removed and is still covered by semantic versioning guarantees, it is no longer actively maintained, and other alternatives are available.

Which explains this comment:

// The implementation here has not been performance optimized in any way and
// should not be.

That said, you could make the argument that this hurts web compatibility, and still file an issue.

@jonkoops
Copy link
Author

jonkoops commented Feb 5, 2024

So perhaps take the Buffer.from() approach for Node.js and see if the perf for atob in browsers is acceptable? Would prevent the need to roll a bunch of custom code.

@chjj
Copy link
Contributor

chjj commented Feb 5, 2024

@vweevers, it's still trivial to optimize it with a lookup table. I don't see "legacy" as an excuse to be lazy.

Interesting that it mentions the performance issue in the code but not the docs (where it might be the most helpful).

In any case, filing an issue to have it optimized now does nothing, because we'll never know whether we're using the optimized version or the unoptimized version. @jonkoops' approach makes more sense.

@dcousens, performance in chromium seems optimal. I'm seeing a 1.4x speedup compared to the JS impl. I don't know if I'll get around to testing firefox anytime soon as I don't have good tooling to run FF headless, but I'll start working on a revision of #349 that uses atob in the browser and Buffer.from -> toString in node.

@dcousens
Copy link
Collaborator

dcousens commented Feb 5, 2024

I have added a comment in nodejs/node#38433 (comment), and I suspect that's where we can leave it. We are targeting browsers, and the fact that atob and btoa is slow in node is only a loss for node, not this library.

@chjj
Copy link
Contributor

chjj commented Feb 5, 2024

Just an update: while using atob is faster (in chromium), the same can't be said for btoa.

Current optimized branch:
BrowserBuffer#write(8192, "base64") x 49,551 ops/sec ±0.17% (65 runs sampled)
BrowserBuffer#toString(8192, "base64") x 34,617 ops/sec ±0.43% (65 runs sampled)
With atob/btoa:
BrowserBuffer#write(8192, "base64") x 70,796 ops/sec ±14.85% (61 runs sampled)
BrowserBuffer#toString(8192, "base64") x 24,838 ops/sec ±1.50% (59 runs sampled)

(Note the numbers differ from the ones in my PR because I ran this on a faster machine)

We're basically doing this:

function base64Slice (buf, start, end) {
  return btoa(latin1Slice(buf, start, end))
}

The overhead of Buffer -> binary string -> base64 is one of the things I was worried about. I guess we have to make a decision whether we want to keep the base64 serialization code, or take the perf hit and use btoa.

@dcousens
Copy link
Collaborator

dcousens commented Feb 5, 2024

@chjj I think we can probably say that btoa and atob should be faster and better maintained in the long-term

@chjj
Copy link
Contributor

chjj commented Feb 5, 2024

New implementation up at #349

@chjj
Copy link
Contributor

chjj commented Feb 5, 2024

I made a PR fixing atob in node.js. Let's see if anything happens.

@chjj
Copy link
Contributor

chjj commented Feb 5, 2024

Wow their linter is strict about whitespace. Okay, I'm out of energy on that. If they accept it, they accept it, if not, we'll figure something out.

@jonkoops
Copy link
Author

jonkoops commented Feb 5, 2024

I have to agree with @dcousens that it's not really an issue that some of these functions are slow in Node.js, it is very clear that compatibility and performance with existing popular web APIs is an afterthought there. And it doesn't really matter anyways, as this library will almost always be used in a web context.

Thanks for your hard work @chjj, I appreciate you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants