Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Events and Styles String Mutation, closes #77 #122

Merged
merged 4 commits into from
Mar 13, 2022

Conversation

TFSThiagoBR98
Copy link
Collaborator

Closes #77

  • Update webidl_binder to latest upstrea;
  • Copy received char* to new pointer and use it;
  • Make an easier way to get styles and events from worker;

@TheOneric

@TheOneric
Copy link
Member

TheOneric commented Feb 1, 2022

For most libass APIs the char* strings remain owned by the caller with libass doing internal copies if needed. This means the caller must if necessary free the memory at some point after the libass API call finished. The track and event modification/creation APIs are different because they work close to library internals (which is also why their usage comes with more restrictions/obligations than other APIs and why they don't sensibly work with ass.h alone). With always implicitly allocating new memory for char*s being passed and as far as I can see never freeing it, this would potentially result in memory leaks. Even if this happens to work for all currently called APIs (haven't checked), this is bound to lead to leaks at some point in the future. (EDIT: Missed that the patch only applies to struct member assignments)
I think it's better to explicitly do the allocations at our JS or C++ code when invoking the affected APIs.

E.g. the following APIs do not aquire ownership over the passed char*s and char**s (incomplete list):

  • ass_set_fonts_dir
  • ass_set_style_overrides
  • ass_set_fonts
  • ass_process_data
  • ass_process_codec_private
  • ...

As for using Promise: is this available in all targeted browser engines (cc @dmitrylyzo)? Afaict we only use Promises inside the lossy render's rendering method right now and this mode is documented to not properly work with some(?) engines.

@TFSThiagoBR98
Copy link
Collaborator Author

As for using Promise: is this available in all targeted browser engines (cc @dmitrylyzo)? Afaict we only use Promises inside the lossy render's rendering method right now and this mode is documented to not properly work with some(?) engines.

Promise API is available in this browsers

@TFSThiagoBR98
Copy link
Collaborator Author

TFSThiagoBR98 commented Feb 1, 2022

For most libass APIs the char* strings remain owned by the caller with libass doing internal copies if needed. This means the caller must if necessary free the memory at some point after the libass API call finished. The track and event modification/creation APIs are different because they work close to library internals (which is also why their usage comes with more restrictions/obligations than other APIs and why they don't sensibly work with ass.h alone). With always implicitly allocating new memory for char*s being passed and as far as I can see never freeing it, this would potentially result in memory leaks. Even if this happens to work for all currently called APIs (haven't checked), this is bound to lead to leaks at some point in the future.

This changes and bug only affects struct manipulation. So the function calls remains the same.
The problem I have now is i can't free the original pointer, because it is been used in somewhere, it crashes (memory violation) on a fribidi function when i run free() on it.

I think it's better to explicitly do the allocations at our JS or C++ code when invoking the affected APIs.

Well, the original version do exactly that, it allocate the new string in the memory (JS Part) and set this pointer in the struct parameter. But, for some reason, it set the same address to all pointers in the struct.

@TFSThiagoBR98 TFSThiagoBR98 marked this pull request as draft February 1, 2022 02:28
@TFSThiagoBR98
Copy link
Collaborator Author

TFSThiagoBR98 commented Feb 1, 2022

I did some debug in the WASM file to check why it overwrite all pointers, but it just allocate the address in the memory position. As it should

PS: Soon I will send a PR to make it compile in debug mode with some docs

@dmitrylyzo
Copy link
Contributor

dmitrylyzo commented Feb 1, 2022

As for using Promise: is this available in all targeted browser engines (cc @dmitrylyzo)? Afaict we only use Promises inside the lossy render's rendering method right now and this mode is documented to not properly work with some(?) engines.

In webOS 1.2 (WebKit ~ Chrome 27 or even 26), there is no Promise in the worker out of box:
jellyfin/jellyfin-web#3144 (comment)
jellyfin/jellyfin-web#3335
But there is a possibility that Emscripten polyfills it. Can check when I get home.

webOS 2 has Promise in the main thread (not sure for the worker), but it is a little buggy - it requires an argument for Promise.resolve (at least passing undefined), iirc. And more jellyfin/jellyfin-web#1119

This will probably not cause an error at the parsing stage, but only at runtime.

⚠️ All of the above is true for emulators, so it may be better in life.

UPD:

from #118 (comment)
https://webostv.developer.lge.com/discover/specifications/web-engine/
webOS 1.2 emulator (WebKit 538.2) - no Promise, no createImageBitmap.
webOS 2.0 emulator (WebKit 537.41) - has Promise (#122 (comment)), no createImageBitmap.
webOS 3.0 emulator (Chrome 38) - has Promise, no createImageBitmap.
webOS 4.0 emulator (Chrome 53) - has Promise, has createImageBitmap.

Emscripten polyfills Promise - the first line in the legacy worker.
It seems to replace the original regardless of support - which is good for webOS 2, but I would prefer not to replace it in 3+.

So the problem is the lack of Promise in subtitle-octopus.js (main thread). Jellyfin uses core-js (?) for polyfill.
You can simply require that Promise be polyfilled by the JSO user himself, since it will probably be required anyway by the JSO user application. In this way, you minimize polyfill duplication.

@TheOneric
Copy link
Member

This changes and bug only affects struct manipulation. So the function calls remains the same.

Sorry, I thought this applied to all char*s being passed. If it only applies to struct member assignments, it is unlikely we add an API where members of a library struct are not owned by the libass, so this is unlikely to cause memory leaks.
However, I still believe an explicit malloc of the strings is preferable over relying on a patched webidl_binder.py; the need for a patch is not obvious and might be forgotten on future updates thus sneakily reintroducing the bug.

@TFSThiagoBR98
Copy link
Collaborator Author

Sorry, I thought this applied to all char*s being passed. If it only applies to struct member assignments, it is unlikely we add an API where members of a library struct are not owned by the libass, so this is unlikely to cause memory leaks. However, I still believe an explicit malloc of the strings is preferable over relying on a patched webidl_binder.py; the need for a patch is not obvious and might be forgotten on future updates thus sneakily reintroducing the bug.

Technically, we still need this patch for IntPtr support, but if you want, we can remove then from the WebIDL file and manualy inject into a new js/cpp files

ASS_Track.prototype['get_name'] = ASS_Track.prototype.get_name = /** @suppress {undefinedVars, duplicate} @this{Object} */function () {
  var self = this.ptr;
  return UTF8ToString(_emscripten_bind_ASS_Track_get_name_0(self));
};
ASS_Track.prototype['set_name'] = ASS_Track.prototype.set_name = /** @suppress {undefinedVars, duplicate} @this{Object} */function (arg0) {
  var self = this.ptr;
  ensureCache.prepare();
  if (arg0 && typeof arg0 === 'object') arg0 = arg0.ptr;
  else arg0 = ensureString(arg0);
  _emscripten_bind_ASS_Track_set_name_1(self, arg0);
};
Object.defineProperty(ASS_Track.prototype, 'name', { get: ASS_Track.prototype.get_name, set: ASS_Track.prototype.set_name });
char* EMSCRIPTEN_KEEPALIVE emscripten_bind_ASS_Track_get_name_0(ASS_Track* self) {
  return self->name;
}

void EMSCRIPTEN_KEEPALIVE emscripten_bind_ASS_Track_set_name_1(ASS_Track* self, char* arg0) {
  char* copy = (char*)malloc(strlen(arg0) + 1); 
  strcpy(copy, arg0);
  self->name = copy;
}

@ThaUnknown
Copy link

any updates on this? it's the last thing that's needed for this lib to support video streaming rather than static content

@TFSThiagoBR98
Copy link
Collaborator Author

TFSThiagoBR98 commented Feb 13, 2022

@ThaUnknown Currently its "ready", but i need to do some changes in the webidl generator and i need to do more tests.

There is a test implementation here if you want to try it out

@TFSThiagoBR98 TFSThiagoBR98 force-pushed the fix-77 branch 7 times, most recently from 0795461 to 48d5443 Compare February 15, 2022 01:33
@TFSThiagoBR98
Copy link
Collaborator Author

@TheOneric I recreated the webidl_binder history, from the original version in the Emscripten repository and added headers to explain the patches.

@TFSThiagoBR98 TFSThiagoBR98 marked this pull request as ready for review February 15, 2022 01:49
@TheOneric
Copy link
Member

TheOneric commented Feb 17, 2022

I recreated the webidl_binder history, from the original version in the Emscripten repository and added headers to explain the patches.

Thanks! This already makes the patches much clearer.
As i understand it the integer pointer and ByteString patches are already required now to build a working JSO — meaning this series has 4 commits which do not build. Would it be possible to avoid this and eg factor out the patches as patch files? Either with the patches applied at build time like for the libraries or pre-applied but keep the patchfiles (and comment) for documentation (and to speed up patching future upstream versions).
Of course ideally we could do without any patches to WebIDL, but I don't know how feasible if even possible this is for the first two patches.

Regarding the 3rd “malloc/strcpy on struct string setters” patch, it would be nice if the “why doesn't this cause memleaks” could be documented in the patchfile description if we go that route or otherwise at least the commit message.

@TheOneric
Copy link
Member

Regarding Promise here are relevant quotes from libass' IRC-channel:

<Oneric> rcombs: re JSO, does the engine-baseline you want to target include Promise?
         (see: https://github.com/libass/JavascriptSubtitlesOctopus/pull/122#issuecomment-1026548563 )
<rcombs> Oneric: honestly if the current version supports it then we should maintain support until noted otherwise;
         this lib has corporate usage with hard compat requirements (and I intended to add more)
<rcombs> Promise is an easy one to polyfill though

@ThaUnknown
Copy link

why not just use a callback, the user can easily promisify it and you're breaking compat anyways so it's not like that matters

@TFSThiagoBR98
Copy link
Collaborator Author

@ThaUnknown changed to callbacks, thanks.

@TheOneric I squash those commits and add the patches to the webidl_binder directory. Please check if the malloc commit text is correct and well explained.

@TFSThiagoBR98
Copy link
Collaborator Author

If someone want to do a final checkup please say now, if there is no problem i'll merge it on Saturday.

@TheOneric
Copy link
Member

Sorry for my late reply.

Having the patch files is good and none of the commits should break the build. I would still prefer the struct string setters to be explicitly malloced in the relevant code sections instead of patching this to happen unconditionally in the WebIDL binder. It currently seems unlikely that this might cause problems, but that isn't the same as it being impossible. However we can try this way if you prefer.
Please do not @ me in the commit itself, as this will lead to repeated annoying notifications whenever someone backports this commit to eg a fork.

A more concise commit and patch description for the struct setter could be eg:

Currently we use `char*` struct setters only with libass API where the
pointer will then be owned by the library, but we did not make sure the
pointers even remained valid after the assignment.
Allocating new memory and copying the content fixes this.
<insert reason why we don't do this explicitly in the code,
 but with a WebIDL patch>

fixes https://github.com/libass/JavascriptSubtitlesOctopus/issues/77

@TFSThiagoBR98
Copy link
Collaborator Author

TFSThiagoBR98 commented Mar 2, 2022

Please do not @ me in the commit itself, as this will lead to repeated annoying notifications whenever someone backports this commit to eg a fork.

Sorry for that.

Having the patch files is good and none of the commits should break the build. I would still prefer the struct string setters to be explicitly malloced in the relevant code sections instead of patching this to happen unconditionally in the WebIDL binder. It currently seems unlikely that this might cause problems, but that isn't the same as it being impossible. However we can try this way if you prefer.

After further investigation, i get where the bug occurs, in the WebIDL generated glue code, in the ensureCache, the pointer stays the same in every setter.

The prepare function, make use of the same pointer in every run.

SubOctpInterface.js

var ensureCache = {
  buffer: 0,  // the main buffer of temporary storage
  size: 0,   // the size of buffer
  pos: 0,    // the next free offset in buffer
  temps: [], // extra allocations
  needed: 0, // the total size we need next time

  prepare: function() {
    if (ensureCache.needed) {
      // clear the temps
      for (var i = 0; i < ensureCache.temps.length; i++) {
        Module['_free'](ensureCache.temps[i]);
      }
      ensureCache.temps.length = 0;
      // prepare to allocate a bigger buffer
      Module['_free'](ensureCache.buffer);
      ensureCache.buffer = 0;
      ensureCache.size += ensureCache.needed;
      // clean up
      ensureCache.needed = 0;
    }
    if (!ensureCache.buffer) { // happens first time, or when we need to grow
      ensureCache.size += 128; // heuristic, avoid many small grow events
      ensureCache.buffer = Module['_malloc'](ensureCache.size);
      assert(ensureCache.buffer);
    }
    ensureCache.pos = 0;
  },
  alloc: function(array, view) {
    assert(ensureCache.buffer);
    var bytes = view.BYTES_PER_ELEMENT;
    var len = array.length * bytes;
    len = (len + 7) & -8; // keep things aligned to 8 byte boundaries
    var ret;
    if (ensureCache.pos + len >= ensureCache.size) {
      // we failed to allocate in the buffer, ensureCache time around :(
      assert(len > 0); // null terminator, at least
      ensureCache.needed += len;
      ret = Module['_malloc'](len);
      ensureCache.temps.push(ret);
    } else {
      // we can allocate in the buffer
      ret = ensureCache.buffer + ensureCache.pos;
      ensureCache.pos += len;
    }
    return ret;
  },
  copy: function(array, view, offset) {
    offset >>>= 0;
    var bytes = view.BYTES_PER_ELEMENT;
    switch (bytes) {
      case 2: offset >>>= 1; break;
      case 4: offset >>>= 2; break;
      case 8: offset >>>= 3; break;
    }
    for (var i = 0; i < array.length; i++) {
      view[offset + i] = array[i];
    }
  },
};

Override the ensureCache value is easy and not require any patches to webidl_binder

Setter example:

ASS_Style.prototype['set_Name'] = ASS_Style.prototype.set_Name = /** @suppress {undefinedVars, duplicate} @this{Object} */function(arg0) {
  var self = this.ptr;
  ensureCache.prepare();
  if (arg0 && typeof arg0 === 'object') arg0 = arg0.ptr;
  else arg0 = ensureString(arg0);
  _emscripten_bind_ASS_Style_set_Name_1(self, arg0);
};

Add new function fetchFromWorker with callback to allow fetch the response from worker
Add callbacks to getStyles and getEvents functions
@TFSThiagoBR98 TFSThiagoBR98 force-pushed the fix-77 branch 2 times, most recently from a37619b to bcce36d Compare March 11, 2022 18:05
Copy link
Member

@TheOneric TheOneric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept good though I'm not familiar enough with WebIDL to comment on the patch details; I'm trusting in your testing and experience with WebIDL.

Just two minor nits, feel free to ignore them if you disagree: can you also add a comment for the "Owner" patch to WebIDL.py as you did in previous versions? And can you remove the src/SubtitleOctopus.idl bits from the patchfile? Unlike the other parts those won't need to be reapplied when the WebIDL infrastructure gets updated from upstream.

Currently the ensureCache create a temporary pointer that will
transfer the value to the WASM/C part and recycle it with to use in the
next value, but since the libass struct pointers are owned by the
library, the pointers can't be recycled or freed or can lead into
an undefined behavior.

This configure the binder to tranfer the pointer ownership instead of
recycle it.

To avoid creating complex code, I decided to fix it in the webidl
binder.
@TFSThiagoBR98
Copy link
Collaborator Author

can you also add a comment for the "Owner" patch to WebIDL.py as you did in previous versions?

Done

And can you remove the src/SubtitleOctopus.idl bits from the patchfile? Unlike the other parts those won't need to be reapplied when the WebIDL infrastructure gets updated from upstream.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

createEvent broken by mutating strings
4 participants