Segfault when calling a NIF after a ct:print statement

Hello y’all! I’m prototyping something and ended up writing a NIF to call tree-sitter; I took the opportunity to learn C and understand how NIFs work.
With a bit of help from Copilot and ELisp, I ended up wrapping almost all of the tree-sitter APIs; everything was fine until I got a segfault.
After moving pieces around I found that calling ct:pal would “kind of flush” the resources. Curiously, not all of them, only the Node type ones. Here is the repo for the lib: cfclavijo/erl_ts on GitHub (develop branch)

I added a test to replicate this problem (initially I thought it was related to the function erl_ts:node_end_byte, thus the name)

node_end_byte_segfault_post_print(_Config) ->
  SC = "fun(A)->1+2.",
  {ok, Parser} = erl_ts:parser_new(),
  {ok, Lang} = erl_ts:tree_sitter_erlang(),
  true = erl_ts:parser_set_language(Parser, Lang),
  Tree = erl_ts:parser_parse_string(Parser, SC),
  RootNode = erl_ts:tree_root_node(Tree),
  ct:print(default, ?LOW_IMPORTANCE, "this ends in segfault wtf", [], []),
  %% erl_ts:tree_language(Tree), %% this makes the subsequent calls to erl_ts work

  %% RootNodeB = erl_ts:tree_root_node(Tree), %% this also makes the subsequent calls to erl_ts work
  FunDeclNode = erl_ts:node_child(RootNode, 0), %% here comes the segfault

Running rebar3 ct doesn’t always produce a segfault, though :confused:
If I remove the ct:print, or call the NIF passing a different type of reference (like the Tree), everything works.
Any idea what could be happening?
(Suggestions on how to improve the C code are welcome as well :smile: )

Thanks!


The segfault you’re encountering is likely due to resource management or memory handling issues in your NIF implementation. This kind of behavior is common when there’s a mismatch between how resources are allocated, referenced, and freed, particularly in the interaction between C and the Erlang VM.

The issue most likely lies in how RootNode is handled. The ct:print call probably doesn’t touch your resources directly; it changes timing, for example by giving the test process a chance to run a garbage collection. In your test, Tree is not used again after erl_ts:tree_root_node(Tree) unless one of the commented-out lines is enabled, so the VM is free to collect it at that point. That would explain why calling erl_ts:tree_language(Tree) or re-fetching the root node after the print avoids the segfault: both keep Tree referenced past the ct:print call.

There are a few things to consider here:

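First, look into how you’re managing the lifetime of RootNode and Tree in your C code. A tree-sitter TSNode carries a pointer to the TSTree it came from, so a node is only valid while its tree is alive. If the Tree resource gets garbage-collected and its destructor deletes the underlying tree while RootNode is still in use, any later call on that node reads freed memory, which is undefined behavior and can produce exactly the intermittent segfault you’re seeing. The node resource should therefore hold its own reference to the tree resource for as long as the node exists.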
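The lifetime dependency above can be sketched in plain C. This is only a model of the idea, not code from erl_ts and not the erl_nif API: TreeRes, NodeRes, and every function name here are hypothetical stand-ins, with a hand-rolled reference count playing the role of the VM's resource counting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tree resource (not the erl_nif API). */
typedef struct {
    int refs;      /* reference count, as the VM keeps for a resource */
    int *payload;  /* stands in for the TSTree* owned by the resource */
} TreeRes;

/* Hypothetical stand-in for the node resource. */
typedef struct {
    TreeRes *owner; /* strong reference that keeps the tree alive */
    int index;      /* stands in for the TSNode value */
} NodeRes;

static int live_trees = 0; /* number of trees not yet destroyed */

TreeRes *tree_new(void) {
    TreeRes *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->payload = malloc(sizeof *t->payload);
    if (t->payload == NULL) { free(t); return NULL; }
    *t->payload = 42;
    t->refs = 1; /* the creator holds the first reference */
    live_trees++;
    return t;
}

void tree_keep(TreeRes *t) { t->refs++; }

/* Drops one reference; the "destructor" runs only when the last one goes. */
void tree_release(TreeRes *t) {
    assert(t->refs > 0);
    if (--t->refs == 0) {
        free(t->payload);
        free(t);
        live_trees--;
    }
}

/* Creating a node takes an extra reference on its tree. */
NodeRes *node_new(TreeRes *t, int index) {
    NodeRes *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    tree_keep(t);
    n->owner = t;
    n->index = index;
    return n;
}

/* Safe even after the creator released the tree: the node still owns a ref. */
int node_read(const NodeRes *n) { return *n->owner->payload + n->index; }

/* The node's "destructor" gives back the reference it took. */
void node_free(NodeRes *n) {
    tree_release(n->owner);
    free(n);
}

int live_tree_count(void) { return live_trees; }
```

In a real NIF the same shape would presumably be: call enif_keep_resource on the tree resource when the node resource is created, and call enif_release_resource on it in the node resource's destructor. With that in place, the Erlang-side Tree term can be collected without invalidating nodes that are still reachable.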

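Also, pay attention to resource ownership. Every enif_alloc_resource needs a matching enif_release_resource; the usual pattern is to release right after enif_make_resource has created the term, so that the VM owns the only remaining reference and runs the destructor once the term is collected. A missing release leaks the resource, while an extra release (or freeing the underlying object outside the destructor) leads to dangling pointers or double frees.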

Since running “rebar3 ct” doesn’t always throw the segfault, there might also be a concurrency or timing issue. Parallel test execution could expose subtle race conditions in your resource handling code. Tools like Valgrind or AddressSanitizer can be very helpful for catching these types of bugs.

Adding detailed debug output to your NIF can also help track resource allocation and deallocation. Logging every resource’s creation, use, and release will give you a clearer picture of what’s happening under the hood.
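A minimal sketch of such tracing, in plain C so it compiles on its own. The names format_trace and trace_resource and the "[erl_ts]" prefix are made up for illustration; nothing here comes from the erl_ts repository. The idea is to log the resource kind, the event, and the pointer, so that a "use" line appearing after a "destroy" line for the same address stands out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats one trace line for a resource event.
 * Returns the number of characters snprintf wanted to write. */
int format_trace(char *buf, size_t len,
                 const char *kind, const char *event, const void *ptr) {
    return snprintf(buf, len, "[erl_ts] %s %s %p", kind, event, (void *)ptr);
}

/* Writes the trace line to stderr, which is unbuffered, so the last
 * lines are still visible when the VM crashes right afterwards. */
void trace_resource(const char *kind, const char *event, const void *ptr) {
    char line[128];
    format_trace(line, sizeof line, kind, event, ptr);
    fprintf(stderr, "%s\n", line);
}
```

One would call trace_resource("tree", "alloc", ptr) after allocating a resource, trace_resource("tree", "destroy", ptr) at the top of its destructor, and trace_resource("node", "use", ptr) at the start of each NIF that receives a node.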

Finally, try to reproduce the issue with a minimal, standalone example. Simplifying the problem can help isolate the root cause, making it easier to identify where things are going wrong in the C implementation. If you can pinpoint the exact function or line in the C code where the segfault occurs, it’ll be much easier to debug.