Receiving badarg converting list to reference on GitHub workflow and EUnit test

williamthome · May 21, 2023, 12:16am

Hey all.

I have a failing EUnit test on GitHub, but locally all tests pass.

The GitHub action log:

Run rebar3 do eunit, ct
  
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling changeset
===> Performing EUnit tests...
.............F..
Failures:

  1) changeset_reference_validator:validate_change_test/0: module 'changeset_reference_validator'
     Failure/Error: {error,badarg,
                        [{erlang,list_to_ref,
                             ["#Ref<0.4192537678.4073193475.71181>"],
                             [{error_info,#{module => erl_erts_errors}}]},
                         {changeset_reference_validator,validate_change_test,
                             0,
                             [{file,
                                  "/__w/changeset/changeset/src/validators/changeset_reference_validator.erl"},
                              {line,35}]},
                         {eunit_test,'-mf_wrapper/2-fun-0-',2,
                             [{file,"eunit_test.erl"},{line,273}]},
                         {eunit_test,run_testfun,1,
                             [{file,"eunit_test.erl"},{line,71}]},
                         {eunit_proc,run_test,1,
                             [{file,"eunit_proc.erl"},{line,531}]},
                         {eunit_proc,with_timeout,3,
                             [{file,"eunit_proc.erl"},{line,356}]},
                         {eunit_proc,handle_test,2,
                             [{file,"eunit_proc.erl"},{line,514}]},
                         {eunit_proc,tests_inorder,3,
                             [{file,"eunit_proc.erl"},{line,456}]}]}
     Output: 

Finished in 0.178 seconds
16 tests, 1 failures
===> Error running tests
Error: Process completed with exit code 1.

This is the call that fails:

list_to_ref("#Ref<0.4192537678.4073193475.71181>").

but why?

This is my machine log:

rebar3 do eunit, ct
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling changeset
===> Performing EUnit tests...
................
Finished in 0.113 seconds
16 tests, 0 failures
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling changeset
===> Running Common Test suites...
All 0 tests passed.

The GitHub workflow and my machine both use the same OTP version, 25.3.2.

The lib repo: GitHub - williamthome/changeset: An OTP library to validate data based on Ecto changeset library (Elixir).

I appreciate any help.

fancycade · May 21, 2023, 1:42am

Not much help, but I reproduced your CI issue on my local machine (FreeBSD).

~~Maybe a missing dependency?~~

eunit failed, but everything else worked.

fancycade · May 21, 2023, 2:09am

Submitted a PR: Fix list_to_ref string in unit test by fancycade · Pull Request #1 · williamthome/changeset · GitHub

Passing build: build #993137 - success

williamthome · May 21, 2023, 7:24am

Thanks, @fancycade!

make_ref() makes much more sense here.
BTW, I got the reference value from the list_to_ref example in Erlang docs:

> list_to_ref("#Ref<0.4192537678.4073193475.71181>").
#Ref<0.4192537678.4073193475.71181>

It seems strange to me that this happened. Any idea why?

williamthome · May 21, 2023, 9:30am

Oh, there is a big warning in the docs:

This BIF is intended for debugging and is not to be used in application programs.

I will consider this the answer to my previous question.

fancycade · May 21, 2023, 4:44pm

I noticed that warning as well, but it doesn’t explain why the ref from the Erlang docs doesn’t work.

list_to_ref(ref_to_list(make_ref()))

As long as you use a different ref it works.
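To illustrate the round trip above, here is a minimal EUnit sketch that builds the reference at runtime instead of hard-coding text copied from another machine. The module and test names are made up for illustration and are not taken from the changeset repo.

```erlang
-module(ref_roundtrip_sketch).
-include_lib("eunit/include/eunit.hrl").

%% Hypothetical test: create a fresh reference on the node that runs
%% the test, convert it to text, and parse that text back.
roundtrip_test() ->
    Ref = make_ref(),
    Text = erlang:ref_to_list(Ref),
    %% The text was produced on this node, so parsing it here should
    %% not depend on how many schedulers some other machine had.
    ?assert(is_reference(erlang:list_to_ref(Text))).
```

If the test only needs some valid reference, calling make_ref() directly and skipping the text conversion is simpler still.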

My guess was a breaking change with ref formats since the docs were published, but maybe there is a difference deeper down the stack. What OS are you running?

williamthome · May 21, 2023, 5:34pm

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"

fancycade · May 21, 2023, 9:16pm

Hmmm, I wouldn’t expect that to cause issues. Maybe someone more experienced can solve the mystery.

gorillainduction · May 23, 2023, 1:51pm

<tl;dr> Yes, this is correct. Use this for debugging and for logs exclusively. </tl;dr>

This is from memory, so it might not be totally correct. Treat it as an educated guess.

As I recall it, the first number in the reference is the node id (e.g., 0 for the local node), and then there is a scheduler id embedded in the reference so that you don’t have to acquire a lock when creating a reference in a scheduler. There are other things going on as well, but I think this is the culprit.

So, if you create a reference on a node with a lot of schedulers and you take the textual representation of the ref to another node with fewer schedulers, you might have created a ref that refers to a higher scheduler id than is available on that node, but the 0 at the start tells the node that this should be treated as a ref that was created at the local node.

When sending refs that are not textual between nodes, there will be a new node number in the ref on the remote node, and it will work anyway.

So, if I were to hazard a guess, you were running the test on a decent development machine with a bunch of cores (and therefore a bunch of schedulers), and the CI system is running on fewer cores and hence fewer schedulers.

As an experiment and as anecdotal proof:

1> list_to_ref("#Ref<0.4192537678.4073193475.71181>").
#Ref<0.4192537678.4073193475.71181>
2> erlang:system_info(schedulers).
3

and

1> list_to_ref("#Ref<0.4192537678.4073193475.71181>").
** exception error: bad argument
     in function  list_to_ref/1
        called as list_to_ref("#Ref<0.4192537678.4073193475.71181>")
        *** argument 1: not a textual representation of a reference
2> erlang:system_info(schedulers).
2
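The experiment above can be wrapped in a small helper, sketched here under the assumption that the scheduler count is what matters. The module and function names are hypothetical. It tries to parse a textual ref and reports the outcome together with the number of schedulers on the current node.

```erlang
-module(ref_probe_sketch).
-export([probe/1]).

%% Hypothetical helper: attempt list_to_ref/1 on Text and return the
%% result alongside the scheduler count of the node running the call.
probe(Text) ->
    Schedulers = erlang:system_info(schedulers),
    try erlang:list_to_ref(Text) of
        Ref -> {ok, Ref, Schedulers}
    catch
        %% list_to_ref/1 raises badarg when the node rejects the text.
        error:badarg -> {badarg, Schedulers}
    end.
```

Calling ref_probe_sketch:probe("#Ref<0.4192537678.4073193475.71181>") on nodes started with different scheduler counts (for example via erl +S 2) would be one way to compare the two cases side by side.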

williamthome · May 24, 2023, 12:24am

Interesting! I believe this is correct and will accept it as the solution.
Glad to hear from an experienced Erlanger.
Thanks for the informative post, @gorillainduction!