Erlang term type bif - 'is_...' vs 'erts_internal:term_type/1'

SisMaker · June 13, 2022, 10:37am

A long time ago I wanted to get the type of a term in Erlang, so I wrote the following code:

dataType(Data) when is_list(Data) -> list;
dataType(Data) when is_integer(Data) -> integer;
dataType(Data) when is_binary(Data) -> binary;
dataType(Data) when is_function(Data) -> function;
dataType(Data) when is_tuple(Data) -> tuple;
dataType(Data) when is_map(Data) -> map;
dataType(Data) when is_atom(Data) -> atom;
%%dataType(Data) when is_boolean(Data) -> boolean;
dataType(Data) when is_bitstring(Data) -> bitstring;
dataType(Data) when is_float(Data) -> float;
dataType(Data) when is_number(Data) -> number; % never reached: integers and floats match earlier clauses
dataType(Data) when is_reference(Data) -> reference;
dataType(Data) when is_pid(Data) -> pid;
dataType(Data) when is_port(Data) -> port;
dataType(_Data) -> not_know.

But today I found a function I had not seen before, erts_internal:term_type/1, and I looked at how it is implemented:

BIF_RETTYPE erts_internal_term_type_1(BIF_ALIST_1) {
    Eterm obj = BIF_ARG_1;
    switch (primary_tag(obj)) {
        case TAG_PRIMARY_LIST:
            BIF_RET(ERTS_MAKE_AM("list"));
        case TAG_PRIMARY_BOXED: {
            Eterm hdr = *boxed_val(obj);
            ASSERT(is_header(hdr));
            switch (hdr & _TAG_HEADER_MASK) {
                case ARITYVAL_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("tuple"));
                case FUN_SUBTAG:
                    {
                        ErlFunThing *funp = (ErlFunThing *)fun_val(obj);
                        if (is_local_fun(funp)) {
                            BIF_RET(ERTS_MAKE_AM("fun"));
                        } else {
                            ASSERT(is_external_fun(funp) && funp->next == NULL);
                            BIF_RET(ERTS_MAKE_AM("export"));
                        }
                    }
                case MAP_SUBTAG:
                    switch (MAP_HEADER_TYPE(hdr)) {
                        case MAP_HEADER_TAG_FLATMAP_HEAD :
                            BIF_RET(ERTS_MAKE_AM("flatmap"));
                        case MAP_HEADER_TAG_HAMT_HEAD_BITMAP :
                        case MAP_HEADER_TAG_HAMT_HEAD_ARRAY :
                            BIF_RET(ERTS_MAKE_AM("hashmap"));
                        case MAP_HEADER_TAG_HAMT_NODE_BITMAP :
                            BIF_RET(ERTS_MAKE_AM("hashmap_node"));
                        default:
                            erts_exit(ERTS_ABORT_EXIT, "term_type: bad map header type %d\n", MAP_HEADER_TYPE(hdr));
                    }
                case REFC_BINARY_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("refc_binary"));
                case HEAP_BINARY_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("heap_binary"));
                case SUB_BINARY_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("sub_binary"));
                case BIN_MATCHSTATE_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("matchstate"));
                case POS_BIG_SUBTAG:
                case NEG_BIG_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("bignum"));
                case REF_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("reference"));
                case EXTERNAL_REF_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("external_reference"));
                case EXTERNAL_PID_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("external_pid"));
                case EXTERNAL_PORT_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("external_port"));
                case FLOAT_SUBTAG:
                    BIF_RET(ERTS_MAKE_AM("hfloat"));
                default:
                    erts_exit(ERTS_ABORT_EXIT, "term_type: Invalid tag (0x%X)\n", hdr);
            }
        }
        case TAG_PRIMARY_IMMED1:
            switch (obj & _TAG_IMMED1_MASK) {
                case _TAG_IMMED1_SMALL:
                    BIF_RET(ERTS_MAKE_AM("fixnum"));
                case _TAG_IMMED1_PID:
                    BIF_RET(ERTS_MAKE_AM("pid"));
                case _TAG_IMMED1_PORT:
                    BIF_RET(ERTS_MAKE_AM("port"));
                case _TAG_IMMED1_IMMED2:
                    switch (obj & _TAG_IMMED2_MASK) {
                        case _TAG_IMMED2_ATOM:
                            BIF_RET(ERTS_MAKE_AM("atom"));
                        case _TAG_IMMED2_CATCH:
                            BIF_RET(ERTS_MAKE_AM("catch"));
                        case _TAG_IMMED2_NIL:
                            BIF_RET(ERTS_MAKE_AM("nil"));
                        default:
                            erts_exit(ERTS_ABORT_EXIT, "term_type: Invalid tag (0x%X)\n", obj);
                    }
                default:
                    erts_exit(ERTS_ABORT_EXIT, "term_type: Invalid tag (0x%X)\n", obj);
            }
        default:
            erts_exit(ERTS_ABORT_EXIT, "term_type: Invalid tag (0x%X)\n", obj);
    }
}

Wouldn't it be better to write the return value as BIF_RET(am_xxx) instead of BIF_RET(ERTS_MAKE_AM("XXX"))?

rickard · June 13, 2022, 1:11pm

ERTS_MAKE_AM() will create the atom when called while the am_xxx atoms are created when the system starts. Once an atom is created it cannot be removed. We typically use ERTS_MAKE_AM() when creating atoms that are not frequently used, since we do not want to unnecessarily create atoms.

rickard · June 13, 2022, 1:14pm

By the way, you typically do not want to use erts_internal:term_type() since it is erts internal. It may be changed, or removed at any time without any notice.

SisMaker · June 13, 2022, 2:49pm

Adding atom definitions for the term types should be acceptable, since there are not that many of them. Also, all term types should be explained in the documentation.

SisMaker · June 13, 2022, 2:51pm

I see that the Erlang/OTP 22 highlights blog post says:

Added the NIF function enif_term_type, which helps avoid long sequences of enif_is_xyz by returning the type of the given term. This is especially helpful for NIFs that serialize terms, such as JSON encoders, where it can improve both performance and readability.

I think this reasoning also applies to Erlang code, which often has to branch on the term type as well, for example in io_lib.erl:

write1(_Term, 0, _E) -> "...";
write1(Term, _D, _E) when is_integer(Term) -> integer_to_list(Term);
write1(Term, _D, _E) when is_float(Term) -> io_lib_format:fwrite_g(Term);
write1(Atom, _D, latin1) when is_atom(Atom) -> write_atom_as_latin1(Atom);
write1(Atom, _D, _E) when is_atom(Atom) -> write_atom(Atom);
write1(Term, _D, _E) when is_port(Term) -> write_port(Term);
write1(Term, _D, _E) when is_pid(Term) -> pid_to_list(Term);
write1(Term, _D, _E) when is_reference(Term) -> write_ref(Term);
write1(<<_/bitstring>>=Term, D, _E) -> write_binary(Term, D);
write1([], _D, _E) -> "[]";
write1({}, _D, _E) -> "{}";
write1([H|T], D, E) ->
    if
        D =:= 1 -> "[...]";
        true ->
            [$[,[write1(H, D-1, E)|write_tail(T, D-1, E)],$]]
    end;
write1(F, _D, _E) when is_function(F) ->
    erlang:fun_to_list(F);
write1(Term, D, E) when is_map(Term) ->
    write_map(Term, D, E);
write1(T, D, E) when is_tuple(T) ->
    if
        D =:= 1 -> "{...}";
        true ->
            [${,
             [write1(element(1, T), D-1, E)|write_tuple(T, 2, D-1, E)],
             $}]
    end.

also like this:

toBinary(Value) when is_integer(Value) -> integer_to_binary(Value);
toBinary(Value) when is_list(Value) -> list_to_binary(Value);
toBinary(Value) when is_float(Value) -> float_to_binary(Value, [{decimals, 6}, compact]);
toBinary(Value) when is_atom(Value) -> atom_to_binary(Value, utf8);
toBinary(Value) when is_binary(Value) -> Value;
toBinary([Tuple | PropList] = Value) when is_list(PropList) and is_tuple(Tuple) ->
    lists:map(fun({K, V}) -> {toBinary(K), toBinary(V)} end, Value);
toBinary(Value) -> term_to_binary(Value).

toInteger(undefined) -> undefined;
toInteger(Value) when is_float(Value) -> trunc(Value);
toInteger(Value) when is_list(Value) -> list_to_integer(Value);
toInteger(Value) when is_binary(Value) -> binary_to_integer(Value);
toInteger(Value) when is_tuple(Value) -> toInteger(tuple_to_list(Value));
toInteger(Value) when is_integer(Value) -> Value.

and so on

rickard · June 13, 2022, 3:21pm

Not when it is not intended to be used frequently

Not sure I understand. All term types are documented. Internal functionality should not be documented in the API.

Erlang code is a different matter from the NIF API, which accesses functionality in the VM from a dynamically loaded library. In Erlang code a term_type() function is seldom useful: you just end up with more tests than when testing the type directly with the guard BIFs.

SisMaker · June 13, 2022, 3:51pm

term_type/1 and the guard BIFs serve two different purposes, and a single call to term_type/1 is slower than a single guard BIF. However, if you use guard BIFs to test for several different types and then process the term, that should be slower than calling term_type/1 directly. My suggestion that the term_type/1 C code return BIF_RET(am_xxx) is meant to make it execute as fast as possible.

rickard · June 13, 2022, 4:01pm

I don’t see why using an optimized term_type() BIF should be faster except in strange cases such as when just printing the term type. Rewriting the io_lib code, that you referred to, to use a term_type() would just get slower since you need to test the type twice. First in the term_type() BIF and then in Erlang code testing the result from the call to term_type().
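The double test described above can be sketched as follows. This is only a minimal illustration, not code from OTP: the module name type_dispatch_sketch and the functions direct/1, via_type/1, and classify/1 are hypothetical, and classify/1 is a plain-Erlang stand-in for a term_type()-style function so that the example does not rely on anything internal.

```erlang
-module(type_dispatch_sketch).
-export([direct/1, via_type/1]).

%% Direct dispatch: the guard BIFs are tried in order and the matching
%% clause does the work right away.
direct(T) when is_integer(T) -> integer_to_list(T);
direct(T) when is_atom(T)    -> atom_to_list(T);
direct(T) when is_pid(T)     -> pid_to_list(T);
direct(_)                    -> "?".

%% Hypothetical stand-in for a term_type()-style function.
classify(T) when is_integer(T) -> integer;
classify(T) when is_atom(T)    -> atom;
classify(T) when is_pid(T)     -> pid;
classify(_)                    -> other.

%% Dispatch through a type atom: the type is determined first, and the
%% resulting atom then has to be matched a second time by the case.
via_type(T) ->
    case classify(T) of
        integer -> integer_to_list(T);
        atom    -> atom_to_list(T);
        pid     -> pid_to_list(T);
        other   -> "?"
    end.
```

Both functions return the same result for the same input; via_type/1 simply performs one extra dispatch step on the way there.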

SisMaker · June 13, 2022, 4:22pm

I wrote some test code, and you are right:

-module(funTest).
-export([type1/0, type2/0]).

-define(types, [1, 1.1, [], [1], {1}, #{}, <<"123">>, self()]).

%% Variant 1: dispatch directly on the guard BIFs.
type1() ->
    [dataType(One) || One <- ?types],
    ok.

dataType(Data) when is_list(Data) -> list_do;
dataType(Data) when is_integer(Data) -> integer_do;
dataType(Data) when is_binary(Data) -> binary_do;
dataType(Data) when is_function(Data) -> function_do;
dataType(Data) when is_tuple(Data) -> tuple_do;
dataType(Data) when is_atom(Data) -> atom_do;
dataType(Data) when is_float(Data) -> float_do;
dataType(Data) when is_number(Data) -> number_do;
dataType(Data) when is_pid(Data) -> pid_do;
dataType(Data) when is_port(Data) -> port_do;
dataType(_Data) -> not_know_do.

%% Variant 2: call erts_internal:term_type/1 first, then dispatch on its result.
type2() ->
    [dataType(erts_internal:term_type(One), One) || One <- ?types],
    ok.

dataType(nil, Data) when is_list(Data) -> list_do;
dataType(list, Data) when is_list(Data) -> list_do;
dataType(fixnum, Data) when is_integer(Data) -> integer_do;
dataType(bignum, Data) when is_integer(Data) -> integer_do;
dataType(refc_binary, Data) when is_binary(Data) -> binary_do;
dataType(heap_binary, Data) when is_binary(Data) -> binary_do;
dataType(sub_binary, Data) when is_binary(Data) -> binary_do;
dataType(tuple, Data) when is_tuple(Data) -> tuple_do;
dataType(atom, Data) when is_atom(Data) -> atom_do;
dataType(hfloat, Data) when is_float(Data) -> float_do;
dataType(pid, Data) when is_pid(Data) -> pid_do;
dataType(external_pid, Data) when is_pid(Data) -> pid_do;
dataType(port, Data) when is_port(Data) -> port_do;
dataType(_, _Data) -> not_know_do.

the test result:

26> utTc:ts(1000000, funTest, type1, []).
=====================
execute Args:
execute Fun :type1
execute Mod :funTest
execute LoopTime:1000000
MaxTime: 3263392(ns) 0.003263(s)
MinTime: 114(ns) 0.0(s)
SumTime: 159286844(ns) 0.159287(s)
AvgTime: 159.286844(ns) 0.0(s)
Grar : 20396(cn) 0.02(%)
Less : 979604(cn) 0.98(%)
=====================
ok
27> utTc:ts(1000000, funTest, type2, []).
=====================
execute Args:
execute Fun :type2
execute Mod :funTest
execute LoopTime:1000000
MaxTime: 6471031(ns) 0.006471(s)
MinTime: 654(ns) 0.000001(s)
SumTime: 736976879(ns) 0.736977(s)
AvgTime: 736.976879(ns) 0.000001(s)
Grar : 27301(cn) 0.03(%)
Less : 972699(cn) 0.97(%)
=====================
ok

The term_type/1 version is much slower: about 737 ns versus 159 ns on average per iteration. That was a bit of a surprise to me.
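For anyone who wants to repeat a comparison like this without the utTc helper (which is not shown in this thread), a rough sketch using only timer:tc/1 from the standard library might look like the following. The module name bench_sketch and the function avg_ns/2 are hypothetical, and this does not reproduce the MaxTime/MinTime statistics printed above; it only estimates an average.

```erlang
-module(bench_sketch).
-export([avg_ns/2]).

%% Runs Fun() N times and returns the average wall-clock time per call
%% in nanoseconds. timer:tc/1 reports elapsed time in microseconds.
avg_ns(Fun, N) when is_function(Fun, 0), is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, N) end),
    Micros * 1000 / N.

%% Calls Fun() N times, discarding the results.
repeat(_Fun, 0) -> ok;
repeat(Fun, N) ->
    _ = Fun(),
    repeat(Fun, N - 1).
```

Assuming type1/0 and type2/0 are exported from funTest, the two variants could then be compared with bench_sketch:avg_ns(fun funTest:type1/0, 1000000) and bench_sketch:avg_ns(fun funTest:type2/0, 1000000). Absolute numbers will differ between machines and OTP versions, so only the ratio between the two runs is meaningful.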