Variable shadowing constraint of match spec generated using ms_transform

Looks like that is a variable shadowing constraint of match spec that is generated using ms_transform. For example,

get_match_spec(Id) ->
    ets:fun2ms(fun({Id, Content} = Item) -> {Id, [mark | Content]} end).
Warning: variable 'Id' shadowed in ms_transform fun head

get_match_spec(123).
Output: [{{'$1','$2'},[],[{{'$1',[mark|'$2']}}]}]

Thus ets:select_replace can’t efficiently be used to select_replace an item with the given Id. One way is to use guard

get_match_spec(Id) ->
    ets:fun2ms(fun({Id1, Content} = Item) when Id1 == Id -> {Id, [mark | Content]} end).

test_fun2ms:get_match_spec(123).
Output: [{{'$1','$2'},
  [{'==','$1',{const,123}}],
  [{{{const,123},[mark|'$2']}}]}]
But then it needs to scan the whole ets table because the match of the Id is not in the MatchHead.

Is there any way to solve this variable shadowing issue? Otherwise this constraint makes match spec select_replace of little usage.

2 Likes

Why use ets:fun2ms at all?

I’ve always viewed fun2ms as a useful tool for the shell when developing, but never ever used it in production code.

1 Like

Because if you include:

-include_lib("stdlib/include/ms_transform.hrl").

It converts your very readable fun into the matchspec for you at compile time for free, right?

1 Like

I think the problem is the Id in the function, you need to use something (untested) more like:

get_match_spec(Id) ->
  ets:fun2ms(fun({I, Content} = Item) when I == Id -> {I, [mark | Content]} end).

Updated: …now tested:

17> c(test).                 
{ok,test}
18> test:get_match_spec(moo).
[{{'$1','$2'},
  [{'==','$1',{const,moo}}],
  [{{'$1',[mark|'$2']}}]}]

Contents of test.erl is:

-module(test).

-include_lib("stdlib/include/ms_transform.hrl").

-export([get_match_spec/1]).

get_match_spec(Id) ->
  ets:fun2ms(fun({I, Content} = _Item) when I == Id -> {I, [mark | Content]} end).
1 Like

This is a general language quirk with funs. You cannot refer and match already bound variables in the fun’s function head.

Workaround: As @LeonardB hinted, paste the matchspec from fun2ms in your code and substitute '$1' with your Id variable. Maybe keep the fun syntax in a comment for improved readability.

1 Like

Just a small example to show the performance difference between a $X bound match and a value match in the match spec.

Eshell V13.0  (abort with ^G)
1> ets:new(test, [named_table, ordered_set, {keypos, 1}]).
test
2> [ets:insert(test, {X,X}) || X <- lists:seq(1,1_000_000)].
[true,true,true,true,true,true,true,true,true,true,true,
 true,true,true,true,true,true,true,true,true,true,true,true,
 true,true,true,true,true,true|...]
3> ets:fun2ms(fun({I, Content}) when I == 1_000_000 -> {I, [mark | Content]} end).
[{{'$1','$2'},[{'==','$1',1000000}],[{{'$1',[mark|'$2']}}]}]
4> ManualMs = [{{1_000_000,'$1'},[],[{{1_000_000,[mark|'$1']}}]}].
[{{1000000,'$1'},[],[{{1000000,[mark|'$1']}}]}]
5> Ms = [{{'$1','$2'},[{'==','$1',1000000}],[{{'$1',[mark|'$2']}}]}].
[{{'$1','$2'},[{'==','$1',1000000}],[{{'$1',[mark|'$2']}}]}]
6> [io:format("~p~n", [timer:tc(fun() -> ets:select(test, Ms) end)]) || _ <- lists:seq(1,10)].
{132734,[{1000000,[mark|1000000]}]}
{126449,[{1000000,[mark|1000000]}]}
{121694,[{1000000,[mark|1000000]}]}
{122800,[{1000000,[mark|1000000]}]}
{120502,[{1000000,[mark|1000000]}]}
{115237,[{1000000,[mark|1000000]}]}
{117500,[{1000000,[mark|1000000]}]}
{114709,[{1000000,[mark|1000000]}]}
{125850,[{1000000,[mark|1000000]}]}
{128768,[{1000000,[mark|1000000]}]}
[ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
7> [io:format("~p~n", [timer:tc(fun() -> ets:select(test, ManualMs) end)]) || _ <- lists:seq(1,10)].
{19,[{1000000,[mark|1000000]}]}
{4,[{1000000,[mark|1000000]}]}
{3,[{1000000,[mark|1000000]}]}
{3,[{1000000,[mark|1000000]}]}
{4,[{1000000,[mark|1000000]}]}
{3,[{1000000,[mark|1000000]}]}
{3,[{1000000,[mark|1000000]}]}
{3,[{1000000,[mark|1000000]}]}
{3,[{1000000,[mark|1000000]}]}
{3,[{1000000,[mark|1000000]}]}
[ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
8> 

And in case you were wondering about duplicate_bag

Eshell V13.0  (abort with ^G)
1> ets:new(test, [named_table, duplicate_bag, {keypos, 1}]).
test
2> [ets:insert(test, {case (X rem 1000) =:= 0 of true -> moo; false -> X end, X}) || X <- lists:seq(1,1_000_000)].
[true,true,true,true,true,true,true,true,true,true,true,
 true,true,true,true,true,true,true,true,true,true,true,true,
 true,true,true,true,true,true|...]
3> Ms = ets:fun2ms(fun({I, Content}) when I == moo -> {I, [mark | Content]} end). 
[{{'$1','$2'},[{'==','$1',moo}],[{{'$1',[mark|'$2']}}]}]
4> 
4> ManualMs = [{{moo,'$1'},[],[{{moo,[mark |'$1']}}]}].            
[{{moo,'$1'},[],[{{moo,[mark|'$1']}}]}]
5>            
5> {MsTime, MsTimes} = lists:foldl(fun(_, {T0, Times}) -> {T1, _} = timer:tc(fun() -> ets:select(test, Ms) end), {T0 + T1, [T1 | Times]} end, {0, []}, lists:seq(1,10)).
{2060486,
 [205817,203358,210674,201202,198532,200487,207024,208889,
  205530,218973]}
6> lists:foreach(fun(T) -> io:format("~w~n", [T]) end, MsTimes).
205817
203358
210674
201202
198532
200487
207024
208889
205530
218973
ok
7> io:format("MsTime Avg: ~w~n", [MsTime / 10]).
MsTime Avg: 206048.6
ok
8> {ManualMsTime, ManualMsTimes} = lists:foldl(fun(_, {T0, Times}) -> {T1, _} = timer:tc(fun() -> ets:select(test, ManualMs) end), {T0 + T1, [T1 | Times]} end, {0, []}, lists:seq(1,10)).
{1140,"jkj{abe[ZÝ"}
9> lists:foreach(fun(T) -> io:format("~w~n", [T]) end, ManualMsTimes).
106
107
106
123
97
98
101
91
90
221
ok
10> io:format("ManualMsTime Avg: ~w~n", [ManualMsTime / 10]).
ManualMsTime Avg: 114.0
ok
1 Like

That’s the “fully bound key” optimization:
Traversals using match and select functions may not need to scan the entire table depending on how the key is specified. A match pattern with a fully bound key (without any match variables) will optimize the operation to a single key lookup without any table traversal at all.

1 Like

Is there a technical edge case why the constant (I am only thinking of the case of a single =:= guard) is not lifted into the bound key by ms_transform or is it just a case of “patches welcome?”

This seems like a candidate for a common pattern, at least for me it is.

1 Like

There already exist a pendng pull request that solves this problem.

and it’s my fault as a slow reviewer that it did not make it into OTP 26.0.

3 Likes

Hi folks, I made the PR @sverker referred to for optimising ets:fun2ms as OP wants. I made that PR, then go distracted with a million other things (have you seen feature: String interpolation by TD5 · Pull Request #7343 · erlang/otp · GitHub :smiley: ), but I’ll revisit this now and get the noise I left in the commit cleaned up.

If there’s any further feedback or requests, see free to comment on the pull request and I’ll see what I can do.

Why use ets:fun2ms at all?

Historically, I suppose you wouldn’t if you were performance sensitive, but it does make for quite readable queries, imo. As you can see from the PR though, we can have the best of both words and just write a query optimised for ets:fun2ms :smiley:

5 Likes

Didn’t know you already had a PR to solve this specific problem. Thank you very much for doing this. It looks like there are always pleasant surprises when you ask in this forum.
Look forward to seeing it merged into mainline.

3 Likes

Stumble across this pull request Add a new operator ^ for pinning of pattern variables by richcarl · Pull Request #2951 · erlang/otp · GitHub and the corresponding EEP: eep/eep-0055.md at master · erlang/eep · GitHub. If this pinning operation is added, it will fix this variable shadowing issue nicely.

2 Likes

I think this is a nice alternative, and before making the optimiser PR, I looked into getting variable pinning merged instead, but it looks like there’s quite a lot of discussion still ongoing around the merits of that change (although I am very much in favour of it!), so it didn’t seem to offer an easy solution to the issues with ms_transform.

1 Like

Yes. It looks like this pinning operation may not be accepted in a short time. It will be faster to get your optimizer PR instead of waiting for it.

2 Likes