Hi all,
I wrote an Erlang library that implements the Aho-Corasick algorithm here. I have tried everything I could think of to optimize its performance, but the results are still not satisfactory. Could someone please give me some advice?
To reproduce, add {d, 'DEBUG'} to rebar.config and compile.
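For anyone following along, this is roughly what the rebar.config entry might look like. It assumes the macro is passed as a compiler option through erl_opts, which is the usual place for {d, Macro} in rebar3. Any other options already in the list would stay as they are.

```erlang
%% rebar.config (sketch; assumes DEBUG is defined through erl_opts)
{erl_opts, [{d, 'DEBUG'}]}.
```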
Then call the following function:
aho_corasick:gen_acs_by_filename(" keywords",asc).
{ok, acs}.
An acs.erl file with many function clauses is generated, like this:
…
success(342) -> {20250,343};
success(174) -> {21806,175};
success(257) -> {113,258};
success(142) -> #{22920 => 143,29240 => 145};
success(182) -> {35265,183};
success(159) -> {30005,160};
…
success(228) -> {36733,229};
success(328) -> #{21608 => 329,30701 => 339};
success(313) -> {21806,314};
success(105) -> {25506,106};
success(263) -> {113,264};
success(_) -> false.
failure(234) -> {239, undefined};
failure(318) -> {288, undefined};
failure(174) -> {313, undefined};
failure(142) -> {260, undefined};
…
failure(265) -> {0, 3};
failure(11) -> {0, 3};
failure(205) -> {0, 3};
failure(62) -> {0, 2};
failure(343) -> {0, 3};
failure(_) -> {0, undefined}.
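To make the shape of these clauses concrete, here is a minimal sketch of how a single transition could be looked up from them. It is not the library's actual code: the module name acs_step and the function goto/2 are hypothetical, and the three success/1 clauses are copied from the excerpt above so the example is self-contained. It only illustrates the three return shapes (a {Char, Next} tuple for a single edge, a map for several edges, and false for none).

```erlang
-module(acs_step).
-export([goto/2]).

%% Hypothetical helper, not part of the library.
%% Follows the edge labelled Char out of State, if there is one.
goto(State, Char) ->
    case success(State) of
        {Char, Next}    -> {ok, Next};   % single edge whose label equals Char
        #{Char := Next} -> {ok, Next};   % several edges, stored in a map
        _               -> error         % no edge for Char (includes false)
    end.

%% Three clauses copied from the generated acs.erl excerpt.
success(257) -> {113,258};
success(142) -> #{22920 => 143,29240 => 145};
success(_) -> false.
```

With these clauses, acs_step:goto(257, 113) follows the single edge, acs_step:goto(142, 29240) goes through the map, and any other character falls through to error, at which point a real matcher would consult failure/1.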
Finally, run the matcher:
aho_corasick:matches(acs, SoneSensitiveWords).
[{4,2},{4,1},{3,1},{2,1},…]
I looked at how the ‘select_val’ instruction is implemented, so I have a few questions:
- Will matching against a large number of function clauses (>100000) be faster than looking up a map?
- Can a function with a large number of clauses be dispatched through a jump table?
- Do you have any other suggestions for optimization?
Respect!