Apologies if my tone was accusatory. I believe your intent.
It just scares me seeing things like this posted as there’s always the possibility someone will stumble across it and treat is otherwise
Apologies if my tone was accusatory. I believe your intent.
It just scares me seeing things like this posted as there’s always the possibility someone will stumble across it and treat is otherwise
We actually have a layer over epgsql which does something very similar to this as large queries with many arguments (we have some with 50+ arguments) are extremely difficult to maintain.
Our implementation extracts names from the query and replaces them with $X offset tokens, while the arguments (tagged tuples) are sorted according to the final positions. (cached as a fun using a hash of the sql as a key)
Just one point on iolists - people vastly overestimate their advantage over flat binaries. In theory, they are vastly superior, in practice, modern computers are really, really fast at copying consecutive runs of bytes, and really, really slow at dereferencing pointers. This makes flat binaries frequently way faster, despite extra copying, compared to iolists.
There’s also a further difference in terms of GC - a binary is visited by the GC just once and the work involved is independent of its size. For iolist, each cell has to be visited separately and copied separately - this further negatively impacts performance of iolists, if they’re held across GCs. This means, iolists often don’t even avoid the extra copying, but in fact create even more of it because of GC.
That’s my experience as well, hence focussing on flat binaries rather than iolists in my string interpolation PR.
iolists can balloon in memory usage due to the references in cons cells to the next element in the list: For iolists made up of many nested small lists, just references to the empty list can start to noticeably add up.
Well, now we’re firmly in the off-topic territory. Let’s say, performance of binaries vs. iolists is not a trivial topic, and your comment does not give a full picture. There are many factors to performance. Consider a crude benchmark:
-module(foo).
-export([test1/0, test2/0]).
test1() ->
Cnt = counters:new(1, []),
_ = [spawn(fun() -> test1_entry(Cnt) end) || _ <- lists:seq(1, 1000)],
stats(Cnt).
test1_entry(Cnt) ->
FD = file:open("/dev/null", [write]),
test1_loop(Cnt, FD, 0).
test1_loop(Cnt, FD, N) ->
L = [<<"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.">>,
integer_to_binary(N),
<<"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.">>
],
file:write(FD, L),
counters:add(Cnt, 1, 1),
test1_loop(Cnt, FD, N + 1).
test2() ->
Cnt = counters:new(1, []),
_ = [spawn(fun() -> test2_entry(Cnt) end) || _ <- lists:seq(1, 1000)],
stats(Cnt).
test2_entry(Cnt) ->
FD = file:open("/dev/null", [write]),
test2_loop(Cnt, FD, 0).
test2_loop(Cnt, FD, N) ->
L = <<"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.",
(integer_to_binary(N))/binary,
"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.">>,
file:write(FD, L),
counters:add(Cnt, 1, 1),
test2_loop(Cnt, FD, N + 1).
stats(Cnt) ->
counters:put(Cnt, 1, 0),
timer:sleep(1000),
Mem = erlang:memory(total),
io:format("Throughput: ~p/s, memory: ~p~n", [counters:get(Cnt, 1), Mem]),
stats(Cnt).
Results are the following:
Erlang/OTP 26 [erts-14.2.1] [emqx] [64-bit] [smp:32:16] [ds:32:16:10] [async-threads:1] [jit:ns]
Eshell V14.2.1 (press Ctrl+G to abort, type help(). for help)
1> foo:test1().
Throughput: 35134765/s, memory: 61209080
Throughput: 36291150/s, memory: 61982736
Throughput: 36185643/s, memory: 62633400
Throughput: 36368691/s, memory: 63591624
Throughput: 36867562/s, memory: 64061656
Throughput: 36516506/s, memory: 63795704
Throughput: 36972384/s, memory: 64809536
Throughput: 37141779/s, memory: 65154584
...
1> foo:test2().
Throughput: 26663659/s, memory: 69840232
Throughput: 27589949/s, memory: 71139528
Throughput: 27328361/s, memory: 71466504
Throughput: 27383143/s, memory: 72615664
Throughput: 27765711/s, memory: 72960424
Throughput: 27513227/s, memory: 72635704
Throughput: 27595808/s, memory: 74103488
Throughput: 27808763/s, memory: 73523152
Throughput: 28193113/s, memory: 76144808
Throughput: 27893063/s, memory: 75400704
Throughput: 27843999/s, memory: 76065376
Throughput: 28471117/s, memory: 77018216
Throughput: 27874904/s, memory: 77206560
Throughput: 28052121/s, memory: 76884808
Throughput: 28460747/s, memory: 78394864
If I had to make a wild guess why iolists clearly beat flat binaries in this example, it’s probably due to literals. Though I am not 100% sure whether BEAM runtime makes the best use of the fact that some parts of the template are static literals that can be shared, it’s most likely the case. Then the problem of computers being “really, really slow at dereferencing pointers” (a.k.a memory latency) is mitigated by the cache.
P.S. I am not trying to say that iolists are always better, and I myself can easily come up with the scenarios when the opposite is true, so let’s not derail the thread any further.
Can I suggest taking a step back, and instead of trying to implement a string formatting sigil directly, consider bringing sigils into the Erlang syntax tree?
Right now it seems like the lexer does the actual string transformation and sigils disappear from the syntax tree. Lifting them into the syntax tree would allow processing sigils in a parse_transform
.
Personally, I would have liked a sql
sigil that generates a {sql, StatementText, StatementParams}
tuple, but today this is impossible to implement in a parse_transform
.
I was thinking about this today and I have decided to develop an additional proposal in response to the suggestions provided by @wojtekmach and @asabil: Allow user-defined sigils via parse transforms.
The implementation is pretty simple and should have its cons, but it allows the developer to implement any sigil it wants and does not break the existing ones.
Please refer to the PR and the example project for the full description and examples.
The developer can use the built-in sigils, like s
, S
, b
, and B
, but it can also use any sigil it wants. However, it is obligated to handle the custom sigil with parse transform, otherwise, an invalid sigil
exception will be raised at the compile time.
It was the first idea and seems to be the simplest one to implement this feature.
-module(example).
-compile({parse_transform, json_sigil}).
-compile({parse_transform, sql_sigil}).
%% Sigils can be defined globally in rebar3.config (applies to all .erl files), e.g.:
%{erl_opts, [
% {parse_transform, json_sigil},
% {parse_transform, sql_sigil}
%]}.
-export([json/0]).
-export([sql/2]).
json() ->
~json"""
{"key": "value"}
""".
sql(Name, MinAge) when is_binary(Name); is_integer(MinAge) ->
~sql"""
SELECT *
FROM users
WHERE name = {Name}
AND age >= {MinAge}
""".
%% Uncomment this function to raise an invalid sigil exception.
%invalid() ->
% ~invalid"".
1> c(json_sigil), c(sql_sigil), c(example).
{ok,example}
2> example:json().
#{<<"key">> => <<"value">>}
3> example:sql(~"Bob", 18).
{sql,<<"SELECT *\n FROM users\n WHERE name = $1\n AND age >= $2">>,
[<<"Bob">>,18]}
4> ~invalid"".
* 1:1: invalid sigil
As per the PR…
I have no idea what the core team has planned for sigils.
This is just to initiate a discussion about the future of sigils.
Any criticism or suggestion is welcome
sql would probably need to handle tuples for {Value, type()} since afaik most of the erlang postgresql libraries use the provided type for encoding values
As there has been a lot of talk about SQL sigils, performance and safety. Then I’m going to put this here: GitHub - elixir-dbvisor/sql: Brings an extensible SQL parser and sigil to Elixir, confidently write SQL with automatic parameterized queries.
Feel free to ask me any question you might have. Would be great to generete an Erlang SQL sigil at some point, or even the lexer and parser.