# How to write k-mers better?

```erlang
-module(kmer).
-export([calculate/1]).

% Run this in erlang shell:
% c("kmer.erl").
% kmer:calculate(Number).

get_timestamp() ->
    {Mega, Sec, Micro} = os:timestamp(),
    (Mega*1000000 + Sec)*1000 + round(Micro/1000).

calculate(N) ->
    Now = get_timestamp(),
    Kmers = calculate(N, string:copies("A", N), string:copies("T", N)),
    Delta = get_timestamp() - Now,
    io:fwrite("Number of generated k-mers: ~p - took ~pms~n", [Kmers, Delta]).

calculate(N, Start, Stop) -> calculate(N, Start, Stop, 1).

calculate(N, Start, Stop, C) ->
    if
        Start == Stop -> C;
        true -> calculate(N, next(N, Start), Stop, C + 1)
    end.

next(N, Start) -> next(N, 1, Start, "T").

next(N, I, Start, "T") ->
    C = string:sub_string(Start, I, I),
    New = convert(C),
    String = string:left(Start, I - 1) ++ New ++ string:right(Start, N - I),
    next(N, I + 1, String, C);
next(_, _, Start, _) ->
    % io:fwrite("~p~n", [Start]),
    Start.

convert(S) ->
    case S of
        "A" -> "C";
        "C" -> "G";
        "G" -> "T";
        "T" -> "A";
        _Else -> " "
    end.
```

Can I somehow write this better/faster?


What is this supposed to do?


My 2 cents:

• You are measuring time by hand with `get_timestamp/0`. You can use `timer:tc/2` to do this for you.
• In `next/4` you should pattern match on the first character of `Start` and then construct the result list recursively. Extracting the n-th character by hand and then recreating the whole string with `string:left(Start, I - 1) ++ New ++ string:right(Start, N - I)` is O(n) per step, so `next` is basically O(n^2).
• The last parameter of `next` is always one character long. Does it even need to be a list?
• Same for `convert`: why not use just a character? I guess this is some genetic computing, so if you don't expect input other than A, C, G or T, just let it crash. Also, you can pattern match in the function arguments.
• Also, you are using some obsolete functions from the `string` module. This works, but take a look at the documentation for their replacements.
• You can use pattern matching in `calculate/4` in the following way:
```erlang
calculate(_, Stop, Stop, C) -> C; %% Matches only when the 2nd and 3rd arguments are equal
calculate(N, Start, Stop, C) -> calculate(N, next(N, Start), Stop, C + 1).
```
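For the timing point, a minimal sketch of what `timer:tc/2` usage could look like. The module name `kmer_timing` and the wrapper `timed/2` are made up for illustration; only `timer:tc/2` itself comes from OTP, and it returns `{Microseconds, Result}`.

```erlang
-module(kmer_timing).
-export([timed/2]).

%% Hypothetical helper: runs Fun with the argument list Args,
%% prints the result and the elapsed time, and returns the result.
timed(Fun, Args) ->
    %% timer:tc/2 applies Fun to Args and measures the call in microseconds.
    {Micros, Result} = timer:tc(Fun, Args),
    io:fwrite("Result: ~p - took ~pms~n", [Result, Micros div 1000]),
    Result.
```

With the original module loaded, something like `kmer_timing:timed(fun kmer:calculate/1, [8]).` would replace the hand-written `get_timestamp/0` bookkeeping.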

So next function can be something like this:

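```erlang
next(N, Start) -> next(N, 1, Start, $T).

%% The last argument is the character that was just replaced; the "carry"
%% continues to the next position only while that character was $T.
next(N, I, [C|Rest], $T) ->
    [convert(C) | next(N, I+1, Rest, C)];
next(_, _, Start, _) ->
    Start.

%% $A, $C, $G and $T are character literals, so they match the elements
%% of a string such as "ACGT" (quoted 'A' would be an atom and never match).
convert($A) -> $C;
convert($C) -> $G;
convert($G) -> $T;
convert($T) -> $A.
```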

edit: Also, you can get rid of the first argument in next/2, next/4, calculate/3 and calculate/4 if you write it like this. The suggested function is not tail recursive, so a really long input would probably crash it, but it can easily be rewritten as a tail-recursive function by adding an accumulator argument.
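A rough sketch of that tail-recursive variant, with the unused `N` and `I` arguments dropped. The module name `kmer_sketch` and the function `count/1` are hypothetical names chosen for this example, and it assumes k-mers are plain strings over A, C, G and T; anything else crashes with a `function_clause` error.

```erlang
-module(kmer_sketch).
-export([next/1, count/1]).

%% Returns the k-mer that follows Kmer, treating the first character
%% as the least significant "digit" (same order as the original code).
next(Kmer) -> next(Kmer, []).

%% Acc collects the already rewritten prefix in reverse order.
next([$T | Rest], Acc) ->
    %% T wraps around to A and the carry moves on to the next position.
    next(Rest, [$A | Acc]);
next([C | Rest], Acc) ->
    %% Any other character is bumped once and the carry stops here.
    lists:reverse(Acc, [convert(C) | Rest]);
next([], Acc) ->
    %% Every position was T, so the k-mer wrapped around to all A.
    lists:reverse(Acc).

convert($A) -> $C;
convert($C) -> $G;
convert($G) -> $T.

%% Counts the k-mers of length N by walking from "AA..A" to "TT..T".
count(N) ->
    count(lists:duplicate(N, $A), lists:duplicate(N, $T), 1).

count(Stop, Stop, C) -> C;
count(Kmer, Stop, C) -> count(next(Kmer), Stop, C + 1).
```

Here `lists:reverse/2` puts the reversed accumulator in front of the untouched tail, so each step only rebuilds the part of the string that actually changed, and nothing from the `string` module is needed.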


Define “faster”, and also say which OTP version you’re using (also, shameless plug: you probably want to use erlperf to answer the question “which version of my code is faster”).

The very obvious thing, though, is not to use `string`.
