dosrank.tex 2.74 KB
Newer Older
yorn's avatar
yorn committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69

To test Giraph, two simple algorithms, DOSRank and ReverseDOSRank are created, which count the amount of respectively incoming and outgoing edges from or to a vertex.
This proof of concept will count the amount of flows per \gls{service}.
It can be used to detect if a machine is executing or suffering from a DoS attack,
 as these are typically identified by a large number of flows.
These two algorithms will count respectively the amount of outgoing and the amount of incoming edges.
DOSRank (equation~\ref{eq:dosrank}, algorithm~\ref{alg:dosrank}), which will calculate outgoing edges (incoming connections) is trivial; it will set its value during the first superstep and vote to halt.
ReverseDOSRank (equation~\ref{eq:reversedosrank}, algorithm~\ref{alg:reversedosrank}) is a bit more complex;
 since incoming edges are not visible in Giraph, the first superstep is used to send a message over every edge.
During the second superstep, every vertex will receive an amount of messages that is equal to its amount of incoming edges.

	f(x) = N^{-}(x).
	f(y) = N^{+}(y).

	result = vertex.getNumEdges();
	if superstep = 0 then
	    result = 0;

\section{Expected results}
The algorithm will find IP addresses that are associated with large amounts of flows.
Opening a large amount of flows is different from sending large amounts of data,
 as sending a large amount of data is often a completely legitimate thing to do:
For example, sharing files via peer-to-peer protocols or serving large files over HTTP.
A large amount of flows, however, may be an indication that something is amiss.
It can indicate port scanning, or even targeted attacks like SYN flooding.
However, a large amount of flows can also simply indicate a very active or popular host.

These algorithms are just a proof-of-concept to show that graph systems can be used for traffic analysis.
The algorithm itself is most likely easier to implement using other systems.
A simple MapReduce algorithm could probably yield the same results a smaller amount of time~\cite{Morken352472}.
However, this algorithm does show that Giraph can be used for traffic analysis,
 and it opens the door for more complex algorithms, like SpreadRank.