dosrank.tex 2.74 KB
 yorn committed Jul 03, 2014 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 \chapter{DOSRank} \label{chp:dosrank} To test Giraph, two simple algorithms, DOSRank and ReverseDOSRank are created, which count the amount of respectively incoming and outgoing edges from or to a vertex. This proof of concept will count the amount of flows per \gls{service}. It can be used to detect if a machine is executing or suffering from a DoS attack, as these are typically identified by a large number of flows. These two algorithms will count respectively the amount of outgoing and the amount of incoming edges. DOSRank (equation~\ref{eq:dosrank}, algorithm~\ref{alg:dosrank}), which will calculate outgoing edges (incoming connections) is trivial; it will set its value during the first superstep and vote to halt. ReverseDOSRank (equation~\ref{eq:reversedosrank}, algorithm~\ref{alg:reversedosrank}) is a bit more complex; since incoming edges are not visible in Giraph, the first superstep is used to send a message over every edge. During the second superstep, every vertex will receive an amount of messages that is equal to its amount of incoming edges. \label{eq:dosrank} f(x) = N^{-}(x). \label{eq:reversedosrank} f(y) = N^{+}(y). \begin{algorithm}[H] \label{alg:dosrank} \caption{DOSRank} \begin{verbatim} result = vertex.getNumEdges(); vertex.voteToHalt(); \end{verbatim} \end{algorithm} \begin{algorithm}[H] \label{alg:reversedosrank} \caption{ReverseDOSRank} \begin{verbatim} if superstep = 0 then begin result = 0; sendMessageToAllEdges(); end else begin while(messages.hasNext) do message.next(); result++; end; vertex.voteToHalt(); end. \end{verbatim} \end{algorithm} \section{Expected results} The algorithm will find IP addresses that are associated with large amounts of flows. Opening a large amount of flows is different from sending large amounts of data, as sending a large amount of data is often a completely legitimate thing to do: For example, sharing files via peer-to-peer protocols or serving large files over HTTP. A large amount of flows, however, may be an indication that something is amiss. It can indicate port scanning, or even targeted attacks like SYN flooding. However, a large amount of flows can also simply indicate a very active or popular host. \section{Purpose} These algorithms are just a proof-of-concept to show that graph systems can be used for traffic analysis. The algorithm itself is most likely easier to implement using other systems. A simple MapReduce algorithm could probably yield the same results a smaller amount of time~\cite{Morken352472}. However, this algorithm does show that Giraph can be used for traffic analysis, and it opens the door for more complex algorithms, like SpreadRank.