Commit 9c2ed2fd

Updated tutorial

git-svn-id: file:///home/svn/mapi/trunk@331 8d5bb341-7cf1-0310-8cf6-ba355fef3186
parent b3abd25d
......@@ -866,6 +866,71 @@ Finally, we close the network flow in order to free all the
resources allocated in every monitoring sensor.
\subsection{Using Anonymization}
This is a simple application that shows some basic anonymization features of MAPI.
\begin{Verbatim}[numbersep=12pt, numbers=left, baselinestretch=1.1, fontsize=\small]
#include <stdio.h>
#include <stdlib.h>
#include <mapi.h>

/* print_IP_pkt() is listed in the appendix */
void print_IP_pkt(struct mapipkt *rec);

int main(int argc, char *argv[]) {
    int fd, fid, connect_status;
    struct mapipkt *pkt;

    fd = mapi_create_flow("eth0");
    if (fd < 0) {
        printf("Flow cannot be created. Exiting..\n");
        exit(-1);
    }

    /* anonymize TCP packets only */
    mapi_apply_function(fd, "BPF_FILTER", "tcp");

    /* map IP addresses to sequential integers (1-to-1 mapping) */
    mapi_apply_function(fd, "ANONYMIZE", IP, SRC_IP, MAP);
    mapi_apply_function(fd, "ANONYMIZE", IP, DST_IP, MAP);

    /* replace IP and TCP options with zeros */
    mapi_apply_function(fd, "ANONYMIZE", IP, OPTIONS, ZERO);
    mapi_apply_function(fd, "ANONYMIZE", TCP, TCP_OPTIONS, ZERO);

    /* remove the payload (keep 0 bytes) */
    mapi_apply_function(fd, "ANONYMIZE", TCP, PAYLOAD, STRIP, 0);

    /* checksum fix in IP fixes checksums in TCP and UDP as well */
    mapi_apply_function(fd, "ANONYMIZE", IP, CHECKSUM, CHECKSUM_ADJUST);

    /* keep the id of TO_BUFFER: mapi_get_next_pkt() reads from it */
    fid = mapi_apply_function(fd, "TO_BUFFER");

    /* connect to the flow */
    connect_status = mapi_connect(fd);
    if (connect_status < 0) {
        printf("Connect failed\n");
        exit(-1);
    }

    while (1) { /* forever, wait for matching packets */
        pkt = mapi_get_next_pkt(fd, fid);
        if (pkt == NULL) /* stop if no packet could be read */
            break;
        printf("\nAnonymized tcp packet captured!\n");
        print_IP_pkt(pkt);
    }

    mapi_close_flow(fd);
    return 0;
}
\end{Verbatim}
In the above example, we create a network flow that captures only TCP packets. We then anonymize
the IP addresses, the IP and TCP options and the TCP payload, and finally fix the TCP/IP checksums. A complete list of
the supported protocols and anonymization functions is provided in Appendix \ref{anon_appendix}.
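
The MAP calls in the example replace each IP address with a sequential integer. As a rough illustration of that idea only, and not of how MAPI implements it, the following standalone C sketch keeps a table of the values seen so far and returns the position of a value in that table. The names {\tt seq\_map}, {\tt seq\_map\_lookup} and {\tt MAP\_CAPACITY} are hypothetical, and both the fixed table size and the numbering starting at 1 are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_CAPACITY 1024 /* arbitrary table size chosen for this sketch */

/* Hypothetical mapping table: original values in order of first appearance. */
struct seq_map {
    uint32_t keys[MAP_CAPACITY];
    size_t count;
};

/*
 * Return the sequential integer assigned to value. A value that has not
 * been seen before gets the next free integer (numbering starts at 1, an
 * assumption of this sketch). Returns 0 if the table is full.
 */
static uint32_t seq_map_lookup(struct seq_map *m, uint32_t value)
{
    assert(m != NULL);

    /* Linear search keeps the sketch short; a hash table would scale better. */
    for (size_t i = 0; i < m->count; i++) {
        if (m->keys[i] == value)
            return (uint32_t)(i + 1);
    }
    if (m->count == MAP_CAPACITY)
        return 0;

    m->keys[m->count++] = value;
    return (uint32_t)m->count;
}
```

Passing the same table for both source and destination addresses gives a host the same replacement value whether it appears as sender or receiver, which is the effect of a mapping shared by SRC\_IP and DST\_IP.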
\appendix
......@@ -1410,4 +1475,61 @@ void print_IP_pkt(struct mapipkt *rec) {
}
\end{Verbatim}
\newpage
\section{Anonymization}\label{anon_appendix}
To anonymize a network flow, apply the ANONYMIZE function:
\begin{code}
mapi_apply_function(flow_descriptor,"ANONYMIZE", PROTOCOL,
FIELD, ANONYMIZATION_FUNCTION, FUNCTION_PARAM_1,...);
\end{code}
\subsection{Predefined Protocol Field Names}
The following is the list of predefined names that can be used as
the {\tt FIELD} parameter:
\begin{itemize}
\item {\bf Common to all protocols}: PAYLOAD
\item {\bf Common to IP, TCP, UDP, ICMP}: CHECKSUM
\item {\bf IP}: SRC\_IP, DST\_IP, TTL, TOS, ID, IP\_PROTO, VERSION, IHL, OPTIONS, FRAGMENT\_OFFSET, PACKET\_LENGTH
\item {\bf Common to TCP and UDP}: SRC\_PORT, DST\_PORT
\item {\bf TCP}: SEQUENCE\_NUMBER, ACK\_NUMBER, FLAGS, WINDOW, TCP\_OPTIONS, URGENT\_POINTER, OFFSET\_AND\_RESERVED
\item {\bf UDP}: UDP\_DATAGRAM\_LENGTH
\item {\bf ICMP}: TYPE, CODE
\item {\bf HTTP}: HTTP\_VERSION, METHOD, URI, USER\_AGENT, ACCEPT, ACCEPT\_CHARSET, ACCEPT\_ENCODING, ACCEPT\_LANGUAGE, ACCEPT\_RANGES, AGE, ALLOW, AUTHORIZATION, CACHE\_CONTROL, CONNECTION\_TYPE, CONTENT\_ENCODING, CONTENT\_TYPE, CONTENT\_LENGTH, CONTENT\_LOCATION, CONTENT\_MD5, CONTENT\_RANGE, COOKIE, DATE, ETAG, EXPECT, EXPIRES, FROM, HOST, IF\_MATCH, IF\_MODIFIED\_SINCE, IF\_NONE\_MATCH, IF\_RANGE, IF\_UNMODIFIED\_SINCE, LAST\_MODIFIED, LOCATION, KEEP\_ALIVE, MAX\_FORWRDS, PRAGMA, \\PROXY\_AUTHENTICATE, PROXY\_AUTHORIZATION, RANGE, REFERRER, RETRY\_AFTER, SET\_COOKIE, SERVER, TE, TRAILER, TRANSFER\_ENCODING, UPGRADE, USER\_AGENT, VARY, VIA, WARNING, WWW\_AUTHENTICATE, X\_POWERED\_BY, RESPONSE\_CODE, RESP\_CODE\_DESCR
\item {\bf FTP}: USER, PASS, ACCT, FTP\_TYPE, STRU, MODE, CWD, PWD, CDUP, PASV, RETR, REST, PORT, LIST, NLST, QUIT, SYST, STAT, HELP, NOOP, STOR, APPE, STOU, ALLO, MKD, RMD, DELE, RNFR, RNTO, SITE, FTP\_RESPONSE\_CODE, FTP\_RESPONSE\_ARG
\end{itemize}
\subsection{Complete List of the Protocol Field Anonymization Functions}
The following is the complete list of functions that can be
applied to the various protocol fields.
\begin{itemize}
\item {\bf UNCHANGED}: leaves field unchanged. This function takes no arguments.
\item {\bf MAP}: maps a field to an integer. SRC\_IP and DST\_IP share one mapping, as do SRC\_PORT and DST\_PORT. The remaining fields share a mapping according to their length: all fields of length 4 share one mapping, all fields of length 2 share another, and all fields of length 1 share a third. MAP can only be applied to header fields, not to the payload or to IP and TCP options. This function takes no arguments.
\item {\bf MAP\_DISTRIBUTION}: the field is replaced by a value drawn from a distribution, such as uniform or Gaussian, with user-supplied parameters. The first parameter selects the type of distribution and can be UNIFORM or GAUSSIAN. For UNIFORM, the next two arguments specify the range from which values are drawn uniformly. For GAUSSIAN, the next two arguments specify the median and the standard deviation. Like MAP, MAP\_DISTRIBUTION can only be applied to IP, TCP, UDP and ICMP header fields, excluding IP and TCP options.
\item {\bf STRIP}: removes the field from the packet. Optionally, STRIP can keep a portion of the field instead of removing all of it; the user specifies the number of bytes to keep. STRIP cannot be applied to IP, TCP, UDP and ICMP header fields, with the exception of IP and TCP options. It can be applied to all HTTP and FTP fields.
\item {\bf RANDOM}: replaces the field with a random number. This function takes no arguments.
\item {\bf FILENAME\_RANDOM}: a sub-case of RANDOM. If the field has the format of a filename, e.g. ``picture.bmp'', the extension is left untouched while the filename is replaced by random characters.
\item {\bf HASH}: the field is replaced by a hash value. The supported hash functions are MD5, SHA, SHA\_2 and CRC32; AES and TRIPLE\_DES are supported for encryption. Note that MD5, SHA, SHA\_2 and CRC32 may produce values that are shorter or longer than the original field. When a hash function is applied to IP, TCP, UDP or ICMP header fields, the last bytes of the hash value replace the field. For all other fields, the padding behavior is supplied as a parameter. If the hash value is shorter than the field, the user can pad the remaining bytes with zeros by passing PAD\_WITH\_ZERO, or strip them by passing STRIP\_REST. If the hash value is longer than the original field, the rest of the packet contents are shifted accordingly. In all cases, the packet length in the protocol headers is adjusted to the new length.
\item {\bf PATTERN\_FILL}: the field is repeatedly filled with a pattern. The pattern can be an integer or a string. This function takes as parameters the type of the pattern, INTEGER for integers and STR for strings, and the pattern to be used.
\item {\bf ZERO}: a sub-case of PATTERN\_FILL where the field is set to zero. This function takes no arguments.
\item {\bf REPLACE}: the field is replaced by a single value (a string). The packet length is adjusted according to the length of the replacement, and the final length cannot exceed the maximum packet size. This function takes the replacement string as an argument.
\item {\bf PREFIX\_PRESERVING}: performs keyed hashing that preserves the prefixes of IP addresses. It can only be applied to source and destination IP addresses.
\item {\bf REGEXP}: the field is transformed according to a regular expression. For example, calling anonymize(p, TCP, PAYLOAD, REGEXP, ``(.*) password:(.*) (.*)'', \{NULL,\\``xxxxx'', NULL\}) on a packet p replaces the value of a ``password:'' field with the string ``xxxxx''. Each ``(.*)'' in the regular expression indicates a match, and the last argument supplies one replacement per match (NULL leaves the match unmodified).
\item {\bf CHECKSUM\_ADJUST}: fixes the checksum after anonymization has modified a packet. This is needed if the anonymized packet stream is to be used by other applications, since they may reject packets whose checksums no longer match their contents. This function can only be applied to the CHECKSUM field.
\item {\bf SUBFIELD}: applies any of the functions defined above
to a \emph{subfield} of the given field. The arguments of
{\bf SUBFIELD} are the two offsets within the identified protocol
field that bound the subfield, followed by any of the above
anonymization functions and its parameters. That function is
applied only to the \emph{subfield} bounded by the given offsets.
\end{itemize}
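
To make the relation between SUBFIELD, PATTERN\_FILL and CHECKSUM\_ADJUST more concrete, the following standalone C sketch fills a byte range of a buffer with a repeating pattern and then recomputes the IPv4 header checksum with the standard Internet checksum algorithm (RFC 1071). It does not use MAPI and is not taken from its implementation. The function names are hypothetical, and treating the two offsets as a half-open range is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Repeatedly fill bytes [start, end) of a field with a pattern.
 * The half-open bounds are an assumption of this sketch.
 */
static void fill_subfield(uint8_t *field, size_t start, size_t end,
                          const uint8_t *pattern, size_t pattern_len)
{
    assert(field != NULL && pattern != NULL);
    assert(pattern_len > 0 && start <= end);

    for (size_t i = start; i < end; i++)
        field[i] = pattern[(i - start) % pattern_len];
}

/* Internet checksum (RFC 1071) over len bytes, read as big-endian words. */
static uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL);
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len % 2 != 0)
        sum += (uint32_t)data[len - 1] << 8; /* pad the odd byte with zero */

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Recompute the IPv4 header checksum, which is stored in bytes 10 and 11. */
static void fix_ipv4_checksum(uint8_t *hdr, size_t hdr_len)
{
    uint16_t sum;

    assert(hdr != NULL && hdr_len >= 20);
    hdr[10] = 0;
    hdr[11] = 0;
    sum = internet_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(sum >> 8);
    hdr[11] = (uint8_t)(sum & 0xFFu);
}
```

Running {\tt internet\_checksum} over a header whose checksum field is correct yields zero. After {\tt fill\_subfield} changes any header byte, that check fails until {\tt fix\_ipv4\_checksum} is called again, which is why the checksum should be fixed after all other anonymization functions have been applied.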
\end{document}