Gurvinder Singh / spark_apps
Commit c655f949, authored Jun 27, 2014 by Sigmund Augdal

Test netflow analysis app

Parent: 6524f2c8
Changes: 1 changed file with 37 additions and 0 deletions.

pythonApp/netflowTest.py (new file, mode 100644, +37 -0)
```python
from pyspark.conf import SparkConf
from pyspark import SparkContext

conf = SparkConf()
conf.setAppName("Netflow test").set("spark.executor.memory", "1g").set("spark.default.parallelism", 4)
sc = SparkContext(conf=conf)


def add(x, y):
    return x + y


#path = 'hdfs://daas/user/hdfs/trd_gw1_12_01_normalized.csv'
path = 'hdfs://daas/user/hdfs/trd_gw1_12_normalized.csv/*'
# Split every CSV line into a list of fields; cached because several jobs reuse it.
csv = sc.textFile(path).map(lambda x: x.split(",")).cache()


def top_ips(csv, num=10):
    # Fields 1 and 2 (assumed here to be source and destination IP) are counted together.
    ips = csv.flatMap(lambda x: x[1:3])
    ip_count = ips.map(lambda x: (x, 1)).reduceByKey(add)
    # Swap to (count, ip) so that sortByKey(False) orders by descending count.
    return ip_count.map(lambda x: (x[1], x[0])).sortByKey(False).take(num)


def top_ports(csv, num=10):
    # Field 3 holds the port.
    ports = csv.map(lambda x: x[3])
    port_count = ports.map(lambda x: (x, 1)).reduceByKey(add)
    return port_count.map(lambda x: (x[1], x[0])).sortByKey(False).take(num)


# print("Finding top ports")
# top = top_ports(csv)
# print("Port Count")
# for count, port in top:
#     print(port, count)

print("Finding active ssh ips")
# Keep only flows on port 22 (SSH), then rank the IPs that appear in them.
ssh_ips = csv.filter(lambda x: x[3] == '22')
print(top_ips(ssh_ips, 15))
```
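The counting logic in this job can be sanity-checked locally without a Spark cluster. The sketch below is not part of the commit: it reimplements `top_ips`, `top_ports` and the port-22 filter over plain Python lists with `collections.Counter`. It assumes, as the Spark code implies, that fields 1 and 2 hold IP addresses and field 3 holds the port; the helpers `parse`, `top_counts`, `ssh_rows` and the `SAMPLE` rows are hypothetical.

```python
from collections import Counter


def parse(lines):
    """Split raw CSV lines into field lists, like textFile(...).map(split) in the job."""
    return [line.split(",") for line in lines]


def top_counts(values, num=10):
    """Return up to `num` (count, value) pairs, highest count first.

    Mirrors map((x, 1)).reduceByKey(add).map(swap).sortByKey(False).take(num).
    Counter breaks ties by first occurrence; Spark makes no tie-order guarantee.
    """
    return [(count, value) for value, count in Counter(values).most_common(num)]


def top_ips(rows, num=10):
    # Fields 1 and 2 (assumed source and destination IP) are counted together,
    # like flatMap(lambda x: x[1:3]) in the Spark job.
    return top_counts((ip for row in rows for ip in row[1:3]), num)


def top_ports(rows, num=10):
    # Field 3 is the port, matching the job's filter on x[3] == '22'.
    return top_counts((row[3] for row in rows), num)


def ssh_rows(rows):
    # Equivalent of csv.filter(lambda x: x[3] == '22').
    return [row for row in rows if row[3] == '22']


# Hypothetical sample records; the column layout (time, src, dst, port) is assumed.
SAMPLE = [
    "t1,10.0.0.1,10.0.0.9,22",
    "t2,10.0.0.2,10.0.0.9,22",
    "t3,10.0.0.1,10.0.0.8,80",
    "t4,10.0.0.1,10.0.0.9,22",
]

if __name__ == "__main__":
    rows = parse(SAMPLE)
    print(top_ports(rows))
    print(top_ips(ssh_rows(rows), 15))
```

On the cluster side, the swap-and-sort step could also be written as `ip_count.takeOrdered(num, key=lambda kv: -kv[1])`, which avoids a full sort of the counted RDD but returns `(ip, count)` pairs rather than the `(count, ip)` pairs the job prints.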