Gurvinder Singh / spark_apps
Commit 3e8434ea, authored Jul 07, 2014 by Sigmund Augdal
Replace two maps and a union with a flatMap. Seems to be faster
parent 0ac1f5ac
pythonApp/netflowAlgs.py
@@ -31,7 +31,9 @@ def ports_count_by_ip3(csv):
 def ports_count_by_ip(csv):
-    srcs = csv.map(lambda x: ((x[DEST_PORT], x[SRC_IP]), 1))
-    dsts = csv.map(lambda x: ((x[DEST_PORT], x[DEST_IP]), 1))
-    ips = srcs.union(dsts).reduceByKey(add)
+    # srcs = csv.map(lambda x: ((x[DEST_PORT], x[SRC_IP]), 1))
+    # dsts = csv.map(lambda x: ((x[DEST_PORT], x[DEST_IP]), 1))
+    # ips = srcs.union(dsts).reduceByKey(add)
+    ips = csv.flatMap(lambda x: (((x[DEST_PORT], x[DEST_IP]), 1), ((x[DEST_PORT], x[SRC_IP]), 1))).reduceByKey(add)
     return ips.map(lambda x: (x[1], x[0])).sortByKey(False).take(20)
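The change relies on the flatMap form producing the same key/count pairs as the two maps plus union. The following is a minimal plain-Python sketch of that equivalence, not the Spark code itself: lists and `Counter` stand in for RDDs and `reduceByKey(add)`. The column indices, the sample rows, and the helper names (`count_with_two_maps`, `count_with_flat_map`, `top_n`) are all hypothetical and chosen only for illustration; the real `SRC_IP`, `DEST_IP`, and `DEST_PORT` values are defined elsewhere in `netflowAlgs.py`.

```python
from collections import Counter
from itertools import chain

# Hypothetical column indices; the real values live elsewhere in netflowAlgs.py.
SRC_IP, DEST_IP, DEST_PORT = 0, 1, 2

# Made-up sample rows, for illustration only.
rows = [
    ("10.0.0.1", "10.0.0.2", "80"),
    ("10.0.0.3", "10.0.0.2", "80"),
    ("10.0.0.1", "10.0.0.4", "22"),
]


def count_with_two_maps(rows):
    """Mirror map + map + union + reduceByKey(add) using plain lists."""
    srcs = [((x[DEST_PORT], x[SRC_IP]), 1) for x in rows]
    dsts = [((x[DEST_PORT], x[DEST_IP]), 1) for x in rows]
    counts = Counter()
    for key, n in srcs + dsts:  # list concatenation plays the role of union
        counts[key] += n
    return counts


def count_with_flat_map(rows):
    """Mirror flatMap + reduceByKey(add): one pass, two pairs emitted per row."""
    pairs = chain.from_iterable(
        (((x[DEST_PORT], x[DEST_IP]), 1), ((x[DEST_PORT], x[SRC_IP]), 1))
        for x in rows
    )
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def top_n(counts, n=20):
    """Mirror map(swap).sortByKey(False).take(n): (count, key) pairs, descending."""
    return sorted(((c, k) for k, c in counts.items()), reverse=True)[:n]
```

Both functions return the same counts for the same input, so the final ranking is unchanged. The sketch says nothing about speed; the commit message's "seems to be faster" is the author's observation on Spark. One difference to keep in mind: `sorted` here also breaks ties by key, whereas `sortByKey(False)` orders only by the count.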