# lumberjack

o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~

## QUESTIONS?

If you have questions and cannot find answers, please join the #logstash IRC
channel on freenode or ask on the logstash-users@googlegroups.com mailing
list.

## What is this?

A tool to collect logs locally in preparation for processing elsewhere!

Problem: logstash jar releases are too fat for constrained systems.

Solution: lumberjack

## Building it

Make sure you have FPM (a rubygem) installed and have outgoing FTP access to ftp.openssl.org.

* compile: `make`
* rpm package: `make rpm`
* deb package: `make deb`

Packages install to /opt/lumberjack. Lumberjack builds all necessary
dependencies itself, so the packages should have no run-time dependencies.

## Running it

Generally: `lumberjack.sh --host somehost --port 12345 /var/log/messages`

See `lumberjack.sh --help` for all the flags.

Key points:

* You'll need an SSL CA certificate to verify the server (host) with.
* You can specify custom fields with `--field foo=bar`. Any number of these
  may be specified. I use them to set fields like 'type' and other custom
  attributes relevant to each log.
* Any non-flag argument after the flags is treated as a file path. You can
  watch any number of files.
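
The argument handling described above can be sketched in Python. This is only an illustration, not lumberjack's own parser: `parse_args`, `parse_field`, and the program name are hypothetical, and only the flags named in this README are modeled.

```python
import argparse


def parse_field(text):
    """Split one 'name=value' argument into a (name, value) pair."""
    name, sep, value = text.partition("=")
    if not sep or not name:
        raise argparse.ArgumentTypeError(f"expected name=value, got {text!r}")
    return name, value


def parse_args(argv):
    # Hypothetical parser: it mirrors only the flags mentioned in this README.
    parser = argparse.ArgumentParser(prog="lumberjack-sketch")
    parser.add_argument("--host", required=True)
    parser.add_argument("--port", type=int, required=True)
    # action="append" lets --field be repeated any number of times.
    parser.add_argument("--field", action="append", type=parse_field, default=[])
    # Every remaining non-flag argument is treated as a file path to watch.
    parser.add_argument("paths", nargs="*")
    args = parser.parse_args(argv)
    # Collapse the repeated (name, value) pairs into a string:string map.
    args.fields = dict(args.field)
    return args
```

Repeated `--field` flags end up in one dictionary, and whatever is left over becomes the list of watched paths.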

## Use with logstash

In logstash, you'll want to use the [lumberjack](http://logstash.net/docs/latest/inputs/lumberjack) input, something like:

    input {
      lumberjack {
        # The port to listen on
        port => 12345

        # The paths to your ssl cert and key
        ssl_certificate => "path/to/ssl.crt"
        ssl_key => "path/to/ssl.key"

        # Set this to whatever you want.
        type => "somelogs"
      }
    }

## Goals

* minimize resource usage where possible (cpu, memory, network)
* secure transmission of logs
* configurable event data
* easy to deploy with minimal moving parts.

Simple inputs only:

* follow files, respect rename/truncation conditions
* stdin, useful for things like 'varnishlog | lumberjack ...'
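
One common way to respect rename and truncation while following a file is to compare the inode and size of the path against the open handle. The sketch below assumes a POSIX-style filesystem; `FollowedFile` is a hypothetical name and this is not lumberjack's actual (C) implementation.

```python
import os


class FollowedFile:
    """Follow one path and notice rename/truncation between reads."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "rb")
        self.inode = os.fstat(self.handle.fileno()).st_ino

    def read_new(self):
        """Return the bytes that appeared since the previous call."""
        data = b""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            # The path was rotated away and not recreated yet:
            # keep draining the old handle.
            return self.handle.read()
        if st.st_ino != self.inode:
            # Renamed/rotated: finish the old file, then reopen the path.
            data = self.handle.read()
            self.handle.close()
            self.handle = open(self.path, "rb")
            self.inode = os.fstat(self.handle.fileno()).st_ino
        elif st.st_size < self.handle.tell():
            # Truncated in place: start again from the beginning.
            self.handle.seek(0)
        return data + self.handle.read()

    def close(self):
        self.handle.close()
```

Calling `read_new()` in a loop, with a sleep when it returns nothing, gives tail-like behaviour that survives log rotation.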

## Implementation details 

The details below are valid as of 2012/09/19.

### Minimize resource usage

* sets small resource limits (memory, open files) on start up based on the
  number of files being watched
* cpu: sleeps when there is nothing to do
* network/cpu: sleeps if there is a network failure
* network: uses zlib for compression
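
Two of the ideas above, sleeping when idle and compressing with zlib, can be sketched as follows. The backoff bounds (0.1 s to 5 s, doubling) and the function names are assumptions made for illustration; this README does not state the values lumberjack uses.

```python
import zlib


def next_sleep(current, idle, minimum=0.1, maximum=5.0):
    """Exponential backoff: double the sleep while idle, reset on activity.

    The minimum/maximum defaults are illustrative assumptions.
    """
    if not idle:
        return minimum
    return min(current * 2, maximum)


def compress_batch(lines):
    """Join a batch of log lines (bytes) and compress them with zlib."""
    payload = b"\n".join(lines)
    return zlib.compress(payload, 6)
```

Log lines tend to be repetitive, so batching them before compression usually shrinks the payload considerably, and `zlib.decompress` recovers the original batch on the receiving side.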

### secure transmission

* uses OpenSSL to transport logs. Currently it supports verifying the server
  certificate only (so you know who you are sending to).
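
"Verifying the server certificate only" means the client checks the server against a CA but presents no certificate of its own. A rough equivalent using Python's standard `ssl` module (which wraps OpenSSL) might look like this; `make_client_context` and `ca_path` are hypothetical names, not part of lumberjack.

```python
import ssl


def make_client_context(ca_path=None):
    """Build a TLS client context that verifies the server only.

    ca_path is a hypothetical path to the CA certificate; when it is None,
    the system trust store is used instead.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_path)
    # No load_cert_chain() call: the client presents no certificate itself.
    return context
```

The resulting context requires a valid server certificate and checks the hostname, so a connection to an unknown or impersonated server fails during the handshake.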

### configurable event data

* the protocol lumberjack uses supports sending a string:string map
* the lumberjack tool lets you specify arbitrary extra data with `--field name=value`
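
To make "a string:string map" concrete, here is one possible length-prefixed encoding of such a map. This layout is an assumption chosen for illustration and should not be read as lumberjack's actual wire format; `encode_map` and `decode_map` are hypothetical helpers.

```python
import struct


def encode_map(fields):
    """Encode a str->str dict: pair count, then length-prefixed keys/values.

    Illustrative layout only, not the real lumberjack frame format.
    """
    out = [struct.pack(">I", len(fields))]
    for key, value in fields.items():
        for text in (key, value):
            raw = text.encode("utf-8")
            out.append(struct.pack(">I", len(raw)) + raw)
    return b"".join(out)


def decode_map(data):
    """Inverse of encode_map."""
    (count,) = struct.unpack_from(">I", data, 0)
    offset = 4
    fields = {}
    for _pair in range(count):
        parts = []
        for _item in range(2):
            (length,) = struct.unpack_from(">I", data, offset)
            offset += 4
            parts.append(data[offset:offset + length].decode("utf-8"))
            offset += length
        fields[parts[0]] = parts[1]
    return fields
```

Values passed with `--field name=value` would simply become extra entries in the map alongside the log line itself.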

### easy deployment

* all dependencies are built at compile-time (openssl, jemalloc, etc.) because many OS distributions lack these dependencies.
* `make deb` (or `make rpm`) will package everything into a single deb (or rpm)
* bin/lumberjack.sh makes sure the dependencies are found when run in production

## future functional features

* re-evaluate globs periodically to look for new log files
* track the read position in each log
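
Neither feature exists yet according to the list above, so the following is only a sketch of how they could work. `discover_new` and `read_from` are hypothetical names, and keeping positions in an in-memory dict is an assumption (a real tool would likely persist them).

```python
import glob


def discover_new(pattern, known):
    """Return paths matching pattern that are not yet in the known set."""
    found = set(glob.glob(pattern))
    new = sorted(found - known)
    known.update(new)
    return new


def read_from(path, positions):
    """Read from the saved offset for path and remember the new offset."""
    with open(path, "rb") as handle:
        handle.seek(positions.get(path, 0))
        data = handle.read()
        positions[path] = handle.tell()
    return data
```

Calling `discover_new` on a timer picks up files created after start-up, and `read_from` avoids re-sending data that was already read.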

## future protocol discussion

I would love to not have a custom protocol, but nothing I've found implements
what I need, which is: encrypted, trusted, compressed, latency-resilient, and
reliable transport of events.

* the redis developers have declined to accept encryption support and would
  likely reject compression as well.
* zeromq lacks authentication, encryption, and compression.
* thrift also lacks authentication, encryption, and compression, and also is an
  RPC framework, not a streaming system.
* websockets don't do authentication or compression, but support encrypted
  channels with SSL. Websockets also require XORing the entire payload of all
  messages - wasted energy.
* SPDY is still changing too frequently and is also RPC. Streaming requires
  custom framing.
* HTTP is RPC and has very high overhead for small events (incompressible
  headers, etc). Streaming requires custom framing.
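
The websocket XOR cost mentioned above comes from the masking step that RFC 6455 requires on client-to-server frames, which is the direction a log shipper sends in. A minimal sketch of that step (`mask_payload` is a hypothetical helper name):

```python
def mask_payload(payload, key):
    """Apply RFC 6455 masking: XOR each payload byte with key[i % 4]."""
    if len(key) != 4:
        raise ValueError("masking key must be 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))
```

Every byte of every message passes through this loop once on the sender and once on the receiver (applying the same key again unmasks the data), which is the extra work objected to above.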