# lumberjack

o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~

## Questions and support

If you have questions and cannot find answers, please join the #logstash IRC
channel on freenode or ask on the logstash-users@googlegroups.com mailing
list.

## What is this?

A tool to collect logs locally in preparation for processing elsewhere!

Problem: logstash jar releases are too fat for constrained systems. Until we can comfortably promise logstash executing with less resource usage...

Solution: lumberjack

## Configuring

lumberjack is configured with a JSON file that you specify with the `-config` flag:

`lumberjack -config yourstuff.json`

Here's a sample, with comments in-line to describe the settings. Please please
please keep in mind that comments are technically invalid in JSON, so you can't
include them in your config:

    {
      # The network section covers network configuration :)
      "network": {
        # A list of downstream servers listening for our messages.
        # lumberjack will pick one at random and only switch if
        # the selected one appears to be dead or unresponsive
        "servers": [ "localhost:5043" ],

        # The path to your client ssl certificate (optional)
        "ssl certificate": "./lumberjack.crt",
        # The path to your client ssl key (optional)
        "ssl key": "./lumberjack.key",

        # The path to your trusted ssl CA file. This is used
        # to authenticate your downstream server.
        "ssl ca": "./lumberjack_ca.crt",

        # Network timeout in seconds. This is most important for lumberjack
        # determining whether to stop waiting for an acknowledgement from the
        # downstream server. If a timeout is reached, lumberjack will assume
        # the connection or server is bad and will connect to a server chosen
        # at random from the servers list.
        "timeout": 15
      },

      # The list of file configurations
      "files": [
        # An array of hashes. Each hash tells what paths to watch and
        # what fields to annotate on events from those paths.
        {
          "paths": [ 
            # single paths are fine
            "/var/log/messages",
            # globs are fine too, they will be periodically evaluated
            # to see if any new files match the wildcard.
            "/var/log/*.log"
          ],

          # A dictionary of fields to annotate on each event.
          "fields": { "type": "syslog" }
        }, {
          # A path of "-" means stdin.
          "paths": [ "-" ],
          "fields": { "type": "stdin" }
        }, {
          "paths": [
            "/var/log/apache/httpd-*.log"
          ],
          "fields": { "type": "apache" }
        }
      ]
    }
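Because the annotated sample is not valid JSON as written, one way to keep a commented copy around is to strip the comment lines before use. The following is a minimal Python sketch, not part of lumberjack; the helper name `strip_comment_lines` is hypothetical, and it only handles comments that sit on their own line, as they do in the sample above.

```python
import json


def strip_comment_lines(annotated: str) -> str:
    """Drop every line whose first non-blank character is '#'."""
    kept = [line for line in annotated.splitlines()
            if not line.lstrip().startswith("#")]
    return "\n".join(kept)


# A shortened copy of the sample config, comments included.
annotated = """
{
  # The network section covers network configuration
  "network": {
    "servers": [ "localhost:5043" ],
    "ssl ca": "./lumberjack_ca.crt",
    # Network timeout in seconds
    "timeout": 15
  },
  "files": [
    {
      "paths": [ "/var/log/messages", "/var/log/*.log" ],
      "fields": { "type": "syslog" }
    }
  ]
}
"""

# json.loads raises an error on the annotated text but accepts the stripped one.
config = json.loads(strip_comment_lines(annotated))
print(json.dumps(config, indent=2))
```

The printed output is plain JSON that could be saved and passed to `-config`.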

### Goals

* Minimize resource usage where possible (CPU, memory, network).
* Secure transmission of logs.
* Configurable event data.
* Easy to deploy with minimal moving parts.
* Simple inputs only:
  * Follows files and respects rename/truncation conditions.
  * Accepts `STDIN`, useful for things like `varnishlog | lumberjack...`.

## Building it

1. Install [FPM](https://github.com/jordansissel/fpm)

        $ sudo gem install fpm

2. Install [go](http://golang.org/doc/install)

3. Compile lumberjack

        $ git clone git://github.com/jordansissel/lumberjack.git
        $ cd lumberjack
        $ make

4. Make packages, either:

        $ make rpm

    Or:

        $ make deb

## Installing it

Packages install to `/opt/lumberjack`. Lumberjack builds all necessary
dependencies itself, so there should be no run-time dependencies you
need.

## Running it

Generally:

    $ lumberjack.sh -config lumberjack.conf

See `lumberjack.sh -help` for all the flags.

The config file is documented further up in this file.

### Key points

* You'll need an SSL CA to verify the server (host) with.
* You can specify custom fields for each set of paths in the config file. Any
  number of these may be specified. I use them to set fields like `type` and
  other custom attributes relevant to each log.

### Generating an ssl certificate

Logstash supports all certificates, including self-signed certificates. To generate a certificate, you can run the following command:

    $ openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout lumberjack.key -out lumberjack.crt

This will generate a key at `lumberjack.key` and the certificate at `lumberjack.crt`. Both the server running lumberjack and the logstash instances receiving logs will require these files on disk to verify the authenticity of messages.

Recommended file locations:

- certificates: `/etc/pki/tls/certs`
- keys: `/etc/pki/tls/private`

## Use with logstash

In logstash, you'll want to use the [lumberjack](http://logstash.net/docs/latest/inputs/lumberjack) input, something like:

    input {
      lumberjack {
        # The port to listen on
        port => 12345

        # The paths to your ssl cert and key
        ssl_certificate => "path/to/ssl.crt"
        ssl_key => "path/to/ssl.key"

        # Set this to whatever you want.
        type => "somelogs"
      }
    }

## Implementation details 

Below is valid as of 2012/09/19

### Minimize resource usage

* Sets small resource limits (memory, open files) on start up based on the
  number of files being watched.
* CPU: sleeps when there is nothing to do.
* Network/CPU: sleeps if there is a network failure.
* Network: uses zlib for compression.
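The network behavior described here and in the config comments (pick a server at random, stay with it, sleep and re-pick on failure) can be sketched roughly as follows. This is an illustrative Python sketch, not lumberjack's actual code: `ServerPicker`, `send_with_failover`, the `send` callable, `backoff`, and `max_attempts` are all hypothetical names and parameters.

```python
import random
import time


class ServerPicker:
    """Sticks with one randomly chosen server until it is marked dead."""

    def __init__(self, servers, rng=random):
        self.servers = list(servers)
        self.rng = rng
        self.current = self.rng.choice(self.servers)

    def mark_dead(self):
        # Choose again at random from the full list (may re-pick the same one).
        self.current = self.rng.choice(self.servers)


def send_with_failover(picker, send, timeout=15, backoff=1.0,
                       max_attempts=5, sleep=time.sleep):
    """Call send(server, timeout) until it succeeds or attempts run out.

    `send` is a hypothetical callable that raises OSError when the server
    fails to acknowledge within `timeout` seconds.
    """
    for _ in range(max_attempts):
        try:
            send(picker.current, timeout)
            return picker.current
        except OSError:
            sleep(backoff)      # sleep instead of spinning on a dead network
            picker.mark_dead()  # then pick a server at random again
    raise ConnectionError("no server acknowledged after %d attempts" % max_attempts)


# Demo with a stub that fails twice and then succeeds, without real sleeping.
calls = {"count": 0}


def flaky_send(server, timeout):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise OSError("no acknowledgement from " + server)


picker = ServerPicker(["a.example:5043", "b.example:5043"])
print(send_with_failover(picker, flaky_send, sleep=lambda seconds: None))
```

Passing `sleep` in as a parameter is only a convenience for the demo; a real shipper would presumably keep the default.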

### Secure transmission

* Uses OpenSSL to verify the server certificates (so you know who you
  are sending to).
* Uses OpenSSL to transport logs.
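For readers who want to see what "verify the server certificate against a CA" looks like in code, here is a small sketch using Python's standard `ssl` module (which itself wraps OpenSSL). It is an illustration only and does not reflect how lumberjack is implemented; the function name and its parameters are hypothetical stand-ins for the `ssl ca`, `ssl certificate`, and `ssl key` config settings.

```python
import ssl


def make_client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS client context that insists on a verified server.

    ca_file   -> trusted CA used to authenticate the downstream server
    cert_file -> optional client certificate
    key_file  -> optional client key
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        context.load_verify_locations(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# Without any files this still yields a context that requires verification.
context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

In this sketch hostname checking stays at Python's default for client contexts; whether lumberjack checks hostnames is not stated in this document.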

### Configurable event data

* The protocol lumberjack uses supports sending a `string:string` map.
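As a rough picture of what a `string:string` map per event could look like, the Python sketch below merges the configured `fields` for a path with the log line itself. The function name `build_event` and the `line` and `file` key names are assumptions made for illustration; this document does not specify the keys lumberjack actually sends.

```python
def build_event(line, path, fields):
    """Merge per-path fields with a log line into a flat str -> str map."""
    event = {"line": line, "file": path}  # key names are assumptions
    for key, value in fields.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("event fields must map strings to strings: %r" % key)
        event[key] = value  # configured fields win over the defaults above
    return event


print(build_event("Sep 19 12:00:00 host sshd[1]: ok",
                  "/var/log/messages",
                  {"type": "syslog"}))
```

Rejecting non-string values early mirrors the constraint that only strings can be sent, so a config such as `{ "port": 5043 }` under `fields` would need to be written as `{ "port": "5043" }`.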

### Easy deployment

* All dependencies are built at compile-time (OpenSSL, jemalloc, etc.) because many OS distributions lack these dependencies.
* The `make deb` or `make rpm` commands will package everything into a
  single DEB or RPM.
* The `bin/lumberjack.sh` script makes sure the dependencies are found
  when run in production.

### Future functional features

* Re-evaluate globs periodically to look for new log files.
* Track the read position in each log file.
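Both items can be pictured with a small Python sketch: re-run the glob patterns to discover files that were not there before, and remember a byte offset per file so only new lines are read. This is a sketch under assumptions, not lumberjack's design; the `Watcher` class and its method names are hypothetical, truncation is detected only by a shrinking file size, and renames are not handled.

```python
import glob
import os
import tempfile


class Watcher:
    """Tracks which files match a set of globs and how far each was read."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.offsets = {}  # path -> number of bytes already consumed

    def rescan(self):
        """Re-evaluate every pattern and return paths seen for the first time."""
        discovered = []
        for pattern in self.patterns:
            for path in sorted(glob.glob(pattern)):
                if path not in self.offsets:
                    self.offsets[path] = 0
                    discovered.append(path)
        return discovered

    def read_new_lines(self, path):
        """Return complete lines appended since the last call for this path."""
        if os.path.getsize(path) < self.offsets[path]:
            self.offsets[path] = 0  # file shrank, assume it was truncated
        with open(path, "rb") as handle:
            handle.seek(self.offsets[path])
            data = handle.read()
        end = data.rfind(b"\n") + 1  # leave a trailing partial line for later
        self.offsets[path] += end
        return data[:end].decode("utf-8", errors="replace").splitlines()


# Demo in a throwaway directory: the file appears after the watcher is created.
with tempfile.TemporaryDirectory() as log_dir:
    watcher = Watcher([os.path.join(log_dir, "*.log")])
    log_path = os.path.join(log_dir, "app.log")
    with open(log_path, "wb") as handle:
        handle.write(b"one\ntwo\npart")
    print(watcher.rescan())
    print(watcher.read_new_lines(log_path))
```

Calling `rescan()` on a timer would give the periodic re-evaluation; persisting `offsets` to disk would be one way to survive restarts, though that is beyond this sketch.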

### Future protocol discussion

I would love to not have a custom protocol, but nothing I've found implements
what I need, which is: encrypted, trusted, compressed, latency-resilient, and
reliable transport of events.

* Redis development refuses to accept encryption support, would likely reject
  compression as well.
* ZeroMQ lacks authentication, encryption, and compression.
* Thrift also lacks authentication, encryption, and compression, and also is an
  RPC framework, not a streaming system.
* Websockets don't do authentication or compression, but support encrypted
  channels with SSL. Websockets also require XORing the entire payload of all
  messages - wasted energy.
* SPDY is still changing too frequently and is also RPC. Streaming requires
  custom framing.
* HTTP is RPC and has very high overhead for small events (incompressible
  headers, etc). Streaming requires custom framing.

## License 

See LICENSE file.