# lumberjack

o/~ I'm a lumberjack and I'm ok! I sleep when idle, then I ship logs all day! I parse your logs, I eat the JVM agent for lunch! o/~

## Questions and support

If you have questions and cannot find answers, please join the #logstash IRC
channel on freenode or ask on the logstash-users@googlegroups.com mailing
list.

## What is this?

A tool to collect logs locally in preparation for processing elsewhere!

Problem: logstash jar releases are too fat for constrained systems. Until we can comfortably promise logstash executing with less resource usage...

Solution: lumberjack

## Configuring

lumberjack is configured with a JSON file you specify with the -config flag:

`lumberjack -config yourstuff.json`

Here's a sample, with comments in-line to describe the settings. Please please
please keep in mind that comments are technically invalid in JSON, so you can't
include them in your config:

    {
      # The network section covers network configuration :)
      "network": {
        # A list of downstream servers listening for our messages.
        # lumberjack will pick one at random and only switch if
        # the selected one appears to be dead or unresponsive
        "servers": [ "localhost:5043" ],

        # The path to your client ssl certificate (optional)
        "ssl certificate": "./lumberjack.crt",
        # The path to your client ssl key (optional)
        "ssl key": "./lumberjack.key",

        # The path to your trusted ssl CA file. This is used
        # to authenticate your downstream server.
        "ssl ca": "./lumberjack_ca.crt"
      },

      # The list of file configurations
      "files": [
        # An array of hashes. Each hash tells what paths to watch and
        # what fields to annotate on events from those paths.
        {
          "paths": [ 
            # single paths are fine
            "/var/log/messages",
            # globs are fine too, they will be periodically evaluated
            # to see if any new files match the wildcard.
            "/var/log/*.log"
          ],

          # A dictionary of fields to annotate on each event.
          "fields": { "type": "syslog" }
        }, {
          # A path of "-" means stdin.
          "paths": [ "-" ],
          "fields": { "type": "stdin" }
        }, {
          "paths": [
            "/var/log/apache/httpd-*.log"
          ],
          "fields": { "type": "apache" }
        }
      ]
    }
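
Because the annotated sample above is not valid JSON as written, one way to keep a commented copy around is to strip the comment lines before use. The following is a minimal sketch in Python, not part of lumberjack itself; the helper names `strip_comment_lines` and `load_annotated_config` are hypothetical, and it assumes comments always sit on their own line, as they do in the sample.

```python
import json


def strip_comment_lines(text: str) -> str:
    """Drop lines whose first non-blank character is '#'.

    Only whole-line comments are removed, so a '#' inside a JSON
    string value is left alone.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return "\n".join(kept)


def load_annotated_config(text: str) -> dict:
    """Parse an annotated sample; raises ValueError if the rest is not JSON."""
    return json.loads(strip_comment_lines(text))


# A shortened version of the sample config, comments included.
sample = """
{
  # where to ship events
  "network": {
    "servers": [ "localhost:5043" ],
    "ssl ca": "./lumberjack_ca.crt"
  },
  "files": [
    {
      # a path of "-" means stdin
      "paths": [ "-" ],
      "fields": { "type": "stdin" }
    }
  ]
}
"""

config = load_annotated_config(sample)
print(json.dumps(config, indent=2))
```

Writing the printed output to a file gives a comment-free config you can pass to `-config`. A parse error here (for example a missing `:` between a key and its value) is worth catching before deploying.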

### Goals

* Minimize resource usage where possible (CPU, memory, network).
* Secure transmission of logs.
* Configurable event data.
* Easy to deploy with minimal moving parts.
* Simple inputs only:
  * Follows files and respects rename/truncation conditions.
  * Accepts `STDIN`, useful for things like `varnishlog | lumberjack...`.
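
To illustrate what "respects rename/truncation conditions" can mean in practice, here is a rough Python sketch. It is not lumberjack's code: the `FileState` type and `classify` function are hypothetical, and it assumes a POSIX filesystem where the inode number identifies a file across renames.

```python
import os
import tempfile
from typing import NamedTuple


class FileState(NamedTuple):
    inode: int   # identity of the file that was being followed
    offset: int  # number of bytes already read from it


def classify(path: str, state: FileState) -> str:
    """Compare the file now at `path` with the recorded state.

    "rotated":   the path points at a different file (it was renamed away)
    "truncated": same file, but shorter than what was already read
    "same":      same file, safe to keep reading from the old offset
    """
    st = os.stat(path)
    if st.st_ino != state.inode:
        return "rotated"
    if st.st_size < state.offset:
        return "truncated"
    return "same"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log")
    with open(path, "w") as f:
        f.write("0123456789")
    state = FileState(os.stat(path).st_ino, 10)
    before = classify(path, state)

    with open(path, "w") as f:    # reopening with "w" truncates in place
        f.write("abc")
    after_truncate = classify(path, state)

    os.rename(path, path + ".1")  # rotate: the old file keeps its inode
    with open(path, "w") as f:
        f.write("fresh")
    after_rotate = classify(path, state)

print(before, after_truncate, after_rotate)
```

A follower would typically restart from offset 0 on "truncated" and reopen the path on "rotated".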

## Building it

1. Install [FPM](https://github.com/jordansissel/fpm)

        $ sudo gem install fpm

2. Install [go](http://golang.org/doc/install)

3. Compile lumberjack

        $ git clone git://github.com/jordansissel/lumberjack.git
        $ cd lumberjack
        $ make

4. Make packages, either:

        $ make rpm

    Or:

        $ make deb

## Installing it

Packages install to `/opt/lumberjack`. Lumberjack builds all necessary
dependencies itself, so there should be no run-time dependencies you
need.

## Running it

Generally:

    $ lumberjack.sh --host somehost --port 12345 /var/log/messages

See `lumberjack.sh --help` for all the flags.

### Key points

* You'll need an SSL CA to verify the server (host) with.
* You can specify custom fields with `--field foo=bar`. Any number of these
  may be specified. I use them to set fields like `type` and other custom
  attributes relevant to each log.
* Any non-flag argument after the flags is treated as a file path. You can
  watch any number of files.

## Use with logstash

In logstash, you'll want to use the [lumberjack](http://logstash.net/docs/latest/inputs/lumberjack) input, something like:

    input {
      lumberjack {
        # The port to listen on
        port => 12345

        # The paths to your ssl cert and key
        ssl_certificate => "path/to/ssl.crt"
        ssl_key => "path/to/ssl.key"

        # Set this to whatever you want.
        type => "somelogs"
      }
    }

## Implementation details 

Below is valid as of 2012/09/19

### Minimize resource usage

* Sets small resource limits (memory, open files) on start up based on the
  number of files being watched.
* CPU: sleeps when there is nothing to do.
* Network/CPU: sleeps if there is a network failure.
* Network: uses zlib for compression.
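
As a rough feel for why zlib helps on the network side, the Python sketch below compresses a batch of similar-looking log lines and compares sizes. The log lines are made up for illustration, and this is not lumberjack's wire format; it only shows the effect of zlib on repetitive text.

```python
import zlib

# Invented syslog-style lines; real logs tend to be similarly repetitive.
lines = [
    f"Sep 19 12:00:{i:02d} web1 sshd[1234]: Accepted publickey for deploy"
    for i in range(50)
]
payload = "\n".join(lines).encode("utf-8")

compressed = zlib.compress(payload, 6)  # 6 is a middle-of-the-road level
restored = zlib.decompress(compressed)

print(f"raw={len(payload)} bytes, compressed={len(compressed)} bytes")
```

Batching several events before compressing generally gives better ratios than compressing each line on its own, since the compressor can reuse repeated substrings across lines.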

### Secure transmission

* Uses OpenSSL to verify the server certificates (so you know who you
  are sending to).
* Uses OpenSSL to transport logs.
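
The same idea (verify the server against a CA you trust, then encrypt the stream) can be sketched with Python's `ssl` module, which wraps OpenSSL. This is only an illustration of the concept, not how lumberjack is implemented; `make_client_context` and its parameters are hypothetical names that mirror the `ssl ca`, `ssl certificate` and `ssl key` config settings.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that insists on a verified server."""
    # PROTOCOL_TLS_CLIENT turns on certificate and hostname checks by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        # Trust only this CA. With no CA loaded, every handshake fails
        # verification, which is the safe failure mode.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Optional client certificate, for servers that authenticate clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = make_client_context()
print(ctx.verify_mode, ctx.check_hostname)
```

A connection would then go through `ctx.wrap_socket(sock, server_hostname=host)`, which performs the handshake and rejects servers whose certificate does not chain to the trusted CA.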

### Configurable event data

* The protocol lumberjack uses supports sending a `string:string` map.
* The lumberjack tool lets you specify arbitrary extra data with
  `--field name=value`.
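
For illustration, turning a list of `name=value` arguments into a `string:string` map might look like the Python sketch below. The `parse_fields` helper is hypothetical, and splitting on the first `=` only (so values may themselves contain `=`) is an assumption, not documented lumberjack behaviour.

```python
from typing import Dict, Iterable


def parse_fields(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn ["type=apache", ...] into {"type": "apache", ...}."""
    fields: Dict[str, str] = {}
    for pair in pairs:
        # partition splits on the first "=" only, keeping later ones in value
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {pair!r}")
        fields[name] = value
    return fields


print(parse_fields(["type=apache", "env=prod", "note=a=b"]))
```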

### Easy deployment

* All dependencies are built at compile-time (OpenSSL, jemalloc, etc.) because
  many OS distributions lack these dependencies.
* The `make deb` or `make rpm` commands will package everything into a
  single DEB or RPM.
* The `bin/lumberjack.sh` script makes sure the dependencies are found
  when run in production.

### Future functional features

* Re-evaluate globs periodically to look for new log files.
* Track the read position in each log.
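
Periodic glob re-evaluation could be sketched roughly as follows, in Python. The `discover` helper is hypothetical; a real follower would call it on a timer and start tailing whatever it returns.

```python
import glob
import os
import tempfile
from typing import Iterable, Set


def discover(patterns: Iterable[str], known: Set[str]) -> Set[str]:
    """Expand every pattern and return only the paths not seen before."""
    current: Set[str] = set()
    for pattern in patterns:
        current.update(glob.glob(pattern))
    return current - known


with tempfile.TemporaryDirectory() as tmp:
    pattern = os.path.join(tmp, "*.log")
    known: Set[str] = set()

    open(os.path.join(tmp, "a.log"), "w").close()
    first = discover([pattern], known)   # first pass finds a.log
    known |= first

    open(os.path.join(tmp, "b.log"), "w").close()
    second = discover([pattern], known)  # later pass finds only b.log

    first_names = sorted(os.path.basename(p) for p in first)
    second_names = sorted(os.path.basename(p) for p in second)

print(first_names, second_names)
```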

### Future protocol discussion

I would love to not have a custom protocol, but nothing I've found implements
what I need, which is: encrypted, trusted, compressed, latency-resilient, and
reliable transport of events.

* Redis development refuses to accept encryption support and would likely
  reject compression as well.
* ZeroMQ lacks authentication, encryption, and compression.
* Thrift also lacks authentication, encryption, and compression, and also is an
  RPC framework, not a streaming system.
* Websockets don't do authentication or compression, but support encrypted
  channels with SSL. Websockets also require XORing the entire payload of all
  messages - wasted energy.
* SPDY is still changing too frequently and is also RPC. Streaming requires
  custom framing.
* HTTP is RPC and very high overhead for small events (incompressible headers,
  etc.). Streaming requires custom framing.
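
For readers unfamiliar with the term, "custom framing" on a byte stream usually means marking where each message starts and ends. The Python sketch below uses a 4-byte big-endian length prefix purely as an example; it is not the lumberjack protocol, and `encode_frame` / `decode_frames` are hypothetical names.

```python
import struct
from typing import Iterator


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def decode_frames(stream: bytes) -> Iterator[bytes]:
    """Yield every complete frame found in `stream`, in order."""
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        start = offset + 4
        end = start + length
        if end > len(stream):
            break  # incomplete trailing frame; a reader would wait for more
        yield stream[start:end]
        offset = end


events = [b"a", b"", b"hello world"]
stream = b"".join(encode_frame(e) for e in events)
decoded = list(decode_frames(stream))
partial = list(decode_frames(stream[:-1]))  # last frame cut short
print(decoded, partial)
```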

## License 

See LICENSE file.