Sunday, 3 May 2020

Filebeat Udemy


filebeat.yml
Explore filebeat.reference.yml

  1. Comment out the "Elasticsearch Output" section.
  2. Uncomment the "Logstash Output" section and its hosts setting.
  3. The default Logstash port is 5044, which is what this setting gives you (see the sketch below).
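A minimal sketch of the relevant part of filebeat.yml after these changes (the localhost address is an assumption; use your own Logstash host):

# ---------------- Elasticsearch Output (commented out) ----------------
#output.elasticsearch:
#  hosts: ["localhost:9200"]

# --------------------------- Logstash Output --------------------------
output.logstash:
  # The Logstash hosts (5044 is the default Beats port)
  hosts: ["localhost:5044"]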


Enabling Apache Module
./filebeat modules enable apache

  1. Go to the module folder (not modules.d).
  2. Go to module/apache2 and open manifest.yml; the default log locations are defined here. Do not modify the paths in this file.
  3. Go to modules.d/apache2.yml and set enabled: false for the error logs. In the access log section set:
var.paths: ["/path/for/log/*log"]
  4. This enables the Apache module (see the sketch below). Now set up the Logstash pipeline.
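A sketch of what modules.d/apache2.yml might look like after these edits (the log path is a placeholder, and the module/file name varies with the Filebeat version):

- module: apache2
  # Access logs
  access:
    enabled: true
    var.paths: ["/path/for/log/*log"]

  # Error logs
  error:
    enabled: false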

Logstash Pipeline:
  1. Open the pipelines.yml file in the Logstash config directory and add one pipeline for the access log:
- pipeline.id: access_logs
  path.config: "/Users/andy/desktop/logstash/config/pipelines/access.conf"
  2. Now open the access.conf file:
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}

output {
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
  3. Now run Logstash: bin/logstash

Starting up filebeat and processing logs
  1. Start Filebeat; it will harvest the logs. Investigating the output further, we can see it connecting to Logstash ("Connecting to backoff..." logged by pipeline/output).
./filebeat -e
  2. Some of the information passed along by Filebeat is also useful:
@metadata => { beat, type, version, pipeline, ip_address }

log => { path } - DON'T USE THIS FIELD FOR FILTERING. It is tightly coupled to the file layout and increases the risk of breakage. Instead, there is a configuration option within filebeat.yml -> tags. We can specify any number of tags and they will be sent to Logstash. You can also filter on the module fields (see the sketch below):
module => apache
dataset => apache.access

There is another field named fileset, which contains "name => access".
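For example, filtering in Logstash on the module fields rather than the log path could look like this (a minimal sketch; the field layout matches the pipeline used later in these notes):

filter {
  # Keep only Apache access log events; drop everything else
  if [event][dataset] != "apache.access" {
    drop { }
  }
}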

Adding the Elasticsearch index template
Before enabling the Elasticsearch output plugin in Logstash, we need a bit of setup. It is only required if we want to use the default configuration, such as the Kibana dashboards.

An index template is the configuration that defines how our data should be mapped.

Here we are loading the index template into Elasticsearch.

./filebeat setup --index-management -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'

We invoke the setup command with the --index-management flag; the -E flags temporarily override the outputs so the template can be pushed to Elasticsearch. It reports "Index setup complete". To check, we can use the Kibana console tools: after the port number in the URL, append /app/kibana#/dev_tools.

GET /_template/filebeat-*

The Apache part has only a few fields; most are Elastic Common Schema (ECS) fields. They come from fields.yml in Filebeat; search for apache there and you will see the same fields.

Takeaway:
  1. Index templates specify settings and mappings for indices matching patterns
  2. The Filebeat version is included in the index template name and patterns to make upgrading Filebeat easy
  3. When the Elasticsearch output is disabled, you must run the index-template setup command again after upgrading Filebeat
  4. ECS (Elastic Common Schema) is a specification of common fields
  5. The fields.yml file contains the fields for Filebeat modules; it is used for both the index template and the Filebeat documentation

If the machine where we are running Filebeat does not have access to Elasticsearch, we need an alternative method: export the template and load it manually.

./filebeat export template > filebeat.template.json
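The exported template can then be loaded from any machine that can reach Elasticsearch, for example with curl (a sketch; the template name and host are assumptions and should match your Filebeat version):

curl -XPUT -H 'Content-Type: application/json' \
  'http://localhost:9200/_template/filebeat-7.6.2' \
  -d@filebeat.template.json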



Adding Kibana Dashboard:
  1. Open filebeat.yml.
  2. Go to the Kibana section and uncomment the host. The default port is 5601.
  3. To set up the dashboards, run:
./filebeat setup --dashboards
  4. Alternatively, enable dashboard loading in the config. Dashboards are versioned, so when Filebeat is upgraded the dashboards also need to be upgraded. However, this option requires Filebeat to have access to Kibana at startup. Kibana may be running isolated from Filebeat (like in my org); in that case you can run the setup command from any machine that has access to Kibana (see the example below), and only need to repeat it after upgrading to a new Filebeat.
setup.dashboards.enabled: true
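For example, running the dashboard setup from another machine might look like this (a sketch; the Kibana host is an assumption, passed as a -E override):

./filebeat setup --dashboards -E setup.kibana.host=kibana.example.org:5601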

Finishing the pipeline:
The configuration is now complete; we just need to do some event processing.
  1. Open access.conf and implement the module's ingest pipeline JSON as a Logstash pipeline.
NOTE: The useragent filter plugin has not been updated to support ECS, so we need to restructure its fields ourselves.
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}

filter {
  if [event][dataset] != "apache.access" {
    drop { }
  }

  grok {
    match => {
      "message" => [
        # Copy the pattern from the ingest pipeline. What we copy is in JSON form,
        # so change the double escapes (\\) to \ and add square brackets to the
        # field names. For example: "source" in the JSON becomes [source].
      ]
    }

    # Added after the match option
    remove_field => "message"
  }

  if "_grokparsefailure" in [tags] {
    drop { }
  }

  grok {
    match => {
      "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
    }
  }

  mutate {
    add_field => { "[event][created]" => "%{@timestamp}" }
  }

  date {
    match => [ "[apache][access][time]", "dd/MMM/yyyy:H:m:s Z" ]
    remove_field => "[apache][access][time]"
  }

  # <There is a big block of code for the useragent filter because it is not ECS compatible>

  mutate {
    remove_field => ["ua_tmp"]
  }

  geoip {
    source => "[source][ip]"
    target => "[source][geo]"
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+yyyy-MM-dd}"
  }
}

In the above code, the @metadata fields come from the event sent by Filebeat.
  2. Open module/apache/access/manifest.yml (see the sketch below).
Check the value of ingest_pipeline and open ingest/default.json; in that JSON you can see the definition of the ingest pipeline. Since we are using Logstash, we need to implement the same pipeline in Logstash.
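For reference, a manifest.yml for the access fileset roughly looks like the following (a hedged sketch; the exact keys, default paths, and file names vary between Filebeat versions):

module_version: "1.0"

var:
  - name: paths
    default:
      - /var/log/apache2/access.log*

ingest_pipeline: ingest/default.json
input: config/access.yml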
Key takeaways:
Porting the Elasticsearch ingest pipeline to Logstash ensures that Elasticsearch receives the data it expects.
Feel free to do additional filtering within the pipeline.
The useragent filter plugin adds fields in a format that is not ECS compatible; therefore we must manually restructure the data for now.
Make sure that the Elasticsearch index matches the index pattern defined within the index template.


How Filebeat Works:
Filebeat keeps track of read positions in a registry, similar to the sincedb files used by Logstash's file input.

Harvester -> Reads the files and keeps track of how much of each file has been read, storing that position in the registry. Even when there are many files, it tracks all of them and sends the data to libbeat.

Libbeat -> Receives events from the harvesters; it is a shared library. It aggregates the data and sends it to the configured output. For the Elasticsearch output, libbeat can group multiple events into a single request using the bulk API, which is more efficient than sending one request per event.

Registry -> Kept in memory and flushed to disk, for example when Filebeat shuts down. It is formatted as JSON with the file path and a byte offset; Filebeat keeps track of every event.

Preventing events from being sent twice -> Filebeat always waits for an acknowledgement from the output. If Filebeat is shut down before the acknowledgement is received, it will resend those events, producing duplicates. To mitigate this, we can set the shutdown_timeout option (e.g. to 5 seconds) so that Filebeat waits before shutting down. It is not a 100% guarantee, but it reduces the risk (see the snippet below).
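A minimal filebeat.yml snippet for this (the 5s value is the one mentioned above):

# Wait up to 5 seconds on shutdown for the output to acknowledge in-flight events
filebeat.shutdown_timeout: 5s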

Takeaway:
  1. An input starts up a harvester for each file matched by its file patterns
  2. Harvesters read files and keep track of their offsets
  3. The file offsets and other information are persisted to a registry

Clearing the registry:
The registry is stored under data/registry/filebeat.

data.json -> contains the registry information. It is only present once a harvester has started reading. Typically you should not modify the registry; to re-read files you can either reset the offset values to 0 or just delete the file (see the example entry below).
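For illustration, a data.json entry roughly looks like the following (a sketch; the exact keys, paths, and values vary by Filebeat version and OS):

[
  {
    "source": "/Users/andy/desktop/access.log",
    "offset": 5139,
    "timestamp": "2020-05-03T10:15:04.000+02:00",
    "ttl": -1,
    "type": "log",
    "FileStateOS": { "inode": 8712345, "device": 16777220 }
  }
]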

Processing More Logs:
First, delete the documents that we loaded, using the Kibana Dev Tools:

DELETE /filebeat-*

Manual Input Configuration:
  1. Disable the Apache module:
./filebeat modules disable apache
  2. Open filebeat.yml and configure the input under filebeat.inputs:
enabled: true
paths:
  - <log path>

fields:
  event:
    module: apache
    dataset: "apache.access"

fields_under_root: true

Also uncomment exclude_files.

# We are adding these fields to match the Logstash pipeline.
# Without fields_under_root: true they would arrive as [fields][event][dataset]
# instead of [event][dataset].
  3. No changes are needed in the Logstash pipeline.


Evaluation of modules:
Using modules with Logstash requires additional configuration.
This reduces simplicity and convenience, but increases flexibility.
In some cases modules save a lot of configuration.
You automatically benefit from the boilerplate configuration.
Modules reduce the risk of mistakes, e.g. forgetting to exclude .gz files.
Make use of modules as much as possible.
Prefer either using modules everywhere or not using them at all; a combination is harder to maintain.

Tagging Events:
The tags option allows a tags field to be sent along with every event, which can be used for filtering.
tags: ["web-server"]

Tags are not always enough, because they apply to all inputs; with multiple inputs sharing the same tags it is better to use per-input fields (see the sketch below).
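A sketch of the difference (the input paths and field values are placeholders):

# Global tags: added to events from ALL inputs
tags: ["web-server"]

filebeat.inputs:
  - type: log
    paths: ["/var/log/app-a/*.log"]
    # Per-input fields: only events from this input get them
    fields:
      app: "app-a"

  - type: log
    paths: ["/var/log/app-b/*.log"]
    fields:
      app: "app-b"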

Takeaway:
Use the tags option to add one or more tags to Filebeat events
These tags can then be used within the configured output (e.g. Logstash)
Tags can be used for filtering in Logstash, exactly as with any other tag
Tags are added to all events, i.e. for all inputs


Approaches for handling multiple log types:
We cannot have two inputs listening on the same port; that is an OS limitation, since two processes (or inputs) cannot bind the same port. So we use different ports. It is also possible to point Filebeat at multiple Logstash endpoints by enabling load balancing in the output, but that is just a workaround that misuses the load balancer.

Solution 1: Start multiple instances of Filebeat. We have two installations of Filebeat running side by side, each configured to send its data to a different endpoint.

Solution 2: Manage everything in a single Logstash pipeline. This adds complexity to the pipeline, and running everything in a single pipeline means it has to process all the logs, which can become a bottleneck.

Processing Apache Error Logs:
When using Filebeat, multiple log types arrive on the same port, so we either need multiple Filebeat instances or one Filebeat with everything handled in a single pipeline. He chose the single pipeline.

  1. Go to filebeat.yml and disable the manual input. In the previous example he showed using Filebeat without the Apache module, so now he turns the manual input off again.
enabled: false (under filebeat.inputs)
  2. Enable the Apache module:
./filebeat modules enable apache
  3. Open apache.yml under modules.d, enable the error log, and set its path:
var.paths: ["/users/andy/desktop/error.log"]

  4. Create a new file apache.conf under /config/pipelines/.
  5. Add the reference in pipelines.yml; previously we had access_logs, now he edited it to have just this pipeline:
- pipeline.id: apache_logs
  path.config: "/Users/andy/desktop/logstash/config/pipelines/apache.conf"

  6. In apache.conf:
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}

filter {
  if [event][module] != "apache" {
    drop { }
  }

  if [event][dataset] == "apache.access" {

    # NOTE: All code from the previous section for access.log is copied in here

    grok {
      match => {
        "message" => [
          # Copy the pattern from the ingest pipeline. What we copy is in JSON form,
          # so change the double escapes (\\) to \ and add square brackets to the
          # field names. For example: "source" in the JSON becomes [source].
        ]
      }

      # Added after the match option
      remove_field => "message"
    }

    if "_grokparsefailure" in [tags] {
      drop { }
    }

    grok {
      match => {
        "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
      }
    }

    mutate {
      add_field => { "[event][created]" => "%{@timestamp}" }
    }

    date {
      match => [ "[apache][access][time]", "dd/MMM/yyyy:H:m:s Z" ]
      remove_field => "[apache][access][time]"
    }

    # <There is a big block of code for the useragent filter because it is not ECS compatible>

    mutate {
      remove_field => ["ua_tmp"]
    }

    geoip {
      source => "[source][ip]"
      target => "[source][geo]"
    }

  } else if [event][dataset] == "apache.error" {

    # Convert the pipeline JSON from the ingest directory under the apache module.
    grok {
      match => {
        "message" => [
          # Copy the grok pattern from the ingest pipeline, convert the double
          # escapes and add square brackets, as for the access log.
        ]
      }

      pattern_definitions => {
        "APACHE_TIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
      }
    }

    date {
      match => [ "[apache][error][timestamp]", "EEE MMM dd H:m:s yyyy", "EEE MMM dd H:m:s.SSSSSS yyyy" ]
      remove_field => "[apache][error][timestamp]"
    }

    grok {
      match => {
        "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
      }
    }

    geoip {
      source => "[source][ip]"
      target => "[source][geo]"
    }
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+yyyy-MM-dd}"
  }
}



Handling Multiline Using Filebeat:
Now he creates a new Logstash pipeline rather than reusing access.conf, because that would become too complicated.

  1. Create a new config java_errors.conf. To avoid a conflict, he increases the port by one.
input {
  beats {
    port => 5045
    host => "0.0.0.0"
  }
}

output {
  stdout { }
}
  2. Add this pipeline to pipelines.yml:
- pipeline.id: java_logs
  path.config: "/Users/andy/desktop/logstash/config/pipelines/java_errors.conf"
  3. Now he has a fresh installation of Filebeat which runs side by side with the first one.
  4. Disable the Elasticsearch output and enable the Logstash output, using port 5045 for Logstash.
  5. Configure the input under filebeat.inputs:
enabled: true

paths:
  - /Users/Andy/Desktop/java_errors.log
  6. Multiline options in Filebeat (see the combined snippet after these steps):
Uncomment multiline.negate: false

In Logstash we previously configured the multiline codec with "previous"; the Filebeat equivalent is "after".

Uncomment multiline.match: after

Uncomment multiline.pattern: '^(\s+|\t)|(Caused by:)'
  7. Start Logstash.
  8. Start Filebeat.
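Putting the pieces together, the relevant parts of the second Filebeat's filebeat.yml might look like this (a sketch; the path, port, and pattern are taken from the steps above):

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /Users/Andy/Desktop/java_errors.log
    # Lines matching the pattern (indented lines or "Caused by:") are
    # appended after the line they follow
    multiline.pattern: '^(\s+|\t)|(Caused by:)'
    multiline.negate: false
    multiline.match: after

output.logstash:
  hosts: ["localhost:5045"]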

Handling multiline - V2
He now wants to match the first line of the stack trace; so far we were matching the continuation lines.

  1. Set negate to true:
multiline.negate: true

multiline.pattern: <Copy from GIT it is so big>
  2. Start Filebeat.

