Sunday, 3 May 2020

Filebeat Udemy


filebeat.yml
Explore filebeat.reference.yml

  1. Comment out the "Elasticsearch Output" section.
  2. Uncomment the "Logstash Output" section and its hosts setting.
  3. The default Logstash port is 5044, which is what this setting gives you (see the sketch below).
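A minimal sketch of the relevant part of filebeat.yml after these changes (the localhost address is an assumption; use your own Logstash host):

# ---------------- Elasticsearch Output (commented out) ----------------
#output.elasticsearch:
#  hosts: ["localhost:9200"]

# --------------------------- Logstash Output --------------------------
output.logstash:
  # The Logstash hosts (5044 is the default Beats port)
  hosts: ["localhost:5044"]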


Enabling Apache Module
./filebeat modules enable apache

  1. Go to the module folder (not modules.d).
  2. Go to module/apache2 and open manifest.yml; the default log locations are defined here. Do not modify the paths in this file.
  3. Go to modules.d/apache2.yml and set enabled: false for the error logs. In the access log section set:
var.paths: ["/path/for/log/*log"]
  4. This enables the Apache module (see the sketch below). Now set up the Logstash pipeline.
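A sketch of what modules.d/apache2.yml might look like after these edits (the log path is a placeholder, and the module/file name varies with the Filebeat version):

- module: apache2
  # Access logs
  access:
    enabled: true
    var.paths: ["/path/for/log/*log"]

  # Error logs
  error:
    enabled: false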

Logstash Pipeline:
  1. Open the pipelines.yml file in the Logstash config directory and add one pipeline for the access log:
- pipeline.id: access_logs
  path.config: "/Users/andy/desktop/logstash/config/pipelines/access.conf"
  2. Now open the access.conf file:
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}

output {
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
  3. Now run Logstash: bin/logstash

Starting up filebeat and processing logs
  1. Start Filebeat; it will harvest the logs. Investigating the output further, we can see it connecting to Logstash ("Connecting to backoff..." logged by pipeline/output).
./filebeat -e
  2. Some of the information passed along by Filebeat is also useful:
@metadata => { beat, type, version, pipeline, ip_address }

log => { path } - DON'T USE THIS FIELD FOR FILTERING. It is tightly coupled to the file layout and increases the risk of breakage. Instead, there is a configuration option within filebeat.yml -> tags. We can specify any number of tags and they will be sent to Logstash. You can also filter on the module fields (see the sketch below):
module => apache
dataset => apache.access

There is another field named fileset, which contains "name => access".
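For example, filtering in Logstash on the module fields rather than the log path could look like this (a minimal sketch; the field layout matches the pipeline used later in these notes):

filter {
  # Keep only Apache access log events; drop everything else
  if [event][dataset] != "apache.access" {
    drop { }
  }
}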

Adding the Elasticsearch index template
Before enabling the Elasticsearch output plugin in Logstash, we need a bit of setup. It is only required if we want to use the default configuration, such as the Kibana dashboards.

An index template is the configuration that defines how our data should be mapped.

Here we are loading the index template into Elasticsearch.

./filebeat setup --index-management -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'

We invoke the setup command with the --index-management flag; the -E flags temporarily override the outputs so the template can be pushed to Elasticsearch. It reports "Index setup complete". To check, we can use the Kibana console tools: after the port number in the URL, append /app/kibana#/dev_tools.

GET /_template/filebeat-*

The Apache part has only a few fields; most are Elastic Common Schema (ECS) fields. They come from fields.yml in Filebeat; search for apache there and you will see the same fields.

Takeaway:
  1. Index templates specify settings and mappings for indices matching patterns
  2. The Filebeat version is included in the index template name and patterns to make upgrading Filebeat easy
  3. When the Elasticsearch output is disabled, you must run the index-template setup command again after upgrading Filebeat
  4. ECS (Elastic Common Schema) is a specification of common fields
  5. The fields.yml file contains the fields for Filebeat modules; it is used for both the index template and the Filebeat documentation

If the machine where we are running Filebeat does not have access to Elasticsearch, we need an alternative method: export the template and load it manually.

./filebeat export template > filebeat.template.json
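The exported template can then be loaded from any machine that can reach Elasticsearch, for example with curl (a sketch; the template name and host are assumptions and should match your Filebeat version):

curl -XPUT -H 'Content-Type: application/json' \
  'http://localhost:9200/_template/filebeat-7.6.2' \
  -d@filebeat.template.json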



Adding Kibana Dashboard:
  1. Open filebeat.yml.
  2. Go to the Kibana section and uncomment the host. The default port is 5601.
  3. To set up the dashboards, run:
./filebeat setup --dashboards
  4. Alternatively, enable dashboard loading in the config. Dashboards are versioned, so when Filebeat is upgraded the dashboards also need to be upgraded. However, this option requires Filebeat to have access to Kibana at startup. Kibana may be running isolated from Filebeat (like in my org); in that case you can run the setup command from any machine that has access to Kibana (see the example below), and only need to repeat it after upgrading to a new Filebeat.
setup.dashboards.enabled: true
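For example, running the dashboard setup from another machine might look like this (a sketch; the Kibana host is an assumption, passed as a -E override):

./filebeat setup --dashboards -E setup.kibana.host=kibana.example.org:5601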

Finishing the pipeline:
The configuration is now complete; we just need to do some event processing.
  1. Open access.conf and implement the module's ingest pipeline JSON as a Logstash pipeline.
NOTE: The useragent filter plugin has not been updated to support ECS, so we need to restructure its fields ourselves.
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}

filter {
  if [event][dataset] != "apache.access" {
    drop { }
  }

  grok {
    match => {
      "message" => [
        # Copy the pattern from the ingest pipeline. What we copy is in JSON form,
        # so change the double escapes (\\) to \ and add square brackets to the
        # field names. For example: "source" in the JSON becomes [source].
      ]
    }

    # Added after the match option
    remove_field => "message"
  }

  if "_grokparsefailure" in [tags] {
    drop { }
  }

  grok {
    match => {
      "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
    }
  }

  mutate {
    add_field => { "[event][created]" => "%{@timestamp}" }
  }

  date {
    match => [ "[apache][access][time]", "dd/MMM/yyyy:H:m:s Z" ]
    remove_field => "[apache][access][time]"
  }

  # <There is a big block of code for the useragent filter because it is not ECS compatible>

  mutate {
    remove_field => ["ua_tmp"]
  }

  geoip {
    source => "[source][ip]"
    target => "[source][geo]"
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+yyyy-MM-dd}"
  }
}

In the above code, the @metadata fields come from the event sent by Filebeat.
  2. Open module/apache/access/manifest.yml (see the sketch below).
Check the value of ingest_pipeline and open ingest/default.json; in that JSON you can see the definition of the ingest pipeline. Since we are using Logstash, we need to implement the same pipeline in Logstash.
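For reference, a manifest.yml for the access fileset roughly looks like the following (a hedged sketch; the exact keys, default paths, and file names vary between Filebeat versions):

module_version: "1.0"

var:
  - name: paths
    default:
      - /var/log/apache2/access.log*

ingest_pipeline: ingest/default.json
input: config/access.yml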
Key takeaways:
Porting the Elasticsearch ingest pipeline to Logstash ensures that Elasticsearch receives the data it expects.
Feel free to do additional filtering within the pipeline.
The useragent filter plugin adds fields in a format that is not ECS compatible; therefore we must manually restructure the data for now.
Make sure that the Elasticsearch index matches the index pattern defined within the index template.


How Filebeat Works:
Filebeat keeps track of read positions in a registry, similar to the sincedb files used by Logstash's file input.

Harvester -> Reads the files and keeps track of how much of each file has been read, storing that position in the registry. Even when there are many files, it tracks all of them and sends the data to libbeat.

Libbeat -> Receives events from the harvesters; it is a shared library. It aggregates the data and sends it to the configured output. For the Elasticsearch output, libbeat can group multiple events into a single request using the bulk API, which is more efficient than sending one request per event.

Registry -> Kept in memory and flushed to disk, for example when Filebeat shuts down. It is formatted as JSON with the file path and a byte offset; Filebeat keeps track of every event.

Preventing events from being sent twice -> Filebeat always waits for an acknowledgement from the output. If Filebeat is shut down before the acknowledgement is received, it will resend those events, producing duplicates. To mitigate this, we can set the shutdown_timeout option (e.g. to 5 seconds) so that Filebeat waits before shutting down. It is not a 100% guarantee, but it reduces the risk (see the snippet below).
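A minimal filebeat.yml snippet for this (the 5s value is the one mentioned above):

# Wait up to 5 seconds on shutdown for the output to acknowledge in-flight events
filebeat.shutdown_timeout: 5s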

Takeaway:
  1. An input starts up a harvester for each file matched by its file patterns
  2. Harvesters read files and keep track of their offsets
  3. The file offsets and other information are persisted to a registry

Clearing the registry:
The registry is stored under data/registry/filebeat.

data.json -> contains the registry information. It is only present once a harvester has started reading. Typically you should not modify the registry; to re-read files you can either reset the offset values to 0 or just delete the file (see the example entry below).
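For illustration, a data.json entry roughly looks like the following (a sketch; the exact keys, paths, and values vary by Filebeat version and OS):

[
  {
    "source": "/Users/andy/desktop/access.log",
    "offset": 5139,
    "timestamp": "2020-05-03T10:15:04.000+02:00",
    "ttl": -1,
    "type": "log",
    "FileStateOS": { "inode": 8712345, "device": 16777220 }
  }
]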

Processing More Logs:
First, delete the documents that we loaded, using the Kibana Dev Tools:

DELETE /filebeat-*

Manual Input Configuration:
  1. Disable the Apache module:
./filebeat modules disable apache
  2. Open filebeat.yml and configure the input under filebeat.inputs:
enabled: true
paths:
  - <log path>

fields:
  event:
    module: apache
    dataset: "apache.access"

fields_under_root: true

Also uncomment exclude_files.

# We are adding these fields to match the Logstash pipeline.
# Without fields_under_root: true they would arrive as [fields][event][dataset]
# instead of [event][dataset].
  3. No changes are needed in the Logstash pipeline.


Evaluation of modules:
Using modules with Logstash requires additional configuration.
This reduces simplicity and convenience, but increases flexibility.
In some cases modules save a lot of configuration.
You automatically benefit from the boilerplate configuration.
Modules reduce the risk of mistakes, e.g. forgetting to exclude .gz files.
Make use of modules as much as possible.
Prefer either using modules everywhere or not using them at all; a combination is harder to maintain.

Tagging Events:
The tags option allows a tags field to be sent along with every event, which can be used for filtering.
tags: ["web-server"]

Tags are not always enough, because they apply to all inputs; with multiple inputs sharing the same tags it is better to use per-input fields (see the sketch below).
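A sketch of the difference (the input paths and field values are placeholders):

# Global tags: added to events from ALL inputs
tags: ["web-server"]

filebeat.inputs:
  - type: log
    paths: ["/var/log/app-a/*.log"]
    # Per-input fields: only events from this input get them
    fields:
      app: "app-a"

  - type: log
    paths: ["/var/log/app-b/*.log"]
    fields:
      app: "app-b"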

Takeaway:
Use the tags option to add one or more tags to Filebeat events
These tags can then be used within the configured output (e.g. Logstash)
Tags can be used for filtering in Logstash, exactly as with any other tag
Tags are added to all events, i.e. for all inputs


Approaches for handling multiple log types:
We cannot have two inputs listening on the same port; that is an OS limitation, since two processes (or inputs) cannot bind the same port. So we use different ports. It is also possible to point Filebeat at multiple Logstash endpoints by enabling load balancing in the output, but that is just a workaround that misuses the load balancer.

Solution 1: Start multiple instances of Filebeat. We have two installations of Filebeat running side by side, each configured to send its data to a different endpoint.

Solution 2: Manage everything in a single Logstash pipeline. This adds complexity to the pipeline, and running everything in a single pipeline means it has to process all the logs, which can become a bottleneck.

Processing Apache Error Logs:
When using Filebeat, multiple log types arrive on the same port, so we either need multiple Filebeat instances or one Filebeat with everything handled in a single pipeline. He chose the single pipeline.

  1. Go to filebeat.yml and disable the manual input. In the previous example he showed using Filebeat without the Apache module, so now he turns the manual input off again.
enabled: false (under filebeat.inputs)
  2. Enable the Apache module:
./filebeat modules enable apache
  3. Open apache.yml under modules.d, enable the error log, and set its path:
var.paths: ["/users/andy/desktop/error.log"]

  4. Create a new file apache.conf under /config/pipelines/.
  5. Add the reference in pipelines.yml; previously we had access_logs, now he edited it to have just this pipeline:
- pipeline.id: apache_logs
  path.config: "/Users/andy/desktop/logstash/config/pipelines/apache.conf"

  6. In apache.conf:
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}

filter {
  if [event][module] != "apache" {
    drop { }
  }

  if [event][dataset] == "apache.access" {

    # NOTE: All code from the previous section for access.log is copied in here

    grok {
      match => {
        "message" => [
          # Copy the pattern from the ingest pipeline. What we copy is in JSON form,
          # so change the double escapes (\\) to \ and add square brackets to the
          # field names. For example: "source" in the JSON becomes [source].
        ]
      }

      # Added after the match option
      remove_field => "message"
    }

    if "_grokparsefailure" in [tags] {
      drop { }
    }

    grok {
      match => {
        "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
      }
    }

    mutate {
      add_field => { "[event][created]" => "%{@timestamp}" }
    }

    date {
      match => [ "[apache][access][time]", "dd/MMM/yyyy:H:m:s Z" ]
      remove_field => "[apache][access][time]"
    }

    # <There is a big block of code for the useragent filter because it is not ECS compatible>

    mutate {
      remove_field => ["ua_tmp"]
    }

    geoip {
      source => "[source][ip]"
      target => "[source][geo]"
    }

  } else if [event][dataset] == "apache.error" {

    # Convert the pipeline JSON from the ingest directory under the apache module.
    grok {
      match => {
        "message" => [
          # Copy the grok pattern from the ingest pipeline, convert the double
          # escapes and add square brackets, as for the access log.
        ]
      }

      pattern_definitions => {
        "APACHE_TIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
      }
    }

    date {
      match => [ "[apache][error][timestamp]", "EEE MMM dd H:m:s yyyy", "EEE MMM dd H:m:s.SSSSSS yyyy" ]
      remove_field => "[apache][error][timestamp]"
    }

    grok {
      match => {
        "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
      }
    }

    geoip {
      source => "[source][ip]"
      target => "[source][geo]"
    }
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    manage_template => false
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+yyyy-MM-dd}"
  }
}



Handling Multiline Using Filebeat:
Now he creates a new Logstash pipeline rather than reusing access.conf, because that would become too complicated.

  1. Create a new config java_errors.conf. To avoid a conflict, he increases the port by one.
input {
  beats {
    port => 5045
    host => "0.0.0.0"
  }
}

output {
  stdout { }
}
  2. Add this pipeline to pipelines.yml:
- pipeline.id: java_logs
  path.config: "/Users/andy/desktop/logstash/config/pipelines/java_errors.conf"
  3. Now he has a fresh installation of Filebeat which runs side by side with the first one.
  4. Disable the Elasticsearch output and enable the Logstash output, using port 5045 for Logstash.
  5. Configure the input under filebeat.inputs:
enabled: true

paths:
  - /Users/Andy/Desktop/java_errors.log
  6. Multiline options in Filebeat (see the combined snippet after these steps):
Uncomment multiline.negate: false

In Logstash we previously configured the multiline codec with "previous"; the Filebeat equivalent is "after".

Uncomment multiline.match: after

Uncomment multiline.pattern: '^(\s+|\t)|(Caused by:)'
  7. Start Logstash.
  8. Start Filebeat.
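Putting the pieces together, the relevant parts of the second Filebeat's filebeat.yml might look like this (a sketch; the path, port, and pattern are taken from the steps above):

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /Users/Andy/Desktop/java_errors.log
    # Lines matching the pattern (indented lines or "Caused by:") are
    # appended after the line they follow
    multiline.pattern: '^(\s+|\t)|(Caused by:)'
    multiline.negate: false
    multiline.match: after

output.logstash:
  hosts: ["localhost:5045"]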

Handling multiline - V2
He now wants to match the first line of the stack trace; so far we were matching the continuation lines.

  1. Set negate to true:
multiline.negate: true

multiline.pattern: <Copy from GIT it is so big>
  2. Start Filebeat.

