ELK Tutorial-Efficiently discover, analyze and visualize your data

[Note] This article is translated from: www.edureka.co/blog/elk-st...

As more and more IT infrastructure moves to the cloud, the demand for cloud-based security tools and log analysis platforms is growing rapidly. Regardless of an organization's size, a large amount of data is generated every day, and a considerable part of it consists of the company's web server logs. Logs are one of the most important, yet most often overlooked, sources of information. Each log file holds valuable information, most of it unstructured and meaningless on its own. Without a detailed analysis of this log data, a company may miss the opportunities and threats around it. This is where log analysis tools become very useful. The ELK Stack, or Elastic Stack, is a complete log analysis solution that helps you search, analyze, and visualize the logs generated by different machines in depth. Through this tutorial, I will provide you with the relevant insights. First, let us list the topics to be discussed:

  • What is ELK Stack?
  • ELK Stack architecture
  • ELK Stack installation
  • Elasticsearch tutorial
  • Logstash tutorial
  • Kibana tutorial

This tutorial will help you understand the basics of Elasticsearch, Logstash, and Kibana together, and help you lay a solid foundation in the ELK Stack. First, let us understand what ELK Stack is.

What is ELK Stack?

The well-known ELK Stack was recently renamed the Elastic Stack. It is a powerful collection of three open source tools: Elasticsearch, Logstash, and Kibana. These three products are most commonly used together for log analysis in different IT environments. With the ELK Stack, you can perform centralized logging, which helps identify problems with web servers or applications. It lets you search all your logs in one place and identify issues that span multiple servers by correlating their logs within a specific time frame. Let us now discuss each of these tools in detail.

Logstash

Logstash is a data collection pipeline tool. It is the first component of the ELK Stack: it collects data inputs and feeds them into Elasticsearch. It can collect data of various types from different sources at once and make it available immediately for further use.

Elasticsearch

Elasticsearch is a NoSQL database that is based on the Lucene search engine and is accessed through RESTful APIs. It is a highly flexible, distributed search and analysis engine. In addition, through horizontal scalability it provides simple deployment, maximum reliability, and easy management. It offers advanced queries for performing detailed analysis and stores all data centrally so that documents can be searched quickly.

Kibana

Kibana is a data visualization tool. It is used to visualize Elasticsearch documents and helps developers gain an immediate understanding of them. Kibana dashboards provide various interactive diagrams, geospatial data, timelines, and graphs to visualize the complex queries run against Elasticsearch. With Kibana, you can create and save custom graphics according to your specific needs. The next part discusses the ELK Stack architecture and how data flows through it.

ELK Stack architecture

The following is the architecture of ELK Stack, showing the correct sequence of log streams in ELK. Here, Logstash will collect and process logs generated from various sources according to the provided filter conditions. Then, Logstash pipes these logs to Elasticsearch, and Elasticsearch analyzes and searches the data. Finally, using Kibana, logs can be visualized and managed as required.

ELK Stack installation

**Step I:** Open www.elastic.co/downloads .
**Step II:** Select and download Elasticsearch.
**Step III:** Select and download Kibana.
**Step IV:** Select and download Logstash.
**Step V:** Unzip all three files to get the corresponding folders.

Install Elasticsearch

**Step VI:** Now open the elasticsearch folder and go to the bin folder.
**Step VII:** Double-click the elasticsearch.bat file to start the Elasticsearch server.
**Step VIII:** Wait for the Elasticsearch server to start.
**Step IX:** To check whether the server has started, go to your browser and type localhost:9200 .

Install Kibana

**Step X:** Now open the kibana folder and go to the bin folder.
**Step XI:** Double-click the kibana.bat file to start the Kibana server.
**Step XII:** Wait for the Kibana server to start.
**Step XIII:** To check whether the server has started, go to your browser and type localhost:5601 .

Install Logstash

**Step XIV:** Now open the logstash folder.
**Step XV:** To test your Logstash installation, open a command prompt, go to the logstash folder, and enter:

```
bin\logstash -e 'input { stdin { } } output { stdout { } }'
```

**Step XVI:** Wait until "Pipeline main started" appears on the command prompt.
**Step XVII:** Now enter a message at the command prompt and press Enter.
**Step XVIII:** Logstash appends the timestamp and IP address information to the message and displays it in the command prompt.

Now that we have completed the installation, let us delve deeper into these tools. Let's start with Elasticsearch.

Elasticsearch

As mentioned earlier, Elasticsearch is a highly scalable search engine that runs on the Java-based Lucene engine. It is basically a NoSQL database, which means it stores data in an unstructured format and you cannot run SQL queries against it. In other words, it stores data in documents instead of tables and schemas. To get a better picture, compare it with a relational database: an index corresponds to a database, a type to a table, a document to a row, and a field to a column. Now let us get familiar with the basic concepts of Elasticsearch. When using Elasticsearch, you need to follow three main steps:

  1. Indexing
  2. Mapping
  3. Searching

Let's talk about them in detail one by one.

Indexing

Indexing is the process of adding data to Elasticsearch. It is called "indexing" because when data is entered into Elasticsearch, it is placed into Apache Lucene indexes. Elasticsearch then uses these Lucene indexes to store and retrieve the data. Indexing is similar to the create and update steps of CRUD operations. The indexing scheme is name/type/id, where name and type are mandatory; if you do not provide an id, Elasticsearch generates one by itself. The whole query is then issued as an HTTP PUT request, so the final request looks like PUT name/type/id, and a JSON document containing the fields and values is sent along as the HTTP payload. The following is an example of creating a document for a US customer with the details in its fields.

```
PUT /customer/US/1
{
  "ID": 101,
  "FName": "James",
  "LName": "Butt",
  "Email": "jbutt@gmail.com",
  "City": "New Orleans",
  "Type": "VIP"
}
```
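
The original article shows the server's acknowledgment only as a screenshot. As a rough sketch (assuming Elasticsearch 5.x/6.x; the exact fields vary between versions), the response looks something like this:

```
# approximate acknowledgment shape (field names vary slightly between Elasticsearch versions)
{
  "_index": "customer",
  "_type": "US",
  "_id": "1",
  "_version": 1,
  "result": "created"
}
```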

This shows that the document has been created and added to the index. Now, if you try to change the field details without changing the ID, Elasticsearch will overwrite the existing document with the new details.

```
PUT /customer/US/1
{
  "ID": 101,
  "FName": "James",
  "LName": "Butt",
  "Email": "jbutt@yahoo.com",
  "City": "Los Angeles",
  "Type": "VVIP"
}
```

This shows that the document in the index has been updated with the new details.
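
If you want to confirm the overwrite, you can fetch the document back by its ID; document retrieval is covered in more detail below, but a minimal check looks like this:

```
# fetch the document back by id to verify the new field values
GET /customer/US/1
```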

Mapping

Mapping is the process of setting the schema of an index. Through mapping, you tell Elasticsearch the data types of the attributes in your schema. If no mapping is defined for a field before indexing, Elasticsearch dynamically adds a generic type to that field. These generic types are very basic, however, and often do not meet the expectations of your queries. Now let us try to define a mapping.

```
PUT /customer/
{
  "mappings": {
    "US": {
      "properties": {
        "ID":    { "type": "long" },
        "FName": { "type": "text" },
        "LName": { "type": "text" },
        "Email": { "type": "text" },
        "City":  { "type": "text" },
        "Type":  { "type": "text" }
      }
    }
  }
}
```

When you execute this query, you will get an output of this type.
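
To verify that the mapping was applied, you can also read it back with Elasticsearch's standard mapping endpoint (this step is not shown in the original article):

```
# read back the mapping that was just defined
GET /customer/_mapping
```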

Searching

The general form of a search query against a specific index and type is as follows:

```
POST index/type/_search
```

Now, let's try to retrieve the details of all the customers present in the "customer" index.

```
POST /customer/US/_search
```

When you execute this query, it will generate results of the following type. However, when you want to search for specific results, Elasticsearch provides three methods:

Use query

Using a query, you can search for specific documents or items. For example, let's run a search query for customers that belong to the "VVIP" category.

```
POST /customer/US/_search
{
  "query": {
    "match": {
      "Type": "VVIP"
    }
  }
}
```

Use filter

Using filters, you can further narrow your search. The following is an example of searching for VVIP customers with ID "101":

```
POST /customer/_search
{
  "query": {
    "match": { "Type": "VVIP" }
  },
  "post_filter": {
    "match": { "ID": 101 }
  }
}
```

If you execute this query, you will get the following results:
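
As a side note that is not in the original article, the same search is often written with the filter placed inside a bool query, which lets Elasticsearch cache the filter; a minimal sketch:

```
# same intent as the post_filter example above, expressed as a bool query
POST /customer/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "Type": "VVIP" } }
      ],
      "filter": [
        { "term": { "ID": 101 } }
      ]
    }
  }
}
```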

Use aggregation

Aggregation is a framework that helps aggregate data through a search query. Small aggregations can be combined to build complex summaries of the data. Let's run a simple aggregation to check how many customers of each type are in the index:

```
POST /customer/_search
{
  "size": 0,
  "aggs": {
    "Cust_Types": {
      "terms": {
        "field": "Type.keyword"
      }
    }
  }
}
```
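
The original example stops at this single bucket aggregation. As a hedged extension, a query and an aggregation can be combined in one request (assuming the default dynamic mapping that adds a .keyword sub-field), for instance to count customer types only among customers in New Orleans:

```
# assumes the ".keyword" sub-field that Elasticsearch's dynamic mapping adds by default
POST /customer/_search
{
  "size": 0,
  "query": {
    "match": { "City": "New Orleans" }
  },
  "aggs": {
    "Cust_Types": {
      "terms": { "field": "Type.keyword" }
    }
  }
}
```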

Now let us see how to retrieve the data set from the index.

Retrieving data

To check the list of documents contained in the index, you only need to send an HTTP GET request in the following format:

```
GET index/type/id
```

Let's try to retrieve the details of customers whose "id" is equal to 2:

```
GET /customer/US/2
```

After successful execution, it will give you results of the following type. With Elasticsearch, you can not only browse data but also delete documents.

Deleting data

Using the delete convention, you can easily remove unwanted data from an index and free up space. To delete any document, you need to send an HTTP DELETE request in the following format:

```
DELETE index/type/id
```

Now let's try to delete the details of the customer with ID 2.

```
DELETE /customer/US/2
```

When you execute this query, you will get the following types of results. So far, we have explained the basics of CRUD operations using Elasticsearch. Understanding these basic operations will help you perform different types of searches. Now let's start learning Logstash, the next tool of ELK Stack.

Logstash

As I have already discussed, Logstash is a pipeline tool, generally used for collecting and forwarding logs or events. It is an open-source data collection engine that can dynamically unify data from various sources and normalize it into the destinations of your choice. With its many input, filter, and output plugins, Logstash makes it easy to transform all kinds of events. At the very least, Logstash needs an input and an output plugin specified in its configuration file to perform a transformation. The following is the structure of a Logstash configuration file:

```
input {
  ...
}
filter {
  ...
}
output {
  ...
}
```

As you can see, the entire configuration file is divided into three parts, and each part contains configuration options for one or more plugins. The three parts are:

  1. input
  2. filter
  3. output

You can also apply multiple filters in the configuration file. In that case, the filters are applied in the same order in which they appear in the configuration file. Now, let's try to configure a pipeline for the US customer data set, which is in CSV format.

```
input {
  file {
    path => "E:/ELK/data/US_Customer_List.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["Cust_ID", "Cust_Fname", "Cust_Lname", "Cust_Email", "Cust_City", "Cust_Type"]
  }
  mutate {
    convert => ["Cust_ID", "integer"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "customers"
    document_type => "US_Based_Cust"
  }
  stdout {}
}
```

To insert this CSV data into Elasticsearch, you have to tell the Logstash server to run this configuration. To do this, follow these steps:

  1. Open command prompt
  2. Enter the bin directory of Logstash
  3. Type: logstash -f X:/foldername/config_filename.config and press Enter.

Once your Logstash server is up and running, it will start transferring the data from the file into Elasticsearch.

If you want to check whether the data has been inserted successfully, go to the Sense plugin and type:

```
GET /customers/
```
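
If you specifically want the number of documents that were indexed, Elasticsearch's standard _count endpoint (not mentioned in the original article) can also be used:

```
GET /customers/_count
```
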
It will give you the number of documents created. Now, if you want to visualize this data, you must use the last tool of ELK Stack, which is Kibana. Therefore, in the next part of this tutorial, I will discuss Kibana and how to use it to visualize your data.

Kibana

As mentioned earlier, Kibana is an open source visualization and analysis tool. It helps visualize the data that the Logstash pipeline transmits and stores in Elasticsearch. You can use Kibana to search, view, and interact with this stored data, and then visualize it in various charts, tables, and maps. Kibana's browser-based interface makes large volumes of data easy to understand and reflects changes in Elasticsearch queries in real time. In addition, you can easily create, customize, save, and share dashboards. Once you understand how it works with Elasticsearch and Logstash, learning Kibana is not a big deal. In this part of the tutorial, I will introduce the various functions you need to analyze your data.

Management page

Here you perform the runtime configuration of Kibana. On this page, you specify what Kibana should search on. See the following example, in which I have configured an entry for the "customers" index. As you can see, in the "Index pattern" field, you specify the index you want to use. Make sure to select **@timestamp** as the "Time Filter field name". Then you can go ahead and click Create to create the index pattern. If the index pattern is created successfully, you will see a page of the following type. Here, you can select different filters from the drop-down list as needed, and you can also delete specific index patterns when they are no longer needed.

Discovery page

Through the "Discover" page, you can access the documents that exist in each index that matches the selected index pattern. You can easily interact and browse all the data that exists on the Kibana server. In addition, you can view the data existing in the document and perform search queries on it. As you can see below, I am searching for "VIP" customers from "Los Angeles". Therefore, as you can see, we only have one VIP customer from Los Angeles.

Visualization page

The Visualize page lets you visualize the data in your Elasticsearch indexes in the form of charts, bar graphs, pie charts, and so on. You can even build dashboards here that display related visualizations based on Elasticsearch queries. Typically, a series of Elasticsearch aggregation queries is used to extract and process the data. On the "Visualize" page, you can search for a saved visualization or create a new one, and you can aggregate the data in any form you need; different visualization types are provided for convenience. Let me show you how to visualize the US customer data by customer type (a sketch of the underlying aggregation query follows the steps below). To create the visualization, follow these steps:

  1. Select the visualization type. [Here I am using a pie chart]
  2. In the bucket aggregation, select "Terms" from the drop-down list.
  3. In "Field", select the field on which to perform the aggregation.
  4. You can also specify the order and size of the aggregation.
  5. Now click the execute button to generate the pie chart.
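
The original article does not show the query behind the chart, but as a rough idea of what Kibana issues under the hood, the pie chart corresponds to a terms aggregation against the customers index (the field name assumes the Logstash configuration shown earlier and the default .keyword sub-field):

```
# field name taken from the Logstash CSV columns; ".keyword" assumes default dynamic mapping
POST /customers/_search
{
  "size": 0,
  "aggs": {
    "customer_types": {
      "terms": {
        "field": "Cust_Type.keyword",
        "size": 5,
        "order": { "_count": "desc" }
      }
    }
  }
}
```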

Dashboard page

The "Dashboard" page displays a collection of saved visualizations. Here, you can add new visualizations or use any saved visualizations.

Timelion page

Timelion is a time series data visualization tool that integrates completely independent data sources into one interface. It is driven by a single-line expression language that can be used to retrieve time series data, perform calculations to simplify complex problems, and visualize results.

Development tool page

Kibana's "Development Tools" page contains development tools such as the "Beta Sense" plug-in, which can be used to interact with the data that exists in Elasticsearch. It is often referred to as Kibana's console. The following is an example, in which I used Kibana's Sense plugin to search the "customers" index of type "US_based_cust": This concludes this article. Now you can use Logstash, Elasticsearch, and Kibana to perform various searches and analysis on any data.