These are the notes I took while going through the Splunk Fundamentals 1 course. The course is currently free and I recommend taking it if you want to start using Splunk. This works best as a review document after finishing Fundamentals 1, but you could still pick up a few things just by reading through it.
Splunk structures unstructured data.
Five Main Functions of Splunk
- Indexer collects data from virtually any source. Breaks data into single events by source and normalizes date/time. Events are then stored in an index.
- Search Head allows you to search for values across multiple data sources.
- Knowledge objects can be added to classify and enrich data.
- Proactively monitor and alert on specific conditions.
- Reports and visualizations can be organized into dashboards.
Splunk organizes events in directories by age, which is why searches limited by time are more efficient.
Search Heads handle search requests for users and distribute the requests to the indexers which perform the actual search. Search Heads then consolidate and enrich the data before displaying to the user. They also provide dashboards and reports.
Forwarders are Splunk Enterprise instances that consume data and forward it to the indexers for processing. They require minimal resources and have little impact on performance, so they are usually located at the data source.
Splunk can scale to fit any environment.
Common Splunk commands include:
splunk start | stop | restart
splunk help
splunk start --accept-license
splunk status
splunk show splunkd-port | web-port | servername | default-hostname
splunk enable boot-start -user
Default Splunk Ports
splunkweb 8000
splunkd 8089
forwarders 9997
Apps & Roles
Apps are workspaces built to solve a specific use case. They are preconfigured environments that sit on top of Splunk Enterprise and extend its built-in knowledge and capabilities.
Roles determine what a user can see, do, and interact with.
Admins can install apps, create knowledge objects for all users, and ingest data. Power users can create and share knowledge objects for users of an app and do real-time searches. Users can only see their own knowledge objects and those that have been shared with them.
Admin users can add data sources from Settings -> Add Data. There are guides to follow, or you can Upload, Monitor or Forward data.
Upload is good for data sources that won't change and only need to be indexed once. Monitor lets us watch files and directories, HTTP events, TCP/UDP ports, and scripts on a local system. Windows instances can also monitor event logs, file system changes, Active Directory, and network info. The Forward option receives data from external forwarders. Forwarders are the main source of input in production environments.
Splunk uses sourcetypes to categorize the types of data being indexed. Host name is a unique identifier of where the events originated. Source is the name of the file, stream or other input type. Sourcetype is the specific type of data or data format.
Indexes are directories where data is stored. Having separate indexes makes searching more efficient. Using an index in a search string limits the amount of data Splunk searches and only returns events from that index. Multiple indexes also allow limiting user access by role, controlling who sees what data as well as allowing custom retention policies.
The Search and Reporting app is the default interface for searching and analyzing data. It allows you to create knowledge objects, reports, dashboards and more. The data summary button gives you a breakdown of the data indexed by host, source and sourcetype. Search history allows you to view and rerun past searches.
Limiting a search by time is key to faster results and is a best practice.
The patterns tab allows you to see patterns in the data, giving you a better understanding of what’s happening.
Commands that create statistics and visualizations are called transforming commands. They transform event data into data tables.
By default a search job will remain active for 10 minutes after it is run. Shared search jobs remain active for 7 days.
Export allows you to save results in raw, CSV, XML, or JSON format.
Field discovery is disabled in Fast mode, which only returns info on default fields. Verbose mode returns as much field and event info as possible. Smart mode toggles between behaviors based on the search being run.
Selecting or zooming into events uses the original search job. When you zoom out, Splunk runs a new search to return newly selected events.
Text searched for is highlighted and events are displayed in reverse chronological order. Time displayed in the event is based on the time zone set in your account. Clicking on text in an event lets you add it to the search, exclude it from the search, or perform a new search on it.
By default, events are shown in a list view, but can also be set to raw or table.
AND, OR, and NOT are the three Booleans used in Splunk. An asterisk (*) is a wildcard: fail* matches fail, failure, failed, etc. Boolean operations have an order of evaluation: 1. NOT, 2. OR, 3. AND. Parentheses can be used to adjust the order. Quotation marks are used to search for exact phrases. Backslashes (\) are used to escape quotation marks and special characters.
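For example, a minimal sketch against hypothetical log data (failed, password, and "access granted" are assumed search terms, not fields from the course data):
failed password NOT ("access granted" OR success*)
NOT is evaluated before OR, so the parentheses are needed to group the OR before the exclusion is applied.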
The fields bar shows all of the fields that were extracted at search time. They are broken down into selected fields and interesting fields. Selected fields are the most important and show up at the top of the list and at the bottom of events.
host, source and sourcetype are the default selected fields. index is in interesting fields by default.
Interesting fields have values in at least 20% of events. a denotes an alphanumeric (string) value, # denotes a numeric value
You can add a field to selected fields and run reports on a field by clicking on it in the fields sidebar. Clicking on a report will create a transforming search and display results as statistical data. Adding a field to the selected fields will cause it to be displayed with each event and persists for subsequent searches.
You can refine and run more efficient searches by using fields in them. Field names are case sensitive, but field values are not. = (equals) and != (not equals) can be used with numeric or string values. >, >=, <, <= can be used with numerical values. Wildcards can be used with field values.
status != 200 returns events where the status field exists and does not equal 200.
NOT status=200 returns all events except those with status=200, including events without a status field at all.
status IN ("500", "503", "505") returns all events with a status field matching any of the three values.
Search bar color coding:
orange = Booleans and command modifiers
blue = commands
green = command arguments
purple = functions
Splunk Best Practices
Using time to limit events searched is the most efficient way to filter events. The less data you have to search, the faster Splunk will be. After time, the default fields index, source, host and sourcetype are the most powerful. These are extracted at index time and do not need to be extracted at each search. The more you tell the search engine, the more likely you will get good results. Filtering early limits the number of events and makes future manipulations of the data faster.
Inclusion is better than exclusion. “access denied” is better than NOT “access granted”
Time abbreviations can tell Splunk when to search
s for seconds, m for minutes, h for hours, d for days, w for weeks, mon for months, and y for years (e.g. -30s, -30m). @ is used to round down: -30m@h run at 9:37 will return events from 9:00 on. earliest= and latest= can be used in the search string. Absolute times like earliest=01/08/2018:12:00:00 can also be used.
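A quick sketch of relative time in a search string (index=web is an assumption about the environment):
index=web earliest=-7d@d latest=@d
This searches the last seven full days, with both boundaries rounded down to midnight.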
The best way to filter early in a search is by using indexes. Searching only the indexes that have the data you need can make searches more efficient. Multiple indexes can be searched using the OR operator as well as wildcards.
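For example, a hedged sketch assuming web and security indexes exist:
(index=web OR index=sec*) failed
Splunk scans only the matching indexes, and the wildcard covers any index name beginning with sec.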
Components of a search:
Search Terms - what we are looking for
Commands - tell Splunk what to do with results
Functions - explain how to compute/evaluate results
Arguments - variables to apply to the function
Clauses - how we want results grouped or defined
sourcetype=acc* status=200 | stats list(product_name) AS "Product Name"
sourcetype=acc* status=200 are search terms, stats is a command, list(product_name) is a function with an argument, and AS is a clause.
Pipes pass current results to the next component
ctrl + \ will separate pipes to new lines
search command | command/function | command/function — search flow —>
Once an item is removed by a command, it is no longer accessible to subsequent search commands.
fields command allows you to include or exclude fields from search results. Useful to limit the fields displayed and to search faster. Field inclusion happens before field extraction and can improve performance.
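A minimal sketch of both forms (the field names are assumptions based on the course's web access data):
sourcetype=acc* | fields status, product_name
sourcetype=acc* | fields - product_name
The first form keeps only the listed fields; the minus form removes them from the results.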
table command returns searched data in a tabulated format
rename command is used to rename fields. Once renamed, original name is no longer available to subsequent commands.
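A short sketch chaining table and rename (field names assumed):
sourcetype=acc* status=200 | table host, product_name | rename product_name AS "Product Name"
After the rename, product_name is gone; any later command must reference "Product Name" instead.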
dedup command removes events with duplicate values. Can be used on single or multiple fields.
sort command displays results in ascending or descending order. + for ascending, - for descending. Sort can use limit to limit the results shown. limit=0 for unlimited results.
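A hedged sketch combining dedup and sort (price is an assumed numeric field):
sourcetype=acc* | dedup product_name | sort limit=5 -price
This keeps one event per product_name, then shows the five highest-priced results.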
Transforming commands order search results into a data table for statistical purposes. Needed to create visualizations.
top finds the most common values for a given field. It automatically returns count and percentage and defaults to 10 results. Use limit=0 for all.
limit = integer
showcount/showperc = True/False
countfield = string
percentfield = string
showother = True/False
otherstr = string
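A sketch using several of these options together (sourcetype=vendor_sales and Vendor come from the course's sample data; treat them as assumptions):
sourcetype=vendor_sales | top Vendor limit=5 showperc=False countfield="Number of Sales" showother=True otherstr="All Other Vendors"
showother groups everything outside the top 5 under the label given in otherstr.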
by clause to split top values by field.
top product_name by Vendor limit=3 displays the top 3 products for each vendor
rare shows the least common values of a field. It functions similarly to top.
stats command produces statistics. Common functions include count, distinct count, sum, average, min, max, list and values.
count returns a count of events matching the search criteria
distinct_count (dc) returns a count of unique values
sum returns the sum of all numeric values in a field
avg/min/max only work on numeric values in fields
list lists all values for a given field
values returns the unique values for a given field
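A minimal stats sketch (price and product_name are assumed fields from the course data):
sourcetype=vendor_sales | stats count, sum(price) AS "Total Sales" BY product_name
This returns one row per product with its event count and summed price.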
Reports make saving and sharing searches easy. When a report is run, a fresh set of results is shown. Quick reports can be generated by clicking on a field name and picking a report.
Visualizations are used to tell a better story. They can be saved as reports or dashboard panels.
A dashboard is a collection of reports compiled into one place allowing quick visual access to data.
Pivot allows users to craft reports in a simple-to-use interface without ever having to craft a search string.
Data models - knowledge objects that provide the data structure that drives pivots. They are created by admin and power users.
Instant pivot works without having a data model.
pivot command is a generating command and must be first in a search pipeline. It requires a large number of inputs: datamodel, datamodel object, and pivot elements.
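A hedged sketch of the syntax (Buttercup_Games and Successful_Purchases are hypothetical data model and object names):
| pivot Buttercup_Games Successful_Purchases count(Successful_Purchases) AS "Count of Purchases"
The leading pipe is required because pivot generates events rather than filtering an existing result set.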
Entering a non-transforming command into the search bar will put a pivot button in the statistics and visualizations search tabs. This will allow you to choose which fields to use as the data model.
Datasets help users find the data they need and get answers faster.
Lookups allow you to add other fields and values to your events not included in the index data. You can combine fields from sources external to the index with searched events based on paired fields present in the events. This can include csv files, scripts and geospatial data. A lookup is categorized as a dataset. First define the lookup table, then define the lookup. Can optionally configure the lookup to run automatically. Lookup field values are case-sensitive by default.
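A minimal sketch of a manual lookup (http_status is a hypothetical lookup definition with code and description fields):
sourcetype=acc* | lookup http_status code AS status OUTPUT description
Each event's status value is matched against the lookup's code field, and the matching description is added to the event.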
Batch index query improves performance on large lookup files by grouping index queries.
Additional lookup options include:
- populating a lookup table with search results
- defining a lookup based on an external script or command
- using the Splunk DB Connect application to create lookups based on external databases
- using geospatial lookups to create queries that can generate choropleth map visualizations
- populating events with KV Store fields
Scheduled reports run at a scheduled interval and can trigger an action each time they run. Useful for dashboards and automatically sending reports via email.
Start with the search we want the report based on, save it as a report, and turn off the time range picker since it will be scheduled. After saving, select the schedule from additional settings.
Running concurrent reports, and the searches behind them, can put a big demand on your system hardware, even if everything is configured to the recommended specs. Admin users can set higher priorities on scheduled reports. We can use the schedule window to set a time frame the report can run during. If the system is busy, it can delay the report during the window.
Embedding a report makes it available to people who do not have access to the Splunk instance. Embedded reports are viewable by anyone with access to the webpage. An embedded report will not show data until the scheduled report is run. Once enabled, we can no longer edit attributes.
Scheduled reports can also be added to dashboards.
Use a Group_Object_Description convention when naming reports.
Alerts are based on searches that run on scheduled intervals or in real time. You can be notified when the results of a search meet defined conditions. They are triggered when a search is complete. Alerts can list in the interface, log events, output to a lookup, send to a telemetry endpoint, trigger scripts, send emails, use a webhook, or run a custom alert action.
Real-time alerts run continuously and can place more overhead on system performance.
By default, everyone has read access and power users have write access to alerts.
Log events are sent to Splunk deployments for indexing.
Output results to lookup will create or update a CSV lookup table.