

      Understanding Cryptography’s Meaning and Function


      Cryptography is a cornerstone of modern secure communication practices. From digital signatures to disk encryption, these everyday applications of cryptography enable Internet users, developers, and businesses to keep sensitive data private. This guide provides an overview of what cryptography is, a brief history of cryptography, and the differences between symmetric and asymmetric encryption.

      What Is Cryptography?

      The discipline of cryptography includes the study and practice of transforming data from its original format into an unintelligible format. The goal of cryptography is to keep information secure at rest and during its transfer. In the context of computer science, cryptography focuses on the mathematical concepts and algorithms that keep communications hidden from unauthorized viewers. There are three basic types of cryptographic algorithms that are used: secret key, public key, and hash function algorithms. Data encryption applies the principles of cryptography and refers to the method used to encode data into an unintelligible format.

      Cryptography enables cybersecurity professionals to secure sensitive company information. Well-known examples of cryptographic techniques used in cybersecurity are digital signatures, time stamping, the SSL protocol, and public key authentication with secure shell (SSH).

      History of Cryptography

      While the use of cryptography in network communications began with the advent of computers, the origins of cryptography extend much further back into history. The earliest known example to date is an inscription in a nobleman’s tomb in Egypt from around 1900 B.C. The inscriber used unusual symbols in place of more common hieroglyphic symbols. It is widely theorized that this was not intended to hide the inscription, but to make it appear more dignified and educated. However, the original text was transformed in much the same way that cryptography transforms text to keep its original meaning secret.

      Deliberate use of cryptography to hide messages dates back to numerous early civilizations, since keeping information private has been a consistent need for human societies. One early example, the Arthashastra, is a classic treatise on statecraft written circa 350-275 BCE. It mentions India’s early espionage service and the “secret writings” used to communicate with spies. Julius Caesar was known to use a cipher to communicate with his army generals in the 1st century BC, as did numerous other leaders with armies and wars to fight.

      According to Britannica, there are three distinct stages in the development of cryptography over time. The first is manual cryptography, the second is mechanized cryptography, and the third is digital cryptography.

      One of the best-known early ciphers built around a secret key is the Vigenère cipher, developed in the 16th century. It is described as “a poly-alphabetic substitution system that uses a key and a double-entry table.”
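      To make the idea of a keyed polyalphabetic substitution concrete, here is a minimal Python sketch of the Vigenère cipher. It assumes an uppercase A-Z alphabet, leaves spaces and punctuation unchanged, and uses a hypothetical `vigenere()` helper. Each letter is shifted by the matching letter of a repeating keyword, so the same plaintext letter can encrypt to different ciphertext letters.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def vigenere(text, key, decrypt=False):
    """Shift each letter of text by the matching letter of the repeating key."""
    key = key.upper()
    result = []
    key_pos = 0
    for ch in text.upper():
        if ch not in ALPHABET:
            result.append(ch)  # spaces and punctuation pass through unchanged
            continue
        shift = ALPHABET.index(key[key_pos % len(key)])
        if decrypt:
            shift = -shift  # decryption undoes the same shifts
        result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        key_pos += 1  # the key only advances on letters
    return "".join(result)


ciphertext = vigenere("ATTACK AT DAWN", "LEMON")
print(ciphertext)                                   # LXFOPV EF RNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACK AT DAWN
```

      The cipher has long been broken and is shown only to illustrate how a key drives the substitution.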

      An example of the second stage, the mechanization of cryptography, is the Hebern rotor machine, which was developed in the early 20th century after electricity became widely available. It embedded a secret key on a rotating disk. Another example is the famous Enigma machine, which was invented at the end of World War I. It used multiple rotors that rotated at different rates while the user typed. The key was the initial setting of the rotors.

      Cryptography was used almost exclusively for military purposes for most of its history. That changed substantially in the early 1970s, when IBM customers demanded additional security for their computer systems. In response, IBM developed a cipher called Lucifer.

      As computer usage increased within government agencies, the demand for non-military applications of cryptography increased. This began the era of digital cryptography, which sought to counter growing cybersecurity threats. In 1973, the U.S. National Bureau of Standards (NBS, now NIST) sought a block cipher to become the national standard. A modified version of Lucifer was accepted and named the Data Encryption Standard (DES). However, DES’s 56-bit key could not withstand brute force attacks as computers became more powerful. In response, NIST solicited a new block cipher in 1997 and accepted 15 candidate algorithms for evaluation. NIST chose Rijndael in 2000 and standardized it as the Advanced Encryption Standard (AES).

      Although encryption standards exist today, cryptography continues to evolve. The cryptography of the present is anchored to computer science algorithms and mathematics, like number theory.

      Symmetric vs. Asymmetric Cryptography

      The two main forms of encryption utilized by cryptography are symmetric and asymmetric. Symmetric cryptography encrypts and decrypts with a single key. Asymmetric cryptography uses two linked keys, one public and the other private.

      Both forms of encryption are used every day, although most computer users don’t notice them. They’re at work in the background every time someone browses the web, answers email, or submits a web form.

      People tend to notice cryptography when they initiate its use or directly observe it in use. One example is generating or managing keys with the OpenSSL toolkit. Another example is emailing an encrypted document, such as an Adobe PDF file that requires a password to open.

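      Symmetric encryption is the oldest and most widely used form of encryption. It dates back to Julius Caesar’s cipher. Symmetric encryption uses either a stream cipher or a block cipher to encrypt plain text data. A stream cipher encrypts data one bit or byte at a time, while a block cipher encrypts fixed-size blocks of data (128 bits in the case of AES).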
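      The following sketch illustrates the defining property of symmetric encryption: one shared key both encrypts and decrypts. It is a toy stream cipher, assumed for illustration only, that derives a keystream by hashing the key, a nonce, and a counter with SHA-256 from Python’s standard library, then XORs that keystream with the data. It is not a vetted cipher; real systems should use a reviewed implementation of a standard algorithm such as AES or ChaCha20. The function names are hypothetical.

```python
import hashlib
import secrets


def keystream(key, nonce, length):
    """Derive `length` pseudo-random bytes by hashing key + nonce + counter (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_stream(key, nonce, data):
    """Encrypt or decrypt: XOR with the same keystream is its own inverse."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


shared_key = secrets.token_bytes(32)  # the single secret both parties must hold
nonce = secrets.token_bytes(12)       # must never be reused with the same key

ciphertext = xor_stream(shared_key, nonce, b"meet at noon")
plaintext = xor_stream(shared_key, nonce, ciphertext)
print(plaintext)  # b'meet at noon'
```

      Because the recipient needs exactly the same `shared_key` to decrypt, the key itself has to be delivered securely, which is the weakness discussed next.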

      While symmetric encryption requires the sender and recipient to use the same key, that key’s use is not limited to two people in a one-to-one conversation. Other people can also be designated recipients and use the same key. Likewise, any recipient can use that same key to reply to the sender and to everyone else on the approved list of recipients.

      Thus, if an unauthorized person obtains the symmetric key, that person could read, copy, and forward the message to new recipients, and even respond to the original group. Attackers typically obtain the key either by stealing it from a device where it is not properly secured, or by intercepting it while it is being shared.

      When the sender and receiver are not in the same location, the key must be transmitted to the receiver. The key is therefore vulnerable if the network or channel is compromised, so it must be closely protected.

      By comparison, asymmetric cryptography uses two linked keys, one public and the other private, on each side of the conversation or transaction. Both sender and receiver hold a private key that only they possess. Each also has a public key, which is mathematically paired with their private key and can be shared freely. The sender uses the recipient’s public key to encrypt the file. The recipient then uses their private key to decrypt it. Only the recipient can decrypt the file because no one else has access to that person’s private key. Asymmetric encryption also enables digital signatures: the sender signs with their own private key, and anyone can verify the signature with the sender’s public key.
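      A textbook RSA example with deliberately tiny primes shows how one key pair supports both encryption and signing. This is a sketch for illustration only: real RSA uses keys of 2048 bits or more together with padding schemes such as OAEP and PSS, which are omitted here. It relies on Python 3.8+ for computing the modular inverse with `pow(e, -1, phi)`, and the `apply_key()` helper is hypothetical.

```python
# Textbook RSA with tiny primes: insecure and unpadded, for illustration only.
p, q = 61, 53
n = p * q                # public modulus: 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent, the modular inverse of e: 2753


def apply_key(value, exponent, modulus):
    """Encryption, decryption, signing, and verification are all modular exponentiation."""
    return pow(value, exponent, modulus)


message = 65  # messages must be integers smaller than n

# Encryption: anyone can encrypt with the recipient's public key (e, n).
ciphertext = apply_key(message, e, n)
# Decryption: only the holder of the private key (d, n) can reverse it.
recovered = apply_key(ciphertext, d, n)

# Signing: the key owner applies the private key (here the same toy pair plays the sender)...
signature = apply_key(message, d, n)
# ...and anyone can check the signature with the matching public key.
is_valid = apply_key(signature, e, n) == message

print(ciphertext, recovered, is_valid)  # 2790 65 True
```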

      Examples of asymmetric cryptography in everyday use include RSA, the Digital Signature Standard (DSS/DSA), and the TLS/SSL protocol.

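      Both forms are considered secure when used correctly, but the level of security of any given encrypted message depends heavily on the algorithm and the size of the key(s). Key sizes are not directly comparable between the two forms: a 128-bit AES key offers roughly the same strength as a 3072-bit RSA key. Just like passwords, keys must be long, random, and well protected so that they are difficult to guess, obtain, or reveal.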
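      As a rough, worked illustration of why key length matters, the sketch below generates a 256-bit key with Python’s `secrets` module and estimates how long an exhaustive search of a 128-bit key space would take. The guessing rate of one trillion keys per second is an assumed figure, not a measurement.

```python
import secrets

# A 256-bit symmetric key drawn from the operating system's secure random source.
key = secrets.token_bytes(32)
print(len(key) * 8, "bit key:", key.hex())

# Worked estimate for brute-forcing a 128-bit key.
keyspace = 2 ** 128                   # number of possible keys
guesses_per_second = 10 ** 12         # assumed attacker speed (hypothetical)
seconds_per_year = 365.25 * 24 * 3600
# On average, an attacker finds the key after searching half the key space.
years = (keyspace / 2) / guesses_per_second / seconds_per_year
print(f"about {years:.2e} years")     # about 5.39e+18 years
```

      Each extra bit doubles the key space, which is why even modest increases in key length add enormous brute-force cost.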

      The Objectives of Cryptography

      Cryptography has four major goals: confidentiality, integrity, authentication, and non-repudiation. Put another way, the first three goals are data privacy (confidential treatment), data integrity (original and unaltered message), and data authenticity (verified source). Non-repudiation combines these three properties to prove the validity and origin of a message or data so that they cannot later be denied. One example of non-repudiation in use is a service that authenticates digital signatures and ensures that a person cannot reasonably deny having signed a document. Popular examples include DocuSign and PandaDoc.

      Confidentiality is often considered the most important of these goals, since preventing unauthorized parties from accessing the data is the central objective of cryptography. That does not mean the remaining goals are unimportant.

      Data integrity is vital to ensure that the message has not been altered in some way. Otherwise, the receiving party could be manipulated into taking a wrong or undesirable action. Whether a spy is sending a message to their country’s leadership, or a company is sending instructions to a field office, both sender and receiver need assurance that the message sent is identical to the message received.
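      Integrity checks are commonly implemented with a message authentication code. The sketch below uses HMAC-SHA256 from Python’s standard library (`hmac`, `hashlib`); the shared key and the helper names `tag()` and `verify()` are assumptions for the example. Because both parties hold the same key, an HMAC shows that the message was not altered and came from a key holder, but on its own it does not provide non-repudiation; that requires a digital signature.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret, known only to the sender and the receiver.
shared_key = secrets.token_bytes(32)


def tag(message):
    """Compute an HMAC-SHA256 tag that is sent along with the message."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify(message, received_tag):
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(tag(message), received_tag)


original = b"Ship 40 units to the field office"
sent_tag = tag(original)

print(verify(original, sent_tag))                               # True
print(verify(b"Ship 400 units to the field office", sent_tag))  # False
```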

      Authenticity is essential to ensure that the user or system is known and trusted. Establishing the identity of the user (sender or recipient) is the crux of this assurance. However, the system must also be known and verified in order to prevent ransomware attacks that rely on phishing (fraudulent emails), vishing (fraudulent voice mails and phone calls), smishing (fraudulent texts), and other deceptive forms of communication.

      Types of Cryptography

      There are three types of cryptography: secret key cryptography, public key cryptography, and hash functions.

      The simplest and fastest type is secret key cryptography, also known as symmetric cryptography. This type uses one key to encrypt and decrypt communications. It protects both data at rest and data in transit, but is most often used on data at rest. The most well-known algorithms used in secret key cryptography are Advanced Encryption Standard (AES), Triple Data Encryption Standard (3DES), and Rivest Cipher 4 (RC4). Of these, only AES is still recommended for new systems; 3DES and RC4 are now deprecated.

      Public key cryptography, or asymmetric cryptography, uses a key pair on each end of the communication. Each pair consists of a public and a private key. The public keys are exchanged between sender and recipient. The sender then uses the recipient’s public key to encrypt the message, and the recipient uses their private key to decrypt it. Public key cryptography is used in almost every kind of secure communication over the Internet, such as HTTPS, SSH, OpenPGP, S/MIME, and a website’s SSL/TLS certificate.

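      The public key is derived from the private key, but the math connecting the two makes it computationally infeasible to derive the private key from the public key. This one-way relationship is what allows the public key to be shared freely, while the private key, which alone can decrypt messages and create signatures, must never be shared.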

      Hash functions are one-way functions: it is computationally infeasible to reverse them and recover the original message from its hash. A cryptographic hashing algorithm always produces the same output for the same input, and it is designed so that finding two different inputs with the same output is practically impossible. Examples include SHA-256 and SHA3-256, both of which turn any input into a fixed-length 256-bit output. Bitcoin, the largest and best-known cryptocurrency, uses the SHA-256 cryptographic hash function in its proof-of-work algorithm. Passwords are commonly stored as hashes rather than as plain text; when a user logs in, the password they enter is hashed and compared with the stored hash. An attacker who steals the hashes must guess inputs until one produces a matching hash, which is impractical for long, random passwords.
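      The properties described above can be observed with Python’s `hashlib`. The sketch first shows that SHA-256 and SHA3-256 produce fixed-size 256-bit digests and that a one-character change yields an unrelated digest. It then sketches salted password storage; because fast hashes like plain SHA-256 are easy to brute-force, it swaps in PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), a deliberately slow key-derivation function. The iteration count and the helper names are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import secrets

# Same input, same digest; both algorithms produce 32 bytes (256 bits).
print(hashlib.sha256(b"abc").hexdigest())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(hashlib.sha3_256(b"abc").hexdigest())
# A one-character change produces a completely different digest.
print(hashlib.sha256(b"abd").hexdigest())

ITERATIONS = 200_000  # illustrative work factor; tune to current guidance


def hash_password(password, salt):
    """Salted, deliberately slow hash using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# At sign-up: store the random salt and the derived hash, never the password itself.
salt = secrets.token_bytes(16)
stored_hash = hash_password("correct horse battery staple", salt)


def check_password(attempt):
    """At login: hash the attempt with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)


print(check_password("correct horse battery staple"))  # True
print(check_password("password123"))                   # False
```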

      What Is Cryptography in Cybersecurity?

      Modern cryptography is based on mathematical theory and computer science. It continues to evolve as computing becomes more powerful. For example, sufficiently powerful quantum computers are expected to break many of today’s public key encryption standards. Computer scientists are already hard at work developing quantum-safe algorithms and security protocols. Whatever the solutions turn out to be, they will be grounded in the laws of physics and the rules of mathematics.

      Both now and in the future, cryptography is central to cybersecurity efforts. Whether it protects individual documents sent across communication channels or large data sets in transit or at rest on storage devices, cryptography is a critical line of defense. Nothing is foolproof, so every part of cybersecurity, including cryptography, must evolve to match increasingly sophisticated threats and ever more powerful computers.

      To understand the necessity of encryption, one need only look at the headlines, where data breaches and intercepted or leaked messages are reported frequently. In February 2022 alone, more than 5.1 million records were breached, according to research by IT Governance.

      The central assumption with cryptography is that other parties are going to try to breach data and many are going to be successful. Encryption is meant to thwart their efforts even if they succeed in reaching the data. It is an essential line of defense in cybersecurity architecture and hinders an attacker’s efforts to access sensitive information.

      Other forms of cybersecurity focus on other fronts, such as protecting the network, limiting or stopping access to data, and protecting data from manipulation (deliberate corruption of its meaning or readability).

      Layers of different cybersecurity methods work in tandem to provide a stronger defense. Even so, encryption is a primary defense used across all efforts to protect data. It is particularly valuable for securing communications that must, by necessity, be shared with parties outside the company’s own secure systems.

      Conclusion

      Cybersecurity and encryption require research, time, and effort to be effective. Many companies prefer to leverage the efforts of vendor teams rather than overburden their internal cybersecurity teams with developing these additional layers of protection. However, there are many tools available to encrypt areas of your infrastructure and network yourself. For example, you can use LUKS to encrypt a Linux server’s filesystem disk. Similarly, you can use GPG keys to send encrypted messages via email.




      Understanding Data Structures: Definition, Uses & Benefits


      Data structures are used to create efficient, clear, and organized programs. Among the best-known data structures are lists and arrays. This guide introduces and defines data structures and explains why they are important. It also highlights the most important data structures and clarifies when and how each one can be used.

      What are Data Structures?

      In computer science, a data structure is a format that organizes, manages, and stores data. For a data structure to be practical, it must be relevant to the task and easy to use. Programmers should be able to quickly and efficiently store and retrieve data using the structure. Additionally, the data must be arranged in a sensible way within the context of the program. A data structure must also support useful and usable algorithms.

      The following three components define a data structure:

      • Relationships: In a data structure, elements are related or connected in some way. They follow a sequence, or are arranged in a certain format. This contrasts with an assortment of different but unrelated variables. For example, three stand-alone integers do not necessarily share any relationship.
      • Operations: Each data structure is associated with a collection of functions that can implement and manipulate it. These integrated algorithms are used to interact with the actual data. Most data structures support operations such as adding or deleting an item. However, other operations are only meaningful in certain situations. For instance, it might make sense to sort an array, but not a hash table.
      • Data Values and Type: The definition of a data structure also includes the values it contains and the type of data it allows. In some data structures, all values must have the same type, while others do not enforce any restrictions.

      Data structures are central to a logical model known as an abstract data type (ADT). While a data structure is designed and analyzed from the developer’s perspective, the ADT model is user-centric. For the user, the values and operations the ADT supports and its behavior in various situations are more important. These two perspectives do not always align, and the user is almost never aware of the internal implementation.
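      The split between the user’s view (the ADT) and the developer’s view (the data structure) can be sketched in Python with an abstract base class. The `Collection` interface and both implementations are hypothetical examples: callers rely only on `add()`, `contains()`, and `size()`, while one implementation stores items in a list and the other in a hash-based set.

```python
from abc import ABC, abstractmethod


class Collection(ABC):
    """Hypothetical ADT: users depend only on these operations, not on the storage."""

    @abstractmethod
    def add(self, item): ...

    @abstractmethod
    def contains(self, item): ...

    @abstractmethod
    def size(self): ...


class ListCollection(Collection):
    """Implementation 1: a Python list; membership checks scan every item."""

    def __init__(self):
        self._items = []

    def add(self, item):
        if item not in self._items:  # ignore duplicates, like a set
            self._items.append(item)

    def contains(self, item):
        return item in self._items

    def size(self):
        return len(self._items)


class SetCollection(Collection):
    """Implementation 2: a hash-based set; membership checks are usually constant time."""

    def __init__(self):
        self._items = set()

    def add(self, item):
        self._items.add(item)

    def contains(self, item):
        return item in self._items

    def size(self):
        return len(self._items)


for impl in (ListCollection(), SetCollection()):
    for word in ["apple", "pear", "apple"]:
        impl.add(word)
    print(type(impl).__name__, impl.size(), impl.contains("pear"))
# ListCollection 2 True
# SetCollection 2 True
```

      From the user’s side the two behave identically; only their internal structure and performance differ.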

      Data structures are also central to Object Oriented Programming (OOP). In OOP, both the data structure and its operations are defined together in the class definition. These operations include the constructor, destructor, and methods of the class, and the API/interface. Each object encapsulates its own set of values, along with the relationships, operations, and type information from the class definition. This means every instance of a class is also a type of user-defined data structure.

      Data structures can be built into a language or defined by the developer. For instance, Python provides documentation about its core data structures. Low-level languages usually support only the most basic built-in structures, while higher-level languages provide libraries for advanced and complex data structures, including graphs and maps.

      Technically, even primitive data types, such as integers, can be considered data structures. However, in everyday use, the term is reserved for compound data types, which might contain multiple items. These data structures are constructed from primitive data types including integers, characters, Booleans, and pointers. In most cases, a data structure includes a set of predefined procedures or methods to manipulate the structure. However, developers can add their own algorithms to the core data structure.

      Many essential data structures take advantage of pointers and pointer arithmetic to store and retrieve data. This aligns with the hardware model of the underlying system and is a very fast and efficient way to manage data. Depending on the programming language, some data structures can be accessed either through a more user-friendly array subscript or directly through pointers to the underlying memory. A good example of this design is an array in the C programming language, where a[i] is equivalent to *(a + i).

      Why are Data Structures Important?

      Data structures are essential whenever primitive data types are insufficient to organize and process the data. For example, if a program only uses three pieces of data, then three separate variables are sufficient. However, if the program must handle thousands of values, it is impractical to use a different variable for each one. The program would be messy, confusing, and unmanageable. In this case, a compound data structure such as a list, array, or hash table must be used to manage the data. More generally, data structures encourage more formal programming techniques and are indispensable for larger programs. Here are some of the main advantages of data structures.

      • Data structures allow developers to organize their data and present it in a logical manner. Because the data is handled cleanly and clearly, this results in more manageable programs containing fewer variables.
      • Data structures are programmatic building blocks that naturally encourage modular program design and professional coding practices. It is easy to develop interfaces and to pass data between functions using data structures.
      • It is often faster and more memory-efficient to use data structures. The algorithms associated with each data structure are optimized for speed and utility. For example, certain types of trees allow developers to quickly find a specific entry based on a search key.
      • Data structures lend themselves to standardized and established solutions. In many cases, the best-known algorithm for a given task requires a specific data structure. For example, routing algorithms such as Dijkstra’s shortest-path algorithm operate on graph data structures. Reusing these established solutions reduces development and test efforts.
      • They are a shared convention between programmers. For example, most experienced developers know what trees and hash tables are and when to use them.
      • Programs built on well-known data structures are usually more readable and easier to maintain.

      Data structures are widely used in computing, but they are particularly useful in the situations listed below. In certain cases, data structures were designed to solve a particular type of problem.

      • Searching: Many data structures support efficient algorithms to find a specific entry from within a longer list.
      • Storage/Scaling: Using data structures, large amounts of data can be effectively structured, organized, and stored. Some data structures, such as B-trees, are used inside relational database management systems to store and index data.
      • Indexing: Hash tables and some tree structures can index a long list of entries.
      • Sorting: A classic data structure called a binary search tree is often used to sort a list of unordered data into alphabetical order or some other arrangement.
      • Listing: Simple data structures such as arrays can retrieve any data items matching a set of criteria.
      • Data Transfer: Data structures are a good choice to share information through an interface or API because both client and server have a common understanding of the data. The transfer might occur between classes or functions, or between a client and a server.
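      For the searching case, the sketch below contrasts scanning a list item by item with binary search on a sorted list using Python’s standard `bisect` module. The helper names are hypothetical. Both return the same index, but the linear scan may compare every element while binary search needs only about log2(n) comparisons; the trade-off is that the data must already be sorted.

```python
import bisect


def linear_search(items, target):
    """Check each element in turn: O(n) comparisons in the worst case."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1


def binary_search(sorted_items, target):
    """Repeatedly halve the search range via bisect: O(log n); input must be sorted."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1


data = list(range(0, 1_000_000, 3))  # already sorted: 0, 3, 6, ..., 999999
print(linear_search(data, 999_999))  # 333333
print(binary_search(data, 999_999))  # 333333
print(binary_search(data, 1))        # -1 (not present)
```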

      Types of Data Structures

      A handful of common data structures are used repeatedly in many circumstances, but there are many more specific alternatives. All data structures are built out of the base data types, including integers, floats, characters, pointers, and strings. Data structures can be classified as either linear or non-linear.

      Linear Data Structure Types

      In a linear data type, the individual components within the structure are arranged in sequential order. The items might be sorted based on a particular index or when they were received. Linear data types are usually straightforward to understand and implement. A program can walk through the data structure to retrieve each item. Unfortunately, linear data types are not always the most efficient choice. Their performance can degrade quickly as the size of the data set increases.

      In many cases, programming languages already provide an implementation for many of these types. Libraries can also provide additional algorithms and support for more elaborate data structures. For instance, Python has built-in support for lists, sets, and dictionaries, and its standard library adds modules such as collections and queue for stacks and queues. Here are some of the essential linear data types:

      • String: Many developers consider a string to be a basic data type, because strings are universally supported. However, they are actually a linear data type formed from individual characters. Strings are very simple to use and essential to almost every program. Although the names of variables are also strings, they are identifiers and are not a data type.
      • Array: An array is a list of items in sequential order. Almost every language supports some type of array, but in most cases, all items must be of the same type. Arrays can be either immutable and fixed, or mutable and able to be changed. Some implementations allow the size of the array to change so items can be added or removed. In most languages, an array subscript is used to access the individual items. This usually involves some type of bracketing mechanism. The brackets enclose the position of the item within the array, for example [6].
      • Stack: A stack is a list that is written and read in a last in first out (LIFO) order. Each new item is “pushed” onto the top of the stack. When the stack is later accessed, the most recent entry is “popped” off the stack. Stacks can be implemented using a basic array, but they can also use another format. They are especially useful for evaluating mathematical expressions and in compilers.
      • Queue: Queues are very similar to stacks, except they follow a first in first out (FIFO) format. The next item to be removed from a queue is always the oldest item. Queues are a natural choice for scheduling and for processing incoming client requests. Some applications use multiple queues for scheduling items of differing priorities. The high-priority queue is serviced first, even if items in the lower-priority queues have been waiting longer.
      • Linked List: Linked lists are a versatile and memory-efficient data structure. In addition to storing data, each node also points to the next node. This is accomplished through the use of pointers containing the memory address of the next entry. Linked lists can easily expand and shrink to accommodate lists of variable lengths. However, it is difficult and time-consuming to access a specific item. In the worst case, which occurs when the item is the last entry, every node must be visited. Some linked lists are bidirectional, so the items can also be traversed in reverse order.
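      The linear structures above map onto Python roughly as follows. This is a sketch: a plain list serves as a stack, `collections.deque` as a queue, and the `Node` and `LinkedList` classes are hypothetical minimal implementations in which Python object references play the role of the pointers described above.

```python
from collections import deque

# Stack (LIFO): push and pop at the end of a Python list.
stack = []
stack.append("a")
stack.append("b")
stack.append("c")
print(stack.pop())      # c  (the most recent item comes off first)

# Queue (FIFO): deque supports fast appends and pops at both ends.
queue = deque()
queue.append("first")
queue.append("second")
print(queue.popleft())  # first  (the oldest item leaves first)


class Node:
    """One node of a singly linked list: a value plus a reference to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        """O(1): the new node simply points at the old head."""
        self.head = Node(value, self.head)

    def find(self, value):
        """O(n): in the worst case, every node must be visited."""
        node = self.head
        while node is not None:
            if node.value == value:
                return node
            node = node.next
        return None

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


lst = LinkedList()
for v in (3, 2, 1):
    lst.push_front(v)
print(list(lst))  # [1, 2, 3]
```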

      Non-Linear Data Structure Types

      Non-linear data structures are organized around the relationships between the individual nodes. The items do not follow a single consistent sequence. These relationships are often hierarchical, with each parent entry having one or more children. However, some non-linear data types describe relationships without any hierarchy.

      These data structures are more complicated and are often more difficult to work with. They are not as frequently built into a language, so a library or user-written implementation might be required. However, these data structures can store information efficiently and find individual entries very quickly. For many common operations, such as lookups in a balanced tree or a hash table, the run time grows slowly, if at all, as the number of items increases. They can therefore be applied to very large data sets. Here are some of the more common non-linear data structures.

      • Graph: Graphs describe the connections between individual entries, known as vertices. Vertices are connected to other vertices by edges. Related vertices share an edge, while unrelated vertices do not. The edges might have weights, indicating the cost, distance, or intensity of the relationship. The set of vertices and edges fully describes the topology of the domain. Graphs are often used for routing and mapping. A spanning tree is an important structure within a graph: it connects all vertices of a connected graph without any cycles.
      • Tree: A tree is a particular type of graph where there is exactly one path between any two nodes. This means trees cannot contain cycles, and a traversal that visits every node must backtrack up the tree after reaching each leaf. Trees have a hierarchical structure: at the top level, a parent “root” node has one or more children, and each of those child nodes can also have children. Trees are frequently used to represent directories or to sort and search data. There are many types of trees, including binary search trees, AVL trees, B-trees, and red-black trees. Many domain-specific trees have also been developed. For instance, a trie is a type of tree that stores strings by their prefixes. It is used for dictionaries and for prefix searches such as autocomplete.
      • Heap: A heap is a special type of tree. In a min-heap, the value stored in each parent node is less than or equal to the values of its children; in a max-heap, it is greater than or equal to them. This relationship holds for each node at every level of the heap, so the smallest (or largest) item is always at the root. Heaps are a good choice for ordering and prioritizing data. Adding or removing a node requires restoring the heap property, but this takes only logarithmic time, which is why heaps are commonly used to implement priority queues.
      • Record: A record is a list of key-value pairs. Each key, or field, is associated with a value. The different values are not required to have the same type, and a value can even be another record, leading to a nested structure. Records can potentially support a large number of fields, but they can become inefficient as the number of pairs grows. Records are used in databases and can serve as a container for related information. A good example might be all relevant information about a customer or an employee. In object-oriented programming, each instance of a class is technically a record.
      • Hash Table: A hash table is an associative array. A hash function converts each key into an index that identifies a bucket, where the matching entry is stored. Hash functions vary in complexity. For example, an integer key might be hashed by dividing it by the number of buckets and using the remainder as the index. Some hash tables assign each entry to its own bucket, but many implementations allow collisions, where more than one key maps to the same index. There is always a trade-off between the size of the table and the speed of lookups: keeping the ratio of entries to buckets (the load factor) low reduces collisions at the cost of extra memory. Hash tables have many uses, including database indexing, compilers, and caches.
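      Two of these structures are sketched below. `ChainedHashTable` is a hypothetical toy that uses the remainder-based hashing described above with a fixed number of buckets and keeps colliding keys in the same bucket’s list (separate chaining); in practice, Python’s built-in `dict` is the production-quality hash table. The heap example uses the standard `heapq` module, which maintains a min-heap inside a plain list.

```python
import heapq


class ChainedHashTable:
    """Toy hash table for integer keys: index = key % number_of_buckets."""

    def __init__(self, num_buckets=7):
        self.buckets = [[] for _ in range(num_buckets)]  # each bucket holds (key, value) pairs

    def _index(self, key):
        return key % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # a collision simply extends the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


table = ChainedHashTable()
table.put(10, "ten")
table.put(17, "seventeen")  # 17 % 7 == 10 % 7 == 3, so both land in bucket 3
print(table.get(17))        # seventeen
print(table.buckets[3])     # [(10, 'ten'), (17, 'seventeen')]

# Min-heap as a priority queue: the smallest (priority, task) pair is always at index 0.
tasks = []
heapq.heappush(tasks, (3, "write report"))
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (2, "review PR"))
print(heapq.heappop(tasks))  # (1, 'fix outage')
```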

      How to Choose a Data Structure for Your Next Project

      Each data structure is best suited to a certain set of tasks and should complement the necessary data operations. A mismatch between the task and the structure might make a program less efficient or more confusing, so it is important to choose correctly. Additionally, the same information can be represented using various data structures. For instance, an ordered list can be structured as an array or as some variant of a tree. A balanced tree makes insertions and lookups faster on large data sets, but for a small data set, an array might be easier to use.
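      To illustrate the tree option, here is a minimal, unbalanced binary search tree. The node class and functions are hypothetical. An in-order traversal yields the keys in sorted order, and each comparison during a search discards one subtree. Because this sketch does no rebalancing, inserting already-sorted keys would degrade it into a linked list; self-balancing variants such as AVL or red-black trees avoid that.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Smaller keys go to the left subtree, larger (or equal) keys to the right."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def contains(root, key):
    """Each comparison discards one subtree, so a balanced tree needs about log2(n) steps."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def in_order(root):
    """Visit the left subtree, the node, then the right subtree: keys come out sorted."""
    if root is not None:
        yield from in_order(root.left)
        yield root.key
        yield from in_order(root.right)


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
print(list(in_order(root)))                    # [20, 30, 40, 50, 60, 70, 80]
print(contains(root, 60), contains(root, 65))  # True False
```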

      Some data structures are very common in certain contexts. For example, compilers typically use hash tables to look up the type and scope of identifiers in a symbol table. Some data structures were originally designed for a single task. Here are some issues to consider when choosing a data structure.

      • Storage capacity: Certain data structures are very convenient and easy to use, but use up more memory. In a memory-constrained environment, minimalist data structures like linked lists are more useful.
      • Performance requirements: Arrays are simple, but searching an unsorted array for an individual item requires checking each element in turn. Pointer-based tree structures or hash tables are often quicker. Developers must always consider the performance requirements for their application. There are often trade-offs to be made, but inefficient data structures and algorithms can quickly become unusable as the data set grows. This issue is related to Big O notation. The Big O value of an algorithm describes how its execution time increases as the size of the data set grows.
      • The type of data: The type and nature of the data often dictates the data structure. Ordered and unordered data benefit from different structures. Simple data items such as a series of integers can be stored in an array. However, a graph can better handle a list of connections between items.
      • How the data is used: It is important to consider how the data is used when selecting a data structure. If the data is repeatedly accessed or updated, a more efficient data structure is required.
      • Ease of use: Some data structures are easy for inexperienced programmers to implement and use. For simple internal applications without stringent performance requirements, it is usually better to use straightforward structures.
      • Permanence: Transient information that is discarded after use can be held in whatever structure is most efficient for the task. Data that is saved for later use is best handled with a data structure that makes it easy to access.

      A decision about what data structure to use is often based on several factors. In many cases, there is not one perfect choice. Below are a couple of examples demonstrating how a data structure can be chosen.

      If all items from an unordered list must be read and later processed once, then an array is an effective and simple choice. Arrays are quite fast in this context, they can scale, and they are easy to use.

      However, if the relationships between these items are of great significance, then the more complicated graph data structure is usually the better choice. Graphs are designed to represent the connections between items in a fast and memory-efficient manner. Unfortunately, they are one of the more complicated data structures, and it takes some practice to use them effectively.
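      As a rough illustration of how a graph records relationships, the sketch below stores an undirected graph as a Python dictionary. The dictionary maps each item to the set of items it connects to, which is known as an adjacency list. A breadth-first search then finds every item reachable from a starting item. The node names are hypothetical.

```python
from collections import deque


def add_edge(graph, a, b):
    """Record an undirected connection between items a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def reachable(graph, start):
    """Return every item connected to start, directly or indirectly (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


graph = {}
add_edge(graph, "web", "db")
add_edge(graph, "web", "cache")
add_edge(graph, "backup", "archive")

print(sorted(reachable(graph, "web")))  # ['cache', 'db', 'web']
```

      An adjacency list only stores the connections that actually exist. This is one reason graphs of sparsely connected items stay compact.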

      Conclusion

      The definition of a data structure is “a data format that helps developers organize, manage, and store information”. Computer data structures are described by the relationships between the items, the operations supported by the structure, and the actual values of the items. Developers often create new data structures and algorithms for an application, but many structures are built into the main programming languages.

      Some data structures are linear. This means the items are arranged in sequential order. Others are non-linear and should be used when the relationships between items are important. The most widely used data structures include arrays, stacks, queues, records, trees, graphs, linked lists, and hash tables. There are many factors involved in choosing a data structure. However, memory use, performance, and ease of use are the most important. If you’d like to try out some of the data structures discussed in this guide, visit our documentation library’s Python section. This section includes guides on various primary data types and linear data types in Python.


      Understanding Suricata Signatures


      Introduction

      The first tutorial in this series explained how to install and configure Suricata. If you followed that tutorial, you also learned how to download and update Suricata rulesets, and how to examine logs for alerts about suspicious activity. However, the rules that you downloaded in that tutorial are numerous, and cover many different protocols, applications, and attack vectors that may not be relevant to your network and servers.

      In this tutorial you’ll learn how Suricata signatures are structured, and some important options that are commonly used in most rules. Once you understand the structure and fields of a signature, you’ll be able to write your own signatures that you can combine with a firewall to alert you about most suspicious traffic to your servers, without needing to use other external rulesets.

      This approach to writing and managing rules means that you can use Suricata more efficiently, since it only needs to process the specific rules that you write. Once you have a ruleset that describes the majority of the legitimate and suspicious traffic that you expect to encounter in your network, you can start to selectively drop invalid traffic using Suricata in its active Intrusion Prevention System (IPS) mode. The next tutorial in this series will explain how to enable Suricata’s IPS functionality.

      Prerequisites

      For the purposes of this tutorial, you can run Suricata on any system, since signatures generally do not require any particular operating system. If you are following this tutorial series, then you should already have:

      Understanding the Structure of Suricata Signatures

      Suricata signatures can appear complex at first, but once you learn how they are structured, and how Suricata processes them, you’ll be able to create your own rules to suit your network’s requirements.

      At a high level, Suricata signatures consist of three parts:

      1. An Action to take when traffic matches the rule.
      2. A Header that describes hosts, IP addresses, ports, protocols, and the direction of traffic (incoming, or outgoing).
      3. Options, which specify things like the Signature ID (sid), log message, regular expressions that match the contents of packets, classification type, and other modifiers that can help narrow down and identify legitimate and suspicious traffic.

      The general structure of a signature is the following:

      Generic Rule Structure

      ACTION HEADER OPTIONS
      

      The header and options portions of a signature have multiple sections. For example, in the previous tutorial, you tested Suricata using the rule with sid 2100498. Here is the complete rule for reference:

      sid:2100498

      alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; content:"uid=0|28|root|29|"; classtype:bad-unknown; sid:2100498; rev:7; metadata:created_at 2010_09_23, updated_at 2010_09_23;)
      

      The alert portion of the signature is the action, the ip any any -> any any section is the header, and the rest of the signature, starting with (msg:"GPL ATTACK_RESPONSE..., contains the rule’s options.
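      The three-part layout can be illustrated with a small Python sketch that splits a rule string into its action, header, and options. This is not how Suricata itself parses rules. The sketch assumes a well-formed, single-line rule whose options are wrapped in one outer pair of parentheses.

```python
def split_rule(rule):
    """Split a single-line Suricata rule into (action, header, options).

    Assumes a well-formed rule: the action is the first word, the header runs
    up to the first "(", and the options end at the final ")".
    """
    rule = rule.strip()
    open_paren = rule.index("(")
    close_paren = rule.rindex(")")
    action, header = rule[:open_paren].strip().split(None, 1)
    options = rule[open_paren + 1:close_paren].strip()
    return action, header, options


rule = ('alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; '
        'content:"uid=0|28|root|29|"; classtype:bad-unknown; sid:2100498; rev:7; '
        'metadata:created_at 2010_09_23, updated_at 2010_09_23;)')

action, header, options = split_rule(rule)
print(action)   # alert
print(header)   # ip any any -> any any
```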

      In the following sections you’ll examine each part of a Suricata rule in detail.

      Actions

      The first part of the sid:2100498 signature is the action, in this case alert. The action portion of a Suricata signature specifies the action to take when a packet matches the rule. An action can be one of the following depending on whether Suricata is operating in IDS or IPS mode:

      • Pass – Suricata will stop scanning the packet and allow it, without generating an alert.
      • Drop – When working in IPS mode, Suricata will immediately stop processing the packet and generate an alert. If the connection that generated the packet uses TCP it will time out.
      • Reject – When Suricata is running in IPS mode, it will send a TCP reset packet and drop the matching packet.
      • Alert – Suricata will generate an alert and log it for further analysis.

      Header

      Each Suricata signature has a header section that describes the network protocol, source and destination IP addresses, ports, and direction of traffic. Referring to the example sid:2100498 signature, the header section of the rule is the ip any any -> any any portion:

      sid:2100498

      alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; content:"uid=0|28|root|29|"; classtype:bad-unknown; sid:2100498; rev:7; metadata:created_at 2010_09_23, updated_at 2010_09_23;)
      

      The general format of a rule’s header section is:

      Rule Format

      <PROTOCOL> <SOURCE IP> <SOURCE PORT> -> <DESTINATION IP> <DESTINATION PORT>
      

      The Protocol can be one of the following:

      • TCP
      • UDP
      • ICMP
      • IP
      • A number of other application protocols

      The Source and Destination fields can be IP addresses or network ranges, or the special value any, which will match all IP addresses and networks. The -> arrow indicates the direction of traffic.
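      To see how the header fields line up with the format above, here is a Python sketch that breaks a header string into named parts. It assumes each field is a single token without spaces. Real Suricata headers can also contain variables such as $HOME_NET and bracketed lists like [80,443], which this sketch only handles when they contain no spaces.

```python
from typing import NamedTuple


class RuleHeader(NamedTuple):
    protocol: str
    src_ip: str
    src_port: str
    direction: str
    dst_ip: str
    dst_port: str


def parse_header(header):
    """Split '<PROTOCOL> <SRC IP> <SRC PORT> -> <DST IP> <DST PORT>' into named fields."""
    fields = header.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 header fields, got {len(fields)}: {header!r}")
    # Accept the usual -> arrow and the non-directional <> marker.
    if fields[3] not in ("->", "<>"):
        raise ValueError(f"unknown direction marker {fields[3]!r}")
    return RuleHeader(*fields)


h = parse_header("ssh any any -> 203.0.113.0/24 !22")
print(h.protocol, h.dst_ip, h.dst_port)  # ssh 203.0.113.0/24 !22
```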

      Note: Signatures can also use the non-directional marker <> to match traffic in both directions. However, the Suricata documentation about directional markers notes that most rules will use the right-pointing -> arrow.

      If you wanted to alert on malicious outbound traffic (that is, traffic leaving your network), then the Source field would be the IP address or network range of your system. The Destination could be a remote system’s IP or network, or the special any value.

      Conversely, if you wanted to generate an alert for malicious incoming traffic, the Source field could be set to any, and the Destination to your system’s IP address or network range.

      You can also specify the TCP or UDP port to examine using the Port fields. Generally, traffic originating from a system is assigned a random port, so the any value is appropriate for the left side of the -> indicator. The destination port can also be any if you plan to examine the contents of every incoming packet, or you can limit a signature to only scan packets on individual ports, like 22 for SSH traffic, or 443 for HTTPS.

      The ip any any -> any any header from sid:2100498 is a generic header that will match all traffic, regardless of protocol, source or destination IPs, or ports. This kind of catch-all header is useful when you want to ensure that both inbound and outbound traffic is checked for suspicious content.

      Note that the Source, Destination, and Port fields can also use the special ! negation operator, which matches traffic whose value is anything other than the one specified in the field.
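      The effect of the negation operator on a single address or port can be reasoned about with a short Python sketch. It only handles the cases discussed here: any, a single IP or CIDR block, a single port, and a leading !. Suricata's real matcher also supports lists, ranges, and variables. The 203.0.113.0/24 and !22 values mirror the example that follows.

```python
import ipaddress


def match_address(field, ip):
    """Return True if ip satisfies an address field like 'any', '203.0.113.0/24' or '!10.0.0.1'."""
    if field == "any":
        return True
    negate = field.startswith("!")
    network = ipaddress.ip_network(field.lstrip("!"), strict=False)
    # XOR with negate: a leading ! flips the result of the membership test.
    return (ipaddress.ip_address(ip) in network) != negate


def match_port(field, port):
    """Return True if port satisfies a port field like 'any', '22' or '!22'."""
    if field == "any":
        return True
    negate = field.startswith("!")
    return (port == int(field.lstrip("!"))) != negate


# Destination fields of a header such as: ssh any any -> 203.0.113.0/24 !22
dst_ip_field, dst_port_field = "203.0.113.0/24", "!22"

print(match_address(dst_ip_field, "203.0.113.10") and match_port(dst_port_field, 2222))  # True
print(match_address(dst_ip_field, "203.0.113.10") and match_port(dst_port_field, 22))    # False
```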

      For example, the following signature would make Suricata alert on incoming SSH traffic from any network that is destined for your network (represented by the 203.0.113.0/24 IP block) on any port other than 22:

      Example Header

      alert ssh any any -> 203.0.113.0/24 !22 (sid:1000000;)
      

      This alert would not be that useful, since it does not contain any message about the packet, or a classification type. To add extra information to an alert, as well as match on more specific criteria, Suricata rules have an Options section where you can specify a number of additional settings for a signature.

      Options

      The arguments inside the parentheses (. . .) in a Suricata signature contain various options and keyword modifiers that you can use to match on specific parts of a packet, classify a rule, or log custom messages. Whereas a rule’s header arguments operate on packet headers at the IP, port, and protocol level, options match on the data contained inside a packet.

      Options in a Suricata rule must be separated by a ; semicolon, and generally use a key:value format. Some options do not have any settings and only the name needs to be specified in a rule.
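      The key:value layout can be sketched in Python. The sketch splits the option text on semicolons, ignoring any semicolon that appears inside a double-quoted value or is escaped with a backslash. It then separates each option at its first colon. This is a simplification for illustration, not Suricata's own parser.

```python
def parse_options(options):
    """Split a rule's option text into a list of (keyword, value) pairs.

    Semicolons inside double quotes or escaped with a backslash do not end an
    option. Options without a value (for example nocase) get the value None.
    """
    raw_options = []
    current = []
    in_quotes = False
    escaped = False
    for char in options:
        if escaped:                      # character after a backslash is literal
            current.append(char)
            escaped = False
        elif char == "\\":
            current.append(char)
            escaped = True
        elif char == '"':
            current.append(char)
            in_quotes = not in_quotes
        elif char == ";" and not in_quotes:
            raw_options.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    if "".join(current).strip():         # tolerate a missing final semicolon
        raw_options.append("".join(current).strip())

    pairs = []
    for option in raw_options:
        if not option:
            continue
        keyword, sep, value = option.partition(":")
        pairs.append((keyword.strip(), value.strip() if sep else None))
    return pairs


print(parse_options('msg:"SSH TRAFFIC on non-SSH port"; sid:1000000;'))
# [('msg', '"SSH TRAFFIC on non-SSH port"'), ('sid', '1000000')]
```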

      Using the example signature from the previous section, you could add the msg option with a value of "SSH TRAFFIC on non-SSH port" to explain what the alert is about:

      Example Header

      alert ssh any any -> 203.0.113.0/24 !22 (msg:"SSH TRAFFIC on non-SSH port"; sid:1000000;)
      

      A full explanation of how you can use each option in a Suricata rule is beyond the scope of this tutorial. The Suricata rules documentation beginning in Section 6.2 describes each keyword option in detail.

      However, there are some core options like the content keyword and various Meta keywords that are used in most signatures, which we’ll examine in the following sections.

      The Content Keyword

      One of the most important options for any rule is the content keyword. Recall the example sid:2100498 signature:

      sid:2100498

      alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; content:"uid=0|28|root|29|"; classtype:bad-unknown; sid:2100498; rev:7; metadata:created_at 2010_09_23, updated_at 2010_09_23;)
      

      The content:"uid=0|28|root|29|"; portion contains the content keyword, and the value that Suricata will look for inside a packet. The |28| and |29| sequences are hexadecimal byte values for the ( and ) characters, so the pattern matches the text uid=0(root). In the case of this example signature, all packets from any IP address on any port will be checked for this value, and any packet that contains it will generate an alert (in the previous tutorial, this output was used as an example indicating a compromised host).
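      In a Suricata content value, text between a pair of | characters is written as hexadecimal bytes. The Python sketch below shows how such a pattern could be turned into the raw bytes being searched for. It is an illustration, not Suricata's implementation. It assumes the pattern contains no backslash-escaped characters.

```python
def content_to_bytes(pattern):
    """Convert a Suricata-style content pattern to bytes.

    Text between pairs of | characters is read as space-separated hex bytes;
    everything else is taken literally as ASCII text.
    """
    parts = pattern.split("|")
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced | in content pattern")
    result = bytearray()
    for index, part in enumerate(parts):
        if index % 2 == 1:           # odd parts sit between | markers: hex bytes
            result += bytes.fromhex(part)
        else:                        # even parts are literal text
            result += part.encode("ascii")
    return bytes(result)


print(content_to_bytes("uid=0|28|root|29|"))  # b'uid=0(root)'
```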

      The content keyword can be used with most other keywords in Suricata. You can create very specific signatures using combinations of headers, and options that target specific application protocols, and then check packet contents for individual bytes, strings, or matches using regular expressions.

      For example, the following signature examines DNS traffic for any query containing your_domain.com and generates an alert:

      dns.query Example

      alert dns any any -> any any (msg:"DNS LOOKUP for your_domain.com"; dns.query; content:"your_domain.com"; sid:1000001;)
      

      However, this rule would not match if the DNS query used the domain YOUR_DOMAIN.COM, since Suricata defaults to case-sensitive content matching. To make content matches insensitive to case, add the nocase; keyword to the rule:

      Case-insensitive dns.query Example

      alert dns any any -> any any (msg:"DNS LOOKUP for your_domain.com"; dns.query; content:"your_domain.com"; nocase; sid:1000001;)
      

      Now any combination of lower or uppercase letters will still match the content keyword.
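      The difference that nocase makes can be mimicked in Python with a simple substring search. This is only an analogy, because Suricata's matcher works on raw bytes in specific buffers such as dns.query. The helper name content_matches is made up for this example.

```python
def content_matches(buffer: bytes, content: bytes, nocase: bool = False) -> bool:
    """Return True if content occurs in buffer, optionally ignoring ASCII case."""
    if nocase:
        return content.lower() in buffer.lower()
    return content in buffer


query = b"YOUR_DOMAIN.COM"
print(content_matches(query, b"your_domain.com"))               # False
print(content_matches(query, b"your_domain.com", nocase=True))  # True
```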

      The msg Keyword

      The example signatures in this tutorial have all contained msg keywords with information about a signature. While the msg option is not required, leaving it blank makes it difficult to understand why an alert or drop action has occurred when examining Suricata’s logs.

      A msg option is designed to be a human-readable text description of an alert. It should be descriptive and add context to an alert so that you or someone else who is analyzing logs understands why the alert was triggered. In the reference Keyword section of this tutorial you will learn about the reference option that you can use to link to more information about a signature and the issue it is designed to detect.

      The sid and rev Keywords

      Every Suricata signature needs a unique Signature ID (sid). If two rules have the same sid (in the following example output it is sid:10000000), Suricata will not start and will instead generate an error like the following:

      Example Duplicate sid Error

      . . . 19/11/2021 -- 01:17:40 - <Error> - [ERRCODE: SC_ERR_DUPLICATE_SIG(176)] - Duplicate signature "drop ssh any any -> 127.0.0.0/8 !22 (msg:"blocked invalid ssh"; sid:10000000;)" . . .

      When you create your own signatures, the range 1000000-1999999 is reserved for custom rules. Suricata’s built-in rules use the range 2200000-2299999. Other sid ranges are documented on the Emerging Threats SID Allocation page.
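      A duplicate sid produces the startup error shown above, so it can be worth checking a custom rules file before loading it. The following Python sketch pulls the sid value out of each rule with a regular expression. It reports duplicates and flags sids outside the 1000000-1999999 custom range. The rule lines are hypothetical examples, and the regular expression is a simplification that assumes sid:N; appears once per rule.

```python
import re
from collections import Counter

SID_PATTERN = re.compile(r"\bsid\s*:\s*(\d+)\s*;")
CUSTOM_RANGE = range(1000000, 2000000)  # 1000000-1999999 inclusive


def check_sids(rules):
    """Return (duplicate_sids, out_of_range_sids) for a list of rule strings."""
    sids = []
    for rule in rules:
        if not rule.strip() or rule.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = SID_PATTERN.search(rule)
        if match:
            sids.append(int(match.group(1)))
    counts = Counter(sids)
    duplicates = sorted(sid for sid, count in counts.items() if count > 1)
    out_of_range = sorted({sid for sid in sids if sid not in CUSTOM_RANGE})
    return duplicates, out_of_range


rules = [
    'alert ssh any any -> 203.0.113.0/24 !22 (msg:"SSH TRAFFIC on non-SSH port"; sid:1000000;)',
    'alert dns any any -> any any (msg:"DNS LOOKUP for your_domain.com"; dns.query; content:"your_domain.com"; nocase; sid:1000001;)',
    'drop ssh any any -> 127.0.0.0/8 !22 (msg:"blocked invalid ssh"; sid:10000000;)',
    'drop ssh any any -> 127.0.0.0/8 !22 (msg:"blocked invalid ssh"; sid:10000000;)',
]
print(check_sids(rules))  # ([10000000], [10000000])
```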

      The sid option is usually the last part of a Suricata rule. However, if there have been multiple versions of a signature with changes over time, there is a rev option that is used to specify the version of a rule. For example, the SSH alert from earlier in this tutorial could be changed to only scan for SSH traffic on port 2022:

      Example SSH Signature with rev

      alert ssh any any -> 203.0.113.0/24 2022 (msg:"SSH TRAFFIC on non-SSH port"; sid:1000000; rev:2;)
      

      The updated signature now includes the rev:2 option, indicating it has been updated from a previous version.

      The reference Keyword

      The reference keyword is used in signatures to describe where to find more information about the attack or issue that a rule is meant to detect. For example, if a signature is designed to detect a new kind of exploit or attack method, the reference field can be used to link to a security researcher or company’s website that documents the issue.

      The Heartbleed vulnerability in OpenSSL is an example of a widely publicized and researched bug. Suricata comes with a signature that is designed to check for invalid TLS heartbeat messages and includes a reference to the main Heartbleed CVE entry:

      /etc/suricata/rules/tls-events.rules

      alert tls any any -> any any (msg:"SURICATA TLS invalid heartbeat encountered, possible exploit attempt (heartbleed)"; flow:established; app-layer-event:tls.invalid_heartbeat_message; flowint:tls.anomaly.count,+,1; classtype:protocol-command-decode; reference:cve,2014-0160; sid:2230013; rev:1;)
      

      Note the reference:cve,2014-0160; portion of the signature. This reference option tells you or the analyst examining alerts from Suricata where to find more information about the particular issue.

      The reference option can use any of the prefixes from the /etc/suricata/reference.config file. For example, url could be used in place of cve in the preceding example, with a link directly to the Heartbleed site in place of the 2014-0160 CVE identifier.
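      To show how a prefix such as cve combines with an identifier, here is a Python sketch. It reads lines in a config reference: <prefix> <url> layout and expands a reference value into a full URL. The sample entries and their URLs are assumptions for illustration, so check your own /etc/suricata/reference.config for the actual prefixes and URLs.

```python
# Hypothetical sample entries; the real file may list different prefixes and URLs.
SAMPLE_CONFIG = [
    "# config reference: system URL",
    "config reference: cve http://cve.mitre.org/cgi-bin/cvename.cgi?name=",
    "config reference: url http://",
]


def load_references(lines):
    """Map each reference prefix to its base URL from reference.config-style lines."""
    prefixes = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("config reference:"):
            continue  # skip comments and blank lines
        name, url = line[len("config reference:"):].split(None, 1)
        prefixes[name.lower()] = url.strip()
    return prefixes


def expand_reference(option_value, prefixes):
    """Turn a reference value such as 'cve,2014-0160' into a URL (KeyError if unknown)."""
    system, _, identifier = option_value.partition(",")
    return prefixes[system.strip().lower()] + identifier.strip()


prefixes = load_references(SAMPLE_CONFIG)
print(expand_reference("cve,2014-0160", prefixes))
# http://cve.mitre.org/cgi-bin/cvename.cgi?name=2014-0160
```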

      The classtype Keyword

      Suricata can classify traffic according to a preconfigured set of categories that are included when you install the Suricata package with your Linux distribution’s package manager. The default classification file is usually found in /etc/suricata/classification.config and contains entries like the following:

      /etc/suricata/classification.config

      #
      # config classification:shortname,short description,priority
      #
      
      config classification: not-suspicious,Not Suspicious Traffic,3
      config classification: unknown,Unknown Traffic,3
      config classification: bad-unknown,Potentially Bad Traffic, 2
      . . .
      

      As indicated by the file header, each classification entry has three fields:

      • A short, machine-readable name: in the above examples, not-suspicious, unknown, and bad-unknown respectively.
      • The description for a classification to be used with alerts, for example Not Suspicious Traffic.
      • A priority field, which determines the order in which a signature will be processed by Suricata. The highest priority is the value 1. Signatures that use a classifier with a higher priority will get checked first when Suricata processes a packet.

      In the example sid:2100498 signature, the classtype option is classtype:bad-unknown;, as shown in the following rule:

      sid:2100498

      alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; content:"uid=0|28|root|29|"; classtype:bad-unknown; sid:2100498; rev:7; metadata:created_at 2010_09_23, updated_at 2010_09_23;)
      

      The implicit priority for the signature is 2, since that is the value that is assigned to the bad-unknown classtype in /etc/suricata/classification.config. If you would like to override the default priority for a classtype, you can add a priority:n option to a signature, where n is a value from 1 to 255.
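      The priority lookup described above can be sketched in Python. The sketch parses classification.config entries like the ones shown earlier into a dictionary. It then uses an explicit priority:n option if the rule has one, and falls back to the classtype's priority otherwise. The function names are hypothetical, and the regular expressions are a deliberate simplification.

```python
import re

CLASSIFICATION_LINES = [
    "#",
    "# config classification:shortname,short description,priority",
    "#",
    "config classification: not-suspicious,Not Suspicious Traffic,3",
    "config classification: unknown,Unknown Traffic,3",
    "config classification: bad-unknown,Potentially Bad Traffic, 2",
]


def load_classifications(lines):
    """Map each classtype short name to (description, priority)."""
    classes = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("config classification:"):
            continue  # skip comments and blank lines
        shortname, rest = line[len("config classification:"):].split(",", 1)
        description, priority = rest.rsplit(",", 1)
        classes[shortname.strip()] = (description.strip(), int(priority))
    return classes


def effective_priority(rule, classes):
    """Return the rule's priority: an explicit priority:n wins over its classtype."""
    explicit = re.search(r"\bpriority\s*:\s*(\d+)\s*;", rule)
    if explicit:
        return int(explicit.group(1))
    classtype = re.search(r"\bclasstype\s*:\s*([\w-]+)\s*;", rule)
    if classtype:
        return classes[classtype.group(1)][1]
    return None


RULE = ('alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; '
        'content:"uid=0|28|root|29|"; classtype:bad-unknown; sid:2100498; rev:7;)')
RULE_WITH_PRIORITY = 'alert ip any any -> any any (msg:"test"; classtype:bad-unknown; priority:1; sid:1000002;)'

classes = load_classifications(CLASSIFICATION_LINES)
print(effective_priority(RULE, classes))                # 2
print(effective_priority(RULE_WITH_PRIORITY, classes))  # 1
```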

      The target Keyword

      Another useful option in Suricata signatures is the target option. It can be set to one of two values: src_ip and dest_ip. The purpose of this option is to correctly identify the source and target hosts in Suricata’s alert logs.

      For example, the SSH signature from earlier in this tutorial can be enhanced with the target:dest_ip; option:

      Example SSH Signature with target field

      alert ssh any any -> 203.0.113.0/24 2022 (msg:"SSH TRAFFIC on non-SSH port"; target:dest_ip; sid:1000000; rev:3;)
      

      This example uses dest_ip because the rule is designed to check for SSH traffic coming into our example network, so it is the destination. Adding the target option to a rule will result in the following extra fields in the alert portion of an eve.json log entry:

      . . .
        "source": {
          "ip": "127.0.0.1",
          "port": 35272
        },
        "target": {
          "ip": "203.0.113.1",
          "port": 2022
        }
      . . .
      

      With these entries in Suricata’s logs, alerts can be sent to a Security Information and Event Management (SIEM) tool to make it easier to search for alerts that might be originating from a common host, or attacks that are directed to a specific target on your network.
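      As a sketch of the kind of question these fields let you answer, the Python snippet below reads eve.json-style lines, which hold one JSON object per line. It keeps alert events and counts alerts per target IP. It assumes the source and target objects sit inside each event's alert object, as the excerpt above suggests. The sample events are made up.

```python
import json
from collections import Counter


def alerts_per_target(lines):
    """Count alert events per target IP in eve.json-style JSON lines."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("event_type") != "alert":
            continue
        # Assumed location of the target fields added by the target keyword.
        target_ip = event.get("alert", {}).get("target", {}).get("ip")
        if target_ip:
            counts[target_ip] += 1
    return counts


sample = [
    '{"event_type": "alert", "alert": {"signature_id": 1000000, '
    '"source": {"ip": "127.0.0.1", "port": 35272}, '
    '"target": {"ip": "203.0.113.1", "port": 2022}}}',
    '{"event_type": "dns"}',
]
print(alerts_per_target(sample))  # Counter({'203.0.113.1': 1})
```

      Against a live system you would pass an open file handle instead of the sample list. The log file is commonly found at /var/log/suricata/eve.json, but the path depends on your configuration.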

      Conclusion

      In this tutorial you examined each of the main sections that make up a complete Suricata signature. Each of the Actions, Header, and Options sections in a rule has multiple options and supports scanning packets using many different protocols. While this tutorial did not explore any of the sections in great depth, the structure of a rule and the important fields in the examples should be enough to get started writing your own rules.

      If you want to explore complete signatures that include many more options than the ones described in this tutorial, explore the files in the /etc/suricata/rules directory. If there is a field in a rule that you would like to know more about, the Suricata Rules Documentation is the authoritative resource on what each option and its possible values mean.

      Once you are comfortable reading and testing signatures, you can proceed to the next tutorial in this series. In it you will learn how to enable Suricata’s IPS mode, which is used to drop suspicious traffic as opposed to the default IDS mode that only generates alerts.


