Nov 10

Lua Lesson 2 – Filling In The Blanks

In the first section of this series, I presented a simple Lua script to extract expert data from Tshark. Continuing down this path, a few scripting changes have been made to enhance functionality and introduce new concepts. Specifically:
  • White spacing was modified to improve readability and consistency
  • Additional extractions have been added to provide contextual information
  • A calculated value has been added
  • Script timing logic has been added
  • Output format has been modified to accommodate the additional data being presented
  • A separate reusable function has been added
After these modifications, our output should resemble the following:
StreamXpert 1_1 output

 Spacing and Indentation

Unlike Python, Lua does not require indentation. Generally speaking, it only requires whitespace between names and keywords. Thus, the following code snippets are functionally equivalent:
if tcp_analysis_retransmission() ~= nil then l_stream["retr"] = l_stream["retr"] + 1 end
if tcp_analysis_retransmission() ~= nil then
  l_stream["retr"] = l_stream["retr"] + 1
end
For the most part, spacing and indentation are a matter of readability and programmer preference.

Additional Field Extractions

More field extractors have been added to enhance functionality.

  • frame.len is now being extracted. This allows us to track the total frame bytes per connection. Alternatively, if we wanted to look at “just” the TCP payload bytes, we could have chosen tcp.len.
  • Each stream entry now tracks the socket information (ip.src, ip.dst, tcp.srcport, tcp.dstport). This provides greater visual context than the stream identifier alone.
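
As a rough sketch of how these extractors might be declared and combined into a socket label: the field names are standard Wireshark fields, while the variable names and the socket_label() helper are illustrative assumptions rather than the script's actual code. Field.new() only exists inside Wireshark/Tshark, so the declarations are guarded here.

```lua
-- Field extractors exist only inside Wireshark/Tshark, hence the guard.
local frame_len, ip_src, ip_dst, tcp_srcport, tcp_dstport
if Field then
  frame_len   = Field.new("frame.len")
  ip_src      = Field.new("ip.src")
  ip_dst      = Field.new("ip.dst")
  tcp_srcport = Field.new("tcp.srcport")
  tcp_dstport = Field.new("tcp.dstport")
end

-- Hypothetical helper: build a readable label from already-extracted values.
-- Inside tap.packet() the arguments would be strings such as tostring(ip_src()).
local function socket_label(src, sport, dst, dport)
  return string.format("%s:%s -> %s:%s", src, sport, dst, dport)
end

io.write(socket_label("192.168.1.10", "57913", "52.22.153.18", "443"), "\n")
```

Keeping the label construction in a small helper means the formatting can be checked in a plain Lua interpreter, without a trace file.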

 Calculated Value

The total byte calculation and additional fields are logical extensions of what we did in the 1.0 version of this script. For bytes, we add the current frame length to the running total stored in bytes and store the result back into bytes. As you hopefully remember, l_stream is a reference to the current stream record.
l_stream["bytes"] = l_stream["bytes"] + tonumber(tostring(frame_len()))
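
The running total can be tried outside Tshark with a stand-in record. In this minimal sketch, add_frame() and the sample lengths are assumptions for illustration; len_text plays the role of tostring(frame_len()). Lua will coerce a numeric string in arithmetic, but the sketch converts explicitly so the intent is visible.

```lua
-- Stand-in for one stream record; in the script this is l_stream.
local l_stream = { frames = 0, bytes = 0 }

-- len_text plays the role of tostring(frame_len()) inside tap.packet().
local function add_frame(rec, len_text)
  rec["frames"] = rec["frames"] + 1
  rec["bytes"]  = rec["bytes"] + tonumber(len_text)
end

add_frame(l_stream, "60")
add_frame(l_stream, "1514")
io.write("frames=", l_stream["frames"], " bytes=", l_stream["bytes"], "\n")
```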

Script Timing

Within tap.packet() we introduce a new variable called start_time. This variable is initialized to the value returned from the os.time() function the first time tap.packet() is called. As we call tap.packet() (for each frame which contains TCP) we check to see if this variable exists. If it exists (not nil), we don’t overwrite it.

if start_time == nil then
start_time = os.time()
end
As mentioned previously, this code could have been written on a single line. I chose to extend it over multiple lines to make its functionality more apparent. At the end of tap.draw() we subtract start_time from the current os.time() to arrive at an elapsed time, which we print using io.write().
io.write("\n", "Elapsed Time = ", os.time() - start_time, " Seconds", "\n")
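
The timing logic can be sketched on its own as follows. The helper names mark_start() and elapsed_line() are hypothetical; in the script this logic sits inline in tap.packet() and tap.draw(). Note that os.time() typically has one-second resolution, so short traces may well report 0 seconds.

```lua
local start_time  -- nil until the first packet is seen

-- Would be called from tap.packet(); only the first call sets the time.
local function mark_start()
  if start_time == nil then
    start_time = os.time()
  end
end

-- Would be called at the end of tap.draw().
-- Falls back to "now" if no packet was ever seen, giving an elapsed time of 0.
local function elapsed_line()
  local now = os.time()
  return "Elapsed Time = " .. (now - (start_time or now)) .. " Seconds"
end

mark_start()
mark_start()  -- second call leaves start_time untouched
io.write("\n", elapsed_line(), "\n")
```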

Output Formatting

  • All print() calls have been replaced with io.write(). io.write() is very similar to print(). However, it doesn’t place a newline at the end, nor does it insert tabs between arguments. While the functionality of print() can be a convenience (and necessary in certain instances), we are explicitly formatting output and don’t want unintended tabs. This link describes some of the differences between io.write() and print().
  • Field lengths were relatively short and predictable in our first script. However, as we introduce fields such as bytes, field sizes can vary significantly. Thus, we explicitly format the output to ensure that our headers and data align. We use the string.format() function to specify field length and justification. string.format(), in conjunction with io.write(), behaves similarly to the C printf() function.
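
To illustrate the alignment idea, here is a small sketch. The column widths, the column names and the sample values are assumptions chosen for the example, not the ones used in the script.

```lua
-- Column widths here are illustrative assumptions, not the script's values.
local ROW_FMT    = "%-8d %10d %12d\n"   -- left-justified id, right-justified counts
local HEADER_FMT = "%-8s %10s %12s\n"   -- same widths, so header and data line up

local function format_row(stream_id, frames, bytes)
  return string.format(ROW_FMT, stream_id, frames, bytes)
end

local header = string.format(HEADER_FMT, "Stream", "Frames", "Bytes")
io.write(header, format_row(0, 14, 5321), format_row(1, 20311, 28764102))
```

Because the header and the rows share the same widths, every line has the same length regardless of how many digits a value has (up to the column width).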

Round() Function

In the first exercise we created two functions: tap.packet() and tap.draw(). In this exercise we create a new function called round(). round() is used when calculating the average frame length per stream. Before discussing round(), I want to take a moment to talk about functions. Here are a few lines of code that illustrate a simple function and how functions return values for use in other functions.
function test_function(your_value)
  local new_value = "The value handed to this function was " .. your_value
  return new_value
end
io.write("Please enter a value and press return: ")
io.write(test_function(io.read()), "\n")
  • We define our function named test_function. Functions (like other code blocks) are delimited by end statements.
  • We print a line telling the user what to do. “Please enter a value and press return: “
  • The last line is a bit more complex. We need to work from the innermost function outwards.
    • io.read() waits for user input (delimited by a return key). Once the return sequence is detected, the value is passed as an argument to test_function().
    • test_function() accepts this value and stores it in “your_value”, appends it to a string and return(s) it. The “..” (two dots) is the Lua concatenation operator.
    • Lastly, io.write displays this value onscreen. Function arguments are separated by commas. The “\n” is the newline.
Let’s talk about round(). Lua has no built-in mathematical rounding function. To overcome this limitation, we create our own rounding function using the modulus operator and the math.ceil() and math.floor() functions. Modulus “%” is the remainder from a division operation. Thus, if we take a number modulo 1, we end up with its fractional part. If this fraction is greater than or equal to 0.5, we round up to the next integer. Conversely, if it is less than 0.5, we round down.
function round(n)
  if n % 1 >= 0.5 then return math.ceil(n) else return math.floor(n) end
end
While we could place the rounding code within tap.draw(), we may use this function again as this script evolves. I am not sure yet 🙂
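
A quick way to get a feel for round() is to run it on a few sample values. The sketch below also includes round_alt(), a common alternative formulation that is an assumption on my part and not part of the script; for the non-negative averages we deal with here, both behave the same.

```lua
local function round(n)
  if n % 1 >= 0.5 then return math.ceil(n) else return math.floor(n) end
end

-- Common alternative (not the script's code): shift by 0.5, then floor.
local function round_alt(n)
  return math.floor(n + 0.5)
end

-- Sample values only; 7 shows that whole numbers pass through unchanged.
for _, v in ipairs({ 380.07, 380.5, 380.49, 7 }) do
  io.write(v, " -> ", round(v), " / ", round_alt(v), "\n")
end
```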

One last note

Within tap.draw(), there is a section of code within one of the io.write() calls which may appear a bit unusual. The full line is a bit long, so I am just going to concentrate on the snippet of interest. Specifically, I am focusing on the section of code involving the round() function. To get the average frame size, we would expect to divide the total bytes by the total frames. This would, in turn, be passed to our round() function, which would round it off to the nearest integer. Thus, we might anticipate the following:
round(l_stream["bytes"] / l_stream["frames"])
However, the code is written a bit differently:
round(l_stream["bytes"] * (1 / l_stream["frames"]))
The intent here was a performance optimization, based on the idea that Lua multiplies faster than it divides. Note, however, that this form still performs a division (1 / l_stream["frames"]) in addition to the multiplication, so it is unlikely to be faster unless the reciprocal is computed once and reused. For practical purposes both forms yield the same average.
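
The two forms are easy to compare on sample values. The numbers below are made up for the example and are not taken from a real trace.

```lua
local l_stream = { bytes = 5321, frames = 14 }  -- sample values, not real trace data

local by_division   = l_stream["bytes"] / l_stream["frames"]
local by_reciprocal = l_stream["bytes"] * (1 / l_stream["frames"])

-- The results may differ in the last floating-point digits,
-- but they round to the same integer here.
io.write(math.floor(by_division + 0.5), " ", math.floor(by_reciprocal + 0.5), "\n")
```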

Summary

That’s it for now. We will continue to extend our script in future installments. I hope you found this interesting and useful. As always, feedback is appreciated.

Nov 03

Lua Lesson 1 – Tapping TCP Expert data

In this blog, I am going to introduce Lua tap scripting for Tshark. Specifically, this blog is intended to provide a conceptual overview and foundation for more complex development tasks, which will be presented in future blogs.

Download LUA script

Introduction

Wireshark is a great tool for analyzing packet captures. However, there are many cases in which Wireshark doesn’t represent the data in a usable and efficient manner for the given task at hand. Thus, I have often found myself exporting the data to text, using a language such as Python or Perl to perform string parsing and/or pulling the data into Excel for processing and correlation. More recently I have been using Lua for these sorts of tasks.

Lua is an embedded, interpreted language which is frequently used in the gaming world. While it bears many similarities to languages such as Python, it has fewer data types and built-in functions. The sparse nature of Lua keeps it very small, stable and efficient. Coming from more of a Python and C background, I experienced a period of adjustment, though the basic concepts of programming (control structures, operators, functions, string parsing, etc.) still apply. The Lua interpreter is integrated directly with Wireshark. While this integration allows for the creation of dissectors, we are going to focus on taps, as we are processing data that has already been dissected/decoded by Wireshark/Tshark. The Wireshark Wiki provides additional information on the integration of Lua.

To determine whether or not your version of Wireshark is compiled with Lua, examine “About Wireshark” under the Help menu. Here is my about page, with the pertinent text highlighted:

Compiled with LUA

I mentioned the terms dissector and tap, as if this all makes sense to everyone reading this blog. As I realize that this may not be the case, let me take a step back and discuss these components.

Wireshark/Tshark has a core engine which processes each frame, one frame at a time. For each frame Wireshark/Tshark invokes dissectors, plugins, filters and taps.

  • dissectors decode the various fields and their values
  • plugins are typically external dissectors (dissectors which are not part of core distribution)
  • filters restrict the data displayed
  • taps work with dissected data, often to create statistics.

When a frame is processed, dissectors and plugins populate an ephemeral “protocol tree” with the various decoded values from the current frame. In a typical scenario, the frame dissector is called first and populates the tree with frame-related data, such as the arrival time, capture length, etc. If the frame type is Ethernet, the Ethernet dissector is called. If the payload type is IP, the IP dissector is called, and so on. This continues until there are no further dissectors available and/or no frame data left to dissect. Filters and taps can retrieve information from the tree, but cannot update it. Filters are used to limit which frames are displayed/tapped, and taps are often used to create additional metadata from dissected data.

The following graphic summarizes this relationship:

core

Lua Taps

From the command line, we can run a Lua tap script as follows:

tshark -X lua_script:streamXpert1.lua -r "small.pcapng" -q

This will cause Tshark to initialize the Lua script “streamXpert1.lua” and process the trace file named “small.pcapng”. The “-q” option prevents Tshark from displaying frames to the console, so that only the output of our script is printed. For each frame, Tshark will look for the .packet() function contained in the Lua script. tap.packet() “extracts” field data from each packet, stores this data, and optionally performs calculations. After all packets have been processed, Tshark looks for a .draw() function for output. Thus, the components of a useful Tshark Lua script will likely include the following:
  • Tap definition
  • Field extractors
  • Array to hold extracted data and potentially other variables for program flow control, etc.
  • .packet() function
  • .draw() function
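
Put together, these components might look like the following minimal skeleton. It is a sketch under assumptions, not streamXpert1.lua itself: it only counts TCP frames, and the names frames and count_frame() are made up for the example. Listener and Field are provided by Wireshark/Tshark, so that part is guarded and simply does nothing in a standalone Lua interpreter.

```lua
-- Data kept between packets (here just a counter).
local frames = 0

-- Plain-Lua bookkeeping, kept separate so it can be exercised without Tshark.
local function count_frame()
  frames = frames + 1
  return frames
end

-- Listener and Field are provided by Wireshark/Tshark, hence the guard.
if Listener and Field then
  local tcp_stream = Field.new("tcp.stream")   -- field extractor
  local tap = Listener.new("tcp")              -- tap definition: TCP frames only

  function tap.packet(pinfo, tvb)              -- called once per tapped frame
    if tcp_stream() ~= nil then count_frame() end
  end

  function tap.draw()                          -- called after the last frame
    io.write("TCP frames seen: ", frames, "\n")
  end
end
```

Saved to a file, this would be run with the same tshark -X lua_script:... command line shown above.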
streamXpert1.lua
streamXpert1.lua examines each TCP segment and keeps a count, by connection (stream), of total frames, retransmissions, duplicate ACKs, out-of-order segments and zero windows. Our output should look similar to this:
StreamXpert output

Initialization

At the beginning of our script, we define a tap using Listener.new(). Specifically, we are going to be tapping TCP traffic and our tap is going to be named “tap”. Consequently, our functions are named tap.packet() and tap.draw(); if we had named our tap “my_tap”, they would be named my_tap.packet() and my_tap.draw().
As mentioned previously, when Wireshark/Tshark dissects a packet, it stores the decoded fields of the current packet in a “protocol tree” structure. To extract information from the tree we must create field extractors. Field extractors are essentially function definitions and are defined via Field.new(). These must be defined outside of the tap.packet() and tap.draw() functions.
To obtain the desired functionality, our script creates and maintains metadata for each connection. This metadata is stored in a table structure we create named “stream”. Additionally, we create a variable called “nxt_stream” to track the number of streams. This variable is initialized to 0 and is incremented each time a new stream is detected. It is used for three purposes:
  • tap.packet() uses nxt_stream to determine whether the segment received is part of a new stream. Wireshark/Tshark allocates stream numbers sequentially whenever it sees a segment from a socket that it has not seen before. As nxt_stream contains the value of the next stream we expect to be created, when we receive a segment and its tcp.stream value is lower than nxt_stream, we can safely assume that this segment is part of an existing stream and update the entry already within our stream table. Otherwise, it’s a new stream and we must create a new entry in the table.
  • tap.draw() uses the value of nxt_stream to determine whether any TCP packets have been seen. If nxt_stream is 0, no TCP traffic has been seen and there is no purpose in creating a column header. We exit tap.draw() with a message onscreen indicating that no TCP segments were found in the trace.
  • tap.draw() uses nxt_stream to iterate the stream{} table. While it is possible to iterate a data structure without knowing its length using the Lua pairs() function, Lua does not necessarily store or retrieve elements in the order in which they were inserted into the table. We set an iterator to 0 (the first numbered stream) and use nxt_stream as the limit of a while loop. This ensures that our streams are output in sequential order.
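
The three uses of nxt_stream can be sketched in plain Lua as follows. The get_record() helper and the sample sequence of stream numbers are assumptions for illustration; in the script, conn would come from the tcp.stream field of each segment.

```lua
local stream = {}      -- one record per tcp.stream number
local nxt_stream = 0   -- next stream number we expect Tshark to allocate

-- conn plays the role of the tcp.stream value of the current segment.
local function get_record(conn)
  if conn >= nxt_stream then           -- not seen before: create a new record
    stream[conn] = { frames = 0, retr = 0 }
    nxt_stream = nxt_stream + 1
  end
  return stream[conn]                  -- existing or freshly created record
end

-- Sample sequence of stream numbers, as if taken from six segments.
for _, conn in ipairs({ 0, 0, 1, 0, 2, 1 }) do
  local l_stream = get_record(conn)
  l_stream["frames"] = l_stream["frames"] + 1
end

if nxt_stream == 0 then
  io.write("No TCP segments found\n")
else
  -- Ordered output: a while loop bounded by nxt_stream instead of pairs().
  local i = 0
  while i < nxt_stream do
    io.write("stream ", i, ": ", stream[i]["frames"], " frames\n")
    i = i + 1
  end
end
```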

 tap.packet()

From a high level, Tshark feeds each packet through dissectors, plugins and filters, and then calls our tap.packet() function. Thus, for each packet, Tshark looks for the tap.packet() function of the listener which we defined. tap.packet() extracts the necessary field data and updates our stream{} table based upon conditional logic. For example, if the segment is out of order, we update the out-of-order counter.
  • If the segment is the first within a stream, we create a “record” for the stream and populate the record accordingly.
  • If the packet is part of a stream for which we have an existing record, we update the values of this record accordingly.
The following flow chart outlines the programmatic logic contained within our tap.packet() function.
tap.packet()

Stream data structure

Lua has no concept of lists, structures, etc. Data structures are based entirely on tables/associative arrays containing key/value pairs. In our example, in order to replicate a record structure we create a table called streams. In the streams table, the key is the stream number and the value is a pointer to the memory location of another table containing the values of a particular stream. Visually, this could be represented as follows:
streamstruct
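
In code, such a table of tables might look like the sketch below. The field names and counts are illustrative assumptions (only retr appears elsewhere in this series). The last lines show that assigning a record to a local variable creates a second reference to the same table, not a copy.

```lua
-- Outer table: key = stream number, value = a table acting as the record.
local streams = {
  [0] = { frames = 14, retr = 1, dup_ack = 2 },
  [1] = { frames = 5,  retr = 0, dup_ack = 0 },
}

-- l_stream refers to the same table as streams[0]; no copy is made.
local l_stream = streams[0]
l_stream["retr"]   = l_stream["retr"] + 1
l_stream["frames"] = l_stream["frames"] + 1

-- Reading through the outer table shows the update made via l_stream.
io.write("stream 0 retr = ", streams[0]["retr"], "\n")
```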

 tap.draw()

When all packets have been processed, Tshark looks for tap.draw(). tap.draw() reads our stream structure and outputs to the screen accordingly. The following flow chart outlines the programmatic logic contained within our tap.draw() function.
 tap_draw

Additional Notes

All attempts have been made to use local variables to minimize table lookups and unnecessary function calls. If a global variable or function is referenced more than once within tap.draw() or tap.packet(), we assign it to a local variable.

  • Within tap.packet() and tap.draw() we create a local variable to directly reference the record of the stream on which we are working. This eliminates the need to repeatedly search the stream table for a specific key to locate a given record. For example, if the frame is a retransmission, we need to increment the frame count as well as the retransmission count. Thus, a local variable referring directly to the data of the current stream eliminates the need to look up the key multiple times in the stream table.
  • One interesting line to which I would like to call attention is the following:
local conn=tonumber(tostring(tcp_stream()))
We are storing the current stream in a local variable. Therefore, we only need to call the field extractor once. However, we need to convert the field extractor output to a string before converting it to a number, because tonumber() expects a string or a number. The return value of tcp_stream() is userdata, and trying to convert it directly results in a nil value. I have spent many hours trying to figure out why my scripts were not working, only to discover that it was due to a variable type mismatch. For debugging, I often add a line such as “print(type(variable))” so that I can determine whether my issue is due to a variable type mismatch.
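
The conversion issue can be reproduced without Tshark. In this sketch, fake_field is a hypothetical stand-in for the value a field extractor returns: a plain table with a __tostring metamethod, so type() reports “table” here rather than “userdata”, but tonumber() treats it the same way.

```lua
-- Hypothetical stand-in for a field extractor's return value: neither a
-- string nor a number, but something that knows how to print itself.
local fake_field = setmetatable({}, { __tostring = function() return "3" end })

print(type(fake_field))                           -- debugging aid from the text
local direct    = tonumber(fake_field)            -- nil: not a string or number
local converted = tonumber(tostring(fake_field))  -- string first, then number

print(direct, converted)
```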

Summary

The purpose of writing this blog was to provide an introductory tutorial on Lua tapping. While the example script may seem simplistic, it can easily be extended to provide much more value. Future blogs will seek to extend the basic logic presented here. I hope you enjoyed this exercise and look forward to any questions, comments, corrections and/or suggestions.

May 12

Just The Facts

Don’t jump to cause. This often leads to wasted resources and is a quick way to lose respect amongst your peers.

Scenario

You receive a trace via email and are told that it illustrates a “network problem”, which is causing slow application performance. In the words of the analyst, there are “tons of bad packets and retransmissions.”

Much can be said about ensuring the methodology employed to collect the capture data actually provides relevant information. Additionally, much can also be said about ensuring that a problem is properly defined and provides the analyst with some sort of idea as to potential areas of interest. Often the first thing that I do when receiving a trace, is ask for additional information regarding the symptom, e.g. relevant IP addresses, application ports, etc. Before you start trying to sort through thousands or even millions of packets, you want to be sure that you are looking in the right haystack and are fairly certain that you are looking for a needle. These sorts of topics will be addressed in future discussions.

The Companion Video walks through the observation and analysis described below.

Observations

Wireshark metadata provides a quick way of assessing what a capture contains and whether this data coincides with the problem being described. Let’s examine some metadata regarding our “network problem”.

The Capture file properties dialog (below) indicates that the trace was conducted at 19:04:08 on 3/25/2016, with a duration of 180ms and contains a total of 14 frames. No frame slice, nor capture filter was in place during the initial capture, though it is very likely that we are looking at a trace that was filtered and saved from a larger capture.

Capture File Properties

Expert information (below) indicates IP and TCP “Bad checksum” errors, “Previous segment not captured”, “Duplicate ACK”s, “suspected retransmission”, and “fast retransmission” events. While bad checksums can result in retransmissions, we see that the number of checksum errors is significantly greater than the number of retransmissions. In other words, if we were seeing this many “real” bad TCP checksums, we would likely expect to see many more retransmission symptoms. However, it is prudent to validate. The “Previous segment not captured”, “Duplicate ACK”s and “suspected retransmission”/“fast retransmission” events logically correlate. For example, we experience a gap in TCP segments due to a dropped frame or segment reordering. This causes Wireshark to generate a previous segment not captured event. The receiver, sensing a segment (or segments) later in the stream than what it is expecting, generates duplicate ACK(s), which results in a retransmission (further categorized as a fast retransmission). However, we only see two duplicate ACKs, so whether this is an actual retransmission is questionable. Analysis should help to provide more clarity.

Expert

Tip: The expert provides “hints” as to potential concerns detected by Wireshark. It can get very “busy” in terms of the number and types of events present, many of which are of little importance towards the task at hand. I find that the expert window is much more valuable after I have created a filtered trace, containing just the packets related to the issue that I am troubleshooting. In this case, as we are only looking at 14 frames, it isn’t overwhelming.

Protocol Hierarchy (below) indicates that this trace only contains SSL over TCP. There is no UDP, ICMP, SNMP, or other applications running over TCP.

jtf-protocol_hierarchy

In fact, Conversations (below) indicates a single SSL session between a client (User-1/192.168.1.10) TCP port 57913 to what appears to be an Amazon EC2 instance (server ec2-52-22-153-18.compute-1.amazonaws.com/ 52.22.153.18), located in Wilmington, DE. on TCP port 443 (SSL). This connection has a duration of approx. 180ms, which corresponds to the capture file properties dialog.

Conversations

Analysis

In examining the checksum errors, we see that these are present on every packet/segment generated by 192.168.1.10. This is a pretty clear indicator that this machine was our capture machine and that these errors were due to checksum offloading. Checksums exist for a reason: detecting corruption. I was once engaged in an issue where sporadic bad TCP checksums led to retransmissions. The assumption of another analyst was that the issue was related to packet loss, but he was having difficulty determining where this loss was occurring. However, careful analysis of checksums indicated that these frames were not actually being lost in the network. They were being dropped by the receiver because of invalid TCP checksums generated by the sender; upgrading the driver resolved the issue. Just be mindful.

The following graphic illustrates a checksum analysis, conducted via creating a filter for frames for bad IP and TCP checksums, and then comparing filtered vs. non-filtered transmitted frames from host 192.168.1.10.

checksums

The “Previous segment not captured”, “Duplicate ACK”s, “suspected retransmission”, and “fast retransmission” events are due to packet reordering. We were able to determine this from examining the IP Identification fields, as shown below:

IP Identifier

The specifics of why and where packets were reordered are uncertain. Could this situation represent a potential performance concern? Maybe. However, in our example we only saw two duplicate ACKs and the conversation progresses to the next application-level message. There were no actual retransmissions. Thus, reordering of segments in this particular trace did not create a concern, though it is something to be mindful of when examining further traces.

Session reuse would indicate that the client and server had established a successful SSL session prior to this trace sample and are attempting to reuse the session identifier to reduce overhead. However, as pointed out by a couple of readers (Jin Qian and Sake), session reuse is not occurring in this trace. When using any analysis tool, we should verify the information provided, especially if this information is pivotal to our analysis. In fact, this expert event is due to the out-of-order segments which occur in this sample. Regardless, in our SSL session we never see the client and server exchange application data. In SSL there is a handshake protocol phase in which the client and server negotiate ciphers, validate identity, and use public key cryptography to securely create a shared master key. After this step is complete, the client and server use this shared master key to symmetrically encrypt data. The last SSL message that we see in our trace is the “Change Cipher Spec, Encrypted Handshake Message” from the server. While this indicates that we have completed our handshake and are entering a secure data phase, we never actually see any application data. SSL will be a topic that I will spend some time on in the future.

Why did the initial SYN/ACK take so much longer than other acks? This is something to keep an eye on in further traces.

slow syn-ack

Discussion

Metadata is information about data. In this context, there are many tools which can analyze a trace, or even multiple traces, and produce metadata regarding content. Whether I examine a large or small trace, I will almost always start by getting a higher-level metadata perspective, as examination of this information allows me to make quite a few assessments regarding the contents of a packet capture (without even looking at individual packets). Wireshark creates quite a bit of metadata, and much of this same information is available via the command line (tshark) and exposed to Lua, which can create higher-level abstractions. Future discussions will address the Lua programming interface and, specifically, how to create additional metadata via Lua taps.

I am wrapping a technical concept inside of a larger idea regarding “Impactful Analysis”. It is OK if we don’t immediately have “the answer”. We cannot draw real conclusions when we don’t have enough data. In many cases it is difficult to determine much more than the need for further directed testing, and this testing is often iterative. An impactful analyst will define the objectives of a follow-up test plan to obtain the necessary information, or at the very least ensure that the necessary information is gathered as part of the larger test plan. Throughout the process, we need to recognize that the way we present our observations, e.g. specific content and format, will be influenced by the knowledge level and focus of our audience; we don’t want our observations misconstrued. Anyway, these are future discussion topics.

While we saw some potential concerns in this example, we didn’t find any sort of smoking gun. We don’t have enough information to make any concrete assessments and we want to communicate this clearly. We also want to ensure that we can get the data (and information) to drill deeper and may have to guide others through a test process. However, in the end impactful analysis relies on “Just the Facts.”

Apr 29

Science Meets Art On The Wire

Protocol analysis performs a critical role. While this discipline may not be an absolute requirement for a given organization on a daily basis, and oftentimes there are more efficient ways of arriving at the desired outcome, when it is needed there is no substitute.

Where do I start? What am I looking at? What does it mean?

Beginning the task of learning how to analyze protocols can be as arduous as counting blades of grass. I remember endless hours sitting in front of a Network General Sniffer confused and disheartened as things that we don’t understand appear dry and abstract. However, if an individual is willing to put in the hard work and willing to be open to the idea that it will probably not come overnight, it is highly rewarding and becomes very interesting. It just takes a bit of patience to traverse the metaphorical hump. In my case, it took a lot of reading, re-reading, thinking and conceptualizing before I arrived at any sort of eureka moment.

In this series of blogs, my objective is to present what I have learned about the science (and yes art) of analyzing and understanding exchanges across the wire, as it applies to performance, security and availability management. Specifically, I will seek to demonstrate how wire level information can be used to derive valuable insight regarding behavior of networks, applications, systems and the consumers of these systems. This particular discipline has been a large component of my career over the last 20+ years and something that I desire to share.

Stay tuned!