Protocols are plugins used by Pion's Sniffer reactor to control how raw network traffic captured via packet sniffing is converted into Pion's real-time events.
Pion's HTTP protocol configuration lets you easily extract additional content from your traffic. To configure your HTTP protocol plugin, first click on the "Protocols" tab at the top of Pion's user interface. You should see an accordion with three instances pre-defined:
- HTTP Protocol (no content)
- HTTP Protocol (full content)
- SMTP (Email)
If you already have a Sniffer reactor configured, it is almost certainly using "HTTP Protocol (full content)" (and if not, it likely should be). Click on that instance, since only changes made to it will have any effect.
Note: The "Save Raw Request Headers" and "Save Raw Response Headers" features are really only used by Replay, are enabled by default, and do consume quite a bit of extra resources. So if you're not using Pion's Replay features and know that you never will, you may want to uncheck those boxes while you're in here.
The "Extraction Rules" section is where most of the magic happens. The HTTP protocol populates several terms by default, but many others are set up via these extraction rules. This is where you can easily pull out and populate additional fields from your traffic.
Select "Add a new rule" at the bottom. This will create a new row at the bottom (you may need to scroll down -- sorry, for some reason the grid widget we use doesn't seem to do this automatically).
The first column lets you select a term that will be populated in each event created. This "Term Selector" lets you choose an existing term or create a new one.
If you haven't done so already, we strongly recommend creating a new vocabulary that you use for all of your custom terms. Creating new terms within or otherwise changing Pion's system vocabularies can create upgrade headaches later on. Just click "Add new vocabulary" with ID = "give-me-a-name", Name = "Any Descriptive Name You'd Like".
To add a new term, click the box again, click your custom vocabulary on the left, then click "Add a new Term." Make the IDs short but easy to remember; spaces and special characters are not allowed (hyphens are ok). For "Type" you will usually want to use "small string", which allows up to 255 characters, or "medium string" if you need more. If you're dealing with numbers or dates then you can use other types instead.
Note: make sure that you hit <enter> after changing any value within a configuration grid in Pion. Otherwise the value may not be saved by the UI even though it's displayed (sorry, this is another bug in the widget we're using).
Back to the "Extraction Rules"... After you've selected a term, you have the following sources available to choose from:
- query: stores the value of any GET or POST query string parameter (identified by "Name")
- cookie: stores the value of any HTTP cookie (request or response, identified by "Name")
- cs-cookie: stores the value of any HTTP request cookie (identified by "Name")
- sc-cookie: stores the value of any HTTP response cookie (identified by "Name")
- cs-header: stores the value of any HTTP request header (identified by "Name")
- sc-header: stores the value of any HTTP response header (identified by "Name")
- cs-content: stores the value of the decoded HTTP request content (often used in conjunction with regexes to extract specific bits)
- sc-content: stores the value of the decoded HTTP response content (often used in conjunction with regexes to extract specific bits)
- cs-raw-content: stores the value of the raw HTTP request content (not recommended)
- sc-raw-content: stores the value of the raw HTTP response content (not recommended)
The first 6 sources above require that you populate the Name column with the name of the parameter you would like to extract. For the rest, you can leave this blank.
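To make the named sources more concrete, here is a minimal Python sketch of where each one would look in an HTTP exchange. This is not Pion's code: the `lookup` function, the `REQUEST`/`RESPONSE` dictionaries, and all of the sample values are made up for illustration. It also assumes that POST parameters are form-encoded and that the combined "cookie" source checks the request before the response, neither of which is stated above.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Hypothetical captured exchange; every value here is invented for illustration.
REQUEST = {
    "uri": "/search?q=pion&page=2",
    "headers": {"User-Agent": "ExampleBrowser/1.0", "Cookie": "sid=abc123; theme=dark"},
    "body": "",
}
RESPONSE = {
    "headers": {"Content-Type": "text/html", "Set-Cookie": "token=xyz; Path=/"},
}


def _header(message, name):
    # HTTP header names are case-insensitive.
    for key, value in message["headers"].items():
        if key.lower() == name.lower():
            return value
    return None


def _cookie(header_value, name):
    # Parse a Cookie or Set-Cookie header value and return one cookie's value.
    if header_value is None:
        return None
    jar = SimpleCookie()
    jar.load(header_value)
    return jar[name].value if name in jar else None


def lookup(source, name, request, response):
    """Return the value a rule with this source and Name would see, or None."""
    if source == "query":
        # URL query string, then a form-encoded body (assumption) on top of it.
        params = parse_qs(urlsplit(request["uri"]).query)
        params.update(parse_qs(request.get("body", "")))
        values = params.get(name)
        return values[0] if values else None
    if source == "cs-cookie":
        return _cookie(_header(request, "Cookie"), name)
    if source == "sc-cookie":
        return _cookie(_header(response, "Set-Cookie"), name)
    if source == "cookie":
        # Assumption: try the request cookie first, then the response cookie.
        value = lookup("cs-cookie", name, request, response)
        if value is not None:
            return value
        return lookup("sc-cookie", name, request, response)
    if source == "cs-header":
        return _header(request, name)
    if source == "sc-header":
        return _header(response, name)
    raise ValueError(f"source {source!r} does not take a Name")


print(lookup("query", "q", REQUEST, RESPONSE))
print(lookup("cs-cookie", "theme", REQUEST, RESPONSE))
print(lookup("sc-header", "content-type", REQUEST, RESPONSE))
```

The content sources (cs-content, sc-content and the raw variants) are left out on purpose, since they work on the whole message body rather than on a named parameter.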
The (optional) Match and Format columns work together to let you store only specific pieces of content from the original source. Match should be a regular expression with one or more groups identified using parentheses, and Format should be the string to store, with the original match groups referenced as $1, $2, etc. Please look at the "page-title" field for an example of how Pion uses this to extract page titles from HTML <TITLE> tags.
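If you want to try out a Match and Format pair before typing it into the grid, the idea can be sketched in a few lines of Python. Treat this as an approximation only: `apply_rule` is a hypothetical helper, the title pattern is an example rather than the actual "page-title" rule shipped with Pion, and Python's `re` syntax may differ in places from the regular expression engine Pion uses.

```python
import re


def apply_rule(content, match, fmt):
    """Return fmt with $1, $2, ... filled in from the first match, or None."""
    found = re.search(match, content)
    if found is None:
        return None  # no match, so nothing would be stored

    def group_text(ref):
        # Replace $N with capture group N; unknown or unset groups become "".
        index = int(ref.group(1))
        if index > found.re.groups:
            return ""
        return found.group(index) or ""

    return re.sub(r"\$(\d+)", group_text, fmt)


# Example pattern (not Pion's own): case-insensitive, trims surrounding spaces.
TITLE_MATCH = r"(?is)<title>\s*(.*?)\s*</title>"

html = "<html><head><TITLE> Welcome to Pion </TITLE></head></html>"
print(apply_rule(html, TITLE_MATCH, "$1"))

# Format can reorder groups or mix them with literal text.
print(apply_rule("contact: alice@example.com", r"(\w+)@(\w+\.\w+)", "$2/$1"))
```

The first call prints the trimmed title text and the second prints the domain followed by the user name, which shows how Format can rearrange what Match captured.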
The (optional) ContentType column allows you to specify a regular expression that is compared against the corresponding HTTP header. If the expression does not match, Pion will not try to extract this particular field. This is used primarily to optimize Pion's performance by avoiding unnecessary processing work.
Finally, if the term you're extracting content into is a string type, the (optional) MaxSize value is used to truncate any strings that are longer than this value.
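The ContentType and MaxSize columns can be thought of as a gate before extraction and a trim after it. The Python sketch below shows that ordering under a few assumptions that are not confirmed above: `gate_and_truncate` is a hypothetical name, the header being compared is taken to be Content-Type, the regular expression is allowed to match anywhere in the header value, and MaxSize is counted in characters.

```python
import re


def gate_and_truncate(value, content_type, content_type_regex=None, max_size=None):
    """Apply the ContentType check, then the MaxSize limit, to one value."""
    if content_type_regex is not None:
        # Skip the rule when the header is missing or does not match.
        if content_type is None or not re.search(content_type_regex, content_type):
            return None
    if max_size is not None and len(value) > max_size:
        # Assumption: the limit is in characters, and extra text is cut off.
        value = value[:max_size]
    return value


# An HTML response passes the gate and is cut down to 6 characters.
print(gate_and_truncate("<html>...</html>", "text/html; charset=utf-8", r"^text/html", 6))

# A JSON response is skipped, so no matching work is done on its body.
print(gate_and_truncate("{}", "application/json", r"^text/html"))
```

Restricting a rule such as a title extraction to HTML responses in this way means that images, scripts and other content never reach the Match expression at all.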
After you are done creating new content extraction rules, click "Save Changes" to finish. Pion's Sniffer reactor will immediately start generating events that include your new content fields.
Note that an additional step is required to make these new fields available within page events, which are generated by Pion's Clickstream reactor and most often used for web analytics integrations and storage. Pion's Sniffer reactor generates HTTP events from which page events are derived, and custom fields are not automatically copied over. To add your new fields to page events:
- Open up your Clickstream reactor's configuration by double-clicking on it.
- Scroll down to the "Sticky Page Fields" section.
- Add each term you would like to appear in your page events to this list.
- Click the "Save" button below to save your Clickstream reactor changes.
