[logs] SIM solution - Objectives ?

Mikael Kuisma kuisma at ping.se
Tue Jun 5 06:09:17 PDT 2007


On 6/4/07, Stefano Zanero <zanero at elet.polimi.it> wrote:
> Mikael Kuisma wrote:
>
> > Ok. First of all, ASDIC is actually implemented, not only a paper.
>
> The papers usually come out of "actually implemented" things. The fact
> that they have not evolved into a GPLed software is not by itself a
> novelty of your approach.
>
> In fact, usually the cool and new stuff is in papers first, then slowly
> moves on to GPL code and ends up into proprietary products later :p
>
> I do not mean to demean your effort. Only, it's not something new,
> there's a gazillion of results that you should really take into account
> before designing something like this.

We have been developing ASDIC since 1999 together with one of the main
internet service providers in Sweden. The reason for this was that we
were not able to find any other tool that solved this problem, despite
the gazillions of papers we read.

> > Further, I have not found any paper about aggregating related log
> > entries in this, or a related, way.
> https://www.usenix.org/publications/library/proceedings/lisa98/full_papers/girardin/girardin_html/girardin.html
> http://www.springerlink.com/index/N21LYCG199259GEN.pdf
>
> just a couple you may wish to browse through... there's hundreds,
> actually. Log visualization is its own field of study, lately.

Interesting reading, but unfortunately not related to firewall log
aggregation. A SOFM (self-organizing feature map) maps every input to
one of a predetermined number of units, i.e. the number of neurons on
the grid. This is more of a classification problem, and not useful for
aggregation. Earlier versions of ASDIC actually used a conventional
artificial neural network in the aggregation process, but it did not
scale well at all, so we were forced to change the implementation of
the competitive learning system. Been there, tried that. ;-)
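
To illustrate the fixed-output point, here is a minimal Python sketch
of only the mapping step of a SOFM. It is not ASDIC code and not taken
from the cited paper; the 2x2 grid, its weight vectors and the sample
inputs are made up for the example, and training is left out entirely.

```python
# Hypothetical, already-trained 2x2 grid: four neurons, each holding a
# two-dimensional weight vector. The values are invented for illustration.
weights = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

def best_matching_unit(x):
    """Return the index of the neuron whose weights are closest to x."""
    def sq_dist(w):
        # Squared Euclidean distance between the input and one weight vector.
        return sum((a - b) ** 2 for a, b in zip(x, w))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))

# Inputs near the grid and far away from it all land on one of four units.
inputs = [(0.1, 0.2), (0.9, 0.8), (5.0, 7.0), (-3.0, 0.4)]
units = [best_matching_unit(x) for x in inputs]
```

However unusual the input is, the result is always one of the
len(weights) existing units; the map never grows a new one.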

The other paper is, according to the abstract, about anomalies in host
file systems, and really has nothing at all to do with network
monitoring..?!

Let me explain aggregation of firewall log entries (i.e. session
aggregation). An aggregate is a "meaningful group of individual
sessions". For example, all sessions from alice to bob's web server
can be seen as such an aggregate. The concept of aggregates is used by
Argus, RFC 2724, Cisco NetFlow and, conceptually, in firewall rule
sets etc., and is neither new nor special.
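
As a minimal Python sketch of a manually defined aggregate (an
illustration only, not how any of the tools above implement it): the
session tuples and the choice of key fields below are assumptions made
up for the example.

```python
from collections import defaultdict

# Hypothetical firewall sessions: (source, destination, destination port).
sessions = [
    ("alice", "bob", 80),
    ("alice", "bob", 80),
    ("carol", "bob", 80),
    ("alice", "dave", 22),
]

def aggregate_key(session):
    """Manually chosen definition: same source, destination and port."""
    src, dst, dport = session
    return (src, dst, dport)

def aggregate(sessions):
    """Group individual sessions under their aggregate key."""
    groups = defaultdict(list)
    for s in sessions:
        groups[aggregate_key(s)].append(s)
    return dict(groups)

aggregates = aggregate(sessions)
# The key ("alice", "bob", 80) now holds all sessions from alice to
# bob's web server.
```

Here the person writing aggregate_key decides what "meaningful" is,
which is exactly the manual step discussed in this thread.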

The unique feature of ASDIC is its ability to learn those aggregates
by itself, unsupervised. In other systems you need to define the
aggregates yourself, manually.

The benefits of aggregates are obvious: it is not possible to manage
traffic at the session level, whether you are configuring (e.g.
firewalls) or monitoring.

The benefits of creating aggregates unsupervised are more than first
meets the eye. Not only does it scale much better than defining them
manually, but if you require that each and every session (traffic log
entry) must belong to at least one aggregate, the creation of a new
aggregate signals an anomaly in the network, i.e. the detection of a
new traffic pattern.
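
The "new aggregate means anomaly" idea can be sketched in a few lines
of Python. This is a deliberately naive illustration and not the ASDIC
learning algorithm, which is not described in this mail: it assumes an
aggregate is simply an exact (source, destination, port) match, and the
log entries are made up.

```python
def learn_aggregates(sessions):
    """Assign every session to an aggregate; return (aggregates, anomalies)."""
    aggregates = {}  # aggregate key -> number of sessions seen
    anomalies = []   # sessions that forced the creation of a new aggregate
    for src, dst, dport in sessions:
        key = (src, dst, dport)
        if key not in aggregates:
            # No existing aggregate fits: create one and flag the session.
            aggregates[key] = 0
            anomalies.append(key)
        aggregates[key] += 1
    return aggregates, anomalies

# Hypothetical traffic log: mostly alice browsing bob's web server.
log = [
    ("alice", "bob", 80),
    ("alice", "bob", 80),
    ("mallory", "bob", 23),
    ("alice", "bob", 80),
]
aggregates, anomalies = learn_aggregates(log)
```

In this toy version the very first session of each pattern is flagged
too, since nothing is known at the start. A real system would
presumably need some way to generalize across sessions and an initial
learning period; neither is modelled here.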

This is what ASDIC traffic aggregation is all about, in a nutshell.

I'm sure there are other nice products and papers about log
visualization and so on, but that is a completely different issue and
not what I am talking about. (The visualization tool in ASDIC is quite
modest; it gets the job done but is nothing to write home about. :-)

> Anyway, the problem in aggregating log entries is that, well, you cannot
> deal readily with their content. So, you can aggregate them only on the
> dimensions that matter less.

Very true, and in many situations you do not even have access to the
content, as with firewall log files. We simply have to do the best
with what we have.

> > because the Arbor web is more market oriented than technical ... Lots
> > of nice colour brochures, but not so much information about what it's
> > really about.
>
> I think the guys at Arbor are more than capable of defending themselves.
> Suffice it to say that the likes of Jose Nazario, Thomas Ptacek and
> Farnam Jahanian worked on that stuff... so, while their marketing is
> undoubtedly good, I'd think about it a couple of hundred times before
> saying that we don't really know what it's about, as they've been
> speaking and teaching and writing the book on anomaly detection
> techniques for the last ten years at least.

I did not mean to attack the Arbor team in any way; I'm sure they are
also very competent and provide excellent products, but I cannot
compare ASDIC with Peakflow because I did not find sufficient
technical documentation on the Arbor website.

Best Regards,
Mikael Kuisma, Ping

