CRM114 - the Controllable Regex Mutilator
January 28th, 2007 by andreas
Last week I spent quite a lot of time on CRM114, which, according to the website, is “a system to examine incoming e-mail, system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to the user’s wildest desires. Criteria for categorization of data can be via a host of methods, including regexes, approximate regexes, a Hidden Markov Model, Orthogonal Sparse Bigrams, WINNOW, Correllation, KNN/Hyperspace, or Bit Entropy ( or by other means- it’s all programmable).”
CRM114’s programming language is unlike anything I’ve seen before, with its declensional syntax instead of the more common positional syntax, strange keywords and lack of types (everything is a string). Once you get the hang of it, though, the author promises that you’ll be able to “write the filter of your dreams”.
A couple of example programs
First, a ROT13 implementation.
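#!/usr/bin/crm
# Replace each letter with the one 13 places further along the alphabet, wrapping around.
translate /a-zA-Z/ /n-za-mN-ZA-M/
# Write the transformed input to stdout.
accept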
This code takes input from stdin and gives output to stdout.
Example
$ echo "CRM is so great."|./rot13.crm
PEZ vf fb terng.
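If the CRM114 syntax is unfamiliar, it may help to see roughly the same character mapping written in Python. This is only a sketch for comparison, not how CRM114 implements `translate`. The names `rot13` and `ROT13_TABLE` are my own.

```python
import string

# Source alphabet: a-z followed by A-Z, matching the first CRM114 pattern.
_FROM = string.ascii_lowercase + string.ascii_uppercase

# Target alphabet: each case rotated by 13 places (n-za-m and N-ZA-M).
_TO = (
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13]
)

# Translation table built once and reused for every call.
ROT13_TABLE = str.maketrans(_FROM, _TO)


def rot13(text: str) -> str:
    """Rotate every ASCII letter by 13 places; leave other characters alone."""
    return text.translate(ROT13_TABLE)
```

Calling `rot13("CRM is so great.")` gives the same `PEZ vf fb terng.` as the CRM114 program, and applying `rot13` twice returns the original text.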
A Reverse Polish Notation calculator
#!/usr/bin/crm
{
eval (:_dw:) / :@:R:*:_dw: : /
output /:*:_dw:\n/
}
Example usage
$ echo "2 5 2 * + 5 +"|./rpn.crm
17
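For comparison, here is what evaluating such an expression involves when written out by hand in Python: a plain stack-based evaluator. It is a sketch under my own assumptions (four binary operators, whitespace-separated tokens) and says nothing about how CRM114 evaluates math internally. The names `rpn` and `OPS` are hypothetical.

```python
import operator

# Operators supported by this sketch. This is an assumption for illustration,
# not CRM114's operator list.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def rpn(expression: str) -> float:
    """Evaluate a whitespace-separated Reverse Polish Notation expression."""
    stack = []
    for token in expression.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError("operator %r needs two operands" % token)
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            # Anything that is not an operator is treated as a number.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]
```

With the input from the example, `rpn("2 5 2 * + 5 +")` first computes 5 * 2 = 10, then 2 + 10 = 12, then 12 + 5, which is 17 as above.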
But the real strength of CRM114 lies in its ability to learn and classify text and other streams of data. At this point, CRM114 has seven different classifiers with different advantages. Some can make N-way choices, while others make simple yes/no choices.
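To make the learn/classify idea and the N-way choice a bit more concrete, here is a toy word-count classifier in Python. To be clear, this is not one of CRM114's classifiers. It is a small naive Bayes sketch with add-one smoothing that I am using purely as an illustration, and the class name `ToyClassifier`, its methods and the labels are all made up.

```python
import math
from collections import Counter, defaultdict


class ToyClassifier:
    """Illustrative naive Bayes text classifier (not a CRM114 algorithm)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> texts learned

    def learn(self, label, text):
        """Add one training text to the given class."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        """Return the most likely label, or None if nothing was learned."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log word likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A yes/no filter is just the two-class case (say `spam` and `ham`). An N-way choice means learning more classes, for example a third `log` class for system log lines, and letting `classify` pick among all of them.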
Posted in Software, MSc