CRM114 - the Controllable Regex Mutilator

January 28th, 2007 by andreas

Last week I spent quite a lot of time on CRM114, which according to the website is “a system to examine incoming e-mail, system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to the user’s wildest desires. Criteria for categorization of data can be via a host of methods, including regexes, approximate regexes, a Hidden Markov Model, Orthogonal Sparse Bigrams, WINNOW, Correllation, KNN/Hyperspace, or Bit Entropy ( or by other means- it’s all programmable).”

CRM114’s programming language is unlike anything I’ve seen before, with its declensional syntax in place of the more common positional syntax, strange keywords and lack of types (everything is a string). Once you get the hang of it, though, the author promises that you’ll be able to “write the filter of your dreams”.

A couple of example programs
First, a ROT13 implementation.

#!/usr/bin/crm
translate /a-zA-Z/ /n-za-mN-ZA-M/
accept

This code reads its input from stdin into CRM114’s data window, translate rewrites that window in place, and accept writes the result to stdout.
Example

$ echo "CRM is so great."|./rot13.crm
PEZ vf fb terng.
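
For readers who do not have CRM114 at hand, the character mapping that the translate line describes can be sketched in Python. This is only an illustration of the same substitution, not how CRM114 implements it; the names rot13, SOURCE and TARGET are made up for this example.

```python
import string

# Same two character sets as in `translate /a-zA-Z/ /n-za-mN-ZA-M/`:
# every letter in SOURCE is replaced by the letter at the same position in TARGET.
SOURCE = string.ascii_lowercase + string.ascii_uppercase
TARGET = (
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13]
)
ROT13_TABLE = str.maketrans(SOURCE, TARGET)


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; leave everything else untouched."""
    return text.translate(ROT13_TABLE)


print(rot13("CRM is so great."))  # prints: PEZ vf fb terng.
```

Applying rot13 twice gives back the original text, which is a quick way to check either version.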

A Reverse Polish Notation calculator

#!/usr/bin/crm
{
eval (:_dw:) / :@:R:*:_dw: : /
output /:*:_dw:\n/
}

Example usage

$ echo "2 5 2 * + 5 +"|./rpn.crm
17
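
To see what the calculator has to do with its input, here is a minimal stack-based RPN evaluator in Python. It is a sketch of the general technique only and says nothing about how CRM114 evaluates the expression internally; eval_rpn and OPS are names invented for this example, and only the four basic operators are assumed.

```python
import operator

# Hypothetical operator table for this sketch; only basic arithmetic is covered.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression: str) -> float:
    """Evaluate a whitespace-separated Reverse Polish Notation expression."""
    stack = []
    for token in expression.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            right = stack.pop()  # the operand pushed last is the right-hand side
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))  # anything else must be a number
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(f"{eval_rpn('2 5 2 * + 5 +'):g}")  # prints: 17
```

The expression from the example is processed as 5 * 2 = 10, then 2 + 10 = 12, then 12 + 5 = 17.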

But the real strength of CRM114 lies in its ability to learn and classify text and other streams of data. At this point, CRM114 has seven different classifiers, each with its own advantages. Some can make N-way choices, while others only make simple yes/no choices.

Posted in Software, MSc
