elehack.net

Michael's Blog

Web privacy

Today, I made some changes to our web server code and our privacy policy. The primary effect of these changes is that we no longer record the IP addresses of visitors to elehack.net. This change was prompted by our discovery of the search engine Duck Duck Go and particularly its privacy policy.

As you browse the web, a good deal of information is sent to web sites you view. I want to take this opportunity to provide a run-down of what some of this information is and how it can be used.

Read more...

Tuning the OCaml memory allocator for large data processing jobs

TL;DR: setting OCAMLRUNPARAM=s=4m,i=32m,o=150 can make your OCaml programs run faster. Read on for details and how to see if the garbage collector is thrashing and thereby slowing down your program.
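As a rough sketch of what those settings correspond to, the same three knobs can be set from inside a program through the standard `Gc` module. This assumes the `m` suffix is read as a multiplier of 2^20 words (the OCaml manual spells the multipliers `k`, `M`, and `G`); the `mega` and `tuned` names are made up for this illustration and do not come from the post.

```ocaml
(* Hypothetical helper names; only the Gc module calls are real API. *)
let mega = 1024 * 1024

(* Copy a Gc.control record, overriding the three parameters that
   s=4m,i=32m,o=150 refers to. Sizes are in words, not bytes. *)
let tuned (c : Gc.control) : Gc.control =
  { c with
    Gc.minor_heap_size = 4 * mega;        (* s: minor heap size *)
    Gc.major_heap_increment = 32 * mega;  (* i: major heap growth step *)
    Gc.space_overhead = 150 }             (* o: space overhead, in percent *)

let () =
  Gc.set (tuned (Gc.get ()));
  (* ... run the data processing job here ... *)
  let s = Gc.quick_stat () in
  (* Collection counts that look large relative to the work done are one
     hint that the collector is busy. *)
  Printf.printf "minor: %d, major: %d, compactions: %d\n"
    s.Gc.minor_collections s.Gc.major_collections s.Gc.compactions
```

Printing the counters before and after a change of settings gives a crude comparison; newer runtimes may ignore some of these fields, so treat the numbers as a starting point only.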

In my research work with GroupLens, I do most of my coding for data processing, algorithm implementation, etc. in OCaml. Sometimes I have to suffer a bit for this when some nice library doesn’t have OCaml bindings, but in general it works out fairly well. And every time I go to do some refactoring, I am reminded why I’m not coding in Python.

Read more...

My first OCaml syntax extension

Preface: In this post, I describe my adventures figuring out how to write a syntax extension for the OCaml programming language and attempt to provide something of a tutorial on writing a basic extension. I assume that you’re somewhat familiar with basic parsing technology and context-free grammars — if not, a good tutorial on parser construction with a tool like Yacc would be worth a read first.

One of the oft-touted benefits of OCaml is Camlp4, a pre-processor that facilitates extending the OCaml syntax to provide natural support for various constructions. This has been used for a variety of purposes, such as database type-checking, monad sugaring, and logging. In the hands of a capable author, a variety of wonders can be introduced to the OCaml language.

Read more...

An Orbiting Hockey Player Befriends a Treacherous Turn Signal

Occasionally, the spam in my inbox turns out to be downright hilarious. Today, such a spam arrived.

Following is the filler text from a spam trying to sell me renter’s insurance. It is presumably intended to make spam filters think the message is legit — it barely fooled SpamAssassin, but not Thunderbird. It’s clearly generated by a random prose generator with a singularly limited (and odd) vocabulary.

Read more...

Life with Lucid

Since we need to make some computing changes for the summer (moving our server needs to Rackspace Cloud, converting the now-server back to a desktop to replace Jennifer’s dying Eee), I decided to bite the bullet early and upgrade my laptop to the Ubuntu 10.04 Lucid Lynx release candidate (release due out tomorrow).

On the whole, I like it. The new visual theme takes some getting used to, but it’s growing on me. Some elements of it are just gorgeous. Twitter and IM integration is nice too; it’s actually getting me off Psi. Having fully supported Firefox 3.6 and Thunderbird 3 is good as well.

Read more...

Reflections for Sunday: Polyglot Worship

At our church, we will from time to time sing multilingual songs, typically in English and Spanish. For a while, I didn’t like this; we’re an English-speaking American church. I’d even try to spiritualize it — it’s not good to sing songs as worship when we don’t know what we’re singing, I reasoned, and many (most?) of us don’t know what the Spanish means other than a few words here and there. Most of the time the translation was provided, but sometimes it was absent (or was on a different page and not visible with the Spanish).

I knew about Revelation 7:9, that God is in the business of redeeming a people from every language. I knew that Heaven would be a place of multilingual praise. But in the here and now, and in the practice of the church visible, I wanted it to be in my language.

Read more...

HTML is now really, really valid

Until tonight, our web site has been serving up XHTML content with the text/html MIME type. According to the XHTML spec, this is acceptable, but others strongly disagree. I find the arguments somewhat persuasive, particularly those related to the "HTML-compatible" XHTML profile not really being HTML-compatible.

So, tonight I finally crossed another task off my list and implemented the code to serve up appropriate content based on the HTTP Accept header sent by the browser. If the browser says it can accept application/xhtml+xml, then I send XHTML 1.0 Strict with the proper content type (even if the browser prefers HTML; this might be a mild HTTP violation, but I’m OK with that for now). If the browser does not claim to accept application/xhtml+xml, then I re-render the page as HTML 4.01 and send it as text/html. Since my server code passes data through the rendering pathway as XML trees, it was not too difficult to add a final post-processing step that converts the XML DOM to an HTML DOM using Nethtml and then renders that to the browser.
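The decision rule described above can be sketched in a few lines of OCaml. This is only an illustration using the standard library, not the server's actual code: the `markup` type and the `media_types`, `choose_markup`, and `content_type` functions are hypothetical names, and the sketch ignores `q` values entirely (so even `q=0` counts as "accepts"), matching the "even if the browser prefers HTML" behaviour.

```ocaml
(* Hypothetical names throughout; only stdlib String and List are used. *)
type markup = Xhtml | Html

(* Split an Accept header into lowercase media types,
   dropping parameters such as ";q=0.9". *)
let media_types accept =
  String.split_on_char ',' accept
  |> List.map (fun range ->
       match String.split_on_char ';' range with
       | name :: _ -> String.lowercase_ascii (String.trim name)
       | [] -> "")

(* Send XHTML only when the browser explicitly lists it;
   a wildcard such as */* does not count. *)
let choose_markup accept =
  if List.mem "application/xhtml+xml" (media_types accept)
  then Xhtml
  else Html

(* MIME type to put in the Content-Type response header. *)
let content_type = function
  | Xhtml -> "application/xhtml+xml"
  | Html -> "text/html"
```

For example, `choose_markup "text/html,application/xhtml+xml,*/*;q=0.8"` evaluates to `Xhtml`, while a header of just `text/html, */*` falls through to `Html`.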

Read more...

DVCS selection woes

I’ve been a convert to distributed version control, and in particular a user of Mercurial, for a few years now. I love Mercurial — its user interface is simple, the core concepts make sense, and it generally does an intelligent job of managing my programs (or config files, or homework assignments, or whatever).

I’ve tried to learn Git a few times, and even started converting to it some time last year, but I keep going back to Mercurial for its user-facing simplicity. Internally, Git is simpler, but the work of mapping use cases onto that simplicity is left entirely to the user. Mercurial’s UI is much more task-driven, such that there’s really one obvious way to do most useful things. I usually am able to keep my Mercurial repositories in somewhat sane states; Git’s failure mode seems to be leaving your branch refs in an incomprehensible state.

Read more...

Redmine

This week, I installed Redmine to provide ticket tracking and documentation services for our web site, computers, life in the apartment, and other things. So far, I’ve been rather pleased with it.

Redmine is an integrated project manager combining an issue/bug tracker, wiki, document manager, and source repository viewer aimed at managing software projects. It’s rather similar to (and probably heavily inspired by) Trac. I’ve long had an affinity for Trac, and in some ways like it better, but Trac does not support multiple projects. Redmine does. This is fairly crucial for us; it lets us have the web site, computing infrastructure, and apartment each as their own project with their own sets (and types) of trackers while allowing us to view all open tickets in a unified view. With stock Trac, we would need to check multiple instances to see all our open tasks.

Read more...

New blog engine rolling, comments re-enabled

At long (way too long) last, I’ve gone back to OCaml and completed enough of the re-re-re-re-rewrite that we can blog again. Comments now work too :).

If you want the gritty details, check out the colophon.

Read more...
