endafarrell.net

A Simple Chrome extension

I had an itch to be able to easily find the Facebook ID for Place pages that I was visiting. You can - usually - find the ID that the Facebook Graph API uses for a Place page by looking into the HTML source of the Facebook Place page...

PySpark and Jupyter Notebook

There's a lot of crap advice about getting jupyter notebooks to play nicely with pyspark. I guess things have changed a lot over the last couple of years, but here's how I have things. I use conda for my python envs, but I doubt that...

Scrubbing of (poor) data.

I have sensors - a great many - which report numbers daily. There's a long pipeline and many processes between these sensors and the ~daily reports I get, and sometimes weird spikes and dips happen in the numbers: often fixed the very...
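
The excerpt above is cut short, so the post's actual method isn't visible here. As a rough sketch only, one common way to flag that kind of spike or dip is to compare each reading with the median of its neighbours. The function name, the window size, the threshold and the sample numbers below are all made up for illustration.

```python
from statistics import median


def flag_outliers(values, window=5, threshold=3.0):
    """Return indices of readings that sit far from their local median.

    Hypothetical helper: `window` and `threshold` are illustrative defaults,
    not values taken from the original post.
    """
    flagged = []
    half = window // 2
    for i, value in enumerate(values):
        # Neighbours on either side of i, excluding the reading itself.
        neighbours = values[max(0, i - half):i] + values[i + 1:i + 1 + half]
        if not neighbours:
            continue
        local = median(neighbours)
        # Median absolute deviation of the neighbours; fall back to 1.0
        # so a perfectly flat neighbourhood does not divide by zero.
        spread = median(abs(n - local) for n in neighbours) or 1.0
        if abs(value - local) / spread > threshold:
            flagged.append(i)
    return flagged


# Invented daily readings with one spike (250) and one dip (12).
daily = [101, 102, 101, 250, 99, 103, 12, 101, 100]
print(flag_outliers(daily))  # [3, 6]
```

Using the median rather than the mean keeps a single wild reading from dragging the baseline towards itself, which matters when the spikes are large.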

Getting started with Pelican

Getting started with Pelican since I'd quite like to have somewhere that I can refer to. Why not use the automatic pages from GitHub? I don't know really - other than I'd quite like to do it myself and who knows how long GitHub pages...

Note to self: sharing the best photos

We take quite a few photos and I'm forever reinventing how I look after them. Here's where I am at the moment: Photos come from a few different cameras. Aperture (now 3.2.4) is what we use to manage them. All photos/videos are imported...

Gawking out

I've been processing log files recently to see how a live system is being used. When you have millions of hits daily, you need these processors to be fast. Today the best way is to have your log files shipped over onto a Hadoop cluster...

"grep -o" - really quite useful

One really quite useful command is “grep -o” - it allows you to fire off something like this: grep -o "Location supplier=\"\w*\"" locations.xml And the output will be the phrases matching the regular expressions that start with...
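
For comparison, here is a rough Python sketch of the same idea: print only the matching text, one match per line, rather than the whole line. The `grep_o` helper and the sample XML are invented for illustration; only the pattern comes from the command above.

```python
import re


def grep_o(pattern, text):
    """Return every non-overlapping match of `pattern`, like `grep -o`."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Invented stand-in for locations.xml.
sample = '''<Locations>
  <Location supplier="acme" id="1"/>
  <Location supplier="globex" id="2"/><Location supplier="acme" id="3"/>
</Locations>'''

for match in grep_o(r'Location supplier="\w*"', sample):
    print(match)
```

Note that the third line of the sample holds two matches and both are reported separately, which is the behaviour that makes `-o` handy for counting.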

Piping content through SSH

Thanks to http://www.contentwithstyle.co.uk/content/4-ssh-config-tips-for-faster-remote-working/ I can avoid creating files which need to be scp’d: I can pipe the content directly: local$ cat localfile.txt | ssh remote "cat - >>...

A Place Registry view of the world

Here’s another view of the Places in the Nokia Places Registry. Each pixel is the location of one or more points of interest that we have as of the end of June 2011. Under each pixel there may only be one place or there may be thousands...

Nokia Place Registry visualisation

Here’s a little visualisation of what I am up to at Nokia. You see a KML representation of the Points of Interest we currently have - the taller the tower the more active places we have in that spatial area.

Insulin regime

I am about to change some of my insulin settings, but before I do I thought I’d write them here. Below is a graph of how things are currently. As I am changing things you can deduce that the regime is not correct, but it isn’t too far …

Tools to play with

From listening to The Changelog for the first time today I should play with Yahoo’s YQL. It’s a server developer tool that allows you to query the web in a SQL-like format - and have the results come back in JSON. Where it gets really...

Runtime disabling of TestNG groups

We have recently started to use TestNG and I like it. One thing that is needed is runtime control over whether or not to run a group of tests. We have both unit tests and integration tests to assure ourselves of the system’s health....

Handy JSON-related links

I keep losing these links, so I thought I’d spend a minute and gather them here: http://jsonformatter.curiousconcept.com/ - really excellent; http://chris.photobooks.com/json/default.htm...

I'm quoted at a recent Google Tech Talk

It’s nice to be quoted - by J Chris Anderson at a Google tech talk last December. The talk is "CouchDB: relaxing offline javascript"; my contribution starts 16:30 in.

CouchDB compaction - big impacts

CouchDB needs to have its databases compacted regularly. It’s quite easy to do but the ease of doing so may lead you into thinking that it’s not worthy of serious consideration. You need to be aware of a few things. Here at the beeb we...
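
As a minimal sketch of the "easy to do" part: compaction is started with a POST to the database's `_compact` endpoint, sent with a JSON content type. The host, port and database name below are placeholders, and the helper only builds the request so it can be inspected without a running server. Depending on the CouchDB version and configuration, admin credentials may also be needed.

```python
import urllib.request


def compaction_request(base_url, database):
    """Build (but do not send) a POST to <base_url>/<database>/_compact.

    Hypothetical helper: `base_url` and `database` are supplied by the caller.
    """
    url = f"{base_url.rstrip('/')}/{database}/_compact"
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder server and database name.
req = compaction_request("http://localhost:5984", "example_db")
print(req.get_method(), req.full_url)

# To actually trigger compaction against a live server:
# urllib.request.urlopen(req)
```

The call returns as soon as compaction has been accepted, so the work itself carries on in the background on the server.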

Problems with replication maps

For a long set of reasons that I must sometime write about, I have a set of CouchDB databases which replicate with each other. Each database replicates with two others: one in the same datacentre, one in the other datacentre (we’re only...

CouchDB 0.9x - 1st read from v large views serially

On a server, we run 4 different CouchDB nodes, each with 30 or so databases. We can therefore have over 100 databases - and if you’re reading from large views - or views over large databases - you will need to do so serially. We have 4...

Uncertain overnight behaviour

It's getting to be a wonder what's happening overnight. Here are some data points to ponder: 27th: 6.5@3:22, 3.6@7:02: is this down to one glass of wine last night? 26th: 19.6@3:35: was that me overdoing the recovery from a 4.7 …

Accu-Chek Combo

The Accu-Chek Combo is an insulin pump that has the potential to make really important changes to how I manage my diabetes and therefore my life. It'll allow me to change how much insulin I get when and so better match my body's needs....

Colloquy demystifying JIRA references

The BBC’s Forge engineering team uses an IRC channel to hold meetings. It allows our team to not bother about exactly where everyone is - some folks work from home, people are (mostly) in the office, but can be in different parts of our...

Top 25 Most Dangerous Programming Errors

The 2009 CWE/SANS Top 25 Most Dangerous Programming Errors is a list of the most significant programming errors that can lead to serious software vulnerabilities. They occur frequently, are often easy to find, and easy to exploit. They...

Running stunnel at startup

You might want your stunnels to be running all the time - and to start automatically when you log in. Here’s how: get your stunnel working. You’ll need to fix your certs, choose the correct ports, and all that yourself. write a script...