Published on May 23rd, 2013 | by Jo0
Going nuts with the Canberra distance
By Arwen Cross. Originally published on the CSIRO News Blog
Love it or hate it, Canberra is our capital city. But it isn’t the only thing named Canberra. Hamish Boland-Rudder has found five other things with the same name, which he described in an article in the Canberra Times a few weeks ago. It looks like most of them post-date our 100-year-old capital and were named after it.
Canberra is an indigenous placename, but sadly we don’t know what it means or exactly what location the name originally referred to. Some popular suggestions about the meaning of Canberra are ‘meeting place’ or, more racily, ‘the space between a woman’s breasts’, referring to the shapes of Mt Ainslie and Black Mountain. Linguist Harold Koch’s research shows that neither of these explanations is very believable (see page 153, Chapter 5 of Naming and re-naming the Australian landscape). But it’s a rare honour for an Australian capital city to have an indigenous name.
The Canberra distance
One of the things named after the city of Canberra is the Canberra distance, a mathematical function that measures how different two objects are, so that they can be sorted according to their similarity.
The Canberra distance was invented by CSIRO scientists Bill Williams and Godfrey Lance in the 1960s. They based it on the Manhattan distance, and that measure’s city name may have inspired them to name theirs after their own city, Canberra.
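For two objects each described by n measurements, p = (p_1, ..., p_n) and q = (q_1, ..., q_n), the two distances can be written side by side. The Canberra distance takes each term of the Manhattan distance and divides it by the combined size of the two values being compared; by the usual convention, a term where both values are zero counts as zero:

```latex
d_{\mathrm{Manhattan}}(p, q) = \sum_{i=1}^{n} |p_i - q_i|
\qquad
d_{\mathrm{Canberra}}(p, q) = \sum_{i=1}^{n} \frac{|p_i - q_i|}{|p_i| + |q_i|}
```

So a difference of 1 between values of 2 and 3 contributes 1/5 = 0.2, while the same difference between 100 and 101 contributes only 1/201, about 0.005. Because |p_i - q_i| can never exceed |p_i| + |q_i|, each term lies between 0 and 1, so no single characteristic can dominate the total.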
The Canberra distance isn’t a distance in the everyday sense like how far away ANU is from the nearest pub. Mathematicians recognise lots of different types of distances, for example:
- Euclidean distance – the straight-line distance between two points. This is the closest to our everyday idea of distance.
- Levenshtein distance – the distance between two words measured by how many single-character edits are needed to change one into the other.
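- Canberra distance – a measure of how dissimilar two objects are, found by adding up the differences in each of their characteristics after scaling each difference by the size of the values being compared.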
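To make the three kinds of distance concrete, here is a minimal Python sketch of each. The function names are our own, chosen for illustration, and the Canberra version follows the common convention of treating a term as zero when both values are zero.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def levenshtein(s, t):
    """Fewest single-character insertions, deletions or substitutions needed
    to turn s into t (dynamic programming, keeping one row at a time)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete cs
                            curr[j - 1] + 1,           # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def canberra(p, q):
    """Sum over coordinates of |p_i - q_i| / (|p_i| + |q_i|); a term where
    both values are zero is taken as zero."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


print(euclidean((0, 0), (3, 4)))               # 5.0
print(levenshtein("kitten", "sitting"))        # 3
print(round(canberra((2, 100), (3, 101)), 3))  # 0.205
```

In the last line the first coordinate contributes 1/5 and the second only 1/201, even though both differ by exactly 1.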
So what can you use the Canberra distance for?
It’s often used to sort plants and animals into groups that are more closely or more distantly related to each other, although it can be used outside biology too.
Let’s say you want to separate the sheep from the goats in your large herd. You might need to consider several criteria to make your decision:
- Binary data – has a beard / doesn’t have a beard
- Ordered categorical data – hair very woolly / hair moderately woolly / hair not woolly
- Quantitative data – a measurement like weight in kilograms or height in centimetres
The Canberra distance is a way to use all these criteria together to separate individuals according to how similar or dissimilar they are. In our case, we’ll separate the herd according to how sheepy or goaty they are.
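Here is one way the sheep-and-goat criteria might be turned into numbers and compared. The coding (beard as 1 or 0, woolliness as 3, 2 or 1, weight in kilograms) and the three animals are hypothetical, made up for illustration; a different coding would give different distances.

```python
# Hypothetical coding of the three kinds of criteria as numbers:
#   beard:      1 = has a beard, 0 = no beard             (binary)
#   woolliness: 3 = very, 2 = moderately, 1 = not woolly  (ordered categories)
#   weight_kg:  measured weight in kilograms              (quantitative)


def canberra(p, q):
    """Sum of |p_i - q_i| / (|p_i| + |q_i|), treating 0/0 terms as 0."""
    if len(p) != len(q):
        raise ValueError("both animals need the same number of criteria")
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


sheepy = (0, 3, 70.0)  # no beard, very woolly, 70 kg
goaty = (1, 1, 60.0)   # beard, not woolly, 60 kg
ewe = (0, 3, 65.0)     # no beard, very woolly, 65 kg

print(round(canberra(sheepy, goaty), 3))  # 1.577
print(round(canberra(sheepy, ewe), 3))    # 0.037
```

Because each term is a ratio, rescaling one criterion (say, recording weight in grams instead of kilograms) leaves its contribution unchanged, and no criterion can add more than 1 to the total. That is what lets yes/no answers, categories and measurements sit in the same sum.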
If you’ve got a large herd, you’d start by measuring all the criteria for each animal. Then you’d need some statistics: the Canberra distance tells you how different each pair of animals is, and a clustering method uses those distances to sort the data into groups of animals that are similar across several characteristics.
You might find that your herd was made up of three groups of animals: a large cluster of animals with sheepy characteristics, a small cluster with goaty characteristics, and another cluster of animals that is part sheep and part goat.
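The clustering step can be sketched too. The text doesn’t say which clustering method would follow the distance calculation, so this assumes simple average-linkage agglomerative clustering: start with every animal on its own, then repeatedly merge the two groups whose members have the smallest mean Canberra distance. The coding (beard 1 or 0, woolliness 3, 2 or 1, weight in kilograms) and the seven-animal herd are invented for illustration.

```python
from itertools import combinations


def canberra(p, q):
    """Sum of |p_i - q_i| / (|p_i| + |q_i|), skipping terms where both are 0."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(p, q) if a or b)


def average_linkage(points, n_groups):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Canberra distance until n_groups remain.
    Returns a list of clusters, each a list of indices into points."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_groups:
        best = None  # (mean distance, first cluster index, second cluster index)
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [canberra(points[i], points[j])
                     for i in clusters[a] for j in clusters[b]]
            mean = sum(pairs) / len(pairs)
            if best is None or mean < best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # a < b, so deleting b leaves a in place
        del clusters[b]
    return clusters


# Hypothetical herd: (name, (beard, woolliness, weight_kg))
herd = [
    ("A", (0, 3, 70.0)),
    ("B", (0, 3, 74.0)),
    ("C", (0, 3, 66.0)),
    ("D", (1, 1, 55.0)),
    ("E", (1, 1, 58.0)),
    ("F", (1, 2, 63.0)),
    ("G", (1, 2, 65.0)),
]

groups = average_linkage([features for _, features in herd], n_groups=3)
for group in groups:
    print([herd[i][0] for i in group])
# ['A', 'B', 'C']
# ['D', 'E']
# ['F', 'G']
```

With this made-up herd the sketch recovers three groups: the woolly, beardless animals A to C, the bearded, non-woolly D and E, and the in-between F and G. It recomputes distances on every pass, which is fine for a handful of animals but slow for a real herd; for real data you would more likely use a library, for example SciPy’s `scipy.spatial.distance.pdist` with `metric='canberra'` and `scipy.cluster.hierarchy.linkage` with `method='average'`.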
Going nuts with peanut research
An actual example of the use of the Canberra distance in biological research comes from peanut breeding. Our chief mathematical scientist Bronwyn Harch worked on this project for her PhD. Instead of separating sheep from goats, she was organising peanut plants into groups.
Peanuts are an important agricultural crop, and plant breeders are always working on new varieties that grow better in particular climates, are more resistant to disease, or produce better-quality nuts.
To do this they keep collections of seeds from many different peanut varieties. Some wild peanuts might be very resistant to disease but have a poor yield of nuts, while commercial varieties might be disease-sensitive but high-yielding. Peanut breeders cross varieties to generate new ones and select the offspring with desirable characteristics.
Bronwyn compared the characteristics of peanuts in the Australian peanut collection with those in the world peanut collection in India. Her analysis helped her advise Australian peanut breeders about which characteristics were missing from their collection but could be found in the world collection. Her work also told scientists heading into the Amazon rainforest to collect wild peanut plants which characteristics to look out for.
The Canberra distance is a useful mathematical tool. But now I’m going to put my feet up and enjoy some of Hamish Boland-Rudder’s other Canberra discoveries – perhaps a glass of Canberra ale.
Arwen Cross would like to thank David Nash and Bob Anderssen for their advice on this article.