2 February 2021 
#407: quantum of sollazzo – The data newsletter by @puntofisso



 
I opened last week's newsletter with my blabber about the importance of data definitions, and I was reminded of my other little obsession: the pervasiveness of uncertainty in data. In an apt example of the Baader–Meinhof phenomenon, data journalism guru Donata Columbro tweeted about an article by Alice Corona in DataNinja magazine, showing different strategies for dealing with uncertainty in data visualization. It's originally in Italian, but the automatically translated English version is OK.
 

·
 
The Institute for Government has published the 2021 edition of Whitehall Monitor, its data-driven analysis of the workings of the UK Civil Service. This year's monitor includes a look at the controversial topic of contract-awarding.
 
·
I love this semi-serious campaign aiming to stop people from just saying "hello" via chat or message. I mean, by all means say hello but ALSO say what you want! It's not rude (which is what most perpetrators think...)
 
·
I loved this talk about making disagreement at work more productive, presented by Claire Knight at You Got This From Your Couch.
·
Tim Harford's talk about slow-motion multitasking was an eye opener about my multitude of hobbies that I never master (TL;DR: I'm not going to turn into the next Einstein, sadly, but at least now I know this way of engaging with passions is a common variant).
·
For the last week, we again have some sponsored geotastic content: Ed Freyfogle, organiser of the location-based service meetup Geomob, co-host of the Geomob podcast, and co-founder of the OpenCage Geocoder, has offered to introduce a series of points around the topic of geodata. His final entry, on geocoding at scale, is below.

Till next week,
––Giuseppe @puntofisso
 

 

 
--- Sponsored content by OpenCage ---

Geocoding at scale

In the final installment of our series about using open data for geocoding, we contemplate the challenges of geocoding at scale. What are the issues you face when you have many hundreds of thousands or even millions of coordinates or addresses to work on daily? At OpenCage we serve numerous customers in this category, and a common question that comes up is whether an API-based solution can handle that type of scale.

An API-based solution, managed by experts, is almost always the most reliable and most affordable way to develop such an ongoing system, as otherwise you will soon be spending a lot of valuable developer time making sure your geodata stays current. As anyone who has worked with software can confirm: “Building is easy, maintaining is hard”.

Nevertheless, there are challenges that come with depending on any external service, one of course being network availability. At OpenCage we have multiple, fully-redundant data centers, and the availability of our service is independently and publicly monitored by a third party (current and past operational status can be seen at status.opencagedata.com).

Still, even with a highly-available service, some customers worry about the “cost” of crossing the internet to an external service. The fastest API query is the one you don’t even make; a smart caching strategy can go a long way to reducing usage. Because our geocoding API is built on open data you can cache the results as long as you like, and we’ve published a few tips and points to consider.
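As a rough illustration of the caching idea, here is a minimal Python sketch. It is not OpenCage's recommended implementation: the class name CachedGeocoder, the normalise helper and the geocode_fn callable are all hypothetical, and geocode_fn stands in for whatever client call you actually use to reach the API. The sketch simply stores each result in a local SQLite table and only calls out when a normalised query has not been seen before.

```python
import sqlite3
from typing import Callable, Tuple


def normalise(query: str) -> str:
    """Lower-case and collapse whitespace so near-identical queries share one cache entry."""
    return " ".join(query.lower().split())


class CachedGeocoder:
    """Hypothetical wrapper: checks a local SQLite cache before calling the geocoder."""

    def __init__(self, geocode_fn: Callable[[str], Tuple[float, float]],
                 db_path: str = ":memory:") -> None:
        # geocode_fn is an assumption: any function mapping a query to (lat, lng).
        self.geocode_fn = geocode_fn
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(query TEXT PRIMARY KEY, lat REAL, lng REAL)"
        )
        self.api_calls = 0  # counts how often we actually left the machine

    def lookup(self, query: str) -> Tuple[float, float]:
        key = normalise(query)
        row = self.db.execute(
            "SELECT lat, lng FROM cache WHERE query = ?", (key,)
        ).fetchone()
        if row is not None:
            return row  # cache hit: no network request made
        self.api_calls += 1
        lat, lng = self.geocode_fn(query)
        self.db.execute("INSERT INTO cache VALUES (?, ?, ?)", (key, lat, lng))
        self.db.commit()
        return (lat, lng)
```

Passing a file path instead of ":memory:" keeps the cache between runs. How aggressively to normalise queries is a judgment call: too little and you miss hits, too much and distinct addresses may collide.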

We hope you’ve enjoyed our series on the issues around geocoding with open data. While we’ve used our service as the example, we believe many of the concepts and considerations will apply regardless of the data processing tools and services you are building on. If you have questions regarding anything we discussed, please get in touch.

If you have any geocoding needs please give the OpenCage Geocoder a try.

 
Politics

‘Virus’, ‘Riotous’, ‘Folks’: The historic words in Biden’s inauguration speech
I love a good linguistic analysis and the Washington Post never disappoints. Here are a few thoughts on how the words used by Joe Biden compare with those of his predecessors.
It reminds me of this, by the way, which stopped updating in 2015.
 

Women of Color Were Shut Out of Congress For Decades. Now They're Transforming It.
FiveThirtyEight captures the historic moment for women of colour in US politics as the 2020 election is hailed as a moment of many "firsts".
 
Data thinking

The Top 5 Data Trends for CDOs to Watch Out for in 2021
"Modern metadata solutions, data quality frameworks, infrastructure, job roles, and other big changes are on their way."
A brilliant write-up by Prukalpa Sankar of Atlan.

Data Visualization as Grief
"Since the start of the pandemic, data visualization has taken center stage in the effort to educate the public. Data has been used as a means of warning, informing, and educating. To be sure, this is important work; but in reporting the pandemic data, we also need to reinforce the humanity of the data."
There are people's lives behind that data viz. Pretty tough article.

Tools

Draw your own geography
Ahmad Barclay has just announced extra features for this great tool he developed: 
"1. Generate a (basic) population profile
2. Import/export lookups + boundaries"

 
Parts of speech detector
Enter a complete sentence, preferably with correct grammar and orthography, and you get the sentence coloured by part of speech, based on the Stanford University Part-Of-Speech-Tagger.
 

Datasette 0.54
Simon Willison's Datasette has progressed to version 0.54, with a number of foundational new features.
In case you hadn't heard of Datasette, it's a journalist-friendly tool to explore and publish data – basically a one-file database system based on SQLite that creates an API for "small data".
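To make the "one-file database" idea concrete, here is a small sketch using only Python's standard sqlite3 module. The function name build_database, the readings table and its columns are made up for illustration; the point is just that the whole dataset ends up in a single SQLite file.

```python
import sqlite3
from typing import Iterable, Tuple


def build_database(path: str, rows: Iterable[Tuple[str, str, float]]) -> int:
    """Write rows into a single SQLite file and return the resulting row count."""
    conn = sqlite3.connect(path)
    # Hypothetical example table: one measurement per station per day.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (station TEXT, day TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    conn.close()
    return count
```

Assuming Datasette is installed, calling build_database("readings.db", rows) and then running "datasette readings.db" should serve the file in a browser, with the table also available as JSON at /readings/readings.json.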
"The Datasette Library issue has been open for nearly two years now. It’s a need I identified at the NICAR 2019 data journalism conference, where it became apparent that many newsrooms are sat on an enormous pile of data that they have collected but without any central place to keep it all."

Tutorials

OneHack Academy
Vonage's hacker supremo Kevin Lewis has released this "complete beginner-level learn to code course. Just under 6 hours learn a little about a bunch of topics: HTML, CSS, JS, Node.js, storing data and three Vonage API, including video".

Geospatial Workflows to Estimate a City’s Population
Analytics and storytelling company Gramener shows how to use geospatial data and AI in order to make population estimates at a 100x100 metre grid level. This tutorial shows how to use building footprints as a proxy for human settlements as well as gridded population data.
 

Dataviz

TheLibraryMap
"TheLibraryMap is a map with more than 100,000 books located based on their relevance and similarity. Color is also applied based on the genre and topics of each book."
 


Oreos and the Art of Crossword Puzzle Construction
Russell Goldenberg of The Pudding runs this quirky "investigation into 2020’s most notorious crossword puzzle clue, told at three levels of complexity."
And yes, there are actual downloadable databases of crossword clues...
 

Yesterday, Today, Tomorrow
"A data visualization experiment to trace the emotional waves of the pandemic."

Feeling artistic?

Andala
This Mandala generator is somehow really soothing.
 

 
Support this newsletter & spread the word

Become a GitHub Sponsor :) It costs about a coffee per month, and you'll get an Open Data Rottweiler sticker (and other stuff). 

If you're a supporter of this newsletter, thanks a lot for your support. Share this e-mail with a friend, or via social media.


    


"In other news" is supported by ProofRed, who offer an excellent proofreading service. If you need high-quality copy editing or proofreading, head to www.proofred.co.uk. Oh, they also make really good explainer videos.
Supported by my GitHub Sponsors 
Steve Parks
Naomi Penfold
Chris Weston
Fay Simcock
Chris Noden
Jeff Wilson
& others


Copyright © 2021 Puntofisso, All rights reserved.


