Informatics | Amethyst Informatics

I was recently asked, “Are you a Data Scientist?” My answer: “Yes! I am an Informatician, which seems to be the same thing”. A confused reply followed: “an inform-a-what?”

This got me searching the web and checking job role definitions. The overlap between the two is huge and the overall goal of both is identical – turning data into knowledge.

So what should I be calling myself? Apparently Data Scientist is the “sexiest job of the 21st century”. Does this mean I should put my Informatics coat at the back of the wardrobe and wear the trendier Data Scientist designer label? To answer this question let’s do the “Data Science/Informatics” thing and take a peek at some data (frequency of internet searches, source: Google Trends).

Representing term trendiness with the frequency of Google internet searches suggests that:

Currently “Data Science” and “Informatics” are equally cool.
“Data Science” trendiness has been on a slight increase for the last couple of years. If this continues then “Informatics” is at risk of being out-trended by this time next year.
“Big Data” took off around late 2011 with a rapid rise over the last 3 years, making it the current chart topper. As data volumes increase, will the term “Big Data” become too small and be replaced?
Could the increase in “Data Science” popularity be due to the “Big Data” era? At a glance the recent rise of the “Data Scientist” has occurred inside of the “Big Data” mountain, but this does not confirm any causal effects.
A decade ago “Informatics” was as sexy as “Big Data” is now.

There is an even newer phrase on the block, the Data Artist, an expert in visualising data. One thing is very clear. Whatever labels we choose to use, we all have a common goal.

Since I am a chemist who extracts knowledge from data, I am going to stick to calling myself a Data Scientist… the “sexiest job of the 21st century”. Off to have a quick coffee before cracking on with an informatics, data artistry and data mining analysis for my next client. Now, should I have a Mocha, Latte or Cappuccino, with or without sprinkles? Hmmm…

Now that spring is in the air, with the daffodils fully out and the pink blossom starting to look picturesque, let’s turn our thoughts to spring-cleaning and what tips we can apply, not to our house tidy-up, but to our data processes. Like the contents of that cupboard under the stairs, taking a fresh look at what redundant clutter we are holding onto is beneficial. I remember a chemistry teacher drawing an analogy between entropy and a teenager’s bedroom. The room naturally tends towards maximum disorder unless we put some energy in and tidy it up. Our data repositories and processes are the same, since we need to put effort in to keep the level of data chaos to a minimum. Also note that however much continuous effort we put in, it is always worth periodically taking a fresh look. The business world is fast moving and dynamic, with unexpected changes in company targets. Even if these changes are only small they can build up over time, and it pays to check that your informatics strategies remain aligned with your business needs. So it’s time to get your Marigolds and technological dusters out and have a fresh look at your current workflows to see what improvements might be possible. Five areas to get you started are given below.

1) Check your dictionaries: With careful ongoing maintenance this task will be less daunting, but it is still easy for redundancies and duplications to slip in (especially when combining dictionaries from different sources, such as across sites or from company mergers). Data chaos is guaranteed if there are multiple representations for the same term. Clear business rules are needed and should be agreed across teams.
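
As a rough illustration of hunting for multiple representations of the same term, here is a minimal Python sketch. The normalisation rule (ignore case, spacing and punctuation) and the example terms are assumptions made up for this post, not an agreed business rule; your own teams would need to decide what counts as "the same term".

```python
import re
import unicodedata
from collections import defaultdict


def normalise(term: str) -> str:
    """Reduce a term to a comparison key (a hypothetical business rule)."""
    # Fold Unicode variants and case, then drop spacing and punctuation.
    folded = unicodedata.normalize("NFKC", term).casefold()
    return re.sub(r"[\W_]+", "", folded)


def find_duplicates(terms):
    """Group terms that share a normalised key and return only the clashes."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalise(term)].append(term)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up entries, as might result from merging two site dictionaries.
merged = ["IC50", "ic-50", "IC 50", "Solubility", "solubility ", "LogP"]
duplicates = find_duplicates(merged)
for key, names in duplicates.items():
    print(key, "->", names)
```

A report like this only flags candidates. Deciding which spelling becomes the preferred term is still a job for people.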


2) Audit your capture and reporting workflows: Are the most appropriate reports being generated or have the business-critical questions changed, rendering reports outdated? Review the level of context being captured around results. Check if numbers are being rounded at the correct time. It is all very well reporting results to 3 significant figures, but rounding numbers prior to storage can lead to a huge loss in precision in downstream calculations.
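
To make the rounding point concrete, here is a small Python sketch that calculates the same quantity two ways: from full-precision values (rounding only when reporting) and from values rounded to 3 significant figures before storage. The measurements and the `round_sig` helper are invented for illustration.

```python
from math import floor, log10


def round_sig(value: float, figures: int = 3) -> float:
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    return round(value, figures - 1 - floor(log10(abs(value))))


# Made-up replicate measurements, used only for this sketch.
raw = [1.2345, 1.2349, 1.2351, 1.2346]

# Option A: store full precision, round only when reporting.
range_late = round_sig(max(raw) - min(raw))

# Option B: round before storage, then calculate from the stored values.
stored = [round_sig(x) for x in raw]
range_early = round_sig(max(stored) - min(stored))

print("rounded at reporting time:", range_late)
print("rounded before storage:   ", range_early)
```

With these numbers the spread between replicates comes out at about 0.0006 when rounding is left until the end, but at 0.01 when the values were rounded first, which is more than ten times larger and purely an artefact of when the rounding happened.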

3) Optimise your queries: As your repositories grow, are your data retrieval queries still running efficiently? Perhaps your SQL queries could do with some fine tuning or maybe your Warehouse could do with some restructuring. Two useful books are: ‘Oracle SQL Tuning’ by M. Gurry and ‘Building the Data Warehouse’ by W. Inmon.
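
One quick way to check whether a query is still efficient is to ask the database for its query plan. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in (an Oracle system would use its own tooling), and the `results` table, its columns and the index name are all hypothetical.

```python
import sqlite3

# In-memory database with a made-up results table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, compound TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO results (compound, value) VALUES (?, ?)",
    [(f"CMPD-{i % 500}", i * 0.1) for i in range(10_000)],
)

query = "SELECT value FROM results WHERE compound = ?"


def plan(sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Without an index SQLite has to scan the whole table.
before = plan(query, ("CMPD-42",))

# After indexing the filtered column it can search the index instead.
conn.execute("CREATE INDEX idx_results_compound ON results (compound)")
after = plan(query, ("CMPD-42",))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of the table and the second a search using the new index. An index is not free, since it slows down inserts and takes up space, so it is worth reserving them for the columns your common queries actually filter on.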

4) Work with colleagues to review current processes: Get out there and talk to people from different groups, taking a real interest in their everyday workflows and identifying the slow, mundane steps that they have to repeatedly carry out. Then assess the impact to prioritise tasks, remembering that sometimes perceived impact can be quite different from actual impact.

5) Stay up to date with current technologies: Attending conferences and reading literature is time well spent if it means identifying a new technology that improves processes. For example, check out the O’Reilly Radar blog or attend the Science and Information Conference.

Like a backlog of household chores, if all of the above seems overwhelming then why not get in an extra pair of helping hands, such as Amethyst, to help you sort through the mountain of clutter and prioritise your clear-up strategies? Sometimes all it takes is a fresh pair of eyes to ask the questions that need to be answered in order to polish up your processes. After applying the above techniques you will spend less time on manual, error-prone steps and have more efficient processes and better quality data, maximising your chances of making successful business-critical decisions.


Stolen Identity of an Informatician

Spring-cleaning your data processes

Crystallising your data