Metamojis: metadata with personality
Why metamojis?
Data flags are commonly used throughout the earth sciences to communicate data quality and provide other helpful information to users.
SeaDataNet, for instance, uses numeric codes to convey information about data quality:
0 – No QC
1 – Good value
2 – Probably good
4 – Bad value
6 – Below Detection
Similarly, Andrews LTER uses simple letter-based codes:
A – Accepted
E – Estimated
M – Missing
Q – Questionable
But what if we could make things a little more exciting? What if instead of using numbers and letters, we used … emojis ⁉
While it may seem silly at first, emojis provide a number of attractive properties.
Self-describing:
Unlike letter- and number-based codes, emojis are immediately interpretable. Even without a data dictionary, users can get the gist of what an emoji-based flag means. A user who sees a ✅ flag, for instance, might guess that it means good data quality, whereas ❌ means poor data quality, and so on.
Unicode:
Each emoji is defined in the Unicode standard as a code point (or, for some emojis, a short sequence of code points). This means that emojis can be used in any context where Unicode is supported (basically everywhere on the internet). Moreover, this makes emojis easy to search, parse, or index.
Lightweight:
Emojis can describe relatively complex concepts using a single character. A UTF-8 character takes 1 to 4 bytes, and a single-code-point emoji typically takes 3 or 4, so an emoji is significantly more efficient than a full textual description. For instance, replacing “below detection” (15 bytes) with a single 4-byte emoji would cut the number of bytes required to roughly a quarter.
Accessible:
Let's face it, using emojis makes the task of flagging and annotating data a lot more fun! And at the risk of sounding old, it could also help to get a new generation of earth scientists excited about metadata.
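To make the Unicode and lightweight properties a bit more concrete, here is a minimal Python sketch. The `records` list and the helper names `utf8_size` and `code_point` are made up for illustration; the flags reuse the ✅ and ❌ emojis that appear in the registry below.

```python
# Hypothetical flagged records as (value, flag) pairs.
# ✅ = good, ❌ = bad, as in the objective data quality flags.
records = [(12.1, "✅"), (11.8, "✅"), (-999.0, "❌"), (12.4, "✅")]


def utf8_size(text: str) -> int:
    """Return the number of bytes needed to store `text` as UTF-8."""
    return len(text.encode("utf-8"))


def code_point(char: str) -> str:
    """Format a single character as a U+XXXX code point label."""
    return f"U+{ord(char):04X}"


# Searching by flag is an ordinary string comparison.
bad_values = [value for value, flag in records if flag == "❌"]

print(code_point("✅"), utf8_size("✅"))  # U+2705 3
print(utf8_size("below detection"))      # 15
print(bad_values)                        # [-999.0]
```

Note that ✅ happens to take 3 bytes in UTF-8. Many other emojis take 4, and emojis built from several code points take more.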
Registry of metamojis
A list of some proposed metamojis is included below. To see the full registry, or to propose changes or additions, see the GitHub repo.
Data quality flags
Subjective data quality assessment
Quality assessment based on user’s judgment (e.g. manual inspection)
Description | Emoji |
---|---|
Good | |
Probably good | |
Questionable | |
Bad | |
Objective data quality assessment
Quality assessment determined by objective criteria (e.g. automated scripts)
Description | Emoji |
---|---|
Good | ✅ |
Probably good | |
Questionable | ❓ |
Bad | ❌ |
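As a rough illustration of what an automated script might do, the sketch below assigns the ✅, ❓ and ❌ flags above using a simple range check. The function name `range_check` and the water temperature limits are hypothetical, chosen only for this example; a real QC procedure would use criteria suited to the sensor and site.

```python
from typing import List, Tuple

GOOD, QUESTIONABLE, BAD = "✅", "❓", "❌"


def range_check(
    value: float,
    valid: Tuple[float, float],
    expected: Tuple[float, float],
) -> str:
    """Flag a value using two nested ranges.

    `valid` is the physically plausible range (outside it: bad).
    `expected` is the narrower typical range (outside it: questionable).
    """
    if not (valid[0] <= value <= valid[1]):
        return BAD
    if not (expected[0] <= value <= expected[1]):
        return QUESTIONABLE
    return GOOD


# Assumed limits for a water temperature sensor, in degrees Celsius.
VALID_RANGE = (-2.0, 40.0)
EXPECTED_RANGE = (0.0, 30.0)

readings: List[float] = [12.5, 35.0, 80.0]
flags = [range_check(v, VALID_RANGE, EXPECTED_RANGE) for v in readings]
print(flags)  # ['✅', '❓', '❌']
```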
Data quality indicators
Metadata indicators related to data quality
Description | Emoji |
---|---|
Raw Data | |
Processed Data | |
No Quality Control | |
Missing Data | |
Duplicate Data | |
Sensitive Data | |
Below Detection | |
Above Detection | |
Outlier | |
Spike | |
Sensor drift | |
Frozen value | |
Impossible value | |
Transformative processes
Metadata related to processes that change the data
Description | Emoji |
---|---|
Manually edited | ✍ |
Rolling filter (convolution) | ✳ |
Interpolated | 〽 |
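Because each flag is a single character, a string of flags can be decoded with a plain dictionary lookup. The sketch below is one possible approach, assuming that several flags are stored side by side in one string (this post does not prescribe a storage format). The names `PROCESS_FLAGS` and `decode_flags` are hypothetical. Some systems append an invisible variation selector to symbols such as ✍, so the sketch skips those characters.

```python
from typing import List

# Lookup table built from the transformative processes table above.
PROCESS_FLAGS = {
    "\u270d": "Manually edited",               # ✍
    "\u2733": "Rolling filter (convolution)",  # ✳
    "\u303d": "Interpolated",                  # 〽
}

# Variation selectors only change how a symbol is rendered (text vs. emoji).
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}


def decode_flags(flag_string: str) -> List[str]:
    """Translate each flag character into its description."""
    descriptions = []
    for char in flag_string:
        if char in VARIATION_SELECTORS:
            continue  # rendering hint, not a flag
        fallback = f"Unknown flag U+{ord(char):04X}"
        descriptions.append(PROCESS_FLAGS.get(char, fallback))
    return descriptions


# A value that was manually edited and then interpolated.
print(decode_flags("\u270d\ufe0f\u303d"))
# ['Manually edited', 'Interpolated']
```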
Non-transformative processes
Metadata related to processes that do not change the data
Description | Emoji |
---|---|
Manually verified | |
Manually rejected | |
Manually flagged | |
Pending actions
Actions to be completed in the future
Description | Emoji |
---|---|
To be categorized | |
To be archived | |
To be deleted | |
To be inspected | |
To be edited | |
Modifiers
Flags to qualify or expand upon other flags
Description | Emoji |
---|---|
See annotation | |
Uncertainty about classification | |
Data descriptors
Machine data types
Data type (for interpretation by computer programs, etc.)
Description | Emoji |
---|---|
Integer | |
Floating point | |
String (ASCII) | |
String (Unicode) | |
Boolean | |
Observable properties ⚖
SI Base properties
Properties described by SI base units
Description | Emoji |
---|---|
Length | |
Mass | |
Time | ⏳ |
Temperature | |
Electric current | ⚡ |
Luminous intensity | |
Amount of substance | ⚗ |