Seemingly unobtrusive digital bytes known as metadata have been vaulted into the tech-media limelight. What is metadata, and why is it suddenly so interesting to so many?
I must confess: I normally pay little attention to metadata. But when the term is plastered all over the tech media, I begin to notice. So, apparently, does President Obama, who referred to metadata during a recent press conference (Wall Street Journal):
[W]hat the intelligence community is doing is looking at phone numbers and durations of calls. They are not looking at people’s names, and they’re not looking at content. But by sifting through this so-called metadata, they may identify potential leads with respect to folks who might engage in terrorism.
President Obama mentioned only phone numbers and call durations. The tech media, however, points out that email and social-networking services generate metadata as well. I think it’s time to dissect metadata and see why it’s all over the news.
What is metadata?
Every dictionary I checked defines metadata as data about data. I’m sorry, but that’s not very helpful. Things got clearer when I read Understanding Metadata, a white paper by the National Information Standards Organization (NISO):
“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”
The paper then divides metadata into three categories:
- Descriptive metadata: Describes a resource so it can be discovered and identified, using elements such as title, abstract, author, and keywords.
- Structural metadata: A way to define how objects are put together, for example, how pages are ordered to form chapters.
- Administrative metadata: Information to help manage a resource, such as when and how it was created, file types, and who has access.
Simply put, metadata summarizes information about data for the purpose of making that data easy to find and work with. Knowing that, the next step is to learn what kinds of metadata “organizations” find so interesting.
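To make the administrative category concrete, here is a minimal Python sketch that reports what the operating system records about a file without ever opening it to read the text inside. The function name `administrative_metadata` and the sample file `report.txt` are my own hypothetical examples, and the file type is only guessed from the file extension.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def administrative_metadata(path: str) -> dict:
    """Return administrative metadata about a file without reading its contents."""
    info = os.stat(path)  # size, timestamps, and permission bits kept by the OS
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the extension only
    return {
        "size_bytes": info.st_size,
        # Last-modified time (a true creation time is not portable across systems).
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "file_type": mime_type,
        # "Who has access": the Unix-style permission bits.
        "permissions": oct(info.st_mode & 0o777),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.txt")  # hypothetical sample file
        with open(path, "w", encoding="utf-8") as f:
            f.write("The content itself.")
        print(administrative_metadata(path))
```

Notice that nothing in the result depends on what the file says; the size, timestamp, type, and access bits are all data about the data.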
The Guardian has been at the epicenter of the NSA surveillance controversy from the start and is in no small part responsible for turning metadata into a household term. To its credit, The Guardian created a website to help readers unsure about metadata determine whether their Internet travels leave a trail of metadata crumbs. Below, I’ve listed some of the more interesting crumbs The Guardian mentions.
Metadata associated with emails:
- Sender’s name, email, and IP address
- Recipient’s name and email address
- Date, time, and time zone
- Unique identifier of email and related emails
- Mail client login records with IP address
- Mail client header formats
- Subject of email
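Most of these email crumbs travel in a message’s headers, which a mail system can read without touching the body. The minimal Python sketch below uses the standard-library `email` package to pull them out; the message, names, addresses, IP address, and the `extract_metadata` function are hypothetical examples of mine, and real headers vary by mail provider. Login records are kept on the provider’s servers, so they do not appear here.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# A hypothetical raw message; every name, address, and IP is a placeholder.
RAW_MESSAGE = """\
Received: from [203.0.113.7] by mail.example.com; Tue, 11 Jun 2013 09:15:02 -0400
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.org>
Date: Tue, 11 Jun 2013 09:15:00 -0400
Message-ID: <12345@example.com>
In-Reply-To: <12340@example.org>
Subject: Lunch on Friday?
User-Agent: ExampleMail/1.0

The body (the "content") starts here and is never read below.
"""


def extract_metadata(raw: str) -> dict:
    """Return header-level metadata from a raw message, ignoring the body."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg["From"])
    recipient_name, recipient_addr = parseaddr(msg["To"])
    sent = parsedate_to_datetime(msg["Date"])  # timezone-aware datetime
    return {
        "sender_name": sender_name,
        "sender_address": sender_addr,
        "recipient_name": recipient_name,
        "recipient_address": recipient_addr,
        "sent_at": sent.isoformat(),
        "utc_offset": sent.strftime("%z"),  # the sender's time zone
        "message_id": msg["Message-ID"],  # unique identifier of this email
        "in_reply_to": msg["In-Reply-To"],  # links it to a related email
        "subject": msg["Subject"],
        "client": msg["User-Agent"],  # which mail program was used
        "received": msg.get_all("Received", []),  # relay hops, often with IPs
    }


if __name__ == "__main__":
    for key, value in extract_metadata(RAW_MESSAGE).items():
        print(f"{key}: {value}")
```

Every field printed comes from the envelope, not the letter, which is exactly the distinction the metadata debate hinges on.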
Metadata associated with mobile phones:
- Phone number of every caller
- Serial numbers of phones involved
- Time of call
- Duration of call
- Location of each participant
- Telephone calling card numbers
Metadata associated with Facebook:
- Username and profile bio information including birthday, hometown, work history, and interests
- Username and unique identifier
- User subscriptions
- User location
- User device
- Activity date, time, and time zone
Metadata associated with web browsers:
- Activity including pages the user visits and when visited
- User data and possibly user login details with auto-fill features
- User IP address, internet service provider, device hardware details, operating system, and browser version
- Cookies and cached data from websites
That’s quite a list of data points organizations could be hoovering up.
I asked some acquaintances whether they were bothered that their metadata is likely being archived in a database somewhere; the general consensus was “not so much.” One friend, who happens to be a database manager, disagreed. She pointed out that such a database would be easy to search and manipulate, transforming seemingly disparate pieces of information into meaningful connections.
That data-mining capability also upsets privacy advocates, including Kurt Opsahl of the Electronic Frontier Foundation, who explained why in a Gizmodo article: “What they are trying to say is that disclosure of metadata—the details about phone calls, without the actual voice — aren’t a big deal, not something for Americans to get upset about if the government knows.”
Kurt then offered examples showing how assembled bits of data can lead to plausible conclusions:
- They know you spoke with an HIV testing service, then your doctor, then your health insurance company in the same hour. But they don’t know what was discussed.
- They know you called a gynecologist, spoke for a half hour, and then called the local Planned Parenthood’s number later that day. But nobody knows what you spoke about.
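My database-manager friend’s point and Kurt’s first example can be sketched in a few lines of Python. The sketch assumes a hypothetical list of call records (caller, callee, start time, duration) and a hypothetical directory mapping phone numbers to kinds of businesses; neither reflects any real agency’s data or tooling, and the `sensitive_sequences` function is my own illustration. It flags anyone who reaches three different kinds of numbers within an hour, without knowing a word of what was said.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CallRecord:
    caller: str
    callee: str
    start: datetime
    duration_min: int  # collected like the rest, though unused by this check


# Hypothetical directory: which kind of business each number belongs to.
DIRECTORY = {
    "555-0101": "hiv_testing",
    "555-0102": "doctor",
    "555-0103": "health_insurer",
    "555-0199": "pizza",
}

t0 = datetime(2013, 6, 11, 14, 0)
SAMPLE_RECORDS = [
    CallRecord("555-0001", "555-0101", t0, 10),
    CallRecord("555-0001", "555-0102", t0 + timedelta(minutes=20), 5),
    CallRecord("555-0001", "555-0103", t0 + timedelta(minutes=45), 12),
    CallRecord("555-0002", "555-0199", t0, 1),
]


def sensitive_sequences(records, window=timedelta(hours=1), min_categories=3):
    """Flag callers who reach at least `min_categories` distinct directory
    categories within `window` of their first such call. Metadata only."""
    by_caller = defaultdict(list)
    for rec in records:
        by_caller[rec.caller].append(rec)

    flagged = {}
    for caller, calls in by_caller.items():
        calls.sort(key=lambda r: r.start)
        for i, first in enumerate(calls):
            categories = []
            # Look at every later call that starts inside the window.
            for later in calls[i:]:
                if later.start - first.start > window:
                    break
                category = DIRECTORY.get(later.callee)
                if category and category not in categories:
                    categories.append(category)
            if len(categories) >= min_categories:
                flagged[caller] = categories
                break
    return flagged


if __name__ == "__main__":
    print(sensitive_sequences(SAMPLE_RECORDS))
```

On the sample records, only 555-0001 is flagged, for calling the testing service, the doctor, and the insurer within 45 minutes; the pizza order goes unnoticed. The grouping and time window are trivial; the sensitive inference comes entirely from knowing whose numbers were dialed.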
Kurt ends the Gizmodo article with the following insight:
[Y]our phone records — oops, ‘so-called metadata’ — can reveal a lot more about the content of your calls than the government is implying. Metadata provides enough context to know some of the most intimate details of your lives.
In the interest of fairness, here is an opposing opinion from the comment section of Kurt’s post:
There is the other side of the fence. They (government agencies) know:
- A siding company called trying to sell me a new exterior.
- Dan called, and we talked for three minutes.
- They know I called up Pizza Planet, and spoke for 1 minute.
Personally, I don’t care so much about the metadata they collect. It’s what they plan on using it for that concerns me.
Whether or not capturing metadata is legal, metadata is being scarfed up wholesale. What concerns me is that conclusions drawn from the assembled metadata appear to be mostly circumstantial, yet the impact on individuals is real just the same.