Monday 4 February 2019

Looking inside your G+ archive



This post is about undocumented features in the Google+ takeout downloads. But let's start with a couple of basics. Download your data (choosing the Google+ Stream) in zipped format, unzip the files and put them all in the same storage folder (you want to end up with a single Takeout folder in your storage folder). Inside the Takeout folder you will find index.html: open it with your browser. Closely examine the index menu. Most of it is self-explanatory: in Google+ Stream>Posts you will find a date-ordered list of all your posts. Some other useful things are easy to overlook: in Google+ Stream>ActivityLog you will find a massive list of all your comments on other people's posts, and another list of your +1's on other people's posts.

Now for the hidden stuff.

Inside the archive of your G+ posts, you will find a little more than you expected. 

Each of your images will have an associated 'csv' file which, for example, includes a field showing how many times your image was viewed (the view counter).

To use it, just find the image you are interested in and look for the csv file whose name starts with the image's file name (for the moment, ignore any files called plain metadata.csv).

I was asked, in comments, why this .csv data might be useful :)


The data patterns in the csv files, being contemporary electronic records of the publisher, have evidentiary status in many world courts. With the destruction of the primary published record, the csv file might be used in a wide range of regulatory, criminal or copyright situations.

There has been a vibrant debate about the value or non-value of the view counter. Some say that the basis for any count is unclear (a view may be counted when a call populates a screen that is never looked at, or when an image is placed on a Google product that nobody is watching, eg a screensaver). They argue that the counter has no intrinsic value and/or is not a reliable indicator of visibility. These arguments are regularly raised in relation to counters, and have some limited respectability. Professional photographers and account managers have argued that counters remain an important (perhaps the only) indication of visibility, and some charge clients on the basis of view counts. Recently, other authorities have expressed interest in the counter (I guess we will see what comes of that :) ). Similar counters are commonly encountered on many other products (eg Google Maps), and are seen as a value indicator.

Whether either position has merit is for others to ponder, but the counter might be used in civil cases (say breach of copyright) as an indicator of unlawful use and therefore of damages. Conversely, a client charged for a large number of views might find in the counter an argument that their internet manager has misled them.

There is other data in the csv file, but let us focus on the view counter for now.


Example:

Assume you are looking for an image posted on 28 January 2019. 



Go to Takeout\Google+ Stream\Photos\Photos from posts\1-28-19  (note that your earlier folders may have a different format structure - you may have to dig for the right folder)





Find the image file you are looking for (in this case a tricky image of the Narregarang, named 1ej9hvbcna14y.jpg).




You will see the similarly named file: 1ej9hvbcna14y.jpg.metadata.csv
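If you would rather not hunt through folders by hand, the naming pattern in this example can be sketched in a few lines of Python. This is only a rough sketch: the function names are mine, and it assumes the folder is named month-day-year without zero padding (as in 1-28-19) and that the metadata file is simply the image name plus ".metadata.csv". As noted above, earlier folders may use a different format, so treat it as a starting point.

```python
from datetime import date
from pathlib import Path


def post_folder_name(posted: date) -> str:
    """Build a folder name like '1-28-19' (assumed format: month-day-two-digit year)."""
    return f"{posted.month}-{posted.day}-{posted.year % 100:02d}"


def metadata_path_for(image_path: Path) -> Path:
    """Return the sibling metadata file: '<image file name>.metadata.csv'."""
    return image_path.with_name(image_path.name + ".metadata.csv")


if __name__ == "__main__":
    # Hypothetical location of the unzipped archive; adjust to your own storage folder.
    photos_root = Path("Takeout") / "Google+ Stream" / "Photos" / "Photos from posts"
    image = photos_root / post_folder_name(date(2019, 1, 28)) / "1ej9hvbcna14y.jpg"
    print(metadata_path_for(image))
```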

When you open 1ej9hvbcna14y.jpg.metadata.csv in Excel it will show you formatted data about the image. 



When opened in something basic like Notepad, the data will be there, but you have to format it yourself.

In either case, the first 7 fields (bolded) reported:

title: IMG_9002.jpg (this is the file name I uploaded, ie a direct link back to my home file system).

description: "The Narregarang (Shaky Place): Mermaid Pool Falls..." (etc)

url: "https://lh3.googleusercontent.com/..." (etc, the location in the google system)

license: (empty field)

image_views: 1,784,941 (a week later, as shown in the image above, the number had grown to 2,044,754)

creation_time.formatted: Jan 28, 2019, 12:01:43 PM UTC

There are additional fields... (many, in this case, were blank)

modification_time.formatted 
geo_data.latitude 
geo_data.longitude 
geo_data_exif.latitude 
geo_data_exif.longitude 
tags 
people.name 
people.email 
comments.comment 
comments.user_id 
comments.email 
media_key 
hex_photo_id 
upload_ip 
status 
liking_user_ids 
album_id 
exif.camera_make 
exif.camera_model 
exif.cell_width 
exif.cell_length
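
If Excel is not handy, a single metadata file can also be read with Python's standard csv module. The sketch below is a guess at a convenient approach, not anything Google documents: it assumes the file is comma-separated UTF-8 with one header row followed by one data row, and the helper names (read_metadata, parse_views, non_blank_fields) are my own. The view count is cleaned of thousands separators in case it is stored as "1,784,941" rather than 1784941.

```python
import csv
from pathlib import Path


def read_metadata(csv_path):
    """Return the first data row of a metadata file as a {field: value} dict."""
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            return dict(row)
    return {}


def parse_views(raw):
    """Turn '1,784,941' or '1784941' into an int; blank or odd values become 0."""
    digits = (raw or "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else 0


def non_blank_fields(record):
    """Keep only the fields that actually contain something."""
    return {
        key: value
        for key, value in record.items()
        if key is not None and isinstance(value, str) and value.strip()
    }


if __name__ == "__main__":
    # Hypothetical path; point this at a metadata file in your own archive.
    sample = (Path("Takeout") / "Google+ Stream" / "Photos" / "Photos from posts"
              / "1-28-19" / "1ej9hvbcna14y.jpg.metadata.csv")
    if sample.exists():
        record = read_metadata(sample)
        print("views:", parse_views(record.get("image_views")))
        for field, value in non_blank_fields(record).items():
            print(f"{field}: {value}")
```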

There are lots of different ways you might use this data. (The process below works with archives created after 3 Feb - archives created earlier were stuffed with rubbish and you will need to clean that out before attempting this process.)

To examine the data as a totality you will need a program like Excel. I appended all the csv files and then sorted the data using a couple of basic commands in the Windows 10 Command Prompt and Excel.

1) Unzip your archive, open your Command prompt and navigate to Takeout\Google+ Stream\Photos\Photos from posts
[since 3 Feb, this is where the relevant *.csv files have been kept]

2) mkdir targetDir
[create a new folder called Takeout\Google+ Stream\Photos\Photos from posts\targetDir]

3) for /r %x in (*metadata.csv) do copy "%x" targetDir\ /Y
[we will put a copy of relevant *.csv files in Takeout\Google+ Stream\Photos\Photos from posts\targetDir. We want to keep all our data intact.]

4) cd targetDir
[jump into that directory]

5) del metadata.csv
[for our purposes, this is an unnecessary 1Kb file]

6) copy *.csv all.csv
[append all your csv files into the one file called "all.csv"]

7) open the file "all.csv" in Excel - and save as an Excel file (I deleted the fields I was not going to use and the repeated header rows - you can end up with some big files which can get pretty slow).

8) mine away. At this point I sorted on views.

This process is quick and dirty - to create an interactive list which links to the posts and images, we would need more complex code.
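
For anyone who prefers a script to the Command Prompt, the same collect-append-sort steps can be sketched in Python. This is not the method I used above, just a rough equivalent under some assumptions: every per-image file ends in ".metadata.csv", each has a header row, and the view count lives in a field called image_views. The function names and the output file name all_sorted.csv are made up for the example. Unlike the copy command, it writes the header only once and leaves the original files untouched.

```python
import csv
from pathlib import Path


def collect_rows(photos_root):
    """Read every '<image>.metadata.csv' under photos_root into a list of dicts."""
    rows = []
    for csv_path in sorted(Path(photos_root).rglob("*.metadata.csv")):
        with open(csv_path, newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                row.pop(None, None)  # drop overflow columns that have no header
                row["source_file"] = str(csv_path)  # remember where the row came from
                rows.append(row)
    return rows


def views_of(row):
    """Numeric view count for sorting; blank or malformed values count as 0."""
    raw = (row.get("image_views") or "").replace(",", "").strip()
    return int(raw) if raw.isdigit() else 0


def write_sorted(rows, out_path):
    """Write all rows to one CSV with a single header, most-viewed first."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    ordered = sorted(rows, key=views_of, reverse=True)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(ordered)
    return ordered


if __name__ == "__main__":
    # Hypothetical archive location; adjust to your own storage folder.
    root = Path("Takeout") / "Google+ Stream" / "Photos" / "Photos from posts"
    if root.is_dir():
        top = write_sorted(collect_rows(root), "all_sorted.csv")
        for row in top[:10]:
            print(views_of(row), row.get("title", ""))
```

The resulting file opens in Excel like any other csv, already sorted on views.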

3 comments:

Anonymous said...

Just out of curiosity: what would that data be useful for?

Peter Quinton said...

The data patterns in the csv files, being contemporary electronic records of the publisher, have evidentiary status in many world courts. With the destruction of the primary published record, the csv file might be used in a wide range of regulatory, criminal or copyright situations.

There has been a vibrant debate about the value or non-value of the view counter. Some professional photographers and account managers have argued that it remains an important indication of visibility. Recently, other authorities have expressed interest in the counter - i guess we will see what comes of that. Similar counters are commonly encountered on many other products (Google Maps), and are seen as a value indicator.

Whether either position has merit is for others to ponder, but it might be used in civil cases (say breach of copyright) as an indicator of unlawful use and therefore of damages.

I think of it as an indication (however flawed) of visibility. To test this, i am using it as a way of differentiating 50 'hi-visibility' images (>1.7m - 2.6m views). By rerunning variations of the images in the final months of G+, subtle differences may provide insights into a range of matters :)

Anonymous said...

Oh thanks! Now it makes a lot more sense to me :)