Thursday, 28 January 2016

Announcing GA Support for Accelerated Mobile Pages (AMP) Analytics

When Google was a few years old, we wrote up a list of Ten things we know to be true. The list includes items like “Focus on the user and all else will follow” as well as “Fast is better than slow.” It would be tough to say that much of the mobile web has adhered to these principles. Users often get frustrated by poor experiences in which sites load slowly or lock up while trying to load resources that clog their data connections.

The Accelerated Mobile Pages Project (AMP) is an open source initiative that aims to address these problems by enabling content to load instantaneously, providing a better web experience for all. AMP introduces a new format that is a flavor of HTML, built to prioritize speed and a fantastic user experience. One way that AMP provides reliably good page loading performance is by restricting the ability to add custom JavaScript to pages and instead relying on built-in, reusable components.

Today, the AMP team announced the launch of an analytics component that will enable measurement on AMP pages. The Google Analytics team is committed to helping our users measure their content wherever it appears. So, for publishers looking to use AMP to provide an improved user experience, we’ve released Google Analytics measurement capabilities for Accelerated Mobile Pages. AMP support in Google Analytics makes it easy to identify your best content and optimize your user experience.

How Google Analytics Support Works

Analytics on AMP is handled by an open source, reusable component that the Google Analytics team helped build. The <amp-analytics> component can be configured with Google Analytics-specific configuration parameters to record pageviews, events, and even custom dimensions. That configuration works hand in hand with a global event listener that automatically detects triggers like button presses. As a result, there’s no need to scatter custom JavaScript throughout your page to detect actions that should trigger events and hits. Instead, you can define which actions should trigger hits within the configuration section and let the magic of AMP do the rest.
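
To make this concrete, here is a rough sketch of the kind of JSON configuration that sits inside an <amp-analytics type="googleanalytics"> element (rendered here as a TypeScript object literal for readability). The property ID, trigger names, and CSS selector are placeholders, so check the AMP documentation for the exact options your version supports:

```typescript
// Sketch only: in an AMP page this JSON lives inside
// <amp-analytics type="googleanalytics"><script type="application/json">...</script></amp-analytics>
// The property ID, trigger names, and selector below are placeholders.
const ampAnalyticsConfig = {
  vars: {
    account: "UA-XXXXX-Y",            // your Google Analytics property ID
  },
  triggers: {
    trackPageview: {
      on: "visible",                  // fire when the page becomes visible
      request: "pageview",
    },
    trackSubscribeClick: {
      on: "click",                    // AMP's built-in listener detects the click
      selector: "#subscribe-button",  // no custom JavaScript on the page itself
      request: "event",
      vars: {
        eventCategory: "ui-components",
        eventAction: "subscribe-click",
      },
    },
  },
};
```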

How to Get Started

Before you get started with AMP Analytics, you’ll need to get started with AMP itself. The AMP website contains a great introduction to getting started. Once you have an AMP page up, it’s time to start thinking about how you’d like to measure its performance. 

We recommend that you use a separate Google Analytics property to measure your AMP pages. AMP is a new technology that’s going to mature over time, so some of the functionality that you’re used to in web analytics won’t be available in AMP analytics right away. AMP pages can also appear in multiple contexts, including through different syndication caches. Because of that, a single user who visits both an AMP version of a page and an HTML version of the same page can end up being counted as two distinct users. Using a separate Google Analytics property to measure AMP pages makes it easier to handle these issues.

Once you have your AMP page and new Google Analytics property set up, you’ll want to reference the requirements for using Analytics on AMP pages as well as the developers guide for instrumenting measurement.

What’s Next

Multiple technology partners, including Google Search, Twitter, Pinterest, and LinkedIn, have announced that they’ll start surfacing AMP pages in the coming months. The Google Analytics team is excited to support AMP from day one and looks forward to growing our offering as AMP’s capabilities expand.


Posted by Dan Cary, Product Manager and Avi Mehta, Software Engineer

Wednesday, 27 January 2016

AlphaGo: Mastering the ancient game of Go with Machine Learning



Games are a great testing ground for developing smarter, more flexible algorithms that have the ability to tackle problems in ways similar to humans. Creating programs that are able to play games better than the best humans has a long history - the first classic game mastered by a computer was noughts and crosses (also known as tic-tac-toe) in 1952 as a PhD candidate’s project. Then fell checkers in 1994. Chess was tackled by Deep Blue in 1997. The success isn’t limited to board games, either - IBM's Watson won first place on Jeopardy in 2011, and in 2014 our own algorithms learned to play dozens of Atari games just from the raw pixel inputs.

But one game has thwarted A.I. research thus far: the ancient game of Go. Invented in China over 2500 years ago, Go is played by more than 40 million people worldwide. The rules are simple: players take turns to place black or white stones on a board, trying to capture the opponent's stones or surround empty space to make points of territory. Confucius wrote about the game, and its aesthetic beauty elevated it to one of the four essential arts required of any true Chinese scholar. The game is played primarily through intuition and feel, and because of its subtlety and intellectual depth it has captured the human imagination for centuries.

But as simple as the rules are, Go is a game of profound complexity. The search space in Go is vast -- more than a googol times larger than chess (a number greater than there are atoms in the universe!). As a result, traditional “brute force” AI methods -- which construct a search tree over all possible sequences of moves -- don’t have a chance in Go. To date, computers have played Go only as well as amateurs. Experts predicted it would be at least another 10 years until a computer could beat one of the world’s elite group of Go professionals.

We saw this as an irresistible challenge! We started building a system, AlphaGo, described in a paper in Nature this week, that would overcome these barriers. The key to AlphaGo is reducing the enormous search space to something more manageable. To do this, it combines a state-of-the-art tree search with two deep neural networks, each of which contains many layers with millions of neuron-like connections. One neural network, the “policy network”, predicts the next move, and is used to narrow the search to consider only the moves most likely to lead to a win. The other neural network, the “value network”, is then used to reduce the depth of the search tree -- estimating the winner in each position in place of searching all the way to the end of the game.

AlphaGo’s search algorithm is much more human-like than previous approaches. For example, when Deep Blue played chess, it searched by brute force over thousands of times more positions than AlphaGo. Instead, AlphaGo looks ahead by playing out the remainder of the game in its imagination, many times over - a technique known as Monte-Carlo tree search. But unlike previous Monte-Carlo programs, AlphaGo uses deep neural networks to guide its search. During each simulated game, the policy network suggests intelligent moves to play, while the value network astutely evaluates the position that is reached. Finally, AlphaGo chooses the move that is most successful in simulation.
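
To illustrate only the general shape of that idea (this is a toy, not AlphaGo’s actual algorithm or code), the sketch below uses a placeholder board type and random stand-ins for the trained networks: a policy function shortlists a few candidate moves to limit breadth, and a value function scores the resulting positions so the program never has to play every line out to the end. In the real system these pieces are combined inside Monte-Carlo tree search rather than a single-ply lookahead.

```typescript
// Toy illustration only: placeholder types and random stand-ins for the
// trained policy and value networks described in the post.
type Move = string;
type Position = { legalMoves: Move[] };       // stub board representation

// "Policy network": proposes a handful of promising moves (reduces breadth).
function policy(pos: Position, topK: number): Move[] {
  return pos.legalMoves.slice(0, topK);       // the real system ranks by learned priors
}

// "Value network": estimates the chance of winning from a position
// (reduces depth: no need to play the game out to the end).
function value(pos: Position): number {
  return Math.random();                       // stand-in for a learned evaluation
}

// Stub: a real engine would apply the move to the board.
function play(pos: Position, move: Move): Position {
  return { legalMoves: pos.legalMoves.filter((m) => m !== move) };
}

// One step of search: consider only the policy's suggestions and pick the
// move whose resulting position the value estimate likes best.
function chooseMove(pos: Position): Move {
  let best = "";
  let bestScore = -Infinity;
  for (const move of policy(pos, 3)) {
    const score = value(play(pos, move));
    if (score > bestScore) {
      bestScore = score;
      best = move;
    }
  }
  return best;
}

console.log(chooseMove({ legalMoves: ["D4", "Q16", "C3", "R17"] }));
```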

We first trained the policy network on 30 million moves from games played by human experts, until it could predict the human move 57% of the time (the previous record before AlphaGo was 44%). But our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and gradually improving them using a trial-and-error process known as reinforcement learning. This approach led to much better policy networks, so strong in fact that the raw neural network (immediately, without any tree search at all) can defeat state-of-the-art Go programs that build enormous search trees.

These policy networks were in turn used to train the value networks, again by reinforcement learning from games of self-play. These value networks can evaluate any Go position and estimate the eventual winner - a problem so hard it was believed to be impossible.

Of course, all of this requires a huge amount of compute power, so we made extensive use of Google Cloud Platform, which enables researchers working on AI and Machine Learning to access elastic compute, storage and networking capacity on demand. In addition, new open source libraries for numerical computation using data flow graphs, such as TensorFlow, allow researchers to efficiently deploy the computation needed for deep learning algorithms across multiple CPUs or GPUs.

So how strong is AlphaGo? To answer this question, we played a tournament between AlphaGo and the best of the rest - the top Go programs at the forefront of A.I. research. Using a single machine, AlphaGo won all but one of its 500 games against these programs. In fact, AlphaGo even beat those programs after giving them a four-move head start at the beginning of each game. A high-performance version of AlphaGo, distributed across many machines, was even stronger.
This figure from the Nature article shows the Elo rating and approximate rank of AlphaGo (both single-machine and distributed versions), the European champion Fan Hui (a professional 2-dan), and the strongest other Go programs, evaluated over thousands of games. Pale pink bars show the performance of other programs when given a four-move head start.

It seemed that AlphaGo was ready for a greater challenge. So we invited the reigning 3-time European Go champion Fan Hui — an elite professional player who has devoted his life to Go since the age of 12 — to our London office for a challenge match. The match was played behind closed doors between October 5 and 9 last year. AlphaGo won by 5 games to 0 -- the first time a computer program has ever beaten a professional Go player.

AlphaGo’s next challenge will be to play the top Go player in the world over the last decade, Lee Sedol. The match will take place this March in Seoul, South Korea. Lee Sedol is excited to take on the challenge, saying, "I am privileged to be the one to play, but I am confident that I can win." It should prove to be a fascinating contest!

We are thrilled to have mastered Go and thus achieved one of the grand challenges of AI. However, the most significant aspect of all this for us is that AlphaGo isn’t just an ‘expert’ system built with hand-crafted rules, but instead uses general machine learning techniques to allow it to improve itself, just by watching and playing games. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we have used are general purpose, our hope is that one day they could be extended to help us address some of society’s toughest and most pressing problems, from climate modelling to complex disease analysis.

Thursday, 21 January 2016

Teach Yourself Deep Learning with TensorFlow and Udacity



Deep learning has become one of the hottest topics in machine learning in recent years. With TensorFlow, the deep learning platform that we recently released as an open-source project, our goal was to bring the capabilities of deep learning to everyone. So far, we are extremely excited by the uptake: more than 4000 users have forked it on GitHub in just a few weeks, and the project has been starred more than 16000 times by enthusiasts around the globe.

To help make deep learning even more accessible to engineers and data scientists at large, we are launching a new Deep Learning Course developed in collaboration with Udacity. This short, intensive course provides you with all the basic tools and vocabulary to get started with deep learning, and walks you through how to use it to address some of the most common machine learning problems. It is also accompanied by interactive TensorFlow notebooks that directly mirror and implement the concepts introduced in the lectures.

The course consists of four lectures which provide a tour of the main building blocks used to solve problems ranging from image recognition to text analysis. The first lecture focuses on the basics that will be familiar to those already versed in machine learning: setting up your data and experimental protocol, and training simple classification models. The second lecture builds on these fundamentals to explore how such simple models can be made deeper and more powerful, and examines the scalability problems that come with that, in particular regularization and hyperparameter tuning. The third lecture is all about convolutional networks and image recognition. The fourth and final lecture explores models for text and sequences in general, with embeddings and recurrent neural networks. By the end of the course, you will have implemented and trained this variety of models on your own machine and will be ready to transfer that knowledge to solve your own problems!

Our overall goal in designing this course was to provide the machine learning enthusiast a rapid and direct path to solving real and interesting problems with deep learning techniques, and we're now very excited to share what we've built! It has been a lot of fun putting this course together with the fantastic team of experts in online course design and production at Udacity. For more details, see the Udacity blog post, and register for the course. We hope you enjoy it!

Thursday, 14 January 2016

Data-Driven CMOs: Leaving the Information Age

Originally Posted on the Adometry M2R Blog

Remember the so-called “Information Age”? Once a catch-all for all things technological, over time the term came to refer to the transition to a society in which individuals had access to a wealth of information to aid in decision-making – a global democratization of knowledge. From a marketing perspective, the Information Age also came to represent a fundamental shift for data-driven CMOs away from simple push tactics to a dynamic, real-time ebb and flow of information between brands and customers.

With this transition has come all sorts of complexity. An argument can be made that demands on marketers have never been higher; yet, in some respects the evolution towards data-driven marketing has simplified or even solved some of the profession’s biggest pain points, such as:
  • Gaining an ability to track performance at a granular level
  • Understanding consumer behaviors within the marketing funnel
  • Gleaning insights about how marketing impacts consumers’ inclination to make a purchase decision
In short, the Information Age now has less to do with access to information than it does with the ability to utilize it effectively.

Moving from Information to Insights

In a previous interview with CMO.com, I was asked what I now know that I wish I had known earlier in my marketing career. My response?
“Today’s marketing leaders are a combination of creative, technologist, analyst and strategist. Success in modern marketing is predicated on being agile, having great vision and being able to effectively manage change across the organization. To be clear, the advice would not be to blindly chase shiny new objects. Rather, it would be to proactively set aside the time and resources to foresee, evaluate and test opportunities on the horizon.”
So how do marketers manage change across their organizations? It starts with identifying where marketing can offer unique value by transforming raw information into insights.

“Big data really isn't the end unto itself. It’s actually big insights from big data. It’s throwing away 99.999% of that data to find things that are actionable.”

The comment above was made by Bob Borchers, Chief Marketing Officer for Dolby Laboratories, at a Fortune Brainstorm Tech conference. It should go without saying, but to reiterate: data isn't the same as knowledge. Data without context is no more useful than knowing your current driving speed without understanding which direction the car is headed.

Another way to think about this is to consider the difference between building a data-driven marketing culture and a truly data-driven organization. We’re already witnessing this maturation happening within organizations that were early adopters of “big data”. Led by marketers who invested in foundational elements – attribution measurement and analytics, cross-channel allocation and alignment, etc. – these organizations are now taking the next step to integrate marketing with other disciplines, such as finance. In doing so, discussions about marketing performance start to sound less like functional assessments of campaign efficacy and more like part of a strategic, holistic business plan. Now impression and click-stream data can be discussed through the lens of media costs (online and offline) and supplier value, linked directly to sales.

Using a data-driven attribution measurement solution offers additional clarity by showing exactly how individual channels, publishers and creatives contributed to revenue. By looking beyond simple metrics and getting a more complete view of performance across channels, marketers suddenly have a sense for how to proactively manage towards an overarching business objective (e.g. top-line growth) while also maintaining a sense for costs and ROI.

So is this still the Information Age or something else? What’s clear is that simply gathering and organizing information is no longer the endgame; it’s only the beginning.

Posted by Casey Carey, Google Analytics team

Monday, 11 January 2016

Why attend USENIX Enigma?



Last August, we announced USENIX Enigma, a new conference intended to shine a light on great, thought-provoking research in security, privacy, and electronic crime. With Enigma beginning in just a few short weeks, I wanted to share a couple of the reasons I’m personally excited about this new conference.

Enigma aims to bridge the divide that exists between experts working in academia, industry, and public service, explicitly bringing researchers from different sectors together to share their work. Our speakers include those spearheading the defense of digital rights (Electronic Frontier Foundation, Access Now), practitioners at a number of well known industry leaders (Akamai, Blackberry, Facebook, LinkedIn, Netflix, Twitter), and researchers from multiple universities in the U.S. and abroad. With the diverse session topics and organizations represented, I expect interesting—and perhaps spirited—coffee break and lunchtime discussions among the equally diverse list of conference attendees.

Of course, I’m very proud to have some of my Google colleagues speaking at Enigma:

  • Adrienne Porter Felt will talk about blending research and engineering to solve usable security problems. You’ll hear how Chrome’s usable security team runs user studies and experiments to motivate engineering and design decisions. Adrienne will share the challenges they’ve faced when trying to adapt existing usable security research to practice, and give insight into how they’ve achieved successes.
  • Ben Hawkes will be speaking about Project Zero, a security research team dedicated to the mission of “making 0day hard.” Ben will talk about why Project Zero exists, and some of the recent trends and technologies that make vulnerability discovery and exploitation fundamentally harder.
  • Kostya Serebryany will be presenting a 3-pronged approach to securing C++ code, based on his many years of experience wrangling complex, buggy software. Kostya will survey the dynamic sanitizing tools he and his team have made publicly available, review control-flow and data-flow guided fuzzing, and explain a method to harden your code in the presence of any bugs that remain.
  • Elie Bursztein will go through key lessons the Gmail team learned over the past 11 years while protecting users from spam, phishing, malware, and web attacks. Illustrated with concrete numbers and examples from one of the largest email systems on the planet, attendees will gain insight into specific techniques and approaches useful in fighting abuse and securing their online services.

In addition to raw content, my Program Co-Chair, David Brumley, and I have prioritized talk quality. Researchers dedicate months or years of their time to thinking about a problem and conducting the technical work of research, but a common criticism of technical conferences is that the actual presentation of that research seems like an afterthought. Rather than be a regurgitation of a research paper in slide format, a presentation is an opportunity for a researcher to explain the context and impact of their work in their own voice; a chance to inspire the audience to want to learn more or dig deeper. Taking inspiration from the TED conference, Enigma will have shorter presentations, and the program committee has worked with each speaker to help them craft the best version of their talk.

Hope to see some of you at USENIX Enigma later this month!

Thursday, 7 January 2016

Google Apps Script: Tracking add-on usage with Google Analytics

The following was originally published on the Google Developers Blog.

Editor's note: Posted by Romain Vialard, a Google Developer Expert and developer of Yet Another Mail Merge, a Google Sheets add-on.

Google Apps Script makes it easy to create and publish add-ons for Google Sheets, Docs, and Forms. There are now hundreds of add-ons available and many are reaching hundreds of thousands of users. Google Analytics is one of the best tools to learn what keeps those users engaged and what should be improved to make an add-on more successful.

Cookies and User Identification

Add-ons run inside Google Sheets, Docs, and Forms where they can display content in dialogs or sidebars. These custom interfaces are served by the Apps Script HTML service, which offers client-side HTML, CSS, and JS with a few limitations.

Among those limitations, cookies aren’t persistent. The Google Analytics cookie will be recreated each time a user re-opens your dialog or sidebar, with a new client ID every time. So Analytics will see each new session as if it were initiated by a new user, meaning the reported number of users ends up roughly equal to the number of sessions.

Fortunately, it’s possible to use localStorage to store the client ID — a better way to persist user information than cookies in this environment. After this change, your user metrics should be far more accurate.
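
Here is a minimal sketch of that pattern for the client side of a dialog or sidebar, assuming the standard analytics.js snippet is already loaded and using a placeholder property ID:

```typescript
// Sketch: persist the Analytics client ID in localStorage because cookies
// don't survive between openings of the HTML service dialog or sidebar.
// Assumes analytics.js is already loaded; "UA-XXXXX-Y" is a placeholder.
declare const ga: (...args: any[]) => void;

const storedClientId = localStorage.getItem("ga:clientId");

ga("create", "UA-XXXXX-Y", {
  storage: "none",                         // don't rely on cookies at all
  clientId: storedClientId || undefined,   // reuse the saved ID if we have one
});

// Save the client ID (new or reused) so the next opening reports the same user.
ga((tracker: { get: (field: string) => string }) => {
  localStorage.setItem("ga:clientId", tracker.get("clientId"));
});

ga("send", "pageview");
```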

Add-ons can also run via triggers, executing code at a recurring interval or when a user performs an action like opening a document or responding to a Google Form. In those cases, there’s no dialog or sidebar, so you should use the Google Analytics Measurement Protocol (see policies on the use of this service) to send user interaction data directly to Google Analytics servers via the UrlFetch service in Google Apps Script.

A Client ID is also required in that case, so I recommend using the Apps Script User properties service. Most examples on the web show how to generate a unique Client ID for every call to Analytics, but this won’t give you an accurate user count.

You can also send the client ID generated on the client side to the server, so that the same client ID is used for both client and server calls to Analytics. But at this stage, it is best to rely on the optional User ID feature in Google Analytics. While the client ID represents a client / device, the User ID is unique to each user and can easily be used in add-ons because users are authenticated. You can generate a User ID on the server side, store it among the user properties, and reuse it for every call to Analytics (both on the client and the server side).
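
Tying those pieces together, here is a hedged sketch of what a server-side hit from a trigger might look like: the client ID and User ID are generated once, stored with the User properties service, and sent to the Measurement Protocol endpoint via UrlFetch. The property ID and property key names are placeholders; PropertiesService, Utilities and UrlFetchApp are the standard built-in Apps Script services (covered by the community @types/google-apps-script definitions if you develop in TypeScript with clasp):

```typescript
// Sketch: send an event hit from server-side Apps Script code (e.g. a trigger),
// where there is no dialog or sidebar. "UA-XXXXX-Y" and the property key names
// are placeholders.
function sendAnalyticsEvent(category: string, action: string): void {
  const userProps = PropertiesService.getUserProperties();

  // Generate the client ID and User ID once per user, then reuse them forever.
  let clientId = userProps.getProperty("ANALYTICS_CLIENT_ID");
  if (!clientId) {
    clientId = Utilities.getUuid();
    userProps.setProperty("ANALYTICS_CLIENT_ID", clientId);
  }
  let userId = userProps.getProperty("ANALYTICS_USER_ID");
  if (!userId) {
    userId = Utilities.getUuid();
    userProps.setProperty("ANALYTICS_USER_ID", userId);
  }

  const payload = {
    v: "1",              // Measurement Protocol version
    tid: "UA-XXXXX-Y",   // your Google Analytics property ID
    cid: clientId,       // client ID, stable across executions
    uid: userId,         // optional User ID, shared with client-side hits
    t: "event",
    ec: category,        // event category
    ea: action,          // event action
  };

  UrlFetchApp.fetch("https://www.google-analytics.com/collect", {
    method: "post",
    payload: payload,
  });
}
```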

Custom Dimensions & Metrics

In add-ons, we usually rely on event tracking rather than page views. It is possible to add different parameters on each event thanks to categories, actions, labels, and values, but it’s also possible to add much more information by using custom dimensions & metrics.

For example, the Yet Another Mail Merge add-on is mostly used to send emails, and we have added many custom dimensions to better understand how it is used. For each new campaign (batch of emails sent), we record data linked to the user (e.g. free or paying customer, gmail.com or Google for Work / EDU user) and data linked to the campaign (e.g. email size, email tracking activated or not). You can then reuse those custom dimensions inside custom reports & dashboards.
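
For instance (a sketch with invented values; the dimension and metric indexes are placeholders that must match the custom definitions configured in your Google Analytics property), a client-side event hit could carry that extra context like this:

```typescript
// Sketch: attach custom dimensions and a custom metric to an event hit.
// The indexes (dimension1, metric1, ...) must match the custom definitions
// configured in your Google Analytics property; the values are examples.
declare const ga: (...args: any[]) => void;

ga("send", "event", "campaign", "send", {
  dimension1: "paying",       // free vs. paying customer
  dimension2: "gmail.com",    // consumer vs. Google for Work / EDU account
  dimension3: "tracking-on",  // whether email tracking was activated
  metric1: 120,               // number of emails in the campaign
});
```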

Once you begin to leverage all that, you can get very insightful data. Until October 2015, Yet Another Mail Merge let you send up to 100 emails per day for free. But we discovered with Analytics that most people sending more than 50 emails in one campaign were actually sending 100 emails - all the free quota they could get - and yet we were failing to motivate them to switch to our paid plan.

As a result of this insight, we reduced the free plan to 50 emails/day and at the same time introduced a referral program, letting users earn more quota for free (they still don’t pay, but they invite more users, so it’s valuable for us). With this change, we have greatly improved our revenue and scaled user growth.

Of course, we also use Google Analytics to track the efficiency of our referral program.

To help you get even more insight into your add-ons, check out the documentation for the tools described in this post - the Apps Script HTML service, the User properties service, the Google Analytics Measurement Protocol, and custom dimensions & metrics. We hope this information will help your apps become more successful.

Posted by Romain Vialard, Google Developer Expert. After some years spent as a Google Apps consultant, he is now focused on products for Google Apps users, including add-ons such as Yet Another Mail Merge and Form Publisher.

Thursday, 17 December 2015

Four years of Schema.org - Recent Progress and Looking Forward



In 2011, we announced schema.org, a new initiative from Google, Bing and Yahoo! to create and support a common vocabulary for structured data markup on web pages. Since that time, schema.org has been a resource for webmasters looking to add markup to their pages so that search engines can use that data to index content better and surface it in new experiences like rich snippets, Gmail, and the Google App.

Schema.org, which provides a growing vocabulary for describing various kinds of entities in terms of their properties and relationships, has become increasingly important as the Web transitions to a multi-device, mobile-oriented world. We are now seeing schema.org used on many millions of Web sites, defining data types and properties common across applications, platforms and products, in order to enhance the user experience by delivering users the most relevant information when they need it.
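
As a small illustrative sketch (the values are invented, and the property names follow the schema.org Recipe type), the markup a recipe page might carry looks roughly like this, typically embedded in the page as JSON-LD inside a script tag of type application/ld+json (shown here as a plain object literal):

```typescript
// Sketch: a schema.org Recipe entity described as JSON-LD (values invented).
const recipeMarkup = {
  "@context": "https://schema.org",
  "@type": "Recipe",
  name: "Simple Banana Bread",
  author: { "@type": "Person", name: "A. Baker" },
  cookTime: "PT1H",                      // ISO 8601 duration: one hour
  recipeIngredient: ["3 ripe bananas", "2 cups flour", "1 egg"],
  recipeInstructions: "Mash the bananas, mix everything, and bake until golden.",
};
```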

Examples of schema.org in action: Google Rich Snippets, Knowledge Graph panels, and Recipe carousels.

In Schema.org: Evolution of Structured Data on the Web, an overview article published this week by the ACM, we report some key schema.org adoption metrics from a sample of 10 billion pages drawn from a combination of the Google index and Web Data Commons. In this sample, 31.3% of pages have schema.org markup, up from 22% one year ago. Structured data markup is now a core part of the modern web.

The schema.org group at W3C is now amongst the largest active W3C communities, serving as a hub for groups exploring schemas covering diverse topics such as sports, healthcare, e-commerce, food packaging, bibliography and digital archive management. Other companies also make use of the same data to build different applications, and as new use cases arise, further schemas are integrated via community discussion at W3C. Each of these topics in turn has subtle inter-relationships - for example, schemas for food packaging, flight reservations, recipes and restaurant menus each take a different approach to describing food restrictions and allergies. Rather than trying to force a common unified approach across these domains, schema.org's evolution is pragmatic, driven by the combination of available Web data and the likelihood of mainstream consuming applications.

Schema.org is also finding new kinds of uses. One exciting line of work is the use of schema.org marked up pages as training corpus for machine learning. John Foley, Michael Bendersky and Vanja Josifovski used schema.org data to build a system that can learn to recognize events that may be geographically local to a particular user. Other researchers are looking at using schema.org pages with similar markup, but in different languages, to automatically create parallel corpora for machine translation.

Four years after its launch, Schema.org is entering its next phase, with more of the vocabulary development taking place in a more distributed fashion, as extensions. As schema.org adoption has grown, a number of groups with more specialized vocabularies have expressed interest in extending schema.org with their terms. Examples of this include real estate, product, finance, medical and bibliographic information. A number of extensions, for topics ranging from automobiles to product details, are already underway. In such a model, schema.org itself is just the core, providing a unifying vocabulary and congregation forum as necessary.