Polywrap Telemetry / Data Analytics

Overview

Preface: this is a sensitive topic, and we must ensure that users’ right to privacy is always respected.

The Polywrap project is growing rapidly, and we need data to make sure we’re focusing on the areas that most improve the user experience.

Why do we need data?

Data helps us answer important questions in all areas of our project. Here are some examples:

Tech:

  • What are the most popular Polywrappers?
  • Which Polywrappers are typically used together?

Strategy & Operations:

  • Is our community active and growing?
  • Where do the top contributors come from? How often do they contribute and what backgrounds do they have?

A typical data analytics cycle

Here’s an example of what a data analytics cycle might look like:

  1. Capture
  • Data acquisition
  • Data entry
  2. Maintain
  • Data architecture
  • Data warehousing
  3. Process
  • Data mining
  • Data modeling
  • Data summarization
  4. Analyze
  • Prediction analysis
  • Qualitative analysis
  5. Communicate
  • Data reporting
  • Data visualization
  • Decision making

Discussion

Let’s use this forum to discuss anything relating to Polywrap telemetry, including:

  • How would Polywrap go about collecting data?
  • Should we do data collection and analytics at all?
  • What type of hires would we need?

Preface: this is a sensitive topic, and we must ensure that users’ right to privacy is always respected.

A few principles I think we can apply here:

1. Telemetry must be opt-in: wherever we add telemetry events, they stay off by default until the user explicitly enables them.
2. All telemetry events should be anonymized: there is no concept of a “user”, only “anonymous sessions”.
3. When telemetry events are added, they should go through a review process. This is standard practice at larger tech companies, where a “telemetry review process” tries to ensure nothing potentially harmful is collected from users.
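
To make the first two principles concrete, here is a minimal TypeScript sketch of what an opt-in, session-only telemetry recorder could look like. Everything in it (the `Telemetry` class, the `TelemetryEvent` shape, the `randomSessionId` helper, the event names) is hypothetical and only meant to illustrate the idea, not to describe an existing Polywrap API.

```typescript
// Hypothetical sketch: these names are illustrative, not an existing Polywrap API.
type Source = "hub" | "cli" | "client" | "landing" | "docs";

interface TelemetryEvent {
  sessionId: string; // random per session, never tied to a user identity
  source: Source;
  name: string; // e.g. "command_run" or "error"
  timestamp: number; // milliseconds since the Unix epoch
}

// Builds a random 32-character hex ID. Math.random keeps the sketch
// dependency-free; a real implementation would want a secure random source.
function randomSessionId(): string {
  let id = "";
  for (let i = 0; i < 32; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

class Telemetry {
  private readonly source: Source;
  private readonly sessionId: string = randomSessionId();
  private readonly events: TelemetryEvent[] = [];
  private enabled = false; // principle 1: off until the user opts in

  constructor(source: Source) {
    this.source = source;
  }

  optIn(): void {
    this.enabled = true;
  }

  optOut(): void {
    this.enabled = false;
  }

  // Returns true only if the event was actually recorded.
  record(name: string): boolean {
    if (!this.enabled) return false;
    this.events.push({
      sessionId: this.sessionId,
      source: this.source,
      name,
      timestamp: Date.now(),
    });
    return true;
  }

  // Events waiting to be reviewed or sent.
  pending(): readonly TelemetryEvent[] {
    return this.events;
  }
}
```

With this shape, `new Telemetry("cli").record("command_run")` is a no-op until `optIn()` has been called, and the only identifier attached to an event is the random session ID.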

Some data I think we could be gathering from our own tools:

Hub - errors, page visits, search terms, favorites, playground queries, time spent, external links followed
CLI - errors, commands run, command specific events
Client - errors, wrappers & methods called
Landing Page - errors, page visits, time spent, external links followed
Docs - errors, page visits, time spent, external links followed

Some data I think we can pull in from outside sources:

  • Development Package Downloads (npm, etc.)
  • Google Search Count
  • PageRank Stats (number of external links to our websites)
  • Twitter Impressions
  • Social Sentiment

Some interesting use-cases:

A. Developer “Jill” has chosen to turn on anonymized telemetry for the Polywrap CLI. Jill encounters an error when building & testing their wrapper. The error message includes a “session ID” that can be used to look up a full trace of telemetry events for that session. Through the telemetry database’s public API / UI, anyone can now walk through the trace and try to understand what went wrong for Jill.
B. Polywrap contributor “Carl” wants to understand what the most popular wrappers are. Carl decides to aggregate:

  • Most viewed wrappers from Hub DB
  • Most queried wrappers from Client DB
  • Most imported wrappers from Hub / CLI DB
  • Most common search terms within the Hub
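
Both use-cases boil down to simple queries over a shared event log. Here is a rough TypeScript sketch, assuming a hypothetical `WrapperEvent` shape and made-up wrapper URIs; the real Hub / Client / CLI databases would likely look different.

```typescript
// Hypothetical event shape; field names and URIs are assumptions for illustration.
interface WrapperEvent {
  sessionId: string;
  source: "hub" | "client" | "cli";
  kind: "view" | "query" | "import" | "error";
  wrapperUri: string;
  timestamp: number;
}

// Use-case A: every event for one anonymous session, oldest first.
function traceForSession(events: WrapperEvent[], sessionId: string): WrapperEvent[] {
  // filter() returns a new array, so sorting does not mutate the input.
  return events
    .filter((e) => e.sessionId === sessionId)
    .sort((a, b) => a.timestamp - b.timestamp);
}

// Use-case B: count events per wrapper across all sources, most popular first.
function rankWrappers(events: WrapperEvent[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of events) {
    counts.set(e.wrapperUri, (counts.get(e.wrapperUri) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Made-up sample data.
const sampleEvents: WrapperEvent[] = [
  { sessionId: "s1", source: "hub", kind: "view", wrapperUri: "example/wrapper-a", timestamp: 3 },
  { sessionId: "s1", source: "client", kind: "query", wrapperUri: "example/wrapper-a", timestamp: 1 },
  { sessionId: "s2", source: "cli", kind: "import", wrapperUri: "example/wrapper-b", timestamp: 2 },
];

const jillTrace = traceForSession(sampleEvents, "s1");
const ranking = rankWrappers(sampleEvents);
```

In practice Carl would probably want to weight views, queries, and imports differently rather than summing them, but the aggregation itself stays this simple.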

Some concerns that must be thought about:

  • If we’re capturing the values of input arguments for wrapper methods (e.g., the arguments to Uni’s swap(...) function), we must make sure users / developers are VERY aware of this. If a naive developer added a method argument named privateKey: String!, you can imagine what might happen next.
    • Our docs should already carry a VERY NOTICEABLE warning telling developers that they should not pass sensitive data into wrapper methods, and should instead use host plugins.
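
One possible mitigation, sketched in TypeScript below, is to never record argument values at all and only log each argument’s name and type, with an extra redaction for names that look sensitive. The `describeArgs` helper and the `SENSITIVE_NAME` pattern are hypothetical, and name matching is only a heuristic: it would not catch a secret passed in an argument called `data`.

```typescript
// Hypothetical helper, not part of any Polywrap package.
// Illustrative pattern only; a real list would need a proper review.
const SENSITIVE_NAME = /(private|secret|password|mnemonic|seed|token|key)/i;

// Summarizes method arguments for telemetry without ever copying their values.
function describeArgs(args: Record<string, unknown>): Record<string, string> {
  const summary: Record<string, string> = {};
  for (const name of Object.keys(args)) {
    // Only the JavaScript type is kept; sensitive-looking names lose even that.
    summary[name] = SENSITIVE_NAME.test(name) ? "[redacted]" : typeof args[name];
  }
  return summary;
}

// Made-up arguments for a swap-like method call.
const swapArgsSummary = describeArgs({ privateKey: "0xabc123", amount: 5, tokenIn: "DAI" });
```

Note that `tokenIn` is redacted too because it contains “token”: the heuristic deliberately errs on the side of hiding too much.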