Preface: this is a sensitive topic, where we must ensure users’ right to privacy is always respected.
The Polywrap project is growing rapidly and we need a way to use data to ensure that we’re directing our focus into areas that provide better user experiences.
Why do we need data?
Data helps us answer important questions in all areas of our project. Here are some examples:
Tech:
What are the most popular Polywrappers?
Which Polywrappers are typically used together?
Strategy & Operations:
Is our community active and growing?
Where do the top contributors come from? How often do they contribute and what backgrounds do they have?
A typical data analytics cycle
Here’s an example of what a data analytics cycle might look like:
Capture
Data acquisition
Data entry
Maintain
Data architecture
Data warehousing
Process
Data mining
Data modeling
Data summarization
Analyze
Prediction analysis
Qualitative analysis
Communicate
Data reporting
Data visualization
Decision making
Discussion
Let’s use this forum to discuss anything relating to Polywrap telemetry, including:
How would Polywrap go about collecting data?
Should we do data collection and analytics at all?
A few principles I think we can apply here:
1. Wherever we add telemetry events, users must “opt in” first; telemetry is off by default.
2. All telemetry events should be anonymized, lacking the concept of a “user” and instead being thought of as “anonymous sessions”.
3. When telemetry events are added, we should have some review process in place. This is standard practice at larger tech companies, where a “telemetry review process” tries to ensure nothing potentially harmful is collected from users.
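To make the first two principles concrete, here is a minimal sketch of an opt-in, session-scoped telemetry recorder. The names (`Telemetry`, `TelemetryEvent`, `record`, `pending`) are hypothetical and not part of any existing Polywrap API, and the sketch assumes a runtime that exposes the Web Crypto global `crypto.randomUUID()` (recent Node.js or a browser).

```typescript
// Hypothetical event shape: there is no user field, only a random session ID.
interface TelemetryEvent {
  sessionId: string;
  name: string;
  timestamp: number;
}

class Telemetry {
  // Random ID generated per instance; it identifies a session, never a person.
  readonly sessionId: string = crypto.randomUUID();
  private readonly optedIn: boolean;
  private readonly events: TelemetryEvent[] = [];

  // Telemetry stays off unless the caller explicitly passes `true`.
  constructor(optedIn: boolean = false) {
    this.optedIn = optedIn;
  }

  // Returns true if the event was recorded, false if telemetry is off.
  record(name: string): boolean {
    if (!this.optedIn) {
      return false;
    }
    this.events.push({
      sessionId: this.sessionId,
      name,
      timestamp: Date.now(),
    });
    return true;
  }

  // Events waiting to be sent (sending is out of scope for this sketch).
  pending(): readonly TelemetryEvent[] {
    return this.events;
  }
}
```

With this shape, `new Telemetry()` records nothing, while `new Telemetry(true)` queues events tagged only with the session ID. That same ID is what an error message could print so a trace can be looked up later.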
Some data I think we could be gathering from our own tools:
Hub - errors, page visits, search terms, favorites, playground queries, time spent, external links followed
CLI - errors, commands run, command-specific events
Client - errors, wrappers & methods called
Landing Page - errors, page visits, time spent, external links followed
Docs - errors, page visits, time spent, external links followed
Some data I think we can pull in from outside sources:
Development Package Downloads (NPM, etc)
Google Search Count
Pagerank Stats (number of external links to our websites)
Twitter Impressions
Social Sentiment
Some interesting use-cases:
A. Developer “Jill” has chosen to turn on anonymized telemetry for the Polywrap CLI. Jill encounters an error when building & testing their wrapper. The error message includes a “session ID” that can be used to look up a full trace of telemetry events for the session in question. Through the telemetry database’s public API / UI, anyone can now walk through the trace and try to understand what went wrong for Jill.
B. Polywrap contributor “Carl” wants to understand what the most popular wrappers are. Carl decides to aggregate:
Most viewed wrappers from Hub DB
Most queried wrappers from Client DB
Most imported wrappers from Hub / CLI DB
Most common search terms within the Hub
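As a rough illustration of Carl's aggregation, the sketch below merges per-source counts into a single ranking. The function name `rankWrappers`, the `Counts` type, and the wrapper names in the usage example are all made up for illustration. Summing raw counts across sources is also a simplifying assumption, since views, queries, and imports are on different scales and would probably need weighting or normalization in practice.

```typescript
// Hypothetical per-source tallies, keyed by wrapper name.
type Counts = Record<string, number>;

// Sums the counts for each wrapper across all sources and
// returns [wrapper, total] pairs sorted from most to least popular.
function rankWrappers(sources: Counts[]): Array<[string, number]> {
  const totals = new Map<string, number>();
  for (const source of sources) {
    for (const [wrapper, count] of Object.entries(source)) {
      totals.set(wrapper, (totals.get(wrapper) ?? 0) + count);
    }
  }
  return Array.from(totals.entries()).sort((a, b) => b[1] - a[1]);
}

// Usage with invented numbers: views from the Hub, queries from the Client.
const hubViews: Counts = { "wrapper-a": 5, "wrapper-b": 1 };
const clientQueries: Counts = { "wrapper-b": 10, "wrapper-c": 2 };
const ranking = rankWrappers([hubViews, clientQueries]);
```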
Some concerns that must be thought about:
If we’re capturing the values of input arguments for wrapper methods (e.g., the arguments to Uni’s swap(...) function), we must make sure users / developers are VERY aware of this. If a naive developer added a method argument named privateKey: String!, you can imagine what might happen next.
This should be something we already have in our docs: a VERY NOTICEABLE warning telling developers that they should not pass sensitive data into wrapper methods, and should instead use host plugins.
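One possible mitigation, sketched below, is to redact argument values whose names look sensitive before they ever reach a telemetry event. The helper `redactArgs` and the `SENSITIVE_NAME` pattern are hypothetical. A name-based heuristic like this is only a safety net: it cannot catch a secret passed under an innocent-looking name, so it would not replace the docs warning or the option of simply not capturing argument values at all.

```typescript
// Assumed list of suspicious name fragments; a real list would need review.
const SENSITIVE_NAME = /key|secret|password|mnemonic|seed|token/i;

// Returns a shallow copy of `args` with suspicious values replaced.
// Nested objects are not inspected in this sketch.
function redactArgs(args: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(args)) {
    safe[name] = SENSITIVE_NAME.test(name) ? "[REDACTED]" : value;
  }
  return safe;
}

// Usage: the private key is masked, the amount passes through unchanged.
const captured = redactArgs({ privateKey: "0xabc123", amount: 5 });
```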