Data at Basecamp with Jane Yang

Data and policy at Basecamp

Written by

Javi Santana

March 18, 2021

About Jane

Tell us about yourself and your background

The short version is I’m a Jane of all trades. :)

My academic training is in chemical engineering with a focus on sustainable energy and engineering biology. It wasn’t a particularly well-thought-out major. I’d just always enjoyed maths and sciences; I was lucky to have incredible teachers in those subjects and it also runs in the family: my parents worked as a doctor and engineer in China before immigrating to the US, where in order to survive they needed to pivot their careers significantly. In any case, studying engineering was when I graduated from analysis in Excel to picking up programming analysis languages and solidified a systems-level mindset.

During university, I got involved with Engineers Without Borders (EWB) and that ended up being incredibly formative. It was a practical way of addressing the unequal access to opportunities and resources I’d observed all throughout my life. That set me down a career path I’d never considered before.

My first full-time job after graduating university in 2011 was working as a grant writer for the International Rescue Committee — Kenya. I worked and lived in Nairobi, with visits to Kakuma and Dadaab refugee camps as well as Turkana. My job was to write with conviction. I learned how dehumanizing the typical grant application process was: it forced me to compress injustice into bite-sized metrics to convince those with funds that they needed to pay attention. I remember penning a paragraph that included mention of “X maternal deaths last month” and wondering about the lives of those mothers.

The IRC is an incredible organization doing necessary work, but I learned that I didn’t have the emotional fortitude to continue working solely in humanitarian crisis, i.e. life-saving. I needed to work on life-building.

I then spent a few years in Washington DC as a management consultant working with various US government agencies and multilateral organizations including USAID and the World Bank. A common theme of my work was elevating the use of analytics and data visualizations to complement interviews in informing decisions. I also taught courses for my colleagues on practical goal-setting that integrated selecting meaningful metrics to track.

I felt the pull to go back into direct service and transitioned to a Nairobi-based in-house strategy & research team at One Acre Fund, a social enterprise with systems-level thinking focused on reducing poverty amongst smallholder farmers. Here, I picked up R and introduced dashboarding tools for scenario analysis. I also solidified my thinking around the importance of having a triangle of data, intuition, and anecdotes to inform decision-making.

Personal factors led me to return to the US midwest where I’d grown up. It was at that point that I pivoted into a ‘pure data’ role. I helped stand up a dedicated data team at IFF, a leading community-development financial institution, and led the strategic work to build data systems for their future.

And then a certain job description came my way. Initially I wasn’t interested in applying myself given the job level (I was a Data Director at the time, and the position was for a Data Analyst) and my general skepticism of for-profit tech companies that stems from a deep distaste for the destructive capitalism so many practice. I sent it to a few friends I thought might be interested instead. Then a podcast episode made me reconsider and I decided to apply myself. That’s how I landed at Basecamp, where I currently work.

What’s your position at Basecamp, and what does your day look like?

I work on data and policy at Basecamp. It’s a pretty wide umbrella that covers everything from carbon footprinting to tax management to (anonymized and aggregate) product analysis. The throughline is examining human questions at a system level, and trying my best to balance the use of quantitative data, qualitative stories, and quintessential intuition in that endeavor.

In a typical six-week “cycle”, as we call it, I’m keeping eyes on how the business is doing financially; sharing snippets of information on new feature uptake; collaborating with various colleagues to evolve our data infrastructure; working across the company to level up our product policies to be clearer, more human, and more relevant to the context of society; and advocating internally for the company to set a higher bar on issues ranging from environmental sustainability to racial equity.


Data at Basecamp

What is the company about? What are you building?

Basecamp is a tech company with a soul. We build products that help people and teams do their best work. Our flagship eponymous product Basecamp is a collaborative project management tool. HEY is an email service that protects your privacy and attention.

In this Basecamp blog post there is a pretty good list of challenges to be solved using data. What are the main problems you’re tackling?

What I’m doing on the data analysis front ebbs and flows. When I first started, I spent a lot of energy establishing regular business reporting — including expense reporting, which was not previously analyzed regularly — and getting a handle on our sales tax responsibilities. There was also a backlog of marketing site A/B tests to analyze and I spent time doing deeper dives into some trends we were seeing on the business side. It also so happened that Basecamp experienced a series of severe outages right after I started (coincidental timing, not causal!) so I had to get oriented around a whole new set of data systems in rapid fire. Luckily I had very kind and knowledgeable colleagues, including Basecamper emeritus Justin White, who supported me along the way.

These days, the processes for business reporting and tax management are fairly established and we’re not doing as much A/B testing so I’m spending more time democratizing access to data and refocusing more time on building company awareness of how customer uptake of features aligns with the intended philosophy behind the design choices. It’s a very dynamic portfolio of work.

What are the ones still unsolved?

An ever-ongoing project is evolving our data infrastructure and data retention policies at Basecamp. We’re always working on striking the right balance between having insight from data analysis and holding ourselves to a high bar of data privacy. One of the first cross-team projects I worked on was establishing automatic account deletion for long-abandoned accounts. A year ago I worked with my colleague Adam Stoddard to excise all third party web tracking from our apps and websites. I’ve also had the pleasure of collaborating with my colleague Jorge Manrubia, who built a way for sensitive data to be encrypted at-rest for HEY that he’s now open-sourcing in Rails.
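
A retention policy like the account-deletion project can be reduced to a simple rule: if an account has been inactive past some threshold, it becomes eligible for automatic deletion. Here’s a minimal sketch of that selection step; the threshold, field names, and schema are invented for illustration and are not Basecamp’s actual policy:

```python
from datetime import datetime, timedelta

# Hypothetical retention threshold — the real policy's value is not public.
ABANDONMENT_THRESHOLD = timedelta(days=365)

def accounts_to_delete(accounts, now):
    """Return ids of accounts whose last activity is older than the threshold."""
    return [
        a["id"]
        for a in accounts
        if now - a["last_active_at"] > ABANDONMENT_THRESHOLD
    ]

now = datetime(2021, 3, 18)
accounts = [
    {"id": 1, "last_active_at": datetime(2019, 1, 1)},  # long abandoned
    {"id": 2, "last_active_at": datetime(2021, 2, 1)},  # recently active
]
print(accounts_to_delete(accounts, now))  # → [1]
```

In practice a job like this would also handle grace periods and customer notification before anything is actually deleted; the selection rule is just the starting point.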

We are also constantly deterring malicious actors: spammers, phishers, and the like. My colleagues Rosa Gutiérrez and George Claghorn are the heroes who handle most of these cases but I’ve been involved as well, particularly when we experience persistent surges. Figuring out how to proactively deter malicious actors without negatively affecting the experience of our legitimate customers is a constant project.

How is your data team[s] organized? People, roles, workflows. Do you use the same “Shape Up” process for internal features?

I’m the only person at Basecamp with “data” in their job title but data is by no means a monopoly and it takes a village to maintain and evolve our data infrastructure.

A lot of people at Basecamp have the ability to pull and analyze data. I never get offended when other people do data write-ups. In fact, I think it's cool there's such a high baseline level of data fluency at Basecamp. When I think about how I can add the most value to Basecamp, I consider where I have more unique abilities to contribute.

On the infrastructure side, our Operations and Security, Infrastructure, & Performance (SIP) teams do the heavy lifting. They both maintain our data infrastructure and monitor operational data: performance, throughput, etc.

In terms of work processes, I do plan out my work to some extent in six-week periods but with far more flexibility to respond to changing priorities and contexts.

How do you make decisions about how to measure something? For example, how do you decide which metrics to use to understand if a new feature is working or not.

Our default is to not add any extra measurement. Basecamp is proudly not "data-driven." Before late 2019, there was no data analyst at Basecamp. Since then, the company has considered data as one ingredient to inform decisions. We consider a variety of factors including the triangle of data, anecdotes, and intuition when making decisions.

When it comes to features, we’re mostly interested in one question: what’s the uptake? That generally can be queried from the application database. We focus more on high-level indicators, like signups and trial conversion rates, to tell us when our customers like something we’ve released.

So in order for us to rig up specific measurement for a particular feature or customer behavior, we really need to need it. It takes a pitch to instrument; it’s not a given.
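
Uptake questions like these can often be answered with one query against the application database, no new instrumentation required. A sketch of what that looks like, using an in-memory SQLite database as a stand-in (the table names, columns, and feature name are invented for illustration):

```python
import sqlite3

# Hypothetical schema: accounts, plus an event row per feature use.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY);
    CREATE TABLE feature_events (account_id INTEGER, feature TEXT);
    INSERT INTO accounts VALUES (1), (2), (3), (4);
    INSERT INTO feature_events VALUES
        (1, 'hill_charts'), (2, 'hill_charts'), (1, 'hill_charts');
""")

# Uptake = share of accounts that have used the feature at least once.
row = conn.execute("""
    SELECT CAST(COUNT(DISTINCT account_id) AS REAL)
           / (SELECT COUNT(*) FROM accounts)
    FROM feature_events
    WHERE feature = 'hill_charts'
""").fetchone()
print(f"uptake: {row[0]:.0%}")  # → uptake: 50%
```

The point of the sketch is that the data already exists as a side effect of the application doing its job; measuring uptake is a read, not an extra tracking pipeline.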

Could you explain a little bit more the “triangle of data, anecdotes, and intuition”?

There are three primary sources of information that should influence any given decision:

  1. Quintessential instinct: i.e. your gut, developed from years of experience and building internal values frameworks
  2. Quantitative data: aggregated information that tells you about signals within a crowd that you might have missed if you only asked a few folks
  3. Qualitative data: much more rich information about a select few experiences

When these three sources point to the same decision, rejoice! You're probably on the "right" track — at least based on what you know. When the sources point in different directions, though, it is probably worth digging in a little more on why they aren't aligned. Is it differences in assumptions? Biases? A mistake? Once you figure out the why (or if there isn't a way to really suss it out), determine what conscious trade-offs you are making and make your decision with awareness.

I've got an essay I want to write on the concept of data never having gotten its driver's license; whenever I write that, I'll send it along. :)

Looking forward to reading it, we will link it here when it's published

Update: here is the article.

Can you describe the high level data architecture?

Our applications run on MySQL databases. We’ve built home-grown logging and web analytics data pipelines that flow into a variety of tools for analysis and monitoring including BigQuery, Grafana, and Kibana.
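
A home-grown logging pipeline like this typically starts with applications emitting structured events as JSON lines, which downstream tools (a warehouse like BigQuery, dashboards like Grafana or Kibana) can then consume. A minimal sketch of that first step; the field names are assumptions, not Basecamp’s actual log format:

```python
import json

# Hypothetical event schema: every log line must carry these fields
# so downstream analysis and monitoring can rely on them.
REQUIRED_FIELDS = {"app", "event", "ts"}

def to_log_line(event: dict) -> str:
    """Serialize one application event as a single JSON log line."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event missing fields: {sorted(missing)}")
    return json.dumps(event, sort_keys=True)

line = to_log_line({
    "app": "hey",
    "event": "message_sent",
    "ts": "2021-03-18T00:00:00Z",
})
print(line)
```

Validating the schema at the point of emission is what keeps the downstream tools simple: each consumer can assume well-formed, consistent records instead of defending against every producer’s quirks.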

Why "home-grown" tools instead of external products?

It's a balance of prioritizing privacy, having control for maintenance, and cost.

We've found many third-party options make sense if you constantly rely on data as part of your product development, for instance, but for our purposes, sticker shock often prompts us to ask: what could we build simply in-house that gets the job done?

That all said, data tooling is not our bread and butter, and we don't always have the in-house resources to do a ton of development and maintenance. So when there are third-party tools that do provide useful analytics for us in a way that meets our privacy bar and makes sense cost-wise for our use case, we do look into them. A couple of examples: for error tracking analytics we use Sentry, and we're in the midst of retiring our homegrown web analytics pipeline in favor of a couple of third-party options that together give us the functionality we need. One of those is the privacy-forward Plausible.


Tell us some interesting numbers (traffic, rows, gigabytes you store, process and so on).

We store and process TB-scale data, have millions of visitors to our websites each year, and operate 9 applications until the end of the Internet… all with just 59 Basecampers.

What are the most interesting, unexpected, or challenging lessons that you have learned at Basecamp? I’m a big fan of Basecamp and I know you do things in a different way so I guess the way you approach analytics is also different.

We often consider how we can do less of something at Basecamp. For instance, our in-house JavaScript experts spend more of their time working on ways to reduce the amount of JavaScript that needs to be written. Somewhat similarly, I spend a lot of time thinking about how to not analyze data. When we need to analyze data, we can and do, but more often we ask ourselves: do we really need it?

What trends do you see in data engineering/analytics?

There’s tension around the soul of analytics. That age-old question of not “can we?” but “should we?” I’m heartened to see more people asking the latter and advocating that data practitioners see that our jobs are not neutral. So often, a data analyst holds the choice on whether to guide an initiative down a generative or destructive path. We need to take ownership of that. We have to teach that the first step to a project isn’t figuring out which cool new algorithm we want to use, but what are the human implications of this question?

Another trend I see is an increasing awareness of privacy, and the fact that “free” generally means “paid with my personal data.” It’s becoming mainstream, which is both exciting and poses a new challenge: how to establish different financing models for essential digital services in an equitable way. Most people can’t afford to pay cash for 100 subscription services a month, which is part of how we got where we are now with the data-fueled Internet.




What are your favorite data tools?

R is my go-to tool. I love its versatility and there’s a warm community including the folks behind the tidyverse and R-Ladies.

A lot of data professionals give spreadsheets grief but I think they’re often the right tool in terms of visualizations and accessibility when working with small datasets.

I also strongly believe that staying connected to the humans represented in the data you work with is essential for all data practitioners. This is the work of soul management and without it, “data” becomes desolate.

Some blogs, books, podcasts, people related to data you read/follow?
  • Data & Society, a think tank that studies the social implications of data-centric technologies & automation.
  • Nathan Yau, a thoughtful data visualizer, via his blog Flowing Data
  • Vicki Boykis, a data scientist and tech industry commentator, via her newsletter Normcore Tech
  • Randy Au, a data scientist, via his newsletter about the less sexy but important side of data sciences Counting Stuff
  • The Justice Download and The Markup for news that keeps the human implications of data and technology centered 
  • Data for Black Lives, a movement with clarity that comes from real talk. Recommended with the self-introspection of: "in the spaces I'm in and the products I make, whose safety am I prioritizing? Who gets affected and how by the choices I'm making?"



What other companies are you curious about how they manage their data?

I’m curious whether there’s been meaningful headway on improving the interoperability of health data systems. There are a lot of privacy issues, of course. Still, it shouldn't be so hard for people to move from one medical provider to another. There's a lot of work to be done on the data engineering side to make an often extremely stressful experience easier for people.