Is AI the Next Crypto? Insights from 2M HN comments

Kyle Corbitt

Nov 8, 2023

OpenPipe is a platform that makes it easy to convert your existing prompts and completions into a task-specific fine-tuned model. You can get started in a few minutes, and the fine-tuned model is often cheaper, faster, and more accurate than the prompt you started with.

Plenty of observers have noted the similarities between crypto’s frothy history and the current boom in AI. Crypto seems to have busted, at least for the moment — is AI next?

Just for fun, I decided to analyze some data on the question. Both crypto and AI have been heavily debated on Hacker News, with discussions going back years. By looking at trends in HN commenter opinions we might find interesting similarities and differences.

Gathering the data

Note: if you want to follow along at home, I uploaded the public dataset of all HN posts and comments here. I also included all the code used in this analysis here, but be warned: it's research-grade at best!

To get started, I used the HN API to pull all 38 million posts and comments in HN history. Using Polars, we can narrow the dataset to only include front-page stories since 2010:

stories = hn.filter(
    (pl.col("type") == "story")
    & (pl.col("text").is_null())
    & (pl.col("url").is_not_null())
    & (pl.col("score") >= 30)
    & (pl.col("descendants") >= 5)
    & (pl.col("time") >= datetime(2010, 1, 1))
)
# -> 285,603 stories

285K stories is still a lot to go through! A simple approach to sorting these posts into categories might involve a keyword search for strings like “Bitcoin,” “Blockchain,” “Crypto,” etc. But that would produce a lot of false negatives — stories like Technical Analysis of the Poly Network Hack and Kimchi: The latest update to Mina’s proof system would easily get missed.
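To make the failure mode concrete, here's a minimal sketch of what such a keyword filter might look like. The keyword list and function name are illustrative assumptions, not code from this analysis:

```python
import re

# Hypothetical keyword list for illustration -- a production filter would
# need far more terms, and would still miss stories like the two above.
CRYPTO_KEYWORDS = ["bitcoin", "blockchain", "crypto", "ethereum"]
pattern = re.compile("|".join(CRYPTO_KEYWORDS), re.IGNORECASE)

def is_crypto_by_keyword(title: str) -> bool:
    """Return True if any keyword appears in the story title."""
    return bool(pattern.search(title))

print(is_crypto_by_keyword("Bitcoin hits a new all-time high"))  # True
# False negatives: clearly crypto-related, but no keyword present
print(is_crypto_by_keyword("Kimchi: The latest update to Mina's proof system"))  # False
```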

Throwing GPT-3.5 at the problem

Instead of naive string matching, let’s use an LLM! I prepared a dataset of all 285K front-page stories along with the top comment on each, since sometimes the story title alone doesn’t provide enough information. Let’s roughly calculate the cost of classifying this dataset using GPT-3.5:

avg_chars = (
    stories["title"].str.len_chars().mean()
    + stories["url"].str.len_chars().mean()
)

# A token averages 4-5 chars in English
avg_input_tokens = avg_chars * 0.2

# Add extra tokens for the task instructions
avg_input_tokens += 100

total_input_tokens = avg_input_tokens * len(stories)

# Simple task to classify across both categories, assume 20 output tokens
total_output_tokens = 20 * len(stories)

input_token_price = 0.001 / 1000
output_token_price = 0.002 / 1000

approx_cost = (
    total_input_tokens * input_token_price + total_output_tokens * output_token_price
)
The total cost comes to $86 — not bad to classify every HN front page story for the last 13 years!

Once we’ve classified these stories, we can use Polars and Seaborn to graph the popularity of AI and blockchain as fractions of all stories posted:
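The grouping behind that graph can be sketched in a few lines. The real version uses Polars and Seaborn on the full dataset; this toy stand-in (with made-up `(month, category)` pairs) just shows the per-month fraction computation:

```python
from collections import Counter

# Toy stand-in for the classified stories: (year_month, category) pairs.
classified = [
    ("2023-01", "ai"), ("2023-01", "crypto"), ("2023-01", "other"),
    ("2023-02", "ai"), ("2023-02", "other"),
]

# Count all stories per month, and AI stories per month
totals = Counter(month for month, _ in classified)
ai = Counter(month for month, cat in classified if cat == "ai")

# Fraction of front-page stories classified as AI, per month
ai_fraction = {month: ai[month] / totals[month] for month in sorted(totals)}
print(ai_fraction)
```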

Cool! I noticed a couple of surprises on this graph:

  • With the exception of the initial 2013-2014 crypto wave, ML consistently occupied a larger fraction of front-page stories over the last 13 years.

  • There was a large, long-lived bump in AI/ML stories that peaked in 2018 and tailed off until 2022, when the current wave of LLM hype began.

Vibes-based commenting

Ok, fraction of all posts is an interesting metric, but I’d love to know how HN commenters feel about the topics… is there a way to answer that question (without breaking the bank)?

First off, I prepared a dataset of all comments on stories that were classified as either crypto or AI-related. That gets me 2.1M comments. Classifying sentiment toward a particular topic is probably too complex for a classical sentiment analysis model (someone could write an angry comment defending crypto), so let’s try GPT-3.5 again. To test, I uploaded 4000 comments to a dataset on OpenPipe*, labeled them with both GPT-3.5 and GPT-4 (you can see the prompt I used here), and used the evals page to compare the outputs side by side:
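For flavor, here's the rough shape of such a classification request. This is not the exact prompt from the post (that one is linked above) — the wording and label set are assumptions for illustration, and the resulting messages list would be passed to the chat completions API:

```python
# Sketch of a per-comment sentiment classification request. The system
# prompt below is a hypothetical approximation, not the one actually used.
def build_messages(comment: str, topic: str) -> list[dict]:
    system = (
        f"Classify the sentiment of the following Hacker News comment "
        f"toward {topic}. Respond with exactly one word: "
        "positive, negative, or neutral."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": comment},
    ]

messages = build_messages("This defense of crypto is all wrong.", "crypto")
print(messages[0]["role"])
```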

You can see at the top right that GPT-3.5 agrees with GPT-4 (labeled as “Original Output”) only 71.5% of the time. That’s not great, since there are only 3 possible classes. Using filters, I can quickly find the outputs that don’t match, and GPT-4’s answers are clearly better most of the time. Ok fine, how much would it cost to just use GPT-4 then? After all, OpenAI just released price cuts!

# Found on OpenPipe
average_input_tokens = 446
average_output_tokens = 20

input_token_price = 0.01 / 1000
output_token_price = 0.03 / 1000

total_cost = (
    input_token_price * average_input_tokens + output_token_price * average_output_tokens
) * len(crypto_ai_comments)

Ok, GPT-4 can get the job done for… $10K. Hmm. We could stick with GPT-3.5, but its accuracy isn’t great. And it would still cost $1K, for less trustworthy results.

We do have another option though, which is to fine-tune our own model! Fine-tuning lets a much smaller model match or beat the accuracy we'd get from a general-purpose prompted model. Smaller models mean faster inference and much lower prices. If you're interested in learning more or doing fine-tuning on your own, we created a guide that you can find here. The OpenPipe platform also makes creating a fine-tuned model incredibly easy — it's literally 2 clicks once your dataset is generated.

In this case we'll fine-tune a Mistral 7B model on the dataset. Mistral 7B is one of the strongest LLMs in its size class right now, beating the similarly-sized Llama 2 variant on most benchmarks.

I went ahead and did that fine-tune, and added it to the evals page. And since it’s using the same dataset as before, we can compare it head to head with the GPT-3.5 prompt from earlier. We’ve improved our match rate with GPT-4 from 71.5% to 87.8% — not bad!

Of the 12.2% of cases where our fine-tune still disagreed with GPT-4, I quickly reviewed a few dozen. In many of them there really are multiple “correct” answers — for example, GPT-4 classified this comment as positive towards AI, and the fine-tune classified it as neutral. I think both answers are defensible.

Just the Facts

Ok, let’s plot user sentiment over time! On this graph each negative comment counts as -1 and each positive comment as +1, and I compute a rolling average over a 6-month window.
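The scoring and smoothing can be sketched as follows. The real version uses Polars' time-based rolling windows over 2.1M comments; this toy uses a fixed-size window over a handful of pre-bucketed scores:

```python
# Map sentiment labels to scores: positive -> +1, negative -> -1, neutral -> 0
LABEL_SCORE = {"positive": 1, "negative": -1, "neutral": 0}

def rolling_mean(scores: list[float], window: int = 6) -> list[float]:
    """Trailing rolling average; early entries use a shorter window."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

labels = ["positive", "negative", "positive", "neutral", "negative", "positive"]
scores = [LABEL_SCORE[label] for label in labels]
print(rolling_mean(scores, window=3))
```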

Interesting! A couple of observations:

  • While sentiment towards AI has always been higher than sentiment towards crypto, AI sentiment steadily dropped from 2010 to 2018 and has been hovering around neutral for the last 5 years.

  • Crypto sentiment has a much weaker correlation with the crypto hype cycle than I’d expected. Although it seems like right now not many commenters are popping up to defend FTX/SBF. 😂

Ok so admittedly, we haven’t come away with a clean answer to the question in the title (although for the record, I think the answer is “no”). But we’ve found some interesting trends in the stories posted on the front page of HN, as well as the way the average commenter’s opinion has shifted over time. And hopefully we’ve also learned more about how modern LLM-based tools can solve problems in a few hours that only a couple years ago would have taken an ML team weeks!

Addendum: HN Sentiment Overall

HN user dragonwriter noted that up until recently, sentiment towards crypto and AI was relatively strongly correlated, and wondered if that was an artifact of HN sentiment just getting more negative overall.

I had actually already run the same analysis for Rust and remote work when preparing this post, and removed them in the interest of brevity. Here's the comment sentiment graph with those added back in:

Interestingly, there is in fact a noticeable downward slope in average sentiment over time for those topics as well, although sentiment toward both remains far more positive than sentiment toward either AI or crypto.