
AI Is Strip Mining the Web

THE EXTENSION

Exploring topics beyond our day-to-day coverage.


Source: Reddit.

Earlier this week, I went on a bit of a rant about artificial intelligence during Ruminate, the bi-weekly podcast I do with Robb Knight. It’s a fun show about the Internet, snacks, and videogames that I’d love for you to check out sometime if you haven’t.

I bounced those ideas off of Robb specifically to work through what I expected would be this column for Weekly. Talking through an idea always crystallizes it in my mind, and in light of Robb's feedback and hearing from listeners throughout the week, I thought I'd break those ideas down a little further today.

I went into this week's episode of Ruminate with the notion that no technology in recent memory simultaneously fascinates and repulses me as much as artificial intelligence. That's sort of true, but the reality is more complex. The issue isn't the technology itself; it's how the technology is being used and how we got to where we are with it.

Proofing this article using Speechify.

I’m not going to survey all the uses of artificial intelligence here, but there’s a lot of good being done in medical research and other scientific pursuits with AI. Closer to home, I’m a big fan of the more natural text-to-speech engines I use to listen to web articles and proofread my writing. However, as we’ve seen with fake political robocalls, there’s harm that can be done when that technology is used for voice cloning, too.

What bothers me the most, though, is how AI companies have been built and the way they’re now jockeying to become the web’s gatekeepers. I suppose that’s not shocking given what I do for a living.

Let’s start with artificial intelligence companies’ origins. These companies have attracted a lot of money and are highly valued partly because they haven’t paid for the raw material on which they’re built. It’s like a car manufacturer not having to pay for engines anymore. It’s not the only cost of building a car company, but it’s a big, valuable one. For OpenAI, Perplexity, and many others, the ‘engine’ they get for free is the web’s content. AI companies’ LLMs are built from decades of creativity and hard work that was made for people, not bots, and that doesn’t seem right.

This is where we get into the whole topic of ‘fair use,’ a legal concept under US copyright law that many AI companies rely on in defense of scraping the web. Fair use is why it’s okay to save an article from MacStories in a read-later app but not to republish it on another website wholesale. That’s an easy example, but it gets a lot trickier with things like LLMs that weren’t around when copyright law was enacted.

I imagine AI companies feel that what they do is similar to Google indexing the web. Google crawls the web, making copies of content it comes across, to index everything for its search engine. And, while Google is certainly not innocent when it comes to skirting close to—or over—the fair-use line, if you’re looking strictly at its search engine, the copies it makes have historically returned some value to websites through discovery.

That’s where I’d argue that AI companies are different. Products like chatbots and summaries of topics and articles are meant to replace the web, not allow users to discover it. Some companies use annotations alongside AI-generated results as a counterbalance, but it strikes me as window dressing to make chatbots feel like search engines, which they really aren’t.

Finally, I want to expand on my comments about the Arc browser. As a browser, I think it has some interesting features and ideas about what modern web browsing can be. Many of those features aren’t for me, but I understand why others like them.

However, I'm not so sure The Browser Company is, in fact, a browser company. Maybe it started out that way, but recently, its CEO floated the idea of making Arc Search's pinch-to-summarize feature the default view when visiting certain kinds of websites. Here's Federico's 8,200-word MacPad story boiled down to four unremarkable sentences:

Think about what it would be like for that to be the default view in Arc. It’s not a MacStories.net webpage. It’s an auto-generated page that’s an answer to a question. It’s also a page that’s so far removed from the notion of browsing the web that I’m not sure Arc could even be called a web browser anymore if that were the default view. Instead, it’s a parallel, auto-generated web built on top of the existing web that acts as a gatekeeper to the source material.

It’s that intermediate layer that makes me suspicious of Arc’s motives. The company is starting to feel like a wolf in sheep’s clothing, a browser with an interesting, playful design on the surface and a dark heart lurking beneath. My suspicions are only compounded by the fact that The Browser Company hasn’t so much as hinted publicly about what its business plan is. Does it hope to become the front end for an AI company’s technology? I suspect it does, but no one knows for sure.

I'll wrap up with something I read recently by Adam Newbold of Neatnik, which Robb linked to on his website:

Something to try the next time you’re considering a product or a service: spend a little time thinking about the motive behind it. It’s something that’s easy to skip entirely, especially when we’re busy evaluating a thing by its far more obvious and surface-level criteria. But motive is really important, and I think it can be an effective filter for making good decisions about how we spend our money and attention (and whom we support along the way).

I think that's pretty good advice, and it's why I've let go of my Arc FOMO and put the app aside. I'll check back in now and then to see if The Browser Company is on a better path, but for now, it's not an approach I can support.

My thoughts on Arc notwithstanding, I don’t mean to make anyone feel bad about using the app. This sort of calculus is different for everyone. For instance, I still use Gmail, knowing that I’m the product, not the customer. Does that undermine where I draw the line with Arc? I don’t think so, but I’m sure some people would conclude it does.


I suppose this all sounds pretty dire and bleak, and to be sure, it’s not hard to see that the web is at an inflection point. AI spam is flooding the web with junk, and online media companies are laying off legions of people or closing down entirely. Those are real, tangible losses for the web.

However, I remain optimistic that no matter how good AI gets, it won’t ever displace the human creativity that has found a home on the Internet. The web is changing in a fundamental way. Still, perhaps paradoxically, the more AI replaces it with a generic top layer, the more I think people will seek out a more authentic web built by people whose ideas they want to hear and whose motives they trust.