canmom:

youzicha:

canmom:

Hypothetical Decentralised Social Media Protocol Stack

if we were to dream up the Next Social Media from first principles we face three problems. one is scaling hosting, the second is discovery/aggregation, the third is moderation.

hosting

hosting for millions of users is very very expensive. you have to have a network of datacentres around the world and mechanisms to sync the data between them. you probably use something like AWS, and they will charge you an eye-watering amount of money for it. since it’s so expensive, there’s no way to break even except by either charging users to access your service (which people generally hate to do) or selling ads, the ability to intrude on their attention to the highest bidder (which people also hate, and go out of their way to filter out). unless you have a lot of money to burn, this is a major barrier.

the traditional internet hosts everything on different servers, and you use addresses that point you to that server. the problem with this is that it responds poorly to sudden spikes in attention. if you self-host your blog, you can get DDOSed entirely by accident.

scaling hosting could theoretically be solved by a model like torrents or IPFS, in which every user becomes a ‘server’ for all the posts they download, and you look up files using hashes of the content. if a post gets popular, it also gets better seeded! an issue with that design is archival: there is no guarantee that stuff will stay on the network, so if nobody is downloading a post, it is likely to get flushed out by newer stuff. it’s like link rot, but it happens automatically.

IPFS solves this by ‘pinning’: you order an IPFS node (e.g. your server) not to flush a certain file so it will always be available from at least one source. they’ve sadly mixed this up in cryptocurrency, with ‘pinning services’ which will take payment in crypto to pin your data. my distaste for a technology designed around red queen races aside, I don’t know how pinning costs compare to regular hosting costs.

theoretically you could build a social network on a backbone of content-based addressing. it would come with some drawbacks (posts would be immutable, unless you use some indirection to a traditional address-based hosting) but i think you could make it work (a mix of location-based addressing for low-bandwidth stuff like text, and content-based addressing for inline media). in fact, IPFS has the ability to mix in a bit of address-based lookup into its content-based approach, used for hosting blogs and the like.

as for videos – well, BitTorrent is great for distributing video files. though I don’t know how well that scales to something like Youtube. you’d need a lot of hard drive space to handle the amount of Youtube that people typically watch and continue seeding it.

aggregation/discovery

the next problem is aggregation/discovery. social media sites approach this problem in various ways. early social media sites like LiveJournal had a somewhat newsgroup-like approach, you’d join a ‘community’ and people would post stuff to that community. this got replaced by the subscription model of sites like Twitter and Tumblr, where every user is simultaneously an author and a curator, and you subscribe to someone to see what posts they want to share.

this in turn got replaced by neural network-driven algorithms which attempt to guess what you’ll want to see and show you stuff that’s popular with whatever it thinks your demographic is. that’s gotta go, or at least not be an intrinsic part of the social network anymore.

it would be easy enough to replicate the ‘subscribe to see someone’s recommended stuff’ model, you just need a protocol for pointing people at stuff. (getting analytics such as like/reblog counts would be more difficult!) it would probably look similar to RSS feeds: you upload a list of suitably formatted data, and programs which speak that protocol can download it.
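as a rough sketch of that idea (not a real protocol: the JSON layout, the field names and the `make_feed`/`merge_feeds` helpers are all made up for illustration), a feed could be a small document listing content hashes, and a client could merge the feeds it follows into one timeline:

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    """Hypothetical content address: hex SHA-256 of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def make_feed(author: str, entries: list) -> str:
    """Serialise a feed as JSON, newest entry first."""
    ordered = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
    return json.dumps({"author": author, "entries": ordered})


def merge_feeds(feed_docs: list) -> list:
    """Combine several downloaded feeds into one timeline, newest first."""
    timeline = []
    for doc in feed_docs:
        feed = json.loads(doc)
        for entry in feed["entries"]:
            # Remember whose feed each entry came from.
            timeline.append({**entry, "shared_by": feed["author"]})
    return sorted(timeline, key=lambda e: e["timestamp"], reverse=True)


post = b"hello from the hypothetical network"
alice = make_feed("alice", [{"timestamp": 100, "content": content_id(post)}])
bob = make_feed("bob", [
    {"timestamp": 150, "content": content_id(post)},  # bob "reblogs" alice's post
    {"timestamp": 200, "content": content_id(b"bob's own post")},
])
timeline = merge_feeds([alice, bob])
```

the feed only points at posts by hash, so the same post shared by two people shows up with the same `content` value, and the client can fetch it from whoever has a copy.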

the problem of discovery – ways to find strangers who are interested in the same stuff you are – is more tricky. if we’re trying to design this as a fully decentralised, censorship-resistant network, we face the spam problem. any means you use to broadcast ‘hi, i exist and i like to talk about this thing, come interact with me’ can be subverted by spammers. either you restrict yourself entirely to spreading across a network of curated recommendations, or you have to have moderation.

moderation

moderation is one of the hardest problems of social networks as they currently exist. it’s both a problem of spam (the posts that users want to see getting swamped by porn bots or whatever) and legality (they’re obliged to remove child porn, beheading videos and the like). the usual solution is a combination of AI shit – does the robot think this looks like a naked person – and outsourcing it to poorly paid workers in (typically) African countries, whose job is to look at reports of the most traumatic shit humans can come up with all day and confirm whether it’s bad or not.

for our purposes, the hypothetical decentralised network is a protocol to help computers find stuff, not a platform. we can’t control how people use it, and if we’re not hosting any of the bad shit, it’s not on us. but spam moderation is a problem any time that people can insert content you did not request into your feed.

possibly this is where you could have something like Mastodon instances, with their own moderation rules, but crucially, which don’t host the content they aggregate. so instead of having ‘an account on an instance’, you have a stable address on the network, and you submit it to various directories so people can find you. by keeping each one limited in scale, it makes moderation more feasible. this is basically Reddit’s model: you have topic-based hubs which people can subscribe to, and submit stuff to.

the other moderation issue is that there is no mechanism in this design to protect from mass harassment. if someone put you on the K*w*f*rms List of Degenerate Trannies To Suicidebait, there’d be fuck all you can do except refuse to receive contact from strangers. though… that’s kind of already true of the internet as it stands. nobody has solved this problem.

to sum up

  • primarily static sites ‘hosted’ partly or fully on IPFS and BitTorrent
  • a protocol for sharing content you want to promote, similar to RSS, that you can aggregate into a ‘feed’
  • directories you can submit posts to which handle their own moderation
  • no ads, nobody makes money off this

honestly, the biggest problem with all this is mostly just… getting it going in the first place. because let’s be real, who but tech nerds is going to use a system that requires you to understand fuckin IPFS? until it’s already up and running, this idea’s got about as much hope as getting people to sign each others’ GPG keys. it would have to have the sharp edges sanded down, so it’s as easy to get on the Hypothetical Decentralised Social Network Protocol Stack as it is to register an account on tumblr.

but running over it like this… I don’t think it’s actually impossible in principle. a lot of the technical hurdles have already been solved. and that’s what I want the Next Place to look like.

This is something that I have been daydreaming about for a long time also. I agree by far the biggest problem would be to actually get people to use it, but still it’s interesting to think about the technical issues…

I think aiming specifically to “recreate tumblr” actually helps with some of the questions. If people routinely reblog posts, then it would be natural for them to also “seed” those posts, giving some redundancy. The client could store posts that you have viewed locally, so that they don’t go away too easily and you can reblog them later if the links rotted.

Also, the way to discover content/users on tumblr is that you see it reblogged by someone you follow, so there is no recommendation algorithm that can be manipulated. There is a trade-off between privacy and discoverability: if (like Twitter) likes and follows are public, then anyone can make “client-side” recommendations based on “liked by somebody who is followed by many users that you follow”, etc.
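To make the "client-side" idea concrete, here is a minimal sketch, assuming follows and likes are published as plain public data. The `recommend` function and the dictionary layout are hypothetical, and the scoring rule is just one possible reading of "liked by somebody who is followed by many users that you follow".

```python
from collections import Counter


def recommend(me, follows, likes, top_n=3):
    """Rank posts liked by accounts that many of my follows also follow."""
    my_follows = follows.get(me, set())
    # Weight each second-hop account by how many of my follows follow it.
    weight = Counter()
    for friend in my_follows:
        for account in follows.get(friend, set()):
            if account != me:
                weight[account] += 1
    # A post's score is the summed weight of the accounts that liked it.
    scores = Counter()
    for account, w in weight.items():
        for post in likes.get(account, set()):
            scores[post] += w
    return [post for post, _ in scores.most_common(top_n)]


follows = {"me": {"a", "b"}, "a": {"c", "d"}, "b": {"c"}}
likes = {"c": {"post1"}, "d": {"post2"}}
suggestions = recommend("me", follows, likes)
```

Since this runs entirely on the reader's machine over public data, each user could swap in whatever ranking rule they prefer.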

Making follower/following lists public would also have a nice bonus effect on direct messaging. You can always sign and publish the public keys of anyone you interact with, to construct a PGP-style web of trust. This system would be really resistant to eavesdropping. As soon as you knew even a single correct identity (e.g. because someone emailed it or published it on their web page or gave you a physical business card), then any attempt to man-in-the-middle you would instantly unravel. We could have secure communications without needing a centralized certificate authority.

Apart from data availability, I think some other problems are:

Naming. One problem with P2P systems is that it’s hard to create globally unique nicknames. I want to be “youzicha”, but without a central party, how can you enforce that nobody else uses the same nickname? Actually, nowadays you can use a blockchain to do it, but this is pretty heavy-handed: you would need to include some kind of rationing or payments or proof-of-work to prevent people from immediately nickname-squatting every short name. I think it’s better to give up on unique names altogether, so that people’s unique identifier is just their public key, and then they can publish whatever metadata they like to make themselves easier to find. ICQ used to work this way, with users being identified by just a number but no human-readable nickname.

Anonymization. We don’t think about it so often, but one service that centralized companies provide to us is to act as anonymizing proxies. It works both ways: I can publish this tumblr pseudonymously as “youzicha” without disclosing my real-world identity, and also I can look at people’s posts on Twitter and Tumblr without them being aware of it. If everything was purely P2P, you would see each page view (and the IP address of the person who made it) in real time, which seems like a nerve-wracking experience.

I think this is a genuine advance: back in the old Usenet days people generally posted under their full government names, which maybe worked well because Usenet as a whole was a kind of subculture, but now people constantly doxx each other and having the wrong political opinions can damage your career. (Cf. the debate surrounding Facebook, Google+ and their “real names” policy.) If the system doesn’t provide anonymity it seems important to at least make this fact very clear in the user interface, otherwise users could get burnt. Maybe automatically do some IP geo-lookups to illustrate the kind of information it leaks.

Blocklists, spam, harassment. As you noted above this seems like a big problem.

But if implemented well it could be a selling point, because the current solutions are so disliked. On the being-censored side, sites like Hacker News and Twitter play weird mind games to secretly shadow-ban you, which feels disrespectful. On the censoring side, people who deal with a lot of incoming messages find the current blocking solutions too blunt. If you provided an elaborate (Turing complete?) policy language, a thousand flowers could bloom: shared blocklists, “topics” like Usenet newsgroups which anyone can post to, and then “overlay” newsgroups which are moderated, etc. Popular bloggers could do the Luna thing where you have to pay them (using some cryptocurrency) to see your message.

People could publish their rules for receiving messages, which would serve several purposes. First, clients can avoid routing messages which would be discarded anyway (a kind of distributed DoS-protection, as a replacement for Cloudflare). Second, your client software can usefully advise you (“sorry, because of spam rules this message cannot be sent to PopularBlogger. In order to unblock it, do one of (1) build up a posting history of n messages on X Forum, (2) have your message approved by a moderator in group Z, (3) get a friend-of-a-friend introduction from one of the following people, …”). And most importantly, you can performatively block Nazis and post really elaborate DNI lists.
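A published rule set like the PopularBlogger example could be checked on the sender's side before anything is sent. This is only a sketch under my own assumptions: `Sender`, `ReceivePolicy` and their fields are hypothetical names, and a real policy language would need to be far more expressive.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    name: str
    post_counts: dict = field(default_factory=dict)  # forum name -> posts made
    approved_by: set = field(default_factory=set)    # moderator groups that approved the message
    introducers: set = field(default_factory=set)    # people who vouched for the sender


@dataclass
class ReceivePolicy:
    forum: str
    min_posts: int
    moderator_group: str
    trusted_introducers: set

    def ways_to_unblock(self, sender: Sender) -> list:
        """Return [] if the message is deliverable, else the sender's options."""
        if sender.post_counts.get(self.forum, 0) >= self.min_posts:
            return []
        if self.moderator_group in sender.approved_by:
            return []
        if sender.introducers & self.trusted_introducers:
            return []
        return [
            f"build up a posting history of {self.min_posts} messages on {self.forum}",
            f"have your message approved by a moderator in {self.moderator_group}",
            "get an introduction from one of: " + ", ".join(sorted(self.trusted_introducers)),
        ]


policy = ReceivePolicy("X Forum", 10, "group Z", {"alice", "bob"})
stranger = Sender("stranger")
regular = Sender("regular", post_counts={"X Forum": 25})
introduced = Sender("friend-of-a-friend", introducers={"bob"})
```

Because the policy is public, any relaying client can run the same check and drop messages that would be discarded anyway.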

Beheading and child abuse videos. I think this is a bigger problem than “if we’re not hosting any of the bad shit it’s not on us”, because if a social media system is truly censorship-resistant the government will not allow it to exist for long. Interestingly, this goes against some of the other desiderata: you’d want it to not be anonymous, to make it easy for the police and/or online vigilantes to chase down criminals. And you might want content to not be discoverable. (E.g. if you use BitTorrent Mainline DHT you maybe interact with people who search for bad things, but since they only provide a SHA-1 hash you never know.)

re:

#I have none of the technical knowledge to assess this ideas merits#but it totally captures what i’d need from a network#…except the no deletes thing ig#that one might be rough.

so when i say ‘no deletes’ it’s a little complicated. since this post is getting a little traction, let me explain some of the technical stuff in more detail.

the way content-based-addressing works is, instead of linking to a place on the internet (a specific server), you use something called a hash that’s computed from the file itself.

for example, let’s say you have a picture you want to share over IPFS. this picture, say.

[image: luciano.webp]

If I have a copy of this picture, I can run it through something called a hash function. The hash function essentially scrambles the data pseudo-randomly into a fixed-size output, called the hash, which is usually much smaller than the actual data. For example, the SHA-256 hash of luciano.webp here is, in hexadecimal,

31F7D77DE068047411F241209C3822F0AB6CB81DC508C6C9D83B64F38F99556A

This is only 256 bits, much smaller than the actual 106KB file.

The way hash functions work, it’s very very very unlikely for two files to have the exact same hash. You can’t ‘work backwards’ from the hash to the original picture, there’s not enough data in the hash, but if you know what the hash is, and someone sends you something they claim is luciano.webp, you can very easily verify that it’s (almost certainly) the picture you’re looking for.
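Here's a minimal sketch of that verification step using Python's standard `hashlib`. The bytes below are a stand-in, not the real luciano.webp, and `sha256_hex` and `verify` are just illustrative helper names.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Check that received bytes match the hash we were looking for."""
    return sha256_hex(data) == expected_hex.lower()


original = b"stand-in bytes for luciano.webp"  # placeholder, not the actual picture
address = sha256_hex(original)                 # this is what you would share

tampered = original + b"!"                     # someone sends you the wrong file
genuine_ok = verify(original, address)
tampered_ok = verify(tampered, address)
```

The hash is always 64 hex characters (256 bits) no matter how big the file is, and changing even one byte of the file gives a completely different hash.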

So the idea of all this is that you start with the hash and track down someone who has the original file and get it from them.

The way BitTorrent originally worked is that there’s a computer called a ‘tracker’, which keeps track of everyone who has a copy of luciano.webp. You can say to the tracker ‘hey I’m looking for luciano.webp, who has it?’ and the tracker will send you a list, and then you can ask each of them for a copy. A torrent file is nothing more than a list of hashes and a list of trackers.

However, sometimes the tracker will go down. It’s a single point of failure. But there’s a way around this problem…

There’s a very clever bit of tech that BitTorrent adopted, called a Distributed Hash Table (DHT). This makes it so that every computer on the network can be a tracker. The hash itself is used as an address to look up the computers keeping track of who has luciano.webp.

So when you join the network, you will also become a tracker for certain files. You don’t know what those files are, since it’s all based on hashes. What’s great about this is that if a tracker goes down, another computer can sub in. The DHT gives a mechanism to determine who should be the trackers for each file.
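As a rough illustration of how "who should be the trackers" can be decided without anyone in charge: BitTorrent's DHT is based on Kademlia, where every computer has an ID the same size as a hash, and the computers whose IDs are closest to the file's hash (measured by XOR) are responsible for it. The sketch below leaves out all the networking, and the node names and the `closest_nodes` helper are made up.

```python
import hashlib


def node_id(name: str) -> int:
    """Give each computer a 160-bit ID (here derived from a made-up name)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(key: int, nodes: dict, k: int = 2) -> list:
    """The k nodes whose IDs have the smallest XOR distance to the key."""
    return sorted(nodes, key=lambda name: nodes[name] ^ key)[:k]


nodes = {name: node_id(name) for name in ["ana", "ben", "cat", "dev", "eli"]}
file_key = int.from_bytes(hashlib.sha1(b"stand-in for luciano.webp").digest(), "big")

trackers = closest_nodes(file_key, nodes)

# If the closest node drops off the network, the next-closest one subs in.
remaining = {name: nid for name, nid in nodes.items() if name != trackers[0]}
new_trackers = closest_nodes(file_key, remaining)
```

Everyone computes the same answer from the same hash, which is why no central list of trackers is needed.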

IPFS, the InterPlanetary File System, is an attempt to use the same tech for websites. Basically, every time you download a file across IPFS, you hold onto a copy and let the trackers know that you have it, using the hash. If someone else comes looking for that file, you can serve it to them. When you download a file, you’ll find the nearest computer that has a copy and get it from them.

One nice thing about this is that if someone else posts luciano.webp on their blog, it’s already spread across the network, and so they can just download it from the nearest person.

Of course, you don’t have unlimited storage space, so sooner or later you run out. At this point, you “flush” the oldest files that nobody has asked for recently – delete them from your computer, and tell the trackers you don’t have them anymore.

So if it’s not being downloaded, data gradually gets deleted from the computers on IPFS. If you want data to stick around, you have to keep a computer running with instructions to never delete that file (this is called “pinning”), or pay someone else to do the same.
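The flushing and pinning behaviour can be sketched as a small cache that throws out the least recently requested file first, unless it's pinned. This is a toy model of the idea described above, not how the real IPFS garbage collector is implemented; `NodeStore` and its methods are hypothetical, and capacity is counted in files rather than bytes to keep it short.

```python
import hashlib
from collections import OrderedDict


class NodeStore:
    """Toy model of one node's storage: least recently requested goes first."""

    def __init__(self, capacity: int):
        self.capacity = capacity     # maximum number of files kept
        self.blocks = OrderedDict()  # hash -> data, oldest request first
        self.pinned = set()          # hashes that must never be flushed

    def add(self, data: bytes, pin: bool = False) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        self.blocks.move_to_end(key)
        if pin:
            self.pinned.add(key)
        self._flush()
        return key

    def get(self, key: str):
        data = self.blocks.get(key)
        if data is not None:
            self.blocks.move_to_end(key)  # someone asked for it, so it is "fresh" again
        return data

    def _flush(self):
        while len(self.blocks) > self.capacity:
            victim = next((k for k in self.blocks if k not in self.pinned), None)
            if victim is None:
                break  # everything left is pinned
            del self.blocks[victim]


store = NodeStore(capacity=3)
a = store.add(b"pinned post", pin=True)
b = store.add(b"popular post")
c = store.add(b"forgotten post")
store.get(b)                   # somebody requests the popular post
d = store.add(b"newest post")  # over capacity: the forgotten post is flushed
```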

So when we say ‘things can’t be edited or deleted’, it’s complicated. Once you publish a file onto IPFS, it’s hard to purge it from the network quickly. If you wanted to put up version 2 of a post, people can still look up version 1 using the hash of version 1. You can delete version 1 from your computer, and tell everyone the hash of version 2, but the version 1 will stick around. (There are ways around this in IPFS – see here, here – which let you direct people to the most up to date versions of a site/file.)
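To see why version 1 sticks around, here's a toy model, assuming a store keyed by content hash plus a separate mutable pointer from a name to the current hash. The `publish` and `resolve` helpers and the "my-post" name are invented for illustration and are not the actual IPFS mechanism.

```python
import hashlib

store = {}     # content hash -> bytes; an entry never changes once written
pointers = {}  # mutable name -> hash of the current version


def publish(name: str, data: bytes) -> str:
    """Store the data under its hash and point the name at it."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    pointers[name] = key
    return key


def resolve(name: str) -> bytes:
    """Follow the mutable pointer to the latest version."""
    return store[pointers[name]]


v1 = publish("my-post", b"version 1")
v2 = publish("my-post", b"version 2")

latest = resolve("my-post")  # the name now leads to version 2
old = store[v1]              # but anyone holding v1's hash still gets version 1
```

Editing only moves the pointer. The old content keeps its old hash and stays retrievable for as long as someone holds a copy.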

So to get rid of something after you’ve released it to IPFS, you have to have everyone get rid of it. But wait around long enough, and if nobody is pinning it, and nobody is still downloading it… that file will get deleted sooner or later.

Despite the very different underlying tech, functionally this is actually pretty similar to how Tumblr already works. Once something gets reblogged, it’s out of your hands, you can’t edit or delete the copy they have on their blog. (This is in contrast to a service like Twitter, where if you delete a Tweet, it also deletes the retweets. In programming jargon, Tumblr reblogs copy by value, while Twitter retweets copy by reference.)
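For anyone who hasn't met the jargon, here's the difference in a few lines of Python. The post dictionaries are obviously just stand-ins for how either site really stores things.

```python
import copy

original = {"author": "op", "text": "first draft"}

tumblr_reblog = copy.deepcopy(original)  # copy by value: an independent copy
twitter_retweet = original               # copy by reference: the same object under another name

original["text"] = "[deleted]"           # the original poster changes their post

reblog_text = tumblr_reblog["text"]      # unaffected by the change
retweet_text = twitter_retweet["text"]   # follows the original
```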

On Tumblr, old posts tend to be very hard to find – people delete their blogs, or change URLs, and the search sucks. On IPFS on the other hand, old posts might be flushed off the network.

So it wouldn’t be much different in practice.


Tags:

#man‚ I am so disappointed by the shining potential of IPFS getting eaten by Generic Crypto Shit #I used to run a node #(but network effects meant that I never encountered a webpage that I both wanted to pin and could meaningfully pin) #(so few of them being IPFS-based) #I used to read their newsletter #(but I noticed one day that it had been so long since #the newsletter had mentioned progress or even *hopes* towards the beautiful‚ resilient meshnet whose seed I saw in them) #I don’t know‚ have they ever recovered their course? #…I also kind of want to check in on Scuttlebutt now and see how they’re doing #The Great Tumblr Apocalypse #IPFS #disappointed permanent resident of The Future #101 Uses for Infrastructureless Computers #this probably deserves some warning tag but I am not sure what #this post was queued because my to-reblog list is too long and I didn’t want to dump it on you all at once
