All the data is encrypted, so it doesn't matter where they store it. Using multiple service providers (Google, Amazon, their own servers, etc.) might actually be beneficial because it removes a single point of failure, especially if they replicate data in a smart way to take advantage of it. It might also speed up access to data.
Is the assumption that Google is going to store the data, then attempt to decrypt it at some point in the future? Or is the claim that Google's services are less secure than Apple's would be? If neither, then I'm not sure that this is worse at all. Sure, if you believe Google to be actually malicious, then this would be a problem, but I don't think there's a reasonable argument that Google would go to that length of maliciousness.
Well, I don't assume either, really, but I guess GP does. I find it extremely unlikely that Google would ever want to scan this content, and even less likely that they would go to the lengths of decrypting it first. But saying that all is lost if the files can't be protected by encryption is a straw man, as the data is unreachable by Google if hosted elsewhere.
Google -is- a malicious and exploitative surveillance capitalism company preying on people that don't feel like they have a choice.
Inch by inch "don't be evil" has been replaced with "maximize profit".
I have no doubt that if in 20 years decrypting "historical" Apple user data for "training purposes" is legal and will make Google leadership more money, they will pressure ethically flexible engineers to do it for them.
Assume anything profitable that is legally defensible somewhere in the world will be done by every surveillance capitalism company at some point.
Well, you may be right, but then if you are, I’m not sure you can justify leaving your data with Apple either, which is a public company that has the same underlying market incentives. In fact, if that’s what I believed, I doubt I would put any of my data anywhere, and rather just go live in the woods, or more likely, just give up and stop caring. I’m not convinced that Apple will be any more or less malicious than google is/was/will be when considering a 20 year time frame.
No it isn't. Whether Apple stores its data in a contracted data center or one it owns doesn't change the fact that, if you don't believe encryption is effective, then it's a lost cause to worry about the privacy of your data.
I mean, if modern encryption proves broken, the world has much larger problems than open access to the videos you took of that 2017 summer holiday to Tahiti or the password vault containing your keys to log into HN and your Target account.
Of course there is no guarantee: AES256 could be broken tomorrow, or maybe it has already been. What we know is that, extrapolating compute speed from the past decades and even assuming quantum computers become useable in practice, the best algorithms we currently have cannot be brute-forced within the next 50 years.
> AES256 could be broken tomorrow, or maybe it has already been.
This is extremely unlikely.
> What we know is that, extrapolating compute speed from the past decades and even assuming quantum computers become useable in practice, the best algorithms we currently have cannot be brute-forced within the next 50 years.
Quantum computers only offer a quadratic speedup against symmetric ciphers.
AES 256 will survive much longer than the next 50 years against brute force attacks.
It will either be broken spectacularly, using theoretical methods entirely inconceivable today, or live on – brute force is of no concern at all due to the amounts of energy and matter required to perform it against 256 bit keys.
AES-256 was broken in 2011.[1] While only four times faster than brute force and thus not a practical attack, it suggests that compromise is possible. The Snowden documents indicated that the NSA was working on breaking AES-256. It seems unlikely they would waste effort on a task they considered impossible. Whatever they achieve will be achievable by others eventually.
On top of that, no implementation is perfect. Bugs are discovered in cryptographic APIs on a regular basis. Even if your API is perfect, the application calling the API can have bugs that allow compromise.
>>AES 256 will survive much longer than the next 50 years against brute force attacks.
From what I understand it simply can't be broken by brute force, because merely iterating through every possible value of a 256 bit key would require more energy than there is in the universe, and that's without actually testing any of the keys, just having a computer do an i++ through all possible values.
I'm not sure if quantum computing helps here in any way; someone else would need to chime in with details.
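For a sense of scale on the brute-force claim, here is a back-of-envelope sketch using the Landauer limit (the thermodynamic minimum energy per bit operation); the figures below are my own rough numbers, not anything from the thread:

    import math

    k = 1.380649e-23               # Boltzmann constant, J/K
    T = 2.7                        # cosmic microwave background temperature, K (most generous case)
    e_bit = k * T * math.log(2)    # Landauer limit: minimum energy per bit operation, ~2.6e-23 J

    keys = 2 ** 256                # states a 256-bit counter must pass through
    energy = keys * e_bit          # joules just to tick the counter, ignoring the AES work itself

    sun = 2e30 * (3e8) ** 2        # E = mc^2 for the Sun's entire mass, ~1.8e47 J
    print(f"{energy:.1e} J, or about {energy / sun:.1e} Suns converted entirely to energy")

That comes out around 3e54 joules, i.e. on the order of ten million Suns converted wholly into energy just to count, which is why brute force against 256-bit keys is treated as a physics problem rather than an engineering one.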
> iterating through every possible value of a 256 bit key
It's my understanding that when encryption gets "broken", it usually refers to something other than a simple brute force attack. Like, something that would make it so you don't need to run as many iterations or whatever.
I assume this because a brute force attack is something that is always possible from day 1, whereas an encryption scheme being broken is something that happens some time afterwards.
My understanding is that encryption is "broken" any time it becomes feasible for someone to decrypt your data without the key. Brute force attacks are always hypothetically possible, but the encryption isn't broken unless such an attack is feasible.
As a counter-example, DES would count as "unbroken" under your definition. The EFF built a machine in 1998 for under $250,000 that could crack a DES key in under 24 hours. I don't know what that would cost now, but I wouldn't be surprised if a couple of GPUs could get you the same thing today.
The difference is whether such an attack has even a vanishing chance of succeeding. For AES, the hardware just isn't anywhere close to that. Afaik, there isn't anything that could even hypothetically threaten to make brute force attacks on AES feasible on the table today.
I think you're mixing up "weak" and "broken". Out of interest I looked at the Known attacks section of Wikipedia's AES article, and it says in its first sentence: "For cryptographers, a cryptographic "break" is anything faster than a brute-force attack".
DES is both weak and broken, but it could be either without the other.
> I'm not sure if quantum computing helps here in any way
Theoretically a quantum computer can brute-force AES-256 using 2^128 sequential steps using Grover's algorithm (i.e. a quadratic advantage over a classical computer). Parallelization diminishes the advantage, e.g. if you're limited to 2^64 sequential steps, you get a 2^64 speedup over classical, for a cost of 2^192 which is still ridiculously large.
Thus quantum computing is not a relevant threat for AES-256 or most other 256-bit symmetric crypto.
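In symbols, the standard parallel-Grover accounting behind those numbers (a sketch, not a formal bound):

    % Grover's search over N = 2^{256} keys needs roughly sqrt(N) sequential oracle calls.
    \text{Serial depth: } \sqrt{2^{256}} = 2^{128}
    % Splitting the keyspace across p machines only reduces the depth by sqrt(p):
    \text{Depth}(p) = \frac{2^{128}}{\sqrt{p}}, \qquad
    \text{Total work}(p) = p \cdot \frac{2^{128}}{\sqrt{p}} = \sqrt{p}\cdot 2^{128}
    % Capping the depth at 2^{64} sequential steps forces p = 2^{128}, hence
    \text{Total work} = 2^{128} \cdot 2^{64} = 2^{192}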
The current academic consensus is that AES-256 is quantum resistant, as even with Grover's algorithm the attack would still require on the order of 2^128 operations. Generally, symmetric encryption schemes are impacted to a much lesser extent than public-key cryptosystems, with larger key sizes sufficing to ensure quantum resistance. I'd be quite surprised if Apple weren't using AES-256 already.
A minimum of AES-128, which could mean 100% AES-128 or 1% AES-128 from old iOS devices that haven't been updated in years. And even AES-128 would require doing a 2^64 exhaust on a quantum computer, which would be quite a remarkable feat.
Quantum cryptography is much less convenient than conventional asymmetric cryptography and offers few benefits. Doesn't stop people from trying to sell it though...
And for the "encrypted data at rest" scenario we're talking about here, where symmetric encryption suffices, quantum crypto makes no sense whatsoever.
> Unless you have encryption that is guaranteed to never be obsolete it ALWAYS matters where you store the data.
Well, while there is truth to that, it isn't the whole story. There is a time value to all information that must be factored in. If nothing else, think of one overbroad* classification example: battle plans, SECRET; intelligence, TOP SECRET.
* By which I mean there are subtleties and exceptions too numerous to go into here, but the example remains largely illustrative.
The higher the classification, the higher the long-term time value, and the greater the robustness required in one's controls.
In this case (information about or for large numbers of private individuals), today's strong symmetric encryption may well be strong enough for quite some time.
Or it might not be. I'd love to see a detailed risk assessment....
It does seem likely that Google will pay to store every bit of encrypted customer data that is currently stored forever because potentially decades from now it could be cracked and they could access all that amazing ancient private information.
It still puzzles me that people just can't understand the huge chasm between storing and using people's data from your own products, and stealing data from other companies that's stored on your systems in the hope that one day it can be decrypted and used to improve ads.
Google is a huge company; the idea that it would set out to do something that everyone involved would know is directly breaking the law (rather than doing something that's a grey area, or that they know is legal but 'unethical', or could become a PR issue, or destroy trust in them and destroy their product) is fairly unbelievable.
This comes up again and again with stories about Google and (particularly) AWS cloud computing. I hope for better on HN!
I mean, there are these posts on Hacker News claiming to be from former AWS employees saying they stole customer data to launch competing services: https://news.ycombinator.com/item?id=23929959
Maybe the first story is fake and the second is real… but they both point to a “win at all costs” company culture where policies might be violated, even if it threatens trust in the platform and a PR problem when exposed.
But those cases are very different too (I'm not trying to defend what they did). Talking to a startup and then copying their product is douchey but not illegal. Using Amazon.com sales data is like looking at the public "top products in electronics" list and selling your own version of the top sellers - I imagine all large retailers do something similar when deciding which products to make their own versions of.
Hacking the computers, networks, or services of your competitors, even when they're running in your data center, is just bad business.
> from former AWS employees saying they stole customer data to launch competing services
You're reading that a lot differently than I am.
The quoted post says:
>AWS proactively looked at traction of products hosted on its platform, built competing products, and then scraped & targeted customer list of those hosted products
None of that reads, to me, as them having had to use confidential data to do any of these things.
You can identify many organisations that are running on AWS without knowing anything about AWS accounts - blog posts, IP space, public code, social media comments from staff, linkedin and all sorts of other places will often reveal that.
Scraping/finding customer lists can be done using research, too. I've spoken to Account Manager-types at places and they've often used various tools that scrape other public resources to identify customers of competing services.
Swap out company names, and it's effectively what I've seen from a bunch of companies, without it delving into anything unethical.
> It still puzzles me that people just can't understand the huge chasm between storing and using people's data from your own products, and stealing data from other companies that's stored on your systems in the hope that one day it can be decrypted and used to improve ads.
Data is still data.
> Google is a huge company; the idea that it would set out to do something that everyone involved would know is directly breaking the law (rather than doing something that's a grey area, or that they know is legal but 'unethical', or could become a PR issue, or destroy trust in them and destroy their product) is fairly unbelievable.
They do this in Europe by not complying with the GDPR (and they are not the only ones).
> This comes up again and again with stories about Google and (particularly) AWS cloud computing. I hope for better on HN!
This sentiment seems so weird to me. Can you point to a single enterprise offering where Google is collecting data and using it themselves? Or where it would even make sense to do so? Corporate customers pay a lot more per byte than the fleeting value of private data so Google has a strong incentive to never ever touch that data.
Peter's comment comes off as satire to me. A joke about how absurd it would be for google to store 8 exabytes of data for decades because it might one day be useful.
Because it might one day be useful for.. targeting ads? I can almost see the abuse potential, the ads "I know what you did the summer of 1997" will turn us all into mind controlled consumer drones. (those of us who are old enough)
I don't think that is really the point that was being made. As you say, the practical chances of Google storing this data for x years and then committing corporate suicide by decrypting it are minuscule (and presumably Apple agrees).
The point is that there are layers of security, and by moving the data outside of its physical control, Apple has given up one of those layers.
I'm not sure why people are thinking in such a limited fashion. Google is effectively holding onto the data for the US government or whoever else has the capability to access all that data in the future. This is the kind of stuff authoritarian governments dream about. And I'm not saying Google is doing it intentionally, just that they're holding onto the data at all, and at some point there will be somebody who will make use of it.
If that is the argument, does it matter whether it is Apple or Google who is holding the physical data? Apple and Google are both based in the US and beholden to the US government, so from the government's perspective it's just a change of address when they send out a warrant.
It doesn't. I'm not sure why you think I think there's a difference. Any corporation that stores our data long-term is a threat to our privacy and our rights.
> does it matter whether it is Apple or Google who is holding the physical data?
It might matter, yes. A company run by a guy like Eric Schmidt is a lot more likely to play nice with the US government when it comes to privacy compared to a company run by a guy like Cook, who from the outside seems obsessed with user-privacy (as long as China isn't directly involved).
Of course; I just wanted to say that no two big US companies are the same, it highly depends on who leads them. The powers that be who decided that a guy like Schmidt was fit to run a company like Google could do that again.
I mean google is full of employees who threatened to walk out when their employer wanted government defense contracts. Pretty sure google would get a ton of internal pushback if a team wanted to do what you describe.
Google employs the same standard issue tech person you see here on HN.
What if the user encrypted and stored the data on their own storage media? It might actually be beneficial because it eliminates dependence on a third party, i.e., a point of failure. "It might also speed up access to data."
It’s a bit of an open secret among big tech that Apple is a huge user of storage on all major clouds. They manage encryption on their hardware and in return get cheap storage, higher availability and competition from their suppliers. Highly doubt Google has access to any iCloud data other than an estimate of the immense size of the service.
> Highly doubt Google has access to any iCloud data other than an estimate of the immense size of the service.
They do of course have access to the ciphertext and to traffic patterns; in crypto threat models, traffic analysis tells the adversary a wealth of information.
Why would Google even attempt to spy on those patterns? It serves no purpose, and how do you justify employee time spent on such useless things? Not to mention Apple proxies all requests through their own servers, rendering such analysis utterly useless to begin with.
There is one case where the traffic to Google Cloud is systematically not proxied: every time one sends an attachment in iMessage, the file (or the media) is encrypted on device and sent to gcs-{eu,us,asia}-00002(?).content-storage-upload.googleapis.com, and received from gcs-{eu,us,asia}-00002(?).content-storage-download.googleapis.com
This should be pretty visible to Google, the rest of the traffic is handled better.
How do both parties determine the keys used during a conversation?
Are they making heavy use of public key cryptography? If so how? When I send a message to you, do I encrypt it using your public key? What about group messages? Does each conversation get its own key pair?
Also it’s interesting they decided to directly hit up google cloud… you’d think they would wrap it so at minimum they could tweak the underlying infrastructure without requiring every client to update.
> How do both parties determine the keys used during a conversation?
They don’t: public key cryptography is not initially used.
The sender generates a random AES-256 key, applies it in CTR mode and uploads the encrypted blob to GCS.
Every receiving device gets a message with the key, the URI, and the SHA-1 of the blob. These messages are encrypted as usual and sent via APNS (<n>-courier.push.apple.com:5223)
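As a rough illustration of the scheme just described (not Apple's actual code; this is a sketch using the pyca/cryptography library, and the URI and field names are placeholders):

    import os, hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt_attachment(plaintext: bytes):
        key = os.urandom(32)                    # fresh random AES-256 key per attachment
        counter = os.urandom(16)                # initial counter block for CTR mode
        enc = Cipher(algorithms.AES(key), modes.CTR(counter)).encryptor()
        blob = enc.update(plaintext) + enc.finalize()
        digest = hashlib.sha1(blob).hexdigest() # SHA-1 of the encrypted blob, as described
        return key, counter, blob, digest

    key, counter, blob, digest = encrypt_attachment(b"attachment bytes")

    # Hypothetical stand-ins: the blob would go to the content-storage-upload endpoint,
    # and the small (key, counter, URI, SHA-1) envelope would ride the existing
    # end-to-end-encrypted message channel over APNS.
    uri = "https://gcs-eu-00002.content-storage-upload.googleapis.com/<opaque-id>"
    envelope = {"key": key.hex(), "ctr": counter.hex(), "uri": uri, "sha1": digest}
    print(envelope["sha1"], len(blob))

Each receiving device would then fetch the blob from the download endpoint and reverse the CTR step with the key and counter from the envelope.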
> you’d think they would wrap it so at minimum they could tweak the underlying infrastructure without requiring every client to update
Apple does this: two other endpoints are *.blobstore.apple.com and the Chinese Guizhou-Cloud Big Data.
In my logs blobstore is used less than 1% of the time.
It just means Google may provide access to metadata outside of Apple's control. That metadata could be useful for classifying anomalies on the basis of pattern-of-life analysis, or similar.
Very true when each user accesses their own encrypted data directly. But from what I read here in the comments, Apple is managing encryption on their own hardware, which almost surely means that data is read and written from Apple's machines.
Such aggregation of read and write calls across users makes the risk from traffic-pattern analysis fairly minimal...
Exact byte sizes of files, and of sets of files, in an I/O pattern, combined with timestamps, over many occurrences (i.e. many chances at correlation), are very distinguishing.
If Apple is anywhere near competent, the traffic patterns are useless: all Google can see is 'data stored on that day was accessed on another day'. All traffic flows through Apple servers, and Google has no way of even correlating it to a user.
I would think Apple is smart enough to mix storage blobs, so one blob is not one user. Plus all requests come from Apple datacenters, not user devices.
Let's say Eve was working at Google with access to the ciphertext and I/O traffic, and was a jealous lover of an iPhone user, Alice. They could be interested in profiling IM app usage, learning what sizes of content (pictures?) were stored at what times, to build a profile of Alice's activity when out of sight. They could also be interested in fingerprinting the apps used, so they could be matched to, for example, different dating apps.
It's not hard to imagine something business or government related in place of this of course. And do this analysis in aggregate and follow many people at once.
Apple could have, and for all we know possibly has, implemented countermeasures for many of these cases, e.g. to make it hard to distinguish users within the mass of ciphertext.
I'm just sketching something; I'm not claiming to have developed a vulnerability research result here from my armchair. But there's a lot of terrain between "it's encrypted so there is no attack" and "you haven't presented a fully developed attack", especially for a system we don't know much about.
Anyway, there are many scenarios that come to mind for learning those I/O sizes. Apps probably have I/O fingerprints. Or you could send the set of suspected users differently sized files to probe them. Etc.
Basic batching or partitioning of data before offloading to storage would defeat this analysis, and that’s assuming you can access and query so much data regularly.
I asked for practical examples. I don’t need an in-depth report to see that this doesn’t qualify.
Most of this crypto stuff is completely impractical risk, especially compared to some phishing emails.
In security engineering we have the standard the other way around: the proponents of a design have the burden of arguing that the system is secure beyond reasonable doubt, and of addressing and elucidating any potential cracks in the security design. The subthread was about potential pitfalls even in the presence of encryption.
Because terabytes are a good frame of reference for most computer users. I get paid well to work in the computer space, and if I saw 8 exabytes, I'd have to consult a chart or list just to determine how much that is.
Most people don't know what an exabyte is. That's why you frame the amount of data in an amount of data (terabyte) that people know.
Same reason we say 1,000 km instead of 1 Mm, or give the Sun-to-Earth distance as 150 million km instead of 150 Gm. Some units are usual in some contexts, some are not. In one word: habits.
Those prefixes are bad because they break the 10^(3n) pattern. centi and hecto are also bad for the same reason. Most calculators have an "engineering" mode which only uses the good prefixes, because it's easier to think about prefixes when the ratios between them are consistent.
Centimeters are pretty common. Rulers are marked with cm and mm. We use mostly m and cm inside houses. A door width? 90 cm. A door height 2 meters and 10.
And don't forget cubic centimeters (or cc) for engines. Anything from 2000 cc down to 50 cc is very common east of the Atlantic Ocean.
Then why isn't my hard drive four million kilobytes? We already crossed this bridge with gigabytes, and the course was laid in with terabytes, which I first heard in a business setting over 20 years ago, when that was still a rack of hard disks. Cloud providers and the LHC brought us petabytes, and that was over ten years ago. The "oh, by the way, the next unit is 'exabyte'" conversations started around the same time.
> I’ve been using Mm too, but only when I have a reasonable expectation that it will be understood. IOW, rarely.
Yeah, I mainly only use it in chats (where I can immediately explain it if needed) or in person (where the actual pronunciation of "megametres" makes it pretty obvious). "Mm" and "Gm" being somewhat difficult to google due to "mm" and "gm" being different SI units makes it untenable to just expect people to work it out on their own.
> But stating the Sun–Earth distance as 1 AU is just a tautology, restating the definition, and hence void of information. I suspect you’re jesting. ;-)
To be clear, I don't mean that you'd answer "1 AU" if someone asked you how far away the sun was - obviously that's completely worthless information! The discussion was just on what units and prefixes are used for various magnitudes of measurement, and the AU is a fantastic unit to use for things that would otherwise be measured in hundreds of gigametres. For example, if you had a table containing the distances between various celestial bodies, it would be very convenient to have "Sun <-> Earth: 1 AU" in there along with things like "Sun <-> Mars: 1.5 AU".
> If someone was to ask me how far it was from sun to earth, and I answered 1AU, and they asked what an AU was?
You'd probably do the same thing that you would do if they asked what a km was: keep changing the units until you get one that they recognise, and then now they can understand the original unit you used.
(Although in this specific case given that they're asking about the definition of the unit you'd probably provide that context without needing further prompting in the first place, similar to if they asked about the circumference of the earth and you gave an answer in metres)
Not a joke, the AU is a pretty common unit of measurement when talking about distances too large for metres and too small for light-years, especially since it's based on one of the most widely understood astronomical distances. Of course using it to describe the distance it's derived from is more than a little tautological, but that wasn't too important in the context of discussing magnitude prefixes.
It's roughly the same thing as describing a distant potentially-habitable planet as being "1.5 times the size of Earth" instead of giving its radius in metres. It's just that AUs wrap that "n times the distance between the sun and the earth" into a nice standardised unit.
One of my favourite units used in climate science is the Sverdrup, often used to describe oceanic or river flows, which is a million cubic metres per second, or one cubic hectometre per second. (1)
The Sverdrup unit is tolerated because Harald Sverdrup made many key contributions to the science of Oceanography.
He was also a high-level administrator who had a super-power that few administrators have: he understood science as well as people.
And, to top it all off, the book he coauthored with Johnson and Fleming is widely regarded as the foundational document in the field of modern Oceanography.
There is so much to say about Harald Sverdrup that students quickly become comfortable with the occasional use of "Sv" as an abbreviation for 10^6m^3/s.
Oops, you’re right. I think I accidentally squared 1000 instead of cubing it when converting between km³ and m³. Which only goes to show why units can be hard. In the interest of using coherent units, I wish there were a named unit equalling one cubic metre, all the better for attaching prefixes to.
people deal with social media images and text more often than buying hard drives -- and that's (generally) handled in kilobytes.
my take: it's because numbers with lots of digits sell clicks more easily -- up until scientific notation is needed, and at that point the general readership can't fathom the number and generally doesn't care.
in other words, I bet 8,000,000 terabytes produces more clicks than '8 exabytes' and '6.4 x 10^19 bits' would have.
If I had been born a new-age journalist I'd have gone with 8,000,000,000,000 megabytes -- but only if that fit in the link/URL slug and headline for maximum click-bait exposure.
Is 1TB/person really that common? I mean, my iPhone has 64GB of storage, mostly used for storing photos, which I might want to backup. Other than that, what else do I put on iCloud?
I can't tell why this is on the front page of HN. Is the implication that Google would spy on Apple customer's data stored on Google Cloud? Google is evil/dumb/both at times but that's completely outlandish.
I do find it interesting that Apple could be one of Google's biggest Cloud clients though, that's very surprising.
Apple has built significant brand trust and brand loyalty on the premise that Apple's business model is built around devices and services, and that your data is not something Apple is interested in exploiting or monetizing; Google has a reputation for an antithetical business model in terms of user data.
Google and Amazon recoup their investment in processes, tools, and cultivating the institutional knowledge to run massive data center infrastructure by marketing cloud services. Maybe Apple calculated that they would need to do the same to “level up” their data center infra but decided that doesn’t align with their business strategy.
An 8TB hard drive is about 200 dollars nowadays, so this amount of data is about 200m dollars' worth of retail hard drives.
Of course the calculation is super inaccurate: it doesn't take redundancy into consideration, nor the discounted prices someone like Google pays for hardware, nor the discount Apple gets as a big customer. But even if the real spend were of a comparable scale, with Apple paying Google billions per year to handle its data storage, that doesn't sound particularly newsworthy to me; it seems pretty price efficient, even.
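Spelling out that retail estimate as a back-of-envelope (8 EB taken as 8 million TB, $200 per 8 TB drive, no redundancy):

    total_tb = 8_000_000            # 8 EB expressed in TB
    drive_tb, drive_usd = 8, 200    # one retail 8 TB drive at ~$200

    drives = total_tb // drive_tb   # 1,000,000 drives
    print(drives, f"${drives * drive_usd:,}")   # $200,000,000 at retail, before any redundancy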
The article claims that Apple is on track to spend around $300 million on GCP storage services this year. With redundancies, server costs, and DC costs, the upfront price could easily touch 3 to 6 billion, or 10 to 20 years of GCP storage costs.
> With redundancies, server costs, and DC costs, the upfront price could easily touch 3 to 6 billion, or 10 to 20 years of GCP storage costs.
Surely it didn't cost GOOG $6B upfront worth of infrastructure to get $300M of annual revenue from Apple. Yes, there's economies of scale, but I think you overstate the case.
Short answer: the cost of just the hard drives needed to replace an $89MM S3 storage bill was about $45 million. That's not including bandwidth, racks, datacenter space, administration, etc.
The secret sauce here is the opportunity cost of running one's own storage at that scale.
At that scale it's less about sourcing the hard drives (yes, it's a problem, but now you're a big customer so you can ask for specialised things, and a 50% discount).
The big problem is the hosting: finding and building a number of datacentres local to where your customers are, powering them, and connecting them in a reliable way.
But even more, you need to make a storage interface that doesn't suck. I am currently working with a homegrown S3-"like" interface. It is a massive pain in the arse: it doesn't scale[1], and it has an entirely new set of words to describe standard things (trying to overwrite a file? it doesn't tell you that, it just says "predicate failed").
[1] Large files transfer very fast, faster than S3. However, all the tools are written single-threaded. Add to this that each operation takes at least 0.7-1.5 seconds, and shit gets slow very quickly. There is little to no documentation, and the API is odd.
In short, much as it annoys me to say this: for general-purpose storage, buy over build.
Is that stuff even stored on spinning rust these days? I mean solid state might be more expensive upfront but I imagine at scale the energy consumption required to power and cool millions of mechanical devices becomes a big issue.
Not to mention the random-access performance of spinning disks ain't too great either.
>8TB hard drive is about 200 dollars nowadays, so this amount of data is about 1.6B dollars worthy of retail hard drives.
External hard drives can be obtained on sale for $15-16/TB. They can then be shucked to get internal drives. I suspect bulk buyers can get pricing equal to or cheaper than this, rather than the retail of $25/TB.
Edit: I googled it and apparently it's an economy-of-scale thing. People are more likely to buy external hard drives, a much more mass-market thing; Costco and Walmart sell externals, hence downward pressure on prices compared to internal drives.
Do external drives even have performance/longevity specs? The specs I see on them are pretty basic (eg. capacity, interface, operating temperature), and considering that they're marketed to consumers that's not surprising. As for actual performance/longevity, my understanding is that they're just whatever drives the manufacturer has available. For low capacities they're going to be consumer drives and for higher capacities they'll usually be NAS/datacenter drives.
At the scale these data centers operate at, I would imagine that google / FB / AWS gets these storage devices custom built to whatever spec and tolerances they need.
Apple finally realized they weren’t very good at hosting stuff. They never were.
Plus, it’s a no brainer leveraging Google and AWS. Their global footprint and expertise alone is worth it. Also, $300 million a year is a drop in the bucket to Apple’s bottom line. They probably make that alone from the millions of adapters they sell each year.
It's not just $300m. It's $300m for storage alone. I'm sure they spend more on the GCP platform.
They also use AWS, which is over $350m a year. That contract ends in a year or two.
Afaik, they don't run on azure anymore.
It's not hard to believe they're spending $1B+ on external cloud partners.
At $57B net income (on $274B revenue), a high-margin space such as cloud services could make sense, although if they're not selling it to the public it would only account for 1-2% of revenue.
Apple likes to learn from companies... Soon there will be an Apple cloud using low-energy ARM (M1X) servers. Let's call it the "Digital Apple Tree". Maybe a small reference to the DAT recorder, as Apple likes music. Of course this last part is just made up ;-)
Everyone is acting like that $300MM expense is being paid by Apple.
I'm paying Apple $120/year for 2TB of iCloud storage. I might not be doing the math right but at $300 million/year for 8 million TB, that's $75/year in google expense for the $120 I'm paying Apple.
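The arithmetic does check out as a back-of-envelope, taking the article's $300M/year and the 8 EB figure at face value:

    gcp_spend = 300e6               # article: ~$300M/year to Google
    stored_tb = 8e6                 # ~8 EB = 8 million TB

    per_tb_year = gcp_spend / stored_tb     # $37.50 per TB per year
    print(per_tb_year, per_tb_year * 2)     # $75/year for a 2 TB quota vs the $120/year iCloud fee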
The average user with 2TB of quota probably isn't using anywhere near that much. Their usage is more likely to be closer to the next smallest quota tier.
I'm amazed any HN readers could believe Google would want to know what is stored, even if they could actually read it. That is completely useless to them. What is interesting and useful is who is connecting from where, when, and with what. Yet again it feels like Snowden is forgotten. Metadata is what matters in high-volume data, not content.
As far as I know they don't, but for argument's sake let's say that they do. A lot of smart people have worked on Gmail, so any scanning of mail would be for useful metadata, or, if they scan all the text, to provide services from context (find something with a date, ask if it should be put in your calendar, etc.). Almost all email is worthless information, and I doubt even the NSA, with 100 times the budget it has today, could learn anything worthwhile from scanning everything, even if "everything" is only everything at Gmail. At best they would end up with data worse than what they get from metadata, and more likely with the need for thousands of people looking at mails that match wordlists (hate USA, bombmaking, president's route...) and not learning anything useful at all.
It would be interesting to know how that data breaks down -- like photos vs videos vs software backups vs saved movies, etc. I assume that most people stream movies, TV shows, and music these days, and that few people save them to disk. And even for those that do, Apple could deduplicate (i.e., store exactly one copy of Men in Black even if 100,000 people saved it). Software installs are easily deduplicated as well.
So that leaves personal photos and videos. Apple has about 1 billion active users, so a back of the napkin calculation suggests that each user is storing 8GB of data. That actually sounds quite low. That's 1600 5MB photos without even talking about the crazy storage that personal videos would use.
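For reference, the napkin math (assuming the 8 EB figure spreads over roughly 1 billion users):

    total_bytes = 8e18              # 8 exabytes stored with Google
    users = 1e9                     # ~1 billion active Apple users

    per_user_gb = total_bytes / users / 1e9         # 8 GB per user
    print(per_user_gb, per_user_gb * 1e9 / 5e6)     # ~1600 photos at 5 MB each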
The article talks about how much Apple stores on Google's servers, but doesn't say how much they store on their own servers or with other third parties. Maybe they have multiple times as much storage on their own servers.
iMessage is probably secretly insane: A lot of people's iClouds are stuffed with it, so it probably has random not-worth-saving content going back several years that people don't know how to clean out.
That's 1 million 8TB hard drives. You can probably put at least 100 of them in a single rack, which means 10,000 racks; put in a 100x100 grid, that means a single 300m x 300m factory hall. Treat that as an upper bound; it is probably possible to pack the drives more tightly.
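Spelled out (my assumption: roughly a 3 m x 3 m floor cell per rack, aisles included):

    drives = 8_000_000 // 8         # 1,000,000 x 8 TB drives for 8 EB
    racks = drives // 100           # 100 drives per rack -> 10,000 racks
    side = int(racks ** 0.5)        # 100 x 100 grid
    print(racks, f"{side * 3} m x {side * 3} m")    # ~3 m of floor per rack cell -> 300 m x 300 m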
Note: Apple charges $1/month for 50 GB of storage. That's a ~84% margin Apple is making on its cloud storage service. Though granted, they do give everyone 5 GB of storage for free, so the real margin will be lower (and they will still be generating a profit so long as 1 out of 5 users is paying for iCloud storage).
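Where the ~84% presumably comes from (a sketch using the imputed $300M / 8 EB cost from the article; Google's actual pricing to Apple is unknown):

    revenue_gb_month = 1 / 50                   # $1/month for 50 GB of iCloud
    cost_gb_month = 300e6 / (8e9 * 12)          # $300M/yr spread over 8 billion GB -> ~$0.0031/GB/month

    print(f"{1 - cost_gb_month / revenue_gb_month:.0%}")    # ~84%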
I don't see why this is news, given that it's been common knowledge to anyone who has something like Little Snitch running and traces their network requests to all sorts of online services.
On my phone I keep all my personal stuff in git with the remote set to my desktop (it has a public IP) over ssh. My photo stream is just rsynced occasionally over ssh.
Not having to deal with this is arguably even more convenient provided your OS vendor doesn't intentionally get in the way.
Non-tech people aren't ever going to be able to set something like that up on their own and keep it working. I don't even want to mess with stuff like that. I do that all day at work; I have neither the time nor the motivation to worry about messing with more tech after work.
Also, I would suggest not port forwarding ssh directly to an internal machine. Use a VPN and then ssh.
What does Apple sell wrt data storage that Google doesn't? The integration into apple's product line? Seems a bit mealy mouthed to me. Both companies sell storage to consumers.
I disagree - the average person is not capable of securely managing their own data, which is why they pick a consumer brand like Apple to do it for them. Providing an option to do so would likely result in exposure of personal data at significant scale from botched DIY attempts.
To test this theory, simply ask an average iPhone or Android user what encryption is.
What you are saying here is the equivalent of me buying a plane ticket but also having the option to fly the plane.
If you want to manage your own data - use something like a PinePhone - the barrier to entry is high enough that people who are capable of managing their own data securely can use the device and achieve the data sovereignty outcome they require.
That said, I believe Apple should build and manage their own infrastructure. They have the resources and capabilities to do so. The longer Apple sticks with GCP, the larger the inertia of moving away and the less leverage they have when it comes to negotiating to maintain their high standard of privacy commitments.
Most normal users may not be able to secure things effectively, and they can still use the existing iCloud / Google Play infrastructure. That doesn't mean that other users, who want to manage their own data for one reason or another, shouldn't get the opportunity.
It's far more likely that building and maintaining this feature is not worth the development time for Apple and Google product teams at this point, since the possible market is small.
As far as I know, you can manage your own data with Apple devices if you want to.
By that I mean, storage in iCloud is optional. You can back up Macs locally pretty easily with Time Machine and you can back up iPhones to Macs locally as well (which then gets backed up via Time Machine). You can encrypt Time Machine and iPhone backups too if you want.
Is it as easy as using iCloud for backups, photo syncing, etc.? No, but that’s because iCloud is not just dumb storage, it’s a hosted software application (several, really).
The Files API on iOS is open for anyone to use. You can run your own Nextcloud server on your own hardware and it integrates into the OS just like Google Drive or a plugged-in USB drive would.
You can't do device backups to, or sync Photos.app with, a random cloud. The "integration" here is just that occasionally you can save to and load from your server. That's all.
I don't think you can do a full device backup to anything but iCloud, and I agree that should change, but your own storage has the same access as Google does on iOS. You can have your own photos app which automatically uploads everything in the camera roll and can delete photos to keep them in sync with your cloud.
The conversation is about cloud though. Local backups are nice but that means connecting the phone daily to my turned-on computer instead of just plugging it into the wall every night.
Yes there are plenty of photos apps but they’re no Apple Photos and they can’t upload in the background unless you open them regularly. Unfortunately they’re not as convenient. Chances are that they’re one-way syncs too, like Google Photos.
iOS is opening up to 3rd-party services quite a bit in recent releases. The Files app and the ability to set default apps are nice improvements which have come recently, as is allowing 3rd parties into the Find My network.
I think we could see full device backups to other cloud services later. I guess the problem is they would want to provide a specific API for 3rd parties to implement, rather than trying to drop a 256GB zip file on Google Drive with no differential backups.
Backups are so personal, complex, and niche that I doubt we’ll ever see Apple opening that up to third parties. Ever. It just won’t happen. Normal people don’t care about backups and Apple makes money through iCloud backups.
iCloud is a tool that lets my parents do just that. This means I don't have to do it for them and no one has to worry about losing years of photographs due to a device failure.
edit: I guess my point is that different people have different computer skills and I think it's great that the tools now exist so people on the lower end of that spectrum can sort this out themselves without having to ask for help.
I've moved everything to the cloud. I have nothing running at home except networking gear, and a small "server" that pulls nightly backups from the clouds to a local USB drive.
In theory I could probably do without the local server, looking at Apple/Microsoft/Google data redundancy (Microsoft is multi-geo; I can't figure out what Apple is).
Sadly I need to guarantee that some random account closure doesn't remove all my data, so the backup server stays for now. The way cloud prices are going, it will only be a question of months or years before it's cheaper and easier to just use two cloud services, one for main storage and one for backup, and with projects like the Data Transfer Project [1], you don't even need to download the files first.
Managing 8 exabytes reliably is a massive pain in the arse. With no redundancy, and nothing but hard drives, 8 exabytes burns several megawatts.
It's a million drives. Given the failure rate I had when I was looking after 5 PB (about 8k HDDs), you'd be looking at at least 100-400 failed drives a week.
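For scale, assuming a typical annualized failure rate of 0.5-2% (my assumption, roughly in line with published drive-stats reports):

    drives = 1_000_000
    for afr in (0.005, 0.01, 0.02):             # annualized failure rate
        print(f"AFR {afr:.1%}: ~{drives * afr / 52:.0f} failed drives/week")
    # roughly 96, 192, and 385 per week -- the same ballpark as 100-400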
But how do you detect that? How do you schedule replacements? What's your Hamming factor for redundancy? How do you optimise your storage: for speed, power, geography, or redundancy?
What about capacity management? The lead time on growth would be large.
I was a sysadmin for a large VFX place, so I did a lot of storage admin, but unless it was a core business function, at that scale I'd buy over build that shit.
Flip side: you have no idea if Google is actually making a profit on this workload. It might literally be cheaper for Apple to use GCP for storage here than to do it themselves.
Either A) it's cheaper economically, or B) Apple makes enough on its products that the loss is negligible compared to the work and complexity involved.