
As others have mentioned, once you dig into the innards and nuances of plain text representations and formats, things can get hairy. Still, I think the author is correct in that plain text formats are certainly a better base for sharing curricula, and for knowledge production in general, than something proprietary like word docs, pdfs, etc.

I think markup languages like markdown which are both fairly easy to convert into other formats and deliciously human readable are the way to go.



PDFs are still ok though. There are many implementations of PDF parsers, many Open Source. It may not be as "universal" as plain-text, but it is definitely universal. At this point, I believe that all major operating systems ship with the ability to open PDFs for display.


PDFs presume that the writer has "control" rather than the reader. Actually I probably shouldn't put quotes around "control" -- formatting is rigidly constrained, on purpose.

In addition PDF adapts, by design, a model of paper to the web. It's a "horseless carriage" file format.

Font size, page width, cut/paste and presentation in general should be the reader's choice, not the writer's. The Web manages this, sort of.

The OP is right on in this regard. Even the TeXs of this world, while better than the binary formats, have upgrade complexity.


>should be the reader's choice, not the writer's

Sometimes. There are a lot more design options available if the creator of the content maintains control over layout, fonts, etc. Sometimes this doesn't matter--if it's a block of text for example. Fairly simple layouts also render pretty well on the web.

Different content works better or worse with different approaches. One isn't intrinsically superior.


On the other hand, how does one write a standard invoice or a tax form using only plain-text / markdown? Formats with control over the layout have their place.


Maybe not just writer/reader, but also a presenter. In the context of education, a teacher will often be using someone else's material but presenting in their own style.


"PDFs are still ok though."

They are OK in theory, for the reasons you state.

The problem is the gratuitous use of PDF which we have all experienced - here's a common (pathological) example:

Document author starts with plain text - no special formatting or fonts, no images, etc. Somehow their toolchain converts that into a PDF file that contains No text, but rather an image of text.

The result is a big, bloated, unnecessary use of PDF that cannot even be parsed or used with anything but a graphical PDF viewer since the text is now gone - there is nothing but a picture. Of text.


A PDF without text still kind of works. It is not accessible, which is a big problem. However, this happens when there is a mistake somewhere in the chain. This is all too common, but what is the correct way to handle tables, diagrams, formulas, images, graphs, and links in plain text? There is no right way to do these things in plain text. You will eventually need HTML or PDF for education.


Also: PDFs have pages and thus page numbers, which are an important tool. When working with long texts you need reference points to share your state.


That's what hyperlinks are. They even have the better property that they don't break if you change the font size for the whole text.


They don't mean much in a classroom of say 30 or 40. A lecturer will have n people with n texts and will have to synchronise them. In a typical lesson you jump to different pages also non-sequentially. The best option is a PDF etc. that you can optionally print. Anything else is a recipe for chaos, especially if you're not in a CS classroom.


Are they? I can't even copy text from a PDF without the new lines and indentation getting completely messed up never mind the rest of the formatting...


PDFs are awful for the blind, most PDFs are very painful to extract text from.


    PDFs are awful.
Could have stopped right there.


I agree that PDF is effectively universal as a distribution format for graphical and formatted text. There are some cases where plain text is still nicer, like when trying to read documentation without an X server.


PDF is a terminal format.


Even Pandoc gives up on trying to read it. To be fair, though, I think that the office suite file formats are even worse.


I have done a lot of work with code for handling pdfs.

The spec is 1000+ pages, references other docs, has many omissions and contains much that is apocryphal, or at least wildly inaccurate


pshaw.

right now, today, tuesday june 13th, 2017, safari will not open a .pdf to a specific page or bookmark, neither on the desktop nor in ios.

chrome can. firefox can. but safari cannot. which means neither the iphone nor the ipad can do it.

imagine if -- on the web -- you couldn't deep-link to an anchor in the middle of a webpage, but merely to (the top of) the webpage itself.

that's not the only deficiency of the .pdf format. it's not even the most galling one. it's just the one that happens to be hamstringing a certain project of mine at the moment. and it's illustrative.
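
For reference, the deep links in question are ordinary URL fragments. Below is a minimal Python sketch, assuming the `page=` and `nameddest=` fragment parameters from Adobe's PDF open parameters. The helper name `pdf_deep_link` and the example URL are made up for illustration, and whether a given viewer honours the fragment is entirely up to that viewer:

```python
from typing import Optional
from urllib.parse import quote, urldefrag


def pdf_deep_link(url: str, page: Optional[int] = None,
                  nameddest: Optional[str] = None) -> str:
    """Hypothetical helper: point a PDF URL at a page or a named destination."""
    if page is not None and nameddest is not None:
        raise ValueError("give either page or nameddest, not both")
    # Drop any fragment that is already on the URL.
    base, _old_fragment = urldefrag(url)
    if page is not None:
        if page < 1:
            raise ValueError("page numbers start at 1")
        return f"{base}#page={page}"
    if nameddest is not None:
        # Percent-encode the destination name so spaces etc. survive.
        return f"{base}#nameddest={quote(nameddest, safe='')}"
    return base


print(pdf_deep_link("https://example.com/book.pdf", page=42))
```

The link itself is trivial to build. The complaint above is about what the viewer does with it afterwards.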


PDFs seem to render poorly on ebook readers.


Mainly because the layout code (Postscript?) in the PDF file assumes an A4 or US letter sized page. Most PDFs just cannot handle being reflowed.


> Mainly because the layout code (Postscript?) in the PDF file assumes an A4 or US letter sized page

Technically, it could be anything. The layout is specified, and everything stems from this.


ebook readers yes. But they can work quite well on full-size tablets.

Kindles etc. are great when you're mostly reading a flow of text. For anything that benefits from design layout -- positioned graphics, sidebars, footnotes, etc. PDF on a 10" tablet is often better.


Ideally you'd provide a PDF as the most convenient format, but have a latex or similar root file that could be processed into other formats, like maybe .mobi or .epub.
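
A rough sketch of that single-source workflow in Python, assuming pandoc is installed and using a made-up `course.tex` as the root file. The function names are hypothetical, pandoc picks the output format from the file extension, and .mobi is left out because it would need a separate conversion step:

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def pandoc_command(source: Path, target_format: str) -> List[str]:
    """Build the pandoc invocation for one output format (hypothetical helper)."""
    output = source.with_suffix(f".{target_format}")
    # --standalone asks pandoc for a complete document rather than a fragment.
    return ["pandoc", "--standalone", str(source), "-o", str(output)]


def convert_all(source: Path, formats: Sequence[str] = ("epub", "html")) -> None:
    """Run pandoc once per target format; raises if pandoc is not on PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    for fmt in formats:
        subprocess.run(pandoc_command(source, fmt), check=True)


print(pandoc_command(Path("course.tex"), "epub"))
```

Complex LaTeX (custom macros, unusual packages) may not survive the conversion cleanly, so treat this as a starting point rather than a guarantee.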


> I think markup languages like markdown which are both fairly easy to convert into other formats and deliciously human readable are the way to go.

I believe that MediaWiki, AsciiDoc or LaTeX are particularly well-suited for this purpose.

MediaWiki is already widely known and widely used for knowledge accumulation, namely, in Wikipedia. The downside is, of course, that this wiki language has quite some limitations.

AsciiDoc is a well-designed language with a stable definition for years. (Compare this to Markdown: Are you using the original Markdown? Or a fork of some fork of Markdown?) Also, AsciiDoc can be converted to beautiful HTML as well as beautiful PDF. Also, the clean definition of AsciiDoc means you have no trouble with nesting. For example, in a table cell you can put everything: enumerations, code listings, and so on. You can even put a new table within a table cell if you need to.

LaTeX is the de-facto standard in mathematics and parts of computer science, and has proven to be a stable standard, too. For example, arXiv doesn't accept your generated PDF as black-box, they want your LaTeX source and generate the PDF themselves. (That way, they can, for example, automatically produce PDFs with hyperlinks from documents which originally had no hyperlinks.) The downside is, of course, that LaTeX is not as readable as a plaintext-like format.

So for any "serious" / rigorous documentation purposes, either AsciiDoc, MediaWiki or LaTeX are the way to go.


The big upside to Markdown is that unprocessed readability is fully first-class; if something doesn't provide at least some utility when reading the raw document, then it doesn't exist. Of course, that's also one of its downsides.


Markdown has CommonMark and GitHub Flavored Markdown, which is based on it.

Github is widely used too, and being used in code, it's likely that it will appear in more places than just online wikis.


AsciiDoc has a serious adoption problem compared to Markdown.


Why is that? Given the comparatively low quality of Markdown (as a language), isn't GitHub the only reason why Markdown gets pushed so much? If GitHub chose AsciiDoc instead of Markdown as their base, we would be in a better position now.


> Given the comparatively low quality of Markdown (as a language)

Markdown stresses ease of readability. I don't find the language low quality at all, just limited. Limited isn't necessarily bad, depending on its purpose.

> isn't GitHub the only reason why Markdown gets pushed so much?

Reddit's use seems to significantly predate Github, and I would say reaches a far greater audience. Unique users visiting in April 2017 is 1.285 billion[1]. Some fraction of that is probably unique people (given anonymous desktop and mobile usage), but given how large the number is, I imagine it's still a very large number.

> If GitHub chose AsciiDoc instead of Markdown as their base, we would be in a better position now.

I agree that Github would be in a better place now, but I don't think that would really have changed anything for anyone else. Even if you want to make a case that Github would have influenced programmers who would have then used it in other projects, I think you need to account for Stack Overflow also. I think it was arguably much more popular and used by a far wider audience than Github for a long time (and may still be, I'm not sure).

Markdown was used widely because it was simple, and users would actually bother to learn and remember the very few options they had. People are more used to it now, and at this point, sure, they might accept something more complex (especially if it built on the rules they already internalized), but I don't think we can blindly assume they would have accepted something more complex.

1: https://www.statista.com/statistics/443332/reddit-monthly-vi...

Edit: s/Unique IPs/Unique users/


You act as if AsciiDoc is difficult to learn. It seems mostly equivalent to Markdown, just nicer in a lot of small ways


AsciiDoc is much more complex, in that it supports many more things. Check out the user guide[1]. Markdown takes a couple paragraphs to describe. You can know everything there is to know about using markdown within a couple minutes at most.

1: http://asciidoc.org/userguide.html


Isn't that comparison unfair, given that AsciiDoc gives you more functionality in a curated whole?

Nobody forces you to use all of them. If you just want to use AsciiDoc as "Markdown with more coherent syntax", stick to a small subset that is equally trivial to learn.

And if you need more, AsciiDoc's comprehensive user guide is a huge advantage. In Markdown, you'll have to look around for all kinds of forks. And god forbid if you want to use two additional features that were not designed to work together in the first place. Compare this with AsciiDoc's clear extension system where you can hook up everything and it won't interfere with each other.


> Isn't that comparison unfair, given that AsciiDoc gives you more functionality in a curated whole?

I don't see how it's unfair. I think AsciiDoc is much more complex, by just about any metric you want to use. I'm not saying it's necessarily worse by many metrics, just that by the particular metric of getting average people to use it, and not just a random subset of it, Markdown's simplicity is beneficial.

> Nobody forces you to use all of them.

No, but for random internet user there's a real trade-off between what you're trying to accomplish and what it takes to accomplish it, when you're just trying to write a simple comment. Reddit has a link that says "formatting help" below the comment box, and when you click it, it shows every option you have for special formatting with examples in a table with nine rows and two columns, including the header. They could have chosen AsciiDoc, but then they would have to make a decision about what features to elevate to the quick help and which not to, and very possibly which to disallow for their use case.

Markdown is simple for users to use, simple for users to understand, simple for developers to implement, and simple for developers to decide about. That simplicity is both why it was adopted by developers and why users bothered to learn and use it. As I alluded to before, AsciiDoc may have been a better choice in the end, but I don't think it's quite as simple as AsciiDoc does more stuff, so it's better. It's all about the trade-offs, and there's been plenty of discussion on that[1] before, from both sides.

1: Just google "worse is better".


gruber's original markdown is "simple" because it's brain-dead.

which is why so many people had to "extend" it with different "flavors", which has now created a massive mess of inconsistencies.

sometimes worse is better. and sometimes it's just plain worse. and sometimes it's the worst kind of situation you could ever imagine.


There's a difference between something that's not good and something that just doesn't go far enough for your needs. If it were that brain-dead it wouldn't be extended, it would be replaced.


gruber's brain-dead version _has_ been replaced. by better versions. the problem is these "better versions" are all inconsistent with each other. and each of them has an installed base which insists that the egg be cracked on their preferred end.

if instead of adopting markdown, people would have extracted a small subset of asciidoc (which predated markdown) or restructured-text (which also predated markdown) to serve the brain-dead use-cases that markdown claimed, those subsets would've been just as "simple" to learn, but also leveraged more cleanly when people sought to extend the light-markup toolkit to longer-form documents.

but the blogosphere thought it was hot shit back then, and took great delight in pushing things viral. ergo markdown. so now we're stuck in a bad situation.


> gruber's brain-dead version _has_ been replaced. by better versions. the problem is these "better versions" are all inconsistent with each other.

They are all mostly consistent with the core markdown. They are inconsistent in their extensions. Markdown itself does have problems in that there was no formal spec, but that's mostly been resolved with CommonMark[1]. They even go so far as to document the different extensions that have been developed with their different syntaxes[2]. You might be tempted to call CommonMark a replacement, but it's not, it's really just a formalization of a spec based on Markdown.pl and the test suite that resolved some ambiguities.

> if instead of adopting markdown, people would have extracted a small subset of asciidoc (which predated markdown) or restructured-text (which also predated markdown)

In that case, why not Setext, which is from 1991? I'll tell you why, because Markdown was meant to codify already in use norms, and to emphasize readability over all else:

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

To this end, Markdown’s syntax is comprised entirely of punctuation characters, which punctuation characters have been carefully chosen so as to look like what they mean. E.g., asterisks around a word actually look like emphasis. Markdown lists look like, well, lists. Even blockquotes look like quoted passages of text, assuming you’ve ever used email. - Markdown Syntax, "Daring Fireball – Markdown – Syntax". 2013-06-13.[3]

> those subsets would've been just as "simple" to learn

I think not. For some, including me, markdown was almost zero-cost. It's how I wrote email.

> but the blogosphere thought it was hot shit back then, and took great delight in pushing things viral. ergo markdown.

I think that's highly simplistic, and ignores the realities. One of which is that it was pushed on Reddit, which has become one of the largest and most used sites on the internet. I find it hard to believe the blogosphere opining on it (because it's not actually used on all that many blogs) has had more sway than them on this topic.

1: http://commonmark.org

2: https://github.com/jgm/CommonMark/wiki/Deployed-Extensions

3: http://daringfireball.net/projects/markdown/syntax#philosoph...


> They are all mostly consistent with the core markdown.

it's fairly easy to get the brain-dead part "right". even down to replicating gruber's original bugs and his corner-case complications.

> They are inconsistent in their extensions.

that's precisely my point. and the crux of the problem.

> Markdown was meant to codify already in use norms

markdown's markup did not differ significantly from that of asciidoc or restructured-text. all of them, including setext, leveraged existing conventions from e-mail and usenet.

> and to emphasize readability over all else

since nobody is meant to actually _read_ raw markdown, i've never understood why everyone cites that passage so religiously, other than that is part of the origin story mythology.

> I think that's highly simplistic, and ignores the realities.

due mostly to netnewswire, which installed gruber as its default mac-blogger, gruber's reach was phenomenal when blogging first went viral. if you don't understand the power of that reach at that time, it's probably because you weren't around. and that group of "cool internet kids" still flaunts itself, most notably recently in the nearly-immediate widespread uptake of json-feed.

the _only_ reason markdown was the choice of the masses was because it looked "easier" to a lazy tl;dr mentality. which is a false economy for which the light-markup revolution will have to continue to pay for years down the line.

well, that coupled with the fact that markdown has a catchy name. one cannot deny that. that helped too.

at any rate, kbenson, i'm off to a school reunion, so the last move here will be yours, if you choose to make it. we've hit the point of severely diminished returns anyway.


> since nobody is meant to actually _read_ raw markdown, i've never understood why everyone cites that passage so religiously, other than that is part of the origin story mythology.

Because that's not a universal feel, and some people do read it. I write a subset of markdown normally in text. I use asterisks for bold, use a hash for section headings, and use unordered and ordered lists as defined. I value that I write the same thing, and sometimes it's just text and sometimes it gets prettified, and I really don't need to care the majority of the time whether it does or not, because for the most part people understand the conventions used in the plain text.

Here's the kicker: in one job I designed a system to send email to customers that took advantage of this. If you supplied a text message to email and the markdown-rendered version was different, it automatically generated a multi-part email, with the plain text part being the markdown and the HTML part being the generated output from the markdown.
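
Roughly, that idea can be sketched with Python's standard `email` package. This is not the original system: `render_markdown_subset` is a hypothetical stand-in that only understands `# ` headings, `**strong**` and `*emphasis*`, and the addresses are placeholders.

```python
import html
import re
from email.message import EmailMessage
from typing import Tuple


def render_markdown_subset(text: str) -> Tuple[str, bool]:
    """Hypothetical helper: render a tiny Markdown subset to HTML.

    Returns the HTML and a flag saying whether any markup was found.
    This is not a real Markdown parser.
    """
    used_markup = False
    blocks = []
    for block in text.strip().split("\n\n"):
        escaped = html.escape(block.strip(), quote=False)
        inline = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", escaped)
        inline = re.sub(r"\*(.+?)\*", r"<em>\1</em>", inline)
        if inline.startswith("# "):
            blocks.append(f"<h1>{inline[2:]}</h1>")
            used_markup = True
        else:
            blocks.append(f"<p>{inline}</p>")
            used_markup = used_markup or inline != escaped
    return "\n".join(blocks), used_markup


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Plain text always goes out; an HTML alternative is added only if rendering changed something."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # the raw Markdown is the text/plain part
    rendered, used_markup = render_markdown_subset(body)
    if used_markup:
        # Turns the message into multipart/alternative with a text/html part.
        msg.add_alternative(rendered, subtype="html")
    return msg


message = build_message("sender@example.com", "customer@example.com",
                        "Status", "# Status\n\nShips **now**.")
print(message.get_content_type())
```

Clients that prefer plain text see the Markdown as written, and everyone else gets the rendered version.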

> due mostly to netnewswire, which installed gruber as its default mac-blogger, gruber's reach was phenomenal when blogging first went viral. if you don't understand the power of that reach at that time, it's probably because you weren't around. and that group of "cool internet kids" still flaunts itself, most notably recently in the nearly-immediate widespread uptake of json-feed.

I think you vastly overestimate the pull Gruber had over the general people at that time. I didn't know anything about him, but it wasn't because I wasn't around, I was already working in the industry. It was because I didn't have anything to do with Apple products and didn't care. Which is the same for most people. We're talking about three years pre-iphone here. Before the unibody macbook. Apple's core product that was tapping a wider audience was the iPod. If you weren't following Apple as a customer and fan, chances are you didn't know or care who Gruber was. I certainly didn't.

But Gruber wasn't the only author. Aaron Swartz invented it with him, and Swartz was helping out an early Reddit a year later. Again, I think you vastly overestimate Gruber's role over actual use in popular sites, such as Reddit, and later Stack Overflow.

> well, that coupled with the fact that markdown has a catchy name. one cannot deny that. that helped too.

I won't deny that at all! I think that probably has more to do with it than Gruber's advocacy as well. :)

> at any rate, kbenson, i'm off to a school reunion

Enjoy! I've got another year before I have my 20th.

> we've hit the point of severely diminished returns anyway.

Agreed. We're really just refining our prior points but not making any headway in persuading each other.


back from the school reunion.

my only note now is that i was never trying to "persuade" you. or anyone else. think whatever you like, wrong or right.


Markdown is also what RStudio uses for markup of comments in notebooks, so it is very common in the bioinformatics and statistics worlds.


Same in Python Jupyter Notebooks. I wish I could use rst though. I have to google it every once in a while to remind myself that they won't support it (https://github.com/ipython/ipython/issues/888#issuecomment-2...)


You make some good points regarding LaTeX. In my opinion, anything that you can express in markdown is not that hard to read in TeX source. Most of md is centered around section headers, links, basic text formatting (bold/italic) and lists. Most of those are very readable in TeX source.



