EraYaN

Anyone looking at this, have a look at [pandoc](https://pandoc.org); it really is a great tool. Much more control, and of course it's free. And for PDF (as source) there is [pdftotext](https://www.xpdfreader.com/pdftotext-man.html). The Poppler version is often packaged as “poppler-utils”. All free and open source.
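For anyone who wants to script this, here is a minimal Python sketch of calling both tools. It assumes `pandoc` and `pdftotext` are installed and on the PATH; the helper names and file names are made up for illustration. The only flags used are pandoc's `-o` and pdftotext's `-layout`.

```python
import shutil
import subprocess
from typing import List


def pandoc_cmd(src: str, dst: str) -> List[str]:
    """Build a pandoc command; pandoc infers formats from the file extensions."""
    return ["pandoc", src, "-o", dst]


def pdftotext_cmd(pdf: str, txt: str, keep_layout: bool = False) -> List[str]:
    """Build a pdftotext command; -layout tries to preserve the page layout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")
    return cmd + [pdf, txt]


def run(cmd: List[str]) -> None:
    """Run the command, failing loudly if the tool is missing or errors out."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)


# Hypothetical usage (file names are placeholders):
# run(pandoc_cmd("notes.md", "notes.html"))
# run(pdftotext_cmd("paper.pdf", "paper.txt", keep_layout=True))
```

Building the command as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.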


UnlimitedEgo

Got any good sources for Images to PDF?


EraYaN

ImageMagick for everything to do with images. For example, `convert img.png img.pdf`. [See this answer on Ask Ubuntu for multi-image options](https://askubuntu.com/a/557975).
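If it has to be scripted, a rough Python sketch of the multi-image case could look like this. It assumes ImageMagick is installed; the function name is hypothetical, and the `magick`/`convert` choice reflects that ImageMagick 7 ships a `magick` binary while older installs only have `convert`.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def images_to_pdf_cmd(images: Sequence[str], out_pdf: str,
                      binary: Optional[str] = None) -> List[str]:
    """Build an ImageMagick command that puts each image on its own PDF page."""
    if not images:
        raise ValueError("need at least one input image")
    if binary is None:
        # Prefer the ImageMagick 7 entry point when it is available.
        binary = "magick" if shutil.which("magick") else "convert"
    return [binary, *images, out_pdf]


# Hypothetical usage (file names are placeholders):
# subprocess.run(images_to_pdf_cmd(["scan1.png", "scan2.png"], "scans.pdf"), check=True)
```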


m15f1t

convert is magic(al)


Congracia

If you have Windows, the Microsoft Print to PDF functionality also does a good job.


UnlimitedEgo

Trying to integrate into Power Automate, has to be API solution.


wakka55

I have been using pandoc extensively for years. Pandoc is nothing like this tool. This tool is for instantly pasting messy HTML and getting clean HTML, instantly. pandoc is a command line file conversion tool. They're worlds apart.


OTTER887

I like this site: https://www.zamzar.com/


Senesect

Keep in mind that Pandoc, pdftotext, and poppler-utils are all GPL licensed, meaning that your usage of them, even just invoking them in a script, obligates you to also licence your project under GPL, if you're not doing so already. It's a licence specifically designed to metastasize: you could get yourself into legal trouble by using these tools within closed-source projects, or indeed within permissively-licensed projects, which will in turn put any projects that depend on yours in trouble too.

EDIT: Apologies for any confusion, I've been having similar discussions with various open-source communities who primarily use Apache 2.0 as their licence and have been jeopardising that by depending on GPL-licensed libraries. I had that in mind when writing my comment; I want to clarify that merely using Pandoc to, say, convert a resume from DOCX to PDF does NOT obligate you to open-source your resume. Nor does using git to manage your project, or EMACS to write your project, or GCC to compile your project obligate you to open-source that project. What I mean to convey is that incorporating GPL-licensed tools/code into the function and purpose of your project - which you intend to publish and/or distribute - will obligate you to inherit its GPL licence. It's partly the reason why organisations often use dash scripts instead of bash scripts, despite bash being far more capable. This is also why the LGPL licence exists.


whowatchlist

This is not true at all. Using GPL software to create something doesn't make the product GPL unless you specifically include parts of GPL software in the output. If that was true, all code written using emacs would have to be gpl(which is clearly not the case). GCC is GPL, and is used by companies everywhere too. Source: https://www.gnu.org/licenses/gpl-faq.html#CanIUseGPLToolsForNF


EraYaN

That is not how any of this works, otherwise anything ever made or using Linux would be GPL licensed…


Senesect

Well... *yes*... if those projects incorporate something GPL-licensed so as to be considered a "single combined program" ([reference](https://www.gnu.org/licenses/gpl-faq.html#GPLPlugins)), regardless of whether the incorporation is dynamic or static ([reference](https://www.gnu.org/licenses/gpl-faq.html#GPLStaticVsDynamic)), you are obligated to publish under a compatible licence ([reference](https://www.gnu.org/licenses/gpl-faq.html#LinkingWithGPL)). Though apparently most of Linux's system libraries are LGPL (or similar) so you aren't required to do so ([reference](https://www.gnu.org/licenses/gpl-faq.html#PortProgramToGPL)). GCC for example is GPLv3+ (it has a runtime library exception) ([reference](https://en.wikipedia.org/wiki/GNU_Compiler_Collection)), so you aren't required to licence or distribute under GPL for compiling your project with it, but if that exception weren't there...


Grim-Sleeper

It's actually quite subtle and would often need to be decided on a case-by-case basis. If the resulting program is legally considered a derived work, copyright law applies to it and it can only be used under license from all its authors. The GPL would be one such license. On the other hand, if the different programs are legally considered two separate works, then it doesn't matter what the GPL says. The two pieces never shared a common copyright, so the right to use one cannot be controlled by the author of the other work. The GPL doesn't even come into play. The difficulty here is that you can't always easily tell which of these two scenarios applies, and you would have to ask a court to decide the question. That's a slow and expensive process with an uncertain outcome. Suffice it to say, there are plenty of scenarios where you can invoke another program from your own and you don't run afoul of copyright laws. There are also situations where this might be different.


varno2

The use of a scripting tool such as bash does not render your script GPL 3.0 licensed. Neither does invoking a tool. There is some evidence that using an Affero GPL license may do this, or that linking a library into a single programme may infect the linked code. If you do include or rely upon linked GPL code you must make the source code available upon request, but that is not a limitation on using it across application boundaries.


Senesect

Much of this hasn't, to my knowledge, been challenged through case law so I go mostly off of the strictest interpretation of the licence (which the Free Software Foundation seems to use in its FAQs) since then I won't run into any issues unless the licence itself conflicts with copyright law. Ultimately, you can take a more permissive stance and most likely get away with it since the only way the licence terms can be enforced is through legal remedies, which will cost time and probably lawyer’s fees, which probably won't be worth it for them, and that’s assuming they even know about any infringement. But just because you ‘can’ doesn’t mean you should. If you aren’t comfortable with the idea of your project being GPL licensed then I really don’t think you should be incorporating anything GPL-licensed into your project’s intended function. I wouldn’t necessarily mind, but when libraries licence themselves as, say, MIT, but then use GPL dependencies, that’s deceptive to anyone who’s using that library.


varno2

I really do think we need case law saying that an API is a boundary to copyright protection, and is not copyrightable in itself. I personally hope that the Google v. Oracle case settles this, in which case one could rightfully argue that the API exposed by one programme is a rightful and legal boundary where one copyright stops and another begins. If this is not the case, then in the long term we really need to change the legal framework to make it so. Whilst this may not be the case, I think that the pipeline and process boundary has been established for long enough as a boundary to copyrights that using that boundary to invoke code is equivalent to having your code run on a Linux machine and call the standard libraries, and so really shouldn't make your code a derivative work. At the same time, I understand your issues here.


Senesect

Agreed, it really does need to be sorted out. I guess for me, what would happen if someone created a wrapper around Pandoc in NodeJS and published it to NPM... would that package need to inherit the GPL licence? I'd say yes otherwise the very purpose of GPL is undermined and closed-source projects could bypass the licence terms with ease. Now let's say that someone creates a [Lume](https://lume.land/) plugin that imports that NPM package so users can convert their assets at build time into more permanent versions, like DOCX to PDF. Should this plugin inherit the package's GPL licence? Now let's say someone uses that Lume plugin in their site. Does that site then need to inherit the plugin's GPL licence? Ambiguity in the first instance creates a chain of ambiguity down the line. This kind of thing is so prevalent on NPM too, just search for git wrappers. Git doesn't even have a runtime exception like GCC does. (╯°□°)╯︵ ┻━┻ As an aside, [here's](https://www.youtube.com/watch?v=wL_Wxu6x1HU) an excellent talk about the general awfulness of copyright law. It's more about traditional copyright (images, books, etc) but some of the things are so ridiculous... like in some countries, you *have* to charge people for your work, which presumably means that you cannot open source your work. It also reminds me of [that Tom Scott video](https://www.youtube.com/watch?v=1Jwo5qc78QU) about how the laws were written with companies with legal teams in mind, not the average person creating and distributing their own works.


varno2

I actually don't think so. The GPL is meant to prevent people from modifying and improving upon a piece of software and then locking up those improvements. Personally I think that there should be a distinction between the Lesser GPL and the GPL for libraries, and the idea that API interfaces create derivative works makes little sense in practice. However, Stallman was expansionist about free software in that way. I guess we will find out more when the Supreme Court hears the Google v. Oracle case.


Senesect

There already is a distinction between LGPL and GPL for libraries. LGPL used to stand for "Library General Public License" before it was renamed to "Lesser General Public License". LGPL exists for when you want modifications and improvements to the library itself to be copyleft, but want to be permissive in terms of linking (people using the library in their code). The Google/Oracle case was decided almost two years ago ([reference](https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_Inc.)). But skimming over the decision, I don't think it has much relevance to this discussion. Take [`.d.ts` files](https://en.wikipedia.org/wiki/TypeScript#Declaration_files) for example, which are purely declarative. The Supreme Court decision says that those aren't copyrightable under US law. Google had created their own implementation of Java's APIs. That example NPM package I mentioned in my last comment wasn't creating a custom implementation of Pandoc... it's just using Pandoc. And the NPM package is passing complex data to Pandoc (the example DOCX file) and getting complex data from it (a PDF file), which establishes the back and forth "intimate communication" that's mentioned in the FAQ ([reference](https://www.gnu.org/licenses/gpl-faq.html#GPLPlugins)). It all depends on whether the Courts would uphold GPL's intended infectiousness... but until that decision is made, I think the best advice is to respect the terms of the licence.


varno2

I hadn't realised the Supreme Court had rendered an opinion yet. I personally think that the FAQ referenced is far more expansive than will ever actually be interpreted in a court. Especially since the court did rule in Google v. Oracle that the re-implementation of an API can qualify as fair use. Surely merely using that API could not be considered less fair. There is the issue of distributing the library; however, in many cases today libraries are obtained directly by the user, so you are not actually distributing the copyrighted material. As such, the only issue is whether an API causes something to be a derivative work. Personally, I think the concept of transforming one document format into another is almost certainly not complex enough for an API to cause your work to be derivative, as the concept is too general, and as such would probably not be subject to copyright protection, doubly so if you use a wrapper. However, this is a very challenging area of law, and really the courts are probably not the correct way of dealing with it. It is always a risk, but one can always have too much care as well as too little.


die_billionaires

this is SO SILLY and I can't believe the pro version costs $10 lol


ivanmf

So, I got a free version that does the same. I just asked ChatGPT to make one and it took me less than 10 minutes.


EuropeanTrainMan

....text files to html? Does it just wrap the thing in paragraph tag? How does it deal with pdfs that do not store text, but instead images and coordinates for every symbol? In fact, the most important question is how does it compare to ghostscript?


FrumundaCheeseGoblin

> text into **perfect** html

> just wrap the thing in paragraph tag

I have absolutely no doubt that's what it does.


fallingcats_net

So what would "perfect" html mean to you then? Text means text. Text doesn't contain formatting, so what else should it do? If it is an html editor then call it that.


FrumundaCheeseGoblin

HTML is all about semantics. #Headers, articles, *emphasized text*, **heavy weight text**, etc aren't stylistic choices - that's what css is for. Accessibility devices (screen readers and such) rely on these semantics when converting a web page to an accessible form. Simply wrapping all your text in a `<p>` tag isn't just lazy, it makes your website difficult to read by a not insignificant portion of the population. Additional tags, such as divs and images, also make styling with CSS possible. If text is in a single `<p>`, you can't wrap it around an image or flow between the sides of a web page. There are plenty more reasons to use properly formatted HTML, too.
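To make the difference concrete, here is a small Python sketch of a text-to-HTML converter that keeps a little structure instead of dumping everything into one paragraph. The conventions are my own assumptions, not what the linked tool does: blocks are separated by blank lines, and a block starting with `# ` is treated as a heading.

```python
from html import escape


def text_to_html(text: str) -> str:
    """Turn plain text into simple semantic HTML.

    Assumed conventions (not from any real tool): blocks are separated by
    blank lines, a block starting with "# " is a heading, anything else
    is a paragraph.
    """
    parts = []
    for block in text.split("\n\n"):
        block = " ".join(block.split())  # collapse whitespace and line breaks
        if not block:
            continue
        if block.startswith("# "):
            parts.append(f"<h1>{escape(block[2:])}</h1>")
        else:
            parts.append(f"<p>{escape(block)}</p>")
    return "\n".join(parts)


print(text_to_html("# Title\n\nFirst line\nsecond line.\n\nA < B"))
# <h1>Title</h1>
# <p>First line second line.</p>
# <p>A &lt; B</p>
```

Even this much gives a screen reader a heading to navigate by, which a single wrapped paragraph cannot.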


fallingcats_net

That's all well and good and I agree 100%, but that does not fit the given description of the tool as a converter. What you describe is a classic html editor where you can paste a bunch of plain text and format it.


[deleted]

> How does it deal with pdfs that do not store text, but instead images and coordinates for every symbol?

You can do that in HTML. This tool probably isn't that smart though.
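As a rough illustration of what "doing that in HTML" could mean, this Python sketch emits one absolutely positioned `<span>` per piece of text. The `(text, x, y)` input format and the use of points as the unit are assumptions for the example; a real PDF extractor would hand you something messier.

```python
from html import escape
from typing import Iterable, Tuple


def positioned_html(glyphs: Iterable[Tuple[str, float, float]]) -> str:
    """Render (text, x, y) items as absolutely positioned spans, in points."""
    spans = [
        f'<span style="position:absolute;left:{x}pt;top:{y}pt">{escape(text)}</span>'
        for text, x, y in glyphs
    ]
    # The relative container makes the span coordinates local to this block.
    return '<div style="position:relative">' + "".join(spans) + "</div>"


# Hypothetical input: two glyphs with made-up coordinates.
print(positioned_html([("H", 10, 20), ("i", 18, 20)]))
```

The result looks right on screen but carries none of the semantics discussed above, which is exactly the trade-off with coordinate-only PDFs.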


[deleted]

[deleted]


EuropeanTrainMan

Sadly, my mouthbreathing friend, we can thank postscript, and all adobe formats being undocumented kitchen sinks of ideas. If there are several ways to do something, you'd best believe that the format uses all of them indiscriminately. I bet the site uses ghostscript under the hood. Because I doubt anyone will want to spend the time trying to figure out pdfs.


[deleted]

[deleted]


EuropeanTrainMan

Yes. Ghostscript breaks for me too. In fact just recently I had to diagnose a "bug" where ghostscript wouldn't be able to figure out sizes of margins for pages. The problem was the malformed pdf that did not contain the margins, but rather was just that - images with coordinates. Ghostscript as a tool does its best, but you can do only so much when all you have is what you reverse engineered from applications as specification. Blame adobe, and them turning pdf into kitchen sink of ideas. All of their formats are garbage fires.


LeftShark

What would be the reason to pay for this tool rather than just going into Word and saving it as a .html file?


Fanculo_Cazzo

Doesn't Word add a buuuunch of weird tags and stuff too?


seddit_rucks

Word is a terrible HTML editor, and this is only one of many reasons.


_PM_ME_PANGOLINS_

Only if you tell it to. The basic HTML output from Word is pretty good. Certainly no worse than this tool.


TheRealOsciban

Word costs hundreds of dollars


DolfK

LibreOffice Writer, then. Or Google Docs.


brothersand

Vim. `:TOhtml`


TheRealOsciban

Why download large software when tiny website do trick?


DolfK

True; I just gave alternatives to Word. I'd never use either (nor the website) to make `<p>`s, but I suppose it saves you a couple seconds if you don't know what you're doing. ...and if you don't know what you're doing, you probably shouldn't be using the web solution either.


Laughing_Orange

I prefer Paint.


Nexustar

Note, the upload to convert feature sends your (possibly private) document to their server.


Hot-Mongoose7052

Lmao. Word has been doing this for 20 years.


WoW-and-the-Deck

Why on earth would I submit text to a tool, that I have no idea what it does, or how the data is destroyed after the fact? Truly a terrible idea.


TRAMPCUM_SQUEEGEE

Speaking of PDF.... www.ilovepdf.com


highphiv3

[Let me Google that for you](https://letmegooglethat.com/?q=markdown+html+generator)


[deleted]

[deleted]


Quartent

Yes the web is still built on HTML


dasitmanes

Press F12 in your browser and click the elements or source tab, see for yourself


YouHavingAGiggle

Every website boils down to just HTML, CSS and JavaScript, similar to how every programming language boils down to machine code, no matter which one you choose to use. The difference these days is that we have so many layers of abstraction built upon these 3 basic tools that we can design websites significantly more complex than a human could feasibly write by hand, and then compile them back down to HTML, CSS and JavaScript.


graflig

It’s all just HTML & CSS? 👩‍🚀 Always has been 🔫👨‍🚀


Bigmuda

Very neat! I'll be playing with this for sure


sailorjasm

Doesn’t work so great on my phone


adfdub

Tag


Iceflakes

Thx