I published a book titled Cleaning Data for Effective Data Science: Doing the other 80% of the Work over the pandemic. I honestly believe it is a very good book on a needed topic. You should buy it, and recommend it, and review it; or at least read it for free and benefit from it.
It took two publishers to get the book published, and in the end I made a bad choice about how to publish it.
It was a mistake to choose Packt for a publisher. If you are looking to publish a computer book, it will be a mistake for you to choose Packt also. They simply do not promote books in any effective or sincere way, but instead simply hope authors will do so on their behalf while they keep the large majority of the revenue.
I also wrote another book over quarantine, called The Puzzling Quirks of Regular Expressions, that is also really fun, but much more whimsical. Think of it as Sudoku or crossword puzzles for programmers, but with really nice artwork added. Or maybe think of it as a coloring book, then. That one started with one publisher, but wound up being self-published for different, but equally stupid, reasons.[1]
Back to Packt. About a year ago, I wrote the below (lightly edited) to Jon Malysiak, who was then Senior Producer Marketing at Packt, but has since left for better pastures. Basically, he was the guy in charge of the people I directly worked with.
The letter had no effect, since Packt’s business model is fundamentally wrapped in deceit. But I was really quite detailed when I wrote it a year ago.
Hi Jon,
When I spoke with Shailesh Jain on September 24, 2020 (after corresponding in email for about a week before), I had a contract with Pearson for our recently published Packt book, but had concerns about working with Pearson. In the main, I was concerned about a very unhelpful technical reviewer they brought in and by long delays in publication on their end (I was 95% finished in June 2020, but they did not begin any review process until August, with technical editing not planned for numerous months more). I had also had issues with the multi-year-long move of their royalty records to a different (and much worse) system that delayed and misreported my royalties on numerous things I've done for them.
At that point, I was seriously considering moving the book away from Pearson, with a consciousness that, sadly, earlier corporate purchases meant that Pearson was a pale shadow of the former rigor and quality of Addison Wesley, which they purchased and with which I previously published. So I was very seriously considering either moving publishers or self-publishing, instead of continuing with Pearson. I was leaning towards self-publication, but fearful of the self-marketing work it would require. The book was substantially finished at the time of that call, needing only technical and copy editing (and much less of either than what most authors produce).
During our September call (and in surrounding email before and since), I raised four main concerns about Packt, in pretty much this order of priority.
1. That the main underlying content should remain available for readers, both as my obligation to the Free Software community and because that was the expectation of readers within this community.
2. That Packt be able to pursue effectively the academic market, which my book is stylistically and content-wise appropriate for. Unfortunately, my belief then remains in effect: that university systems will absolutely not consider self-published books for system-wide adoption, and MIGHT consider texts from trade publishers rather than textbook-focused publishers (but that's still an uphill battle).
3. The quality of the produced book, especially in the print version, given my prior knowledge of Packt (not generally positive, unfortunately).
4. How soon Packt could bring the basically completed book to publication.
Shailesh provided answers to my concerns, all of which proved to be false, if not outright deceptions.
Before I get to that, I'll mention that after September, and before I contracted with Packt, I had a very helpful conversation with a friend of mine who has quite successfully self-published, in a somewhat different but generally related technical area. In any case, what he writes is absolutely within the range of things Packt publishes, even if not quite as close to my specific topic.
He was very helpful, but spoke at some length about the self-promotion and marketing efforts that were needed for self-published books. He even provided kind suggestions about how I might go about those steps, but did honestly warn of their difficulty. Those steps consisted of actions like arranging interviews, creating a landing page for the book, finding reviewers, sending sample copies, and a few other details.
My feeling was that I was ill-suited for those tasks, and decided largely on that basis to sign a contract with Packt. Let's return to the 4 points I raised immediately on my first conversation with Shailesh (some in the email the week before).
(1) After the book had been living in my own public repository for a year, Packt asked to move it to another repository that was also world readable. I did that promptly, and continued edits from there. I can look up the dates, but after edits were done, I was told that Packt had removed ALL the prose content from that repository, and left only a fairly worthless skeleton of code cells that make little sense.
I objected to that, and after an uncomfortable conversation with Ravit Jain, agreed to leave that in place under the agreement that I could make a landing page with a substantial share of the content to promote sales and act ethically towards the open source community. I pointed out, with numerous cited examples, that having substantial content available is something done by all the most successful books in the Free Software community, and acts as a boon to sales.
A model of selling books by hoarding their ideas as a scarce commodity is SIMPLY NOT how the open source communities or readership works. So in response to basically the DEMAND of Ravit and Shailesh that I create a landing page (they provided examples from other authors), I spent about $1000 for an UpWork contractor... who wound up doing an unsuitable job, but provided a good basic look-and-feel that I started with. Then I spent maybe 30-40 hours of my own time (I'm not a front-end web designer/developer) producing something suitable (and I think actually beautiful).
After that, I received this comment last night from Shailesh: "I'm unsure how the page 'adds value' to the entire project." Admittedly, he also noted that on some devices, the detection of screen size is apparently buggy, and his large dimension laptop incorrectly gets a message about "Device Limitation". I intended for the page to basically be "Buy the book" if viewed on phones, which is a "feature." The mis-detection of "small device" for his laptop is a bug... but in general I was VERY conscientious and careful about providing just enough content to attract readers (but in small sections making saving/printing deliberately cumbersome, hence encouraging book sales), and enough "buy the book" screens for unavailable parts to make it much more overtly marketing than I would prefer [note: these glitches subsequently fixed by me].
(2) I was assured vigorously by Shailesh that Packt had good academic connections and valued marketing to that audience. It's hard to characterize what he said as anything other than a bald-faced lie to get an author to sign a contract. Apparently, on my own I've got the text into one prestigious university library, which is more than Packt's whole marketing team has attempted.
A library copy will add prestige, but isn't the big "win" for immediate sales. Reaching out to central university systems like University of California and Indian Institute of Technology could be over 50% of sales, if successful. But it requires cold calling and finding connections. Most importantly, the way academia works, this CANNOT come from the author, but MUST come from the publisher. That's simply how the world works.
Unlike most Packt titles, mine is written well enough and at an appropriate level that it would make a good secondary textbook for a lot of courses. By intention.
(3) There are so many levels at which the production quality of Packt is substandard. I should have believed my prior experience with other Packt titles rather than the assurances of Shailesh. I finally got the two copies you ordered from Amazon on my behalf, and it's... disappointing. The paper quality looks like maybe 20 lb low-grade paper that is uncoated. The effect is that headings and charts show through from the opposite side of pages. The ink quality is low, causing many plots with filled areas to bleed and look uneven. The binding is obviously done in a cheaper way than higher quality books.
But even other than the physical materials, the production is just bad. I had to argue a great deal over draft revisions to get a few isolated things fixed, like some particularly bad kerning examples. I am pretty sure no one in production even knows what a "ligature" is in typography. Several concerns I specifically raised about the alignment of continuous elements like input/output from code falling verso/recto rather than recto/verso (overleaf) were ignored.
On page 99 is a particularly obvious example of the poor attention to detail by the Packt book production team. There is an epigraph in which the single word "it" is carried to a second line, which ends the quote. I cannot imagine anyone who CARES about book production seeing that and not recoiling in horror (there's another epigraph with the exact same issues as well). The correct way is to wrap a few words before and perhaps make small adjustments to inter-word spacing for visual balance.
This example is one of hundreds that I see looking through the book. It's done to standards of an undergraduate term paper, not to those of a professional publisher. Adobe InDesign is capable of doing better, but it requires attention. A better typesetting system like LaTeX knows to handle many of these things in a more automated way (but honestly, I'm sure InDesign has knobs and switches that simply are not being used as well).
The main point is that I should not have to be the one who finds all production problems; production should simply want to do a good job rather than nominally assure "yes, the words are on the page."
(4) I was told by Shailesh, in absolute and no uncertain terms, that Packt could bring the book to publication by January 2021, given that it was already fully written.
In reality, after the contract was signed on November 3, 2020, Packt sat on it for about two months, until I repeatedly nudged you (the company) about progress. I eventually found out that Packt didn't actually have a technical reviewer, so I found a colleague (Miki Tebeka, who is just great) who would do it. In other words, the work that should be Packt's, in this as in many other cases, fell on me.
Miki did a great job. And technical editor Lucy Wan was absolutely AMAZINGLY GOOD. So that is the one truly positive element of my interaction with Packt.
But obviously, when it actually came out was the last day of March. In truth, this date issue was the least of my worries about Packt, but it's notable. Probably even Pearson with their huge delays would have published by that date.
---
So let's get to a final point that isn't a statement about Shailesh's specific promises, but is my general evaluation. I went with Packt because I didn't want to take on certain tasks that I do not feel well suited for, which would be needed in self-publishing. These include:
Create an attractive book landing page, as was ultimately demanded (at my expense) by Ravit, and that I created with a bunch of work I might otherwise be billing for (but it is nice looking).
Find reviewers and blurbers for the book on my own. You contacted Alex Martelli, but he agreed (and said such positive things) largely because he knows me personally and by reputation. Naomi Ceder and Stéfan van der Walt I located and corresponded with entirely on my own; they are people with prestigious titles (and my friends) who wrote glowing comments. So 0.5 of the 3 blurbers were Packt's work rather than mine. I'm a little disappointed that a few others I recommended haven't gotten back, but I know that's how it goes with busy schedules. Monika has done a perfectly good job of handling that routine correspondence.
Arrange for media publicity, which I desperately wanted to avoid [being responsible for]. Aside from getting daily nags from Shailesh or Ravit that I need to post something daily to LinkedIn, shilling my book, I've also arranged my interview with Joe Reis for next week, for his Monday Morning Data Chat video series. Packt did find and arrange the interview with Kate Strachnyi, who was really wonderful and asked great questions to a decent sized audience. I also arranged, on my own, the upcoming talk to the Cape Town Data Science User Group (again, because it is organized by a friend/colleague of mine).
Basically, what this amounts to is that (so far) in exchange for 82% of the net revenue, Packt is doing about 25% of the book promotion work I wished to avoid with self-publishing. But of course, it is publicity on a book with much worse production quality than if I had done it myself.
And this is really unrelated, and yet it's hard to keep entirely apart in my mind. The book in a fairly related data science area that I agreed to be a technical reviewer on (but just withdrew from) is simply not up to standards for publication. That's an entirely different book by someone different (who seems very knowledgeable and teaches data science at a reasonably good university). In terms of writing quality at both the level of good prose and at that of good conceptual organization, it falls short. This kinda confirms my feeling that Packt is simply trying to throw out as many titles as you can, as cheaply as you can, and just hope something randomly sticks.... which honestly, my title could if nurtured properly.
As an also only slightly related footnote, Shailesh sent me a "marketing report" of top selling titles that he found comparable to mine. I'm friends with the top-3 sellers, and pleased for their success. But the list is dominated by O'Reilly titles... and is so specifically because of the quality concerns I mention above. O'Reilly does that right! Moreover, zero Packt titles are up there, for mostly the same reason... I think my title is one that has the potential to be there (I guess you can argue vanity by me, but honestly mine is simply better written and addresses an area that really isn't covered in other titles... while having an evergreen quality to it).
In the list of top-selling comparative titles Shailesh sent me, the top three are:
Python Cookbook: Recipes for Mastering Python 3, by David Beazley, Brian K. Jones. This one is freely available at: https://github.com/lpvcpp/learn_python/blob/master/D.%20Beazley%2C%20B.K.%20Jones%20-%20Python%20Cookbook%2C%203rd%20Edition.%202013.pdf
Fluent Python: Clear, Concise, and Effective Programming, by Luciano Ramalho. This one is MOSTLY freely available at: https://github.com/fluentpython
Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas. This one is freely available at: https://jakevdp.github.io/PythonDataScienceHandbook/
Possibly a better comparison for the specific space of my book are:
R for Data Science, by Hadley Wickham and Garrett Grolemund. This one is entirely freely available at: https://r4ds.had.co.nz/
Introduction to Machine Learning with Python, by Andreas Mueller and Sarah Guido. This one is freely available at: https://github.com/amueller/introduction_to_ml_with_python
I don't have exact sales numbers for the Wickham or Mueller texts, but I know they are highly successful (probably more than many Shailesh put on the comparison list).
One of the MAIN things highly selling titles in this data science space have in common is that they NEVER pretend that the motivator of sales is trying to keep the content as secret as possible until readers shell out the purchase price.
There are several reasons why Packt is not as successful as other technical publishers. The production quality concerns I mentioned are definitely significant to that. However, at least as significant is a very misguided attitude I had to wrestle with Ravit and Shailesh about, in regards to the ethics and expectations of open source.
Yours, David...
Since I suppose this post will be too long anyway, I’ll tell you the second story in a footnote. I agreed to write a title for Pragmatic Bookshelf, to be called Regular Expression Brain Teasers. It’s a series edited by my good friend Miki Tebeka.
The thing about Pragmatic Bookshelf is that they have a cargo-culted production system that is probably 15 years old. It’s not terrible by any means—it’s a hodge-podge of some Ruby and some XSLT that produces reasonably attractive final layouts from an almost-XML format that might have some Markdown embedded in it for ease. Someone wrote it for them, but that person is no longer available.
For the most part, what their black box does is attractive enough and not too difficult to use. However, in a small corner of its choices, the characters | (vertical bar) and \ (backslash) look extremely similar in the italic font used to typeset code under their system. For most computer topics, these two occur rarely enough in proximity that it doesn’t matter. At worst, even where both can occur on a line, their place within the syntax makes the difference obvious. Regular expressions are probably the only place where clearly distinguishing those two things is critical.
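To make that concrete, here is a minimal sketch in Python of how much hangs on telling the two characters apart. It is not taken from the book; the sample strings and variable names are made up purely for illustration.

```python
import re

# Hypothetical sample strings, chosen only for illustration.
with_bar = "a|b"         # the characters: a, vertical bar, b
with_backslash = "a\\b"  # the characters: a, backslash, b

# An unescaped vertical bar means alternation: match "a" OR "b".
alternation = re.findall(r"a|b", with_bar)

# A backslash before the bar makes the bar a literal character.
escaped = re.findall(r"a\|b", with_bar)

# Two backslashes then a bar: "a" followed by a literal backslash, OR "b".
mixed = re.findall(r"a\\|b", with_backslash)

print(alternation)  # ['a', 'b']
print(escaped)      # ['a|b']
print(mixed)        # ['a\\', 'b']
```

The three patterns differ by only a character or two, yet each one matches something different. If a reader cannot tell which stroke is which on the printed page, the puzzle is unsolvable.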
So I just wanted Pragmatic Bookshelf to change the code font used. Even the non-italic version would have been fine. They are deeply afraid of their own production system though. Even though I offered to figure out how to modify it for $1000, they preferred to cancel the contract rather than touching this Stygian box whose mysteries fill them with an incalculable dread.
Anyway, that book is now self-published on Lulu, and as a result of me actually composing the layout and typography, it is truly beautiful (with artwork by some friends of mine included to help align verso and recto… no peeking at solutions without turning a physical page).
Hi David,
I appreciate you sharing the aforementioned details with the wider community. I’d like to share a few points in response to your feedback:
Packt is not an academic publisher, but we do have good distribution reach into academic markets. We do not sell face-to-face to faculty in the way a traditional academic publisher does. For the data cleaning book, we did reach out to our academic contacts; however, the book unfortunately did not seem to fit their approach.
Your inputs and insights into production issues are largely valid and valuable to us. We will dig deeper and learn. Similarly, your analysis and views on a free community version of this book are valid too. As a publisher, we are considering our approach here. We would like to make a full eBook version of the data cleaning book available to the communities that you suggest; we can together learn about the value and impact of this approach.
Further, despite recruiting a technical reviewer within the month of contracting you as the author, we could not provide sufficient value to you as the reviewer eventually dropped out once chapters were shared with them. We are now much more heavily invested in building “co-development communities” around our books (primarily by hosting user-driven servers on Discord) and providing rich feedback/input to our authors in a timely manner.
From an overall perspective, we too are disappointed with the performance of this book. We as a team worked hard during all stages of publication, right from development to production to marketing (both professional and academic). Our goal was to maintain transparency with you, which we did. Despite our efforts, the book has not found an audience and does not seem to have a market user fit. Had we developed this book with you from the start, things might have been different.
Finally, the decision of whether to self-publish or not continues to be a challenging one. It depends on you as the author, your priorities, focus, and approach. We continue to invest in a rich and deep partnership with our authors and core tech. communities. In hindsight, self-publishing might have been the better choice for you and this book.
We wish you all the success in your publishing journey!