Summary¶
It's no secret that Amazon has become a behemoth of an organization, but it started out selling books. By the end of the period covered by this data set (2019), Amazon held a 50% share of all book distribution and an eBook market share of around 67% (source). For writers and publishers around the world, Amazon's success is hard to ignore, and it seems unwise not to publish on the platform.
This case study will investigate the top 50 bestselling authors and books over the ten-year period from 2009 to 2019. The goal is to provide insights so writers and publishers can make data-informed decisions about long-term goals.
Guiding Questions¶
For this case study, our main guiding question is:
How can trends from the Top 50 Amazon Bestsellers over a 10-year period inform long-term goals for authors and publishers?
To address this question, we will explore the following sub-questions:
- What author(s) published the most between 2009-2019?
- What percentage of books were fiction vs. non-fiction?
- What book appears the most between 2009-2019?
Tools and Techniques¶
This analysis made use of the following tools and techniques:
- Python programming language and libraries: pandas, seaborn, numpy, and matplotlib
- Data transformations: extraction, visualizations, summary statistics
- Data inspection: removing duplicate/unnecessary data, changing formats/data types, verifying unique values
Recommendations¶
The analysis yielded the following key observations:
- Most authors (75%) appeared on the list no more than twice during this 10-year period
- A majority (73%) of the expensive books (books priced above 16 USD) were non-fiction
- Most books (75%) on the list during this 10-year period were priced at 16 USD or less, and 50% of them were priced between 7 USD and 16 USD
These observations led to the following recommendations outlined below.
Focus on high quality content¶
Most authors (75%) appeared on the list no more than twice during this 10-year period. It takes a lot to publish a book: drafting, writing, editing, and revising. This suggests that the authors who made it onto the Top 50 bestsellers list got there because their books were well received and well written, not because they published often.
For writers and publishers seeking to make these lists in the next ten years, this means focusing on writing less and writing better. Slowing down to focus on higher-quality content increases the chance that your book will be well received and popular.
For higher priced books, focus on non-fiction¶
A majority (73%) of the expensive books (books priced above 16 USD) were non-fiction. Since non-fiction books generally require more research and effort to write, it makes sense that they would be priced higher to reflect that. Authors and publishers seeking to earn more might consider shifting their output to non-fiction books. This could be more lucrative if combined with other sources of income such as speaking engagements, workshops, and courses.
Set price between 7 and 16 USD¶
This is likely to change due to inflation and increased production costs, but it should be noted that most books (75%) on the list during this 10-year period were priced at 16 USD or less and 50% of them were priced between 7 USD and 16 USD. In other words, when choosing a book price, ensure that it is within the current range of book prices: not too high, but not too low.
Prepare Data¶
Dataset¶
The data we'll be working with is the Amazon Top 50 Bestselling Books 2009 - 2019 data set provided by Sooter Saalu on Kaggle.
Description¶
The dataset contains Amazon's Top 50 bestselling books for each year from 2009 to 2019. There are 550 entries, each categorized as fiction or non-fiction using Goodreads.
The data contains the following columns:
- Name: Name of the book (i.e., book title)
- Author: Name of the person/organization who wrote/published the book
- User Rating: Average user rating on a scale of 1 to 5 in a given year
- Reviews: The total number of reviews the book received in a given year
- Year: The year the book appeared on the bestseller list
- Price: The list price of the book in a given year
- Genre: Whether the book is fiction or non-fiction
License¶
The data is made available to use via the CC0: Public Domain license which allows anyone to "copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission".
Process Data¶
Let's import our libraries and start processing our data. We will run the following checks to look for abnormalities:
- Check for missing values
- Investigate outliers
- Confirm data types
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Quick overview¶
df = pd.read_csv('amazon_bestsellers_2009-2019.csv')
df.shape
(550, 7)
df.head()
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
1 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
2 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
3 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
4 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |
df.describe()
| | User Rating | Reviews | Price | Year |
---|---|---|---|---|
count | 550.000000 | 550.000000 | 550.000000 | 550.000000 |
mean | 4.618364 | 11953.281818 | 13.100000 | 2014.000000 |
std | 0.226980 | 11731.132017 | 10.842262 | 3.165156 |
min | 3.300000 | 37.000000 | 0.000000 | 2009.000000 |
25% | 4.500000 | 4058.000000 | 7.000000 | 2011.000000 |
50% | 4.700000 | 8580.000000 | 11.000000 | 2014.000000 |
75% | 4.800000 | 17253.250000 | 16.000000 | 2017.000000 |
max | 4.900000 | 87841.000000 | 105.000000 | 2019.000000 |
Check for missing values¶
df.isnull().sum()
Name           0
Author         0
User Rating    0
Reviews        0
Price          0
Year           0
Genre          0
dtype: int64
Observations
No missing values found.
Investigate outliers¶
columns_to_plot = ['User Rating', 'Reviews', 'Price', 'Year']
# draw one box plot per numeric column
for column_name in columns_to_plot:
    plt.boxplot(df[column_name])
    plt.title('Box Plot: ' + column_name)
    plt.xlabel(column_name)
    plt.ylabel('Value')
    plt.show()
Observations
Price, User Rating, and Reviews have outliers. However, this is to be expected since there will be variability in book prices as well as in the ratings and the number of reviews a book receives. Further, the outliers present in the data do not seem unreasonable given the context of the data.
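To make "outlier" concrete: matplotlib's box plots place their whiskers using 1.5 times the interquartile range by default, and points beyond that are drawn as outliers. The sketch below applies the same rule by hand. `iqr_fences` is a hypothetical helper, and the small `sample` series only stands in for `df['Price']`.

```python
import pandas as pd

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample standing in for df['Price'] (prices in USD).
sample = pd.Series([0, 7, 7, 11, 11, 16, 16, 105], name="Price")

lower, upper = iqr_fences(sample)
outliers = sample[(sample < lower) | (sample > upper)]
print(lower, upper, outliers.tolist())
```

With the quartiles reported by `df.describe()` for Price (7 and 16), the same arithmetic gives an upper fence of 16 + 1.5 × 9 = 29.5 USD.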
Confirm data types¶
df.dtypes
Name            object
Author          object
User Rating    float64
Reviews          int64
Price            int64
Year             int64
Genre           object
dtype: object
Observations
No abnormal data types.
Process Conclusions¶
- No missing values; contains 550 entries (as described in the source)
- The min and max of the year match what is in the data description
- The data types for each column match what is described (i.e., numbers are numbers, categorical data are objects/strings)
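These conclusions can also be expressed as assertions so that they are re-checked whenever the notebook is re-run. The following is only a sketch: `run_basic_checks` is a hypothetical helper, and `sample` is a two-row stand-in for `df` built from rows shown elsewhere in this notebook.

```python
import pandas as pd

def run_basic_checks(frame, first_year, last_year):
    """Raise AssertionError if the frame fails the checks listed above."""
    # no missing values anywhere in the frame
    assert frame.isnull().sum().sum() == 0, "missing values found"
    # year range matches the data description
    assert frame["Year"].min() == first_year, "unexpected first year"
    assert frame["Year"].max() == last_year, "unexpected last year"
    # numeric columns are stored as numbers
    assert pd.api.types.is_numeric_dtype(frame["Price"]), "Price is not numeric"
    assert pd.api.types.is_numeric_dtype(frame["Reviews"]), "Reviews is not numeric"

# Hypothetical sample standing in for df.
sample = pd.DataFrame({
    "Name": ["Twilight (The Twilight Saga, Book 1)",
             "Wrecking Ball (Diary of a Wimpy Kid Book 14)"],
    "Price": [9, 8],
    "Reviews": [11676, 9413],
    "Year": [2009, 2019],
})

run_basic_checks(sample, first_year=2009, last_year=2019)
print("all checks passed")
```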
Analysis¶
Correlation Heatmap¶
correlation_matrix = df[['User Rating', 'Price', 'Reviews', 'Year']].corr()
plt.figure(figsize=(10,8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
Observations
Little or no correlation between User Rating, Price, and Reviews.
Genre¶
genre = df["Genre"].value_counts()
print(genre)
# use len(df) rather than a hard-coded 550 so the percentages follow the data
percent_non_fic = round(genre['Non Fiction'] / len(df) * 100)
percent_fic = round(genre['Fiction'] / len(df) * 100)
sizes = [percent_non_fic, percent_fic]
labels = ['Non Fiction', 'Fiction']
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.axis('equal')
plt.title('Genre')
plt.show()
Non Fiction    310
Fiction        240
Name: Genre, dtype: int64
sns.countplot(x='Genre', data=df)
plt.xticks(rotation=-45)
plt.show()
Observations
- Most books are non-fiction, but only by a slight majority
- Non Fiction: 56%
- Fiction: 44%
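The percentages above can also be obtained in one step with `value_counts(normalize=True)`, which avoids dividing by a hard-coded total. This is a minimal sketch; `sample` is a hypothetical frame that only reproduces the 310/240 genre split shown above.

```python
import pandas as pd

# Hypothetical sample standing in for df; counts mirror the 310/240 split above.
sample = pd.DataFrame({"Genre": ["Non Fiction"] * 310 + ["Fiction"] * 240})

# normalize=True returns shares instead of raw counts
genre_share = sample["Genre"].value_counts(normalize=True).mul(100).round(1)
print(genre_share)
```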
Price¶
Summary Statistics¶
df["Price"].describe()
count    550.000000
mean      13.100000
std       10.842262
min        0.000000
25%        7.000000
50%       11.000000
75%       16.000000
max      105.000000
Name: Price, dtype: float64
Box plot¶
plt.boxplot(df['Price'])
plt.title('Box Plot of Price Values')
plt.xlabel('Price')
plt.ylabel('Value')
plt.yticks(range(0, 110, 5))
plt.show()
Histogram¶
plt.figure(figsize=(8,6), dpi=80)
sns.histplot(data=df["Price"], bins='auto')
plt.title('Amazon Bestsellers 2009-2019: Price')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
Observations
- 50% of the price values for books are between 7 USD and 16 USD.
- Outliers are greater than 30 USD
Expensive Books¶
For our purposes, we will define a book as expensive if it costs more than 16 USD, since 75% of the books in our data set cost 16 USD or less.
expensive_books_sorted = df.loc[df['Price']>16].sort_values(by='Price', ascending=False)
num_expensive_books = len(expensive_books_sorted)
print(num_expensive_books, 'books sold for more than 16 USD')
expensive_books_sorted
122 books sold for more than 16 USD
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
69 | Diagnostic and Statistical Manual of Mental Di... | American Psychiatric Association | 4.5 | 6679 | 105 | 2013 | Non Fiction |
70 | Diagnostic and Statistical Manual of Mental Di... | American Psychiatric Association | 4.5 | 6679 | 105 | 2014 | Non Fiction |
473 | The Twilight Saga Collection | Stephenie Meyer | 4.7 | 3801 | 82 | 2009 | Fiction |
151 | Hamilton: The Revolution | Lin-Manuel Miranda | 4.9 | 5867 | 54 | 2016 | Non Fiction |
346 | The Book of Basketball: The NBA According to T... | Bill Simmons | 4.7 | 858 | 53 | 2009 | Non Fiction |
... | ... | ... | ... | ... | ... | ... | ... |
311 | StrengthsFinder 2.0 | Gallup | 4.0 | 5069 | 17 | 2016 | Non Fiction |
446 | The Pioneer Woman Cooks: A Year of Holidays: 1... | Ree Drummond | 4.8 | 2663 | 17 | 2013 | Non Fiction |
312 | StrengthsFinder 2.0 | Gallup | 4.0 | 5069 | 17 | 2017 | Non Fiction |
342 | The Big Short: Inside the Doomsday Machine | Michael Lewis | 4.7 | 3536 | 17 | 2010 | Non Fiction |
307 | StrengthsFinder 2.0 | Gallup | 4.0 | 5069 | 17 | 2012 | Non Fiction |
122 rows × 7 columns
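The 16 USD cutoff used above is the 75th percentile of Price. As a hedged alternative to typing the number in, the threshold could be derived from the data with `Series.quantile`, so that it follows the data if the file changes. `sample` is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical sample standing in for df (prices in USD).
sample = pd.DataFrame({"Price": [0, 6, 8, 11, 15, 17, 22, 105]})

threshold = sample["Price"].quantile(0.75)           # 75th percentile
expensive = sample.loc[sample["Price"] > threshold]  # strictly above the cutoff
print(threshold, len(expensive))
```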
Summary Statistics¶
expensive_books_sorted['Price'].describe()
count    122.000000
mean      27.172131
std       14.925541
min       17.000000
25%       18.000000
50%       21.000000
75%       29.500000
max      105.000000
Name: Price, dtype: float64
Box plot¶
plt.boxplot(expensive_books_sorted['Price'])
plt.title('Expensive Books (>$16)')
plt.xlabel('Price')
plt.ylabel('Value')
plt.show()
Pie chart¶
expensive_books_genre = expensive_books_sorted['Genre'].value_counts()
percent_non_fic_expensive = round((expensive_books_genre['Non Fiction'] / num_expensive_books) * 100)
percent_fic_expensive = round((expensive_books_genre['Fiction'] / num_expensive_books) * 100)
print(expensive_books_genre)
# pie chart of genre for expensive books
genre_percents_expensive = [percent_non_fic_expensive, percent_fic_expensive]
genre_labels_expensive = ['Non Fiction', 'Fiction']
plt.pie(genre_percents_expensive, labels=genre_labels_expensive, autopct='%1.0f%%', startangle=140)
plt.axis('equal')
plt.title('Expensive Books: Genre')
plt.show()
Non Fiction    89
Fiction        33
Name: Genre, dtype: int64
Highest priced book¶
df.loc[df["Price"]==105]
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
69 | Diagnostic and Statistical Manual of Mental Di... | American Psychiatric Association | 4.5 | 6679 | 105 | 2013 | Non Fiction |
70 | Diagnostic and Statistical Manual of Mental Di... | American Psychiatric Association | 4.5 | 6679 | 105 | 2014 | Non Fiction |
Observations¶
- The two highest priced titles (Diagnostic and Statistical Manual... at 105 USD and The Twilight Saga Collection at 82 USD) cost more than 50 USD above the 75th percentile (29.50 USD) of the expensive books
- A majority (73%) of expensive books are non fiction
Inexpensive Books¶
For our purposes, we will define inexpensive books as those priced at 7 USD or less, since 7 USD is the 25th percentile of all book prices.
inexpensive_books_sorted = df.loc[df["Price"]<=7].sort_values(by='Price', ascending=False)
num_inexpensive_books = len(inexpensive_books_sorted)
print(num_inexpensive_books, "books sold for 7 USD or less")
inexpensive_books_sorted
148 books sold for 7 USD or less
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
147 | Goodnight, Goodnight Construction Site (Hardco... | Sherri Duskey Rinker | 4.9 | 7038 | 7 | 2013 | Fiction |
399 | The Handmaid's Tale | Margaret Atwood | 4.3 | 29442 | 7 | 2017 | Fiction |
367 | The Fault in Our Stars | John Green | 4.7 | 50482 | 7 | 2014 | Fiction |
146 | Goodnight, Goodnight Construction Site (Hardco... | Sherri Duskey Rinker | 4.9 | 7038 | 7 | 2012 | Fiction |
283 | Quiet: The Power of Introverts in a World That... | Susan Cain | 4.6 | 10009 | 7 | 2013 | Non Fiction |
... | ... | ... | ... | ... | ... | ... | ... |
381 | The Getaway | Jeff Kinney | 4.8 | 5836 | 0 | 2017 | Fiction |
116 | Frozen (Little Golden Book) | RH Disney | 4.7 | 3642 | 0 | 2014 | Fiction |
42 | Cabin Fever (Diary of a Wimpy Kid, Book 6) | Jeff Kinney | 4.8 | 4505 | 0 | 2011 | Fiction |
358 | The Constitution of the United States | Delegates of the Constitutional | 4.8 | 2774 | 0 | 2016 | Non Fiction |
71 | Diary of a Wimpy Kid: Hard Luck, Book 8 | Jeff Kinney | 4.8 | 6812 | 0 | 2013 | Fiction |
148 rows × 7 columns
Summary Statistics¶
inexpensive_books_sorted['Price'].describe()
count    148.000000
mean       4.804054
std        1.883183
min        0.000000
25%        4.000000
50%        5.000000
75%        6.000000
max        7.000000
Name: Price, dtype: float64
Box plot¶
plt.boxplot(inexpensive_books_sorted['Price'])
plt.title('Inexpensive Books (<=$7)')
plt.xlabel('Price')
plt.ylabel('Value')
plt.show()
Pie chart¶
inexpensive_books_genre = inexpensive_books_sorted['Genre'].value_counts()
percent_non_fic_inexpensive = round((inexpensive_books_genre['Non Fiction'] / num_inexpensive_books) * 100)
percent_fic_inexpensive = round((inexpensive_books_genre['Fiction'] / num_inexpensive_books) * 100)
print(inexpensive_books_genre)
# pie chart of genre for inexpensive books
genre_percents_inexpensive = [percent_non_fic_inexpensive, percent_fic_inexpensive]
genre_labels_inexpensive = ['Non Fiction', 'Fiction']
plt.pie(genre_percents_inexpensive, labels=genre_labels_inexpensive, autopct='%1.0f%%', startangle=140)
plt.axis('equal')
plt.title('Inexpensive Books: Genre')
plt.show()
Fiction        83
Non Fiction    65
Name: Genre, dtype: int64
Free books¶
print(len(df.loc[df["Price"]==0]), "books sold for free")
df.loc[df["Price"]==0]
12 books sold for free
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
42 | Cabin Fever (Diary of a Wimpy Kid, Book 6) | Jeff Kinney | 4.8 | 4505 | 0 | 2011 | Fiction |
71 | Diary of a Wimpy Kid: Hard Luck, Book 8 | Jeff Kinney | 4.8 | 6812 | 0 | 2013 | Fiction |
116 | Frozen (Little Golden Book) | RH Disney | 4.7 | 3642 | 0 | 2014 | Fiction |
193 | JOURNEY TO THE ICE P | RH Disney | 4.6 | 978 | 0 | 2014 | Fiction |
219 | Little Blue Truck | Alice Schertle | 4.9 | 1884 | 0 | 2014 | Fiction |
358 | The Constitution of the United States | Delegates of the Constitutional | 4.8 | 2774 | 0 | 2016 | Non Fiction |
381 | The Getaway | Jeff Kinney | 4.8 | 5836 | 0 | 2017 | Fiction |
461 | The Short Second Life of Bree Tanner: An Eclip... | Stephenie Meyer | 4.6 | 2122 | 0 | 2010 | Fiction |
505 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2013 | Fiction |
506 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2014 | Fiction |
507 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2015 | Fiction |
508 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2016 | Fiction |
print(len(df.loc[df["Price"]==1]), "book sold for 1 USD")
df.loc[df["Price"]==1]
1 book sold for 1 USD
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
91 | Eat This Not That! Supermarket Survival Guide:... | David Zinczenko | 4.5 | 720 | 1 | 2009 | Non Fiction |
Observations¶
- Some books appear multiple years in a row (e.g., 'To Kill a Mockingbird')
- A majority (75%) of inexpensive books are priced between 4 USD and 7 USD
- A slight majority (56%) of inexpensive books are fiction
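The genre split for expensive and inexpensive books was computed separately above. A compact way to compare the tiers side by side is to bin Price with `pd.cut` and cross-tabulate against Genre. This is a sketch under stated assumptions: the tier labels are hypothetical names, the bin edges follow the cutoffs used in this analysis (7 USD and 16 USD), and `sample` stands in for `df`.

```python
import pandas as pd

# Hypothetical sample standing in for df.
sample = pd.DataFrame({
    "Price": [0, 4, 7, 8, 12, 17, 22, 105],
    "Genre": ["Fiction", "Fiction", "Non Fiction", "Fiction",
              "Non Fiction", "Non Fiction", "Fiction", "Non Fiction"],
})

# Right-closed bins: (-1, 7], (7, 16], (16, inf)
tiers = pd.cut(
    sample["Price"],
    bins=[-1, 7, 16, float("inf")],
    labels=["inexpensive", "mid", "expensive"],
)

# Row-normalised table: genre share (in percent) within each price tier.
share = pd.crosstab(tiers, sample["Genre"], normalize="index").mul(100).round(0)
print(share)
```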
User Rating¶
Bar graph¶
plt.figure(figsize=(10, 6), dpi=80)
sns.countplot(x='User Rating', data=df)
plt.show()
Summary Statistics¶
df['User Rating'].describe()
count    550.000000
mean       4.618364
std        0.226980
min        3.300000
25%        4.500000
50%        4.700000
75%        4.800000
max        4.900000
Name: User Rating, dtype: float64
Box plot¶
plt.boxplot(df['User Rating'])
plt.title('Box Plot of User Rating')
plt.xlabel('User Rating')
plt.ylabel('Value')
plt.show()
Highest Rated Books¶
For our purposes, we will define the highest rated books as any books that have a 'User Rating' of 4.9, since 75% of books have a rating of 4.8 or lower.
highest_rated_books = df.loc[df['User Rating']==4.9]
list_of_highest_rated_books = highest_rated_books['Name'].value_counts().index.tolist()
for book in list_of_highest_rated_books:
    print(book)
Oh, the Places You'll Go!
The Very Hungry Caterpillar
Jesus Calling: Enjoying Peace in His Presence (with Scripture References)
The Wonderful Things You Will Be
Goodnight, Goodnight Construction Site (Hardcover Books for Toddlers, Preschool Books for Kids)
Brown Bear, Brown Bear, What Do You See?
Dog Man: Brawl of the Wild: From the Creator of Captain Underpants (Dog Man #6)
Dog Man: Lord of the Fleas: From the Creator of Captain Underpants (Dog Man #5)
Dog Man: For Whom the Ball Rolls: From the Creator of Captain Underpants (Dog Man #7)
Unfreedom of the Press
Dog Man: A Tale of Two Kitties: From the Creator of Captain Underpants (Dog Man #3)
The Magnolia Story
The Legend of Zelda: Hyrule Historia
Strange Planet (Strange Planet Series)
Rush Revere and the First Patriots: Time-Travel Adventures With Exceptional Americans (2)
Rush Revere and the Brave Pilgrims: Time-Travel Adventures with Exceptional Americans (1)
Dog Man: Fetch-22: From the Creator of Captain Underpants (Dog Man #8)
Obama: An Intimate Portrait
Little Blue Truck
Last Week Tonight with John Oliver Presents A Day in the Life of Marlon Bundo (Better Bundo Book, LGBT Childrens Book)
Dog Man and Cat Kid: From the Creator of Captain Underpants (Dog Man #4)
Humans of New York : Stories
Harry Potter and the Sorcerer's Stone: The Illustrated Edition (Harry Potter, Book 1)
Harry Potter and the Prisoner of Azkaban: The Illustrated Edition (Harry Potter, Book 3)
Harry Potter and the Goblet of Fire: The Illustrated Edition (Harry Potter, Book 4) (4)
Harry Potter and the Chamber of Secrets: The Illustrated Edition (Harry Potter, Book 2)
Hamilton: The Revolution
Wrecking Ball (Diary of a Wimpy Kid Book 14)
Observations
- 50% of User Rating are between 4.5 and 4.8
- Outliers are less than 4.1
Reviews¶
# create series for value counts
review_counts = df['Reviews'].value_counts()
# convert series to data frame
review_counts_df = pd.DataFrame({'num_reviews': review_counts.index, 'num_books': review_counts.values})
# create histogram of reviews
plt.figure(figsize=(10,6), dpi=80)
sns.histplot(data=review_counts_df['num_reviews'], bins='auto', binwidth=2000)
plt.title('Reviews')
plt.xlabel('Amount of Reviews')
plt.ylabel('Number of Books')
plt.show()
review_counts_df['num_reviews'].describe()
count      346.000000
mean      9786.430636
std      10871.900146
min         37.000000
25%       3362.750000
50%       6361.500000
75%      11510.250000
max      87841.000000
Name: num_reviews, dtype: float64
Books with Most Number of Reviews¶
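The 75th percentile of the distinct review counts above is 11,510, so we will define highly reviewed books as books that have more than 11,510 reviews. Note that this summary counts each distinct review total once (346 values); across all 550 list entries, the 75th percentile reported by df.describe() is 17,253.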
highest_reviews = df.loc[df['Reviews']>11510]
list_of_highest_reviews = highest_reviews['Name'].value_counts().index.tolist()
print(len(list_of_highest_reviews), 'book have greater than 11,510 reviews')
for book in list_of_highest_reviews:
    print(book)
88 book have greater than 11,510 reviews
Oh, the Places You'll Go!
The Very Hungry Caterpillar
The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)
Jesus Calling: Enjoying Peace in His Presence (with Scripture References)
First 100 Words
Wonder
Unbroken: A World War II Story of Survival, Resilience, and Redemption
To Kill a Mockingbird
The 5 Love Languages: The Secret to Love that Lasts
How to Win Friends & Influence People
Giraffes Can't Dance
The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing
The Help
The Fault in Our Stars
You Are a Badass: How to Stop Doubting Your Greatness and Start Living an Awesome Life
The Great Gatsby
Catching Fire (The Hunger Games)
Mockingjay (The Hunger Games)
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Player's Handbook (Dungeons & Dragons)
Gone Girl
Milk and Honey
The Book Thief
The Art of Racing in the Rain: A Novel
The Hunger Games (Book 1)
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
Love You Forever
The Hunger Games Trilogy Boxed Set (1)
The Complete Ketogenic Diet for Beginners: Your Essential Guide to Living the Keto Lifestyle
School Zone - Big Preschool Workbook - Ages 4 and Up, Colors, Shapes, Numbers 1-10, Alphabet, Pre-Writing, Pre-Reading…
Ready Player One: A Novel
Proof of Heaven: A Neurosurgeon's Journey into the Afterlife
All the Light We Cannot See
A Man Called Ove: A Novel
The Nightingale: A Novel
The Girl on the Train
The Goldfinch: A Novel (Pulitzer Prize for Fiction)
Becoming
The Shack: Where Tragedy Confronts Eternity
Brown Bear, Brown Bear, What Do You See?
If Animals Kissed Good Night
Hillbilly Elegy: A Memoir of a Family and Culture in Crisis
Educated: A Memoir
Heaven is for Real: A Little Boy's Astounding Story of His Trip to Heaven and Back
The Wonky Donkey
Fifty Shades of Grey: Book One of the Fifty Shades Trilogy (Fifty Shades of Grey Series)
Girl, Wash Your Face: Stop Believing the Lies About Who You Are So You Can Become Who You Were Meant to Be
Divergent
The Guardians: A Novel
The Hunger Games
1984 (Signet Classics)
The Handmaid's Tale
Wild: From Lost to Found on the Pacific Crest Trail
Where the Crawdads Sing
When Breath Becomes Air
Twilight (The Twilight Saga, Book 1)
A Dance with Dragons (A Song of Ice and Fire)
The Racketeer
A Game of Thrones / A Clash of Kings / A Storm of Swords / A Feast of Crows / A Dance with Dragons
A Gentleman in Moscow: A Novel
The Total Money Makeover: Classic Edition: A Proven Plan for Financial Fitness
The Martian
The Silent Patient
Divergent / Insurgent
Sycamore Row (Jake Brigance)
American Sniper: The Autobiography of the Most Lethal Sniper in U.S. Military History
Harry Potter Paperback Box Set (Books 1-7)
Dog Man: Fetch-22: From the Creator of Captain Underpants (Dog Man #8)
Fifty Shades Darker
Fifty Shades Freed: Book Three of the Fifty Shades Trilogy (Fifty Shades of Grey Series) (English Edition)
Fifty Shades Trilogy (Fifty Shades of Grey / Fifty Shades Darker / Fifty Shades Freed)
Fire and Fury: Inside the Trump White House
Go Set a Watchman: A Novel
Grey: Fifty Shades of Grey as Told by Christian (Fifty Shades of Grey Series)
Harry Potter and the Chamber of Secrets: The Illustrated Edition (Harry Potter, Book 2)
Harry Potter and the Cursed Child, Parts 1 & 2, Special Rehearsal Edition Script
Can't Hurt Me: Master Your Mind and Defy the Odds
And the Mountains Echoed
Inferno
Last Week Tonight with John Oliver Presents A Day in the Life of Marlon Bundo (Better Bundo Book, LGBT Childrens Book)
Little Fires Everywhere
12 Rules for Life: An Antidote to Chaos
Origin: A Novel (Robert Langdon)
Orphan Train
Doctor Sleep: A Novel
The Alchemist
The Body Keeps the Score: Brain, Mind, and Body in the Healing of Trauma
10-Day Green Smoothie Cleanse
Top 10 Highly Reviewed Books¶
highest_reviews_sorted = highest_reviews.sort_values(by='Reviews', ascending=False)
list_of_top_10_highest_reviews = highest_reviews_sorted['Name'].value_counts().head(10).index.tolist()
for book in list_of_top_10_highest_reviews:
    print(book)
Oh, the Places You'll Go!
The Very Hungry Caterpillar
Jesus Calling: Enjoying Peace in His Presence (with Scripture References)
The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)
How to Win Friends & Influence People
The 5 Love Languages: The Secret to Love that Lasts
Wonder
First 100 Words
To Kill a Mockingbird
Giraffes Can't Dance
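One caveat about the list above: `value_counts()` re-sorts its result by frequency, so the earlier sort by Reviews is discarded and the ten titles shown are the highly reviewed titles that appear on the list most often. If the intent is to rank by review count, one possible approach is to sort and then keep a single row per title. This is a sketch, not the original method; `sample` is a hypothetical stand-in for `df` built from rows shown in this notebook.

```python
import pandas as pd

# Hypothetical sample standing in for df.
sample = pd.DataFrame({
    "Name": ["To Kill a Mockingbird", "To Kill a Mockingbird",
             "The Fault in Our Stars", "The Handmaid's Tale",
             "12 Rules for Life: An Antidote to Chaos"],
    "Reviews": [26234, 26234, 50482, 29442, 18979],
})

top_by_reviews = (
    sample.sort_values(by="Reviews", ascending=False)
          .drop_duplicates(subset="Name")  # keep one row per title
          .head(3)                         # use head(10) on the full data
)
print(top_by_reviews["Name"].tolist())
```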
Observations
Across the 346 distinct review counts, 75% are below roughly 11,500; across all 550 list entries, 75% have fewer than about 17,250 reviews.
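The gap between the two percentiles comes from how the values are counted: `review_counts_df` holds each distinct review total once, while `df` repeats a title's total for every year it stays on the list. A small sketch of the effect, using a hypothetical series in place of `df['Reviews']`:

```python
import pandas as pd

# Hypothetical sample standing in for df['Reviews']; the repeated value mimics
# a title that stays on the list for several years.
reviews = pd.Series([100, 200, 300, 400, 900, 900, 900, 900], name="Reviews")

q75_all_rows = reviews.quantile(0.75)                    # every list entry counts
q75_distinct = reviews.drop_duplicates().quantile(0.75)  # each value counted once
print(q75_all_rows, q75_distinct)
```

Which cutoff is appropriate depends on the question: per list entry, or per distinct review total.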
Author¶
Let's investigate how many times an author published between 2009-2019 and which authors published the most during this time period.
author_counts = df["Author"].value_counts()
print(author_counts)
Jeff Kinney                           12
Gary Chapman                          11
Rick Riordan                          11
Suzanne Collins                       11
American Psychological Association    10
                                      ..
Keith Richards                         1
Chris Cleave                           1
Alice Schertle                         1
Celeste Ng                             1
Adam Gasiewski                         1
Name: Author, Length: 248, dtype: int64
Summary Statistics¶
author_counts.describe()
count    248.000000
mean       2.217742
std        2.046268
min        1.000000
25%        1.000000
50%        1.000000
75%        2.000000
max       12.000000
Name: Author, dtype: float64
Observations
- 75% of the authors on the list appeared no more than twice over the 10-year period of 2009-2019, and at least half appeared only once
- Fewer than 25% of the authors appeared more than twice
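The quartiles from `describe()` only say that the 75th percentile is 2. The exact shares can be read off the counts directly, because the mean of a boolean mask is the fraction of `True` values. A minimal sketch, with a hypothetical series standing in for `df['Author']`:

```python
import pandas as pd

# Hypothetical sample standing in for df['Author'].
authors = pd.Series(
    ["Jeff Kinney"] * 3 + ["Harper Lee"] * 2 + ["Celeste Ng", "Alice Schertle"],
    name="Author",
)

author_counts = authors.value_counts()
share_once = (author_counts == 1).mean() * 100           # appeared exactly once
share_at_most_twice = (author_counts <= 2).mean() * 100  # appeared once or twice
print(round(share_once), round(share_at_most_twice))
```

On the full data set, 58 of the 248 authors appear more than twice, so roughly 77% appear at most twice.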
Box plot¶
In this box plot, we can see how many times a bestselling author published a book between 2009 and 2019.
plt.boxplot(author_counts)
plt.title('Publishing Frequency of Authors')
plt.xlabel('Author')
plt.ylabel('Number of Published Entries')
plt.show()
Most Frequent Authors¶
For our purposes, we will define the most frequent authors as those who published more than twice.
mask_authors = author_counts>2
authors_most_frequent = author_counts[mask_authors]
print(len(authors_most_frequent), 'authors published more than two books from 2009-2019')
print(authors_most_frequent)
58 authors published more than two books from 2009-2019
Jeff Kinney                           12
Gary Chapman                          11
Rick Riordan                          11
Suzanne Collins                       11
American Psychological Association    10
Dr. Seuss                              9
Gallup                                 9
Rob Elliott                            8
Stephen R. Covey                       7
Stephenie Meyer                        7
Dav Pilkey                             7
Bill O'Reilly                          7
Eric Carle                             7
The College Board                      6
E L James                              6
Don Miguel Ruiz                        6
J.K. Rowling                           6
Stieg Larsson                          6
Sarah Young                            6
Harper Lee                             6
Laura Hillenbrand                      5
R. J. Palacio                          5
Dale Carnegie                          5
Patrick Lencioni                       5
Giles Andreae                          5
Roger Priddy                           5
John Green                             5
John Grisham                           5
Marie Kondō                            4
Rupi Kaur                              4
Rod Campbell                           4
Charlaine Harris                       4
Jim Collins                            4
Kathryn Stockett                       4
Emily Winfield Martin                  4
Stephen King                           4
Jen Sincero                            4
Malcolm Gladwell                       4
Thug Kitchen                           4
Veronica Roth                          4
Glenn Beck                             3
Drew Daywalt                           3
Rachel Hollis                          3
Walter Isaacson                        3
Gillian Flynn                          3
Dan Brown                              3
Rebecca Skloot                         3
Melissa Hartwig Urban                  3
Mark Manson                            3
Margaret Wise Brown                    3
George R.R. Martin                     3
F. Scott Fitzgerald                    3
Wizards RPG Team                       3
Ina Garten                             3
Brandon Stanton                        3
Ree Drummond                           3
Carol S. Dweck                         3
Francis Chan                           3
Name: Author, dtype: int64
plt.figure(figsize=(8, 6), dpi=80)
sns.histplot(data=authors_most_frequent, bins='auto', binwidth=1)
plt.title('Most Frequent Authors')
plt.xlabel('Number of Times Published')
plt.ylabel('Number of Authors')
plt.show()
Top 10 Authors¶
top_ten_authors = authors_most_frequent.head(10)
print(top_ten_authors)
Jeff Kinney                           12
Gary Chapman                          11
Rick Riordan                          11
Suzanne Collins                       11
American Psychological Association    10
Dr. Seuss                              9
Gallup                                 9
Rob Elliott                            8
Stephen R. Covey                       7
Stephenie Meyer                        7
Name: Author, dtype: int64
Bar plot¶
# convert series to data frame
top_ten_authors_df = pd.DataFrame({'Author': top_ten_authors.index, 'Count': top_ten_authors.values})
# create bar plot of data frame
plt.figure(figsize=(10,8))
sns.barplot(data=top_ten_authors_df, x='Author', y='Count')
plt.title('Top 10 Authors')
plt.xlabel('Author')
plt.xticks(rotation=90)
plt.ylabel('Count')
plt.show()
list_of_top_ten_authors = top_ten_authors.index.tolist()
df_top_10_authors = df[df['Author'].isin(list_of_top_ten_authors)]
df_top_10_authors
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
38 | Breaking Dawn (The Twilight Saga, Book 4) | Stephenie Meyer | 4.6 | 9769 | 13 | 2009 | Fiction |
42 | Cabin Fever (Diary of a Wimpy Kid, Book 6) | Jeff Kinney | 4.8 | 4505 | 0 | 2011 | Fiction |
46 | Catching Fire (The Hunger Games) | Suzanne Collins | 4.7 | 22614 | 11 | 2010 | Fiction |
47 | Catching Fire (The Hunger Games) | Suzanne Collins | 4.7 | 22614 | 11 | 2011 | Fiction |
48 | Catching Fire (The Hunger Games) | Suzanne Collins | 4.7 | 22614 | 11 | 2012 | Fiction |
... | ... | ... | ... | ... | ... | ... | ... |
473 | The Twilight Saga Collection | Stephenie Meyer | 4.7 | 3801 | 82 | 2009 | Fiction |
474 | The Ugly Truth (Diary of a Wimpy Kid, Book 5) | Jeff Kinney | 4.8 | 3796 | 12 | 2010 | Fiction |
513 | Twilight (The Twilight Saga, Book 1) | Stephenie Meyer | 4.7 | 11676 | 9 | 2009 | Fiction |
528 | What Pet Should I Get? (Classic Seuss) | Dr. Seuss | 4.7 | 1873 | 14 | 2015 | Fiction |
545 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
95 rows × 7 columns
# book titles for top 10 authors
df_top_10_authors['Name'].value_counts()
Publication Manual of the American Psychological Association, 6th Edition    10
StrengthsFinder 2.0    9
Oh, the Places You'll Go!    8
The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change    7
The 5 Love Languages: The Secret to Love that Lasts    5
The 5 Love Languages: The Secret to Love That Lasts    5
Laugh-Out-Loud Jokes for Kids    5
Catching Fire (The Hunger Games)    3
Knock-Knock Jokes for Kids    3
Mockingjay (The Hunger Games)    3
The Hunger Games Trilogy Boxed Set (1)    2
The Last Olympian (Percy Jackson and the Olympians, Book 5)    2
The Hunger Games (Book 1)    2
The Lost Hero (Heroes of Olympus, Book 1)    1
The Mark of Athena (Heroes of Olympus, Book 3)    1
The Meltdown (Diary of a Wimpy Kid Book 13)    1
Breaking Dawn (The Twilight Saga, Book 4)    1
The Red Pyramid (The Kane Chronicles, Book 1)    1
The Serpent's Shadow (The Kane Chronicles, Book 3)    1
The Son of Neptune (Heroes of Olympus, Book 2)    1
The Third Wheel (Diary of a Wimpy Kid, Book 7)    1
The Throne of Fire (The Kane Chronicles, Book 2)    1
The Twilight Saga Collection    1
The Ugly Truth (Diary of a Wimpy Kid, Book 5)    1
Twilight (The Twilight Saga, Book 1)    1
What Pet Should I Get? (Classic Seuss)    1
The Short Second Life of Bree Tanner: An Eclipse Novella (The Twilight Saga)    1
The Blood of Olympus (The Heroes of Olympus (5))    1
The Hunger Games    1
The House of Hades (Heroes of Olympus, Book 4)    1
The Getaway    1
The Five Love Languages: How to Express Heartfelt Commitment to Your Mate    1
Cabin Fever (Diary of a Wimpy Kid, Book 6)    1
Percy Jackson and the Olympians Paperback Boxed Set (Books 1-3)    1
Old School (Diary of a Wimpy Kid #10)    1
New Moon (The Twilight Saga)    1
Eclipse (Twilight)    1
Eclipse (Twilight Sagas)    1
Double Down (Diary of a Wimpy Kid #11)    1
Dog Days (Diary of a Wimpy Kid, Book 4) (Volume 4)    1
Diary of a Wimpy Kid: The Long Haul    1
Diary of a Wimpy Kid: The Last Straw (Book 3)    1
Diary of a Wimpy Kid: Hard Luck, Book 8    1
Wrecking Ball (Diary of a Wimpy Kid Book 14)    1
Name: Name, dtype: int64
Top Author: Jeff Kinney¶
df.loc[df['Author']=='Jeff Kinney']
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
42 | Cabin Fever (Diary of a Wimpy Kid, Book 6) | Jeff Kinney | 4.8 | 4505 | 0 | 2011 | Fiction |
71 | Diary of a Wimpy Kid: Hard Luck, Book 8 | Jeff Kinney | 4.8 | 6812 | 0 | 2013 | Fiction |
72 | Diary of a Wimpy Kid: The Last Straw (Book 3) | Jeff Kinney | 4.8 | 3837 | 15 | 2009 | Fiction |
73 | Diary of a Wimpy Kid: The Long Haul | Jeff Kinney | 4.8 | 6540 | 22 | 2014 | Fiction |
80 | Dog Days (Diary of a Wimpy Kid, Book 4) (Volum... | Jeff Kinney | 4.8 | 3181 | 12 | 2009 | Fiction |
88 | Double Down (Diary of a Wimpy Kid #11) | Jeff Kinney | 4.8 | 5118 | 20 | 2016 | Fiction |
253 | Old School (Diary of a Wimpy Kid #10) | Jeff Kinney | 4.8 | 6169 | 7 | 2015 | Fiction |
381 | The Getaway | Jeff Kinney | 4.8 | 5836 | 0 | 2017 | Fiction |
435 | The Meltdown (Diary of a Wimpy Kid Book 13) | Jeff Kinney | 4.8 | 5898 | 8 | 2018 | Fiction |
468 | The Third Wheel (Diary of a Wimpy Kid, Book 7) | Jeff Kinney | 4.7 | 6377 | 7 | 2012 | Fiction |
474 | The Ugly Truth (Diary of a Wimpy Kid, Book 5) | Jeff Kinney | 4.8 | 3796 | 12 | 2010 | Fiction |
545 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
# print each of the top ten authors on its own line
for author in list_of_top_ten_authors:
    print(author)
Jeff Kinney
Gary Chapman
Rick Riordan
Suzanne Collins
American Psychological Association
Dr. Seuss
Gallup
Rob Elliott
Stephen R. Covey
Stephenie Meyer
Observations
- 75% of authors appeared two or fewer times during the time period
- The Top 10 most frequent authors are:
- Jeff Kinney
- Gary Chapman
- Rick Riordan
- Suzanne Collins
- American Psychological Association
- Dr. Seuss
- Gallup
- Rob Elliott
- Stephen R. Covey
- Stephenie Meyer
Book Title¶
Let's investigate which book titles appear most frequently from 2009 to 2019.
book_titles = df["Name"].value_counts()
print(book_titles)
Publication Manual of the American Psychological Association, 6th Edition    10
StrengthsFinder 2.0    9
Oh, the Places You'll Go!    8
The Very Hungry Caterpillar    7
The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change    7
..
Humans of New York : Stories    1
Howard Stern Comes Again    1
Homebody: A Guide to Creating Spaces You Never Want to Leave    1
Have a Little Faith: A True Story    1
Night (Night)    1
Name: Name, Length: 351, dtype: int64
Summary Statistics¶
book_titles.describe()
count    351.000000
mean       1.566952
std        1.271868
min        1.000000
25%        1.000000
50%        1.000000
75%        2.000000
max       10.000000
Name: Name, dtype: float64
Box plot¶
plt.boxplot(book_titles)
plt.title('Publishing Frequency of Book Titles')
plt.xlabel('Book')
plt.ylabel('Number of Published Entries')
plt.show()
Observations
Similar to the frequency of author names, 75% of book titles appear 2 or fewer times. This means that at most 25% of book titles appear 3 or more times during the time period of 2009-2019.
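The 75th percentile only gives an upper bound on how many titles sit above the cutoff, so it can help to compute the exact share directly. The following is a minimal sketch on made-up counts; `toy_counts` is a hypothetical stand-in for `book_titles`, not the real data.

```python
import pandas as pd

# Hypothetical appearance counts standing in for book_titles (not the real data)
toy_counts = pd.Series([1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 5, 10])

# 75th percentile, the same statistic that describe() reports as "75%"
q75 = toy_counts.quantile(0.75)

# Exact share of titles that appear 3 or more times
share_three_plus = (toy_counts >= 3).mean()

print(f"75th percentile: {q75}")
print(f"Share appearing 3+ times: {share_three_plus:.1%}")
```

Running the same two lines on `book_titles` would show how far the real share is from the 25% upper bound.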
Most Frequent Book Titles¶
Based on our previous observation, we will define 'most frequent' as any book title that appears 3 or more times during the time period of 2009-2019.
# keep only the titles that appear 3 or more times
mask_book_titles = book_titles > 2
books_most_frequent = book_titles[mask_book_titles]
print(len(books_most_frequent), 'book titles appear 3 or more times')
print(books_most_frequent)
41 book titles appear 3 or more times
Publication Manual of the American Psychological Association, 6th Edition    10
StrengthsFinder 2.0    9
Oh, the Places You'll Go!    8
The Very Hungry Caterpillar    7
The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change    7
The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)    6
Jesus Calling: Enjoying Peace in His Presence (with Scripture References)    6
The Official SAT Study Guide    5
To Kill a Mockingbird    5
The 5 Love Languages: The Secret to Love That Lasts    5
The 5 Love Languages: The Secret to Love that Lasts    5
Laugh-Out-Loud Jokes for Kids    5
How to Win Friends & Influence People    5
Unbroken: A World War II Story of Survival, Resilience, and Redemption    5
The Five Dysfunctions of a Team: A Leadership Fable    5
Giraffes Can't Dance    5
Wonder    5
First 100 Words    5
The Fault in Our Stars    4
Dear Zoo: A Lift-the-Flap Book    4
The Wonderful Things You Will Be    4
The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing    4
Good to Great: Why Some Companies Make the Leap and Others Don't    4
Thug Kitchen: The Official Cookbook: Eat Like You Give a F*ck (Thug Kitchen Cookbooks)    4
The Help    4
You Are a Badass: How to Stop Doubting Your Greatness and Start Living an Awesome Life    4
Knock-Knock Jokes for Kids    3
Catching Fire (The Hunger Games)    3
Game of Thrones Boxed Set: A Game of Thrones/A Clash of Kings/A Storm of Swords/A Feast for Crows    3
Gone Girl    3
The Day the Crayons Quit    3
Goodnight Moon    3
The Immortal Life of Henrietta Lacks    3
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life    3
Player's Handbook (Dungeons & Dragons)    3
Milk and Honey    3
The Whole30: The 30-Day Guide to Total Health and Food Freedom    3
Mindset: The New Psychology of Success    3
Crazy Love: Overwhelmed by a Relentless God    3
Mockingjay (The Hunger Games)    3
The Great Gatsby    3
Name: Name, dtype: int64
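The output above lists 'The 5 Love Languages: The Secret to Love That Lasts' and 'The 5 Love Languages: The Secret to Love that Lasts' as separate titles with 5 appearances each. If these are the same book, which seems likely but is an assumption, the split understates its count. Here is a minimal sketch of case-insensitive counting; `names` is a hypothetical stand-in for `df["Name"]`, and only the two spellings are taken from the output.

```python
import pandas as pd

# Hypothetical rows standing in for df["Name"]; row counts are illustrative
names = pd.Series(
    ["The 5 Love Languages: The Secret to Love That Lasts"] * 5
    + ["The 5 Love Languages: The Secret to Love that Lasts"] * 5
    + ["StrengthsFinder 2.0"] * 9
)

# Grouping key that ignores letter case and surrounding whitespace
key = names.str.strip().str.casefold()

# For each key, keep the first original spelling and the number of rows
summary = names.groupby(key).agg(["first", "count"])

# Index the merged counts by a readable title and sort from most to least frequent
merged = summary.set_index("first")["count"].sort_values(ascending=False)
print(merged)
```

This only merges variants that differ by case or padding. Different editions or subtitles, such as 'The Five Love Languages: How to Express Heartfelt Commitment to Your Mate', would still be counted separately.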
plt.figure(figsize=(8,6), dpi=80)
# binwidth overrides bins, so only binwidth is passed
sns.histplot(data=books_most_frequent, binwidth=1)
plt.title('Most Frequent Book Titles')
plt.xlabel('Number of Times Published')
plt.ylabel('Number of Book Titles')
plt.show()
Top 10 Book Titles¶
top_ten_books = books_most_frequent.head(10)
print(top_ten_books)
Publication Manual of the American Psychological Association, 6th Edition    10
StrengthsFinder 2.0    9
Oh, the Places You'll Go!    8
The Very Hungry Caterpillar    7
The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change    7
The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)    6
Jesus Calling: Enjoying Peace in His Presence (with Scripture References)    6
The Official SAT Study Guide    5
To Kill a Mockingbird    5
The 5 Love Languages: The Secret to Love That Lasts    5
Name: Name, dtype: int64
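Note that several titles share the count of 5, so `head(10)` cuts through a tie and the last three slots depend on row order. One way to make the tie visible is `Series.nlargest` with `keep="all"`, which returns every title tied with the last included value. The sketch below uses made-up titles and counts; `toy_counts` is hypothetical.

```python
import pandas as pd

# Hypothetical counts, already sorted like value_counts() output
toy_counts = pd.Series(
    {"Title A": 10, "Title B": 9, "Title C": 5, "Title D": 5, "Title E": 5, "Title F": 4}
)

# head(3) keeps exactly three rows and cuts through the tie at 5
top_head = toy_counts.head(3)

# nlargest with keep="all" retains every title tied with the last included count
top_with_ties = toy_counts.nlargest(3, keep="all")

print(len(top_head), "rows with head()")
print(len(top_with_ties), "rows with nlargest(keep='all')")
```

Whether to report a strict top 10 or a top 10 with ties is a presentation choice; the second option simply avoids an arbitrary cutoff.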
Bar plot¶
# convert series to data frame
top_ten_books_df = pd.DataFrame({'Name': top_ten_books.index, 'Count': top_ten_books.values})
# create bar plot of data frame
plt.figure(figsize=(10,8))
sns.barplot(data=top_ten_books_df, x='Name', y='Count')
plt.title('Top 10 Books')
plt.xlabel('Book Title')
plt.xticks(rotation=90)
plt.ylabel('Count')
plt.show()
list_of_top_ten_books = top_ten_books.index.tolist()
df_top_10_books = df[df['Name'].isin(list_of_top_ten_books)]
df_top_10_books
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
187 | Jesus Calling: Enjoying Peace in His Presence ... | Sarah Young | 4.9 | 19576 | 8 | 2011 | Non Fiction |
188 | Jesus Calling: Enjoying Peace in His Presence ... | Sarah Young | 4.9 | 19576 | 8 | 2012 | Non Fiction |
189 | Jesus Calling: Enjoying Peace in His Presence ... | Sarah Young | 4.9 | 19576 | 8 | 2013 | Non Fiction |
190 | Jesus Calling: Enjoying Peace in His Presence ... | Sarah Young | 4.9 | 19576 | 8 | 2014 | Non Fiction |
191 | Jesus Calling: Enjoying Peace in His Presence ... | Sarah Young | 4.9 | 19576 | 8 | 2015 | Non Fiction |
... | ... | ... | ... | ... | ... | ... | ... |
505 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2013 | Fiction |
506 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2014 | Fiction |
507 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2015 | Fiction |
508 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 0 | 2016 | Fiction |
509 | To Kill a Mockingbird | Harper Lee | 4.8 | 26234 | 7 | 2019 | Fiction |
68 rows × 7 columns
Top Book: Publication Manual of the American Psychological Association, 6th Edition¶
df.loc[df['Name']=='Publication Manual of the American Psychological Association, 6th Edition']
| | Name | Author | User Rating | Reviews | Price | Year | Genre |
---|---|---|---|---|---|---|---|
271 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2009 | Non Fiction |
272 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2010 | Non Fiction |
273 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2011 | Non Fiction |
274 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2012 | Non Fiction |
275 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2013 | Non Fiction |
276 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2014 | Non Fiction |
277 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2015 | Non Fiction |
278 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2016 | Non Fiction |
279 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2017 | Non Fiction |
280 | Publication Manual of the American Psychologic... | American Psychological Association | 4.5 | 8580 | 46 | 2018 | Non Fiction |
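The table above shows one row per year from 2009 to 2018 and none for 2019. A quick way to check which years a title is missing from the list is to compare its years against the full range. This is a minimal sketch; `toy_df` and `missing_years` are hypothetical names, and the frame only mimics the `Name` and `Year` columns of `df`.

```python
import pandas as pd

# Hypothetical frame with the same "Name" and "Year" columns as df
toy_df = pd.DataFrame({
    "Name": ["Manual"] * 10 + ["Other"],
    "Year": list(range(2009, 2019)) + [2019],
})

def missing_years(frame, title, first=2009, last=2019):
    """Return the years in [first, last] in which `title` has no row."""
    years_present = set(frame.loc[frame["Name"] == title, "Year"])
    return sorted(set(range(first, last + 1)) - years_present)

print(missing_years(toy_df, "Manual"))
```

Calling the same helper on `df` with the full title string would list the years in which the top book did not make the Top 50.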
# print each of the top ten book titles on its own line
for book in list_of_top_ten_books:
    print(book)
Publication Manual of the American Psychological Association, 6th Edition
StrengthsFinder 2.0
Oh, the Places You'll Go!
The Very Hungry Caterpillar
The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change
The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)
Jesus Calling: Enjoying Peace in His Presence (with Scripture References)
The Official SAT Study Guide
To Kill a Mockingbird
The 5 Love Languages: The Secret to Love That Lasts
Observations
Top 10 most common book titles:
- Publication Manual of the American Psychological Association, 6th Edition
- StrengthsFinder 2.0
- Oh, the Places You'll Go!
- The Very Hungry Caterpillar
- The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change
- The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)
- Jesus Calling: Enjoying Peace in His Presence (with Scripture References)
- The Official SAT Study Guide
- To Kill a Mockingbird
- The 5 Love Languages: The Secret to Love That Lasts
Analysis Conclusions¶
After the analysis, it's time to reflect on what was observed in the data. Let's look at our investigative questions and the answers yielded by our analysis.
Initial Questions¶
Our analysis has yielded answers to our initial questions from the Ask Phase:
- What author(s) published the most between 2009-2019?
- Jeff Kinney is the author who appears most often, with his series 'Diary of a Wimpy Kid'
- Each of the top 10 authors appeared at least 7 times during the 10-year period
- What percentage of books were fiction vs. non-fiction?
- Fiction: 44%
- Non-fiction: 56%
- A majority (73%) of expensive books are non-fiction; the higher a book's price, the more likely it is to be non-fiction
- What book appears the most between 2009-2019?
- Top book (appearing 10 times, once every year from 2009 to 2018): 'Publication Manual of the American Psychological Association, 6th Edition' published by the American Psychological Association
- The top 10 books include:
- Publication Manual of the American Psychological Association, 6th Edition
- StrengthsFinder 2.0
- Oh, the Places You'll Go!
- The Very Hungry Caterpillar
- The 7 Habits of Highly Effective People: Powerful Lessons in Personal Change
- The Four Agreements: A Practical Guide to Personal Freedom (A Toltec Wisdom Book)
- Jesus Calling: Enjoying Peace in His Presence (with Scripture References)
- The Official SAT Study Guide
- To Kill a Mockingbird
- The 5 Love Languages: The Secret to Love That Lasts
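The genre percentages reported above could be reproduced along the following lines. This is a minimal sketch on made-up rows; `toy_df` is hypothetical and only mimics the `Price` and `Genre` columns of `df`, including the 'Non Fiction' label used in the data. The 16 USD cutoff for 'expensive' follows the definition used in this case study.

```python
import pandas as pd

# Hypothetical rows with the same "Price" and "Genre" columns as df
toy_df = pd.DataFrame({
    "Price": [5, 8, 12, 16, 20, 25, 30, 46],
    "Genre": ["Fiction", "Fiction", "Non Fiction", "Fiction",
              "Non Fiction", "Non Fiction", "Fiction", "Non Fiction"],
})

# Overall genre split as proportions of all rows
genre_share = toy_df["Genre"].value_counts(normalize=True)

# Label each row by price band; "expensive" means priced above 16 USD
price_band = toy_df["Price"].gt(16).map({True: "expensive", False: "inexpensive"})

# Genre split within each price band (each row of the result sums to 1)
share_by_band = pd.crosstab(price_band, toy_df["Genre"], normalize="index")

print(genre_share)
print(share_by_band)
```

Because the data has one row per book per year, these shares count list appearances rather than unique titles.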
Other Observations¶
- Most (75%) bestselling books are 16 USD or less with 50% of books priced between 7 USD and 16 USD
- There's little or no correlation between User Rating, Price, and Reviews.
- A slim majority (56%) of books are non-fiction
- 73% of expensive books (greater than 16 USD) are non-fiction
- 75% of expensive books (greater than 16 USD) are priced between 17 USD and 30 USD
- A slim majority (56%) of inexpensive books (less than 17 USD) are fiction
- 75% of User Ratings are 4.5 or above with 50% between 4.5 and 4.8
- 75% of authors appeared two or fewer times during the 10-year period
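The correlation observation in the list above can be checked with a pairwise correlation matrix. The sketch below uses a handful of made-up rows; `toy_df` is hypothetical and only mimics the `User Rating`, `Reviews`, and `Price` columns of `df`.

```python
import pandas as pd

# Hypothetical values with the same numeric columns as df
toy_df = pd.DataFrame({
    "User Rating": [4.8, 4.5, 4.7, 4.9, 4.6],
    "Reviews": [4505, 8580, 26234, 9413, 19576],
    "Price": [0, 46, 7, 8, 8],
})

# Pairwise Pearson correlation (the default method of DataFrame.corr)
corr = toy_df[["User Rating", "Reviews", "Price"]].corr()
print(corr.round(2))
```

Values near 0 in the off-diagonal cells would support the 'little or no correlation' reading. Passing `method="spearman"` is an option if the relationships are suspected to be monotonic but not linear.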