Jon Krohn

The Tidyverse of Essential R Libraries and their Python Analogues, with Dr. Hadley Wickham

Added on April 30, 2024 by Jon Krohn.

Many-time bestselling author and prolific open-source R developer Hadley Wickham is our guest today. In this episode, we discuss Posit's rebrand and why the Tidyverse needs to be in every data scientist's toolkit.

More on Hadley:
• Is Chief Scientist at Posit PBC.
• Is Adjunct Professor of Statistics at Stanford University, Rice University and The University of Auckland.
• Is best-known as the creator of the Tidyverse suite of open-source R libraries for data science, including the essential libraries dplyr and ggplot2.
• Has written seminal books on R programming for O'Reilly, Springer and CRC Press, including the mega-bestselling "R for Data Science".

Today’s episode will primarily be of interest to hands-on practitioners like data scientists and machine learning engineers. In it, Hadley details:
• Why the iconic open-source company RStudio rebranded to Posit.
• The philosophy of the tidyverse, amusing backstories on its most iconic packages and why the tidyverse is invaluable for all data scientists to be familiar with.
• The open-source projects he’s most excited about today.
• How you can easily get involved with career-bolstering open-source projects yourself.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Podcast, SuperDataScience, YouTube, Professional Development Tags superdatascience, datascience, machinelearning, statistics, rlanguage

Mixtral 8x22B: SOTA Open-Source LLM Capabilities at a Fraction of the Compute

Added on April 26, 2024 by Jon Krohn.

Today, I’m going to do my best to give you a five-minute update on a groundbreaking new open-source Large Language Model called Mixtral 8x22B, out of an extremely hot French startup called Mistral.

Generative AI in Practice, with Bernard Marr

Added on April 23, 2024 by Jon Krohn.

In today's episode, Bernard Marr — world-leading futurist (>4m social-media followers) and prolific author (20+ books!) — details how GenAI will revolutionize industries, enhance our lives and solve pressing global issues.

In case he isn’t already on your radar, Bernard:

• Is a world-leading futurist who’s consulted with NVIDIA, Google, Microsoft, Amazon and many more on digital transformation and A.I. in business.

• His 20+ books have been translated into 20+ languages and earned several business and management "book of the year" awards; many have also been bestsellers.

• His writing has been featured in The Guardian, Financial Times, The Wall Street Journal, the Harvard Business Review and many other leading media outlets.

• Has over 4 million combined social media followers.

Today’s episode will be of interest to anyone who’d like to better understand Generative A.I. and how to adopt GenAI effectively at work or at home.

In this episode, Bernard details:

• The history of GenAI.

• How GenAI will pair with other industries like energy, healthcare and education to accelerate hyper-innovation across every aspect of society.

• The regulatory and ethical challenges associated with GenAI and how we can overcome them.

• How AI paradoxically makes us more human.

• How to successfully implement GenAI both professionally and personally.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Tags superdatascience, machinelearning, ai, genai, llms, future

Deep Utopia: AI Could Solve All Human Problems in Our Lifetime

Added on April 20, 2024 by Jon Krohn.

Today’s episode focuses on Nick Bostrom's latest book, Deep Utopia. Published a couple of weeks ago, it delves into the possibilities of a future where artificial intelligence has solved humanity's deepest problems.

In Five-Minute Friday, Podcast, SuperDataScience, YouTube Tags SuperDataScience, utopia, ai, ml, llm, machinelearning

What will humans do when machines are vastly more intelligent? With Aleksa Gordić

Added on April 17, 2024 by Jon Krohn.

Aleksa Gordić — the famed A.I. educator and multilingual-LLM entrepreneur — is my guest today. Brilliant and widely-read, Aleksa opines on what it will take to realize Artificial Super Intelligence and the consequences for humans.

Aleksa:

• Is Founder & CEO of Runa AI, a startup focused on building multilingual LLMs.

• Is an online educator who has built a community of 160,000 people in the A.I. space, including through his A.I. Epiphany YouTube channel.

• Previously, he was an A.I. Research Engineer at Google DeepMind in London and a Machine Learning Software Engineer at Microsoft.

• He holds a degree in Electronics and Computer Science from the University of Belgrade in Serbia.

Today’s episode contains tidbits here and there that will appeal primarily to hands-on machine learning practitioners, but it mostly should be of great interest to anyone.

In this episode, wildly-intelligent Aleksa details:

• Why multilingual LLMs provide so much value despite the cutting-edge LLMs like Claude 3, Gemini Ultra and GPT-4 supporting so many languages.

• His frameworks for entrepreneurial success and for effective self-directed learning.

• His analogy for how humans are born as a checkpoint of a Bayesian model that’s fine-tuned with reinforcement learning from human feedback (RLHF).

• What he thinks it will take to realize artificial super intelligence and what it could mean for human society when it arrives.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Podcast, SuperDataScience, YouTube Tags superdatascience, machine learning, ai, agi, entrepreneur, llms

RFM-1 Gives Robots Human-like Reasoning and Conversation Abilities

Added on April 14, 2024 by Jon Krohn.

Today’s episode is all about an LLM trained for robotics applications called RFM-1 that completely blows my mind because of the implications for what can now suddenly be accomplished so easily with robotics.

In Data Science, Podcast, SuperDataScience, YouTube Tags ai, robotics, NLP, physics, GPT4, Covariant

Deep Reinforcement Learning for Maximizing Profits, with Prof. Barrett Thomas

Added on April 9, 2024 by Jon Krohn.

Today, Prof. Barrett Thomas blends his rich technical understanding of Deep Reinforcement Learning with his commercial savviness to eloquently detail how Deep RL can be leveraged to minimize costs and maximize profits.

Barrett:

• Is Research Professor in Business Analytics and Senior Associate Dean at the University of Iowa’s College of Business.

• As will soon be unsurprising to you when you hear how well he communicates complex concepts, he’s won multiple teaching awards (amongst other academic prizes).

• He holds a PhD in Industrial and Operations Engineering from the University of Michigan.

Today’s episode is a technical one that will appeal primarily to hands-on practitioners like data scientists, software developers and machine learning engineers.

In this episode, Barrett details:

• What Markov Decision Processes are and how they relate to Deep Reinforcement Learning.

• How operations research leverages neural networks to minimize business costs and maximize business profits.

• How same-day delivery has been made possible by machine learning.

• How aerial drones and autonomous vehicles will revolutionize supply chains and transportation.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Interview, SuperDataScience, YouTube Tags superdatascience, machinelearning, ai, deepreinforcementlearning, logistics, supplychain, drones, profit

In Case You Missed It in March 2024

Added on April 9, 2024 by Jon Krohn.

We're trying something novel on the SuperDataScience Podcast today: an ICYMI ("in case you missed it") episode that highlights the most gripping moments from my conversations with guests over the past month.

Please let me know what you think of this! Does it work for you? What would you change about it? Should we stop doing these entirely? Let me know right here on this post; your voice matters :)

For this inaugural ICYMI episode, conversation highlights include:

1. Sebastian Raschka, PhD on how Lightning AI makes LLM training and deployment easy (from Episode #767).

2. Dr. Travis Oliphant, creator of the ubiquitous NumPy and SciPy libraries, on the future of scientific computing (#765).

3. Award-winning, A.I.-focused venture capitalist Rudina Seseri letting us know what it takes to get a VC firm to invest in you (#763).

4. Prof. Zachary Lipton on his roadmap from AI startup to long-term commercial success (#769).

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, machine learning, ai, llms

Gradient Boosting: XGBoost, LightGBM and CatBoost, with Kirill Eremenko

Added on April 2, 2024 by Jon Krohn.

You wanted more of Kirill Eremenko, now you've got it! Kirill returns to the show today to detail Decision Trees, Random Forests and all three of the leading gradient-boosting algorithms: XGBoost, LightGBM and CatBoost 😸

If you don’t already know him, Kirill: 
• Is Founder and CEO of SuperDataScience, an e-learning platform that is the namesake of this very podcast.
• Launched the SuperDataScience Podcast in 2016 and hosted the show until he passed me the reins four years ago.
• Has reached more than 2.7 million students through the courses he’s published on Udemy, making him Udemy’s most popular data science instructor.

Today’s episode is a highly technical one focused specifically on Gradient Boosting methods and the foundational theory required to understand them. I expect this episode will be of interest primarily to hands-on practitioners like data scientists, software developers and machine learning engineers.

In this episode, Kirill details: 
• Decision Trees.
• How Decision Trees are ensembled into Random Forests via Bootstrap Aggregation.
• How the AdaBoost algorithm formed a bridge from Random Forests to Gradient Boosting.
• How Gradient Boosting works for both regression and classification tasks.
• All three of the most popular Gradient Boosting approaches — XGBoost, LightGBM and CatBoost — as well as when you should choose them.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Podcast, SuperDataScience, YouTube Tags superdatascience, machine learning, decision trees, gradient boosting

The Neuroscientific Guide to Confidence

Added on March 30, 2024 by Jon Krohn.

The inspiring entrepreneur Lucy Antrobus has run confidence-building workshops for thousands of people. In today's episode, she details her neuroscience-backed formula for developing bulletproof confidence.

Lucy: 
• Advises the United Nations on innovation for impact.
• Was previously Founder/CEO of an award-winning NGO and Co-founder/COO of an edtech company.
• Critically for today’s episode, she has run confidence-building workshops for over 1000 people of 30+ nationalities, including refugees who have just arrived in Switzerland.

Today’s episode should be fascinating to anyone!

In it, Lucy details:
• The science of confidence, which we can grow through repetition and practice, much like we can develop muscles by repeating lifts at the gym.
• Concrete guidance from neuroscience research on what we can do to develop healthy confidence in ourselves and in those around us.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Tags confidence, selfconfidence, neuroscience, presentationskills, podcast

Generative AI for Medicine, with Prof. Zack Lipton

Added on March 26, 2024 by Jon Krohn.

Generative A.I. is rapidly transforming medicine. My guest today is brilliant, inspiring Prof. Zachary Lipton — Chief Scientific Officer and CTO of Abridge, a startup that has quickly raised $208m to lead the transformation!

More on Zack:

• Assoc. Prof. in the Machine Learning Dept. of Carnegie Mellon University's Computer Science school.

• Highly-cited (23k+ citations) with research spanning core ML methods and theory, as well as applications in healthcare and NLP.

• Directs the Approximately Correct Machine Intelligence (ACMI) Lab at CMU, where they build robust systems for the real world.

• Is also a jazz saxophonist! 🎷

Despite Zack being such a deep technical expert, most of today’s content will be of interest to anyone who’d like to hear about the cutting edge of generative A.I. applications in healthcare.

The tech that Zack is leading development of at Abridge, which you can hear about in today's episode:

• Initial deployment uses ambient listening and generative A.I. to reduce the cognitive burden of clinical documentation, reducing burnout as well as enabling clinicians to spend less time with computers and more with patients.

• Industry-leading automatic speech recognition engine specifically designed for healthcare applications; can accurately transcribe speech in challenging environments, e.g., when there is background noise or when multiple people are speaking.

• Supports 14+ languages including handling code-switching (where speakers shift between languages) and interpreter-mediated conversations.

• In-house LLM development allows greater customization and responsible-use features, such as transparency (e.g., links to source transcript/audio) and evidence extraction (verification process).

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Podcast, SuperDataScience, YouTube Tags SuperDataScience, machine learning, ai, genai

Is Claude 3 Better than GPT-4?

Added on March 22, 2024 by Jon Krohn.

Across a broad range of LLM benchmarks, like MMLU, GPQA, grade-school math (GSM8K) and tons of other tests, Claude 3 Opus, the largest and most powerful of the Claude 3 models, …

Open-Source LLM Libraries and Techniques, with Dr. Sebastian Raschka

Added on March 22, 2024 by Jon Krohn.

Today's superhuman guest is Dr. Sebastian Raschka, author of the bestselling "ML with PyTorch and sklearn" book, iconic technical blogger (>350k followers) and Staff Research Engineer at Lightning AI. Hear him detail open-source libraries for LLMs.

More on Sebastian:

• Is Staff Research Engineer at Lightning AI, the company behind the popular PyTorch Lightning open-source library for training and deploying PyTorch models, including Large Language Models (LLMs), with ease.
• Is an iconic technical blogger (50k subscribers) and social-media contributor (>350k combined followers across LinkedIn and Twitter).
• Was previously Assistant Professor of Statistics at University of Wisconsin-Madison.
• Holds a PhD in statistical data mining from Michigan State University.

Today’s episode is technical and will primarily be of interest to hands-on practitioners like data scientists, software developers and machine learning engineers.

In it, Sebastian details:

• The many super-helpful open-source libraries that PyTorch Lightning leads development of.
• DoRA parameter-efficient fine-tuning.
• Google’s “open-source” Gemma models.
• Multi-query attention.
• The leading alternatives to RLHF.
• Where he sees the next big opportunities in LLM development.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Vonnegut's Player Piano (1952): An Eerie Novel on the Current AI Revolution

Added on March 17, 2024 by Jon Krohn.

Player Piano, despite being written seven decades ago, could not be more relevant to the AI revolution that’s accelerated dramatically in the past year.

NumPy, SciPy and the Economics of Open-Source, with Dr. Travis Oliphant

Added on March 12, 2024 by Jon Krohn.

Huge episode today with iconic Dr. Travis Oliphant, creator of NumPy and SciPy, the standard libraries for numeric operations (downloaded 8 million and 3 million times PER DAY, respectively). Hear about the future of open-source, including the impact of GenAI.

More on Travis:

• Founded Anaconda, Inc., the company behind the also-ubiquitous Python package manager.

• Founded the massive PyData conferences and communities as well as its associated non-profit foundation, NumFOCUS.

• Currently serves as the CEO of two firms: OpenTeams and Quansight.

• Holds a PhD in biomedical engineering from the Mayo Clinic in Minnesota.

Today’s episode will primarily be of interest to hands-on practitioners like data scientists, software developers and machine learning engineers.

In it, Travis details:

• How his journey creating open-source software began and how NumPy and SciPy grew to become the most popular foundational Python libraries for working with data.

• How he identifies commercial opportunities to support his vast open-source efforts and communities.

• How AI, particularly generative AI, is transforming open-source development.

• Where open-source innovation is headed in the years to come.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Podcast, SuperDataScience, YouTube Tags superdatascience, datascience, numpy, scipy, python, opensource

The Top 10 Episodes of 2023

Added on March 11, 2024 by Jon Krohn.

In 2023, we had a new record of 4 million combined podcast downloads and YouTube views. That’s up from 3.3 million a year earlier; thank you for your support listening, rating, sharing, liking, commenting on episodes and so on!

The Best A.I. Startup Opportunities, with venture capitalist Rudina Seseri

Added on March 5, 2024 by Jon Krohn.

How should an A.I. startup find product-market fit? How do some A.I. startups become spectacularly successful? The renowned (and highly technical!) A.I. venture-capital investor Rudina Seseri answers these questions and more in today's episode.

Rudina:

• Founder and Managing Partner of Glasswing Ventures in Boston.

• Led investments and/or served on the Board of Directors of more than a dozen SaaS startups, many of which were acquired.

• Was named Startup Boston's 2022 "Investor of the Year" amongst many other formal recognitions.

• Is a sought-after keynote speaker on investing in A.I. startups.

• Executive Fellow at Harvard Business School.

• Holds an MBA from Harvard University.

Today’s episode will be interesting to anyone who’s keen on scaling their impact with A.I., particularly through A.I. startups or investment.

In this episode, Rudina details:

• How data are used to assess venture capital investments.

• What makes particular AI startups so spectacularly successful.

• Her "A.I. Palette" for examining categories of machine learning models and mapping them to categories of training data.

• How Generative AI isn’t a fad, but it is still only a component of the impact that AI more broadly can make.

• The automated systems she has built for staying up to date on all of the most impactful AI developments.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Podcast, Professional Development, SuperDataScience, YouTube Tags superdatascience, machinelearning, ai, startups, venturecapital, aistartup

Gemini 1.5 Pro, the Million-Token-Context LLM

Added on March 1, 2024 by Jon Krohn.

In Episode #761, we detailed the public release of Google’s Gemini Ultra, the only LLM that is in the same class as OpenAI’s GPT-4 in terms of capabilities. Well, hot on the heels of that announcement is the release of Gemini 1.5 Pro.

Gemini Ultra: How to Release an A.I. Product for Billions of Users, with Google’s Lisa Cohen

Added on February 27, 2024 by Jon Krohn.

Google recently released Gemini Ultra, their largest language model. I love Ultra and now use it instead of GPT-4 on many tasks. Today's guest, Lisa Cohen, leads Gemini's rollout; hear from her how a company with billions of users rolls out new A.I. products.

More on Gemini Ultra:

• The only LLM with comparable capabilities to GPT-4 (in my experience as well as on benchmark evaluations, although I know benchmarking has plenty of issues!)

• Ultra maintains attention across large context windows (Gemini 1.5 Pro has a million-token context, btw!), competently generating natural language and code.

• Like GPT-4V, Ultra is multi-modal and so accepts both an image and text as input at the same time.

• Piggybacking on Google's excellence at search, I’ve found Gemini Ultra to be particularly effective at tasks that involve real-time search (the Google "Bard" project that focused on real-time information retrieval was renamed "Gemini" when Gemini Ultra was released).

Lisa Cohen is perhaps the best person on the planet to be speaking to about the momentous Gemini releases because Lisa is Director of Data Science & Engineering for Google's Gemini, Assistant and Search Platforms. In addition, she:

• Was previously Senior Director of Data Science at Twitter and Principal Director of Data Science at Microsoft.

• Holds a Master's in Applied Math from Harvard University.

In this episode, Lisa details:

• The three LLMs in Google’s Gemini family and how the largest one, Gemini Ultra, fits in.

• The many ways you can access Gemini models today.

• How absolutely enormous LLM projects are carried out and how they’re rolled out safely and confidently to literally billions of users.

• How LLMs like Gemini Ultra are transforming life and work for everyone from data scientists to educators to children, and how this transformation will continue in the coming years.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

In Data Science, Interview, Podcast, SuperDataScience, YouTube Tags superdatascience, machinelearning, geminiai, ai, geminiultra, llms

Humans Love A.I.-Crafted Beer

Added on February 23, 2024 by Jon Krohn.

I recently recorded tipplers' reactions as they had their first taste of the A.I.-crafted "Krohn&Borg" lager I co-developed. Today's episode illustrates the result: Humans love A.I. beer! There's also cool content on using CRISPR-Cas9 to modify yeast genes.

Thanks again to Beau Warren, Head Brewer at Species X Beer Project, for the opportunity to collaborate on this delicious project. You can check out Episode #755 for tons of detail on the ML packages used and the models developed to craft beer with A.I.

And thanks to all of the guests/judges in today's episode:

• Rehgan Avon of AlignAI

• Alexandra Hagmeyer (Dauterman) of Path Robotics

• Kelsey Dingelstedt of Women in Analytics (WIA)

• William McFarland of Omega Yeast

• Jim Lachey of the Super Bowl XXVI-winning Washington Commanders

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.

Tags superdatascience, machinelearning, ai, beer, yeast, crispr, crisprcas9