Rehgan Avon's DataConnect conference is this week and is getting rave reviews. In this SuperDataScience episode hosted by Jon Krohn, the silver-tongued entrepreneur details how organizations can successfully adopt A.I.
Filtering by Category: Professional Development
Brain-Computer Interfaces and Neural Decoding, with Prof. Bob Knight
In today's extraordinary episode, Prof. Bob Knight details how ML-powered brain-computer interfaces (BCIs) could allow real-time thought-to-speech synthesis and the reversal of cognitive decline associated with aging.
This is a rare treat as "Dr. Bob" doesn't use social media and has only made two previous podcast appearances: on Ira Flatow's "Science Friday" and a little-known program called "The Joe Rogan Experience".
Dr. Bob:
• Is Professor of Neuroscience and Psychology at University of California, Berkeley.
• Is Adjunct Professor of Neurology and Neurosurgery at UC San Francisco.
• Over his career, has amassed tens of millions of dollars in research funding, 75 patents, and countless international awards for neuroscience and cognitive computing research.
• His hundreds of papers have together been cited over 70,000 times.
In this episode, Bob details:
• Why the “prefrontal cortex” region of our brains makes us uniquely intelligent relative to all the other species on this planet.
• The invaluable data that can be gathered by putting recording electrodes through our skulls and directly into our brains.
• How "dynamic time-warping" algorithms allow him to decode imagined sounds, even musical melodies, through recording electrodes implanted into the brain.
• How BCIs are life-changing for a broad range of illnesses today.
• The extraordinary ways that advances in hardware and machine learning could revolutionize medical care with BCIs in the coming years.
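For readers curious what "dynamic time-warping" looks like in code, here is a minimal sketch of the textbook algorithm. It is not the decoding pipeline discussed in the episode: the function name `dtw_distance` and the toy "melody" sequences are assumptions made purely for illustration, and real neural recordings would be multichannel and far noisier.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Hypothetical helper for illustration only; not taken from the episode.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat a sample of b, repeat a sample of a,
            # or advance both sequences together.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy data (made up): the same contour played at half speed, and an unrelated one.
melody = [0, 1, 2, 3, 2, 1, 0]
slow_melody = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
other = [3, 3, 3, 0, 0, 0, 3]

print(dtw_distance(melody, slow_melody))  # 0.0
print(dtw_distance(melody, other))
```

The point of the warping step is that two signals with the same shape but different timing still score as close, which is why the half-speed melody matches perfectly while the unrelated contour does not.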
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Tools for Building Real-Time Machine Learning Applications, with Richmond Alake
Today, the astonishingly industrious ML Architect and entrepreneur Richmond Alake crisply describes how to rapidly develop robust and scalable Real-Time Machine Learning applications.
Richmond:
• Is a Machine Learning Architect at Slalom Build, a huge Seattle-based consultancy that builds products embedded with analytics and ML.
• Is Co-Founder of two startups: one uses computer vision to correct people’s form in the gym and the other is a generative A.I. startup that works with human speech.
• Creates/delivers courses for O'Reilly and writes for NVIDIA.
• Previously worked as a Computer Vision Engineer and as a Software Developer.
• Holds a Masters in Computer Vision, ML and Robotics from the University of Surrey.
Today’s episode will appeal most to technical practitioners, particularly those who incorporate ML into real-time applications, but there’s a lot in this episode for anyone who’d like to hear about the latest tools for developing real-time ML applications from a leader in the field.
In this episode, Richmond details:
• The software choices he’s made up and down the application stack — from databases to ML to the front-end — across his startups and the consulting work he does.
• The most valuable real-time ML tools he teaches in his courses.
• Why writing for the public is an invaluable career hack that everyone should be taking advantage of.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller
Today, the wildly intelligent Dr. Matar Haller introduces Contextual A.I. (which considers adjacent, often multimodal information when making inferences) as well as how to use ML to build a moat around your company.
Matar:
• Is VP of Data and A.I. at ActiveFence, an Israeli firm that has raised over $100m in venture capital to protect online platforms and their users from malicious behavior and malicious content.
• Is renowned for her top-rated presentations at leading conferences.
• Previously worked as Director of Algorithmic A.I. at SparkBeyond, an analytics platform.
• Holds a PhD in neuroscience from the University of California, Berkeley.
• Prior to data science, taught soldiers how to operate tanks.
Today’s episode has some technical moments that will resonate particularly well with hands-on data science practitioners but for the most part the episode will be interesting to anyone who wants to hear from a brilliant person on cutting-edge A.I. applications.
In this episode, Matar details:
• The “database of evil” that ActiveFence has amassed for identifying malicious content.
• Contextual A.I. that considers adjacent (and potentially multimodal) information when classifying data.
• How to continuously adapt A.I. systems to real-world adversarial actors.
• The machine learning model-deployment stack she uses.
• The data she collected directly from human brains and how this research relates to the brain-computer interfaces of the future.
• Why being a preschool teacher is a more intense job than the military.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Business Intelligence Tools, with Mico Yuk
Today's guest is the straight shooter Mico Yuk, who pulls absolutely no punches in her assessment of, well, anything! ...but particularly about vendors in the business intelligence and data analytics space. Enjoy!
Mico:
• Is host of the popular Analytics on Fire Podcast (top 2% worldwide).
• Co-founded the BI Brainz Group, an analytics consulting and solutions company that has taught analytics, visualization and data storytelling courses to over 15,000 students, including at major multinationals like Nestlé, FedEx and Procter & Gamble.
• Authored the "Data Visualization for Dummies" book.
• Is a sought-after keynote speaker and TV-news commentator.
In this episode, Mico details:
• Her BI (business intelligence) and analytics framework that persuades executives with data storytelling.
• What the top BI tools are on the market today.
• The BI trends she’s observed that could predict the most popular BI tools of the coming years.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Automating Industrial Machines with Data Science and the Internet of Things (IoT)
Despite poor lighting on my face in today's video version (my bad!), we've got a fascinating episode with the brilliant (and well-lit!) Allegra Alessi, who details how data science is automating industrial machines.
Allegra:
• Is Product Owner for IoT (Internet of Things) devices at BOBST, a Swiss industrial manufacturing giant.
• Previously, she worked as a Product Owner and Data Scientist for Rolls-Royce in the UK and as a Data Scientist for Alstom, the enormous train manufacturing company, in Paris.
• She holds a Master’s in Engineering from Politecnico di Milano in Italy.
In this episode, Allegra details:
• How modern industrial machinery depends on data science for real-time performance analytics, predicting issues before they happen, and fully automating their operations.
• The tech stack her team uses to build data-driven IoT platforms.
• The key methodologies she uses to be effective at product management.
• The kinds of data scientists that might be ideally suited to moving into a product role.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
52nd St. Gallen Symposium Recap
The St. Gallen Symposium, held annually in Switzerland since student riots in the 1960s, promotes cross-generational dialogue. This year's theme of "A New Generational Contract" set a path for a more resilient, sustainable future. Throughout the week, I reconnected with many inspiring old friends from previous Symposia and met many exceptional new ones, particularly a large number of electrifying social-impact-oriented entrepreneurs and business leaders. A *lot* happened over my three days there; below are the highlights.
GPT-4 Has Arrived
SuperDataScience episode #666 is an appropriate number for an algorithm that has folks (quixotically) signing a letter to pause all A.I. development. In this first episode of the GPT-4 trilogy, I introduce GPT-4's staggering capabilities in ten minutes.
A Leap in AI Safety and Accuracy
GPT-4 marks a significant advance over its predecessor, GPT-3.5, in terms of both safety and factual accuracy. According to OpenAI, it is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses. Despite these improvements, challenges like sociodemographic biases and hallucinations persist, although they are considerably reduced.
Academic and Professional Exam Performance
The prowess of GPT-4 becomes evident when revisiting queries initially tested on GPT-3.5. Its ability to summarize complex academic content accurately and its human-like response quality are striking. In one test, GPT-4’s output was mistaken for human writing by GPTZero, an AI detection tool, underscoring its sophistication. In another test, the Uniform Bar Exam, GPT-4 scored in the 90th percentile, a massive leap from GPT-3.5's 10th percentile.
Multimodality
GPT-4 introduces multimodality, handling both language and visual inputs. This capability allows for innovative interactions, like recipe suggestions based on fridge contents or transforming drawings into functional websites. This visual aptitude notably boosted its performance in exams like the Biology Olympiad, where GPT-4 scored in the 99th percentile.
The model also demonstrates proficiency in numerous languages, including low-resource ones, outperforming other major models in most languages tested. This linguistic versatility extends to its translation capabilities between these languages.
The Secret Behind GPT-4’s Success
While OpenAI has not disclosed the number of model parameters in GPT-4, it's speculated to significantly exceed GPT-3's 175 billion. A larger parameter count, more and better-curated training data, and the ability to handle vastly more context (up to 32,000 tokens) are likely contributors to GPT-4's enhanced performance.
Reinforcement Learning from Human Feedback (RLHF)
GPT-4 incorporates RLHF, a method that refines the model's output based on feedback from human raters, allowing it to align more closely with desired responses. This approach already proved effective in earlier models like InstructGPT.
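To make the RLHF idea a little more concrete, here is a rough sketch of the pairwise preference loss commonly used to train a reward model from human comparisons, as described for InstructGPT. OpenAI has not published GPT-4's training details, so treat this as an illustration only: the function name `preference_loss` and the example reward values are assumptions, and a real system would compute rewards with a neural network over whole responses.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)) for one comparison.

    Hypothetical helper: the rewards are plain floats standing in for the
    scores a reward model would assign to two candidate responses.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the response humans preferred scores higher,
# and large when the reward model ranks the pair the wrong way round.
print(preference_loss(2.0, -1.0))
print(preference_loss(-1.0, 2.0))
```

Minimizing this loss over many human-ranked pairs pushes the reward model to score preferred responses higher; the language model is then tuned to produce responses that the reward model rates well.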
GPT-4 represents a monumental step in AI development, balancing unprecedented capabilities with improved safety measures. Its impact is far-reaching, offering new possibilities in various fields and highlighting the importance of responsible AI development and use. As we continue to explore its potential, the conversation around AI safety and ethics becomes increasingly vital.
The SuperDataScience GPT-4 trilogy comprises:
• #666 (today): an introductory overview by yours truly
• #667 (Tuesday): world-leading A.I.-monetization expert Vin Vashishta joins me to detail how you can leverage GPT-4 to your commercial advantage
• #668 (next Friday): world-leading A.I.-safety expert Jeremie Harris joins me to detail the (existential!) risks of GPT-4 and the models it paves the way for
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Open-Source Tools for Natural Language Processing
In today's episode, the brilliant Vincent Warmerdam regales us with invaluable ideas and open-source software libraries for developing A.I. (particularly Natural Language Processing) applications. Enjoy!
Vincent:
• Is an ML Engineer at Explosion, the German software company that specializes in developer tools for A.I. and NLP such as spaCy and Prodigy.
• Is renowned for several open-source tools of his own, including Doubtlab.
• Is behind an educational platform called Calmcode that has over 600 short and conspicuously enjoyable video tutorials about software engineering concepts.
• Was Co-Founder and Chair of PyData Amsterdam.
• Has delivered countless amusing and insightful PyData talks.
• Holds a Masters in Econometrics and Operations Research from Vrije Universiteit Amsterdam (VU Amsterdam).
Today’s episode will appeal primarily to technical listeners as it focuses on ideas and open-source software libraries that are indispensable for data scientists, particularly those developing A.I. or NLP applications.
In this episode, Vincent details:
• The prompt recipes he developed to enable OpenAI GPT architectures to perform tremendously helpful NLP tasks such as data labeling.
• The super-popular open-source libraries he’s developed on his own as well as with Explosion.
• The software tools he uses daily including several invaluable open-source packages made by other folks.
• How both linguistics and operations research are extremely useful fields to be a better NLP practitioner and ML practitioner, respectively.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Hg Capital's "Digital Forum"
At Hg Capital's "Digital Forum" in London, I delivered a keynote on "Getting Value from A.I." — my slides and the slickly-edited video production on YouTube are available now.
With a focus on B2B SaaS applications, over 45 minutes I covered:
1. What Deep Learning A.I. is and How it Works
2. Tasks that are Replaceable with A.I. vs Tasks that can be Augmented
3. How to Effectively Implement A.I. Research into Production
The audience engagement was terrific and the on-stage Q&A carried on afterward for an energizing 30 additional minutes. It felt like we could have kept on going much longer!
NLP with ChatGPT (and other LLMs)
Over 1400 people registered for yesterday's "NLP with ChatGPT (and other LLMs)" conference that I hosted on the O'Reilly Media platform. Kudos to speakers Sinan, Melanie and Shaan for making it a smashing success 🎉
This screenshot is a taste of what it looked like from inside the broadcasting platform, captained flawlessly by producers Joan Baker and Nurul Ishak, PMP.
The presenters each spent 30 minutes presenting on their topics and then engaged in riveting Q&A with the highly engaged attendees:
• Sinan Ozdemir: The A.I. entrepreneur and author introduced the theory behind Transformer Architectures and LLMs like BERT, T5, and GPT.
• Melanie Subbiah: A first author on the original GPT-3 paper, she led interactive demos of the broad range of capabilities of LLMs like ChatGPT.
• Shaan Khosla: A data scientist on my team at Nebula.io, he detailed practical tips on training, validating, and productionizing LLMs hands-on in Python.
I've heard word that, unusually for a live event in O'Reilly, the footage of this conference will be made available as a video within the platform. Stay tuned for details!
Getting Value From A.I.
My keynote on "Getting Value from A.I." — which I delivered at Hg Capital's "Digital Forum" in London — is now live on YouTube!
With a focus on B2B SaaS applications, over 45 minutes I covered:
1. What Deep Learning A.I. is and How it Works
2. Tasks that are Replaceable with A.I. vs Tasks that can be Augmented
3. How to Effectively Implement A.I. Research into Production
The audience engagement was terrific and the on-stage Q&A carried on afterward for an energizing 20 additional minutes. All of this is captured in the slickly-edited video production.
A.I. Talent and the Red-Hot A.I. Skills
What skills and traits do the best A.I. talent have? And how do you attract the best A.I. talent to your firm? Jaclyn Rice Nelson of Tribe AI, the world's most prestigious ML collective, fills us in in today's episode.
Jaclyn:
• Is Co-Founder/CEO of Tribe A.I., a "collective" of ML engineers and data scientists that drop into companies to accelerate their A.I. capabilities.
• Previously worked in senior roles at Google and CapitalG, Alphabet's growth equity fund.
In today's episode, she details:
• What characterizes the very best A.I. talent.
• What skills you should learn today to be tomorrow’s top A.I. talent.
• How to attract the top engineers and data scientists to your firm.
• The specific category of A.I. project that her clients are suddenly demanding tons of help with.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Simplifying Machine Learning
Today, Mariya Sha — host of the wildly popular "Python Simplified" YouTube channel (140k subscribers!) — taps her breadth of A.I. expertise to provide a fun and fascinating finale to SuperDataScience guest episodes for 2022.
Mariya:
• Is the mind behind the "Python Simplified" YouTube channel that makes advanced concepts (e.g., ML, neural nets) simple to understand.
• Her videos cover Python-related topics as diverse as data science, web scraping, automation, deep learning, GUI development, and OOP.
• Is renowned for taking complex concepts such as gradient descent or unsupervised learning and explaining them in a straightforward manner that leverages hands-on, real-life examples.
• Is pursuing a bachelor's in Computer Science (with a specialization in A.I. and Machine Learning) from the University of London.
Today’s episode should appeal to anyone who’s interested in or involved with data science, machine learning, or A.I.
In this episode, Mariya details:
• How the incredible potential of ML in our lifetimes inspired her to shift her focus from web-development languages like JavaScript to Python.
• Why automation and web scraping are critical skills for data scientists.
• How to make learning any apparently complex data science concept straightforward to comprehend.
• Her favorite Python libraries and software tools.
• One rarely-mentioned topic that every data scientist would benefit from.
• The pros and cons of pursuing a 100% remote degree in computer science.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
How to Influence Others with Your Data
If you ever use data to make decisions or to persuade those around you to make data-driven decisions, today’s episode is jam-packed with relevant, practical tips from data presentation guru Ann K. Emery.
Ann:
• Is an internationally-acclaimed speaker who delivers 100+ keynotes, workshops, and webinars each year to enable people to share data-driven insights more effectively.
• She has consulted on data visualization, data reporting, and data presentation with over 200 organizations — the likes of the United Nations, the US Centers for Disease Control, and Harvard University.
• She holds a BA in Psychology and Spanish from the University of Virginia and a Masters in Educational Psychology Evaluation, Assessment, and Testing from George Mason University.
I rarely say that everyone should listen to an episode, but this is one of those rare cases.
In this episode, Ann details:
• What data storytelling is.
• Best practices for data visualization.
• Surprising tricks you can pull off with spreadsheet software.
• How to report on data effectively.
• Her top tips for presenting data in a slideshow.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Data Analytics Career Orientation
Considering a Data Analytics career? Today's episode with YouTube icon Luke Barousse (273k subscribers) will be particularly appealing to you, but the terrifically interesting guest makes for an episode that anyone will love.
Luke:
• Is a full-time YouTuber, creating highly educational — but nevertheless hilarious — videos focused on Data Analytics.
• Previously worked as a Lead Data Analyst and Data Engineer at BASF.
• Worked for seven years in the US Navy on nuclear-powered submarines.
• Holds a degree in mechanical engineering, a graduate qualification in nuclear engineering, and an MBA in business analytics.
In this episode, Luke details:
• The must-have skills for entry-level data analyst roles.
• The data analyst skills mistakenly pursued by many folks considering the career.
• How his submariner experience prepared him well for a data career.
• His favorite tools for creating interactive data dashboards.
• His favorite scraping libraries for collecting data from the web.
• The skills to learn now to be prepared for the data careers of the future.
• The benefits of CrossFit beyond just the fitness improvements.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Data Analyst, Data Scientist, and Data Engineer Career Paths
Keen to become a Data Analyst? Get promoted to Sr Data Analyst? Or explore Data Engineer/Scientist options? Shashank, a YouTube expert on these questions (>100k subscribers!), tackles them in today's episode.
Shashank:
• Has an exceptional YouTube channel focused on helping people break into a data analyst career.
• Works as a Senior Data Engineer at digital sports platform Fanatics, Inc.
• Was previously Data Analyst at luxury retailer Nordstrom and other firms.
• Holds a degree in chemistry from Emory University in Atlanta.
Today’s episode will appeal primarily to folks who are interested in becoming a data analyst, or who are interested in transitioning from a data analyst role into a data science or data engineering role.
In this episode, Shashank details:
• How you can land an entry-level data analyst role in just a few weeks, regardless of your educational and professional background.
• The hard and soft skills you need to progress from a junior data analyst to a senior data analyst position.
• What it takes to transition from data analyst to a typically more lucrative role as a data scientist or data engineer.
• His favorite resources for learning the essential skills for data scientists.
• What he looks for when he’s interviewing candidates.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
TEDx Talk: How Neuroscience Inspires A.I. Breakthroughs that will Change the World
My first TED-format talk is live! In it, I use (A.I.-generated!) visuals to color how A.I. will transform the world in our lifetimes, with particular emphases on climate change, food security, and healthcare innovations.
Thanks to Christina, Banu, and everyone at TEDxDrexelU for inviting me to speak, organizing a slick event, and masterfully editing the footage of my talk.
Thanks to Ed, Andrew, and Shaan at Nebula.io for providing invaluable feedback on drafts of my talk. It's only due to your constructive criticism that the final version turned out as well as it did. Thanks as well to Steven and Alex at Wynden Stark for kindly covering the travel costs of any employees that came down to Philadelphia to see the talk in-person.
Finally, thanks to Taya and Hannah at OpenAI for providing me with early access to custom images from their DALL-E 2 model. These were critical to my being able to effectively convey the narrative I yearned to.
Data Science Interviews with Nick Singh
For an episode all about tips for crushing interviews for Data Scientist roles, our guest is Nick Singh — author of the bestselling "Ace the Data Science Interview" book and creator of the DataLemur SQL interview platform.
Nick:
• Co-authored “Ace the Data Science Interview”, an interview-question guide that has sold over 16,000 copies since it was released last year.
• Created the DataLemur platform for interactively practicing interview questions involving SQL queries.
• Worked as a software engineer at Facebook, Google, and Microsoft.
• Holds a BS in engineering from the University of Virginia.
Today's episode is ideal for folks who are looking to land a data science job for the first time, level-up into a more senior data science role, or perhaps land a data science gig at a new firm.
In this episode, Nick details:
• His top tips for success in data science interviews.
• Common misconceptions about data science interviews.
• How to become comfortable with self-promotion and increase your chances of landing your dream job.
• Strategies for when interviewers ask if you have any questions for them.
• The subject areas and skills you should master before heading into a data science interview.
The SuperDataScience show's available on all major podcasting platforms, YouTube, and at SuperDataScience.com.
Who Dares Wins
Even if we don’t achieve what we originally set out to achieve, by having dared to achieve it, by having taken action in the direction of the achievement, we learn from the experience and we gain invaluable information about ourselves and the world. Having dared, we find ourselves at a new, enriched vantage point that we otherwise would never have ventured to. From there, whether we achieved the original goal or not, we can iterate — dare again — perhaps to achieve success at the original objective or perhaps we identify some entirely new objective that would have otherwise been inconceivable without having dared.