The Emergence of Large Multimodal Models

Rich Washburn
Oct 6, 2023

With the arrival of AI agents built on large multimodal models (LMMs), systems capable not only of understanding human language but also of processing images and carrying out human-computer interactions, we're standing on the cusp of a revolution in the job market.

LMMs vs. Traditional Language Models

To grasp the significance of LMMs, let’s take a step back. Text-only large language models, such as GPT-4 in its initial public release, were already impressive at interpreting and generating human-like text. LMMs go a step further: they integrate text and vision capabilities, making it possible for AI to understand content in multiple formats, from images of handwritten notes to screenshots of interactive webpages.

This capability was recently spotlighted in a Microsoft paper aptly titled "The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)." The title hints at a new dawn for technology, with LMMs as the rising sun.

Opportunities & Implications

With LMMs, the potential applications seem boundless:

Insurance & Damage Evaluation

By analyzing images, LMMs could assess damage from car accidents or natural disasters, offering immediate estimates and accelerating claims processes.

Medical Assistance

Picture an AI that not only reads medical prescriptions but also interprets X-rays or MRI scans. This could democratize access to basic diagnostic services.

Real-time Document Verification

Imagine an AI agent capable of instantly verifying IDs or licenses without breaching privacy, making processes at airports or government offices seamless.

However, with great power comes great responsibility, and LMMs also bring potential challenges:

Job Disruption

Jobs that involve repetitive tasks, such as data entry or basic computer operations, might face obsolescence. Workers will need to adapt and reskill.

Ethical Concerns

If an AI can interpret sensitive documents, privacy concerns will undoubtedly rise to the forefront. Regulation will be paramount.

Dependence on Technology

As LMMs become integral to operations, businesses must guard against over-reliance, ensuring that human judgment and oversight remain central.

So, is the job market heading for an overhaul with LMMs at the helm? Definitely. But rather than a takeover, envision a partnership. As LMMs handle data-heavy tasks, humans can focus on strategy, creativity, and innovation: areas where we naturally excel.

In essence, the dawn of LMMs doesn’t signify the sunset for human roles. Instead, it heralds a brighter day where human potential is amplified, not replaced. The challenge lies in harnessing this potential responsibly and ethically, ensuring that as we race ahead with AI, no one is left behind.

Job roles that could be impacted:

Data Entry Clerks: Manual data entry, especially from images or scans, could be fully automated. For instance, transcribing handwritten forms into digital records.
Customer Support Representatives: Basic troubleshooting and query resolution can be done by AI. Virtual assistants and chatbots can handle a significant portion of customer inquiries.
Medical Imaging Specialists: For preliminary scans and diagnoses, AI can identify anomalies in X-rays, MRIs, and other medical images, aiding radiologists.
Insurance Claim Processors: As illustrated by GPT-4V's ability to evaluate damage from photos, much of car and property damage assessment could be automated.
Web Developers: Basic website design and troubleshooting could be automated, especially with AI understanding web interfaces and navigation.
Research Assistants: Summarizing scientific papers, sourcing relevant articles, and even data analysis could be streamlined with AI.
Marketing Analysts: Analyzing customer sentiment and preferences from visual content on social media, and optimizing ad placements, could become AI-driven tasks.
Retail Jobs: With the advent of cashier-less stores, the need for human cashiers might decrease.
Quality Control Inspectors: For basic quality checks, especially in manufacturing, AI can spot defects or inconsistencies in products.
Translators: By combining language and vision capabilities, AI can automate the translation of written content, even when that content appears in images.
Librarians and Archivists: Sorting, archiving, and retrieving information, especially in digital formats, can be handled by AI.
Real Estate Agents: Virtual tours, property valuations based on images, and matching properties to client preferences can be AI-driven.
Security Guards: Surveillance, especially analyzing CCTV footage and identifying threats, could be automated.
Basic Teaching & Tutoring: While the human touch in education is irreplaceable, AI can handle basic queries, grading, and even create customized study plans based on individual student performance.
Recruitment & HR: Screening resumes, especially when provided in varied formats, and initial rounds of interviews might become AI-driven processes.
Content Moderators: AI can identify inappropriate or flagged content across platforms, making the moderation process faster.

With the continuous evolution of AI, particularly models that can "see" and "understand" images and diagrams alongside textual data, several professions and job roles might face the threat of automation. While no technology can fully replace human intuition, creativity, and contextual understanding, more and more tasks will be automated.
