AI News: A Busy Week in AI Advancements and Developments

Discover the latest AI advancements and developments from the past week, including OpenAI's advanced voice feature, GPT-4's long output capabilities, Microsoft's AI competition claims, and updates from Google, Anthropic, Meta, and more. Stay ahead of the curve in the ever-evolving world of AI.

September 15, 2024


This blog post provides a comprehensive overview of the latest advancements in the world of AI, covering a wide range of topics from new voice features in OpenAI's ChatGPT to the acquisition of Leonardo AI by Canva. Readers will gain insights into the rapidly evolving AI landscape and the exciting developments that are shaping the future of this technology.

OpenAI's Advanced Voice Feature

OpenAI has started rolling out an advanced voice feature to a select group of users. The new feature generates remarkably human-like voices, including some that sound strikingly similar to celebrities like Scarlett Johansson.

Some key highlights of the advanced voice mode:

  • Users with access can try the "Advanced Voice Mode" option at the bottom of the chat window.
  • It can generate very realistic-sounding voices, including the ability to mimic voices of celebrities and public figures.
  • Users can interrupt the voice while it is speaking, something the standard ChatGPT app does not support.
  • Demos show the voice model can count very quickly, even simulating the need to take a breath.

However, the advanced voice feature is currently available to only a limited number of users; most people cannot yet try it themselves. OpenAI has announced the feature publicly but is rolling it out slowly to a select group for now.

GPT-4o Long Output

OpenAI has recently rolled out an experimental version of GPT-4o called "GPT-4o Long Output". This new model has a maximum output of 64,000 tokens per request, allowing for much longer and more detailed responses than the standard GPT-4o model.

The GPT-4o Long Output model is currently only available to a select group of alpha participants and is not yet accessible to the general public. The experimental version is designed to let users generate extremely long, comprehensive outputs in response to their queries.
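
For alpha participants, calling the model should look like any other chat completion, just with a much higher output ceiling. Here is a minimal sketch using the OpenAI Python SDK; the model identifier is the one reported for the alpha and may change, so treat it as an assumption:

```python
from openai import OpenAI

client = OpenAI()  # assumes your API key has been granted alpha access

response = client.chat.completions.create(
    model="gpt-4o-64k-output-alpha",  # reported alpha model name; not guaranteed stable
    messages=[{
        "role": "user",
        "content": "Write a detailed, chapter-by-chapter outline for a book on the history of AI.",
    }],
    max_tokens=64000,  # the new ceiling for this experimental model
)
print(response.choices[0].message.content)
```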

OpenAI has not publicly disclosed the details of the model's architecture or training, but the headline change is a much higher cap on output tokens. In practice, the model can maintain context and coherence over a far longer span of generated text, enabling more detailed and in-depth responses.

The potential applications of the GPT-4 Long Output model are vast, ranging from extended research and analysis tasks to the generation of long-form content such as reports, essays, or even books. However, as with any powerful AI technology, there are also concerns about the potential misuse or unintended consequences of such a model.

OpenAI has stated that they are working closely with regulatory bodies and other stakeholders to ensure the responsible development and deployment of the GPT-4o Long Output model. This includes implementing safeguards and guidelines to prevent the model from being used for harmful or unethical purposes.

Overall, the release of the GPT-4o Long Output model represents a significant milestone in the advancement of large language models and their ability to engage in more complex and nuanced forms of communication and information processing. As the technology continues to evolve, it will be crucial for researchers, policymakers, and the public to closely monitor its development and impact.

OpenAI as a Competitor to Microsoft

This week, Microsoft began listing OpenAI as a competitor in AI and search. This is notable because Microsoft has famously invested $13 billion in OpenAI and holds a 49% stake in the company.

In its financial reports, Microsoft named Anthropic, OpenAI, Meta, and various open-source offerings as competitors to its AI business. It is an odd sight, given that stake in OpenAI and Microsoft's partnership deals with Meta.

It seems Microsoft now views OpenAI as a rival to its own search and news advertising business, even while holding a major position in the company. It is a very interesting dynamic between the two.

OpenAI's Endorsement of AI Regulation

OpenAI this week endorsed several Senate bills related to AI regulation and safety. These include the Future of AI Innovation Act, which would formally authorize the United States AI Safety Institute as a federal body to set standards and guidelines for AI models.

OpenAI also endorsed the NSF AI Education Act and the CREATE AI Act, which provide federal scholarships for AI research and establish AI educational resources in colleges and K-12 schools.

These endorsements likely help OpenAI gain a seat at the table in future conversations about AI regulation. As a major AI company, OpenAI is a likely candidate for regulatory scrutiny moving forward, and by endorsing these bills it can help shape the direction of the regulation and ensure its interests are represented.

Additionally, OpenAI pledged to give the US AI Safety Institute early access to its next model. This appears to be an effort to counter the narrative that OpenAI has deprioritized AI safety in its pursuit of more powerful generative AI technologies.

Overall, these moves suggest OpenAI is working to get closer to the US government and position itself as a key stakeholder in the development of AI regulation and safety standards.

Anthropic Launches Claude in Brazil

Good news for those in Brazil: Anthropic has launched its AI assistant Claude in the country this week. Claude is now available for users in Brazil to access and interact with.

Google's Gemini 1.5 Pro and Other AI Models

Google has been making some big waves in the AI world this week as well. They released a new experimental version of Gemini 1.5 Pro, labeled 0801, which is available to use right now inside Google AI Studio.

To access it, you can go to aistudio.google.com and, under the "Model" dropdown, you'll see "Gemini 1.5 Pro Experimental 0801" - that's the model you want to use.
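
The same experimental model can also be called programmatically. Below is a minimal sketch using the google-generativeai Python SDK; the model string mirrors the AI Studio dropdown label, but experimental names change often, so treat it as an assumption:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from aistudio.google.com

# Model string assumed from the AI Studio dropdown label
model = genai.GenerativeModel("gemini-1.5-pro-exp-0801")
response = model.generate_content("Summarize this week's AI news in three bullet points.")
print(response.text)
```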

This new Gemini 1.5 Pro model has topped the LMSYS Chatbot Arena leaderboard, even outperforming GPT-4o, GPT-4o mini, and Claude 3.5 Sonnet.

Google also released a new small model in its Gemma 2 family this week - a 2 billion parameter model built for faster performance and efficiency, likely aimed at on-device use. Interestingly, this 2 billion parameter model outperforms much larger models like Mixtral 8x7B, GPT-3.5 Turbo, and LLaMA 2 70B.
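
Because the Gemma 2 weights are openly available, you can try the small model locally. Here is a minimal sketch using Hugging Face's transformers library, assuming the instruction-tuned 2B checkpoint is published under the hub ID google/gemma-2-2b-it:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"  # assumed hub ID for the instruction-tuned 2B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "In one sentence, why do small models matter for phones?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```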

In addition to the new models, Google also added some impressive AI features to Chrome this week, including Google Lens integration that can identify and search for objects in images, and a new comparison feature that can compare products across different websites (more on these in the next section).

Overall, Google has been pushing the boundaries of large language models and AI capabilities in Chrome, demonstrating their continued innovation and leadership in the AI space.

Google's New Chrome AI Features

This week, Google added some new AI-powered features to its Chrome browser:

  1. Google Lens in Chrome Desktop: You can now use Google Lens to search for information about objects in images directly from the Chrome browser. Simply select an area of an image and Lens will search for similar products or identify the object.

  2. Product Comparison: Chrome now has a built-in feature that allows you to compare products across different tabs and websites. This makes it easy to research and compare items without having to switch between tabs.

  3. Natural Language Search History: You can now use natural language to search your Chrome browsing history. For example, you can ask "What was the ice cream shop I looked at last week?" and Chrome will surface the relevant information from your search history.

These new AI-powered features in Chrome demonstrate Google's continued efforts to integrate intelligent capabilities directly into its core products and services. By leveraging technologies like computer vision and natural language processing, Google is making it easier for users to find information, compare products, and navigate their browsing history - all without leaving the Chrome browser. As AI continues to advance, we can expect to see more of these types of intelligent features become commonplace across Google's suite of tools and applications.

Meta Kills Celebrity AI Chatbots and Launches AI Studio

This week, Meta killed one of the features it announced at last year's Meta Connect: AI chatbots that used the faces of famous people but were trained on entirely different personas. Nobody really liked them, so Meta got rid of the feature.

In their place, Meta rolled out AI Studio, which lets anybody create their own custom AI. One of my friends, Don Allen Stevenson III, is among the people who got early access.

This new feature allows anyone to create AI characters based on their interests. You can go to ai.meta.com/AI-Studio and create your own custom AI character, choosing options like AI pet, private tutor, fellow fan, imaginative artist, sounding board, creative designer, personal stylist, and more.

The process generates a character image with AI, gives it a name and tagline, and then you can further customize and design what you want this AI to do. Right now, it seems a bit like a novelty, as you can't easily pull in large documents or transcripts to allow people to chat with an AI avatar version of you. But that's likely where they're trying to take this in the future.

The more impressive thing Meta rolled out this week is their new Segment Anything Model 2 (SAM 2). This is a model that can segment out certain sections of an image or video with impressive accuracy, even tracking objects as they move around. It's a big improvement over previous segmentation models, and could be very useful for video editing tasks like rotoscoping. You can try out SAM 2 at sam2.metademolab.com.

Overall, Meta is continuing to push the boundaries of what's possible with AI, even if some of their consumer-facing features may seem a bit gimmicky at the moment. It will be interesting to see how their AI Studio and segmentation tools evolve over time.

Meta's Segment Anything Model 2

Meta has released a new version of their Segment Anything Model, called SAM 2. This updated model demonstrates significant improvements in its ability to accurately segment objects in images and videos.

Some key features of SAM 2 include:

  • Improved ability to track objects through occlusion - the model can continue to follow an object even when it temporarily goes behind another object.
  • Enhanced segmentation accuracy, allowing it to more precisely outline the boundaries of detected objects.
  • Faster processing speed, enabling real-time segmentation in video applications.
  • Expanded versatility, with the model able to segment a wide range of objects, from people and animals to more complex shapes and structures.

The demos provided by Meta showcase SAM 2's impressive capabilities. For example, the model can accurately track a skateboarder as they move through a scene, maintaining the segmentation even as the skateboarder passes behind a tree. Similarly, it can isolate and follow multiple balls in a video, distinguishing each one individually.
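
Meta has also open-sourced the model, so you can go beyond the web demo. The sketch below assumes the sam2 package and the config/checkpoint filenames from Meta's facebookresearch/sam2 GitHub release; it prompts the image predictor with a single positive click:

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Config and checkpoint filenames assumed from Meta's GitHub release
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "sam2_hiera_large.pt"))

image = np.array(Image.open("skateboarder.jpg").convert("RGB"))
with torch.inference_mode():
    predictor.set_image(image)
    # One positive click (label=1) on the object we want segmented
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[480, 320]]),
        point_labels=np.array([1]),
    )
print(masks.shape, scores)  # candidate masks and their confidence scores
```

The same repository also ships a video predictor that propagates masks across frames, which is what powers the object tracking shown in the demos.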

These advancements in segmentation technology have exciting implications for video editing, visual effects, and other media production workflows. By automating the tedious process of rotoscoping, SAM 2 has the potential to significantly streamline and accelerate these tasks. Integration with tools like Adobe Premiere and DaVinci Resolve could make SAM 2 a valuable asset for content creators.

Overall, Meta's Segment Anything Model 2 represents a significant step forward in computer vision and image/video processing capabilities. As AI continues to evolve, we can expect to see even more impressive feats of visual understanding and manipulation in the near future.

Perplexity Publishers Program

Perplexity, the AI-powered search engine, has announced the Perplexity Publishers Program. This program aims to share revenue with specific partners whose content is used as a source of news on the Perplexity platform.

The initial batch of partners included in this program are:

  • Time
  • Der Spiegel
  • Fortune
  • Entrepreneur
  • The Texas Tribune
  • WordPress.com

While this program currently only includes larger publishers, Perplexity has expressed hope that in the future it will be able to incentivize everyday bloggers and content creators to license their content to the platform as well. For now, however, the Perplexity Publishers Program is focused on established news organizations.

The goal of this program is to provide a way for Perplexity to share the revenue generated from using partners' content, rather than simply aggregating and displaying it without compensation. This represents an effort by Perplexity to build mutually beneficial relationships with content creators whose work is featured on its platform.

Leonardo AI Acquired by Canva

This week, the big news is that Leonardo AI, one of the leading AI image generation tools, has been acquired by the design platform Canva. This is a significant development for a few reasons:

  1. Integration with Canva: With Leonardo AI now part of the Canva ecosystem, users will eventually be able to access the powerful image generation capabilities directly within the Canva platform. This will make it easier than ever to create high-quality, AI-generated images without having to jump between multiple tools.

  2. Improved Canva AI: Canva's current AI image generation capabilities have been somewhat lacking compared to other tools like DALL-E and Midjourney. By integrating Leonardo's proprietary "Phoenix" model, Canva's AI image generation is poised to improve dramatically, allowing users to create even more impressive visuals.

  3. Advisor's Perspective: The author notes that he has been an advisor for Canva for over a year, so this acquisition also benefits him personally from an equity standpoint. However, he still believes the integration of Leonardo's technology will be genuinely helpful, regardless of his advisory role.

  4. Continued Innovation: Leonardo AI will continue to operate as an independent entity, with plans to continue updating and improving their own app and features. This means users can expect to see ongoing innovation and development in the Leonardo AI tool, even as it becomes part of the Canva ecosystem.

Overall, this acquisition represents an exciting step forward for both Canva and the world of AI-powered design tools. By bringing together Canva's design expertise and Leonardo's advanced image generation capabilities, users can look forward to even more powerful and accessible creative tools in the future.

Midjourney Update 6.1

This week, Midjourney released version 6.1, which greatly improves image quality, coherence, and text handling. Some key highlights:

  • Significant improvements in image quality and coherence. The examples shown demonstrate a high level of realism that is hard to distinguish from real images.

  • Better handling of text prompts, even for nonsensical or made-up words. The model seems to understand the intent behind the text and generates appropriate imagery.

  • A new upscaling and personalization model that further enhances the generated images.

To try out the new Midjourney 6.1 model, head over to midjourney.com, open the settings, and make sure the model is set to 6.1. From there, you can enter prompts and see the impressive results for yourself. The community has been sharing many striking examples showcasing the advancements in this latest update.

New 3D Model Generators

There are a few new advancements in the world of 3D model generation using AI:

  1. Edify 3D by NVIDIA and Shutterstock:

    • Edify 3D is a new model developed in collaboration between NVIDIA and Shutterstock.
    • It allows you to generate 3D models from text prompts on the website build.nvidia.com.
    • For example, you can enter a prompt like "a grey wolf howling at the moon" and it will generate 3D model previews.
    • You can then select the preview you like and generate the full 3D model.
  2. Stable Fast 3D by Stability AI:

    • Stable Fast 3D is a rapid 3D asset generation model from Stability AI.
    • It can generate 3D models from single images in under a second.
    • The model is available through Stability AI's API and Stable Assistant, as well as on Hugging Face.
    • While the results are not as polished as the Edify 3D model, the speed is impressive.
  3. Flux by Black Forest Labs:

    • Black Forest Labs, a new AI research company, has released a new open-source text-to-image model called Flux.
    • Flux generates 2D images rather than true 3D models, but it arrived in the same wave of generative releases (more on Flux in the next section).
    • The model is available to use on platforms like Glif and Hugging Face.
    • Early examples show Flux producing impressive results from simple prompts.

These new 3D model generation tools demonstrate the rapid progress in AI's ability to create 3D content from text and images. While the quality may still have room for improvement, the speed and accessibility of these models are quite impressive.
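
As an illustration of how these generators could slot into a pipeline, here is a hedged sketch of calling Stable Fast 3D through Stability AI's REST API. The endpoint path, request fields, and binary glTF response are assumptions modeled on Stability's v2beta API conventions, so check the current docs before relying on them:

```python
import requests

# Assumed endpoint and field names, modeled on Stability AI's v2beta REST API
resp = requests.post(
    "https://api.stability.ai/v2beta/3d/stable-fast-3d",
    headers={"Authorization": "Bearer YOUR_STABILITY_KEY"},
    files={"image": open("chair.png", "rb")},  # single input image
)
resp.raise_for_status()

# The response is assumed to be a binary glTF (.glb) model
with open("chair.glb", "wb") as f:
    f.write(resp.content)
```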

Black Forest Labs' New Flux Model

Black Forest Labs, a team of distinguished AI researchers and engineers, has released a new open-source text-to-image model called Flux. This model was developed by the same team that created VQ-GAN, latent diffusion, and other advancements in visual generative AI.

Flux is now available for anyone to use through platforms like Glif and Hugging Face. Some key points about the new Flux model:

  • It is a text-to-image model that can generate high-quality images from text prompts.
  • The model is openly available, allowing developers and creators to easily integrate it into their own projects.
  • Initial demos and examples show Flux producing impressive, realistic images across a variety of subjects and styles.
  • By open-sourcing Flux, Black Forest Labs aims to further the progress and accessibility of advanced text-to-image AI capabilities.

Creators can experiment with Flux through no-code tools like Glif, which provide a simple interface to generate images from text prompts. Alternatively, the model is also available on Hugging Face for more technical integrations.
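
For those technical integrations, one likely route is Hugging Face's diffusers library. The sketch below assumes the openly licensed FLUX.1-schnell checkpoint under the black-forest-labs organization and a diffusers version with Flux pipeline support:

```python
import torch
from diffusers import FluxPipeline

# Assumed hub ID for the openly licensed fast variant
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on GPUs with limited VRAM

image = pipe(
    "a grey wolf howling at the moon, cinematic lighting",
    num_inference_steps=4,  # the distilled model is tuned for very few steps
    guidance_scale=0.0,     # guidance is disabled for the distilled variant
).images[0]
image.save("wolf.png")
```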

Overall, the release of the Flux model by Black Forest Labs represents another step forward in the rapid advancement of open-source, high-quality text-to-image AI technology. It will be exciting to see how this new model is adopted and utilized by the broader AI and creative communities.

Runway's Image to Video and Gen 3 Alpha Turbo

Runway, the AI-powered video creation platform, has made significant advancements this week with the introduction of two new features: an image-to-video model and Gen-3 Alpha Turbo.

Image to Video

Runway has now rolled out an image-to-video model, allowing users to convert static images into dynamic video content. This feature is a significant addition to Runway's capabilities, as it enables users to bring their images to life in a seamless and engaging manner.

The examples showcased by Runway demonstrate the versatility of this new tool. Users can transform images of water falling, plants moving, or even a Suzuki Samurai being covered in paint into captivating video clips. This feature opens up new creative possibilities for content creators, allowing them to breathe life into their visual assets.

Gen-3 Alpha Turbo

In addition to the image-to-video model, Runway has also announced the release of Gen-3 Alpha Turbo, a faster and more efficient version of their Gen-3 Alpha video generation model. According to Runway's demonstrations, Gen-3 Alpha Turbo can generate video outputs much more quickly than the previous version, with generation times as short as 11 seconds.

This improvement in speed and efficiency is a significant development, as it allows users to iterate and experiment with their video creations more rapidly. The faster turnaround time can enhance the overall video production workflow, enabling creators to explore more ideas and refine their content with greater agility.

Overall, Runway's latest advancements in image-to-video generation and Gen-3 Alpha Turbo showcase the company's commitment to pushing the boundaries of AI-powered video creation. These new features have the potential to change how content creators approach video production, opening up new avenues for creativity and experimentation.

AI-Generated Avatars and Influencers

The rise of AI-generated avatars and influencers is a fascinating and concerning trend. While the technology behind these tools is impressive, the potential for abuse and the spread of misinformation is worrying.

On one hand, AI-powered avatars can be used to create highly realistic digital representations of people, allowing for the creation of "digital twins" that can be used for various applications. This could be beneficial in areas like entertainment, where AI avatars could be used to create new forms of content and experiences.

However, the potential for these tools to be misused is significant. AI-generated influencers, for example, could be used to spread propaganda, promote products without proper disclosure, or even impersonate real people. This could erode trust in online content and make it increasingly difficult for people to distinguish between what is real and what is fabricated.

Moreover, the ease with which these avatars can be created raises concerns about the potential for the proliferation of low-effort, low-quality content. As the technology becomes more accessible, we may see a flood of AI-generated content that lacks the depth and authenticity of human-created work.

Ultimately, the rise of AI-generated avatars and influencers is a double-edged sword. While the technology has the potential to be used for good, it also poses significant risks that need to be carefully considered and addressed. As these tools become more widespread, it will be crucial for policymakers, tech companies, and the public to work together to establish clear guidelines and safeguards to ensure that they are used responsibly and ethically.

Vimeo's Automatic Video Translation

Vimeo, the popular video hosting platform, is rolling out a new feature that allows users to automatically translate their videos into any language using the speaker's own voice. This feature is particularly useful for creators who want to localize their content and make it accessible to a global audience.

The way it works is that Vimeo's system will take the audio from the original video and translate it into the desired language, while preserving the speaker's voice. This means that the translated version will sound natural and seamless, without the need for re-recording or hiring a voice actor.

One of the key benefits of this feature is that it eliminates the need for manual subtitling or dubbing, which can be time-consuming and expensive. With Vimeo's automatic translation, creators can quickly and easily make their videos available in multiple languages, expanding their reach and potential audience.

Additionally, this feature can be particularly useful for educational or instructional videos, where clear and accessible communication is crucial. By offering translations, Vimeo is making it easier for viewers from diverse linguistic backgrounds to engage with the content.

Overall, Vimeo's automatic video translation is a welcome addition to the platform, providing creators with a powerful tool to reach a global audience and make their content more inclusive and accessible.

Anthropic's Response to Lawsuits

Anthropic has responded to the lawsuits filed against them, saying they used publicly available data from across the internet to train their models. They acknowledge that this data may have included some copyrighted material, but state that this was not their intention.

Anthropic argues that their models learn in a similar way to how humans learn - by consuming large amounts of publicly available information. They claim to have put guardrails in place to prevent the generation of content that directly replicates copyrighted works.

Anthropic also states that they were surprised by the lawsuits, as they had been working collaboratively with many in the recording industry who were excited about the technology. Overall, Anthropic seems to be defending their practices and arguing that their use of publicly available data falls within fair use principles.

The full response from Anthropic can be found in the link provided in the description. It provides their perspective on the legal challenges they are facing regarding the training data used for their AI models.

The "Friend" AI Necklace Controversy

There has been an interesting story unfolding around a new AI-powered necklace device called "Friend". Here's a summary of the key points:

  • Avi Schiffmann launched a new product called "Friend" - a necklace that uses AI to listen to the wearer and send them text messages with observations and comments.

  • However, it turns out a different founder, Nik Shevchenko, had launched an earlier "Friend" product. Shevchenko accused Schiffmann of copying his concept and style.

  • Shevchenko even released a rap "rebuttal" video, calling out Schiffmann for "jacking his style" and for spending $1.8 million to buy the friend.com domain.

  • Further drama emerged when Macy Gilliam, who works for Morning Brew (one of Schiffmann's investors), tweeted that the $1.8 million domain purchase was a waste of money.

  • The whole situation has become a bit of a spectacle, with speculation that it may all be part of a clever marketing strategy to generate buzz around the new "Friend" product.

  • Overall, it's an interesting case study in the challenges and controversies that can arise around new AI-powered consumer products, especially when there are similarities or perceived copying between competing offerings.

Other AI News

This week was filled with a variety of other AI-related news and updates:

  1. Video Game Performers Strike Over AI Concerns: Video game performers are going on strike over concerns that game companies could use AI to replicate their voices or create digital replicas of their likeness without consent or fair compensation.

  2. Taco Bell Rolls Out AI in Drive-Throughs: Taco Bell is planning to use voice AI technology in hundreds of their drive-through locations in the US by the end of 2024. However, past attempts by companies like Wendy's and McDonald's with AI-powered drive-throughs have had mixed results.

  3. AI Toothbrush Claims to Improve Dental Health: A new AI-powered toothbrush claims to use advanced algorithms and companion apps to help users brush their teeth better. Some are skeptical about the need for AI in such a basic task.

  4. AI Heavily Used in the Olympics: AI is being leveraged extensively at the Olympics, from identifying objects on the playing field to analyzing athlete movements and tracking the ball. The widespread use of AI in the Olympics could be an interesting topic for a future video.

  5. Qualcomm Showcases Cutting-Edge AI Technology: In the video's sponsor segment, Qualcomm's work on enabling companies to run AI on-device was highlighted, including features like real-time language translation, facial-expression-based computer control, and AI-powered music remixing.

Overall, these news items demonstrate the continued expansion of AI into various industries and applications, raising both excitement and concerns about the implications of this technology.
