From Photography to Jokes – Foundation Models and the Next Generation of AI

If you’ve seen photos of a teapot shaped like an avocado or read a well-written article that veers off on slightly weird tangents, you’ve probably come across a new trend in artificial intelligence (AI).

Machine learning systems called DALL-E, GPT, and PaLM are making a splash with their incredible ability to generate creative work.

These systems are known as “foundation models” and are not all hype and party tricks. So how does this new approach to AI work? And will it be the end of human creativity and the start of a deepfake nightmare?

1. What are foundation models?

Foundation models work by training a single huge system on large amounts of general data, then adapting that system to new problems. Earlier models tended to start from scratch for each new problem.

DALL-E 2, for example, was trained by scanning hundreds of millions of examples to match pictures (such as a photo of a pet cat) with their captions (“Mr. Fuzzyboots the tabby cat is relaxing in the sun”). Once trained, this model knows what cats (and other things) look like in pictures.

But the model can also be used for many other interesting AI tasks, such as generating new images from a caption alone (“Show me a koala dunking a basketball”) or editing images based on written instructions (“Make it look like this monkey is paying taxes”).

2. How do they work?

Foundation models run on “deep neural networks”, which are loosely inspired by how the brain works. These involve sophisticated mathematics and a huge amount of computing power, but they boil down to a very sophisticated kind of pattern matching.

For example, by looking at millions of example images, a deep neural network can associate the word “cat” with patterns of pixels that often appear in images of cats – such as soft, fuzzy, hairy blobs of texture. The more examples the model sees (the more data it is shown), and the bigger the model (the more “layers” or “depth” it has), the more complex these patterns and correlations can be.

Foundation models are, in one sense, just an extension of the “deep learning” paradigm that has dominated AI research for the past decade. However, they exhibit unprogrammed or “emergent” behaviors that can be both surprising and novel.

For example, Google’s PaLM language model seems to be able to produce explanations for complicated metaphors and jokes. This goes beyond simply imitating the types of data it was originally trained to process.

A user interacts with the PaLM language model by typing questions. The AI system responds by typing back answers.

3. Access is limited – for now

It’s hard to imagine the scale of these AI systems. PaLM has 540 billion parameters, meaning that even if everyone on the planet memorized 50 numbers, we would still not have enough storage to reproduce the model.
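The comparison above can be checked with some quick arithmetic. The sketch below is only illustrative: the world population of roughly 8 billion is our assumption (the article gives no figure), and the variable and function names are made up for this example.

```python
# Rough check of the "everyone memorizes 50 numbers" comparison.
# ASSUMPTION: a world population of about 8 billion; the article does not state one.
WORLD_POPULATION = 8_000_000_000
NUMBERS_PER_PERSON = 50               # figure quoted in the article
PALM_PARAMETERS = 540_000_000_000     # 540 billion, as quoted in the article


def memorized_capacity(population: int, numbers_per_person: int) -> int:
    """Total count of numbers humanity could hold if each person memorized a fixed amount."""
    return population * numbers_per_person


def numbers_needed_per_person(parameters: int, population: int) -> int:
    """Smallest whole count each person must memorize to cover every parameter."""
    return -(-parameters // population)  # ceiling division using integers only


capacity = memorized_capacity(WORLD_POPULATION, NUMBERS_PER_PERSON)
shortfall = PALM_PARAMETERS - capacity
needed = numbers_needed_per_person(PALM_PARAMETERS, WORLD_POPULATION)

print(f"Numbers memorized in total: {capacity:,}")
print(f"Parameters still uncovered: {shortfall:,}")
print(f"Numbers each person would need: {needed}")
```

Under that population assumption, 50 numbers each covers about 400 billion values, which is well short of PaLM's 540 billion parameters; each person would need to memorize closer to 68.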

The models are so large that training them requires massive amounts of computing power and other resources. One estimate puts the cost of training OpenAI’s language model GPT-3 at around US$5 million.

As a result, only large tech companies such as OpenAI, Google, and Baidu can currently afford to build foundation models. These companies limit who can access the systems, which makes economic sense.

Usage restrictions may give us some comfort that these systems will not be used for malicious purposes (such as creating fake news or defamatory content) anytime soon. But it also means that independent researchers cannot investigate these systems and share the results openly and accountably. So we do not yet know the full implications of their use.

4. What will these models mean for ‘creative’ industries?

More foundation models will be produced in the coming years. Smaller models are already being published in open-source forms, tech companies are beginning to experiment with licensing and commercializing these tools, and AI researchers are working hard to make the technology more efficient and accessible.

The remarkable creativity shown by models such as PaLM and DALL-E 2 demonstrates that creative professional jobs could be affected by this technology much sooner than initially expected.

Traditional wisdom has always said that robots would displace “blue collar” jobs first. “White collar” work was meant to be relatively safe from automation – especially professional work that requires creativity and training.

Deep learning AI models already demonstrate tremendous accuracy in tasks such as reviewing X-rays and detecting the eye condition macular degeneration. Foundation models may soon provide inexpensive, “good enough” creativity in areas such as advertising, copywriting, stock photography, or graphic design.

The future of professional and creative work may look a little different than expected.

5. What this means for legal evidence, news and media

Foundation models will inevitably affect the law in areas such as intellectual property and evidence, because we will no longer be able to assume that creative content is the result of human activity.

We must also address the challenges of disinformation and misinformation created by these systems. We already face major problems with disinformation, as we are seeing in the unfolding Russian invasion of Ukraine and the emerging problem of deepfake photos and videos, but foundation models are poised to supercharge these challenges.

Preparation time

As researchers studying the effects of AI on society, we think that foundation models will bring about huge changes. They are tightly controlled (for now), so we probably have a little time to understand their implications before they become a major issue.

The genie isn’t quite out of the bottle yet, but foundation models are a very big bottle – and there’s a very clever genie inside.

Aaron Snoswell is a postdoctoral research fellow in Computational Law and AI Accountability at Queensland University of Technology; Dan Hunter is the Executive Dean of the Faculty of Law at Queensland University of Technology. This article is republished from The Conversation under a Creative Commons license. Read the original article at
