Axis: ARTS, RESEARCH AND PRODUCTION OF KNOWLEDGE; artistic decolonialisms: insurgent practices and thoughts.
The Work of Art in the Age of Its Technological Generatability
Nicolás Ruarte (UNA)
SUMMARY: This research addresses the impact of Artificial Intelligence (AI) on the paradigm of contemporary Latin American art. Its objectives are to explore new methodologies of artistic production, reflect on the entity of the work of art, preserve and promote the styles of Latin American art, combat Eurocentric bias in artistic/technological production, expand representation and promote a more inclusive perspective in the field of AI, and finally to prepare an artistic production manual for students and young artists. It is expected to contribute to the development of more inclusive and representative AI, promoting the appreciation and dissemination of Latin American artistic styles and expressions. It seeks to challenge the stereotypes and biases rooted in commercial AI developments, opening a space for Latin American artists to be recognized and contributing to the training of the next generation of young artists, who will grow up with these tools at their disposal and must compete in this new world panorama. The work methodology combines research and artistic production to evaluate the risks and implications of these new technologies.
Keywords: Artificial intelligence; Eurocentrism; Latin America; Colonialism; Generative art.
Introduction
We are at the beginning of a technological revolution that will transform the economy, society, politics and the arts in a way not seen since the Industrial Revolution. The latest advances in AI promise to revolutionize the way we think, design and produce art. In this context of great opportunities and challenges, there is also great concern among many artists and artistic groups about the impact this revolution will have on their work. This research seeks to challenge that idea and demonstrate how the use of these technologies will allow an unprecedented explosion of creativity: shortening production times, reducing costs, democratizing access and increasing the quality of productions, allowing artists to train and to tackle more complex projects that expand their creative horizons.
The conception of this project arose from research into tools such as Stable Diffusion (SD) and Midjourney, where the absence of Latin American artists' styles became evident in the outputs of these commercial AIs. This fact reveals the Eurocentric bias and the exclusion of artistic and cultural expressions from our region in the development and training of these technologies. The architecture of an AI, its neural network (NN), is inherently bias-free. The bias arises from the developers who designed and filtered the datasets with which these AIs were trained. Any image omitted from the dataset, whether voluntarily or involuntarily, will remain unknown to the AI. Moreover, every image that is included must be accompanied by a tag or associated description. Both in the selection of images and in their description, biases may arise that the AI will then reflect in the images it produces.
If we want AI tools that reflect the idiosyncrasies, styles and faces of Latin America, we must develop them ourselves. This project seeks to challenge and combat Eurocentrism, not only from an artistic perspective, but also as a way to question the biases and discriminations inherent in these systems and in the international art market.
Development
This work has two fundamental axes: first, research on AI development and training; second, my own artistic production through the training and use of these tools, along with an analysis of the public's reactions. The work methodology consists of approaching my own artistic practice while trying to incorporate all the artificial intelligence tools at my disposal.
What is Artificial Intelligence?
AI is a scientific discipline that studies and develops theories, techniques and application systems to simulate and/or extend human intelligence. The term was coined by John McCarthy in 1956, who defined it as the science and engineering of developing intelligent machines, especially intelligent computer programs. The premise of AI is to allow machines to learn from collected data to simulate the knowledge and skills of a human being, implementing decision-making similar to that of a person while continually adapting their knowledge structures and improving their performance. Today AI has evolved into an interdisciplinary field that involves various branches of science, industry and culture.
There are three major schools of AI thought, Symbolism, Connectionism and Actionism, which have developed different architectures or types of AI over time, each with its particular characteristics and objectives. The object of study of this work is image-generating AIs of the diffusion-model type, which originate from the deep learning (DL) studies of the Connectionist school. DL algorithms aim to establish a NN that simulates the functioning of the human brain: the neurons and synapses of the brain are represented by units and connection weights in the NN. They are particularly effective for working with images, voices and natural-language text.
Although AI has existed as a scientific discipline since the mid-20th century, today we see the deployment of these technologies within the reach of anyone who has a cell phone or a computer. There were certain fundamental changes in the last twenty years that enabled this recent revolution. To identify them, it is necessary to first mention the four basic elements of all AI:
Data: All information that is used for the development, training and operation of AIs. It can be images, numbers, natural language, sensor information, transactional data, etc.
Algorithms: A set of systematic operations and computer procedures that provide the logical architecture of AI.
Processing power: Charging and data processing capacity of the hardware on which an AI works.
Application scenarios: Specific environments and tasks for which the AI was designed.
Advances in the processing power of our computers have allowed the implementation of AI systems millions of times more powerful than those of the 1980s and 1990s, which were generally nothing more than small laboratory cases. Thanks to this, the algorithmic advances derived from scientific research could be applied in NNs with millions of times more neurons than their predecessors, giving them the capacity to store and process enormous amounts of data. That data, in truth, is the key to the current panorama. We humans have become tireless content-generating machines. All the interactions, texts, photos and publications we make on the internet are available to develop, train and provide feedback to AI systems. It is we, in our eternal desire to leave a digital mark on the world, who have allowed AI models such as ChatGPT or SD to improve the quality of their outputs until they became almost indistinguishable from those of a human being.
Despite all these advances, AI developers are still very far from achieving a true singularity: an AI with consciousness that can perform any task a human performs and pass the famous Turing test. All current AIs are what is known as specific or weak AIs. This means that they are designed for a specific task in a given application scenario; within that scenario they can achieve incredible things, but they cannot execute any task outside of it. ChatGPT's responses may seem brilliant to us, but it has no true consciousness; it does not know the true meaning of what it writes, its answers are just a statistical construction of text strings. A general or strong AI, capable of replicating any human function to the point of being self-sustaining and not depending on user intervention, is for the moment just science fiction, and many experts believe it always will be.
How does an AI learn?
There are various methodologies and procedures to train an AI. For the purposes of this work I am not going to give a theoretical description of the techniques and their application environments, but will instead develop a practical case based on the training of Stable Diffusion models. I chose SD for this research because it is an open-source AI model developed by Stability AI, so all data on its operation and architecture are publicly available.
SD uses a type of diffusion model (DM) called a latent diffusion model (LDM). It is mainly used to generate images conditioned by a prompt. It works by diffusing a 2D map of Gaussian noise within a latent space over a series of iterations or steps, removing noise from the image, conditioned by the entered prompt, until it reaches a result or output.
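The step-by-step denoising loop just described can be sketched in a toy form. This is purely illustrative: a real LDM uses a trained U-Net to predict the noise to subtract at each step, whereas here a fixed fraction is removed so that only the mechanics are visible.

```python
import random

def toy_denoise(steps: int, seed: int, size: int = 8):
    """Illustrative diffusion run: start from a seeded Gaussian noise
    map (flattened to a list here) and shrink the noise a little on
    every step. Returns the final 'latent' plus the total noise
    magnitude recorded at each step."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(size)]
    magnitudes = [sum(abs(x) for x in latent)]
    for _ in range(steps):
        # stand-in for "predict the noise and subtract it from the latent"
        latent = [x * 0.85 for x in latent]
        magnitudes.append(sum(abs(x) for x in latent))
    return latent, magnitudes
```

With each step the remaining noise decreases, which is why more steps generally yield a more resolved image, up to a point of diminishing returns.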
SD was trained on billions of image-description pairs from LAION-5B, a publicly accessible dataset derived from Common Crawl data scraped from the internet. The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services, for a total of 150,000 GPU-hours, at a cost of US$600,000.
It is terribly expensive to train an AI model from scratch, but SD, being open source, allows us to use the NN developed by Stability AI and retrain it for our needs. Such retraining requires a much smaller number of images, at a GPU cost that can be afforded by a freelance artist with a video card in their personal computer, or by renting low-cost GPU usage from Amazon Web Services, Google Colab or some other cloud service.
I decided to start this project with the goal of retraining SD models with the aesthetics of Latin American artists ignored by Stability AI, producing reduced SD models called Low-Rank Adaptations (LoRAs). I have successfully trained six models: Benito Quinquela Martín, Leonor Fini, Ernesto de la Cárcova, Julio Le Parc, Raquel Forner, and El Eternauta by Héctor Germán Oesterheld and the cartoonist Francisco Solano López.
Image 1 – Giocondas ∞ / Images produced by the AIs, from left to right: Carcova_IA, Fini_IA, Forner_IA, Quinquela_IA, Eternauta_IA and LeParc_IA.
The first step in training an AI is to create the dataset. A dataset consists of two fundamental elements: the images, and the descriptions or tags that accompany them. To obtain an AI that works correctly and faithfully imitates an artist's style, it is very important to collect as many works by that artist as possible, at the best available quality. Based on my tests, at least 150 different images are needed to obtain good results; the broader and more varied the dataset, the better.
The second step is to write the tags. Each image in the dataset must be accompanied by a description that is as detailed as possible; this allows the model to correctly interpret the prompt. For example, because Le Parc had a long career with diverse styles, I managed to collect more than 450 images of his works. In order for LeParc_IA to produce images of each distinct style instead of a mix of all of them, I had to separate the dataset images into groups and associate a keyword with each group to differentiate them. In this way, when the keyword is present in the prompt, the AI will produce images in that specific style of the artist.
Image 2 – Dataset LeParc_IA / 454 images of works by Le Parc accompanied by each of their tags
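The image/tag pairing described above can be sketched as a small helper that writes one caption file per image, prefixed with the group's trigger keyword, roughly the layout that LoRA trainers such as Kohya expect. The function name and file layout here are a hypothetical sketch; adapt them to your trainer's documentation.

```python
from pathlib import Path

def write_captions(root: Path, groups: dict[str, list[str]], base: str) -> int:
    """For each style group, write a caption .txt next to each image,
    starting with the group's trigger keyword so that keyword can later
    be used in the prompt to select that specific style. Returns the
    number of caption files written."""
    written = 0
    for keyword, images in groups.items():
        for name in images:
            caption = f"{keyword}, {base}"
            (root / name).with_suffix(".txt").write_text(caption, encoding="utf-8")
            written += 1
    return written
```

For the Le Parc dataset, for example, each group (op art, mobiles, etc.) would get its own keyword, and that keyword in a prompt then selects the corresponding style.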
With the dataset complete, training can begin. It is important to highlight that the preparation of the dataset is the most important process when it comes to obtaining good results in training: no amount of training can compensate for a poorly made dataset. It is precisely in this step that the possible biases of our AI are generated. Our own subjectivity when selecting and describing the dataset images leads the AI to associate certain words with certain images, and can produce biased or harmful results.
Training parameters vary depending on the characteristics of the dataset and the expected results; to date, there is no standard procedure for all cases. There is a lot of trial and error at this stage, as each NN and each dataset has its peculiarities. Broadly speaking, the LoRA training process consists of retraining a part of the SD NN so that it produces images similar to those of the dataset. This training is done in a series of iterations known as epochs. The usual process takes between 10 and 20 epochs, although there is no direct relationship between more repetitions and higher quality. On the contrary, NNs can be overtrained, a phenomenon known as overfitting, which ends up generating unwanted results. In the case of SD LoRAs, overtraining with too many epochs produces NNs that reproduce the dataset images almost exactly, unable to produce anything significantly new and practically ignoring the generation prompt.
Image 3 – Precision vs Epochs graph in AI training / After the optimal point, the greater the number of epochs, the larger the validation error.
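In practice, the guard against overfitting is to save a LoRA checkpoint per epoch and keep the one with the lowest validation error, the bottom of the U-shaped curve in the graph above. A minimal sketch with made-up error values:

```python
def best_epoch(val_errors: list[float]) -> int:
    """Return the 1-indexed epoch with the lowest validation error."""
    return min(range(len(val_errors)), key=val_errors.__getitem__) + 1

# hypothetical U-shaped curve: error falls, then rises as the LoRA
# starts copying the dataset images instead of generalizing
errors = [0.91, 0.62, 0.44, 0.37, 0.33, 0.35, 0.41, 0.52]
```

Here the checkpoint from epoch 5 would be kept; the later epochs only memorize the dataset.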
Once the process is finished, the result is an SD LoRA that we can use together with other, larger SD models to reproduce and generate images in the style we select. Anecdotally, regarding the AIs produced in this research, I would like to mention that each one seems to have its own “personality”. Quinquela_IA is stubborn: just as Quinquela only painted port motifs, it is practically impossible to keep water, reflections or boats out of Quinquela_IA's images. Also striking is the relationship between each painter's method and the ideal configurations to reproduce his technique. Quinquela_IA works best with a short sampling, between 20 and 30 steps, and a low Classifier-Free Guidance Scale (CFG_scale) of around 3.5. This parameter controls the creativity of the generated image in relation to the description provided: the higher the value, the more the AI is conditioned by the prompt. This results in a fast generation time, consistent with the fast style of Quinquela, who could finish a painting in less than a day by applying large amounts of material directly with the spatula. On the other hand, Carcova_IA works better with a high sampling, from 69 to 96 steps, and a CFG_scale of 7.5. We could draw a parallel between this reiteration of layers and Ernesto de la Cárcova's meticulous work of applying material with soft brushstrokes and glazes. Something similar happens with LeParc_IA: the ideal configuration is extremely high, between 120 and 150 steps, also including a detail-refining model. This large number of steps results in long generation times that allow the AI to correctly reproduce the geometric abstraction style of Le Parc, famous for his meticulous op art paintings.
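These per-model settings can be kept as simple presets. The values below follow the ranges reported above; the LeParc_IA CFG_scale and the refiner flag are my own assumptions, since the text only gives its step range and mentions a detail-refining model.

```python
# per-model generation presets found by trial and error in this project
PRESETS = {
    "Quinquela_IA": {"steps": 25,  "cfg_scale": 3.5, "refiner": False},
    "Carcova_IA":   {"steps": 80,  "cfg_scale": 7.5, "refiner": False},
    "LeParc_IA":    {"steps": 135, "cfg_scale": 7.5, "refiner": True},  # cfg assumed
}

def preset(model: str, **overrides):
    """Return a copy of the model's preset, with optional overrides."""
    return {**PRESETS[model], **overrides}
```

A call like `preset("Carcova_IA", steps=96)` keeps the model's defaults while pushing one parameter to the top of its range.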
AI Image Generation
SD models produce images by diffusion within a latent space, starting from a 2D map of Gaussian noise. This process can be understood as interpreting that initial image of pure random noise over a series of steps in which noise is removed little by little, until reaching a satisfactory image conditioned by the dataset of images with which the model was trained and by the generation parameters. The specific image-generation process varies depending on the AI model and the implementation we are using but, as a general rule, SD has four main architectural components:
Variational AutoEncoder (VAE): Encoder responsible for compressing the pixel space of the image into latent space and then decoding the latent space to restore the full size of the image.
Forward and Reverse Diffusion: The forward diffusion algorithm progressively adds Gaussian noise to an image until all that is left is random noise. During training, all images go through this process and then the inverse diffusion algorithm iteratively undoes the forward diffusion and attempts to reconstruct them from the noise. The goal of training is to minimize the generation error until the AI can produce the images of the dataset.
Noise predictor (U-Net): The noise predictor estimates the amount of noise in the latent space and subtracts it from the image. This process is repeated depending on the number of steps that we configure in the generation.
Text conditioning: The algorithm responsible for interpreting the prompt. SD uses a CLIP tokenizer which analyzes each word of the prompt and inserts this data into a vector within the latent space, conditioning the generated image according to the semantic field.
The latent space is a reduction of the image's pixel space into a much smaller n-dimensional information space. This means that SD does not attempt to directly produce the 512x512 pixels of an image, but rather works within a reduced information space that is then scaled to the final resolution through a convolution, a mathematical operation that combines two functions to describe the superposition between them. Working within the latent space greatly reduces the number of operations that the NN must perform to interpret the prompt and produce an image, reducing the computational processing load and generation times.
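The size of this reduction is easy to quantify. In SD v1, the VAE downsamples each spatial dimension by a factor of 8 and works with 4 latent channels instead of 3 color channels, so the NN operates on roughly 1/48 of the values of the final image:

```python
def latent_compression(width: int = 512, height: int = 512):
    """Compare the pixel space of an RGB image with SD v1's latent space
    (8x spatial downsampling, 4 latent channels vs 3 color channels)."""
    pixel_values = width * height * 3
    latent_values = (width // 8) * (height // 8) * 4
    return pixel_values, latent_values, pixel_values / latent_values

# a 512x512 RGB image: 786,432 values in pixel space,
# but only 16,384 values in latent space, 48 times fewer
```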
SD works by diffusing Gaussian noise according to a seed, which implies that every time we use the NN, the image produced is created at that moment from a 2D map of random noise. Even if we use exactly the same parameters, the result is always unique and presents specific, unrepeatable details. This random characteristic is essential to understand and explain the limits and possibilities of these technologies, and to counteract the typical criticisms that “AIs do not generate anything new” or that “they generate stolen images”; on the contrary, AIs only generate completely new images.
Image 4 – Example of generating an image
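Seed-dependence can be shown with a minimal sketch: the same seed reproduces the same starting noise map (and therefore, with identical parameters, the same image), while any other seed starts a different generation.

```python
import random

def noise_map(seed: int, size: int = 16) -> list[float]:
    """The seeded Gaussian noise map a diffusion run starts from
    (flattened to a list here for simplicity)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]
```

This is why generation interfaces expose the seed: saving it lets an artist reproduce or refine a result, while a random seed guarantees a unique image.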
The objective of this work is not to make an in-depth explanation of the theoretical and algorithmic elements behind these neural-network systems; rather, we will demonstrate the specific practical case of SD implemented with software such as Automatic1111 (A1111) or ComfyUI.
The creative possibilities of SD are enormous and we can differentiate different generation processes depending on the audiovisual product we want to obtain. These processes include:
Text-to-image: Generation of an image from a prompt. It is the most common way of using SD. In its most basic form, one uses a prompt together with a number of steps and a CFG_scale, and can also control the composition of the image through other conditioning mechanisms such as ControlNet, LoRAs, Negative Embeddings, VAEs, etc.
Image-to-image: Generation from a conditioning image together with an optional prompt.
Inpainting and outpainting: SD can be used to retouch or expand images by selecting sectors and limiting the AI generation to that specific sector. A very popular example of this process is Photoshop's new AI Generative Fill that works in the same way.
Text-to-video: Using SD extensions such as Deforum or AnimateDiff it is possible to create animations, videos and stop motions.
Image-to-video: With these same extensions it is possible to introduce an initial image and animate it in its entirety, animate different sectors of it or use it as a starting frame then transform it into something else.
Video-editing: Using all the frames of a video as input, it is possible to edit it by changing the style, transforming characters into others, adding special effects, or retouching the definition, lighting and colorimetry.
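The workflows above differ mainly in the inputs they condition generation on. A small sketch makes the contrast explicit; the mode names and required-input sets are my own simplification, not an official API.

```python
# each SD workflow and the inputs it conditions generation on
MODES = {
    "text-to-image":  {"prompt"},
    "image-to-image": {"prompt", "init_image"},
    "inpainting":     {"prompt", "init_image", "mask"},
    "text-to-video":  {"prompt", "motion_module"},
    "image-to-video": {"prompt", "init_image", "motion_module"},
}

def missing_inputs(mode: str, provided: set[str]) -> set[str]:
    """Which inputs are still needed before this workflow can run."""
    return MODES[mode] - provided
```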
To work with SD in A1111 or ComfyUI, we can install them on our personal computer or use one of the numerous online distributions that exist today. In addition to the software, we will need an SD model. We can use the base model from Stability AI, download one of the huge number of models trained by artists around the world, train our own custom models with the Kohya software, or merge different models in A1111. The possibilities are endless and depend only on our will and creativity.
Prompt Engineering
Perhaps the most discussed but least understood aspect is the famous prompt. A prompt is simply a line of text or string; that is, a set of characters grouped into words with a specific meaning in the context in which it is used. Prompts are not a recent invention, nor are they exclusive to AIs: they are as old as programming languages and command-line systems like UNIX or DOS, and are used to communicate effectively with the computer system you are working with so that it executes a specific task.
In the context of image-generating AIs, a prompt is a sequence of words that the text-conditioning algorithm transforms into vectors of semantic information to condition the generated image.
Prompt engineering is a discipline that seeks to optimize prompts in order to communicate efficiently and effectively with AI models. It is an important skill for developing, training and using AI models. The quality of the AI's results will depend on how much information one provides to the model and on how well the prompt is designed. Both the lack of information and its overabundance can lead to undesirable results. Knowing how to write or intelligently design a prompt will help the AI's output correctly reflect the intentions of the artist using it.
There is an unfounded belief that writing a prompt is as simple as describing an image to another human, but reality is far from that. First of all, even among humans, it is very difficult for one person to describe an image and for another to reproduce it as the first has it in mind; a typical case is that of police sketch artists. It is even harder to explain an image to a computer system that does not recognize the elements of an image the way we do.
The ideal syntax of a prompt depends on many factors, mainly the AI model being used. Designing a prompt for SD is very different from designing one for DALL-E or Midjourney, and a dedicated artist must read the documentation of each AI to know which keywords, arguments and syntax each system uses, for example, to specify a certain style, the compositional framing, the number of characters, etc. It is also necessary to understand that in a prompt the order of the factors alters the product: the arrangement of the words in the sentence will alter the way it is interpreted. Moreover, most AIs have ways of assigning a differentiated weight to each word, through which we can indicate the most important parts of our prompt, and it is important to know the specific syntax each AI uses for this.
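As an example of per-word weighting, A1111 uses the syntax `(word:1.3)` to raise a token's weight above the default of 1.0. A simplified parser sketches the idea; the real parser also handles nesting, `[...]` de-emphasis and other cases.

```python
import re

def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a comma-separated prompt into (token, weight) pairs,
    reading A1111-style '(token:weight)' emphasis; plain tokens get 1.0."""
    tokens = []
    for part in (p.strip() for p in prompt.split(",")):
        m = re.fullmatch(r"\((.+):([\d.]+)\)", part)
        if m:
            tokens.append((m.group(1), float(m.group(2))))
        else:
            tokens.append((part, 1.0))
    return tokens
```

With a prompt like `"port at dusk, (boats:1.4), quinquela style"`, the boats receive 40% more weight than the rest of the description.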
All these details mean that writing a prompt that adequately reflects our intentions is no simple task when the artist's or client's requirements are very specific. There are numerous criticisms of the creative work of AI image generation, along the lines of “writing a prompt doesn't make you an artist”, ignoring that it is not so easy to communicate effectively with an AI and that, even if it were, AIs do not produce anything on their own. They have no true autonomy or creativity; creativity is born from the artist who decides to use them and who creates the prompts and conditioning necessary to control the compositions of the generated images.
Controversies
It is public knowledge that advances in AI are generating a series of controversies associated with the dangers their indiscriminate use can cause. The technology is so new that there is no legislation regarding it in Argentina or in most of the world. Fears about AI range from the automation of jobs and the loss of individual privacy to the manipulation of public opinion through deep fakes, the copyright problems surrounding dataset images, and the generation of illegal content. It is not the objective of this work to answer all these unknowns, but I believe we must separate the true dangers from unfounded fears.
I decided to carry out a survey associated with this research work among my followers on social networks. At least 82.9% of those surveyed said they were a little worried about the advancement and future of AI. 65.9% have some concern about being replaced in their job. 81.6% are somewhat concerned about the use of images of individuals and at least 80% are concerned about the use of copyrighted images.
Image 5 – IA Survey / Survey results as of December 7, 2023
In my opinion, a part of these fears is unfounded, conditioned mainly by prejudices and works of science fiction. Like any other technological advance, its use and its possibilities depend on us, the users. It is important to establish a critical and informed view on these issues and to advocate for coherent regulations that limit the predatory actions of large corporations, governments or malicious individuals, but encourage research, development and free work by independent professionals and artists.
In relation to the criticism regarding the copyright of the images used in datasets: to date, this type of image use does not constitute dishonest use under the regulations of Argentina, the United States or Europe. It must be taken into account that these images are only used at training time and are not distributed with the AI once it is implemented or shared. Since the neural network generates completely new images, the dataset images are at no time reproduced or falsified.
It can be claimed that artists whose works are used to train AI should receive financial compensation or profits for that use, but, since there is no legislation in this regard, this discussion remains entirely within the framework of morality and not legality.
My personal opinion within this new paradigm is that there can be no greater honor for an artist than for their art to be used by others to produce new art. It is desirable and expected that, if a style is iconic and significant, new artists will want to use and reproduce it so that it becomes part of the digital aesthetic ecosystem. Benito Quinquela Martín is my favorite painter, and I can't think of a better way to honor him than to do everything possible so that his style lasts over time and is used by all young artists who want it. If every artist had to pay a fee to be inspired by the style of the great masters, access to artistic production would be reserved for those who can pay it, not for those who have the desire, ability and creativity. Being inspired by the artists who came before us is a fundamental part of artistic work, since art is made from art.
Certainly the loss of jobs is a valid concern. But one only needs to look at other industries that were automated decades ago, such as the automotive industry, which despite automation did not end up destroying all jobs; on the contrary, numerous better-qualified jobs were created. We can also take the example of the 3D animation industry, which at the time put traditional 2D animation in check but in the long run did not trigger a catastrophe for the industry; on the contrary, it lowered costs and enabled numerous productions that would have been impossible to afford with traditional methods. Traditional animation, far from disappearing, also grew thanks to the new 3D tools, and today a large number of projects combine both techniques, taking advantage of the best of each.
Even so, it is evident that abuses can arise, and it is necessary to establish regulations that protect individuals without going against the progress and general well-being of the community. Image-generating AIs could conceivably destroy jobs for artists or graphic designers but, at least today, the opposite is happening. There is still no news of large numbers of layoffs in the industry; instead of destroying jobs, these tools are opening new ones. In the context of social networks, automation makes it possible to accelerate and democratize artistic production in the face of the constant need to generate new content. There are more and more job offers for artists and designers trained in the use of AI and, in my opinion, the only ones who will lose their jobs are those artists and designers who, out of their own reluctance, do not train in the use of these new technologies.
There are two truly worrying aspects of image-generating AIs: deep fakes and the generation of illegal content such as child pornography.
If we wanted to develop an equivalent of the Turing test for image-generating AIs, there is no doubt that it has already been largely passed. There are countless cases of AI-generated images that have managed to deceive the casual eye of viewers on social networks, and even the expert eye of critics, such as the AI-generated image by the artist Boris Eldagsen that won a category of the Sony World Photography Awards 2023.
We are just at the beginning of this revolution and we are already seeing an overabundance of AI-generated content being used in political campaigns around the world. Some of these images are perfectly valid and expected: the political propaganda poster that was once made by a cartoonist or a screen printer is now made by a digital artist with an AI. In this era of content-curation algorithms on social networks, where the immediacy of viralization is followed by the immediacy of oblivion, topics on the public agenda last less and less, and we artists are forced to generate more and more content, faster and faster.
A priori this is not a problem; it is expected that AI will be used to create images that communicate the message each politician wants to transmit to the population, or that each company wants to use to advertise its product or service. The serious problem arises with deep fakes, that is, AI-generated images made specifically with the objective of confusing or distorting public opinion. An emblematic case is face impersonation, or face swaps, done to alter an image or video and make a sector of the population believe that the content is real.
The other worrying issue is the generation of immoral and illegal content such as child or zoophilic pornography. There are cases of AIs trained to generate this type of content on AI developer forums, which sit in the same legal limbo as the rest of AI content. As the images do not depict real minors, in many cases they are not illegal, and it is up to the moderators of these forums whether they want to remove these AIs and report or ban their creators.
Unfortunately, both this problem and deep fakes are very difficult to control a priori; it is only after an image has been produced and gone viral that we can take actions to counteract its effect. In the case of deep fakes, it is almost impossible to stop the creators. What can be done, and in many cases is already being done, is to establish control measures on social networks that flag falsified videos and images so that users can identify them. I think that instead of eliminating them it is better to leave them up, labeled, so that all users know they were altered, with the aim of educating the population as best as possible to recognize future images. In cases of generation of child pornography, it is urgent to establish legislation so that the trainers of these AIs can be investigated: it is evident that in order to train them they must have built datasets of real pornographic images, and they must be prosecuted for their possession.
The entity of the work of art and its relationship with the new viewer
We are at a turning point in the way we think about and produce art; at the beginning of a technological revolution unlike anything seen since photography, film or printing. A technological revolution that will give rise to a new artistic renaissance. But along with this advance, at times uncontrolled and overwhelming, there are numerous critical voices that seem to regurgitate the same arguments heard since Duchamp's time to oppose, detract from or devalue artistic productions that use these tools, returning to an almost idealistic, religious conception of creation from nothing: if the creative genius does not use paper and pencil he is less of an artist; if you do not bring images to life exclusively from the world of ideas you are a thief, ignoring that our very ideas are conditioned by, and a product of, the visual content we have seen and absorbed from the culture that surrounds us.
As Jiménez says, "In this context, images, of art or of any other segment of reality, are available in an overabundance that has become fragmented" (Jiménez, 2002, p. 43). This overabundance of information means that today anyone can access practically the entire output of an artist and, with those images, train their own AI to produce content in that aesthetic. There may be paintings hidden away in private collections, but what is on the internet is mass culture; for that culture, whatever does not exist on the internet might as well never have been painted. I find many similarities between what we are experiencing today and the moment when the technical reproducibility of art was achieved, became massive and called the artistic conceptions of its time into question.
“Art in general only makes sense if one is willing to accept the auratic value of works of art, but the nature of this aura, and the works in which it will be recognized, are things that have not stopped changing for as long as there has been art. Our era has associated the artistic aura on the one hand with the institution (the signature) and on the other with the “historically important” character of the works of the past; there is no doubt that this double definition is destined to be displaced in the future. In any case, it is worth neither more nor less than the conceptions that preceded it.” (Aumont, 1992, p. 320).
Each era, each specific cultural situation, has understood different things as art; our definition changes, evolving along with human thought. The proliferation of image-production procedures characteristic of our century has transformed practice profoundly, and for quite some time now an artist has had countless techniques at their disposal beyond the brush, chisel and pencil.
AI places us in a new paradigm in the relationship with the viewer. Thanks to AI, anyone with internet access can easily start producing images. Since the 20th century, artists have faced a different, active spectator; the idea of the public as mere passive contemplators of art was lost long ago. “And precisely the configuration of what we can call a ‘new spectator’ is one of the elements that intervenes most significantly in the profile of the art of our time.” (Jiménez, 2002, p. 49) What better way, then, to address this new active spectator than to make them a necessary participant in the creation of the work itself? AIs give us the possibility of unleashing the creativity of the masses in an unprecedented way. There is no longer a need for schools, teachers and hundreds of hours in front of an easel; simply by having an idea and access to the internet, one can create a work. This radically democratizes artistic production, enabling it not only for anyone regardless of experience but also for people with reduced mobility or motor disabilities. It is no longer necessary to move or use our hands to produce images; it is as simple as thinking of the work to make it exist.
This productive capacity is what defines this new era: the generative era of art. The work of art as an entity ceases to exist. It had already lost its aura at the moment of its technical reproducibility; now it has lost its entity altogether. How much is a Mona Lisa worth if one can create infinite Mona Lisas in an instant?
The artwork as object no longer matters. What matters for being an artist in this digital age of social networks is not the artwork itself but the ability to generate new artworks. We live in a time in which artistic authorship has become separated from the work of art. Images gain importance not because of their authorship but because of their level of dissemination across networks and media. The living image is the viral image, even the meme, regardless of who its author was.
Artificial aesthetics
The current panorama of digital production forces artists and viewers to rethink the conventions we understand as contemporary styles and aesthetics. One of the criticisms leveled at AI production is its lack of innovation, its regurgitation of old styles dictated by the dataset on which the AI was trained.
An AI trained on the style of a single artist (for example, Quinquela_IA, trained for this work) finds it almost impossible to produce images outside that style. If the dataset and training were done correctly, these AIs are usually excellent at producing new images within the chosen artist's style. An expert eye may be able to identify that an image is not a real work by the artist, but the vast majority of viewers will not be able to tell the difference. We could hastily affirm that AIs cannot produce new content outside the style on which they were trained, since the diffusion process over the latent space tries to bring this vector construction of information as close as possible to the images of the dataset. This produces images that can look very convincing but often contain digital artifacts, such as smears or blurs, that a trained eye can easily detect. By this argument, these AIs cannot generate a completely new style and only produce images within certain repetitive aesthetic parameters, what Lev Manovich calls computational mannerism.
Although at first this criticism might seem valid, it is easily dismantled by the simple fact that AIs are almost never used in isolation, but rather as part of a more complex AI system. For example, returning to the case of the Mona Lisas: Benito Quinquela Martín practically never painted portraits. Although at the beginning of his career he made some naturalistic portraits, once he developed his characteristic high-impasto style, the rest of his career was devoted almost exclusively to the urban landscapes of the La Boca neighborhood. This is why, if one tries to make a portrait with Quinquela_IA through prompting alone, the results will be of poor quality. But this limitation is easily overcome by combining the AI with other neural networks such as ControlNet, developed specifically to control image composition. One can use the image of Leonardo da Vinci's Mona Lisa as a reference through a ControlNet model called OpenPose, which analyzes the reference image, identifies the location of the face, body and hands of the portrayed subject, and injects this information into the network's latent space. The result will then be conditioned, to whatever extent we want, by the body position in the analyzed image of the Mona Lisa, producing a Mona Lisa in the style of Benito Quinquela Martín. In this way we can create something completely new that goes beyond the images in Quinquela_IA's dataset. This is just a simple example; there are countless ways besides ControlNet to combine and condition AI models. We can mix different AIs with each other, condition different parts of the image with other images or videos, or use negative embeddings and negative prompts to condition the AI negatively, steering it away from what we do not want it to produce. The possibilities and results can be as vast as anything our imagination can conceive.
Image 6 – ControlNet (OpenPose) example. From left to right: La Gioconda by Leonardo da Vinci (1503–1519); ControlNet OpenPose analysis result; Gioconda produced with Quinquela_IA conditioned by OpenPose.
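The conditioning mechanism described above can be illustrated with a deliberately simplified numerical sketch. This is not the real ControlNet architecture nor any actual library API; it is a toy in which the style model and the pose branch are stand-in functions. It shows only the core idea: the control branch contributes a residual that is added to the denoiser's update, and a conditioning weight decides how strongly the reference pose steers the result away from the style model's default output.

```python
# Toy sketch of ControlNet-style conditioning (purely schematic, not the
# real architecture). The latent is a short list of numbers; both
# "networks" below are hypothetical stand-ins invented for illustration.

def base_denoiser(latent):
    # Stand-in for the style model (a Quinquela_IA-like network): each
    # step nudges the latent toward the style's learned mean.
    style_mean = [0.8, 0.8, 0.8]
    return [0.5 * (s - x) for s, x in zip(style_mean, latent)]

def control_residual(latent, pose_target):
    # Stand-in for the ControlNet branch: nudges the latent toward the
    # pose features extracted from the reference image (the Mona Lisa).
    return [0.5 * (p - x) for p, x in zip(pose_target, latent)]

def denoise(latent, pose_target, weight, steps=50):
    # The control residual is added to the base update, scaled by a
    # user-chosen conditioning weight (0 = ignore pose, 1 = full effect).
    for _ in range(steps):
        update = base_denoiser(latent)
        residual = control_residual(latent, pose_target)
        latent = [x + u + weight * r
                  for x, u, r in zip(latent, update, residual)]
    return latent

mona_pose = [0.0, 0.2, 0.4]                      # pose of the reference
result = denoise([0.0, 0.0, 0.0], mona_pose, weight=1.0)  # ≈ [0.4, 0.5, 0.6]
```

With `weight=0` the output settles at the style model's default; with `weight=1` it settles halfway between style and pose, which is the schematic analogue of a Gioconda composed like Leonardo's but rendered in Quinquela's style.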
What at first glance seems a valid criticism is, once again, a criticism made out of ignorance, belittling the artistic work of those who wish to use these tools to their full potential to create aesthetically compelling images. Not only is it possible to combine AI models with one another to create completely new styles; one can also use any number of reference images and videos in any number of ways, combining as many styles, references and digital techniques as one wishes. For example, an increasingly common practice among AI artists to control composition is photobashing, in which a quick digital collage is assembled in any digital design tool and that image is then used as the basis for AI generation. Is someone who makes a collage by cutting out magazines less of an artist than someone who uses Photoshop and then an AI?
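The negative prompts mentioned earlier can likewise be sketched with a toy, assumed simplification of classifier-free guidance (again, not a real diffusion model; the embeddings and the `predict` function are invented stand-ins): the prediction made under the negative prompt takes the place of the unconditional prediction, so each denoising step extrapolates toward the prompt and away from the undesired concept.

```python
# Toy sketch of how a negative prompt conditions generation (schematic
# simplification of classifier-free guidance; not a real denoiser).

def predict(latent, concept):
    # Stand-in for the denoiser's prediction under a text concept:
    # points from the current latent toward the concept's embedding.
    return [c - x for c, x in zip(concept, latent)]

def guided_step(latent, prompt_emb, negative_emb, scale=7.5, lr=0.05):
    pred_pos = predict(latent, prompt_emb)
    pred_neg = predict(latent, negative_emb)
    # Guidance: extrapolate from the negative prediction toward the
    # positive one, pushing the latent away from the unwanted concept.
    guided = [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]
    return [x + lr * g for x, g in zip(latent, guided)]

prompt = [1.0, 0.0]      # embedding of what we want
negative = [-1.0, 0.0]   # embedding of what we want to avoid
latent = guided_step([0.0, 0.0], prompt, negative)
# One step moves the latent toward the prompt and away from the negative.
```

The guidance scale plays the same role as in real text-to-image systems: the larger it is, the more aggressively each step is pushed away from the negative concept.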
It is only a matter of time before all the aesthetic tells that a critic can recognize in an AI-produced image are overcome and even the best-trained eye is easily fooled, and not only for two-dimensional images: enormous advances are also being made in the generation of video, 3D models, audio effects, music, voice and text. Rather than replacing human creativity, AI has the capacity to enhance it exponentially. We are just at the beginning of this renaissance and the sky is the limit.
“Artificial aesthetics can be described as an augmentation of our aesthetic abilities, deepening both our creative processes and our understanding and sensitivity to cultural artifacts. These advanced systems are then a further evolution of devices already used in the creative field, such as graphic programs, computer-aided design technology, music software, etc. If in the traditional sense the media are extensions of the human senses, then AI is a further extension of human capabilities to mediate between us and the world.” (Arielli and Manovich, 2021, p. 9).
Conclusions
Both this work and these conclusions are partial and will continue to be reviewed and edited in successive installments as my research progresses. The speed of technological advances forces us to constantly train ourselves in the latest developments as well as to rethink our own dogmas countless times. Although we are facing a profound dilemma, my personal opinion is optimistic. I think we are heading towards a panorama of enormous individual and collective creativity.
Artists will be able to use AI as a tool to outsource production processes, allowing them to focus on the areas where they are strongest and to produce higher-quality content in less time, greatly lowering the production costs associated with the production time of traditional work. By delegating the generation of the image, artists can give free rein to their creativity to think about the work, no longer tied down by the laborious task of training the hands to respond correctly to the whims of the mind. Additionally, AI will allow artists to train themselves quickly in the new tools and techniques they wish to learn, expanding their range of skills and creative horizons.
Artificial intelligence is nothing but intelligence created by humans, therefore art created by AI is art created by humans. Although in certain sectors there is a lot of rejection of works generated with AI, our concept of art is fluid, it is a product of context and history, it is not limited to what one or more people consider art, nor to the canons that we have been reproducing. As time passes and these technologies are increasingly adopted, society's aesthetic preferences will adapt and accept these new aesthetic products.
The Semantic Web 3.0 will be dominated by AI tools. Given this scenario, it is important to address possible biases, and it is necessary for artists and inhabitants of Latin America to appropriate these tools and transform them with our own imprint. Otherwise we run the risk of once again being relegated and discriminated against by the central economies.
When photography emerged, the world of painting undoubtedly suffered a shock. But rather than bringing about the end of painting, photography emancipated it from representation, allowing artists to move away from portraiture and naturalism and explore new creative horizons throughout the 19th and 20th centuries, a process that culminated in abstract expressionism, perhaps the purest form of painting for painting's sake, devoid of any attempt at representation.
I believe AI will lead to a similar process, allowing creative horizons to expand in ways impossible to predict today. Less than a decade ago, AIs were still cases of scientific study within a laboratory; today they are available to anyone. Who can imagine what we artists will be producing thanks to AI in ten, twenty or fifty years?
It is our duty as artists to appropriate technological advances so that they do not remain exclusively in the hands of the market. Art is art because it bothers, because it transgresses our theoretical and regulatory frameworks. Behind all the dogmatic arguments, such as the theft of images from other artists or the devaluation of digital work, hides the same quasi-religious thought that criticized Duchamp a hundred years ago and is regurgitated every time a new paradigm is on the horizon.
We are entering the age of the technological generatability of the work of art; we will see which dogmas are put in check this time.
Bibliographic references
Jiménez, J. (2002). Theory of Art. Tecnos.
Aumont, J. (1992). The Image. Paidós.
Benjamin, W. (1936). The Work of Art in the Age of Its Technical Reproducibility (Das Kunstwerk im Zeitalter seiner technischen Reproduzierbarkeit in the original German). Translated by Andrew E. Weikert (2003). Ithaca.
HUAWEI. (2023). HCIA-AI V3.5 Training Material.
Manovich, L., & Arielli, E. (2021–2024). Artificial Aesthetics: Generative AI, Art, and Visual Media.
Wiggers, K. (August 12, 2022). This startup is setting a DALL-E 2-like AI free, consequences be damned. TechCrunch. Retrieved December 1, 2023, from https://techcrunch.com/2022/08/12/a-startup-wants-to-democratize-the-tech-behind-dall-e-2-consequences-be-damned/
Prompt Engineering Guide. Retrieved December 1, 2023, from https://www.promptingguide.ai/es
An Introduction to Diffusion Models for Machine Learning. Retrieved December 1, 2023, from https://encord.com/blog/diffusion-models/
Stable Diffusion Guide. Retrieved December 1, 2023, from https://stable-diffusion-art.com/beginners-guide/
Stable Diffusion Documentation. Retrieved December 1, 2023, from https://huggingface.co/stabilityai
Midjourney Prompt Guide. Retrieved December 1, 2023, from https://docs.midjourney.com/docs/prompts-2
The Other LoRA Rentry Guy. (August 7, 2023). The Other LoRA Training Rentry. Retrieved December 1, 2023, from https://rentry.org/59xed3#starting-settings
Ruarte, N. (December 7, 2023). AI Survey. Retrieved December 7, 2023, from https://drive.google.com/file/d/1tYrFk48QTWnEaQUzMKTxcX2KjdLq3zys/view?usp=sharing
He won a world prize for photography using artificial intelligence, felt guilty and told the truth. Retrieved December 1, 2023, from https://cnnespanol.cnn.com/video/inteligencia-artificial-premio-fotografia-sony-boris-eldagsen-rechazo-controversia-alejandra-oraa-cafe-cnn/#:~:text=Tecnolog%C3%ADa-,Gan%C3%B3%20un%20premio%20mundial%20de%20fotograf%C3%ADa%20usando%20inteligencia%20artificial%2C%20sinti%C3%B3,por%20inteligencia%20artificial%20(IA).