
Behind "Deceiving World": A Simulated Interview with Director Mohamed Beriky | Part 2


Clarification: In this article, we’ve chosen to present the content in a conversational format, simulating an interview with director Mohamed Beriky. This creative approach is intended to bring his insights and vision behind "Deceiving World" to life in an engaging and interactive way, while also serving as an educational tool for those interested in exploring the creative process and AI-driven production techniques behind the scenes of "Deceiving World".

Behind Deceiving World | Part 2: Exploring AI Production Techniques and Tools

Host: In the first part of our interview, we talked about the songwriting and narrative of Deceiving World. If you haven’t read it yet, you can find it here. Now, let’s begin the second part of our interview with this question: Mohamed, how did you bring the narrative of Deceiving World to life through the AI production process?

 

Mohamed Beriky: The main challenge was creating an AI-driven music video that truly felt cinematic. It wasn’t just about using AI for the sake of it or following trends—it was about leveraging AI tools to serve the narrative, evoking both emotional and visual resonance. My ultimate goal was to craft a cohesive and impactful AI-driven music video while preserving the timeless essence of classic cinematic storytelling.

 

Host: That sounds fascinating. Could you explain how you achieved this?

 

Mohamed Beriky: Would you prefer that I start with my directing vision for maintaining the cinematic look, or should I begin with the AI capabilities utilized in the production?

 

Host: Let’s start with AI production and save the cinematic vision for the third part of our interview, which will be added as an article here on Civitai next week.

 

Mohamed Beriky: Ok, then. I began the AI production process by crafting scenes that didn’t require consistent characters. Using AI visual generation tools like Leonardo and Flux, I created images that aligned perfectly with those scenes, then animated them into motion videos using the image-to-video technique with tools like Kling and Minimax Hailuo. Precision was critical at every step—highly detailed prompts guided everything, from character actions to set design, lighting, camera movements, and specific visual effects. This meticulous approach ensured that every scene stayed true to the vision I had in mind.

 

The real challenge arose with the main characters, who formed the emotional core of the story. Maintaining their consistency across multiple scenes was paramount. I faced a crucial decision: Should I use real people photographed in custom-designed settings, then animate these scenes into motion videos with AI tools? Or should I rely entirely on AI-generated fictional characters in fully fictional settings?

Producing photographic scenes with real people offered authenticity and realism but presented logistical challenges, such as contracting and obtaining permissions, managing photography costs, and creating custom-designed settings—all of which would significantly impact the budget and timeline. On the other hand, AI-generated fictional characters provided greater creative freedom but risked losing the realism I aimed to achieve.

Ultimately, since the central characters were all inherently similar, with subtle differences, as if they were twins or derived from a single character, I chose to blend both approaches. I used my own likeness for the main characters, integrating it into custom settings designed through AI photography. This decision provided creative control, reduced costs, and preserved the personal, authentic touch essential to the story.

This approach was implemented through two key solutions: first, training a Flux LoRA model on personal photos of myself; second, using Artflow's AI photography to transform my likeness into customized scenes covering character actions, locations, and outfits. The resulting images were then animated into motion videos using the image-to-video technique with tools like Kling and Minimax Hailuo. This combination allowed us to create consistent, authentic characters capable of carrying the emotional weight of the narrative.

This process wasn’t just a technical challenge—it was a commitment to preserving the story’s integrity. By merging my personal identity with the adaptability of AI, we bridged the gap between innovation and authenticity. This approach allowed Deceiving World to stand out, not just as a technological achievement, but as a testament to how storytelling can be elevated through innovation.

Host: That’s truly remarkable, Mohamed. It’s inspiring to see how deeply committed you were to preserving the story’s integrity while pushing the boundaries of innovation. Bridging the gap between your personal identity and the adaptability of AI adds such a unique and authentic touch to Deceiving World. Were there other steps or elements you took to ensure authenticity and realism in the project?

Mohamed Beriky: Yes, we ensured authenticity by grounding the locations in reality. Scenes were set in familiar and relatable places such as a bus station, inside a café, on a bus, in a car, in a home, in a place of worship, on streets, and more. Additionally, we generated a large number of images—sometimes reaching hundreds—for each scene, carefully selecting the perfect ones that aligned with the authenticity and realism we aimed to achieve. This grounded approach was balanced with a restrained use of visual effects, applied only when necessary, to maintain an authentic and immersive feel.

Role/Character: Singer (Hero) | Location: Bus Station

Role/Character: Silent Observer (Villain) | Location: Top of a Skyscraper

Role/Character: Stubborn (Supporting Character) | Location: Office

Role/Character: Repentant (Supporting Character) | Location: Place of Worship

Role/Character: Manipulated (Supporting Character) | Location: Media Stage

Role/Character: Seeker (Supporting Character) | Location: Café

Role/Character: Traveler (Supporting Character) | Location: Car

Role/Character: Worker (Supporting Character) | Location: Bus

Mohamed Beriky (Continued): Another element that added to the authenticity and realism was integrating the art cover of the song's audio release into various scenes in the story. We handled this process professionally, using real photographs rather than AI-generated images. These photographs were carefully edited and enhanced with visual effects before being animated into motion videos using the image-to-video technique with tools like Kling and Minimax Hailuo.

Art Cover of the Audio Release

Scenes Integrating the Art Cover into the Music Video Story

1. Music Catalog in a Store

The art cover was prominently featured as the cover of a music catalog in a store. This placement tied the visual design of the audio release to a familiar and relatable physical setting, creating a tangible connection for the audience and enhancing the narrative's realism.

2. Article in a Newspaper

The art cover also appeared as an article in a newspaper, symbolizing the dissemination of information and highlighting its importance as a recurring visual motif. This integration emphasized the narrative’s themes while grounding them in everyday life.

For more on the use of “silent” instead of “silence,” see Part 1 of the interview on grammar as a storytelling tool.

3. Drawing with Milk on a Coffee Cup

In a creative integration, the art cover was depicted as a drawing made with milk on top of a coffee cup. This subtle yet meaningful inclusion added an artistic and symbolic layer, tying the imagery to ordinary moments and enhancing the narrative’s connection to reality. The symbolism behind this creative choice represented the comforting lies society habitually accepts daily.

Mohamed Beriky (Continued): This meticulous integration ensured that the art cover’s presence felt intentional and seamlessly connected to the narrative’s visual and thematic identity.

Host: Your dedication to authenticity and the meticulous integration of AI technology is truly inspiring, Mohamed, and it’s evident when watching the video. It’s fascinating to see how these elements enhance the storytelling. Could you walk us through the process for one scene—from text prompt to motion video?

 

Mohamed Beriky: Of course. Let me share an example:

Prompt: Nighttime interior scene set in a vintage, dimly lit room with dark wooden furniture and textured walls. The warm, ambient glow of a desk lamp casts soft, diffused shadows, creating a dramatic interplay of light and darkness. The atmosphere is heavy with tension and mystery, with shadows dominating the corners. The supporting character, playing the role of a man burdened with guilt, is in his mid-40s. He sits at a desk, wearing a white shirt and a dark vest. His slicked-back, salt-and-pepper hair and neatly groomed beard add to his commanding presence. Leaning slightly forward, he grips a lit cigar between his teeth as smoke spirals gently into the air. His intense gaze, directed slightly off-camera, conveys suppressed emotion. The camera captures this moment in a static medium close-up (MCU) at eye level, emphasizing intricate details of his face, the curling smoke, and the textured ambient lighting.

To bring this scene to life, we began with a detailed text prompt describing every element: lighting (warm ambient glow, diffused shadows, interplay of light and darkness), textures (dark wooden furniture, textured walls), character appearance (a mid-40s man with slicked-back salt-and-pepper hair, a neatly groomed beard, a white shirt, and a dark vest), emotional undertone (tension, mystery, guilt, suppressed emotion), props (a lit cigar with spiraling smoke), set design (a vintage, dimly lit room), camera composition (a static medium close-up at eye level), and symbolic elements (cigar smoke representing inner turmoil). Using Leonardo, we generated an initial visual concept.
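This kind of prompt decomposition can be sketched as a small reusable template. The helper, field names, and ordering below are purely illustrative assumptions for this article, not part of the tools' actual APIs:

```python
# Hypothetical sketch: assembling a detailed image prompt from labeled
# components, mirroring the decomposition described above.

def build_prompt(components: dict) -> str:
    """Join labeled prompt components into a single prose prompt,
    keeping a fixed order; missing components are simply skipped."""
    order = ["setting", "lighting", "atmosphere", "character",
             "props", "camera", "symbolism"]
    parts = [components[key] for key in order if key in components]
    return " ".join(parts)

scene = {
    "setting": "Nighttime interior scene set in a vintage, dimly lit room "
               "with dark wooden furniture and textured walls.",
    "lighting": "The warm, ambient glow of a desk lamp casts soft, "
                "diffused shadows.",
    "atmosphere": "The atmosphere is heavy with tension and mystery.",
    "character": "A man in his mid-40s sits at a desk, wearing a white "
                 "shirt and a dark vest, gripping a lit cigar.",
    "camera": "The camera captures this moment in a static medium "
              "close-up (MCU) at eye level.",
}

prompt = build_prompt(scene)
print(prompt)  # the setting component leads, the camera note closes
```

Keeping each component labeled makes it easy to refine one aspect (say, lighting) between generations without disturbing the rest of the prompt.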

Based on Leonardo's output, we refined and enhanced the prompt. To produce the scene with my likeness as the character, we used AI photography in Artflow with a Flux LoRA model trained on my personal images, while also using Leonardo's output image as a pose reference in Artflow.

Once the static image was finalized, we upscaled it using Leonardo’s universal upscaler. From there, we transitioned to motion using the image-to-video technique in Kling and Minimax Hailuo, producing multiple versions with each tool and carefully selecting the best output.

 

Host: What criteria did you use to determine the best motion video output in this particular example?

Mohamed Beriky: There are some general criteria for all outputs to ensure quality. These include avoiding common generation mistakes, such as inaccuracies in the character’s body—like hand fingers or facial details—and ensuring smooth camera movements. Additionally, the scene must feel cohesive, with no out-of-place or strange elements. Most importantly, each output must align precisely with the input prompt. For example, in this specific scene, the best output was determined by smooth smoke diffusion, accurate shadow interplay, and consistency in the character’s interaction with elements like the cigar. It was essential to avoid any errors and create a scene that felt alive, with an atmosphere perfectly aligned with the narrative’s tone. The character, who plays the role of someone burdened with guilt, needed to convey this emotion through the environment and his actions, ensuring the scene resonated emotionally with viewers.
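The selection criteria described above amount to a pass/fail filter over candidate outputs. The sketch below is a toy model with hypothetical criterion names; in the real workflow these judgments are made by eye, not by code:

```python
# Hypothetical sketch: filtering candidate motion-video outputs against
# the general quality criteria described above. Each candidate is tagged
# with boolean checks standing in for a human reviewer's judgments.

CRITERIA = ["anatomy_ok", "camera_smooth", "scene_cohesive", "matches_prompt"]

def passes_all(candidate: dict) -> bool:
    """A candidate is acceptable only if every criterion holds."""
    return all(candidate.get(c, False) for c in CRITERIA)

candidates = [
    {"name": "v1", "anatomy_ok": True, "camera_smooth": True,
     "scene_cohesive": True, "matches_prompt": False},   # e.g. looking away
    {"name": "v2", "anatomy_ok": False, "camera_smooth": True,
     "scene_cohesive": True, "matches_prompt": True},    # e.g. hand errors
    {"name": "v3", "anatomy_ok": True, "camera_smooth": True,
     "scene_cohesive": True, "matches_prompt": True},    # keeper
]

accepted = [c["name"] for c in candidates if passes_all(c)]
print(accepted)  # only candidates passing every check survive
```

The key point the filter captures is that the criteria are conjunctive: a single failed check, however minor, disqualifies an output.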

 

Mohamed Beriky (Continued): I will share with you some examples of the outputs we received.

1. Looking Away (Incorrect)

In one output video, the character was looking away from the camera, which diminished the intended focus and emotional intensity. As a result, we decided to reject this output.

2. Looking Directly at the Camera (Incorrect)

In one output video, the character looks directly at the camera, creating an engaging moment that suggests a reaction or decision-making. However, this shifts the tone and disrupts the intended narrative flow and immersion.

3. Smoke Unrealistically Emanating from the Mouth (Incorrect)

In one output video, the smoke effect appeared artificial, failing to align with the realistic tone of the scene.

4. Fire Emanating from the Mouth (Incorrect)

In one output video, fire emanated from the character’s mouth, a highly unrealistic effect that broke the grounded tone and authenticity of the scene.

5. Character’s Pose Gazing Off to the Side (Incorrect)

In one output video, the character’s pose was incorrect, as gazing off to the side detracted from the scene’s intended focus and emotional impact.

6. Perfectly Conveys the Tone and Intensity of the Scene (Correct)

The character’s gaze and posture perfectly convey the tone and intensity of the scene, aligning seamlessly with the narrative's emotional depth and weight.

Host: You’ve highlighted some impressive techniques so far. I’d love to hear more about the technical processes. How did you address the duration constraints of AI-generated clips? Also, how did you achieve such precise lip-syncing for the singer character?

Mohamed Beriky: During the production phase of the music video (October–December 2024), most AI tools could only generate short clips, typically around 5 to 10 seconds long. To overcome this, we used the Extend option in Kling. In addition, we applied the Last Frame Linking Technique in Minimax Hailuo. This approach allowed us to successfully extend the duration of AI-generated scenes.

Prompt: A slow tracking shot reveals the scene: in the background, two large screens dominate the stage, one projecting a superhero in bold red armor, the other a villain in a green suit, symbolizing duality. In the foreground, a man stands on stage, dressed in a sharp blue blazer. As the camera moves forward, it smoothly focuses on the man as he turns his face toward the lens. The tracking stops as his eyes fill the frame in an extreme close-up. His expression is serious and contemplative, with dramatic lighting casting deep shadows across his face. The shot captures the weight of his internal struggle as he stands in the spotlight, embodying the tension between the hero and villain archetypes displayed behind him.

Host: Sorry to interrupt, but what do you mean by the Last Frame Linking Technique?

 

Mohamed Beriky: At the end of a scene planned to be extended, we captured the final frame and used it as the starting point to generate the next clip. This ensured visual continuity, allowing us to seamlessly stitch together multiple clips in post-production to create extended, fluid scenes that felt cohesive and uninterrupted.
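The linking logic above can be sketched with clips modeled as simple frame lists. This is a toy model (real clips are video files, and the seeding happens inside the generation tools), but it shows why the technique preserves continuity; the function names are hypothetical:

```python
# Toy sketch of the Last Frame Linking Technique: the final frame of one
# clip seeds the next generation, and the duplicated seam frame is dropped
# when the clips are stitched together in post-production.

def generate_clip(seed_frame: str, length: int, label: str) -> list:
    """Stand-in for an image-to-video generation call: the clip starts
    from the seed frame and continues with newly generated frames."""
    return [seed_frame] + [f"{label}_frame{i}" for i in range(1, length)]

def link_clips(*clips: list) -> list:
    """Stitch clips end to end, dropping each later clip's first frame,
    since it duplicates the previous clip's last frame."""
    stitched = list(clips[0])
    for clip in clips[1:]:
        stitched.extend(clip[1:])
    return stitched

clip_a = generate_clip("opening_frame", 4, "a")
clip_b = generate_clip(clip_a[-1], 4, "b")   # last frame of A seeds clip B
extended = link_clips(clip_a, clip_b)
print(len(extended))  # 7: two 4-frame clips minus the shared seam frame
```

Because the seam frame appears in both clips, the cut lands on an identical image, which is what makes the stitched scene feel uninterrupted.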

 

Prompt: This image is the last frame from the previous scene, set in a war-torn city at night. The scene is drenched in heavy rain, with streets lined by rain-soaked tents and makeshift shelters, faintly illuminated by distant, flickering fires casting eerie shadows across the devastation. The camera starts low, focusing on a young child sitting alone beside one of the tents, his small figure hunched over with hands clasped tightly around his knees. Tears stream down his face, his expression distant and filled with grief, as he stares blankly ahead, lost in the pain of losing his family. In the scene we need to generate, the camera then begins a slow upward-right movement, panning away from the child to reveal a massive, imposing billboard on a nearby building. The billboard displays a group of deceivers dressed in sleek black attire, wearing golden masks with sinister, deceiving smiles. Around and above them, in the dark sky, social media reaction icons drift and fly through the air, illuminated by the fires below, creating an unsettling, media-heavy aesthetic that hovers over the ruined city. This rising shot captures the haunting contrast between the manipulation of the powerful and the profound sorrow below, amplifying the tension between deception and despair.


Host: Great! And what about the lip-syncing?

 

Mohamed Beriky: For the singer character, achieving precise lip-syncing was crucial to maintaining the performance’s emotional resonance and realism. We used the Lip Sync or Dub feature in Kling to synchronize lip movements and facial expressions with the song’s lyrics. The process was straightforward: we uploaded the audio in parts corresponding to the generated scenes for the singer character, and Kling then enabled us to produce a polished, authentic connection between the audio and visuals. This meticulous synchronization elevated the overall production, making the singer’s performance both engaging and emotionally impactful.

Host: As we wrap up, could you quickly delve into the visual effects created by AI in Deceiving World? How did they enhance the narrative and elevate the overall impact of the music video?

Mohamed Beriky: The visual effects created by AI in Deceiving World were essential in amplifying the narrative’s symbolism and emotional depth. They allowed us to visually translate abstract themes into impactful, memorable imagery, deeply connecting with the audience.

For instance:

  • The dragon shielding the villain symbolized the false refuge of passivity and avoidance, reflecting how silence can act as a protective barrier for complicit individuals.

  • The media studio flooded with murky water represented the unchecked spread of lies and misinformation, a metaphor for how deception corrupts even trusted platforms meant to inform and educate.

  • The tigers marching with a determined character depicted the first steps toward collective awareness and the courage to confront entrenched falsehoods.

  • The dinosaur shattering a golden mask on a billboard, revealing a sinister face beneath, underscored the necessity of exposing deeply embedded lies and hidden truths.

  • The massive flood engulfing the city symbolized the overwhelming consequences of silence, deception, and inaction. Like water behind a dam, these forces build up over time until they become unstoppable, engulfing everything in their path.

  • The silent villain escaping with a parachute portrayed the futile attempts of complicit individuals to evade accountability.

  • The transformation of creatures, from tigers to lions, represented the collective shift from passivity to action, symbolizing courage and unity.

  • The villain falling into the flood served as a final, dramatic reminder of the destructive consequences of complicity and inaction.

Mohamed Beriky (Continued): Each of these AI-generated effects was not just visually striking but also deeply connected to the overarching themes of the music video. They enriched the storytelling by making the abstract tangible, immersing the audience in the emotional and symbolic layers of Deceiving World.

Host: Thank you, Mohamed, for sharing these fascinating insights into your AI production process. You’ve demonstrated how seamlessly cutting-edge technology can blend with creative vision, making Deceiving World an even more compelling experience.

Before we wrap up, I’m excited to announce that the final part of our conversation with director Mohamed Beriky, exploring his journey of directing and producing Deceiving World, will be published next week on Civitai. In Part 3, we’ll dive into the cinematic techniques and the director's style that brought Deceiving World to life. Stay tuned—it’s a conclusion you won’t want to miss!

 

The Project Page on FilmFreeway: https://filmfreeway.com/DeceivingWorld

🔷 Behind "Deceiving World": A Simulated Interview with Director Mohamed Beriky | Part 1: https://civitai.com/articles/10489/behind-deceiving-world-a-simulated-interview-with-director-mohamed-beriky-or-part-1

🔷 Behind "Deceiving World": A Simulated Interview with Director Mohamed Beriky | Part 3: https://civitai.com/articles/10630/behind-deceiving-world-a-simulated-interview-with-director-mohamed-beriky-or-part-3

🔷"Deceiving World" Music Video Submission for Odyssey Project: https://civitai.com/images/49189873

🔷 Behind-the-scenes video: https://civitai.com/images/52663645

🔷 Video Release (On Beriky Studios YouTube Artist Channel)

🔷 Video Release (On VEVO)

🔷 Audio Release (On Spotify, Apple Music, Amazon Music, and YouTube Music): https://artists.landr.com/055855700827
