VINO

A Unified Visual Generator with Interleaved OmniModal Context

¹Shanghai Jiao Tong University, ²Kling Team, Kuaishou Technology, ³Nanyang Technological University, ✉️ Corresponding Author

Instruction-Based Image Editing
Before After
Add a small, dark rowboat to the left of the sun's reflection, casting a subtle dark reflection on the water.
Before After
Change the robe to deep emerald silk, matching the lighting and folds.
Before After
Zoom in to center the graffiti face's eyes and nose, highlighting the details and texture while keeping the dark, moody aesthetic.
Before After
Change the Smart car to vibrant matte lime green with black trim, matching the bright daylight and shadows.
Before After
Extend the canvas with light beige wall and light cream floor, keeping the soft lighting and shadows.
Before After
Change the baseball cap to a dark blue knit beanie, matching the lighting and texture.
Before After
Add a small dark blue ceramic water bowl with clear water next to the dog, ensuring reflections and shadows match.
Before After
Change the overcast sky to bright, clear blue, casting warm shadows and illuminating the brickwork and snow with golden sunlight.
Before After
Add Van Gogh-style swirls, focusing on clouds, water, and sunset hues.
Before After
Add rainy, overcast lighting, rain streaks, wet field with water reflections, and damp uniforms.
Before After
Add frost and snow to the water and dragon statue, cool down the colors, and add winter lighting.
Before After
Change the sky to a dramatic overcast NYC dusk with muted art deco skyscrapers and soft lighting, matching Grand Central Terminal's ambiance.
Before After
Change the plain blue background to a vibrant, blurred garden with red, orange, and purple flowers and green foliage, complementing the hummingbird's details.
Before After
Add soft, shimmery golden-bronze eyeshadow, glossy coral lipstick, warm peach blush, and bright, sun-kissed natural daylight.
Before After
Change the trumpet to gleaming gold, with realistic reflections of the surroundings.
Before After
Add a rustic, light-colored wooden table with soft, warm natural light from the left and a subtly blurred, cozy cafe background.
Before After
Replace the orange sky with cool blue-grey, add snow to rocks and land, show partially frozen ocean with ice floes, and dress the figure in a thick coat, scarf, and hat.
Before After
Replace the plain white background with a dynamic, dimly lit gaming desk environment, featuring a high-resolution monitor displaying a vibrant game scene, a mechanical keyboard with subtle RGB backlighting, and a textured mouse pad, ensuring the gaming mouse remains sharply focused and realistically illuminated with appropriate reflections and shadows on the new surface.
Before After
Draw bold, dark outlines on buildings and the carved head, simplify textures, and use vibrant colors for high contrast.
Before After
Add open, raised hands with palms up, maintaining realistic texture and lighting.
Before After
Change the kitchen sink to shiny stainless steel, removing all stains and reflecting light naturally.
Before After
Rotate the player's head slightly forward for a natural, direct gaze.
Before After
Dye Princess Aurora's dress from pink to soft light blue, matching the light and shadows.
Before After
Add dramatic golden hour lighting, enhancing contrast and deepening shadows.
Before After
Add bold outlines, vibrant cel-shaded colors to the cake and wine glass, exaggerated candle flames, and dramatic lighting.
Before After
Change the background to a sunny promenade with cafes, blurred people, and potted plants, keeping natural daylight and vibrant atmosphere.
Before After
Add a thick layer of snow, bare trees, and cool lighting to transform the scene into winter.
Before After
Change the sky to a vibrant dusk skyline with illuminated skyscrapers and a warm orange-purple gradient, keeping the foreground train and platform intact.
Multi-Image Conditioned Generation
ref1
Input Condition
Generate Image 1 walking on a forest path, reaching out to touch leaves while the camera follows from the side.
ref1
Input Condition
Generate Image 1 cutting fruit in a kitchen, filmed from the side.
ref1
Input Condition
Generate Image 1 nodding gently and swaying slightly to the rhythm while listening to music.
ref1
Input Condition
Generate Image 1 leaning on the railing, gazing at the river while the wind moves the edge of their coat.
ref1
Input Condition
Generate Image 1 holding the bus handrail and swaying slightly with the vehicle's motion.
ref1
Input Condition
Make Image 1 weave through the forest at speed, with sunlight flickering through the trees.
ref1
Input Condition
Make Image 1 sit on a glowing sled at high speed through a cyberpunk night city, neon reflections shimmering on the snow as the camera tracks the motion.
ref1
Input Condition
Make Image_1 run through a steampunk factory with scattered metal parts and subtle mech-enhancements.
ref1
Input Condition
Make Image_1 push a round stone slab, causing it to slide slowly across the ground.
ref1
ref2
Input Condition
Make the person in Image_1 wearing the dress in Image_2 turn back and look at the camera.
ref1
ref2
Input Condition
Make the person in Image_1 wearing the dress in Image_2 turn back and look at the camera.
ref1
ref2
Input Condition
Make the person in Image_1 wearing the vest in Image_2 raise their arms for a gentle stretch indoors.
ref1
ref2
Input Condition
Make the person in Image_1 wearing the vest in Image_2 raise their arms for a gentle stretch indoors.
ref1
ref2
Input Condition
Image_1 lifts Image_2 to inspect the color of the liquid and gently swirls it.
ref1
ref2
Input Condition
Make the person in Image_1 wear the headphones in Image_2 and hold the ear cups lightly to listen closer.
ref1
ref2
Input Condition
Make the person in Image_1 wear the headphones in Image_2 and nod gently to the rhythm.
ref1
ref2
Input Condition
Make the person in Image_1 wearing the headphones in Image_2 nod rhythmically with comic motion lines.
ref1
ref2
Input Condition
Make the person in Image_1 hold the cosmetics in Image_2 and observe how the case reflects light.
ref1
ref2
ref3
Input Condition
Make the person in Image_1 hold the object in Image_3 while wearing the clothing in Image_2 and briefly show it to the camera.
ref1
ref2
ref3
Input Condition
Make the person in Image_1 hold the object in Image_3 while wearing the clothing in Image_2 and briefly show it to the camera.
ref1
ref2
ref3
Input Condition
Image_1, wearing Image_2, hold Image_3 and briefly show it to the camera.
ref1
ref2
ref3
Input Condition
Image_1, wearing Image_2, hold Image_3 and briefly show it to the camera.
ref1
ref2
Input Condition
Image_1 and Image_2 sit on a park bench, passing a small sketchbook between them. One draws while the other watches closely, smiling with gentle admiration.
ref1
ref2
ref3
Input Condition
Image_1 walks down a shaded alley, swinging Image_2 lightly at their side while tightening the strap of Image_3 over their shoulder, their movements steady and relaxed.
Instruction-Based Video Editing
Edit videos through text instructions.
Reference Video
VINO
Transform the scene into a hand-drawn anime cel-shaded aesthetic with bold outlines, vibrant gradients, and exaggerated atmospheric depth, as if from a fantasy film.
Reference Video
VINO
Transform the interaction into a stylized comic book panel with bold outlines, speech bubbles, and exaggerated motion lines that emphasize the gesture and shared focus.
Reference Video
VINO
Render the entire video in the style of a 3D-animated stop-motion diorama, where the boombox is a meticulously crafted miniature, and the people are tiny papercraft figures in a cozy, dollhouse-sized room.
Reference Video
VINO
Replace the large hoop earrings with glowing crystal earrings that pulse with the room’s lighting.
Reference Video
VINO
Change the large glossy green leaves to translucent, glowing leaves with internal veins of light.
Reference Video
VINO
Replace the mustard-yellow knit beanie with a deep burgundy wool one with a pom-pom.
Reference Video
VINO
Add a small, decorative ceramic jar filled with rose petals on the nightstand beside the bed.
Reference Video
VINO
Transform the scene into a surreal, surrealistic still life where the brushes are giant, living entities with personalities, and the hand is a miniature sculptor crafting masterpieces from light.
Reference Video
VINO
Style the video as a 3D animated fantasy where the gift boxes are portals to different worlds, and the girl is a chosen hero opening enchanted doors.
Reference Video
VINO
Infuse with the visual elements of the 3D Chibi style.
Reference Video
VINO
Render the entire video in a high-gloss, luxury fashion editorial style with dramatic lighting, soft shadows, and a diamond-dust texture overlay on the dancer’s leotard.
Reference Video
VINO
Let it be like the Ghibli style.
Reference Video
VINO
Replace the red jackets with deep indigo hooded coats with reflective zippers.
Reference Video
VINO
Change the fluffy white dog to a sleek silver-haired poodle with a red collar.
Reference Video
VINO
Render the entire video in a high-definition, surreal glass sculpture aesthetic, where every object and person appears as if carved from transparent, iridescent crystal with internal light reflections.
Reference Video
VINO
Replace the black long-sleeved top with a translucent gray hoodie that reveals faint outlines of the body.
Reference Video
VINO
Change the curly-haired individual’s hair to a flowing, silver-white cascade with faint sparkles.
Reference Video
VINO
Replace the sunscreen sun-shaped design on the back with a tattoo of a tropical bird.
Reference Video
VINO
Replace the metallic structure of the bridge with a glass and steel arch that reflects the sky and ice.
Reference Video
VINO
Alter the woman's hair from loose waves to a neat low bun.
Image-Referenced Video Editing
Edit videos by providing a reference image.
Ref Image
Reference Video
VINO
Put the hat from the image on the man in the video
Ref Image
Reference Video
VINO
Let the anime characters in the image sleep on the green space in the video
Ref Image
Reference Video
VINO
Put the cowboy hat from the image on the man wearing white clothes in the video
Ref Image
Reference Video
VINO
Replace the man in the video with the magical woman in the image
Ref Image
Reference Video
VINO
Let the man with mask in the video wear the mask in the image
Ref Image
Reference Video
VINO
Change the women's clothes in the video into those in the image
Ref Image
Reference Video
VINO
Replace the woman in the video with the female character in the image
Ref Image
Reference Video
VINO
Replace the woman at the back of the video with the game character in the image
Ref Image
Reference Video
VINO
Transform the figures of the two girls in the video into the robust figures of the girls in the image
Ref Image
Reference Video
VINO
Replace the black sunglasses on the woman's head in the video with the glasses in the image
Ref Image
Reference Video
VINO
Replace the clothes of the woman in the middle of the video with those in the reference image
Ref Image
Reference Video
VINO
Replace the pumpkins in the video with Halloween pumpkins in the image, and replace the surrounding environment with Halloween atmosphere
Ref Image
Reference Video
VINO
Replace the clothes of the woman in the video with the red assault suit shown in the image
Ref Image
Reference Video
VINO
Replace the woman in the video with the movie character in the image
Ref Image
Reference Video
VINO
Replace the scene style in the video with the anime style in the image
Ref Image
Reference Video
VINO
Replace the white chicken in the video with the cartoon style in the image, but keep the scene unchanged
Ref Image
Reference Video
VINO
Transform the guitar played by the girl in the video into the cartoon guitar in the image
Ref Image
Reference Video
VINO
Turn the Rice and vegetable roll eaten by children in the video into steamed buns in the picture
Ref Image
Reference Video
VINO
Replace the white backpack in the video with the gray backpack in the image
Ref Image
Reference Video
VINO
Put the gold chain from the image on the man in the video
Ref Image
Reference Video
VINO
Put helmets on everyone in the video as shown in the image
Ref Image
Reference Video
VINO
Put the watch from the image on the wrist closest to the camera in the video
Ref Image
Reference Video
VINO
Change the video style to the post apocalyptic style in the image, and a tsunami appears in the distance
Ref Image
Reference Video
VINO
Replace the house in the video with the temple in the image
Video Generation Driven by Reference Video
Generate videos by providing a reference video (motion, expression, or camera cloning).
Ref Image
Reference Video
VINO
Based on the camera motion in the video, transfer that effect to this image to animate it.
Ref Image
Reference Video
VINO
Based on the camera motion in the video, transfer that effect to this image to animate it.
Ref Image
Reference Video
VINO
Based on the camera motion in the video, transfer that effect to this image to animate it.
Ref Image
Reference Video
VINO
Based on the camera motion in the video, transfer that effect to this image to animate it.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Refer the video's camera movements, apply those effects to this image.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.
Ref Image
Reference Video
VINO
Create a video from the image that replicates the motion in the video.