A short survey of video generation: draft notes

Year 2018

March

1. Probabilistic Video Generation using Holistic Attribute Control https://arxiv.org/pdf/1803.08085.pdf

   a. Videos express highly structured spatio-temporal patterns of visual data, which can be decomposed into two factors:

        (i) temporally invariant (e.g., person identity) or slowly varying (e.g., activity) attribute-induced appearance, encoding the persistent content of each frame

        (ii) inter-frame motion or scene dynamics (e.g., encoding the evolution of the person executing the action).

   b. VideoVAE

       Handles both video generation and future prediction.

       Generates a video (short clip) by:

           decoding samples sequentially drawn from a latent space distribution into full video frames.

              - VAE: encodes/decodes frames into/from the latent space

              - RNN: models the dynamics in the latent space

        Improves generation consistency through temporally-conditional sampling, and generation quality by:

              - structuring the latent space with attribute controls

              - ensuring that attributes can be both inferred and conditioned on during learning/generation
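
The sampling loop described above (an RNN carrying the dynamics, a per-step latent distribution, and a fixed attribute vector as the control) can be sketched roughly as follows. This is a minimal toy under my own assumptions, not the paper's implementation: the weights are random rather than learned, the names `rollout_latents`, `linear`, and `random_matrix` are hypothetical, and the decoder that would turn each latent into a frame is omitted.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.1):
    """Random stand-in for learned weights (assumption: untrained toy model)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def rollout_latents(num_frames, latent_dim, hidden_dim, attribute, seed=0):
    """Sample z_1..z_T, each conditioned on an RNN state and a fixed attribute."""
    rng = random.Random(seed)
    in_dim = latent_dim + len(attribute)
    w_in = random_matrix(hidden_dim, in_dim, rng)
    w_h = random_matrix(hidden_dim, hidden_dim, rng)
    w_mu = random_matrix(latent_dim, hidden_dim, rng)
    w_logvar = random_matrix(latent_dim, hidden_dim, rng)

    h = [0.0] * hidden_dim
    z = [0.0] * latent_dim
    latents = []
    for _ in range(num_frames):
        # RNN step: the hidden state summarizes earlier latents plus the attribute.
        pre = [a + b for a, b in zip(linear(w_in, z + list(attribute)), linear(w_h, h))]
        h = [math.tanh(p) for p in pre]
        # Temporally-conditional distribution: mean and log-variance depend on h.
        mu = linear(w_mu, h)
        logvar = linear(w_logvar, h)
        # Reparameterized sample: z = mu + sigma * eps.
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
        latents.append(z)
    return latents
```

Calling `rollout_latents(8, 4, 6, [1.0, 0.0])` returns eight 4-dimensional latents; changing the attribute vector changes every step of the rollout, which is the sense in which the attribute acts as a holistic control.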


2. Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks


3. Every Smile is Unique: Landmark-Guided Diverse Smile Generation



Year 2017


- By the way

  I like this Stanford CS231n course report: http://cs231n.stanford.edu/reports/2017/pdfs/323.pdf

1. Dynamics Transfer GAN: Generating Video by Transferring Arbitrary Temporal Dynamics from a Source Video to a Single Target Image


- Spatial constructs come from the target image; dynamics come from the source video sequence.

  To preserve the spatial construct of the target image:

      - the appearance of the source video sequence is suppressed

      - only the dynamics are extracted (using the proposed appearance-suppressed dynamics feature) before being imposed onto the target image.

  The spatial and temporal consistencies are verified via two discriminator networks:

      - discriminator A validates the fidelity of the generated frames' appearance

      - discriminator B validates the dynamic consistency of the generated video sequence.
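
One way to picture how two discriminators could feed a single generator objective is the sketch below. It is an illustration only: the binary cross-entropy form, the equal default weighting, and the names `bce`, `generator_adversarial_loss`, and `dynamics_weight` are my assumptions, not the objective stated in the paper.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-7
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def generator_adversarial_loss(frame_scores, sequence_score, dynamics_weight=1.0):
    """Combine per-frame appearance scores with one whole-clip dynamics score.

    frame_scores: appearance discriminator outputs (probability of "real"),
        one per generated frame.
    sequence_score: dynamics discriminator output for the generated clip.
    dynamics_weight: hypothetical balancing factor (assumption).
    """
    # The generator wants both discriminators to say "real" (target 1).
    appearance = sum(bce(s, 1.0) for s in frame_scores) / len(frame_scores)
    dynamics = bce(sequence_score, 1.0)
    return appearance + dynamics_weight * dynamics
```

The loss drops as either discriminator is fooled more often, so a generator that produces sharp frames with implausible motion is still penalized through the second term.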

Results:

      - successfully transferred arbitrary dynamics of the source video sequence onto a target image

      - maintained the spatial constructs (appearance) of the target image while generating spatially and temporally consistent video sequences.

Note: It is ### everything (Literature Review in its intro) because it is quite new.




2. Deep Video Generation, Prediction and Completion of Human Action Sequences https://arxiv.org/pdf/1711.08682.pdf


3. Video Generation from Text https://arxiv.org/pdf/1710.00421.pdf

- Hybrid VAE plus GAN

- Two parts:

- Static: uses a "gist" to sketch the text-conditioned background color and object layout (LSTM, RNN structure)

- Dynamic: a Text2Filter

- See Section 3.3, Text2Filter

- Note: quite compact. Need time to digest.
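
To make the Text2Filter idea easier to digest, here is a toy sketch of a text-conditioned filter: a text embedding is mapped to convolution weights, which are then slid over a gist-like image. Everything concrete here is an assumption for illustration (a single linear map, a 2D filter, plain nested lists, and the names `text_to_filter` and `conv2d_valid`); the paper's actual filter shape and network are not reproduced.

```python
def text_to_filter(text_embedding, kernel_size, weights):
    """Map a text embedding to a kernel_size x kernel_size filter.

    weights: hypothetical projection with kernel_size**2 rows, one per
        filter entry, each as long as the embedding.
    """
    flat = [sum(w * e for w, e in zip(row, text_embedding)) for row in weights]
    return [flat[r * kernel_size:(r + 1) * kernel_size] for r in range(kernel_size)]


def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    This is cross-correlation, which is what deep-learning "conv" layers compute.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

The point of the design is that the filter is an output of the text branch rather than a fixed parameter, so different captions produce different responses on the same gist.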


4. Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks

   https://arxiv.org/pdf/1709.07592.pdf



5. MoCoGAN: Decomposing Motion and Content for Video Generation

   https://arxiv.org/pdf/1707.04993.pdf





6. To Create What You Tell: Generating Videos from Captions

   https://www.microsoft.com/en-us/research/wp-content/uploads/2017/11/BNI02-panA.pdf


- Temporal GANs conditioning on Captions, namely TGANs-C

     - the generator input (a noise vector concatenated with the caption embedding) is transformed into a frame sequence with 3D spatio-temporal convolutions.

     - GAM evaluation metric (Section 3.4, Experimental Setting)

- Model Architecture

     - 3.1.1 Generator

          - Given an input sentence, a bi-LSTM contextually embeds the word sequence, and an LSTM-based encoder then produces the sentence representation S. The generator takes the concatenation of S and a random noise variable z as input and synthesizes realistic videos from it.

     - 3.1.2 The discriminator network includes three discriminators:

          a. video discriminator: distinguishes real videos from generated ones and optimizes video-caption matching

          b. frame discriminator: distinguishes real frames from fake ones and aligns frames with the conditioning caption

          c. motion discriminator: encourages adjacent frames in the generated videos to transition smoothly

     - 3.1.3 The whole model is trained with three losses: a video-level matching-aware loss, a frame-level matching-aware loss, and a temporal coherence loss
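
As a rough sketch of what a matching-aware loss and a temporal coherence term could look like: the first scores three (video, caption) pairings, the second penalizes large changes between adjacent frames. The equal averaging of the three terms, the squared-difference penalty, and the function names are assumptions for illustration; the paper's exact formulations may differ.

```python
import math


def log_loss(prob, target):
    """Negative log-likelihood of a 0/1 target under a predicted probability."""
    eps = 1e-7
    p = min(max(prob, eps), 1.0 - eps)
    return -math.log(p) if target == 1 else -math.log(1.0 - p)


def matching_aware_loss(real_match, real_mismatch, fake_match):
    """Discriminator loss over three pairings.

    real_match: score for a real video with its own caption (should be 1).
    real_mismatch: score for a real video with a wrong caption (should be 0).
    fake_match: score for a generated video with the caption (should be 0).
    """
    terms = (log_loss(real_match, 1), log_loss(real_mismatch, 0), log_loss(fake_match, 0))
    return sum(terms) / 3.0  # equal weighting is an assumption


def temporal_coherence_penalty(frames):
    """Mean squared difference between adjacent frames (flattened pixel lists)."""
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        total += sum((a - b) ** 2 for a, b in zip(prev, cur))
        count += len(cur)
    return total / count if count else 0.0
```

The mismatched-caption term is what forces the discriminator to look at the caption at all: without it, a realistic video paired with the wrong text would still score as real.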




Year 2016


1. Generating Videos with Scene Dynamics

   https://arxiv.org/abs/1609.02612

- a spatio-temporal convolutional architecture

- untangles the scene’s foreground from the background.

- experiments show the model internally learns useful features for recognizing actions with minimal supervision

- this suggests scene dynamics are a promising signal for representation learning.
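
The foreground/background separation can be pictured as a per-pixel blend: as I understand the paper, a mask selects, for each pixel and time step, between a moving foreground stream and a static background shared by all frames. The sketch below shows only that composition step on flattened frames; the function name `compose_video` and the list-based layout are my own choices, and the networks that would produce the three inputs are omitted.

```python
def compose_video(foreground, mask, background):
    """Blend a moving foreground over a static background.

    foreground: T frames, each a flat list of pixel values.
    mask: same shape as foreground, values in [0, 1].
    background: one flat frame, reused at every time step.
    """
    video = []
    for fg_frame, mask_frame in zip(foreground, mask):
        # Per pixel: mask * foreground + (1 - mask) * background.
        frame = [m * f + (1.0 - m) * b
                 for f, m, b in zip(fg_frame, mask_frame, background)]
        video.append(frame)
    return video
```

With an all-zero mask every frame is just the background, and with an all-one mask the output is the foreground stream unchanged, so the mask is what decides which pixels are allowed to move.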

- Slides : https://pdfs.semanticscholar.org/presentation/7188/6726f0a1b4075a7213499f8f25d7c9fb4143.pdf
