Formulated image captioning as a multimodal translation task: image features extracted with various encoder models are used to generate captions.
Used a multimodal transformer to capture both intra-modal and inter-modal interactions in a unified attention block.
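A minimal sketch of the unified-attention idea: image-region features and caption-token embeddings are concatenated into one token sequence, so a single attention block attends within and across modalities. Dimensions, weights, and token counts here are illustrative, not taken from the project.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unified_attention(img_feats, txt_feats, Wq, Wk, Wv):
    """Single-head attention over the concatenation of image and text
    tokens, so intra- and inter-modal interactions share one block."""
    x = np.concatenate([img_feats, txt_feats], axis=0)  # (N_img + N_txt, d)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # every token attends to every token
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=(4, d))   # 4 image-region features (illustrative)
txt = rng.normal(size=(6, d))   # 6 caption-token embeddings (illustrative)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = unified_attention(img, txt, Wq, Wk, Wv)
print(out.shape)  # (10, 8)
```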
Compared C-LSTMs (a CNN-LSTM hybrid), transformers, and deep averaging networks for classifying text documents.
Used a self-attention mechanism at the output and dynamic meta-embeddings at the input, and experimented with encoder blocks, positional embeddings, and bi-gram embeddings to improve model performance.
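Of the compared architectures, the deep averaging network is the simplest: average the token embeddings, then feed the result through a small feed-forward classifier. A hedged sketch with made-up vocabulary size, dimensions, and weights:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def dan_forward(token_ids, emb, W1, b1, W2, b2):
    """Deep averaging network: mean-pool the token embeddings of a
    document, then apply one hidden layer to produce class logits."""
    avg = emb[token_ids].mean(axis=0)          # (d,) document vector
    return relu(avg @ W1 + b1) @ W2 + b2       # (n_classes,) logits

rng = np.random.default_rng(1)
vocab, d, hidden, n_classes = 100, 16, 32, 3   # illustrative sizes
emb = rng.normal(size=(vocab, d))
W1, b1 = rng.normal(size=(d, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, n_classes)), np.zeros(n_classes)

doc = np.array([5, 17, 42, 7])  # a document as token ids
logits = dan_forward(doc, emb, W1, b1, W2, b2)
print(logits.shape)  # (3,)
```

Dynamic meta-embeddings would replace the fixed `emb` lookup with an attention-weighted mix of several embedding tables; that weighting is omitted here for brevity.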
Implemented the SEIR and SEIRV (with vaccinations) algorithms to model the spread of COVID-19 in Karnataka, India using data from Jan 2021 to Sep 2021.
Analysed the effect of immunity waning, contact rate, vaccine efficacy, and parameters like mean incubation period, mean recovery period on the rate of new infections.
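The SEIR dynamics can be sketched as a forward-Euler integration of the standard compartmental ODEs. The parameter values below are illustrative defaults, not the rates fitted to the Karnataka data:

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """One forward-Euler step of the SEIR ODEs over population fractions.
    beta: contact/transmission rate, sigma: 1 / mean incubation period,
    gamma: 1 / mean recovery period."""
    new_exposed = beta * s * i      # S -> E
    new_infectious = sigma * e      # E -> I
    new_recovered = gamma * i       # I -> R
    s -= dt * new_exposed
    e += dt * (new_exposed - new_infectious)
    i += dt * (new_infectious - new_recovered)
    r += dt * new_recovered
    return s, e, i, r

# Illustrative: mean incubation 5 days, mean recovery 10 days, beta = 0.3/day.
state = (0.99, 0.0, 0.01, 0.0)  # initial S, E, I, R fractions
for _ in range(100):
    state = seir_step(*state, beta=0.3, sigma=1 / 5, gamma=1 / 10)
print(round(sum(state), 6))  # 1.0 (population is conserved)
```

The SEIRV variant adds a vaccinated compartment with its own inflow from S (scaled by vaccine efficacy); the step function extends the same way.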
Constructed and analysed the user-item interaction graph from the MovieLens dataset.
Implemented matrix factorization, content-based recommender, collaborative filtering and neural collaborative filtering to recommend relevant movies to users.
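A minimal sketch of the matrix-factorization recommender: learn low-rank user and item factors by SGD so their product approximates the observed ratings, then fill in the missing entries. The toy matrix, rank, and hyperparameters are assumptions for illustration, not the MovieLens setup:

```python
import numpy as np

def factorize(R, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P @ Q.T
    approximates the observed entries of R (NaN marks missing)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if not np.isnan(R[u, i])]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            # L2-regularised SGD updates on both factor rows
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Toy user-item rating matrix; NaN entries are the ones to predict.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [1.0, 1.0, 5.0]])
P, Q = factorize(R)
pred = P @ Q.T  # dense prediction matrix, including the missing cells
print(pred.shape)  # (3, 3)
```

Content-based and (neural) collaborative filtering swap the linear dot product for item-feature similarity or a learned interaction network, but score candidates for a user in the same spirit.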