1 Introduction/background

The increasing health-conscious lifestyle trend has heightened the demand for accessible and effective solutions to support individual exercise routines. However, engaging a personal trainer or subscribing to a gym membership can be financially burdensome for many individuals, and some people, especially beginners, are shy or feel uncomfortable exercising in public. Many beginners therefore follow online exercise videos at home, which can lead to unwanted injuries caused by exercising incorrectly. Furthermore, since the COVID-19 pandemic, the significance of maintaining physical fitness within one's home has drastically increased (Lin and Jian 2022; Neetu et al. 2023). Proper exercise form is crucial for preventing injuries and for building a balanced and healthy physique, yet achieving it without expert guidance is challenging.

A review of the literature indicates that several studies have focused on estimating human posture for exercise activities such as yoga (Krishnan et al. 2022; Lobo et al. 2022; Mohan et al. 2022), while research on weight training exercises has been relatively limited. The existing research on weight training (Bhatambarekar et al. 2022; Li et al. 2022; Lin and Jian 2022; Samhitha et al. 2021) mainly explores specific aspects or techniques, leaving gaps in the development of comprehensive methods such as real-time posture analysis, feedback systems, body-stage tracking, and repetition-counting algorithms tailored to this domain.

In response to these gaps, and with the aim of making accurate exercise guidance more accessible and affordable, this study seeks to protect users from potential injuries and imbalances, to ensure they can continue to pursue their fitness goals even in the absence of a professional, and to enable them to save time and minimize travel-related expenses while maintaining their fitness and well-being. The proposed solution employs image processing, real-time posture analysis, feedback systems, tracking of body stages during the workout, and a repetition-counting algorithm to analyze users' exercise posture in real time, providing instantaneous feedback and corrective recommendations to improve posture and technique. To evaluate the efficacy of this approach, a systematic performance study is conducted, assessing the usability, efficiency, and accuracy of the proposed algorithm. By demonstrating the practicality and advantages of the real-time weight training detection and correction system, this research seeks to contribute to the ongoing development of accessible and effective fitness solutions, addressing the gap in the literature and extending posture estimation techniques to weight training exercises. The primary application of this work lies in its potential to transform the way individuals engage in weight training and exercise. By employing image processing, real-time posture analysis, and feedback systems, this technology provides users with immediate guidance and correction during their workouts, helping to prevent injuries and imbalances while empowering individuals to pursue their fitness goals independently, without the need for a professional trainer. The system also helps users save time and reduce travel-related expenses while maintaining their fitness and overall well-being. This approach can significantly enhance exercise safety and efficacy, ultimately contributing to a healthier and more active population.

The main contributions and innovations of this paper are centered on providing accessible and effective fitness solutions through real-time posture analysis and feedback systems. It revolutionizes the way individuals engage in weight training and exercise by offering immediate guidance and correction during their workouts, ultimately enhancing exercise safety and efficacy. Furthermore, the research systematically evaluates the usability, efficiency, and accuracy of the proposed algorithm, providing empirical evidence of its practicality and advantages, addressing a significant gap in the literature. This innovative technology not only helps prevent injuries and imbalances but also empowers users to pursue their fitness goals independently and cost-effectively, thus contributing to a healthier and more active population.

2 Related works

This section reviews previous research relevant to our work and how it has been applied.

2.1 Body pose estimation using MediaPipe

Body pose estimation using MediaPipe (Pauzi et al. 2021; Al Moustafa et al. 2023) has been applied in various exercise domains. A system developed with OpenCV and MediaPipe (Bhatambarekar et al. 2022) allowed patients to perform exercises such as bicep curls, lateral raises, and squats at their convenience, without requiring in-person assistance from a physiotherapist. This highlights the flexibility and accessibility provided by MediaPipe for exercise guidance and monitoring.

In Mohan et al. (2022), four deep learning frameworks (i.e., EpipolarPose, MediaPipe, PoseNet, and OpenPose) were selected and evaluated for estimating five yoga poses using only one camera. The results showed that MediaPipe achieved the highest accuracy among the compared architectures. However, the authors also found that MediaPipe's accuracy was lower for some postures because it lacks neck key point detection.

The versatility of MediaPipe in building perception pipelines for computer vision tasks such as object detection and tracking was presented by Camillo et al. (2019). The framework's parallel processing capability, result synchronization, and cross-platform support make it suitable for real-time applications. The examples presented demonstrate MediaPipe's effectiveness in real-time object detection from live camera feeds and its adaptability to different platforms, including desktop and mobile devices.

2.2 Posture analysis

Posture analysis research has utilized MediaPipe models in conjunction with machine learning algorithms to predict and estimate yoga poses. Krishnan et al. (2022) investigated the real-time prediction and estimation of yoga poses using MediaPipe models combined with five machine learning algorithms. Their results showed that MediaPipe combined with SVM provided the best prediction by utilizing key point coordinate features. Another study (Pardos et al. 2022) focused on real-time posture analysis in both static and dynamic exercises. It leveraged MediaPipe Pose for human posture assessment and detection of misalignments based on vector geometry evaluation. MediaPipe Pose, with its advanced pose estimation algorithms and normalized coordinate predictions for key point locations, was selected as the pre-trained real-time system for that application. The system proposed by Valentin et al. (2020) employed BlazePose's pose estimation capabilities, utilizing 33 key points on the human body for accurate estimation of rotation, size, and position. By combining heatmap, offset, and regression approaches, the system achieved real-time performance and a lightweight design suitable for mobile applications. This research demonstrates the effectiveness of MediaPipe in estimating highly articulated poses.

2.3 State diagram explanation while performing

The work of Li et al. (2022) proposed a system that categorized and quantified fundamental fitness actions using the MediaPipe platform and the K-nearest neighbors (KNN) algorithm. This work differentiated itself from previous studies by focusing on two-state fitness movements and utilizing KNN for classification. In the context of state diagram explanation while performing exercises, the study of Neetu et al. (2023) utilized MediaPipe for pose estimation, extracting joint coordinates and calculating angles between joints such as the shoulder, elbow, and wrist. This angle calculation was crucial for implementing curl counter logic, enabling accurate tracking of repetitions and stages during exercise.

2.4 Designing the application angle calculation

Angle calculation plays a vital role in posture analysis systems. Lobo et al. (2022) proposed an application that compared the calculated exterior angles of users' body parts during weight training exercises with angles observed in pre-recorded videos of professional yoga practitioners. This allowed users to identify incorrect body alignments and make necessary adjustments in real time. In another study by Samhitha et al. (2021), the focus was on detecting bicep curls and providing real-time feedback based on key point angles. The system utilized NumPy for angle calculations and identified maximum and minimum points during the exercise, enabling accurate curl counting and assessment of exercise performance. Mahendran (2021) explored deep learning-based pose estimation for human pose analysis and comparison. The proposed approach involved accurately tracing body parts, calculating slopes between specific body parts (e.g., shoulders, elbows, hips, and ankles), and providing real-time suggestions for correcting posture misalignments. The experimentation included comparing exercise postures between reference images and user inputs taken from YouTube videos.

2.5 Feedback actions for the AI fitness trainer application

Real-time feedback plays a crucial role in Artificial Intelligence (AI) fitness trainer applications. Lin and Jian (2022) proposed a system for real-time posture detection and analysis of weight training. The system utilized two RGB web cameras to capture movement posture and employed OpenPose to generate 25 key points for analyzing posture correctness. Real-time feedback allowed users to correct incorrect postures and enhance exercise effectiveness and safety. Pardillo et al. (2022) developed an automated exercise evaluation system utilizing OpenCV and MediaPipe. The system provided comprehensive data points about exercise execution, including average angle, range of posture, motion rating, standardized test rating, the correct number of reps, and total reps. The accuracy of the system was validated by comparing the results with manual assessments performed by an instructor.

These related works serve as valuable references and inspirations for our proposed research. By leveraging the capabilities of MediaPipe and incorporating aspects such as body pose estimation, posture analysis, state diagram explanation, application angle calculation, and real-time feedback, we aim to develop a comprehensive and accessible AI fitness trainer application for weight training exercises. As shown in Table 1, in comparison to the existing body of work, our proposed system offers a unique approach to fitness tracking and guidance. While previous studies have used OpenCV and MediaPipe for pose estimation of basic exercises (e.g., sit-ups, squats, push-ups, and yoga poses), our system is designed with a focus on weight training exercises covering seven postures (i.e., sit-ups with weight, dumbbell fly, barbell curl, dumbbell lateral raise, seated triceps press, bent over dumbbell row, and squat with weight). Our application also offers real-time corrections and voice guidance, fostering a more independent and flexible workout experience for beginners. The proposed system goes beyond simple tracking by providing recommendations that help prevent injuries.

Table 1 Comparison of related works

3 Proposed method

Python version 3.10.4 was used to develop this work. The Streamlit library was used to create the web application. MediaPipe, a Google framework that connects many models into adjustable pipelines, was used to apply machine learning techniques in real-time applications (Camillo et al. 2019). The Pose Landmark Detection solution in MediaPipe was selected to obtain the body points of interest for each exercise posture. This module can detect 33 points, as shown in Fig. 1.

Fig. 1 Points topology (Valentin et al. 2020)
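
As an illustration of how these points are accessed, the following minimal sketch runs the Pose Landmark Detection solution on a single frame and reads a few of the 33 landmarks; the input file name and the selected landmarks are placeholders rather than part of the actual application.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Run Pose Landmark Detection on one frame and read a few of the 33 points.
image = cv2.imread("exercise_frame.jpg")          # placeholder input frame
with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    landmarks = results.pose_landmarks.landmark   # list of 33 landmarks
    for name in ("LEFT_SHOULDER", "LEFT_ELBOW", "LEFT_WRIST"):
        lm = landmarks[mp_pose.PoseLandmark[name].value]
        # x and y are normalized to [0, 1]; visibility estimates how likely
        # the point is actually visible (used later for view detection).
        print(name, round(lm.x, 3), round(lm.y, 3), round(lm.visibility, 2))
```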

The workflow of our web application is described in Fig. 2, starting with an input video or live camera. Proper camera alignment is required, as described in Sect. 3.3. After that, the system performs pose estimation on the captured video frames, analyzing the positions and orientations of the body joints to determine the user's pose during each exercise. From the estimated poses, the system extracts key points or landmarks, which serve as reference points for further analysis. The system then calculates angles between relevant body parts and iterates through the different stages or steps of the weight training exercises. For each stage, the system applies predefined conditions or criteria to determine whether the user's performance is correct. The figures provided in this work visually illustrate the positions of the human body during weight training exercises, depicting the key points and angles calculated by our system and providing a clear representation of the user's posture at different stages of each exercise.

The relationship between computational intelligence, specifically machine learning, and this work lies in the core of our system's functionality. Machine learning algorithms are employed for pose estimation and key point extraction, allowing the system to analyze the user’s body positions during weight training exercises. These algorithms enable the system to make real-time judgments about the user’s performance based on predefined conditions, providing feedback and guidance. This integration of computational intelligence is fundamental to the success and effectiveness of our weight training detection and correction system.

Fig. 2 Workflow diagram of our web application

The problem addressed in our work pertains to the accurate analysis of a user’s posture and performance during weight training exercises. This involves determining whether a user is correctly executing various stages of an exercise routine and providing real-time feedback for improvement. Our system utilizes computational intelligence to perform human pose estimation, extracting key points and calculating angles between body parts, all of which is visually represented to enhance the understanding of the user's posture and exercise technique.
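
To make this workflow concrete, the following simplified sketch shows a per-frame processing loop with OpenCV and MediaPipe Pose. It is a stand-alone illustration that displays frames in an OpenCV window rather than through the Streamlit interface used by our application, and the stage and error logic is indicated only by comments, since it is detailed in Sects. 3.1–3.3.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)                       # live camera, or a video file path
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break

        # 1) Pose estimation on the current frame (MediaPipe expects RGB input).
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks is None:
            continue                            # no person detected in this frame
        landmarks = results.pose_landmarks.landmark

        # 2) Key point extraction, 3) angle calculation, 4) stage update, and
        # 5) error checking for the selected posture would follow here
        # (see Sects. 3.1-3.3 and Table 2).

        # Draw the detected skeleton as visual feedback.
        mp_draw.draw_landmarks(frame, results.pose_landmarks,
                               mp_pose.POSE_CONNECTIONS)
        cv2.imshow("AI fitness trainer", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```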

Videos from well-known or professional weight training exercise websites (National Academy of Sports Medicine 2018a, b; PureGym 2021, 2023; OPEX Fitness 2020; Central Athlete 2017; Bodybuilding.com 2013, 2017; thenutritionacademy 2011; Dan Sroda Nutrition 2016; Bruce 2017; Thebigman2u 2014; QOLTransformation 2012; FostaFit 2019; Sweeney Fitness 2020; Musqle 2012; Althea Raum.-Women Growing Strong 2014; Andrew Kwong 2021; Lidor Dayan 2016; ScottHermanFitness 2013; Renshaw’s Personal Training 2017; Bassett Healthcare Network 2011; ChalkMonkey Crossfit 2015; Phdwomanuk 2016; MensGarage 2008; McIsaac Health Systems Inc. 2023; Jacob Waller 2022; Swequity 2018; Erin Stern 2019; FITBODY with Julie Lohre 2020; LIVESTRONG.COM 2009; Karen 2020; Leap Fitness 2020; Fitness For Transformation 2020; Howcast 2012; Dominic Munnelly 2019; Daniellagimodierept. 2012; Testosterone Nation 2020; Markbruce4221 2017) were collected as references for posture, angle, and form, and were used to set the thresholds for counting correct repetitions and activating errors. Conditions were applied to each exercise posture to detect errors during performance; their details are shown in Table 2. If any error is detected, text and audio warnings are activated and an incorrect repetition is counted. The Pygame library was used to play the audio files when an error was detected, and the files were created with the Google Text-to-Speech (gTTS) library. Note that creating the sound and playing it in real time while the application is running slows down or freezes the application.
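
A minimal sketch of this audio feedback mechanism is shown below, under the assumption that the warning files are generated once at start-up (avoiding the slowdown noted above) and then played back with Pygame whenever an error is detected; the message texts and file names are illustrative.

```python
from gtts import gTTS
import pygame

# Generate the warning audio files once at start-up; synthesising speech inside
# the frame loop is what slows down or freezes the application.
WARNINGS = {
    "straighten_back": "Straighten your back",    # error text reported in Sect. 4.2
    "lower_weights": "Lower the weights slowly",  # illustrative example text
}
for key, text in WARNINGS.items():
    gTTS(text=text, lang="en").save(f"{key}.mp3")

pygame.mixer.init()

def warn(key):
    """Play a pre-generated warning unless another warning is still playing."""
    if not pygame.mixer.music.get_busy():
        pygame.mixer.music.load(f"{key}.mp3")
        pygame.mixer.music.play()
```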

Table 2 Conditions to activate errors while performing exercise

3.1 Angle calculation

Angle calculation refers to finding the angle between three points, one of which is the reference point. As shown in Fig. 3 (left), this is the main method we use for determining the state of each exercise, examining the person's view angle relative to the camera, and checking the correctness of posture for each exercise. The angle is calculated from the equation shown in Fig. 3 (right). The equation's variables are defined as follows:

Fig. 3 An angle between three points (left), and the angle calculation equation (right)

\(\theta\) is the angle between vector \({\overrightarrow{P}}_{1,{\text{ref}}}\) and vector \({\overrightarrow{P}}_{2, {\text{ref}}}.\)

\({\overrightarrow{P}}_{1,{\text{ref}}}\) is the vector between the reference point (Pref) and first point (P1).

\({\overrightarrow{P}}_{2, {\text{ref}}}\) is the vector between the reference point (Pref) and second point (P2).

\(\Vert {\overrightarrow{P}}_{1,{\text{ref}}}\Vert\) is the magnitude of vector between reference point (Pref) and first point (P1).

\(\Vert {\overrightarrow{P}}_{2,{\text{ref}}}\Vert\) is the magnitude of vector between reference point (Pref) and second point (P2).
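
Based on these definitions, the equation in Fig. 3 (right) corresponds to the standard dot-product relation between the two vectors:

\(\theta = {\text{arccos}}\left(\frac{{\overrightarrow{P}}_{1,{\text{ref}}} \cdot {\overrightarrow{P}}_{2,{\text{ref}}}}{\Vert {\overrightarrow{P}}_{1,{\text{ref}}}\Vert \, \Vert {\overrightarrow{P}}_{2,{\text{ref}}}\Vert }\right)\)

A minimal NumPy implementation of this calculation, operating on the (x, y) landmark coordinates returned by MediaPipe, might look as follows (the function and argument names are our own illustration, not the exact code of the application):

```python
import numpy as np

def angle(p1, ref, p2):
    """Angle (in degrees) at the reference point between points P1 and P2."""
    v1 = np.array(p1, dtype=float) - np.array(ref, dtype=float)
    v2 = np.array(p2, dtype=float) - np.array(ref, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to [-1, 1] to guard against floating point rounding errors.
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Example: elbow angle from (x, y) landmark coordinates.
# elbow_angle = angle((shoulder.x, shoulder.y), (elbow.x, elbow.y), (wrist.x, wrist.y))
```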

3.2 State

Each exercise posture is divided into three states based on angle thresholds. Each threshold is set by running a professional weight training exercise video (correct posture) through the system to obtain the correct angle and then determining a suitable threshold value. The three points used to calculate the angle that determines the threshold for each exercise posture are shown in Table 3. To count a correct exercise repetition, the posture must start at State 1 (S1), move to State 2 (S2), then move to State 3 (S3), come back to S2, and finish at S1 without any detection of a wrong exercise posture.

Table 3 The three points used to calculate angle and set threshold of each weight training exercise posture

We selected a single reference point and three states for analysis because beginner-level weight training exercises typically target specific muscle groups. This shared focal point allows us to use a consistent reference point for counting states, divided into a minimum of three states for posture analysis. Aiming to create a proof-of-concept prototype to test initial feasibility, we start with a simplified setup that lets us assess the system's effectiveness in accurately tracking and counting states. This initial approach lays the groundwork for future refinement and expansion.
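
The sketch below illustrates this three-state repetition-counting logic for a single reference angle; the class interface and threshold handling are illustrative rather than the exact implementation.

```python
class RepCounter:
    """Counts a repetition only for the full S1 -> S2 -> S3 -> S2 -> S1 sequence."""

    def __init__(self, s1_thresh, s3_thresh):
        self.s1_thresh = s1_thresh        # angle above this value -> S1
        self.s3_thresh = s3_thresh        # angle below this value -> S3
        self.sequence = ["S1"]
        self.error_in_rep = False
        self.correct_reps = 0
        self.incorrect_reps = 0

    def classify(self, angle_deg):
        if angle_deg > self.s1_thresh:
            return "S1"
        if angle_deg < self.s3_thresh:
            return "S3"
        return "S2"

    def update(self, angle_deg, error_this_frame=False):
        """Call once per frame with the reference angle and the error flag."""
        self.error_in_rep = self.error_in_rep or error_this_frame
        state = self.classify(angle_deg)
        if state != self.sequence[-1]:
            self.sequence.append(state)
        if self.sequence[-5:] == ["S1", "S2", "S3", "S2", "S1"]:
            if self.error_in_rep:
                self.incorrect_reps += 1  # a warning was raised during the rep
            else:
                self.correct_reps += 1
            self.sequence = ["S1"]        # reset for the next repetition
            self.error_in_rep = False
        return state
```

For a barbell curl, for instance, the reference angle might be the elbow angle, with illustrative thresholds such as RepCounter(s1_thresh=150, s3_thresh=60); the actual reference points and thresholds for each posture follow Table 3 and the professional reference videos.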

3.3 Camera alignment

Some exercise postures require different view angles. If the user's posture does not align with the required view, posture detection and counting will not run. In this work, three view angles are required by the exercise postures for the application to run properly. Firstly, a front view is required for the Dumbbell Lateral Raise. Secondly, a side view is required for Sit-ups with Weights, Barbell Curls, Seated Triceps Press, Bent Over Dumbbell Rows, and Squats with Weights. Lastly, a lying-down view (head pointing toward the camera) is required for the Dumbbell Fly posture.

To differentiate between the front view and the side view, the angle between three points is calculated: the right shoulder, the left shoulder, and the nose. The angle calculated from the front view has a high value, whereas the angle calculated from the side view has a low value. For the lying-down view, the visibility attribute of MediaPipe is used for the hip and foot points on both sides. The visibility attribute returns a probability value ranging from 0 to 1. After obtaining the values for both sides, the average value for the hip points and for the foot points must each be higher than a set threshold value. Moreover, the y-coordinate of the shoulder points must be lower than another set threshold value.
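
The following sketch illustrates this view check; it reuses the angle() helper from the Sect. 3.1 sketch, and the threshold values, as well as the choice of the nose as the vertex of the shoulder angle, are assumptions for illustration rather than the exact values used in the application.

```python
import mediapipe as mp

mp_pose = mp.solutions.pose
P = mp_pose.PoseLandmark

def detect_view(landmarks, front_angle_thresh=60.0,
                vis_thresh=0.7, shoulder_y_thresh=0.5):
    """Classify the camera view as 'lying', 'front', or 'side' (illustrative thresholds)."""
    nose = landmarks[P.NOSE.value]
    r_sh = landmarks[P.RIGHT_SHOULDER.value]
    l_sh = landmarks[P.LEFT_SHOULDER.value]

    # Lying-down view: hips and feet clearly visible, and the average shoulder
    # y-coordinate below a threshold (normalized y grows downwards).
    hip_vis = (landmarks[P.LEFT_HIP.value].visibility +
               landmarks[P.RIGHT_HIP.value].visibility) / 2
    foot_vis = (landmarks[P.LEFT_FOOT_INDEX.value].visibility +
                landmarks[P.RIGHT_FOOT_INDEX.value].visibility) / 2
    shoulder_y = (l_sh.y + r_sh.y) / 2
    if hip_vis > vis_thresh and foot_vis > vis_thresh and shoulder_y < shoulder_y_thresh:
        return "lying"

    # Front vs side view: the shoulder-nose-shoulder angle is wide from the
    # front and narrow from the side. angle() is the helper from Sect. 3.1.
    shoulder_angle = angle((r_sh.x, r_sh.y), (nose.x, nose.y), (l_sh.x, l_sh.y))
    return "front" if shoulder_angle > front_angle_thresh else "side"
```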

4 Experimental results and discussion

4.1 Camera and computer specification

The cameras employed in the development process include laptop and web cameras, primarily utilizing an OKER A229 model with 2.0 megapixels and a frame rate of 30 frames per second (fps). The application runs on a CPU, specifically an Intel(R) Core(TM) i7-5700HQ CPU @ 2.70 GHz (8 CPUs), ~2.7 GHz. During application execution, the average frame rate ranges between 10 and 20 fps.

4.2 Results

Figure 4 illustrates examples of each exercise posture, highlighting instances that trigger errors. The number assigned to each picture corresponds to the exercise posture number as outlined in Table 2.

Fig. 4 Examples of exercise postures with error activated

We gathered numerous weight training exercise videos from gym trainers or professional training sessions on YouTube. The selection process involved choosing videos with the necessary camera alignment and featuring only one person in the frame. The outcomes of running these videos through the application are presented in Table 4 and described as follows:

  • Dumbbell Fly: in videos 2–4, our system fails to keep track of the key points, which causes the error to activate. The cause appears to be lighting and mirror reflections. With bright light, bright/pale skin, shirtless training, or exercise near a mirror, the system tends to fail to track key points.

  • Barbell Curl: in videos 3–4, the problem is the same as found in Dumbbell Fly. In video 5, the system fails to track the key point because the free weight covers the wrist.

  • Seated Triceps Press: the system fails to track the wrist key point when it is near the person’s hair (ponytail) in video 4.

  • Bent Over Dumbbell Row: the error ‘Straighten your back’ was activated in videos 1, 2, and 5. Thus, the angle threshold condition may need to be adjusted to a higher value. In video 5, the free weight covering the required key point was also a problem.

  • Squat with Weights: stage 3 of the posture was not reached in video 3, which shows that the stage 3 threshold needs to be adjusted to be more flexible. Lastly, in video 4, when the exerciser used free weights that covered his face, the system failed to detect the hip, knee, ankle, and foot key points.

Table 4 Results of the videos running on our application

We have conducted experiments involving threshold adjustments to address the weak points in our current application. By performing a systematic grid search, we aim to find optimal threshold values that ensure accurate and reliable detection of key points, especially in scenarios involving bright light, mirror reflections, or objects covering the key points. These experiments are pivotal for enhancing the robustness and adaptability of our system, ultimately improving its performance in a variety of real-world settings. The results show that there are weak points in our application. Some threshold ranges for the stages of certain postures are too narrow; adjusting them to be more flexible would solve this problem. In addition, any object that covers a required key point or the user's face causes key point detection to fail, including mirror reflections, which most fitness training videos contain. Another problem is the high light intensity of a gym room, which lowers the key point detection ability.
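
As an illustration of how such a grid search over stage thresholds could be organised, the sketch below scores each candidate threshold pair for one posture by comparing predicted repetition counts with manually labelled reference videos; the data structure, candidate ranges, and scoring criterion are assumptions, and RepCounter refers to the sketch in Sect. 3.2.

```python
import itertools
from typing import List, NamedTuple, Tuple

class LabelledVideo(NamedTuple):
    frames: List[Tuple[float, bool]]  # (reference angle in degrees, error flag) per frame
    true_reps: int                    # manually counted correct repetitions

def count_error(s1: float, s3: float, videos: List[LabelledVideo]) -> int:
    """Total absolute difference between predicted and manually counted repetitions."""
    total = 0
    for video in videos:
        counter = RepCounter(s1_thresh=s1, s3_thresh=s3)  # from the Sect. 3.2 sketch
        for angle_deg, error_flag in video.frames:
            counter.update(angle_deg, error_flag)
        total += abs(counter.correct_reps - video.true_reps)
    return total

def grid_search(videos: List[LabelledVideo]):
    """Return the (S1, S3) threshold pair that best matches the labelled videos."""
    candidates = itertools.product(range(140, 171, 5), range(40, 71, 5))
    return min(candidates, key=lambda pair: count_error(pair[0], pair[1], videos))
```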

5 Conclusion and future work

This paper employs the Pose Landmark Detection solution in MediaPipe, utilizing the BlazePose model for human pose estimation. The goal is to develop a real-time web application for weight training, focusing on seven exercise postures captured through a web camera. Users are required to turn to the required view angle to activate the pose estimation function. When the user performs an exercise that passes through all required stages, a correct repetition is counted. If any error is activated, a warning appears on the screen together with a sound, and an incorrect repetition is counted. The developed application was evaluated by running correct exercise videos through it. Weak points have been identified that could be addressed in the future. Challenges arise from issues such as light and mirror reflections, as well as instances where free weights obstruct key points or faces, leading to difficulties in key point tracking.

As for the scope of future work, experimenting with other recent models in place of BlazePose within the MediaPipe framework might improve the application's performance. The real-time weight training web application developed in this paper has potential medical applications for patients with bone fractures or injuries. It can be adapted to help such patients perform rehabilitation exercises with correct posture and technique, ensuring that they do not put additional stress on the injured areas. This could aid their recovery process and reduce the risk of further complications. Additionally, the application's ability to provide real-time feedback and track repetitions could be beneficial in a clinical setting for monitoring and guiding patients during rehabilitation exercises. In future work, quantitative analyses can also be incorporated to assess the accuracy and efficiency of the Pose Landmark Detection solution in MediaPipe, particularly in addressing weak points such as issues related to light, mirror reflections, or obstructions covering key points. Furthermore, exploring the integration of alternative models within the MediaPipe framework, beyond BlazePose, could be a promising avenue for refining the application's performance and expanding its applicability in medical contexts.