What's the advantage of multi-GPU training in practice?

The training loss decreases at almost the same rate with one GPU as with multiple GPUs.



After averaging the gradients, the only benefit of multi-GPU training seems to be that the model sees more data in the same amount of time.



But why average the gradients at all?
Is it simply that the model is fed more data in the same amount of time?
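
To make the question concrete, here is a toy sketch of the synchronous gradient averaging I mean (CPU-only, with a placeholder linear model and random data, not my actual training code):

    import torch
    import torch.nn as nn

    # Two "GPU" replicas of the same model (plain CPU copies here, so the
    # example runs anywhere), initialized with identical weights.
    torch.manual_seed(0)
    model = nn.Linear(10, 1)
    replicas = [nn.Linear(10, 1) for _ in range(2)]
    for r in replicas:
        r.load_state_dict(model.state_dict())

    x, y = torch.randn(8, 10), torch.randn(8, 1)   # global batch of 8

    # Each replica computes gradients on its own shard of 4 examples.
    for r, xs, ys in zip(replicas, x.chunk(2), y.chunk(2)):
        nn.functional.mse_loss(r(xs), ys).backward()

    # Averaging the per-replica gradients reproduces the gradient of the
    # mean loss over the full batch of 8 -- one large-batch SGD step.
    with torch.no_grad():
        for p, *ps in zip(model.parameters(),
                          *(r.parameters() for r in replicas)):
            p.grad = torch.stack([q.grad for q in ps]).mean(dim=0)
            p -= 0.1 * p.grad

As I understand it, averaging is what makes k GPUs behave like a single SGD step on a batch k times larger, rather than k independent small-batch steps.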










machine-learning neural-network deep-learning training gpu

asked by jet

2 Answers

I see two main advantages of using multiple GPUs instead of one, since they let you distribute certain resources:

• Training large DNN models: some recent models occupy so much memory that they simply cannot fit on a regular GPU, and multiple GPUs let you place different parts of the model on different GPU instances (see the sketch after this list).

• Speeding up DNN training: this is also a very positive effect of using multiple GPUs, but only if you have a high-speed connection among the GPUs, such as NVIDIA's NVLink.
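
For the first point, a minimal sketch of what such model splitting can look like in PyTorch (a hypothetical two-stage net; it assumes two CUDA devices, cuda:0 and cuda:1, are available):

    import torch
    import torch.nn as nn

    # Hypothetical model-parallel split: the first stage lives on cuda:0,
    # the second on cuda:1, so neither GPU has to hold all the weights.
    class TwoStageNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Linear(1024, 4096),
                                        nn.ReLU()).to("cuda:0")
            self.stage2 = nn.Linear(4096, 10).to("cuda:1")

        def forward(self, x):
            h = self.stage1(x.to("cuda:0"))
            # Activations cross the GPU interconnect here, which is why a
            # fast link such as NVLink matters for the second point too.
            return self.stage2(h.to("cuda:1"))

    model = TwoStageNet()
    out = model(torch.randn(8, 1024))   # needs two GPUs to actually run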






answered by Jirka B.

Actually, with more GPUs you distribute the calculations and run them in parallel. As an example, take the group-convolution concept used in AlexNet. Grouping was later observed to have other useful properties, but one of its main original purposes was that, with GPUs linked via SLI, you can distribute the convolution groups across multiple GPUs, which speeds up the convolution operations. Each weight update is then performed on the GPU that holds the corresponding group (a toy illustration follows below).
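
A toy illustration of the grouped-convolution idea (CPU-only; the comments indicate where each half would sit if you had two GPUs):

    import torch
    import torch.nn as nn

    # AlexNet-style grouped convolution: with groups=2, the first half of
    # the input channels only feeds the first half of the output channels,
    # so the two halves are fully independent computations.
    x = torch.randn(1, 64, 32, 32)                 # (batch, C, H, W)
    grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=2)

    # The same computation written as two separate convolutions -- this is
    # the form that maps directly onto two GPUs, one group per device.
    half_a = nn.Conv2d(32, 32, kernel_size=3, padding=1)  # would go to GPU 0
    half_b = nn.Conv2d(32, 32, kernel_size=3, padding=1)  # would go to GPU 1
    xa, xb = x.chunk(2, dim=1)
    y = torch.cat([half_a(xa), half_b(xb)], dim=1)  # same shape as grouped(x)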






answered by Media